While processing in parallel I write the results to a file (i.e. a CSV). I’ve noticed that once in a while, at different rows, even though I am processing the same data, the CSV structure gets broken because of “Too many data elements”.
Now my assumption is that the file / CSV Writer nodes occasionally run into conflict with each other. Anyone got an idea?
Just to confirm I understand you correctly:
You have a parallel loop which loads different CSV files.
The loop execution stops with the error you mentioned?
As far as I know this error means the loop end cannot concatenate the different input tables due to changing table structures (e.g. one file has 6 columns, the next one has 10; with the default options it expects the same number of columns for each input).
Without having example files I would say that at least one file is read in with more columns than the rest (or the first loaded file has too few).
Could it be that, e.g. due to missing/wrong quote characters, some files end up with a different column structure?
Or that at least one file actually has a different column structure?
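To illustrate the quote-character point: a delimiter inside a value that is not protected by quotes shifts every following field. Here is a tiny sketch outside of KNIME (plain Python, values made up):

```python
import csv
import io

ok     = '1,Alice,"likes a, b and c"'   # comma inside the value is quoted
broken = "2,Bob,likes a, b and c"       # same comma, quotes missing

for line in (ok, broken):
    row = next(csv.reader(io.StringIO(line)))
    print(len(row), row)
# 3 ['1', 'Alice', 'likes a, b and c']
# 4 ['2', 'Bob', 'likes a', ' b and c']  <- one column too many
```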
Can you try to enable the “Allow changing table specifications” option in the loop end and see if that fixes your problem?
Otherwise, if you could provide example files, I could check what is wrong with them in more detail.
Not exactly. The parallel execution finishes correctly, and afterwards I load the results that were saved to one CSV during the parallel execution.
That is where I noticed the inconsistent behavior. I also executed the workflow without parallelism and the CSV structure was perfectly fine. I inspected the CSV without splitting it into columns and can confirm that, e.g., one line somewhere in the middle started with data which actually belongs to another line.
From your last description, it seems that you may have a special character in your CSV? Like a blank character or an end-of-line character or something?
Have you tried to import the CSV into Excel to cross-check the file structure?
I can guarantee that this is not the case. First and foremost because the CSV structure breaks occasionally and at different points in the entire data set.
Secondly, because I checked the individual files and their processing and could not reproduce it. Control characters, line breaks and NULL (not missing) values were all eliminated before issuing this ticket.
Hence my assumption that, given the randomness of the effect and the fact that it only occurs during parallel execution, it’s caused by the “cloned” CSV Writer nodes accidentally writing to the same file.
Normally I’d assume the file system / operating system prevents writing to a file that is already being accessed by another process.
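To make the suspicion concrete, here is a minimal sketch outside of KNIME (plain Python, file name and row layout made up) of what I think happens when several parallel branches append to one shared CSV without any coordination:

```python
import csv
from multiprocessing import Pool

OUT = "results.csv"  # hypothetical shared output file

def worker(chunk_id):
    # Every worker appends to the same file. The OS only serialises
    # individual write() calls, not whole CSV rows, so a row that is
    # flushed in several pieces can have another worker's output land
    # in the middle of it -> a line with too many data elements.
    with open(OUT, "a", newline="") as f:
        w = csv.writer(f)
        for i in range(10_000):
            w.writerow([chunk_id, i, "some payload"])

if __name__ == "__main__":
    with Pool(4) as p:
        p.map(worker, range(4))
    # Most lines come out fine; occasionally one is a mix of two rows,
    # which matches the random, hard-to-reproduce corruption I see.
```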
Are you writing to the very same file? If that’s the case, do some rows contain more elements than the number of columns of the tables you’re writing and some fewer? Do you have the append option active?
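If all parallel branches do append to the same file, the pattern I would try instead (sketched below in plain Python, file names made up) is to let every branch write its own part file and merge the parts once the parallel phase is finished:

```python
import csv
import glob
from multiprocessing import Pool

def worker(chunk_id):
    # One private file per branch, so no two processes ever share a handle.
    with open(f"part_{chunk_id}.csv", "w", newline="") as f:
        w = csv.writer(f)
        for i in range(10_000):
            w.writerow([chunk_id, i, "some payload"])

if __name__ == "__main__":
    with Pool(4) as p:
        p.map(worker, range(4))
    # Merge the parts sequentially after all workers are done.
    with open("results.csv", "w", newline="") as out:
        for part in sorted(glob.glob("part_*.csv")):
            with open(part) as src:
                out.writelines(src)
```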