I am creating a CSV file by using the KNIME (CSV Writer) node.
When I see the data just before writing, it all looks perfect as expected. However, when I run the CSV Writer node, it generates the csv file in a corrupted manner and adding random rows which were unexpected.
Since the data is confidential, I can’t share it, but I know for certain that there is some issue when the CSV file is written it gets corrupted. I am using “|” as delimiter, not sure if that is the reason.
After the file is generated, I viewed it in excel (text to columns), and in one of the column (year) has text values from another column (Description). There is no association between them.
This happens for other columns as well.
So, the number of rows that are expected is 199031 before I generate the CSV, but in the resultant CSV file, I see there are 14 extra rows which definitely tells, there is some sort of bug happening once the file is written.
Often csv file problems (in general, not just with knime) are a result of line breaks in the data. When the file is written to an xlsx, do you șee any line breaks in any cells in or around the row in excel that is the equivalent place that you are seeing the extra rows appearing in the csv? Xlsx format would likely handle linebreaks fine even if csv doesn’t.
Thank you @bruno29a and @takbb
I agree the line breaks would be the possible reason for this.
However, I am using pipe as a delimiter to align with the previously created files (transferred to me).
The configuration is as under:
Hi @pranavgupta , in the Advanced Settings for Quotes, can you try “Always”?
Also, can you try opening the csv from some editor that supports regular expression, and see if you can search for “\r\n\r\n” or “\n\n” or “\n|”, the last one being searching for any breakline before a delimiter. This would tell you what kind of dirty data you have, and you would either correct the data from the source, or play around in Knime to tackle these.
But you need to identify them first.
It is hard for us to provide the exact solution when we can’t see the data…
In addition, another thing you can try is with the working xlsx file read into excel, can you save it from excel as a csv file, choosing similar settings (e.g pipe delimiter) that you have in knime and then see if that file reads back into excel correctly.
If it does, can you then compare both the knime produced csv, and the one produced from excel and see how they compare at the rows where you are seeing problems.
That difference might tell us something about what is wrong and what, if any, additional config might be needed, or if it’s another problem altogether.