CSV Writer adds random rows while creating the file

I am creating a CSV file by using the KNIME (CSV Writer) node.
When I see the data just before writing, it all looks perfect as expected. However, when I run the CSV Writer node, it generates the csv file in a corrupted manner and adding random rows which were unexpected.
Since the data is confidential, I can’t share it, but I know for certain that there is some issue when the CSV file is written it gets corrupted. I am using “|” as delimiter, not sure if that is the reason.

Thanks.

Hi @pranavgupta , and welcome to the Forum.

How do you determine that the csv file is generated “in a corrupted manner and adding random rows”? How are you viewing the csv file?

Welcome to the forum. The extra lines could be a result of unexpected encoding. Try to play with encoding type.

2 Likes

After the file is generated, I viewed it in excel (text to columns), and in one of the column (year) has text values from another column (Description). There is no association between them.
This happens for other columns as well.
So, the number of rows that are expected is 199031 before I generate the CSV, but in the resultant CSV file, I see there are 14 extra rows which definitely tells, there is some sort of bug happening once the file is written.

I tried playing with encoding, but no luck.
Thanks for the advice anyway.
Cheers!

Update:
On writing the file in xlsx format, the issue is not seen.
There’s definitely a bug in CSV writer.

Hi @pranavgupta , there’s probably some dirty data issue there, possible extra new line characters (\r\n).

Can you show your csv writer configuration? You may want to enclose the data with quotes. Also, any reason why you are using pipes (|) as delimiters?

3 Likes

Hi @pranavgupta, yes I think the same as @bruno29a.

Often csv file problems (in general, not just with knime) are a result of line breaks in the data. When the file is written to an xlsx, do you șee any line breaks in any cells in or around the row in excel that is the equivalent place that you are seeing the extra rows appearing in the csv? Xlsx format would likely handle linebreaks fine even if csv doesn’t.

3 Likes

Thank you @bruno29a and @takbb
I agree the line breaks would be the possible reason for this.
However, I am using pipe as a delimiter to align with the previously created files (transferred to me).
The configuration is as under:

Hi @pranavgupta , in the Advanced Settings for Quotes, can you try “Always”?

Also, can you try opening the csv from some editor that supports regular expression, and see if you can search for “\r\n\r\n” or “\n\n” or “\n|”, the last one being searching for any breakline before a delimiter. This would tell you what kind of dirty data you have, and you would either correct the data from the source, or play around in Knime to tackle these.

But you need to identify them first.

It is hard for us to provide the exact solution when we can’t see the data…

1 Like

In addition, another thing you can try is with the working xlsx file read into excel, can you save it from excel as a csv file, choosing similar settings (e.g pipe delimiter) that you have in knime and then see if that file reads back into excel correctly.

If it does, can you then compare both the knime produced csv, and the one produced from excel and see how they compare at the rows where you are seeing problems.

That difference might tell us something about what is wrong and what, if any, additional config might be needed, or if it’s another problem altogether.

1 Like

@pranavgupta you could try to use R to write the CSV file and see if there is a different outcome.

CSV files are not ideal when transferring data but I do understand why they are still being used.

Sometimes I used even more excotic characters as separators like the “flattened turtle”

¤

ASCII 164. And you definitely should check for some hidden line breaks in the original data.

2 Likes

“flattened turtle” lol, love the name :slight_smile:

1 Like

Me too… The poor turtle though :turtle:… Not so much :joy:

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.