CSV Writer adds new line in quoted string

Hi,

I have a table containing about 16M rows. I want to export it to CSV using the CSV Writer. The node runs and does not return any warning.

The issue is that when I read the file, I get an error:

"Execute failed: New line in quoted string (or closing quote missing). In line 21662."

Indeed, when I look at the content in nano:

"ZINC56683490","C(=N
1cnnc1-c1ccn[nH]1)c1ccccc1",35,"ZINC33332545",40020.0


I wrote the file twice and got the same error at the same line both times. I also tried writing the file without quotes and with \t as the separator, but the problem still occurs at the same position.

Did I miss something, or is the node unable to write a CSV file correctly?

Thank you in advance.

Nicolas

PS: KNIME 3.1.2, Ubuntu 14.04 64-bit

I have found a workaround, which is a little dangerous but works in my case: allowing multi-line quoted strings.

Hope this will help.

Nicolas

It looks like the CSV node is interpreting part of the chemical notation, *\n* (where both stars can be anything), as a newline.
Traditionally, \n means newline, so it is not that strange that it does this.
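A minimal Python sketch of this explanation, assuming the writer interprets backslash escapes in cell text (not confirmed for KNIME). The SMILES below is a hypothetical reconstruction: a literal backslash (a stereo bond) followed by an aromatic nitrogen n at the spot where the file breaks.

```python
# Hypothetical reconstruction of the original SMILES: a literal backslash
# (stereo bond) followed by an aromatic nitrogen "n" where the file breaks.
smiles = r"C(=N\n1cnnc1-c1ccn[nH]1)c1ccccc1"

# Stand-in for a writer that interprets backslash escapes in cell text.
# This is an assumption about the cause, not KNIME's actual code.
written = smiles.encode("ascii").decode("unicode_escape")

# The two-character sequence backslash + "n" has become one real line break.
print(written.split("\n"))         # ['C(=N', '1cnnc1-c1ccn[nH]1)c1ccccc1']
print(len(smiles) - len(written))  # 1
```

Splitting on the line break gives exactly the two fragments seen in nano, which is consistent with the \n explanation.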

You could try to work around it by replacing \ with its escaped form \\ before exporting, but then you are opening a can of worms.
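The escaping workaround can be sketched like this. It assumes, for illustration only, that the writer decodes backslash escapes; escape_backslashes and interpret_escapes are hypothetical helper names, not KNIME functions.

```python
def escape_backslashes(text: str) -> str:
    # Double every backslash so that an escape-decoding step restores a single one.
    return text.replace("\\", "\\\\")


def interpret_escapes(text: str) -> str:
    # Assumed stand-in for a writer that decodes backslash escapes.
    return text.encode("ascii").decode("unicode_escape")


smiles = r"C(=N\n1cnnc1-c1ccn[nH]1)c1ccccc1"  # hypothetical SMILES
escaped = escape_backslashes(smiles)

print(escaped)                               # C(=N\\n1cnnc1-c1ccn[nH]1)c1ccccc1
print(interpret_escapes(escaped) == smiles)  # True: the backslash survives
print(interpret_escapes(smiles) == smiles)   # False: unescaped input is damaged
```

The can of worms: if the writer (or any later step) does not actually decode escapes, the doubled backslashes end up in the file, and every consumer has to know to undo them.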

Allowing multi-line quoted strings is not a good solution either, because the compound was changed when the \n was replaced by a newline, so a bond and an atom are missing.
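To illustrate why allowing multi-line quoted strings only hides the problem, here is a sketch using Python's csv module, which accepts line breaks inside quoted fields. The row is the one shown in nano; the original SMILES is the same hypothetical reconstruction with a literal backslash followed by n.

```python
import csv
import io

# Hypothetical original SMILES: literal backslash followed by "n".
original = r"C(=N\n1cnnc1-c1ccn[nH]1)c1ccccc1"

# The row as it appears in the exported file: the quoted field spans two lines.
file_text = (
    '"ZINC56683490","C(=N\n'
    '1cnnc1-c1ccn[nH]1)c1ccccc1",35,"ZINC33332545",40020.0\n'
)

# csv.reader accepts line breaks inside quoted fields, comparable to
# enabling multi-line quoted strings on the reading side.
rows = list(csv.reader(io.StringIO(file_text, newline="")))
smiles_read = rows[0][1]

print(len(rows))                # 1: the row parses without an error
print(repr(smiles_read))        # 'C(=N\n1cnnc1-c1ccn[nH]1)c1ccccc1'
print(smiles_read == original)  # False: the bond "\" and atom "n" are gone
```

The row now parses, but the value read back has a line break where the bond \ and the atom n used to be, so it is no longer the same compound.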

 

Personally, I always stick to SDF for compound data.

You are right, the \n contained in the string could explain this error; I had not thought about it. However, in the example above there is no such character.

The problematic pattern seems to be a c or n, followed by 1, followed by a lowercase c or n, but I have not checked every possibility.
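One way to test both hypotheses is to scan the SMILES column of the original table before exporting and compare the flagged rows with the line numbers the reader complains about. A rough Python sketch; flag_rows and the sample SMILES are hypothetical, not part of KNIME.

```python
import re

# Candidate patterns discussed in this thread:
BACKSLASH_N = re.compile(r"\\n")     # literal backslash followed by "n"
C_OR_N_1 = re.compile(r"[cn]1[cn]")  # "c"/"n", then "1", then "c"/"n"


def flag_rows(smiles_iter):
    """Return (row index, has backslash-n, has c/n-1-c/n) for each SMILES
    that matches at least one of the two patterns."""
    flagged = []
    for i, smi in enumerate(smiles_iter):
        has_backslash_n = BACKSLASH_N.search(smi) is not None
        has_cn1cn = C_OR_N_1.search(smi) is not None
        if has_backslash_n or has_cn1cn:
            flagged.append((i, has_backslash_n, has_cn1cn))
    return flagged


# Hypothetical sample values, not taken from the original table.
sample = [
    r"C(=N\n1cnnc1-c1ccn[nH]1)c1ccccc1",
    "c1ccccc1",
    r"C/C=C\C",
]
print(flag_rows(sample))  # [(0, True, True), (1, False, True)]
```

With 16M rows, a streaming iterator over the column would be preferable to a list. Row indices and file line numbers may be offset (for example by a header line), so allow for that when comparing.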