CSV quotation marks being removed by File Reader

I am trying to read a CSV file with the File Reader node. The file has text fields containing quotation marks, which are escaped by doubling them (i.e. ""), as is standard for CSV files. However, when the file is read in, the File Reader removes them entirely.

CSV Example:

story_id, text_string, count
"1290394", "His reply was, ""I don't think so"", and he turned away.", "23"

When read into the File Reader, the text_string becomes:

His reply was, I don't think so, and he turned away.

Is there any way to keep these quotations in?

Hi,

I tried it as well and it is true that KNIME's CSV Reader node removes the double quotes altogether when the string delimiter is set to be a quote character. Your case is even trickier because you also have the column separator (namely a comma) inside the text_string column.

If that hadn't been the case, you could have removed the quote character as the string delimiter in the configuration dialogue, which causes KNIME to import the whole string, including both the enclosing quotes and the doubled quotes. Then with a couple of String Manipulation nodes you could have removed the leading/trailing quote and replaced each doubled quote with a single one.
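For what it's worth, the clean-up those String Manipulation nodes would do amounts to roughly the following. This is only a Python sketch of the logic, not KNIME expression syntax, and `unquote_field` is a made-up helper name:

```python
def unquote_field(raw: str) -> str:
    """Strip one pair of enclosing double quotes and collapse "" into "."""
    s = raw.strip()
    # Remove the leading/trailing quote only if both are present.
    if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
        s = s[1:-1]
    # A doubled quote inside the field stands for one literal quote.
    return s.replace('""', '"')


raw_value = '"His reply was, ""I don\'t think so"", and he turned away."'
print(unquote_field(raw_value))
```

Again, this only works per field once the columns have been split correctly, which is exactly what the embedded commas prevent in your file.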

Any chance you can pre-process your CSV file to use a different string delimiter, say a @, before reading it into KNIME, and then turn the doubled quotes into single ones with String Manipulation? Or save it in a different format?
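A minimal sketch of such a pre-processing step in Python, assuming the @ character never occurs in your data and that the reader node is then configured with @ as its quote character. The function name `requote` is hypothetical. It uses the standard `csv` module to parse the original text, which already turns each "" back into a single ", and then writes every field wrapped in @, so no further String Manipulation should be needed:

```python
import csv
import io


def requote(src_text: str, new_quote: str = "@") -> str:
    """Re-emit standard CSV text using `new_quote` as the string delimiter."""
    # skipinitialspace=True tolerates the blank after each comma in the example.
    reader = csv.reader(io.StringIO(src_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(
        out, quotechar=new_quote, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    for row in reader:
        # Assumption: the new delimiter must not appear inside any field.
        if any(new_quote in field for field in row):
            raise ValueError(f"{new_quote!r} occurs in the data: {row}")
        writer.writerow(row)
    return out.getvalue()


sample = (
    "story_id, text_string, count\n"
    '"1290394", "His reply was, ""I don\'t think so"", and he turned away.", "23"\n'
)
print(requote(sample))
```

For a real file you would read its contents into a string, pass it through `requote`, and write the result to a new file for KNIME to load.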

The other option would be for the CSV Reader node to be modified so that it handles doubled quotes by leaving a single quote in the output whenever they are encountered in the input.

Cheers,
Marco.

 

Great, thanks for your reply. It looks like I'm going to have to use a different string delimiter, as you suggested.
