CSV Reader / Writer: Escaped quotes causing problems

Dear all,

while saving a CSV file that escapes quotes contained in the data (i.e. when a real quote is present in a value), the escaped quotes cause significant issues for the CSV import node.

The whole table structure breaks down as escaped quotes are still interpreted as regular quotes.
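
To illustrate the parsing problem outside of KNIME, here is a minimal Python sketch (the sample row is made up): a parser that expects doubled quotes, as most CSV readers do by default, takes a backslash-escaped quote as the end of the field, and the comma behind it then splits the row into an extra column.

```python
import csv
import io

# One logical row with three columns; the middle value contains a real quote
# and a comma. Written once RFC 4180 style (quote doubled) and once with a
# backslash escape.
doubled = '1,"He said ""hi"", really",2024'
escaped = '1,"He said \\"hi\\", really",2024'

read = lambda line: next(csv.reader(io.StringIO(line)))

print(read(doubled))  # ['1', 'He said "hi", really', '2024'] -> 3 columns, intact
print(read(escaped))  # the escaped quote ends the field early, the comma behind it
                      # acts as a separator, and the row breaks into 4 columns
```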

Is this a bug or a feature?

Thanks
Mike

Hi, remove the left & right quote characters in the CSV Writer. You will find this option in the node's configuration dialog.


Thanks @jaydeb91 for your suggestion. My description of the issue was not well written, so please allow me to rephrase it. In particular, there are two issues:

  1. If a quote is already present in the data, it is either replaced (e.g. by an escaped quote) or removed, which degrades data quality. If the quotes are kept and the data is read back via a CSV Reader, the reader node cannot cope with the escaped quotes (see the sketch after this list).
  2. If a separator is present in the data, it is removed by the CSV Writer node.
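
The sketch below shows how the behaviour looks from the outside; the function, its name and its default (empty) replacement values are assumptions made purely for illustration, not KNIME's actual implementation.

```python
# Hypothetical stand-in for what the CSV Writer appears to do to a single value;
# the name and the empty default replacements are assumptions, not KNIME code.
def write_field(value, quote_replacement="", separator_replacement="", separator=","):
    # Issue 1: a quote already present in the data is replaced, and with an empty
    # replacement it is effectively removed.
    value = value.replace('"', quote_replacement)
    # Issue 2: the same happens to a column separator contained in the data.
    value = value.replace(separator, separator_replacement)
    return value

original = 'He said "hi", really'
print(write_field(original))              # 'He said hi really' -> information lost
print(write_field(original) == original)  # False: the round trip is not lossless
```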

I created a sample workflow illustrating the issue. The results are shown below as screenshots for a quick glance, since the forum's table formatting uses pipes as well. The following scenarios were tested, with a column comparison added:

  1. Mode: Default setup with comma as column separator and quotes
  2. Mode: Default node setup with escaped quotes (breaks table structure!)
  3. Mode: Pipe separator with quotes
  4. Mode: Pipe separator and quotes prevented
  5. Mode: Pipe separator and quotes and replacement by "

Kind regards
Mike

Hi @mw -

Thanks for the feedback and the comprehensive workflow. The short answer here is that we generally recommend using the File Reader in most cases instead of the CSV Reader. (Note that when you drag-and-drop a CSV file from your KNIME Explorer to the canvas, KNIME chooses a File Reader node for you.) The CSV Reader still exists for backwards compatibility reasons, but it doesn’t handle edge cases for quoting and delimiting as gracefully as the File Reader does.

In this case, if you take the data in the second branch of the workflow where you have set up the CSV Writer to escape the quotes - which probably makes the most sense in terms of data preprocessing - you can use the File Reader instead of the CSV Reader and get expected results for all 6 rows.
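
As a rough analogy (using Python's csv module, not the File Reader's actual implementation): a reader that is told about the backslash escape scheme parses the escaped data without breaking the table structure. The sample line is made up.

```python
import csv
import io

escaped = '1,"He said \\"hi\\", really",2024'   # quotes escaped with a backslash

# A reader configured for backslash escapes instead of doubled quotes copes fine:
row = next(csv.reader(io.StringIO(escaped), escapechar='\\', doublequote=False))
print(row)  # ['1', 'He said "hi", really', '2024'] -> all three columns intact
```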

Is that helpful?


Hi @ScottF,

thanks for your suggestion. Unfortunately it’s not only about reading files. The CSV Writer node (it is worth noting that a generic File Writer node as an equivalent to the File Reader seems to be missing) has some advantages, like writing compressed files. The CSV Reader can also read from compressed files, which the common File Reader node cannot.

About the CSV Writer node: if the separator, like a pipe, is already present in the data, the node removes it prior to writing, which also causes severe data regressions.

While trying to evaluate the results with the regular File Reader node, I also faced this issue (the data preview worked):

WARN File Reader 3:25:0:15 Errors loading flow variables into node : Coding issue: Cannot create URL of data file from '/private/var/folders/jg/zcb9yqn90w54f3szwrsnqrl80000gn/T/knime_CSV write read35895/knime_tc_awjdlvx6aea2/default-setting-escaped-quote.csv' in filereader config

Edit: The above issue is likely caused by “Problem with the File Reader Node when using workflow variables (again)”.

Edit #2: I found a super simple workaround to prevent the CSV Writer from replacing quotes that are already present: simply replace quotes by quotes. To summarize: the CSV Writer’s removal of quotes is triggered by its default behaviour; it always removes quotes even if no replace value was provided. The same goes for the separator.
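
In other words, and only as a tiny Python illustration of the workaround’s logic (not KNIME code): setting the replace value to the quote character itself turns the replacement into a no-op, so nothing is lost.

```python
value = 'He said "hi"'

print(value.replace('"', ''))    # default-like behaviour: the quotes are silently removed
print(value.replace('"', '"'))   # workaround: "replace quotes by quotes" keeps them intact
```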

The problem is that this replace input field is not accessible; the cursor won’t enter it, as it seems to be deactivated. Please note the slight difference in its appearance.


Kind regards
Mike

Hi @mw -

I’ll try to speak briefly to some of the issues that you’ve brought up. I’d have to ask some other folks on the team about why the options in the node are set up a particular way, but in the meantime maybe I can give you some workarounds.

  1. The File Reader node can indeed read ZIP files - give it a try!

  2. The issue with the CSV Writer removing delimiters like pipes seems to arise when using the Quote Mode: never option. This can be dealt with, though, by including the pipe delimiter in the “replace separator in data with” box; see the short sketch after this list. (I think this also addresses your last comment, in that this particular box is only selectable when the Quote Mode: never option is chosen.)

  3. The File Reader error you mentioned occurs when passing a flow variable containing a file path to the node without a leading file:// protocol.
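
To make point 2 concrete, here is a small Python sketch; the pipe data and the '/' replacement are arbitrary examples, not a KNIME setting. With quoting disabled, a delimiter inside a value has to be replaced before writing, otherwise the row gains an extra column on re-import.

```python
import csv
import io

row = ["A|B Testing", "42"]   # the first value contains the pipe delimiter

# Replace the delimiter inside the data before writing the unquoted line
# (here with '/', an arbitrary choice), so the column count survives the round trip:
line = "|".join(value.replace("|", "/") for value in row)

print(next(csv.reader(io.StringIO(line), delimiter="|")))
# ['A/B Testing', '42'] -> still two columns; leaving the pipe untouched would
# have produced three
```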

Hope this shines a bit more light on things.


Hi @ScottF,

that helped a lot. I wasn’t even aware that the File Reader can read compressed files; nothing is stated about it in its node documentation.

I tried instructing the File Reader via flow variables to interpret all values as strings. I assume it’s the “FormatParamater” value. Unfortunately the global parameter (i.e. not the one related to each column represented in the array) is not saved / kept. What is my mistake?

Kind regards
Mike

Hi @mw -

I’m not aware of a way to force the File Reader to treat all columns as strings via a flow variable option. There are a couple of options in this thread that other users have suggested previously:

Force KNIME to Read CSV Columns As Strings

