When saving a CSV file that escapes quotes contained in the data, i.e. when a value itself contains a real quote, this causes significant issues for the CSV import node.
The whole table structure breaks down because the escaped quotes are still interpreted as regular quotes.
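To illustrate why this happens outside of KNIME: a reader that expects RFC 4180 style doubled quotes splits a backslash-escaped file in the wrong places as soon as a value contains both a quote and the separator. A minimal sketch using Python's csv module (purely illustrative, not the KNIME implementation):

```python
import csv
import io

# A value that contains both a real quote and the column separator.
row = ['he said "hi, there"', 'second column']

# Writer that backslash-escapes embedded quotes instead of doubling them.
buf = io.StringIO()
csv.writer(buf, doublequote=False, escapechar='\\').writerow(row)
line = buf.getvalue()
print(line)  # "he said \"hi, there\"",second column

# A reader using the default dialect (doubled quotes, RFC 4180 style)
# misparses it: the escaped quote ends the quoted field early, so the
# embedded comma splits the row into three columns instead of two.
print(next(csv.reader(io.StringIO(line))))

# Only a reader configured for the same escaping reads the row back intact.
print(next(csv.reader(io.StringIO(line), doublequote=False, escapechar='\\')))
```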
Thanks @jaydeb91 for your suggestion. My description of the issue was not well written, so please allow me to rephrase it. In particular, there are two issues:
If a quote is already present in the data, it is either replaced (e.g. by an escaped quote) or removed, which degrades data quality. If the quotes are kept and the data is read back via a CSV Reader, the reader node cannot cope with the escaped quotes.
If a separator is present in the data, it is removed by the CSV Writer node.
I created a sample workflow illustrating the issue. The results are shown below for a quick glance, as screenshots (the forum's table feature also uses pipes, which would conflict with the data). The following scenarios were tested, with a column comparison added as well:
Mode: Default setup with comma as column separator and quotes
Mode: Default node setup with escaped quotes (breaks table structure!)
Mode: Pipe separator with quotes
Mode: Pipe separator and quotes prevented
Mode: Pipe separator and quotes and replacement by "
Thanks for the feedback and the comprehensive workflow. The short answer here is that we generally recommend using the File Reader in most cases instead of the CSV Reader. (Note that when you drag-and-drop a CSV file from your KNIME Explorer to the canvas, KNIME chooses a File Reader node for you.) The CSV Reader still exists for backwards compatibility reasons, but it doesn’t handle edge cases for quoting and delimiting as gracefully as the File Reader does.
In this case, if you take the data in the second branch of the workflow where you have set up the CSV Writer to escape the quotes - which probably makes the most sense in terms of data preprocessing - you can use the File Reader instead of the CSV Reader and get expected results for all 6 rows.
Thanks for your suggestion. Unfortunately it's not only about reading files. The CSV Writer node – it is worth noting that a generic File Writer node as the counterpart to the File Reader seems to be missing – has some advantages, such as writing compressed files. The CSV Reader can also read from compressed files, which the generic File Reader node cannot.
About the CSV Writer node: if the separator, such as a pipe, is already present in the data, it is removed prior to writing, which also causes severe data regressions.
While trying to evaluate the results with the regular File Reader node, I also faced this issue (the data preview worked):
WARN File Reader 3:25:0:15 Errors loading flow variables into node : Coding issue: Cannot create URL of data file from '/private/var/folders/jg/zcb9yqn90w54f3szwrsnqrl80000gn/T/knime_CSV write read35895/knime_tc_awjdlvx6aea2/default-setting-escaped-quote.csv' in filereader config
Edit #2: I found a super simple workaround to prevent the CSV Writer from replacing quotes that are already present in the data: simply replace quotes by quotes. To summarize: the CSV Writer's removal of quotes is triggered by its default behaviour; it always removes quotes, even if no replacement value was provided. The same goes for the separator.
The problem is that this input field is not accessible. The cursor won't enter it, as it seems to be deactivated. Please note the slight difference in its appearance.
I’ll try to speak briefly to some of the issues that you’ve brought up. I’d have to ask some other folks on the team about why the options in the node are set up a particular way, but in the meantime maybe I can give you some workarounds.
The File Reader node can indeed read ZIP files - give it a try!
The issue with the CSV Writer removing delimiters like pipes seems to arise when using the Quote Mode: never option. This can be dealt with, though, by including the pipe delimiter in the "replace separator in data with" box. (I think this also addresses your last comment, in that this particular box is only selectable when the Quote Mode: never option is chosen.)
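To illustrate the underlying constraint (a sketch in Python's csv module, not the node's internals): once quoting is switched off, a delimiter inside a value can no longer be distinguished from a real column break, so it has to be escaped or replaced before writing, which is essentially what that box lets you configure. The replacement character below is arbitrary.

```python
import csv
import io

row = ['value|with|pipes', 'plain']

# With quoting disabled, a pipe inside the data cannot be told apart from
# a real separator, so the writer refuses unless an escape is configured.
buf = io.StringIO()
try:
    csv.writer(buf, delimiter='|', quoting=csv.QUOTE_NONE).writerow(row)
except csv.Error as err:
    print(err)  # need to escape, but no escapechar set

# One way out: replace the in-data delimiter before writing (the '/' here
# is an arbitrary stand-in for whatever replacement you choose).
cleaned = [cell.replace('|', '/') for cell in row]
buf = io.StringIO()
csv.writer(buf, delimiter='|', quoting=csv.QUOTE_NONE).writerow(cleaned)
print(buf.getvalue())  # value/with/pipes|plain
```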
The File Reader error you mentioned occurs when the flow variable passed to the node contains a file path without a leading file:// protocol.
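A minimal sketch of the fix, assuming you build the flow variable yourself (the path below is made up; in KNIME you might assemble the URL with e.g. a String Manipulation (Variable) node):

```python
from pathlib import Path

# A made-up local path, as it might arrive in a flow variable.
plain_path = "/tmp/csv_demo/default-setting-escaped-quote.csv"

# Naive fix: prepend the protocol.
print("file://" + plain_path)

# More robust: let pathlib build the URI, which also percent-encodes
# characters such as spaces.
print(Path(plain_path).as_uri())
```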
That helped a lot. I wasn’t even aware that the File Reader can read compressed files; nothing is stated about it in the node documentation.
I tried instructing the File Reader via flow variables to interpret all values as strings. I assume it’s the “FormatParamater” value. Unfortunately the global parameter (not the per-column one represented in the array) is not saved / kept. What is my mistake?
I’m not aware of a way to force the File Reader to treat all columns as strings via a flow variable option. There are a couple of options in this thread that other users have suggested previously: