Line Feed / Carriage Return characters messing up the import of a file

Hi there,

Hope somebody can help me with the following.

Problem
I’m trying to read a CSV file with a File Reader/CSV Reader in an automated way. Some of the string columns contain the LineFeed character. These characters break the import of the file in KNIME: the reader starts a new row where it is not supposed to, and the import often fails completely, either because the split interferes with the quotation chars in string columns or because short lines are not allowed.

Background
I have a basic CSV file with multiple columns and rows. When you open the file in Notepad++ you see that a new row is indicated with the CarriageReturn + LineFeed combo at the end of the line. Columns are delimited by the comma char and string columns are enclosed in double quotation chars. A string value that causes the problem is, for example, “StreetName LF CityName”, where LF is the LineFeed character. Excel does not seem to have any trouble reading the file correctly, so I assume it should also be possible within KNIME.
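To make the layout concrete, here is a minimal Python sketch (outside KNIME, standard library only). The sample data and variable names are made up for illustration and are not taken from the real file. It shows the difference between treating every line break as a row break and letting a quote-aware parser decide which line breaks belong to a quoted value.

```python
import csv
import io

# Hypothetical sample: rows end with CR+LF, and the quoted address
# field in the first data row contains a bare LF.
raw = 'id,address\r\n1,"StreetName\nCityName"\r\n2,"OtherStreet OtherCity"\r\n'

# Naive approach: every line break counts as a row break,
# so the first data row is cut in two.
naive_rows = raw.splitlines()

# Quote-aware approach: newline="" leaves the line endings untouched,
# so the csv module can keep line breaks that sit inside quotes.
records = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))  # physical lines
print(len(records))     # logical records (header included)
print(records[1])       # the address still contains the LF
```

The naive split yields one more line than there are records, which is the same symptom as the extra short rows described above.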

My attempts

  1. In the CSV Reader I tried to use the \r\n combination as the row delimiter, but that does not seem to work.
  2. I can clean up the tables and remove the LF chars by hand, but this is obviously not what I want (automation is key).
  3. I can turn on “short lines allowed” and build some elaborate way to identify the broken rows and stitch them back together, but this usually takes a lot of processing when you are dealing with large files.
  4. I can read the file as binary and use 12 other nodes (JAVA snippet to clean up; Cell split rows, Ungroup rows, Cell split columns, Insert headers etc), but this seems to be way too complex/ugly.

Thanks in advance and kind regards,
Ruben

Hi @Ruben,
Have you tried the File Reader node? It offers more options, including dedicated quotation handling, if you open the Advanced options menu.
best,
Gabriel

Hi,
As you mentioned that Excel has no problem opening/reading the file correctly, you could open the file in Excel and save it as CSV (comma delimited) again. That may solve the problem with reading the file.
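If opening Excel by hand does not fit an automated workflow, a similar re-save can be approximated with a small script that runs before the file is read. Below is a minimal sketch in Python (standard library only). The helper name `flatten_embedded_newlines` and the choice to replace each embedded line break with a space are assumptions for illustration, not something KNIME or Excel does for you.

```python
import csv
import io


def flatten_embedded_newlines(text: str, replacement: str = " ") -> str:
    """Re-write CSV text so that no field contains a line break.

    Hypothetical helper: parses the text with a quote-aware reader,
    replaces line breaks inside fields, and writes rows ending in CR+LF.
    """
    # newline="" keeps the original line endings visible to the csv module.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    for record in reader:
        cleaned = [
            field.replace("\r\n", replacement)
            .replace("\n", replacement)
            .replace("\r", replacement)
            for field in record
        ]
        writer.writerow(cleaned)
    return out.getvalue()


# Made-up sample with a bare LF inside a quoted field.
sample = 'id,address\r\n1,"StreetName\nCityName"\r\n'
print(repr(flatten_embedded_newlines(sample)))
```

For a real file, the same function could be applied to text read with `open(path, newline="")` and the result written to a new file with `newline=""` as well, so that Python does not translate the line endings. After that, every physical line is one row, and a line-based reader should no longer split the address values.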