Escape character in unquoted data fields

Hi,

I’m trying to work with multiple csv files but I’m running into an issue which I’m not able to solve.

The csv files are not quoted, but some data fields have a comma in it (because of an error in the source system or typo). These commas are followed by the escape character &. These commas are causing Knime to create an extra column and things get messed up.

Simplified example:
First name, last name, company
John, Doe, Company A
Jane, Doe, Company,B

How can I get Knime to ‘ignore’ the comma between Company and B?

The escape character function in the csv read node only seems to work data fields are between quotes.

Thanks!!

Hi @robvp

Welcome to the KNIME Community!

One solution could be switch from the CSV reader to the Line Reader. That will take the row ‘as is’ initially.

image

Then use a Cell Splitter node with comma as separator, the output as new columns with a fixed array size of 3. That means it will only split into 3 elements. In your case, first name, last name and anything after the second comma will be considered as one.

As such, you end up with the company and it’s associated letter captured in one column. Use a Column Rename node is desired to bring back the original column names.

A second option would be to use the CSV Reader but tick the box to Support short data rows.

image

It will create an additional column for those rows that have the extra comma. To update the company column, you can subsequently use a String Manipulation node with expression:
join($company$," ",$Column3$)

Hope this helps!

2 Likes

Thanks @ArjenEX !!

I found a way to combine with your suggestion.

I still use the csv reader, but I’m using /r/n as delimiter. That seems to be similar to the line reader. Your further suggestion didn’t fully work, but my example was a bit too simple - I realised after I saw your suggestion - as the data field with the additional comma is in the middle of like 10 columns instead of at the end. But I just replaced the string ",&” with a blank and next I used the comma as separator. That split the data into columns as they should.

Next I had to make the first row a header as it contained the column names.

The last step was to change the data type of some columns from string to double.

Thanks for your help!!

3 Likes

Good to hear that you got it eventually! :slight_smile:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.