Not reading all Commas in "Column delimiter"

Le18BR · July 1, 2022, 1:30pm

Hi! Hope you’re doing well.

I’m trying to read a file but it doesn’t breaks the columns the right way for all the comma matches. For this example, the line that starts with “1BAJOP” is the one that it’s not working. I have more lines in the file with the same problem as this one.

The raw file is like this:

When I use the “Text to column” function in Excel, it works just fine:

When I try to use the CSV reader/File Reader it doesn’t breaks the column the right way as you may see in “Row 4”

PS: If I use another delimiter that does’n exists in the file and then use a Cell Spliter Node with the comma as delimiter it works.

aworker · July 1, 2022, 2:05pm

Hi @Le18BR

Could you please upload your text file here (at least with the text portion you are displaying) if it is not confidential so that we can try to help from there ?

Best
Ael

mlauber71 · July 1, 2022, 2:52pm

@Le18BR you might have to try and toy with the advanced settings especially the ones dealing with quotations since this seems to be the line where there is a problem.

Then: CSV is probably the worst format to transfer data with so if you have any other chance you might as well take it.

If other measures fail you could try and employ R package Readr to import messy data from CSV.

Le18BR · July 1, 2022, 4:04pm

Knime forum doesn’t allow uploading .csv files, I’ve renamed the extension of the file to .txt, just rename it back to .csv and you should have the same file.
abc.txt (898 Bytes)

Le18BR · July 1, 2022, 4:12pm

Unfortunately, the CSV format is the only option in our supplier website.

I’ll look into the R suggestion, thanks!

mlauber71 · July 1, 2022, 4:23pm

@Le18BR in line 6 there is a very messed up format. The whole line is enclosed in a quotation and then ZENDES, INC. (which has a comma in the name) again is enclosed in a double quote. Some cleanup might be necessary and if such things appear in a CSV file on a regular basis disaster is imminent. I will see if I can come up with a solution. Question ist would this be a pattern like quotes at the beginning and end or would there be all sorts of strange quotations.

Would it be possible to use a less common separator in the CSV file?

mlauber71 · July 1, 2022, 5:40pm

@Le18BR I built a workflow that would remove leading and closing quotation marks (if a ‘legitimate’ double quotation is in the first or last column the workflow will have to be modified).

The idea ist to read the CSV as one string in one column. Clean the columns according to several rules, export it again as CSV and then read the thing back into KNIME as a ‘clean’ CSV. The R Readr package should be able to handle double escapes but it did not work immediately.

Again my recommendation stands: use other file formats like Parquet, ARFF or SQLite to transfer data

The result should be OK. If additional strange things appear they might also be removed from the single imported line:

kn_forum_44692_csv_file_comma.knwf (81.6 KB)

Le18BR · July 1, 2022, 7:19pm

That worked! Can’t thank you enough.

system · July 8, 2022, 7:19pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.