I am trying to read in a few CSV files that has many many rows (1.7 million rows with 13 columns). CSV node will scan the data partially and force me to use “integer” or “long” for some columns. However, there would be rows near the end where the data suddenly become “1.2” as in double or float. CSV node cannot allow me to choose that in the menu drop down. I looked another arrangement of File Reader -> Cell Splitter. However, I am having problem on how to separate the single column into multiple columns since the separator is hard to configure. Do you have any other suggestions? Thank you for your thoughts
My data looks like below (3 column example)
“field 1”, “field 2, may have comma between quotes”, “number field, which may look like int, but has double sometimes”
You can import all columns as Strings and then decide what to do. Eg replace . with , for certain columns and then convert them into double (or remove the . and convert to integer) - depending on your business case.
Then there are several other ways to import ‘messy’ CSV files. You might want to try out the new CSV reader Labs or I often had success with the R package Readr.
If you want to check out further options about KNIME and CSV I have created a collection for that:
If I understand your problem correctly, you can just increase the number of rows that are scanned in the Advanced Settings tab. Let me know if that works for you.