How to handle Non-Native after CSV-read

petermeissner · March 6, 2020, 7:42am

I have a loop reading in CSV-files.

For a particular column named MyColumn, it seems that for some of the CSVs the CSV Reader node guesses the number format to be integer, while for others it guesses float.
When collecting all rows at the end of the loop, I get MyColumn with a mysterious Non-Native type.

Basically, what I would like to happen is that, when in doubt, Knime makes all columns float.
I already used “Allow variable column types” in the loop end.
I considered type casting MyColumn in each loop step, but since the loop is part of another loop, there will be run-throughs where MyColumn is present and some where it's not. If I travel that road, I am basically better off using Excel and doing everything by hand.

So any third options?

Any clever node I have overlooked?
Some medium involved Java node scripting, perhaps?
???

swebb · March 6, 2020, 8:02am

Can you use the File Reader instead? I think you can specify the column type in the dialog of that one.

petermeissner · March 6, 2020, 10:33am

I could.

But it's a loop, so I do not have each CSV at hand at config time.
And it's a loop of loops, so the table structure changes: in one outer loop iteration it's tables of type A (with MyColumn present), in the next it's tables of type B (without MyColumn present).

ipazin · March 6, 2020, 2:14pm

Hi there @petermeissner!

Maybe. How about the Math Formula (Multi Column) node? Use the Wildcard/Regex Selection option and enter your column name. As the expression use only $$CURRENT_COLUMN$$ and leave Convert Selected Columns to Int unchecked. If it happens with multiple columns, simply use Manual Selection and check Enforce Exclusion with no column selected for exclusion.

Haven’t tested it thoroughly, but it should do the trick regardless of whether the column is present.

Br,
Ivan

mlauber71 · March 6, 2020, 3:33pm

I built an example where the data gets imported via R three times: as numeric, as string, and with guessed types (Excel files in this case, but the approach may be modified). If one option produces too many nulls, the decision is made that the column has the other format.

You could also force every column to be imported as string and then later decide what to do with them.
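
The "import everything as string, decide later" idea can be sketched outside of KNIME as well. The following is only a minimal Python illustration using the standard library, not the workflow from the example above; the function names (read_as_strings, is_number, cast_numeric_columns) and the sample data are hypothetical. It assumes a column should become float only if every non-empty cell parses as a number.

```python
import csv
import io


def read_as_strings(text):
    """Read CSV text and keep every cell as a string (no type guessing)."""
    return list(csv.DictReader(io.StringIO(text)))


def is_number(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def cast_numeric_columns(rows):
    """Convert a column to float only if all of its non-empty cells are numeric."""
    # Collect column names in order of first appearance.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    # Decide per column, after everything has been read as string.
    numeric = set()
    for name in columns:
        values = [row.get(name) for row in rows if row.get(name) not in (None, "")]
        if values and all(is_number(v) for v in values):
            numeric.add(name)

    # Build the result; empty or missing cells become None.
    result = []
    for row in rows:
        new_row = {}
        for name in columns:
            value = row.get(name)
            if value in (None, ""):
                new_row[name] = None
            elif name in numeric:
                new_row[name] = float(value)
            else:
                new_row[name] = value
        result.append(new_row)
    return result


# Hypothetical sample: MyColumn mixes integer-looking and float-looking cells.
rows = read_as_strings("MyColumn,Label\n1,a\n2.5,b\n")
print(cast_numeric_columns(rows))
```

Because the decision is made only after all values are known, it does not matter whether a single file looked like integers or floats on its own.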

petermeissner · March 9, 2020, 8:26am

Hey, thanks for your efforts.

This underpins my gut feeling that there is no pure Knime-only solution to this problem.

I have a working solution using an R node, too: a simple data.table::rbindlist(lapply(file_names, read.csv), fill = TRUE). Since R is not very picky about integers and doubles and will happily switch to the more general type, this produces a table in which all columns have a defined data type.
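
For readers without R, the same idea (stack all tables, fill missing columns, promote integers to the more general float type) can be sketched in plain Python. This is an illustration under assumptions, not the R node's code: parse_cell and bind_tables are hypothetical names, the tables are passed as CSV text instead of file names to keep the sketch self-contained, and only the int-to-float promotion is handled.

```python
import csv
import io


def parse_cell(value):
    """Parse a cell as int, then float; otherwise keep the string. Empty -> None."""
    if value is None or value == "":
        return None
    for caster in (int, float):
        try:
            return caster(value)
        except ValueError:
            pass
    return value


def bind_tables(csv_texts):
    """Stack several CSV tables, fill missing columns with None,
    and promote a column to float if it mixes ints and floats."""
    columns, rows = [], []
    for text in csv_texts:
        for record in csv.DictReader(io.StringIO(text)):
            for name in record:
                if name not in columns:
                    columns.append(name)
            rows.append({name: parse_cell(v) for name, v in record.items()})

    # Promote int -> float where both types occur in the same column.
    for name in columns:
        kinds = {type(row[name]) for row in rows if row.get(name) is not None}
        if kinds == {int, float}:
            for row in rows:
                if isinstance(row.get(name), int):
                    row[name] = float(row[name])

    # Fill columns that a given table did not have.
    filled = [{name: row.get(name) for name in columns} for row in rows]
    return columns, filled


# Hypothetical inputs: one table looks like integers, the other like floats
# and lacks the column "Other".
cols, table = bind_tables(["MyColumn,Other\n1,x\n2,y\n", "MyColumn\n1.5\n"])
print(cols)
print(table)
```

With real files, the CSV texts would come from reading each file in the loop; the promotion step is what avoids a mixed-type column at the end.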

petermeissner · March 9, 2020, 8:51am

Interesting, I did not know this node.
A MultiColumn Rule-Engine with a type check function would do the trick, I guess.
But this one does not exist.

ipazin · March 9, 2020, 5:00pm

Hi there @petermeissner,

You can also try out the Column Auto Type Cast node. It works with Non-Native types and you should end up with the Double type.

Not sure about a MultiColumn Rule-Engine, but I agree that a type check function/node would help. There have already been a couple of questions regarding it…

Br,
Ivan
