Loop CSV reader, enforce type error

Hi @knime ,

I’m having some trouble with csv concatenation…

I’m using a loop to iterate through a list of files.

For a certain column, the first 2 files in the list contain only “0”, so KNIME automatically assigns the type Integer to the column,

but other files can contain either 0 or floats (0.1, 0.425, …).

So I get an error:

 Execute failed: The column 'Dist min EdgeA-EdgeB' can't be converted to the configured data type 'Number (integer)'. Change the target type or uncheck the enforce types option to map it to a compatible type.

So I tried to enforce the use of “Standard Double” (or Full Precision),

which leads to the “opposite error”:

 Execute failed: The column 'Dist min EdgeA-EdgeB' can't be converted to the configured data type 'Number (double)'. Change the target type or uncheck the enforce types option to map it to a compatible type.

So I changed the configuration to uncheck “Enforce types”, and got:

Execute failed: Input table's structure differs from reference (first iteration) table: Column 2 [Dist min EdgeA-EdgeB (Number (double))] vs. [Dist min EdgeA-EdgeB (Number (integer))]

Finally, I tried renaming the third file (the one with floats in the column of interest) so that it is processed first, which should create the column as a Double, and I get the error:

 Execute failed: The column 'Dist min EdgeA-EdgeB' can't be converted to the configured data type 'Number (double)'. Change the target type or uncheck the enforce types option to map it to a compatible type.

Here is a link to download csv files

Thank you for your help,

Best,

Romain

On the Advanced tab of the CSV Reader node, enable support for changing file schemas.

You’ll also need to allow changing file specifications in the Loop End node.
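
To make the “structure differs from reference (first iteration) table” error more concrete, here is a minimal Python sketch of a loop-end step that either insists on the first iteration’s schema or accepts changes. It is only a toy model, not KNIME’s API: the `collect` function, the `allow_changing_specs` flag and the widening rule (integer plus double becomes double, anything else becomes string) are assumptions made for illustration.

```python
# Toy model of a loop-end step that collects one table schema per iteration.
# Schemas are dicts of column name -> type name; none of this is KNIME API.


def collect(schemas, allow_changing_specs=False):
    """Return the merged schema, or raise if a later iteration differs."""
    reference = dict(schemas[0])  # the first iteration fixes the reference
    for schema in schemas[1:]:
        for column, col_type in schema.items():
            previous = reference.get(column)
            if previous == col_type:
                continue
            if not allow_changing_specs:
                raise ValueError(
                    f"structure differs from reference: {column} "
                    f"({col_type}) vs. ({previous})"
                )
            if previous is None:
                reference[column] = col_type  # column not seen before
            elif {previous, col_type} == {"integer", "double"}:
                reference[column] = "double"  # assumed widening rule
            else:
                reference[column] = "string"  # assumed fallback
    return reference


iterations = [
    {"Dist min EdgeA-EdgeB": "integer"},  # files containing only 0
    {"Dist min EdgeA-EdgeB": "integer"},
    {"Dist min EdgeA-EdgeB": "double"},   # file containing 0.1, 0.425, ...
]
merged = collect(iterations, allow_changing_specs=True)
```

With the flag left at its default, the third iteration raises, which mirrors the error in the original post; with the flag set, the column ends up as a double.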

Alternatively, you could do away with the loop and use only the CSV Reader node to read all of the files directly and concatenate them.

When I do this, everything works without having to change any file schema settings.

Thanks @elsamuel! I wouldn’t have found these options by myself :sweat_smile:

What if I would like to append the CSV filename or location as a new column? Is there a simple option for that too?

Thank you again,

Best,

Romain

Hi @romainGuiet ,

As @elsamuel said, allowing for changing schemas should work with the sample data that you have supplied.

A couple of extra notes that may be of use. Firstly, there is no need to put the CSV Reader in a loop to perform this operation. You can now tell it to scan a folder for all the required CSV files, which reduces the number of nodes used.

Secondly, you can hit problems with “changing schema specifications” if a file is sufficiently large and does not consistently contain the same data types on all rows (e.g. some rows are clearly doubles, and some rows contain just integers). The reason this might cause problems is that the specification check scans only a limited number of characters of the file. If the file is large enough, the type-checker could misinterpret the specification.
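
As a rough illustration of that second point, here is a small Python sketch of type guessing from a limited prefix. It is not how the CSV Reader is implemented: the `infer_type` function and its `scan_limit` parameter are hypothetical, and the limit is counted in rows rather than characters to keep the example short.

```python
def infer_type(values, scan_limit=None):
    """Guess 'integer', 'double' or 'string' from the first scan_limit values."""
    sample = values if scan_limit is None else values[:scan_limit]
    for name, parse in (("integer", int), ("double", float)):
        try:
            for value in sample:
                parse(value)
            return name  # every sampled value parsed with this type
        except ValueError:
            continue  # try the next, wider type
    return "string"


# A long column whose only non-integer value appears near the end.
column = ["0"] * 1000 + ["0.425"]

guess_from_prefix = infer_type(column, scan_limit=100)  # only sees "0" values
guess_from_all = infer_type(column)                     # also sees "0.425"
```

The two guesses disagree, and a reader that trusted the prefix-based guess would then fail on the last row.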

For this reason, it can sometimes be desirable to set all of the problematic column transformations (within the Transformation tab of CSV Reader) to String, and then define the column types afterwards. You could use the “Column Auto Type Cast” node after you have read the files, but this also presents a potential issue. If there is inconsistency in data types across different executions of your workflow (i.e. the files change between executions and the data types are inconsistent) then on one run, the type casting may set the column types differently to the way it would set them on a subsequent run. This might (or might not!) cause problems with your downstream flow depending on what you are subsequently doing with the data.

To avoid that potential for inconsistency between executions, I assembled a component, “Redefine Table Column Types”, which is available on the KNIME Hub:
Redefine Table Column Types – KNIME Hub.

It requires a little extra effort in the form of supplying a data table with the required data types as a sample row, but might be useful if this is likely to be an issue.
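
The general idea (read everything as String, then cast using a sample row that carries the required types) can be sketched in Python as follows. This is only an illustration of the approach, not the component’s actual implementation; `SAMPLE_ROW`, the `Label` column, `read_as_strings` and `apply_sample_types` are all made up for the example.

```python
import csv
import io

# Hypothetical sample row: one example value per column, carrying the wanted type.
SAMPLE_ROW = {"Label": "text", "Dist min EdgeA-EdgeB": 0.0}


def read_as_strings(csv_text):
    """Read a CSV without any type guessing; every cell stays a string."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def apply_sample_types(rows, sample_row):
    """Cast each cell to the Python type of the matching sample value."""
    return [
        {col: type(sample_row[col])(value) for col, value in row.items()}
        for row in rows
    ]


rows = read_as_strings("Label,Dist min EdgeA-EdgeB\nA,0\nB,0\n")
typed = apply_sample_types(rows, SAMPLE_ROW)
```

Because the target types come from the sample row rather than from the data, a file that happens to contain only “0” still produces a double column on every run.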

I have attached a workflow as an example of the above:

Read multiple CSVs with defined types.knwf (36.6 KB)

Hi, there is a tick box, “Append path column”, under Advanced Settings, where you give it a column name (the default being “Path”).

There are other ways of doing it. For example, if you are using the loop construct, you could use “Path to String (Variable)” to copy the loop “path” variable to a String variable and then use a “Variable To Table Column” node to create a column from that String variable on each iteration.
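
For anyone who wants to see the effect outside KNIME, here is a minimal Python sketch of what appending a path column amounts to: each row is tagged with the file it came from before the tables are concatenated. The file names, contents and the `read_with_path` helper are hypothetical, and the CSV text is held in memory so the example runs as-is.

```python
import csv
import io


def read_with_path(path, csv_text, path_column="Path"):
    """Parse one CSV and add the source path to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row[path_column] = path
        rows.append(row)
    return rows


# Hypothetical file names and contents.
files = {
    "data/file_1.csv": "Dist min EdgeA-EdgeB\n0\n0\n",
    "data/file_3.csv": "Dist min EdgeA-EdgeB\n0\n0.425\n",
}

table = []
for path, text in files.items():
    table.extend(read_with_path(path, text))
```

Every row in `table` now carries a “Path” value next to the original columns, which is the same shape of result the tick box gives you.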

Thanks @takbb!

I found this topic and downloaded the latest version (I had the July 2021 one)! So perfect!

The Path To String … was the “old” trick I knew, and I was wondering if an update had arrived, and YES, it has! :tada:

Thank you again,

Romain
