File Reader / Definition of "Content" Type

Dear all,

I am using a "looped" workflow that iterates over a number of CSV files to build one large file. Unfortunately, the column types change between files from integer to double, although the files contain only numeric values. The File Reader node cannot handle this and stops working.

The workaround is to set the "difficult" columns to Double by default.

But unfortunately this happens with different columns in different files.

Is there any possibility to set all column types to "Double" at the very beginning, so that any numeric value is read as a double value?

Any idea how to ease my job?

Best regards and thank you,

Jürgen

 

PS: Each file has approx. 250 columns and 10,000 rows.

 

Hi Jürgen, the only approach I can think of is to read in the whole file so that each line lands in a single column (using a fake column delimiter in the File Reader), and then split this data with the Cell Splitter node afterwards; double/int columns can then easily be converted into the most general type.
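
For anyone who wants to try the same idea outside KNIME first, here is a minimal Python sketch of "read raw, split, convert to the most general type". It is only an illustration, not a KNIME feature: the function name `parse_all_as_double` and the sample data are made up, and treating empty cells as NaN is an assumption.

```python
import csv
import io
import math


def parse_all_as_double(text, delimiter=","):
    """Split raw CSV text and convert every data cell to float (double).

    The first line is kept as the list of column names.
    Empty cells become NaN (an assumption for this sketch).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        rows.append([float(cell) if cell.strip() else math.nan for cell in record])
    return header, rows


# Hypothetical sample: column "a" looks like integer, column "b" like double.
sample = "a,b\n1,2.5\n3,4\n"
header, rows = parse_all_as_double(sample)
print(header)
print(rows)
```

Because every cell goes through `float`, it no longer matters whether a column happens to contain only integers in one particular file.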

Hi Gabriel.

It's a nice idea, but after splitting the column into columns I have not seen a way to get the column names.

I will explore R to achieve that.

Thank you very much

Jürgen

PS: Obviously another option would be to correct / adapt the csv files upfront 

Hi Jürgen,

One option that may be worth exploring is manually casting everything to a double using a Column List Loop Start node, but I expect this could be a bit slow and cumbersome to set up. If that sounds promising though, post back and I'll cook up an example.

Regards,

Aaron

 

Hi Jürgen,

It is possible to read the column names in R and write them into a file as the first line. You can then append your data and proceed as Gabriel had suggested.

Best regards

Jerry
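
The "names first, data appended" step could look roughly like this. The suggestion above is for R; this is a Python sketch of the same idea, and the function name `write_names_then_data` and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_names_then_data(source, target, delimiter=","):
    """Write the column names as the first line of target, then append the data."""
    with open(source, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        names = next(reader)   # first line holds the column names
        data = list(reader)    # remaining lines hold the values

    # Step 1: create the target file with the names as its first line.
    with open(target, "w", newline="") as dst:
        csv.writer(dst, delimiter=delimiter, lineterminator="\n").writerow(names)

    # Step 2: append the data rows unchanged below the names.
    with open(target, "a", newline="") as dst:
        csv.writer(dst, delimiter=delimiter, lineterminator="\n").writerows(data)

    return names


# Hypothetical example files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.csv"
    target = Path(tmp) / "target.csv"
    source.write_text("x,y\n1,2.5\n3,4\n")
    names = write_names_then_data(source, target)
    result = target.read_text()

print(names)
print(result)
```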

Dear all,

thank you very much for your suggestions. I have done the "looping" in R and will process the file in KNIME.

Best regards,

Jürgen

PS: The difficulties result from Excel and inconsistencies in column naming...
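
For reference, a loop of this kind can be sketched as follows. This is not the R script mentioned above (which was not posted) but a Python illustration under stated assumptions: the function name `merge_csv_as_double` and the sample files are hypothetical, columns are matched by exact name, and a column missing from a file is filled with NaN.

```python
import csv
import math
import tempfile
from pathlib import Path


def merge_csv_as_double(paths, delimiter=","):
    """Read several CSV files and merge them into one table of floats."""
    columns = []  # union of all column names, in first-seen order
    records = []  # one dict per data row: column name -> float
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter=delimiter)
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            for row in reader:
                records.append({
                    key: float(value)
                    for key, value in row.items()
                    # skip surplus fields (key None) and missing/empty cells
                    if key is not None and value not in (None, "")
                })
    # Columns that a file does not have are filled with NaN.
    table = [[rec.get(col, math.nan) for col in columns] for rec in records]
    return columns, table


# Hypothetical example: two files with different columns and mixed int/double.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.csv"
    second = Path(tmp) / "second.csv"
    first.write_text("a,b\n1,2\n")
    second.write_text("b,c\n3.5,4\n")
    columns, table = merge_csv_as_double([first, second])

print(columns)
print(table)
```

Column names that differ only in spelling or whitespace would still end up as separate columns here, so inconsistent naming has to be cleaned up before or inside the loop.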