Reading Excel Sheets with different column counts and different data types

I built a solution but you may want to check it out if it works for you. With R I check all the sheets in the excel files from a folder. The sheets get imported and read back into KNIME the type is determined by a guess from the first 50k lines.

Then I try to find out which combination of type and column name is there the most (all of the time - you might adapt that) and then only those are kept. But initially, all the data is loaded into KNIME so you might use it later. Filename and sheet-name are stored for later use.

Note: there is a solution with R that would store the data and files in a tibble within R but that is not comfortable to bring back to KNIME later. So I just used a loop.

Yes it also would have been possible to use just KNIME nodes but readxl offers the possibility to force R to read all columns as strings so you might later daecide how to convert them.

kn_example_r_excel_find_united_fields.knar (261.5 KB)

3 Likes