Reading Excel Sheets with different column counts and different data types

I have a collection of data extracts whose column count varies between 90-110 and I want to choose the 26 that exist across all, as well as the data type might vary some maybe double or integer or string.

I tried using a column filter and that approach didn’t pan out well.

I built a solution but you may want to check it out if it works for you. With R I check all the sheets in the excel files from a folder. The sheets get imported and read back into KNIME the type is determined by a guess from the first 50k lines.

Then I try to find out which combination of type and column name is there the most (all of the time - you might adapt that) and then only those are kept. But initially, all the data is loaded into KNIME so you might use it later. Filename and sheet-name are stored for later use.

Note: there is a solution with R that would store the data and files in a tibble within R but that is not comfortable to bring back to KNIME later. So I just used a loop.

Yes it also would have been possible to use just KNIME nodes but readxl offers the possibility to force R to read all columns as strings so you might later daecide how to convert them.

kn_example_r_excel_find_united_fields.knar (261.5 KB)

3 Likes

Why have you broken this down:

v_path <- paste0(knime.flow.in[[“var_path_data”]], knime.flow.in[[“File name”]], “.”, knime.flow.in[[“File extension”]])

Also doesn’t work in my case as I get this error

WARN R Snippet 8:41 Node created an empty data table.
WARN Constant Value Column 8:56 Node created an empty data table.
WARN Extract Table Spec 8:50 Node created an empty data table.
WARN Constant Value Column 8:51 Node created an empty data table.
WARN Constant Value Column 8:55 Node created an empty data table.
WARN Constant Value Column 8:52 Node created an empty data table.
WARN Loop End (2 ports) 8:54 Node created empty data tables on all out-ports.
WARN R Snippet 8:41 Node created an empty data table.
WARN Constant Value Column 8:56 Node created an empty data table.
WARN Extract Table Spec 8:50 Node created an empty data table.
WARN Constant Value Column 8:51 Node created an empty data table.
WARN Constant Value Column 8:55 Node created an empty data table.
WARN Constant Value Column 8:52 Node created an empty data table.
WARN Loop End (2 ports) 8:54 Node created empty data tables on all out-ports.

You might have to adapt the path in order to function on your machine. Eg the “/” might have to be “\” and it assumes the files are located in a folder /data/ on the same level as the workflow. Maybe you check if the list of files before the loop is suitable for you and there are no strange files there that you would not want.

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.

Just a quick note that reading multiple files with different different numbers of columns and data types is now possible within the new Excel Reader in version 4.3. Just select to read “Files in folder” and, if having different data types, disable to limit rows being scanned. In fact, a bunch of new reader nodes can do this!

Best,
Simon

3 Likes