Union multiple excel files with different columns

danielkataoka · October 20, 2022, 6:23pm

Hi all.

Sorry if this question has already been answered.

I have multiple Excel files in a folder and I need to union them all.
The problem is that I only need some of the columns.

Is it possible?

Tks

knimediger · October 20, 2022, 7:35pm

First of all, I’m assuming that all the xls files come with the same structure (i.e. data types in the columns). Furthermore, I assume that the number of input files is fixed and known upfront.
The general approach for solving your challenge might then look like this:
Read each of the files one by one with the Excel Reader node (NodePit).
Then apply the Concatenate node https://nodepit.com/node/org.knime.base.node.preproc.append.row.AppendedRowsNodeFactory?numWorkflows=999999 to the tables.

danielkataoka · October 20, 2022, 7:48pm

Hi @knimediger .

Sorry but my xls files don’t have the same structure.
Some files have more columns than the others.

mlauber71 · October 20, 2022, 8:57pm

@danielkataoka you could use the Reference Column Filter node (KNIME Hub) to make sure only the set of columns you want remains.

Daniel_Weikert · October 21, 2022, 3:25pm

Read the files in a loop, close it with a Loop End node (allow changing table specifications), and finally filter the columns you need.
br

bruno29a · October 22, 2022, 5:08am

Hi @danielkataoka , there are a couple of ways to go about this, but it’s hard to tell without seeing the structure of your files.

You aren’t telling us much. As I always say, the more info you give, the more precise the solution you will get. “Help us help you”.

So, are we to assume that you want to keep only the columns that exist in all the files, or should the files that don’t have the additional columns be filled with EMPTY values?
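To make the two options concrete, here is a minimal Python sketch, outside KNIME and purely for illustration. It assumes each file has already been read into a list of row dictionaries. The function names and the sample data are hypothetical, not anything KNIME provides.

```python
def table_columns(table):
    """Return the column names of one table in first-seen order."""
    cols = []
    for row in table:
        for name in row:
            if name not in cols:
                cols.append(name)
    return cols


def concat_intersection(tables):
    """Stack all rows, keeping only the columns present in every table."""
    per_table = [table_columns(t) for t in tables]
    if not per_table:
        return []
    common = [c for c in per_table[0] if all(c in cols for cols in per_table)]
    return [{c: row.get(c) for c in common} for t in tables for row in t]


def concat_union(tables, missing=None):
    """Stack all rows, keeping every column; absent cells become `missing`."""
    all_cols = []
    for t in tables:
        for c in table_columns(t):
            if c not in all_cols:
                all_cols.append(c)
    return [{c: row.get(c, missing) for c in all_cols} for t in tables for row in t]


# Hypothetical sample data: the second "file" has one extra column.
file_a = [{"id": 1, "name": "Ann"}]
file_b = [{"id": 2, "name": "Bob", "city": "Oslo"}]

only_common = concat_intersection([file_a, file_b])
with_empty = concat_union([file_a, file_b])
```

With the intersection variant the extra `city` column is dropped. With the union variant it is kept, and the row from `file_a` gets an empty (`None`) value for it.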

If you only want the common columns, how are the files with more columns set up? Are the additional columns all after the common columns? If that is the case, then in the “Sheet area” option you can choose “Read only data in” instead of reading the entire data of the sheet.

And of course, we want to read all the files at once, which is faster, so we want to read with the “Files in folder” option instead of “File”.

If the columns are in no specific order, I’d still read all the files at once, but make sure the “Support changing file schemas” option is on.

You can then apply a Column Filter and any other rules to the dataset.
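Outside KNIME, the same read-everything-then-filter idea can be sketched in plain Python. This is only an illustration. It assumes each file has already been loaded as a list of row dictionaries, and `union_then_filter`, the table names and the column names are all hypothetical.

```python
def union_then_filter(tables, wanted, missing=None):
    """Stack rows from tables with differing columns, then keep only `wanted`.

    Columns that a given table does not have are filled with `missing`,
    loosely like reading files with changing schemas and filtering afterwards.
    """
    combined = [row for table in tables for row in table]
    return [{col: row.get(col, missing) for col in wanted} for row in combined]


# Hypothetical sample data: the second table has an extra "discount" column.
sales_2021 = [{"order": 1, "amount": 10.0, "region": "EU"}]
sales_2022 = [{"order": 2, "amount": 12.5, "region": "US", "discount": 0.1}]

result = union_then_filter(
    [sales_2021, sales_2022],
    wanted=["order", "amount", "discount"],
)
```

Here `region` is filtered out, and the 2021 row gets an empty `discount` because that table never had the column.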
