Join Multiple CSVs with different column headers

Hi everyone,

How do I join multiple files (or in this case CSVs) into 1 file? I’m new to Knime so I apologize if this is a common knowledge. I’ve been reading posted topics but I can’t find a direct answer to my problem. My problem is — I have at least 12 CSVs that I want to join. I want to join them by their unique IDs and combine the columns in the output file. Here’s a sample of the data:
File 1

Column 1 Column 2 Column 3 Column 4
id housing
3100205 46
3100240 200
3100415 944
3100555 835
3100625 248

File 2

Column 1 Column 2 Column 3 Column 4
id population
3100205 96
3100240 403
3100415 1504
3100555 1370
3100625 472

File 3

Column 1 Column 2 Column 3 Column 4 E F G
id income_1 income_2 income_3 income_4 income_5 income_6
3100205 34 5 2 0 0 0
3100240 121 2 2 0 0 0
3100415 522 41 14 10 7 2
3100555 468 25 8 4 2 2
3100625 171 13 6 4 0 2

The output file should be like this:

Column 1 Column 2 Column 3 Column 4 E F G H I
id housing population income_1 income_2 income_3 income_4 income_5 income_6
3100205 46 96 34 5 2 0 0 0
3100240 200 403 121 2 2 0 0 0
3100415 944 1504 522 41 14 10 7 2
3100555 835 1370 468 25 8 4 2 2
3100625 248 472 171 13 6 4 0 2

Thanks everyone for your help!

Hi @r_cayme and welcome to the Community,

Please have a look at the below workflow:


Join_multiple_files.knwf (83.0 KB)

which generates the following outcome:

Please make sure to:

  • replace Table Creator nodes with CSV Reader nodes;
  • extend the worflow from 3 up to 12 cases;
  • use your own data.

Of course, you can use loop to read your csv files, however I feel simple solution might be enough for you.

Happy KNIMEing,
Kaz

1 Like

Hello @r_cayme

I have made a solution for you:

The overall flow looks like this:


As you are new to KNIME I will take you through it and tell you how you can modify it for your own needs.
First thing you need to do is replace the Table creator nodes with CSV reader nodes. To add all of them into the Concatenate node you will need to right click on the Concatenate node and select ‘add Input port’ - keep doing this until the number of input ports matches the number of CSV reader nodes you have. when that is done attach a line from each CSB reader node to an empty Concatenator node input port. Execute the Column Resorter node - and you are done. the Output should be what you require.
Things to note:

  1. in your examples you gave the joining column name is called ‘id’, if that is not the case you will have to make adjustments by changing ‘id’ to what every it is in: Group Loop Start Node, Table Row To Variable node and the Constant Value Column node.
  2. If you have more or less .csv’s then remove them or add them to the flow remembering to right click on the Concatenate node to add or remove the input ports.

Frank

One more thing - I broke my rule today and have a loop within a loop in a KNIME flow. But, I only had my lunch hour.

1 Like