How to Join Multiple Tables with Different Schemas in KNIME (Similar to Alteryx Multi-Join)?

Hi KNIME Community,

I have a use case where I need to join multiple incoming tables from flat files, and these tables have different schemas.

In Alteryx, there is a direct tool called Join Multiple, which allows joining multiple inputs easily.

However, in KNIME, I couldn’t find a single node that does this.

So far, I’ve tried using two Joiner nodes to combine the tables, but this approach doesn’t scale well when the number of tables increases.

I came across this Multiple join - #3 by aworker suggests using Recursive Loop Start/End nodes to wrap the Joiner node and perform multiple joins iteratively. However, I couldn’t get this to work in my case.

Here’s what I need help with:

  • How can I configure the Recursive Loop so that the Joiner node joins all tables one by one?

  • Is there a better approach for joining multiple tables with different schemas in KNIME?

Any guidance, examples, or best practices would be greatly appreciated!


What I’ve Tried:

  • Used multiple Joiner nodes manually (works but not scalable).

  • Attempted Recursive Loop setup (Joiner inside loop), but I’m stuck because the Joiner node only accepts two inputs.


Goal:

A solution similar to Alteryx’s Join Multiple tool, but in KNIME.

Hi @Arpit_Sharma ,can you give an idea of the type of data you are trying to join, possibly even some small demo files.

As you have noted, there is no joiner node in KNIME that can handle joining more than two tables at once.

I’m wondering though in your case how much data from each input table is to be added to the output. It could be for example that there are two primary tables to be joined and then other tables simply augment the output which could be efficiently achieved by a series of Value Lookup nodes.

I don’t think there is necessarily a “one pattern fits all” for this, so if you can give more information about the nature of the various input tables and expected output table that would be beneficial.

btw I understand your concerns about not scaling, but trying to suggest any pseudo-generic solutions with no idea of the use case is difficult.

Hi,

it’s not easy to find a generalized solution but maybe this is a good start:
The idea is, that we define a list of Paths, Columns to use for the GroupBy and for the Joining. Then iterate recursively over this list and expand the table in each iteration.

1 Like

But you need to be careful with the sizes… I dont think that hundrets of iterations will work. I tested it with three tables

HI guys,

A question… If tyou have some files/tables with the same scheme, you can use a csv/json/file reader node with the folder configuration to load multiples files. This will make a single table with all data that you have.

If you need to compare file by file and add some column because they have differences, use de join node for it.

After that, you can use de group node to sum the information or make some calc and at the end, use de row filter and column filter to remove information that you don’t need any more.

Can it be a solution for you?

Thanks,

Denis Iongbloed

Hi @Arpit_Sharma

I’m not fully sure what your expected result is, but… When I need to have data from various (in terms of schemes applied) files, usually I:

  1. Rename columns in input files to somehow reflect filename in the column header.
  2. Apply ‘Concantenate’ node with ‘Union’ as expected outcome.

Thanks to this I get new rows and new columns. However, as I already mentioned, I’m not sure if this could fit your needs.

Happy KNIMEing

Hello,
I have a same scenario where i am using 6 tables joining 2 columns in every table. In all the tables, 1 column has same name. However, the second column has different name but data is same. I tried to do this method. But, i am not getting the same number of records as multi join. I tried this method in Alteryx, its giving same records in knime.
I need the records same as multi join.Please provide a better solution for this.

@Anjum_Taj it is difficult to advise just from the screenshot. From there three active ports and the concatenate node it seems something is off.

In general you might want to plan ahead what is happening at each step and check the results. Matching non matching and what it is you want to achieve. Joining: adding columns to a matching set of IDs and/or appending the same columns to new IDs.

The key to success in these cases from my experience is in the planning.

You can check out the example provided.

Best case would be you provide data representing your challenge so we could user that to discuss.

1 Like


Hi @mlauber71
I am adding these columns in the knime. First joiner, i am taking from Input#1 and Input#2 and then concatenating all the outputs. In the second joining the first input coming from conacatenate taking Input#1 columns with 3 rd input table columns and so on.. Till 5 tables

  1. deactivate the toggle “split outputs” in Knimes Joiner. Then all the output is in the first output (and output 2 and 3 get deactivated)

  2. configure it to match your actual need (Inner / Left / Full outer join), and just to n-1 joins (n tables), adding one table at a time.

I dont know Alteryx Multi-Join but if this is similar to a Full Outer Join, then you will need to Merge the columns you joined on if you have something else than a 1:1 match (column merger node)

1 Like