Configurable amount of join keys

marcandavi2 · September 18, 2024, 1:53pm

I would like to perform a JOIN between to tables with identical schema based on a configurable set of columns. To pick these, I would use the Column Filter Configuration node to get a variable that represents the array of selected columns. But then, how do I pass these to the Join node?

In Knime version 4.3.0 there isnt the option to configure the join predicate as an array of strings

takbb · September 18, 2024, 4:29pm

Hi @marcandavi2 , do you mean KNIME 5.3.0 ?

marcandavi2 · September 18, 2024, 4:40pm

Yes, made a typo in my original message

takbb · September 18, 2024, 5:04pm

Thanks for clarifying. I can certainly see why that change would be problematic! I admit I am not a fan of the non-array configuration of flow variables for such nodes as it makes it very difficult, if not impossible, to configure them programmatically, as you are demonstrating.

The only thing I can suggest at the moment is searching the node repository for the deprecated version of the Joiner node, which is still available.

(I think you may have to switch in Modern UI for this search to work, if you are currently using Classic. You can switch back again after you have found the node and included it on your workflow)

search for:

joiner //deprecated

and make sure to choose the node shown in red below, as there have been several joiner nodes deprecated in the past!

JPollet · September 18, 2024, 5:14pm

Hi,
I agree : array configuration are simple in this case. Can this evolve for future versions ?
Joel

takbb · September 18, 2024, 6:08pm

Here are a couple of mechanisms for dynamic joining using the newer (5.3) Joiner node. It’s not particularly pretty.

The additional functions have been added in components on the workflow to enable column selection. One mechanism is to create a concatenated key of the selected columns from the two tables and have the Joiner join using just that key column.

The second mechanism is to define a series of flow variables for a predefined (maximum) number of columns, and have a component generate the flow variables. The joiner node is pre-configured with, say, 10 join predicates. The component generates this number of join flow-variables; repeating the first predicate if there are fewer than the (10) defined predicates.

Dynamic Join.knwf (122.1 KB)

The components are a little “thrown together” as a proof of concept. There may be a more efficient way to achieve their respective goals.

JPollet · September 19, 2024, 11:22am

Hi,
@takbb I take a look at your workflow and I made the connection with a old problem I have : in your workflow, it is assuming that first colmun selected in table 1 match first column in table 2. I search for a selection widget for : for each of the column chosen in table 1 (so an arbitrary number of column), user can select a value (take in the list of columns choosen in table 2) ie a generalized Value selection widget. Is the solution straightforward or may I create another post ?
Best,
Joel

takbb · September 19, 2024, 12:23pm

Hi @JPollet , the “solutions” I gave were intended to answer the specific question from @marcandavi2 in which they were trying to join two tables that had identical schemas. They are very much a proof of concept and I think generally it would be better to use the deprecated nodes than to try to work with the suggestions given, so I think should be treated more as an academic exercise than a production solution :-).

You are right that the first (upper) solution, which creates a key from column aggregations would work only if the chosen “join columns” were in the same order on both tables.

To make the upper solution (using column aggregations) work regardless of the table column order, it would be necessary to ensure that the column order was the same in both tables when generating the column aggregations.

This could be achieved by modifying the upper flow’s “Select Joiner Columns” component as follows:

This still pre-supposes that all of the column names to be used for joining are the same in both tables.

The lower solution in that workflow simply requires that columns in both tables have the same names, and should work regardless of whether the columns are in the same order.

There are limitations to the configuration/widget nodes in terms of building components and the amount of control users have over selections and ordering of selections, which I think is what you are asking about.

I think if the above doesn’t provide the answers you are looking for (and I suspect it doesn’t ), then it would be better to create another post as you have suggested, and give more detail of the specific scenario you envisage.

JPollet · September 19, 2024, 12:28pm

Ok @takbb thanks for the hints. I will make a specific post.

Best,
Joel

Edit : here Dynamic mapping : user can select pairs key_value

system · September 26, 2024, 12:29pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.