HI ALL, i have some problem with 2 tables and a JOIN. in table A (Left), I have different lines like this: (all are uniques) |A1|AAAA| |A2|BBBB| |A3|CCCC| In a table B (right) i have lines like this: (lines are not uniques, sometimes are duplicated) |AAAA|123| |AAAA|123| |AAAA|123| |BBBB|0| |BBBB|0| |CCCC|9999| |CCCC|9999| |CCCC|9999| |CCCC|9999| When i join together the lines of table A increase for how much rows find the record in a table B. I fix this with “Duplicate row filter” before to join the table B, but i this i should have a different solutino, right? How can i join only with only 1 row of table B? [immagine]

Hi @gcas , The joiner node performs the standard “relational” joins and it’s not common to have a join-on-first option but I can understand there may be an occasional need to do this. Whilst there is no direct option in the joiner to do it, you can replicate this behaviour without using the duplication removal by “ranking” the rows within the keys that you wish to join on. [image] For the table that is the “lookup table” (Table A in your example), you just assign every row a “rank” of 1. [image] For the other table, you can assign each row a rank grouped within “key” [image] [image] I added an extra column to your Table B containing to demonstrate this. [image] After the ranking node, each row is ranked as follows: [image] (EDIT) In the joiner, you include the “rank” column in the join. Since your lookup table only has 1 as its ranks, only the first row will match for each key. In the joiner we elect to remove the rank nodes as part of the join. Here I have elected to include the (right) unmatched rows too: [image] and so you then get match-on-first [image] or if you only include matched rows you get the same output as if you’d removed duplicates: [image] Join on first match only.knwf (13.7 KB) Personally I would generally do duplicate removal as you already suggested. I know it’s still extra steps, but I hope that helps if duplication removal is for some reason otherwise causing problems.

JOIN WITH SOME TROUBLE

KNIME Analytics Platform

mlauber71 June 24, 2022, 2:41pm 7

@gcas with a lot of databases you can use Window functions like RANK to have more control over which record gets selected at the end. You might have to have a criterion to determine which row would get choosen. I have discussed options here

An example whith H2 database is here

3 Likes