JOIN WITH SOME TROUBLE

gcas · June 24, 2022, 9:13am

HI ALL,
i have some problem with 2 tables and a JOIN.

in table A (Left), I have different lines like this: (all are uniques)

|A1|AAAA|
|A2|BBBB|
|A3|CCCC|

In a table B (right) i have lines like this: (lines are not uniques, sometimes are duplicated)
|AAAA|123|
|AAAA|123|
|AAAA|123|
|BBBB|0|
|BBBB|0|
|CCCC|9999|
|CCCC|9999|
|CCCC|9999|
|CCCC|9999|

When i join together the lines of table A increase for how much rows find the record in a table B.

I fix this with “Duplicate row filter” before to join the table B, but i this i should have a different solutino, right?

How can i join only with only 1 row of table B?

lelloba · June 24, 2022, 10:07am

Hi gcas,

let me try to help.
What if you use a group by node on table B before the join, so that you won’t have repetitions in B?
With group by you can aggregate column 2B using sum, average, etc. or just selecting the first value.
The join should result as you asked.

RB

gcas · June 24, 2022, 12:06pm

Thanks for you suggest!
“Group By” is an alternative solution of “remove duplicate”.

I didn’t know this node (thanks!), but i think is not the best solution.

In my mind JOIN node should have the option i think is missing: match only 1 time

lelloba · June 24, 2022, 12:12pm

If you think it can be an useful feature, you can suggest it in this part of the forum:

I don’t think that it is currently available.

RB

takbb · June 24, 2022, 12:58pm

Hi @gcas ,

The joiner node performs the standard “relational” joins and it’s not common to have a join-on-first option but I can understand there may be an occasional need to do this.

Whilst there is no direct option in the joiner to do it, you can replicate this behaviour without using the duplication removal by “ranking” the rows within the keys that you wish to join on.

For the table that is the “lookup table” (Table A in your example), you just assign every row a “rank” of 1.

For the other table, you can assign each row a rank grouped within “key”

I added an extra column to your Table B containing to demonstrate this.

After the ranking node, each row is ranked as follows:

(EDIT)
In the joiner, you include the “rank” column in the join. Since your lookup table only has 1 as its ranks, only the first row will match for each key. In the joiner we elect to remove the rank nodes as part of the join. Here I have elected to include the (right) unmatched rows too:

and so you then get match-on-first

or if you only include matched rows you get the same output as if you’d removed duplicates:

Join on first match only.knwf (13.7 KB)

Personally I would generally do duplicate removal as you already suggested. I know it’s still extra steps, but I hope that helps if duplication removal is for some reason otherwise causing problems.

gcas · June 24, 2022, 1:06pm

Thanks a lot for your reply and approach!
I could use your solution waiting for a join-on-first (or last) option in Join Node.

Have a nice day!

mlauber71 · June 24, 2022, 2:41pm

@gcas with a lot of databases you can use Window functions like RANK to have more control over which record gets selected at the end. You might have to have a criterion to determine which row would get choosen. I have discussed options here

An example whith H2 database is here

system · July 1, 2022, 2:41pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.