Hello,
I have problem with understanding JOINER NODE and its 2 options: Left and Right Outer Join.
Description:
I have 2 datasets - A and B. A is 1000 rows and B 1000000. When I am trying to join A with B using Left Outer Join and 2 columns (in this case A is on the top - left side of the joiner and B is connected to the lower input - right side in the joiner).
As an output I get 1000 rows from A extended (joined) by columns from B. That is ok.
A
col1 col2 valA
X B V
Y C 7
T D 8
B
col1 col2 valB
X B G
Y C 8
I H 9
Output: A Left Outer Join B using col1 and col2 and match all of the following:
X B V G
Y C 7 8
T D 8 ?
Case 2 is weirder: I am doing the same, but A this time is connected to lower input and B to upper input - changing option to Right Outer Join - here the output should be the same - 1000 rows. Instead I am getting 1002 - 2 additional rows for every matched row between A and B.
G V B X
8 7 C Y
? V B X -> additional row with nulls (notice that X B V are from table A and above they have match in table B
? 7 C Y -> additional row with nulls (the same Y C 7 had match but also have null values which is weird)
Using logic these two outputs should be the same (and they are for small datasets, but for my generated 2 - 1000 and 1000000 i am getting weird outputs).
If someone is interested here is the example pictures that could be more clear then description above (result of a Right Outer Join):
h t t p : //postimage . org/ image /fkxiruei1/
Please notice how it is possible for 2 rows from table A to have match in table B and in the same moment to have null values. - Only when using Right Outer Join and big dataset.
Any help appreciated. Thanks!