Joiner - full outer join issue

Hi All,

I’m encountering a problem when using the Joiner node. When I select a “full outer join”, the full join doesn’t seem to work correctly, rather data from the ‘right’ table is discarded and placed at the bottom of the table with ‘?’ in the place the the joining column(s). I’ve made a very simple working example below where two tables are joined, where each contains a single month missing from the other.

Selection_072

Joining these tables with an outer join on the column ‘month’ I would expect the following:

Selection_073

where the months missing data (May missing from table 1, and July missing from table 2) are both returned, but with ? in place of the missing value.

However through Knime i get the following when using a full outer join:

Selection_074

Where May’s result has been moved to the bottom and the month is replaced with a ‘?’, which mean’s i’m losing the entry for May. The node is registering that this entry is present when doing the join, since the “profit” data is retained, but the month gets replaced with “?” instead of simply adding a row with Month = May, Profit = ?, Cost = 750. The node is correctly dealing with the fact that the month “July” is missing from the right table however, giving a row with, month = July, profit = 1500, cost = 1.

Another consequence of this is that swapping the order or the two tables results in a different output, which doesn’t seem to make sense for an outer join, as shown below:

Selection_075

This time May is present, but July is missing.

Does anyone know any solutions to this issue??

Thanks very much!

Hi there,

Welcome to KNIME Community Forum! I agree with you. It is bit unexpected.

A bit of explanation. In Column Selection tab of Joiner’s configuration the default is that joining columns from top input are left and from bottom input are removed.

So in your example Joiner only left Month column from top input as it was, and populated it with missing values where there was no value. As you noticed node joins correctly but does not joins joining column. This should be improved. Also this setting is the reason why swapping the order or the two tables results in a different output.

As a workaround you can leave both joining columns and after that use Rule Engine node to replace missing values. Here are expressions and configuration (need modification according to your column names of course):

After that using Column Filter node you can drop column1 (#1) column from my example.

Tnx for reporting this!

EDIT: New Joiner node, Joiner (Labs), addressed this issue (there is option Merge join columns) and with it full outer join works as expected.

Br,
Ivan

7 Likes

Good explanation, @ipazin! I would add that you can use the Column merger node to combine the two date columns - select them both in the node dialogue, and select the ‘Replace both columns’ option

Steve

3 Likes

Hi there,

just to follow up on this one. Ticket for this one already exists and has been given +1.

Br,
Ivan

Thanks very much! This solution worked perfectly!

Thanks again!

Harry T

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.