anti-join between 2 tables

mkfam9 · November 26, 2015, 10:21am

Hi,

I need to do an anti-join between 2 tables, but the "joiner" node doesn't have this option. Is there any other node that can be used for that purpose? Any ideas about how the anti-join can be done in Knime?

Thank you!

Iris · November 26, 2015, 3:27pm

Hi,

What is an anti-join?

Sounds interesting. Maybe column splitting? Or the Reference Row Filter?

mkfam9 · November 27, 2015, 9:12am

An anti-join between two tables returns one copy of each row from the first table for which no matches are found in the second table.

I attached a picture from Wikipedia.

anti-join.png
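
To make the definition above concrete, here is a minimal sketch in plain Python rather than KNIME. The function name `anti_join` and the sample tables `customers` and `orders` are made up for illustration; it assumes each table is a list of dicts sharing one key column.

```python
def anti_join(left, right, key):
    """Return the rows of `left` whose `key` value has no match in `right`."""
    # Collect every key value seen in the second table.
    right_keys = {row[key] for row in right}
    # Keep each left row once if its key is absent from the second table.
    return [row for row in left if row[key] not in right_keys]


# Hypothetical sample data.
customers = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
orders = [{"id": 2, "item": "book"}, {"id": 2, "item": "pen"}]

unmatched = anti_join(customers, orders, "id")
print(unmatched)  # [{'id': 1, 'name': 'Ann'}, {'id': 3, 'name': 'Cy'}]
```

Note that Bob is dropped even though he matches two orders, and no left row is ever duplicated.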

ImNotGoodSry · November 27, 2015, 1:07pm

What you are looking for is the Reference Row Filter node. Please see the screenshot attached to this post.

Best,
Marc

antijoin.png

Geo · November 28, 2015, 3:25pm

Yes, but that's still a workaround, because you need two Reference Row Filters for it to work the same way: one for the first table using the second as reference, and another for the second table using the first as reference.
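
The two-filter arrangement described here can be sketched in plain Python as the same filter applied once in each direction. This is only an illustration of the idea, not KNIME code; `rows_without_match` and the sample tables are hypothetical names.

```python
def rows_without_match(table, reference, key):
    """Keep rows of `table` whose `key` value does not occur in `reference`."""
    reference_keys = {row[key] for row in reference}
    return [row for row in table if row[key] not in reference_keys]


# Hypothetical sample data.
first = [{"id": 1}, {"id": 2}, {"id": 3}]
second = [{"id": 2}, {"id": 3}, {"id": 4}]

# One pass per direction, mirroring the two filter nodes.
only_in_first = rows_without_match(first, second, "id")
only_in_second = rows_without_match(second, first, "id")

print(only_in_first)   # [{'id': 1}]
print(only_in_second)  # [{'id': 4}]
```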

s.roughley · November 29, 2015, 2:23pm

In that case I think you need the Joiner node with "Full Outer Join": rows not in the left table will have missing values for the left table columns in the output, and rows not in the right table will have missing values in the right table columns in the output. Add a couple of row filters, and then you will probably need to do something like use a table splitter to split the left and right columns, a column rename (regex) to remove the suffixes from the right table, and a concatenate to put them back together as a single table.

Steve
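
A rough sketch of the full-outer-join route in plain Python, for readers who want to see the logic outside KNIME. It assumes key values are unique within each table, and represents a missing side as `None`; `full_outer_join` and the sample data are hypothetical.

```python
def full_outer_join(left, right, key):
    """Pair rows by `key`; a side with no matching row is given as None.

    Assumes `key` values are unique within each table.
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    # Left keys first, then right-only keys, preserving input order.
    all_keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]
    return [(left_by_key.get(k), right_by_key.get(k)) for k in all_keys]


# Hypothetical sample data.
left = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
right = [{"id": 2, "v": "x"}, {"id": 3, "v": "y"}]

joined = full_outer_join(left, right, "id")

# "Row filter" step: keep pairs where one side is missing, then
# "concatenate" step: take whichever side is present as the output row.
unmatched = [
    left_row if right_row is None else right_row
    for left_row, right_row in joined
    if left_row is None or right_row is None
]
print(unmatched)  # [{'id': 1, 'v': 'a'}, {'id': 3, 'v': 'y'}]
```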

Iris · November 30, 2015, 9:14am

@Geo: We will publish a Reference Row Splitter with 3.1. :-)

michael.deligny · November 30, 2015, 11:49am

Maybe the 'Set Operator' node is what you're looking for. It'll give you a table which can be the reference table for two successive Reference Row Filters on each table...?

ImNotGoodSry · November 30, 2015, 3:41pm

@Geo:

The use of the Reference Row Filter node is the perfect answer to mkfam9's question. So, in case we learned something new today, because mkfam9 asked the wrong question, I feel very sorry for that. And by the way, using two or three nodes for an obviously not so common task should be acceptable.

mkfam9 · December 2, 2015, 7:35am

Thank you everyone for your contribution. The solution posted by "ImNotGoodSry" worked for me and it is what I was looking for.

@ImNotGoodSry
I don't understand why my posting was qualified as "wrong question", but I am open to constructive feedback so that I can improve my future posts.

ImNotGoodSry · December 2, 2015, 9:58am

@mkfam9:

Don't worry, the "wrong question" remark was addressed to "Geo". ;-)

Geo · December 2, 2015, 7:16pm

@ImNotGoodSry:

I do perfectly understand the solution involving the reference row filters and there are even other solutions not mentioned here. In data management there are always many good answers for any given problem.

And yes, why not even have the "anti-join" in the Joiner node? Seems to me like a reasonable feature request. That's why I've called the Reference Row Filter solution a workaround.

ImNotGoodSry · December 3, 2015, 9:40am

Dear Geo,

mkfam9's problem could be solved by using one single node. There was absolutely no need to use multiple Reference Row Filters. So the term "workaround" seems not very appropriate for the given problem.

However, if the use of the term "anti-join" was inappropriate, because an "anti-join" (technically speaking!) is something else, I would say, he asked the wrong question.

I hope you understand what I mean.

Best,
Marc

Geo · December 4, 2015, 1:02am

Ok, I understand what you mean. So the problem asked here appears to be based on the assumption that one can identify each row using a single column in any given set, doesn't it? How would one then tackle multiple column comparisons using the Reference Row Filter? Or would such a circumstance be considered outside of the anti-join's scope?

beginner · December 17, 2015, 9:36am

> How would one then tackle multiple column comparisons using Reference Row Filter?

Concatenate the columns together in an unambiguous, well-defined format, then use that column in the Reference Row Filter.
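
As an illustration of why the format has to be unambiguous, here is a small plain-Python sketch (not KNIME). The helper names and the choice of separator are assumptions: the separator `"\x1f"` is only safe if it never occurs in the data.

```python
def composite_key(row, columns, sep="\x1f"):
    """Join several column values into one string key.

    Assumption: `sep` never appears inside any of the values.
    """
    return sep.join(str(row[c]) for c in columns)


def anti_join_multi(left, right, columns):
    """Anti-join on several columns via a single concatenated key."""
    right_keys = {composite_key(row, columns) for row in right}
    return [row for row in left if composite_key(row, columns) not in right_keys]


# Hypothetical sample data: without a separator, ("ab", "c") and ("a", "bc")
# would both collapse to "abc" and be treated as the same key.
left = [{"a": "ab", "b": "c"}, {"a": "x", "b": "y"}]
right = [{"a": "a", "b": "bc"}, {"a": "x", "b": "y"}]

result = anti_join_multi(left, right, ["a", "b"])
print(result)  # [{'a': 'ab', 'b': 'c'}]
```

With a plain concatenation and no separator, the first left row would be wrongly treated as matched and dropped.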