Finding non-matching elements between tables

lparsons42 · February 25, 2019, 4:27pm

Hello
I have two tables and am looking for matches between them. The first table has shorter strings than the second, so the shorter strings from the first are used to search the second. I'm using a Chunk Loop to go through the first table, and then a Rule-based Row Filter to search the second table using the strings from the first. This works well; however, it doesn't retain the strings from the first table that have no match in the second.

In other words, if a string from table 1 reads “AAAABBBBCCCC” and it matches a string in table 2 that reads “ZZZZAAAABBBBCCCCDDDD”, I see that in the output. What I don't see is any information on a string from table 1 that has no matches in table 2. Is there a way to do this? I don't want a list of all the strings from table 2 that the string from table 1 did not match (this would be enormous and not terribly useful); I just want to know which strings from table 1 had no matches at all in table 2. I don't see a way to do this with the Rule-based Row Filter, but I'd be fine with switching to a different matching node if one can do this for me.

thank you!

DaveK · February 25, 2019, 4:54pm

Hi @lparsons42,

are you looping through the first table one row at a time? If so, each time the string from the first table doesn't match anything in the second table, the Rule-based Row Filter should output an empty table. You can then use the Empty Table Switch node to collect the strings from the first table that did not match.
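The per-row logic described above can be sketched outside KNIME as follows. This is only an illustration, not how the nodes are implemented: it assumes a plain substring test as the matching rule (standing in for the Rule-based Row Filter), one table-1 string per iteration (a chunk size of 1), and hypothetical names `match_with_empty_switch`, `table1` and `table2`.

```python
def match_with_empty_switch(short_strings, long_strings):
    """Hypothetical emulation of a one-row-per-iteration loop: filter table 2
    for each table-1 string and, when the filter result is empty, route that
    string to a separate 'no match' collection (the Empty Table Switch idea)."""
    matched_rows = []  # (table-1 string, table-2 string) pairs from non-empty filter results
    unmatched = []     # table-1 strings whose filter result was empty
    for s in short_strings:  # one loop iteration per table-1 row
        # Stand-in for the Rule-based Row Filter: keep table-2 rows containing s
        hits = [t for t in long_strings if s in t]
        if hits:
            # Non-empty table: these rows continue down the "match" branch
            matched_rows.extend((s, t) for t in hits)
        else:
            # Empty table: remember the table-1 string instead
            unmatched.append(s)
    return matched_rows, unmatched


table1 = ["AAAABBBBCCCC", "QQQQRRRR"]
table2 = ["ZZZZAAAABBBBCCCCDDDD", "XXXXYYYY"]
matched, unmatched = match_with_empty_switch(table1, table2)
print(matched)    # [('AAAABBBBCCCC', 'ZZZZAAAABBBBCCCCDDDD')]
print(unmatched)  # ['QQQQRRRR']
```

The key point is that the "no match" case is detected per iteration, from the emptiness of the filtered result, so only the table-1 string is kept rather than every table-2 row it failed to match.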

I’m happy to look at your workflow if you have further questions.

Cheers,
David

lparsons42 · February 25, 2019, 5:14pm

Thank you for the quick reply! This makes sense, but I'm not sure how to handle the output of the Empty Table Switch. I tried changing my previous “Loop End” node to “Loop End (2 ports)”, with the second output of the Empty Table Switch going to the second input of the 2-port Loop End, but when I run the workflow it returns the error “Active Scope End node in inactive branch is not allowed”.
Is there something else I should do with the second output of Empty Table Switch so that the non-matching strings are collected into a table for the end of the loop?
thank you!

DaveK · February 25, 2019, 5:29pm

Hi @lparsons42,

could you attach a workflow showing the issue?

Cheers,
David

lparsons42 · February 25, 2019, 5:54pm

I looked a little further and found someone else with a similar issue that was resolved by placing an “End IF” node upstream of “Loop End”, rather than using a two-port Loop End as I had tried. Here we see the result of doing this. However, this causes problems: the lower output of the “Empty Table Switch” produces empty tables (which seems to be what it is designed to do). This is why I have more “Constant Value Column” nodes on that branch, to fill them in. However, the “Constant Value Column” nodes also report “Node created an empty data table”, and I'm not sure how to overcome that.

DaveK · February 25, 2019, 5:59pm

Hi @lparsons42,

could you maybe export your workflow (check the box which says that the workflow should not be reset) and attach the file (.knwf) here? That would help a lot.

Thanks,
David

lparsons42 · February 25, 2019, 7:48pm

David

Attached is my most recent modification to the same workflow. It appears to work correctly, in that the final table has rows showing which of the elements from table 1 match something in table 2 and which do not. It does throw a lot of warnings along the way, though, particularly about incompatible data table structures. If there is a better way to build this, please let me know.

thank you!

Table-matching-and-mismatching.knwf (234.4 KB)

DaveK · February 26, 2019, 9:06am

Hi @lparsons42,

a workflow is attached. Is that what you are looking for? The outputs are the two tables of the Row Splitter node: the upper table contains all ‘non_p_peptide’ values that did not have any matches, and the lower table contains the ones that had matches.
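The end result described above (one table of strings with no matches, one with matches) can also be sketched without a loop by first attaching a match count to every table-1 string and then splitting on it, much as a Row Splitter splits on a condition. This is not the content of the attached workflow; it assumes a plain substring test as the matching rule, and `split_by_match`, `table1` and `table2` are hypothetical names, with `table1` playing the role of the ‘non_p_peptide’ column.

```python
def split_by_match(short_strings, long_strings):
    """Hypothetical sketch: count how many table-2 strings contain each
    table-1 string, then split into (no_match, has_match) on that count."""
    # Match count per table-1 string (substring test assumed as the rule)
    counts = {s: sum(1 for t in long_strings if s in t) for s in short_strings}
    # "Upper" output: table-1 strings with zero matches
    no_match = [s for s in short_strings if counts[s] == 0]
    # "Lower" output: table-1 strings with at least one match, plus the count
    has_match = [(s, counts[s]) for s in short_strings if counts[s] > 0]
    return no_match, has_match


table1 = ["AAAABBBBCCCC", "BBBB", "QQQQRRRR"]
table2 = ["ZZZZAAAABBBBCCCCDDDD", "BBBBXXXX"]
no_match, has_match = split_by_match(table1, table2)
print(no_match)   # ['QQQQRRRR']
print(has_match)  # [('AAAABBBBCCCC', 1), ('BBBB', 2)]
```

Because every table-1 row always produces exactly one output row (with a count that may be zero), there are no empty intermediate tables to deal with, which is the situation that made the Empty Table Switch tricky in the loop-based version.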
The usage of the Empty Table switch can be a bit tricky.

Table-matching-and-mismatching.knwf (167.8 KB)

Cheers,
David