Match keywords as substring of any row in another table column

For example, in table 1, we have
Column name: Keyword
Row0 Hello
Row1 World
Row2 Bye

in table 2, we have
Column name: Checklist
Row0 World Peace
Row1 Hello there

Now I want to mark each row in table 1 according to if the value is present as a substring of any value in Checklist.

The expected result is this new column appended in table 1.
Row0 Yes
Row1 Yes
Row2 No

Hi @user2022 and welcome to the Knime Community.

What if table 2 had this:
Row2 Goodbye

What would be the appended value in table 1 for Row2? Would it still be No, because there was no word as “bye”, or would it be Yes, because “bye” is a substring of “Goodbye”?

Just trying to make sure we understand what you mean by “the value is present as a substring of any value in Checklist”

Thanks for replying.

Yes, let’s consider “BYE” is a substring in “GOODBYE”. It doesn’t have to be a single word, also, case won’t matter because everything has been uppercased already beforehand.

Hi @user2022 , thank you for confirming.

There are a few ways to do this, and here are 2 ways that I put together for you.

Method 1: Uses pure knime nodes and relies on the Cross Joiner

Method 2: Uses a virtual DB such as H2 where we can do a join using LIKE statement.

The workflow has both methods, and looks like this:

This is what I used as keywords:
image

This is what I used as Checklist:
image

Results using method 1:
image

Results using method 2:
image

As expected, both gives the same results.

If your dataset is big, then the cross joiner might end up with a huge dataset as it will do a cross join (meaning all the possible combinations of Keyword and Checklist), in which case, the virtual DB might be better.

You can try both to see what runs best for you.

Here’s the workflow:

4 Likes

We’ve also done something similar to this in the Just KNIME It challenges

Here are more solutions to this kind of problem as well:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.