How to Compare Strings in Two Columns?

Hi All,

Here is the demonstration. I would like to know whether the string in Column B appears in Column A.
Is there any node that can help me? I have tried to use the Rule Engine with the following rule, but it doesn’t work.

$Column A$ LIKE $Column B$ => “T”

Column A Column B
abcdefg a
abcdefg b
abcdefg c
abcdefg z
abcdefg x
abcdefg y

Result:

Column A Column B Column C
abcdefg a T
abcdefg b T
abcdefg c T
abcdefg z F
abcdefg x F
abcdefg y F

Hi @HKuser

Alternatively, you can use a String Manipulation node to find the position of the string you are looking for within the other string. A value of 0 or higher means the string is present; -1 means the string is not present.
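
For readers who want to see the logic outside of KNIME, here is a minimal plain-Python sketch of the same position check. It is not String Manipulation node syntax; `contains_flag` is a hypothetical helper name, and the rows are the sample data from the question.

```python
# Plain-Python sketch of the position check; not KNIME node syntax.
# Rows are (Column A, Column B) pairs taken from the question.
rows = [("abcdefg", "a"), ("abcdefg", "b"), ("abcdefg", "c"),
        ("abcdefg", "z"), ("abcdefg", "x"), ("abcdefg", "y")]


def contains_flag(text: str, needle: str) -> str:
    """Return "T" if needle occurs in text, otherwise "F" (hypothetical helper)."""
    position = text.find(needle)  # 0 or higher when found, -1 when absent
    return "T" if position >= 0 else "F"


# One flag per row, in the same order as the input table.
flags = [contains_flag(a, b) for a, b in rows]
```

The key point is the comparison against -1: any position from 0 upwards counts as a hit, including a match at the very start of the string.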


gr. Hans

Hi @HKuser ,

[Whilst writing, I see that @HansS has already provided a more straightforward solution :wink: I will continue to add this as alternative examples which may still be useful as general info]

Unfortunately, as you have found, the Rule Engine cannot do this directly. The LIKE operator requires wildcards around the pattern for this to work, and (to my knowledge) you cannot perform any kind of concatenation within the Rule Engine to add the required wildcards to the value of Column B.

Likewise, the alternative MATCHES predicate requires a regular expression, which again you cannot build (except as a literal) within the Rule Engine.

To make this work with the Rule Engine, you would therefore need to add an additional node (e.g. String Manipulation) to build the pattern that you are trying to match and then use that within the Rule Engine.
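
As a rough illustration of the "build the pattern first, then match" idea, here is a plain-Python sketch. It is not the attached workflow and not Rule Engine syntax: `fnmatchcase` stands in for a wildcard LIKE, `re.fullmatch` stands in for MATCHES, and the function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def like_match(text: str, value: str) -> bool:
    """Wildcard-style check: wrap the Column B value in '*' before matching."""
    pattern = "*" + value + "*"  # step 1: build the wildcard pattern
    # Note: wildcard characters inside value (*, ?, [) are not escaped here.
    return fnmatchcase(text, pattern)  # step 2: match the whole string


def regex_match(text: str, value: str) -> bool:
    """Regex-style check: wrap the escaped Column B value in '.*'."""
    pattern = ".*" + re.escape(value) + ".*"  # escape so value is taken literally
    return re.fullmatch(pattern, text, flags=re.DOTALL) is not None
```

In both variants the pattern has to cover the whole of Column A, which is why the bare Column B value on its own does not match.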

The attached workflow demonstrates this for both Wildcard (LIKE), and Regular Expression (MATCHES).

If using the String Manipulation node, you can however do it without the Rule Engine in this example, as the String Manipulation node can also return TRUE or FALSE in response to a regular expression match, and has the advantage that it can also build the pattern. [Though for this use-case the solution given by @HansS is more straightforward]

I have included this in the workflow too.

Column String compare example.knwf (13.6 KB)

I expect a lot of future KNIME’rs will end up in this topic while searching on how to do this so throwing another method into the party mix: :slight_smile:

A Column Expressions node with contains(column("column1"), column("column2"))

Thank you all for your reply! It really works for me!

What if I would like to count how often specific words appear within the text?

I don’t know whether I can count Chinese words.

Off the top of my head, something like this: split each sentence into a list of words, ungroup the list, then group by sentence identifier and word with a count aggregation.
br
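
The split, ungroup, and group-by-with-count idea above could look roughly like this in plain Python. This is only a sketch under assumptions: the sample sentences and the `word_counts` name are made up for illustration, and splitting on whitespace would not separate words in Chinese text written without spaces, which would need a word segmentation step first.

```python
from collections import Counter

# Hypothetical sample data: sentence identifier -> text.
sentences = {1: "the cat saw the dog", 2: "the dog ran"}


def word_counts(texts: dict) -> Counter:
    """Count occurrences per (sentence identifier, word) pair."""
    counts = Counter()
    for sentence_id, text in texts.items():
        # "Split to list" + "ungroup": one entry per word occurrence.
        for word in text.split():  # assumption: words are whitespace-separated
            # "Group by" sentence identifier and word, with a count aggregation.
            counts[(sentence_id, word)] += 1
    return counts


counts = word_counts(sentences)
```

Looking up `counts[(1, "the")]` then gives the frequency of "the" in sentence 1; pairs that never occur return 0.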