Rule Based Row Splitter: "Matches" gives different result vs "=". Why is this?

After an inner join on two tables that compares three pairs of strings (I’ll call them A1 vs A2, B1 vs B2, C1 vs C2), I get 366 rows in the output table. I am then trying to isolate other strings that might not match, using the Row Splitter node.
After seeing unexpected results, I am now using the Row Splitter node to test pairs of strings from the joined table one at a time.

I’m finding that if I use this rule:
$B1$ MATCHES $B2$ => TRUE
One row is rejected as a non-match even though the values look identical. 365 rows pass the rule.

If I change the rule to
$B1$ = $B2$ => TRUE
All 366 rows pass the rule as I would predict.

When I examine the table in Excel and use an =IF(B1=B2,1,0) statement I get a “1” indicating Excel thinks the values are equal.

I’m trying to understand the difference between how “MATCHES” is supposed to compare strings vs how “=” compares strings, but am coming up empty. Does anyone know?

I already confirmed there are no unseen leading or trailing spaces, and the previous Inner Join node treats the values as matching. One thing about this particular row: the strings in question are the only ones with parentheses. Both values have this identical format: “abcd (efgh)”. But it’s not obvious why this would cause KNIME to decide that two seemingly identical strings fail the MATCHES rule.

Thank you.

Hi,
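MATCHES expects the right-hand value to be a regular expression, so if it contains special characters, they are treated as regex metacharacters rather than literal text. The dot, for example, stands for any character; * means the previous element is repeated 0 or more times, and + means it is repeated 1 or more times. Parentheses define a group, so the pattern "abcd (efgh)" matches the text "abcd efgh" and not the literal string that contains the parentheses, which is why that one row fails. The = operator compares the two strings literally, so it passes.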
Kind regards
Alexander
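
For anyone who wants to reproduce the effect outside KNIME, here is a minimal sketch in Python. It is only an analogue: KNIME runs on Java's regex engine, and the sketch assumes MATCHES behaves like a whole-string regex match with the right-hand value as the pattern. The helper names `rule_equals` and `rule_matches` are made up for illustration and are not KNIME functions.

```python
import re


def rule_equals(left: str, right: str) -> bool:
    """Literal comparison, analogous to `$B1$ = $B2$`."""
    return left == right


def rule_matches(left: str, right: str) -> bool:
    """Whole-string regex match with `right` as the pattern.

    Assumed analogue of `$B1$ MATCHES $B2$`; not KNIME's actual code.
    """
    return re.fullmatch(right, left) is not None


b1 = "abcd (efgh)"
b2 = "abcd (efgh)"

# Literal comparison: the strings are identical.
equal_result = rule_equals(b1, b2)

# Regex comparison: the parentheses in b2 form a group, so the pattern
# expects "abcd efgh" and does not match the literal parentheses in b1.
regex_result = rule_matches(b1, b2)

# The same pattern does match the text without parentheses.
group_result = rule_matches("abcd efgh", b2)

# Escaping the metacharacters turns the pattern back into literal text.
escaped_result = rule_matches(b1, re.escape(b2))

print(equal_result, regex_result, group_result, escaped_result)
```

Under those assumptions, the literal comparison succeeds, the unescaped regex comparison fails, and escaping the pattern makes it succeed again. When the goal is a plain equality check between two columns, the = operator avoids the problem entirely.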

Thank you Alexander. Clear answer and very fast reply!
Dan
