Expression Row Filter node does not behave as expected

Hi Community.
I am exploring the Expression Row Filter node as I think it has a lot of potential (along with the other Expression Nodes!).
I ran into the following issue
I have a column with emails and I want to filter out those rows with invalid emails.
Using a Row Filter or a Rule-based Row Filter is pretty simple. For example, the syntax for that would be

$email$ LIKE “@.*” => TRUE

However, when using the Expression Row Filter, I am not getting the desired output. I entered the following syntax and the node does not return any rows
like($[“email”], “%@%.%”) = TRUE

I have also used other wildcards (e.g., “@.*”) with similar results.
Strangely, a modified version of the wildcard (e.g., “%.%”) partially returns results.

I am guessing the issue is with the character @, but I am not completely sure.

Any advice is appreciated!

1 Like

Hi @cmora

I have had a play with this, and I agree that something is not right here. I went looking at the Expression node too, as it was easier to test results and it indicates the same problem when it comes to wildcards and the like() function.

According to the help documentation, the Expression nodes use the SQL style wildcards (so % and _ rather than * and ?), but in spite of that it still does not appear to work correctly.

like

(string, pattern, modifiers)

Check if string matches the given pattern using the syntax of SQL’s LIKE:

  • _ is a single-character wildcard
  • % represents 0, 1, or more characters
  • to match a literal % or _, use [%] or [_]

To do a quick test I created a table with a single column containing the value ABCDE…

column1
ABCDE

I then used the Expression node and created a new column using the following:

like($["column1"], "_B%E")
This returned TRUE, as expected

I then tried this:
like($["column1"], "%B%E")

and it returned FALSE !

i.e. to summarise:

column1 like($["column1"], "_B%E") like($["column1"], "%B%E")
ABCDE true false

:hushed:
To me, that is just clearly wrong. If % represents 0,1 or more characters then it should never not match where _ matches, and yet it does.

I’ve tried this with both KNIME 5.3.3 and KNIME 5.4, on Windows 10 and see the same result.

I also tried a number of other variations just using literals to see what happens:

expression result
like("ABCDE", "A%%E") false
like("ABCDE", "A%E") true
like("ABCDE", "%%%%E") false
like("ABCDE", "%E") true
like("ABCDE", "____E") false
like("ABCDE", "A____") false
like("ABCDE", "A%") true
like("ABCDE", "A%E") true
like("ABCDE", "A%_") true
like("ABCDE", "%") true
like("ABCDE", "%B%") true
like("ABCDE", "%B__%") false
like("apple", "a%le") true
like("banana","_a_a_a") false (this one from the help doc that says it returns true)

As far as I can see,all of the above should have returned true, and the final one is taken directly from the help example that says it should return true.

I have marked your post as “bug”.

Pinging @armingrudd, in case you saw this already from our community hacking days, and @carstenhaubold I’m pinging you on this one as I know you’ve been involved with our questions about the roadmap for Expression node so you will be interested and may even know what (if anything) we’re doing wrong here. It seems amazing to me that if it really is a bug, I’ve not seen this mentioned before, and yet it simply isn’t giving the expected results. So what are we missing? best regards :wink:

8 Likes

Dear @cmora,

Thank you for reporting this problem. We created a ticket for it and will update you here as soon as it’s fixed. (AP-23909)

Dear @takbb, thank you so much for your investigation. It was really helpful and made everything much clearer and easier to track down the problem.

3 Likes