Row Filter regex matching does not seem to work correctly

Hi all,
it seems that using the Row Filter node with the regex rule \W*((?i)NO MAIL(?-i))\W* returns an empty result set.
Any hints?
Thanks

Hello @morelator,

I suggest sharing the idea behind the regex, a dataset for testing, and your KNIME version…

Br,
Ivan

Hi,
unfortunately it’s impossible to share the whole content, but the string in which I’m looking for a match looks like the following:

Corpo permanente vigili del fuoco NO MAIL

The KNIME version is the latest, 5.3.3.

@morelator, text would be nicer than a screenshot. You should also check the logic behind your regex: \W matches a non-word character (anything other than a letter, digit, or underscore), and with the * it matches zero or more of them, so it is not clear what it is meant to do here. Maybe you can tell us what you would like to match.

Hello @morelator,

It is still not clear, and I agree with @mlauber71.

However, if you are just looking for NO MAIL anywhere in a cell, you can use the Rule-based Row Filter with the following syntax (the * wildcards allow any text before and after):
$yourColumnName$ LIKE "*NO MAIL*" => TRUE

Or, if upper and lower case both need to match, either add another rule with the lowercase text to the above-mentioned node, or use the Row Filter like this (the configuration window is from 4.7.7, but you’ll figure out how to apply it in the Row Filter from your version):
RowFilterConfig

Br,
Ivan

Hi @morelator, the Row Filter tries to match the regex against the entire value of the cell being inspected.

So unless you begin and end your regex with .* (to match any number of any characters at the beginning and end of the cell value), your regex will only match cells that contain the words NO MAIL and no other letters.
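
To make the difference concrete, here is a minimal Java sketch. It assumes, as described above, that the Row Filter behaves like a whole-value match (Java's `Matcher.matches()`) rather than a substring search (`Matcher.find()`). The class and method names are made up for illustration and are not part of KNIME.

```java
import java.util.regex.Pattern;

final class WholeCellMatchDemo {
    // The regex from the original question, written as a Java string literal.
    static final String ORIGINAL = "\\W*((?i)NO MAIL(?-i))\\W*";

    // Whole-value match: the pattern must consume the entire cell.
    static boolean matchesWholeCell(String regex, String cell) {
        return Pattern.compile(regex).matcher(cell).matches();
    }

    // Substring search: the pattern may match anywhere inside the cell.
    static boolean foundInCell(String regex, String cell) {
        return Pattern.compile(regex).matcher(cell).find();
    }

    public static void main(String[] args) {
        String cell = "Corpo permanente vigili del fuoco NO MAIL";
        // The original regex cannot consume the leading words, so the whole-cell match fails.
        System.out.println(matchesWholeCell(ORIGINAL, cell));
        // A substring search does find NO MAIL at the end of the cell.
        System.out.println(foundInCell(ORIGINAL, cell));
        // Wrapping the regex in .* lets it cover the whole cell.
        System.out.println(matchesWholeCell(".*" + ORIGINAL + ".*", cell));
    }
}
```

Under that assumption, the empty result set is expected: the sample string starts with other words, so the unwrapped regex can never cover the whole cell.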

For example, this would return rows, but still won’t necessarily be what you want:
.*\W*((?i)NO MAIL(?-i))\W*.*

This will also return rows such as:

Column
NO MAIL
SomeNo Mail or something
no mail is wanted
UNO MAILED

because your \W* matches “zero or more” non-word characters, which makes it fairly redundant.
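
A quick way to check that redundancy is to compare the pattern with and without the `\W*` parts on the sample rows. This is only a sketch using plain Java regex with whole-value matching; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class RedundantNonWordDemo {
    // The wrapped regex from the post, with and without the \W* parts.
    static final Pattern WITH_NON_WORD = Pattern.compile(".*\\W*((?i)NO MAIL(?-i))\\W*.*");
    static final Pattern WITHOUT_NON_WORD = Pattern.compile(".*((?i)NO MAIL(?-i)).*");

    // True when both patterns agree on whether the whole cell matches.
    static boolean sameResult(String cell) {
        return WITH_NON_WORD.matcher(cell).matches()
                == WITHOUT_NON_WORD.matcher(cell).matches();
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "XYZ", "NO MAIL", "SomeNo Mail or something",
                "no mail is wanted", "UNO MAILED");
        for (String row : rows) {
            // Print each row, whether it matches, and whether both patterns agree.
            System.out.println(row + " -> matches: "
                    + WITH_NON_WORD.matcher(row).matches()
                    + ", same result without \\W*: " + sameResult(row));
        }
    }
}
```

Because `\W*` is allowed to match nothing, the surrounding `.*` already covers everything it could, so both patterns accept and reject the same rows.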

I suspect you actually want to return rows where NO MAIL appears between word boundaries.

In that case, maybe you want this instead:

.*\b((?i)NO MAIL(?-i))\b.*

This would match only the rows:

Column
NO MAIL
no mail is wanted

out of the following dataset:

Column
XYZ
NO MAIL
SomeNo Mail or something
no mail is wanted
UNO MAILED
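
The word-boundary behaviour can be reproduced outside KNIME with plain Java regex. The sketch below assumes whole-value matching, as in the explanation above, and uses a hypothetical helper `keepMatching` to stand in for the filter.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WordBoundaryFilterDemo {
    // The word-boundary regex from the post, written as a Java string literal.
    static final Pattern NO_MAIL = Pattern.compile(".*\\b((?i)NO MAIL(?-i))\\b.*");

    // Keep only the cells that the pattern matches in full.
    static List<String> keepMatching(List<String> cells) {
        return cells.stream()
                .filter(cell -> NO_MAIL.matcher(cell).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> column = List.of(
                "XYZ", "NO MAIL", "SomeNo Mail or something",
                "no mail is wanted", "UNO MAILED");
        // Prints [NO MAIL, no mail is wanted]
        System.out.println(keepMatching(column));
    }
}
```

"SomeNo Mail or something" and "UNO MAILED" are dropped because a letter sits directly next to the phrase, so there is no word boundary at that position.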

Agreeing with the previous comments, it makes it much more difficult for people to help when you only tell us that the regex doesn’t work, but don’t tell us what you are actually expecting it to do. If, as it transpires, your regex is incorrect, it is only by luck that we might infer your intention. Much better and quicker if you state exactly your intention, and help us to help you.
