Row Filter / Pattern

Hello,
I use the Row Filter node with the “Pattern Matching” option to keep the rows to which I will then apply a “Math Formula”. With KNIME AI, I configured 12 pattern row filters, but only 6 are retained. Why?


Thanks
Br

Hi @Brain, without seeing your data and the entire regex pattern it’s going to be difficult to say anything other than there’s probably something wrong with the pattern or that the data isn’t quite as you were expecting :wink:

My starting guess would be typos, or a problem with hidden or special characters (e.g. tabs, newlines, multiple spaces) not matching as you’d want.
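One way to check for such hidden characters outside KNIME is sketched below in Python. The function name suspicious_characters and the sample strings are made up for illustration and are not taken from the uploaded data. It flags tabs, newlines, non-breaking spaces, repeated spaces, and leading or trailing spaces.

```python
import unicodedata


def suspicious_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, description) for whitespace that is easy to overlook."""
    found = []
    for i, ch in enumerate(text):
        # Any whitespace other than a plain ASCII space (tab, newline, NBSP, ...).
        is_odd_whitespace = ch.isspace() and ch != " "
        # A space that directly follows another space.
        is_repeated_space = ch == " " and i > 0 and text[i - 1] == " "
        # A space at the very start or end of the value.
        is_edge_space = ch == " " and (i == 0 or i == len(text) - 1)
        if is_odd_whitespace or is_repeated_space or is_edge_space:
            # Control characters have no Unicode name, so fall back to the code point.
            found.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found


# Hypothetical cell values, for illustration only.
for cell in ["FC 15mn", "FC  15mn", "FC\u00a015mn", "Rec ef\t"]:
    print(repr(cell), suspicious_characters(cell))
```

A clean value returns an empty list. Anything else points to a character that the pattern would have to account for.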

Would you be able to upload examples of the data and the pattern, or better still a minimal sample workflow that demonstrates the problem you are having?

Hi,
Thanks for your answer.
If you have any other ideas, they are welcome.
PIVOT1 - Copie.xlsx (9.1 KB)
KNIME_project9.knwf (83.8 KB)

Br

Hi @Brain ,

There are some issues with your regex pattern and the data. This is your current regex pattern:

IB|Vitesse Max|Temps du travail (ssms)|meilleurs 800m|Meilleurs 200m|Vitesse Meilleurs 100m|Rec ef|FC 5mn|FC 15mn|FC Retour|V200 bpm

Brackets have special meaning in regex (they define a group), so to match them literally they need to be “escaped” with a backslash (\).
The pattern would become this:

IB|Vitesse Max|Temps du travail \(ssms\)|meilleurs 800m|Meilleurs 200m|Vitesse Meilleurs 100m|Rec ef|FC 5mn|FC 15mn|FC Retour|V200 bpm

This would then allow it to find:
Temps du travail (ssms)
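The effect of the escaping can be tried outside KNIME with a small Python sketch. It assumes that the Row Filter pattern has to match the whole cell, which is approximated here with re.fullmatch. KNIME uses Java regex, but bracket escaping works the same way in both.

```python
import re

value = "Temps du travail (ssms)"

# Unescaped: the brackets form a group, so this pattern describes the
# text "Temps du travail ssms" (without brackets).
unescaped = r"Temps du travail (ssms)"

# Escaped: \( and \) stand for the literal bracket characters.
escaped = r"Temps du travail \(ssms\)"

unescaped_matches = re.fullmatch(unescaped, value) is not None
escaped_matches = re.fullmatch(escaped, value) is not None
print("unescaped:", unescaped_matches)
print("escaped:  ", escaped_matches)

# re.escape builds the escaped form automatically from a literal string.
print(re.escape("(ssms)"))
```

Only the escaped pattern matches the value that contains the brackets.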

Your data also has spaces and some other characters that haven’t been included in the pattern, e.g.

Vitesse meilleurs 100 m (km/h)

If you want this to be found, you could modify the pattern to allow optional whitespace with \s*, and to allow the additional (km/h) or other wording by appending .* to match any further characters:

IB|Vitesse Max|Temps du travail \(ssms\)|meilleurs 800m|Meilleurs 200m|Vitesse Meilleurs 100\s*m.*|Rec ef|FC 5mn|FC 15mn|FC Retour|V200 bpm
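As a rough check of that change, here is a Python sketch under two assumptions: the pattern must match the whole cell (re.fullmatch), and the filter runs case-insensitively (re.IGNORECASE), since the data has a lower-case "meilleurs" while the pattern has "Meilleurs".

```python
import re

value = "Vitesse meilleurs 100 m (km/h)"

original = r"Vitesse Meilleurs 100m"
relaxed = r"Vitesse Meilleurs 100\s*m.*"

# \s* allows zero or more whitespace characters between "100" and "m";
# .* allows any trailing text such as " (km/h)".
original_matches = re.fullmatch(original, value, re.IGNORECASE) is not None
relaxed_matches = re.fullmatch(relaxed, value, re.IGNORECASE) is not None

# Without case-insensitive matching, even the relaxed pattern fails,
# because "Meilleurs" and "meilleurs" differ in case.
relaxed_case_sensitive = re.fullmatch(relaxed, value) is not None

print(original_matches, relaxed_matches, relaxed_case_sensitive)
```

The relaxed pattern still matches the compact spelling "Vitesse Meilleurs 100m", because \s* also accepts zero spaces.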

Do you want FC 15mn in the pattern to match both of the following?
FC 15mn
FC 15mn%

If so, you could adjust your pattern to:
IB|Vitesse Max|Temps du travail \(ssms\)|meilleurs 800m|Meilleurs 200m|Vitesse Meilleurs 100\s*m.*|Rec ef|FC 5mn|FC 15mn.*|FC Retour|V200 bpm

You could do likewise for |FC 5mn| and make it |FC 5mn.*|
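The difference between the exact and the relaxed alternatives can be seen in a short Python sketch, again assuming whole-cell matching (re.fullmatch). "FC 5mn%" is a made-up value added for symmetry and may not exist in the real data.

```python
import re

# "FC 15mn" and "FC 15mn%" come from the thread; "FC 5mn%" is hypothetical.
values = ["FC 5mn", "FC 5mn%", "FC 15mn", "FC 15mn%"]

exact = re.compile(r"FC 5mn|FC 15mn")
relaxed = re.compile(r"FC 5mn.*|FC 15mn.*")

# Keep only the values that the whole pattern matches from start to end.
exact_hits = [v for v in values if exact.fullmatch(v)]
relaxed_hits = [v for v in values if relaxed.fullmatch(v)]

print("exact:  ", exact_hits)
print("relaxed:", relaxed_hits)
```

With the exact alternatives the values ending in % are dropped, while the .* versions keep all four.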

I haven’t looked for every issue, as I don’t know exactly what you want to match and what you don’t, but hopefully that will give you some pointers.

You might consider using the Rule Based Row Filter instead of the Row Filter. Although it is a little more long-winded to configure, it is much easier to maintain because you can put each filter on a separate line and mix regex (using “MATCHES”) with wildcards (using “LIKE”), which can simplify things.

(NB: the downside of using the Rule Based Row Filter is that it doesn’t provide case-insensitive matching. The workaround is to create an uppercase version of the column being matched, use that column in the filter, and write everything in upper case in the rules.)
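The uppercase workaround can be sketched in Python as follows. This only illustrates the idea and is not KNIME rule syntax. One detail to watch: regex escapes such as \s have to stay lower case, because upper-casing the whole pattern turns \s (whitespace) into \S (non-whitespace).

```python
import re

value = "Vitesse meilleurs 100 m (km/h)"

# Uppercase copy of the column value, as the workaround suggests.
upper_value = value.upper()

# Rule written in upper case by hand; note that \s stays lower case.
rule = r"VITESSE MEILLEURS 100\s*M.*"
rule_matches = re.fullmatch(rule, upper_value) is not None

# Upper-casing the pattern mechanically breaks it: \s becomes \S.
naive_rule = r"Vitesse Meilleurs 100\s*m.*".upper()
naive_matches = re.fullmatch(naive_rule, upper_value) is not None

print(upper_value)
print(rule_matches, naive_matches)
```

The hand-written rule matches, while the mechanically upper-cased one does not.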

I’d also recommend using the Row Splitter variants of the above nodes instead of Row Filter, as this then clearly shows the rows that didn’t match, making it easier to check the results.
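The benefit of a splitter is that the non-matching rows stay visible. A minimal Python sketch of the same idea follows, using the original pattern from this thread and a few sample values. The values are partly taken from the posts above and are not the full spreadsheet; whole-cell, case-insensitive matching is assumed.

```python
import re

# Original pattern from the thread, brackets still unescaped.
pattern = re.compile(
    r"IB|Vitesse Max|Temps du travail (ssms)|meilleurs 800m|Meilleurs 200m"
    r"|Vitesse Meilleurs 100m|Rec ef|FC 5mn|FC 15mn|FC Retour|V200 bpm",
    re.IGNORECASE,
)

# Illustrative sample rows, not the real data set.
rows = [
    "IB",
    "Temps du travail (ssms)",
    "Vitesse meilleurs 100 m (km/h)",
    "FC 15mn",
    "FC 15mn%",
]

# Split instead of filter: keep both outputs so the misses can be inspected.
matched = [r for r in rows if pattern.fullmatch(r)]
unmatched = [r for r in rows if not pattern.fullmatch(r)]

print("matched:  ", matched)
print("unmatched:", unmatched)
```

Looking at the unmatched list shows directly which labels the pattern still needs to cover.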

Ok Thanks
Row Filter doesn’t find V200 bpm …

Hi @Brain,

A small space character is missing (see image).

Add it and the row will be filtered.
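Since the image is not reproduced here, the exact position of the missing space is not known. The Python sketch below only shows a general way to locate such a difference. The helper first_difference and the cell value with a double space are hypothetical.

```python
def first_difference(expected: str, actual: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    # One string is a prefix of the other: they differ where the shorter ends.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return -1


pattern_text = "V200 bpm"
cell_value = "V200  bpm"  # hypothetical: the data has one more space

index = first_difference(pattern_text, cell_value)
print(index, repr(pattern_text[index:]), repr(cell_value[index:]))
```

Printing the remainders with repr makes the extra or missing space visible at the reported position.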

Have a nice day,
Raffaello Barri

Hi @Brain ,

Please mark @takbb’s answer as the solution. He did all the work, I just pointed out a small detail :slight_smile:

Have a nice evening,
Raffaello
