Regular expression to filter certain rows

Hi,
I’m learning how to use regular expressions in KNIME.
I don’t have any special nodes; I just use the Row Filter.
I need to select MS Excel files that follow a specific naming pattern.
The selection criterion is: [1-9].[0-9a-zA-Z_]Q[1-4]+.xlsx$
The data is as follows:
10_1_result_2020_Q2_manual.xlsx
10_1_source_data_2020_Q2.xlsx
10_1_source_data_2020_Q2_manual.xlsx
10_2_result_2020_Q2.xlsx
10_2_source_data_2020_Q2.xlsx
1_1_result_2020_Q2.xlsx

I need the following to be filtered out:
10_1_result_2020_Q2_manual.xlsx
10_1_source_data_2020_Q2_manual.xlsx

So the result should be:
10_1_source_data_2020_Q2.xlsx
10_2_result_2020_Q2.xlsx
10_2_source_data_2020_Q2.xlsx
1_1_result_2020_Q2.xlsx

Anything with “manual” in the name should not be selected.

I found a website where I can build and test regular expressions.

The aforementioned filter works on that site, but unfortunately it doesn’t work in KNIME.
Can you explain to me why?

Hi @MarekV

Anything with “manual” in the name should not be selected.

In this case, I would personally stay away from the Regex and instead use a wildcard pattern.

If you exclude rows matching the pattern *manual* with the wildcard checkbox enabled, you get the desired result.
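
For anyone who wants to try the wildcard idea outside KNIME, here is a minimal sketch in Python. It uses the standard `fnmatch` module as a stand-in for the Row Filter's wildcard matching; that the two behave the same for a simple pattern like `*manual*` is an assumption, not something taken from the KNIME documentation.

```python
from fnmatch import fnmatchcase

# File names copied from the question
files = [
    "10_1_result_2020_Q2_manual.xlsx",
    "10_1_source_data_2020_Q2.xlsx",
    "10_1_source_data_2020_Q2_manual.xlsx",
    "10_2_result_2020_Q2.xlsx",
    "10_2_source_data_2020_Q2.xlsx",
    "1_1_result_2020_Q2.xlsx",
]

# Exclude every name that matches the wildcard pattern *manual*
# ("*" stands for any run of characters, including none)
kept_by_wildcard = [f for f in files if not fnmatchcase(f, "*manual*")]

for f in kept_by_wildcard:
    print(f)
```

This prints the four file names without “manual”, which is the result asked for in the question.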

But since you are learning regex: your expression fails because it only matches a subsection of each file name.

The Row Filter, however, checks whether the entire string matches the expression, which is not the case here. If you add a leading wildcard, .* , the whole string is matched and the KNIME Row Filter then works.
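
The difference between "find a match somewhere" and "match the whole string" can be reproduced outside KNIME. The sketch below uses Python's `re` module purely as an illustration: `re.search` plays the role of the online tester and `re.fullmatch` plays the role of the Row Filter. That mapping is an assumption based on the behaviour described above, not a statement about KNIME internals.

```python
import re

# The pattern exactly as posted in the question
pattern = r"[1-9].[0-9a-zA-Z_]Q[1-4]+.xlsx$"

# File names copied from the question
files = [
    "10_1_result_2020_Q2_manual.xlsx",
    "10_1_source_data_2020_Q2.xlsx",
    "10_1_source_data_2020_Q2_manual.xlsx",
    "10_2_result_2020_Q2.xlsx",
    "10_2_source_data_2020_Q2.xlsx",
    "1_1_result_2020_Q2.xlsx",
]

name = files[1]  # "10_1_source_data_2020_Q2.xlsx"

# "Find" semantics: it is enough that some substring matches
partial = re.search(pattern, name)
print(partial.group(0))  # 20_Q2.xlsx (only the tail of the name)

# Whole-string semantics: the pattern must cover the full name
print(re.fullmatch(pattern, name))  # None

# A leading .* lets the pattern cover everything before the tail
kept = [f for f in files if re.fullmatch(".*" + pattern, f)]
print(kept)
```

With the leading `.*`, the four names without “manual” are kept, because only they end in something of the form `20_Q2.xlsx`.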

Hope this helps!

Even with the regex I would go with the “manual” option :sweat_smile:
br

To @Daniel_Weikert 's point, the regex does not actually filter on “manual”. It may well give you what you are looking for, but it would also let through anything else that satisfies the pattern, because it never tests for “manual” explicitly.
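
To make that concrete, here is a small Python sketch. The file names in `candidates` are invented for illustration and do not come from the thread; the pattern is the one from the question with a leading `.*`, checked with whole-string matching.

```python
import re

# Pattern from the question, with the leading .* added
relaxed = re.compile(r".*[1-9].[0-9a-zA-Z_]Q[1-4]+.xlsx$")

# Hypothetical file names, made up for this example only
candidates = [
    "10_1_manual_2020_Q2.xlsx",         # contains "manual", but not at the end
    "copy_of_report_20_Q2.xlsx",        # unrelated file with a similar ending
    "notes_9xyQ1-xlsx",                 # the unescaped "." also accepts "-"
    "10_1_result_2020_Q2_manual.xlsx",  # the case from the question
]

accepted = [n for n in candidates if relaxed.fullmatch(n)]
print(accepted)
```

The first three names are accepted and only the last one is rejected. The pattern constrains how a name ends; it does not look for “manual” anywhere else in the name.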

Hi
Thank you for explaining where I made a mistake.
Looking at it now, the problem was “.*” and “.”:
“.” means any character
“*” means repetition

Of course it would be enough to filter on “manual”; the problem is that the information is spread across 510 directories.
The file names must have the same structure.
The files are generated automatically, but undisciplined users can save copies or rename a file to a non-standard name.
I can’t prevent this, but I can select only the files that have the expected name structure.
Regular expressions are a powerful tool, which I want to use as much as possible in the future.
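
A stricter pattern for this kind of structural check might look like the Python sketch below. The assumed structure (`number_number_result|source_data_year_Qn.xlsx`) is inferred only from the six sample names in this thread, so it is a guess that would need adjusting to the real naming rules. The last two entries in `names` are hypothetical examples of non-standard names.

```python
import re

# Assumed structure, inferred from the sample names only:
#   <number>_<number>_<result|source_data>_<4-digit year>_Q<1-4>.xlsx
# The dot before "xlsx" is escaped so that it matches a literal "."
strict = re.compile(r"[1-9][0-9]*_[0-9]+_(result|source_data)_[0-9]{4}_Q[1-4]\.xlsx")

names = [
    "10_1_result_2020_Q2_manual.xlsx",
    "10_1_source_data_2020_Q2.xlsx",
    "10_1_source_data_2020_Q2_manual.xlsx",
    "10_2_result_2020_Q2.xlsx",
    "10_2_source_data_2020_Q2.xlsx",
    "1_1_result_2020_Q2.xlsx",
    "Copy of 10_1_result_2020_Q2.xlsx",  # hypothetical user copy
    "10_1_result_2020_Q5.xlsx",          # hypothetical invalid quarter
]

# Whole-string matching, as the Row Filter is described to do in this thread
standard = [n for n in names if strict.fullmatch(n)]
print(standard)
```

Only the four standard names pass. Because every part of the name is spelled out, copies and renamed files are rejected without having to list words such as “manual”.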
