Difference in results for Regexmatcher vs python Script

Hi,
I was trying to find dates of format “03/03/2023” in a table and I used the below regex match as in the screenshot.

However, this regex match gives me false results (screenshot below).

But when I use the same regex in a Python script, I get the right results (screenshot below for the code and the result).


Can someone help me understand why this difference?

@Jyotendra


I have a follow-up question along similar lines…
Python or Regexer treats “[0-9]+.[0-9]{2}” as the right regex for finding values like “180.34”, but KNIME also returns ‘True’ for the values below. I cannot figure out the reason, as it is a very simple regex.

The regexMatcher function is Boolean: it returns true or false depending on whether a match was found for the column you selected. The regex that you have, however, contains an unescaped metacharacter (the “.”), so it matches one or more digits, then any single character, then two digits. In practice that means it matches anything made up of at least 4 digits.


Change it to [0-9]+[.][0-9]{2} if you only want to capture the format with a period.



Hi @Jyotendra, just adding to @ArjenEx’s explanation: you need to remember that the “.” in regex means “any single character” unless it is escaped or, as in the explanation already given, entered in square brackets.

So your regex “[0-9]+.[0-9]{2}” can be read as:
“match a sequence of 1 or more digits followed by any character followed by 2 digits”.
If you read it that way, you can see that all of your examples match as follows (I have spaced them out to highlight the component parts that allow the match to be true):
225 . 00
21 : 46
315921851 3 08
9 2 11
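
The reading above can be checked with a short Python sketch. It assumes that regexMatcher tests the whole value, which is why `re.fullmatch` is used here, and that the original cell values were the spaced-out examples with the spaces removed (e.g. `315921851308` and `9211`); both are assumptions for illustration, not taken from the screenshots.

```python
import re

# Pattern from the question: the "." is unescaped, so it means "any single character".
unescaped = re.compile(r"[0-9]+.[0-9]{2}")
# Suggested fix: "[.]" matches only a literal period.
escaped = re.compile(r"[0-9]+[.][0-9]{2}")

# Sample values reconstructed from the thread (assumed, see note above).
samples = ["225.00", "21:46", "315921851308", "9211", "180.34"]


def whole_match(pattern, text):
    """Return True only if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


# Map each sample to (result with unescaped dot, result with escaped dot).
results = {s: (whole_match(unescaped, s), whole_match(escaped, s)) for s in samples}

for value, (with_any_char, with_literal_dot) in results.items():
    print(f"{value!r}: unescaped={with_any_char}, escaped={with_literal_dot}")
```

With the unescaped dot every sample matches; with the literal period only `225.00` and `180.34` do.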

The regex [0-9]+\.[0-9]{2} would also work (although, as noted earlier, when entered in nodes such as String Manipulation, it would have to be “double escaped” as "[0-9]+\\.[0-9]{2}").
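
The same “double escaping” shows up in ordinary Python string literals, which may make the idea easier to see. This is only an analogy sketched in Python; it does not show how the KNIME node itself parses its input.

```python
import re

# In a raw string the backslash reaches the regex engine as written.
raw_pattern = r"[0-9]+\.[0-9]{2}"
# In a normal string literal the backslash must itself be escaped,
# so "\\." in source code becomes "\." in the actual pattern.
escaped_literal = "[0-9]+\\.[0-9]{2}"

# Both spellings produce the identical pattern text.
same_pattern = raw_pattern == escaped_literal

pattern = re.compile(escaped_literal)
matches_decimal = pattern.fullmatch("180.34") is not None  # literal period present
matches_time = pattern.fullmatch("21:46") is not None      # ":" is not a period

print(same_pattern, matches_decimal, matches_time)
```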

