Partial string matching with regex

Dear Knime Community

I’ve been struggling with partial string matching in KNIME. The task is to match all records that contain a certain pattern: “[digit],[digit]”, i.e. a comma between two one- or two-digit numbers, without spaces. The pattern can occur at the beginning or in the middle of a string, but it is always only a part of the string.

I tried regexMatcher in the String Manipulation node with the following expression: “[0-9]+(,[0-9]+)*”. I also tried “MATCHES” in the Rule Engine node, but without any success.

Does anybody have any ideas on how to resolve this, or experience with partial matching in KNIME?

Thank you in advance!

Hi,
Try this pattern:
.*\b[0-9]+,[0-9]+\b.*
The "\b" is a word boundary: it requires that the number is not directly attached to a letter, digit or underscore. If the numbers can be glued to the rest of the string (no space or punctuation in between), remove the "\b"s.
The pattern matches any whole string that contains two numbers separated by a comma.
So if you use the regexMatcher function in the String Manipulation node, the output would be as follows:
65,87 sfsgsg => True
sdgsgsdg => False
sdfsdf34,345fdgsg => False (True if "\b"s are removed)
34334,3434fdgag => False (True if "\b"s are removed)
1,1sdf => False (True if "\b"s are removed)
,456fdhdsfhdfhg => False
25234sdfsdf => False
sdgsag44,34dsfgsdg => False (True if "\b"s are removed)
sdgsag 44,34 dsfgsdg => True
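
For anyone who wants to check the table above outside of KNIME, here is a minimal sketch in Python. It assumes that regexMatcher tests the whole string against the pattern, which `re.fullmatch` imitates; the helper name `whole_string_match` and the variable names are made up for this illustration and are not part of KNIME.

```python
import re

# The pattern from this post, with and without the word boundaries (\b).
WITH_BOUNDARIES = re.compile(r".*\b[0-9]+,[0-9]+\b.*")
WITHOUT_BOUNDARIES = re.compile(r".*[0-9]+,[0-9]+.*")


def whole_string_match(pattern, text):
    """Return True only if the pattern matches the entire string.

    Hypothetical helper; it stands in for a whole-string regex test.
    """
    return pattern.fullmatch(text) is not None


# The sample strings from the table above.
samples = [
    "65,87 sfsgsg",
    "sdgsgsdg",
    "sdfsdf34,345fdgsg",
    "34334,3434fdgag",
    "1,1sdf",
    ",456fdhdsfhdfhg",
    "25234sdfsdf",
    "sdgsag44,34dsfgsdg",
    "sdgsag 44,34 dsfgsdg",
]

for text in samples:
    with_b = whole_string_match(WITH_BOUNDARIES, text)
    without_b = whole_string_match(WITHOUT_BOUNDARIES, text)
    print(f"{text!r}: with \\b = {with_b}, without \\b = {without_b}")
```

Python's regex engine is not the one KNIME uses, so treat this only as a quick cross-check for simple patterns like this one.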

Best,
Armin

Hi Armin

Thank you for that - worked like a charm! Could you also help me to understand what is the function of “.*” in this expression?

Best wishes
Katya

I’m glad I could help you. Welcome to the KNIME community!
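The "." (dot) matches any single character (by default, anything except a line break) and the "*" (asterisk) means zero or more repetitions of whatever comes before it. So ".*" matches anything, including nothing.
regexMatcher checks the whole string against the pattern, so when you put ".*" before and after your pattern, any string that contains the pattern will be a match.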
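
To illustrate the effect of the ".*" wrapper, here is a small sketch in Python. It assumes a whole-string test, imitated here with `re.fullmatch`; `re.search` is Python's "find anywhere" test and is shown only for contrast. The variable names are made up for this example.

```python
import re

text = "sdgsag 44,34 dsfgsdg"
core = r"\b[0-9]+,[0-9]+\b"  # the pattern without the surrounding ".*"

# Whole-string test without ".*": fails, because the text has more
# characters than just the two numbers and the comma.
bare = re.fullmatch(core, text) is not None  # False

# Whole-string test with ".*" on both sides: succeeds, because ".*"
# absorbs everything before and after the numbers.
wrapped = re.fullmatch(".*" + core + ".*", text) is not None  # True

# "Find anywhere" test: succeeds without any wrapper.
found = re.search(core, text) is not None  # True

# ".*" also matches the empty string ("anything, including nothing").
empty_ok = re.fullmatch(".*", "") is not None  # True

# The expression from the original question has no ".*" wrapper, so it
# cannot cover a string that contains extra text around the numbers.
question_pattern = r"[0-9]+(,[0-9]+)*"
question_result = re.fullmatch(question_pattern, "65,87 sfsgsg") is not None  # False

print(bare, wrapped, found, empty_ok, question_result)
```

This is probably also why the first attempt in the question did not work: the expression itself was fine for the numbers, but it was tested against the complete string.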

Armin

P.S. You can use this link to check your regex patterns:
https://www.regextester.com/
