Regex 101 Query in KNIME

trafalgarlaw · September 5, 2023, 7:20am

Hello, I am just curious why this regex code works in the Column Expressions node, but when I paste it into the Regex 101 site (https://regex101.com/) it does not match the data that KNIME is matching.

Column expressions code: column("Content_SplitResultList").match(/\d{3}([.])\d{3}([.])\d{3}/)[0]

Data: Nº. 000.702.508

Result: It is getting the ‘000.702.508’

So I tried pasting it into Regex 101 just to check that the code is correct, but I can’t get it to work. Here’s the snapshot in Regex 101. Hope someone can share some tips, as I just want to make the most of the Regex 101 site to better understand other regex codes. Thank you!

takbb · September 5, 2023, 7:37am

Hi @trafalgarlaw ,

It’s because you’ve pasted in the “.match” method call with its associated brackets and the [0] array subscript, none of which form part of the regular expression. The / at the beginning and end can be included, provided that / is set as the delimiter on regex101, as the slashes act like the double quotes around a string in other languages.

The regular expression itself is just
\d{3}([.])\d{3}([.])\d{3}

If you try that in regex101.com, you should find it works.
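
To illustrate the point about the delimiters, here is a small sketch in plain JavaScript (run outside KNIME, e.g. in Node.js or a browser console, so treat it as an illustration rather than exactly what the node does). The variable names are made up for the example; the sample value is the one from the question.

```javascript
// Sample value from the question.
const data = "Nº. 000.702.508";

// A regex literal: the slashes are only delimiters, not part of the pattern.
const literal = /\d{3}([.])\d{3}([.])\d{3}/;

// The same pattern built from a string (backslashes doubled inside the string).
const fromString = new RegExp("\\d{3}([.])\\d{3}([.])\\d{3}");

// .source shows the bare pattern, i.e. the part to paste into regex101.
console.log(literal.source);                       // \d{3}([.])\d{3}([.])\d{3}
console.log(literal.source === fromString.source); // true
console.log(literal.test(data));                   // true
```

The text printed by .source is the bit that belongs in the regex101 pattern box; everything else in the original line is JavaScript wrapped around it.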

trafalgarlaw · September 5, 2023, 8:53am

Another question out of curiosity: what is this [0] for? What is its purpose in the code? @takbb

Yeah, thank you it worked now in Regex 101, amazing!!!

takbb · September 5, 2023, 9:52am

Hi @trafalgarlaw
The .match(/ /) function is actually a JavaScript method which returns the results of the match as an array, rather than as a single value.

[0] holds the entire match, and array subscripts [1], [2] and so on hold the text returned for each “capture group” within the regex.

I don’t know if it was intentional or not, but in your regex you have included parentheses “( )” around the [.] symbols, and each of these pairs is a “capture group”. The 1st capture group “captures” the character matched between the first pair of parentheses, which is a [.] representing a single period/dot “.”. The 2nd capture group is whatever is matched in the second pair of parentheses, which is also a single “.”.

So
Column expressions code: column("Content_SplitResultList").match(/\d{3}([.])\d{3}([.])\d{3}/)[0]
will return the entire match 000.702.508
Column expressions code: column("Content_SplitResultList").match(/\d{3}([.])\d{3}([.])\d{3}/)[1]
will return the contents of the 1st capture group, i.e. “.”
Column expressions code: column("Content_SplitResultList").match(/\d{3}([.])\d{3}([.])\d{3}/)[2]
will return the contents of the 2nd capture group i.e. “.” again
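
The same three subscripts can be checked in plain JavaScript. This is only a sketch run outside KNIME, with a plain string standing in for column("Content_SplitResultList"); the variable names are hypothetical.

```javascript
// Stand-in for the value of column("Content_SplitResultList").
const data = "Nº. 000.702.508";

// match() without the g flag returns an array: the whole match, then each group.
const result = data.match(/\d{3}([.])\d{3}([.])\d{3}/);

console.log(result[0]);     // 000.702.508  (entire match)
console.log(result[1]);     // .            (1st capture group)
console.log(result[2]);     // .            (2nd capture group)
console.log(result.length); // 3

// In plain JavaScript, match() returns null when nothing matches,
// so indexing the result with [0] would throw an error there.
const noMatch = "no digits here".match(/\d{3}([.])\d{3}([.])\d{3}/);
console.log(noMatch);       // null
```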

If your regex had instead put the parentheses around each \d{3}:
Column expressions code: column("Content_SplitResultList").match(/(\d{3})[.](\d{3})[.](\d{3})/)[1]
it would have returned the first set of 3 digits (000), whilst using subscript [2] would have returned the second set of digits (702) and [3] would have returned the third set of digits (508).
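
As a quick check of the digit-group version, again in plain JavaScript outside KNIME and with made-up variable names, the array can be unpacked in one go:

```javascript
// Sample value from the question.
const data = "Nº. 000.702.508";

// Parentheses now wrap each run of three digits instead of the dots.
const parts = data.match(/(\d{3})[.](\d{3})[.](\d{3})/);

// Destructuring: index 0 is the whole match, 1 to 3 are the capture groups.
const [whole, first, second, third] = parts;

console.log(whole);  // 000.702.508
console.log(first);  // 000
console.log(second); // 702
console.log(third);  // 508
```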

trafalgarlaw · September 5, 2023, 11:16am

Wow! Love the samples and explanation. Really understand the purpose and its function perfectly. Thank you so much @takbb!! The best!
