Hello, I am just curious why this regex code works in the Column Expressions node, but when I paste it into the Regex 101 site (https://regex101.com/) it does not match the data that KNIME is getting.
I pasted it into Regex 101 to check that the code is correct, but I can't get it to work there. Here’s the snapshot from Regex 101. I hope someone can share some tips, as I want to get the most out of Regex 101 for understanding other regex code. Thank you!
It’s because you’ve pasted in the keyword “.match” with its associated brackets and the [0] array subscript, none of which form part of the regular expression. The / at the beginning and end can be included, provided that they are set as the delimiters on regex101, as they act like double-quoting a string in other languages.
The regular expression itself is just \d{3}([.])\d{3}([.])\d{3}
If you try that on regex101.com, you should find it works.
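To make the split between "the regex" and "the code around it" concrete, here is a minimal sketch in plain JavaScript, run outside KNIME. The variable names are illustrative only, and the sample value is the one quoted later in this thread. Only the part between the two / delimiters is what regex101 expects in its pattern box.

```javascript
// Hypothetical stand-in for one cell of the Content_SplitResultList column.
const sample = "000.702.508";

// The regular expression itself: everything between the / delimiters.
const pattern = /\d{3}([.])\d{3}([.])\d{3}/;

// .test() only reports whether the pattern matches somewhere in the string.
const isMatch = pattern.test(sample);
console.log(isMatch);

// pattern.source is the text to paste into regex101 (without the delimiters).
const pasteIntoRegex101 = pattern.source;
console.log(pasteIntoRegex101);
```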
Hi @trafalgarlaw
The .match(/ /) function is actually a JavaScript function which returns the results of the match as an array, rather than as a single value.
[0] represents the entire result, and array subscripts [1], [2] and so on represent the results returned for “capture groups” within the regex.
I don’t know if it was intentional or not, but in your regex you have included parentheses “( )” around the [.] symbols, and each of these is a “capture group”. The 1st capture group “captures” the character matched between the first pair of parentheses, which is [.], representing a single period/dot “.”. The 2nd capture group is whatever is matched inside the second pair of parentheses, which is also a single “.”.
So the Column Expressions code:
column("Content_SplitResultList").match(/\d{3}([.])\d{3}([.])\d{3}/)[0]
will return the entire match, 000.702.508.
The Column Expressions code:
column("Content_SplitResultList").match(/\d{3}([.])\d{3}([.])\d{3}/)[1]
will return the contents of the 1st capture group, i.e. “.”
The Column Expressions code:
column("Content_SplitResultList").match(/\d{3}([.])\d{3}([.])\d{3}/)[2]
will return the contents of the 2nd capture group, i.e. “.” again.
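The three subscripts can be checked side by side in plain JavaScript. This is only a sketch: the KNIME-specific column("Content_SplitResultList") call is replaced by a hard-coded string (an assumption for illustration), and the variable names are made up.

```javascript
// Hypothetical stand-in for the value read by column("Content_SplitResultList").
const value = "000.702.508";

// match() without the g flag returns an array: [whole match, group 1, group 2, ...].
const result = value.match(/\d{3}([.])\d{3}([.])\d{3}/);

const whole = result[0];       // the entire match
const firstGroup = result[1];  // contents of the 1st pair of parentheses
const secondGroup = result[2]; // contents of the 2nd pair of parentheses

console.log(whole, firstGroup, secondGroup);

// In plain JavaScript, match() returns null when nothing matches,
// so indexing the result directly would fail for such a value.
const noMatch = "no digits here".match(/\d{3}([.])\d{3}([.])\d{3}/);
console.log(noMatch);
```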
If your regex had instead included brackets around each \d{3}, the Column Expressions code:
column("Content_SplitResultList").match(/(\d{3})[.](\d{3})[.](\d{3})/)[1]
would have returned the first set of 3 digits, whilst using subscript [2] would have returned the second set of digits and [3] would have returned the third set of digits.
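As a rough illustration of that variant, again in plain JavaScript with a hard-coded string standing in for the KNIME column (an assumption; the names below are hypothetical), moving the parentheses onto the digit groups changes what each subscript holds:

```javascript
// Hypothetical stand-in for the value read by column("Content_SplitResultList").
const value = "000.702.508";

// Parentheses now wrap each \d{3}, so the groups capture digits instead of dots.
const parts = value.match(/(\d{3})[.](\d{3})[.](\d{3})/);

const firstDigits = parts[1];
const secondDigits = parts[2];
const thirdDigits = parts[3];

console.log(firstDigits, secondDigits, thirdDigits);
```

Index [0] still holds the whole match in this version; only the numbered groups change.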