Regex finds comma when searching for period

Hey there,

I wanted to use regexReplace in the String Manipulation node to remove periods in numbers, for example: 1.000 → 1000

I used the following regex for this operation: regexReplace($Value$, "([0-9])\.([0-9])", "$1$2")

Now, according to https://regex101.com/, this should yield the correct values, and it does. However, it also removes commas: 3,5 → 35

This should not be the case. I have attached the workflow. I work on Windows 11. My KNIME version should be up to date: KNIME 5.7.0.

RegexCommaBug.knwf (44.5 KB)

@KasimirNepomuk I cannot test it right now but you should try and double escape the dot.

Hello, thank you, this seems to work. But why does it need to be doubly escaped, and why does it only remove commas?

The last question I was able to answer myself: in my data, the comma and the period are the only characters that occur between two digits without being part of another group.

Hi @KasimirNepomuk, the “double escaping” is generally needed in KNIME (and some other languages) when the regex sits in a literal string within a script. KNIME first has to interpret the string itself, which can also contain escapes. In that first level of parsing, the single `\` is treated as an escape within the script, so by the time the pattern reaches the regex engine, that backslash has already been stripped. The engine then presumably sees a bare `.`, which matches any character, and that would explain why your commas were removed as well.
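To make the two parsing layers concrete, here is a minimal sketch in plain Java (not KNIME itself). It assumes, as described above, that the first layer consumes the single backslash, so the engine ends up with a bare `.`. The class name `EscapeLayersDemo` and the helper `apply` are made up for illustration, and `String.replaceAll` is only a stand-in for what regexReplace does.

```java
class EscapeLayersDemo {
    // In Java source, "\\." is the two characters \. at runtime,
    // which the regex engine reads as "a literal period".
    static final String ESCAPED = "([0-9])\\.([0-9])";

    // Assumption: this is what the engine receives once an earlier
    // parsing layer has stripped the single backslash.
    static final String UNESCAPED = "([0-9]).([0-9])";

    // Hypothetical helper: replace every match with the two captured digits.
    static String apply(String regex, String value) {
        return value.replaceAll(regex, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(apply(UNESCAPED, "3,5"));   // 35   (the dot matched the comma)
        System.out.println(apply(UNESCAPED, "1.000")); // 1000
        System.out.println(apply(ESCAPED, "3,5"));     // 3,5  (comma left alone)
        System.out.println(apply(ESCAPED, "1.000"));   // 1000
    }
}
```

With the unescaped pattern, any single character between two digits gets dropped, which matches the behaviour reported in the first post.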

You can find a little more detail on this here

Hello, the provided workflow seems to be empty for me. You may not need regexReplace() at all; you can test with the standard replace() in a ‘String Manipulation’ node:

replace($Value$, ".", "")
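For comparison, a small plain-Java sketch of the same idea. It uses `String.replace`, which treats both arguments as plain text rather than regex, as a stand-in; whether KNIME's replace() behaves identically in every case is an assumption here. `LiteralReplaceDemo` and `dropPeriods` are illustrative names only.

```java
class LiteralReplaceDemo {
    // Hypothetical helper: remove every literal period, no regex involved,
    // so no escaping is needed at all.
    static String dropPeriods(String value) {
        return value.replace(".", "");
    }

    public static void main(String[] args) {
        System.out.println(dropPeriods("1.000"));     // 1000
        System.out.println(dropPeriods("1.000.000")); // 1000000
        System.out.println(dropPeriods("3,5"));       // 3,5
    }
}
```

Keep in mind that a literal replace removes every period in the value, not only those between digits.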

Regarding double escaping: some nodes, such as ‘String Manipulation’ or ‘Expression’, require it because the pattern has to be translated from standard regex syntax to the supported Java syntax, so the backslash must be doubled.

This syntax seems too literal to me and it has many failure cases:

regex_replace($["Value"], "^(\\d+)\\.(\\d+)", "$1$2") (‘Expression’ node)

regexReplace($["Value"], "^([0-9]+)\\.([0-9]+)", "$1$2") (‘String Manipulation’ node)

This expression would be valid as well, and it’s suitable if you have consecutive thousand groups:

regex_replace($["Value"], "(\\d+[,]?)[.]?", "$1") (‘Expression’ node)

regexReplace($["Value"], "(\\d+[,]?)[.]?", "$1") (‘String Manipulation’ node)
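Here is a rough plain-Java comparison of the anchored pattern and the grouped pattern from this post, to show where they differ on values with several thousand groups. It assumes that regexReplace replaces every match the way `String.replaceAll` does; the class and method names are made up for the example.

```java
class ThousandGroupsDemo {
    // Anchored variant: can only match once, at the start of the value.
    static final String ANCHORED = "^([0-9]+)\\.([0-9]+)";

    // Grouped variant: keeps each digit run (and a trailing comma, if any)
    // and drops an optional period that follows it.
    static final String GROUPED = "(\\d+[,]?)[.]?";

    static String anchored(String value) {
        return value.replaceAll(ANCHORED, "$1$2");
    }

    static String grouped(String value) {
        return value.replaceAll(GROUPED, "$1");
    }

    public static void main(String[] args) {
        System.out.println(anchored("1.000.000")); // 1000.000 (second period survives)
        System.out.println(grouped("1.000.000"));  // 1000000
        System.out.println(grouped("1.234,56"));   // 1234,56
        System.out.println(grouped("3,5"));        // 3,5
    }
}
```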

BR
