newline character does not work when using column values for replacement

Hi experts,

I try to use a configuration table for regex string parsing.

I use a column expression and use column.match for condition and regexReplace for string manipulation.

It works fine for matching, finding, and replacing the specific parts in the string.

However, when I try to insert a newline into replace string, it seems that the newline character (\r\n) cannot be recognized by system and will only return “rn” as two escape characters.

Is there anything I forgot to change?

Thanks for your help! :slight_smile:
regexReplaceTest.knwf (15.0 KB)

Hi @WadeLo -

Did you try escaping your backslashes? As in

\\r\\n

Hi @ScottF,

I tried but couldn’t make it work

Raffaello

Hi @WadeLo , @lelloba and @ScottF

Wow, this is a weird one…!

All the more so because using regexReplace in both Column Expressions and String Manipulations is what I use as my go-to method of putting newlines into text!

When done as a literal (i.e. typed directly in the node config) , the following works fine:

regexReplace(column("text"),"(.*)\\s(.*)","$1\r\n$2")

I don’t normally bother with the carriage return, just the line feed, but I know some apps like the “windows” CR included.
regexReplace(column("text"),"(.*)\\s(.*)","$1\n$2")

But try as I might, I couldn’t coerce it to this by populating the table:

image

image

In the end, I decided that the only way I could find to handle it was to wrap it in an additional regexReplace:

if(regexMatcher(column("text"),column("matchRegex")))
regexReplace(
    regexReplace(column("text"), column("findRegex"),column("replaceRegex") )
    ,"\\\\r\\\\n","\r\n")

with the original regex table as this:
image

The result was this:

You’ll note the “inconsistency” of the double escaping required here. Not easy to predict exactly what escapes will be required!


If all those backslashes are a bit much, an alternative is to hack it slightly and have the RegexReplace string in the table replacing with different characters, e.g. ¬¬

image

and then use a wrapper regexReplace to replace "¬¬" with "\r\n" like this:

if(regexMatcher(column("text"),column("matchRegex")))
regexReplace(
    regexReplace(column("text"), column("findRegex"),column("replaceRegex") )
    ,"¬¬","\r\n")

btw, in the uploaded workflow, the original Column Expression code uses the match() function but that is replaced here by regexMatcher(). The match() function isn’t listed as an available function, so I think regexMatcher() should be the correct function to use… But actually the condition isn’t required here, since regexReplace only performs a replacement if the expression matches, so the following would work equally well:

regexReplace(
    regexReplace(column("text"), column("findRegex"),column("replaceRegex") )
    ,"\\\\r\\\\n","\r\n")

or (from the alternative above)

regexReplace(
    regexReplace(column("text"), column("findRegex"),column("replaceRegex") )
    ,"¬¬","\r\n")

I’d say that somewhere in the translation of the column value into the function, some additional string handling is occurring (aka getting in the way!). It is certainly counter-intuitive.

My theory… based on the way \r and \n require only single backslashes when typed in the Column Expressions node, whereas other regex items require a second escape is that KNIME handles the typed entry of \r and \n as physical carriage return and linefeed as if they were being entered directly as those control characters. So when typed (or at least when the literal is parsed) in the Column Expressions (or String Manipulation) node, they are treated there and then as if they are actually CR and LF, whereas when they are pulled in from a data feed, they are not interpreted in the same way, and are just seen as the literal characters of backslash and r and n. So on that basis I think there is probably no way to pull those characters in from the data. The reason why the wrapping regexReplace works, is because it is converting a text string of literal back slashes and r and n into a typed (and therefore interpreted/parsed ) CR and LF. Just a theory…

2 Likes

Hi @ScottF @lelloba @takbb ,

Thanks for your supports!

I use a workaround to insert a newline symbol( e.g. vbCrLf) and and a Replace function outside the regexReplace the symbol into \r\n and it works.

Just curious about the compile and translate mechanism in KNIME nodes. :smile:

Thanks & best regards,
Wade
regexReplaceTestUpdate.knwf (15.2 KB)

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.