How to replace or remove the \ character?

I have some data with the following pattern:
"firstname \"fifa\" familyname"

Now I would like to remove the \" characters and everything between them, so that I get the following data pattern:
"firstname familyname"

I tried the String Manipulation node ( replace($composers$, “\ * \” , " ") ) and the String Replacer node ( Regex [\.*?\] ). Even if these regular expressions are wrong, I still get error messages, because the \ character does not behave like other characters in the KNIME nodes: it overrides the function of the characters around it.

What can I do?

Even here in the forum, the character is not displayed correctly. I hope my question is still readable.
I mean the backslash: I have to type it twice for it to appear here.

Hi @Thoralf, you are right that the forum software gives special meaning to some characters. If you paste text that should appear exactly as written, you need to highlight it and use the “preformatted text” button on the forum toolbar.

This also prevents it doing unwanted conversions of "double quotes" into “smart quotes”.

In the same way, regex, along with some other scripting languages, gives special meaning to the \ character. If you want it treated as an actual \ rather than as an “escape” character, you commonly have to double it up, “escaping” it so that it takes its literal meaning.

If you have some text:
"firstname \"fifa\" familyname"

and you want to use String Manipulation to remove all substrings delimited by \" (that is, \"…\"), you could use the following:

regexReplace($column1$,"(\\\\\".*?\\\\\")","")

I’ll try to explain the presence of two sets of 5 backslashes… :wink:

In order that String Manipulation does not treat the enclosed " characters as string terminators, these need to be escaped by placing a \ in front of each.

Additionally, String Manipulation treats \ characters as “escape” characters (as we just used to escape the double-quote). To stop it treating the backslashes meant for regex as escape characters, they themselves need to be escaped, so each \ becomes two \. The final pattern that arrives at regex is then:

(\\".*?\\")

which is what we actually want. This is fine for regex because it also has to handle escape characters, and each pair of escaped backslashes is then treated as a single literal \ by regex.
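
The two layers of escaping can be checked outside KNIME. The following is only a sketch in Python, used here as a stand-in because its string literals read \\ and \" the same way; it assumes Python's `re` module treats this pattern the same way as the regex engine behind String Manipulation. The variable names are made up for illustration.

```python
import re

# Layer 1: the string literal. Five backslashes before each quote:
# two pairs of \\ (each becomes one backslash) plus \" (becomes one quote).
pattern = "(\\\\\".*?\\\\\")"

# This is what the regex engine receives after the literal is unescaped.
print(pattern)

# Layer 2: the regex engine reads \\ as one literal backslash.
text = 'firstname \\"fifa\\" familyname'   # i.e. firstname \"fifa\" familyname
result = re.sub(pattern, "", text)
print(result)
```

Printing `pattern` shows the two-backslash form that regex actually sees, which is the intermediate step described above.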

It can be “fun” but here is my basic rule of thumb:

Write the regex pattern that you think you will require, and then add an extra \ in front of each \ in the regex, and add an additional \ in front of every " (double-quote) contained within the regex pattern (but don’t try to escape the actual string terminators required by String Manipulation!). You should generally then end up with what you need for String Manipulation.
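
The rule of thumb can be written down as a small helper. This is a sketch in Python, not a KNIME feature: the function name `to_string_manipulation_literal` is hypothetical, and it only covers backslashes and double-quotes, as the rule does.

```python
def to_string_manipulation_literal(regex: str) -> str:
    """Turn a plain regex into a quoted literal following the rule of thumb."""
    # Step 1: add an extra \ in front of each \ in the regex.
    escaped = regex.replace("\\", "\\\\")
    # Step 2: add a \ in front of every " inside the pattern.
    escaped = escaped.replace('"', '\\"')
    # The surrounding quotes are the string terminators and stay unescaped.
    return '"' + escaped + '"'


literal = to_string_manipulation_literal(r'(\\".*?\\")')
print(literal)
```

The order matters: the backslashes are doubled first, so that the backslashes added in front of the quotes in step 2 are not doubled again.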


Tip: when you run the above code, you may find double spaces in the middle of the resulting string. If this is undesirable, you could wrap the whole regexReplace(...) call in removeDuplicates(...), which removes all duplicate spaces.
e.g.

removeDuplicates(regexReplace($column1$,"(\\\\\".*?\\\\\")",""))
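
To see why the double space appears and what collapsing it does, here is a rough Python sketch. The second `re.sub` only approximates removeDuplicates(...) for plain spaces; it is not meant to reproduce the KNIME function exactly.

```python
import re

text = r'firstname \"fifa\" familyname'

# Removing \"fifa\" leaves the space before it and the space after it.
removed = re.sub(r'\\".*?\\"', "", text)

# Collapse any run of two or more spaces into a single space.
collapsed = re.sub(r" {2,}", " ", removed)

print(repr(removed))
print(repr(collapsed))
```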


Hey @takbb, many thanks for your detailed explanation! I understood it and have learned something. Based on your rule of thumb, I tried the regex “(\\”.*?\\“)” and it works. Good rule of thumb! So five backslashes are not necessary at all, although it works with them too.

Hi @Thoralf , glad to hear it works for you and thanks for marking the solution.

I wasn’t quite sure how many backslashes you said also worked, as I suspect the forum software may have removed some from your last post :wink:

I couldn’t use an even number of backslashes in my regex, as String Manipulation wouldn’t accept it.

I would recommend testing your regex to see that it gives the desired outputs for all of the below, since whilst it may appear to work, you might find it actually doesn’t for specific data:

column1
firstname \"fifa\" familyname
firstname "fifa" familyname
firstname \"fifa" familyname
firstname \"fifa\" family \"something\" name
firstname \fifa\ familyname

Testing with 1, 3 and 5 backslashes, I found that 5 backslashes is the only one that correctly identifies and removes the fully-delimited \"…\" text in all cases.
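
The comparison can be roughly reproduced outside KNIME. The sketch below uses Python and assumes its `re` module behaves like the String Manipulation regex engine for these patterns. Each pattern is written as the regex engine would receive it once the string literal has been unescaped: with 1 or 3 backslashes in the literal, regex only ever sees a plain quote to match.

```python
import re

rows = [
    r'firstname \"fifa\" familyname',
    r'firstname "fifa" familyname',
    r'firstname \"fifa" familyname',
    r'firstname \"fifa\" family \"something\" name',
    r'firstname \fifa\ familyname',
]

# Key: number of backslashes typed before each quote in the literal.
# Value: the pattern that reaches the regex engine.
patterns = {
    1: r'(".*?")',       # literal \"    -> regex sees "
    3: r'(\".*?\")',     # literal \\\"  -> regex sees \" (an escaped quote)
    5: r'(\\".*?\\")',   # literal \\\\\" -> regex sees \\" (backslash, quote)
}

results = {
    n: [re.sub(p, "", row) for row in rows]
    for n, p in patterns.items()
}

for n, cleaned in results.items():
    print(n, cleaned)
```

In this sketch the 1 and 3 backslash variants match from quote to quote, so they leave a stray \ behind in the first row and also alter rows that are not fully delimited by \".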

Hey @takbb, you are right. The 5 backslashes work best in the String Manipulation node. I use the String Replacer node, where it also works with only two backslashes.

This takes me from the data row "firstname \"fifa\" familyname" to the needed data row "firstname familyname".

But the regex with only 2 backslashes does not work in the String Manipulation node. There must be at least 3 backslashes, better 5.


Ah yes. Sorry @Thoralf, I misunderstood what you were saying. Yes the rule of thumb is just for String Manipulation (and possibly other scripting nodes).

In String Replacer the rules are different as you found, as you aren’t trying to include it within a scripted string, so you are only having to follow regex rules for escaping.
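
A quick way to convince yourself that both forms end up as the same regex is the Python sketch below. A raw string stands in for text typed directly into the String Replacer dialog and a normal quoted literal stands in for the String Manipulation expression; both stand-ins are assumptions for illustration, as are the variable names.

```python
import re

# Typed directly (no string-literal layer): two backslashes before each quote.
typed_in_string_replacer = r'\\".*?\\"'

# Written inside a quoted literal: five backslashes before each quote.
literal_in_string_manipulation = "\\\\\".*?\\\\\""

# After the literal is unescaped, both are the same regex.
same = typed_in_string_replacer == literal_in_string_manipulation

cleaned = re.sub(typed_in_string_replacer, "", r'firstname \"fifa\" familyname')
print(same, repr(cleaned))
```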
