String manipulation node can't remove whitespaces

Hi all, I’ve downloaded a keyword list from Google Search Console, which comes with a column that contains the CTR of each column.

Im trying to convert some columns to number, but I have some problems. The number comes in this format:

2,59 %

removerchars() function does nothing. If I remover the % symbol first, and then try to remove the white space, its just there, I can’t do anything about it. The only thing that I didn’t try is the regex functions, but no matter if I replace the white space by “” or null or whatever, it just does nothing.

It drives me crazy to be honest. Why is it happening? I could modify the CSV in excel but the whole point of using Knime is automating it…

Update: It works if I do removeChars("2,59 %") but not with removeChars($CTR$) ¿? Also, the column is classified as string, so it should work, isn’t it?

2 Likes

Try: removeChars(string($CTR$))

Still not working :confused:

You could use a String Replacer node with the following settings:
Pattern: [.]{0,}([0-9\,]{1,}).{1,}
And replacement text: $1
Select all occurrences

1 Like

Still not working with regex :confused:

This is the file if you want to play with it: https://1drv.ms/u/s!AhrE8WJJ4etSyAa_aRnQA8YoMSU0

Guess I’m gonna have to use Excel…

Looking at the file: The space between the numbers and the % sign are not regular spaces, but protected aka. non-breaking whitespaces. Using a regular expression \s should cover protected whitespaces as well.

1 Like

That’s interesting, how do you know it? Anyway tried regexReplace($CTR$,"\s","") and I get no result.

1 Like

I have updated the pattern for the String Replacer node, please also make sure that you have selected regular expression for pattern type.

2 Likes

Your pattern works, thank you :slight_smile:

Anyway, it would be interesting to know how to remove this non-breaking spaces, because using \s in the string replacer node or regexReplace() function does not work. Maybe someone reads this and find it useful.

Ok, if someone reads this thread having a similar problem with non-break whitespace, it’s unicode character is \u00A0, so you can use it in the STRING REPLACER node, using Regular Expression as pattern type.

It would be specially useful for people whose language uses commas as number separator, for example Spanish.

@davekalpak and @qqilihq thank your for your help :slight_smile:

7 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.