I have a string column and I am trying to replace some words from that string column.
example:
String column
“ @_CCarter: Cent line 6pm-7pm. Only time one can irritate me and get away with it†wow
I want to remove corrupting characters like “ , †etc.
Output for this would be: @_CCarter: Cent line 6pm-7pm. Only time one can irritate me and get away with it wow
I have a file wherein I have a list of all such characters and replacements for it.
I am trying to use Cell Replacer but I think that this node is trying to compare the entire sentence and hence not working. String Replacer works but the list is long so I can’t have hundreds of string replacers. I tried String Replace (Dictionary) but that isn’t working properly.
You can use this expression in a “String Manipulation” node: regexReplace($column1$, "[^a-zA-Z0-9~!@#$%^&*{}<>()\\[\\]\\-_.+=?:,\\\\/\";| ]", "")
I included all letters, numbers and special characters in the list so anything other than what is in the list will be removed.
Best,
Armin
P.S. If I’m missing any characters you can add them by yourself to the list.
I assumed that there are so many different strings and he wants to remove those none standard characters.
By using String Replace (Dictionary) he has to create a value and keys for each different string.
I think using that expression in a String Manipulation node is much easier.
Or maybe I’m missing something here.
HI Armin
Thanks a lot. It seemed to work. It cleaned the column. I added a few things like white spaces etc but barring that it worked nicely. Thanks a lot.
Ace.
HI Ivan
Thanks for your comment. Although Armin’s solution solved the problem which I was trying to solve, the question around the node Cell Replacer remains. I have attached the flow and the files. Have a look. Maybe I am making some mistake or maybe there is a wider problem.
Thanks
Ace Dirtyfile_test.xlsx (15.8 KB) Html_cleaner.xlsx (10.7 KB) cellreplacer_problem.knwf (8.4 KB)
For “String Replace (Dictionary)” node, you need a text file as a dictionary.
The format of the input in the text file should be like this:
Value1, Key1, key2, key3, …
Value2, key2, key2, key3, …
Each line of the text file can have a set of value and the keys.
The node checks each cell to find a match in keys (the whole string should match a key), if it finds one, the node will replace it with the value corresponding to the key. If there are duplicate keys with different values, then the last one will be applied.
HI Armin
Thanks for your message and extremely sorry for delay in response.
As per your comment, I changed the order of the columns, labeled them accordingly. But the same problem persists. I think the issue is that the node can not look inside the string but looks at the entire sentence as a string in which case the condition fails. I think so but feel free to disagree or advise alternative.
Thanks Ace
@ace2131 and @alex_never sry for delay on this one. Regarding String Replace (Dictionary) node way of working. Seems the node is working same/similar as Cell Replacer node so both of them are not to be used in this case… Anyways I will check if String Replace (Dictionary) node should work this way.