Hi everyone, could someone tell me how to remove emoji characters from text in KNIME?
Hi @anun. Welcome to the forum!
Is it emojis in particular, or is it all “special” characters that you want to remove? Do you have a sample of the text that you could upload in a file as an example?
I had a look around past forum posts about unprintable characters and emoji. I also looked at the various regex and string replacement nodes to see if I could find a solution that way, but I didn’t find one. No doubt somebody else will comment if they know of a good solution.
In the meantime, I have created a workflow using the Java Snippet node (you would need to modify the Java snippet to use your chosen input and output column names, but I have put comments in the code at those places). The workflow has 3 different (but similar) snippets using a variety of regexes and methods that I found on Stack Overflow, so see if any of these work for you. Each of them may filter out too little or too much.
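For anyone who can’t open the workflow, here is a minimal standalone sketch of the regex idea in plain Java. It is not the code from the workflow: the pattern is an assumption, a hand-picked list of Unicode ranges where most emoji live, plus the joiner, presentation-selector and keycap characters that glue emoji sequences together. The class and method names are hypothetical; inside a Java Snippet node you would only need the `replaceAll` call, applied to your own input and output columns.

```java
import java.util.regex.Pattern;

public class EmojiRegexFilter {

    // Assumed pattern: a hand-picked list of Unicode ranges where most emoji
    // live. It is a blacklist, not a complete definition of "emoji".
    private static final Pattern EMOJI = Pattern.compile(
            "["
            + "\\x{1F000}-\\x{1FAFF}"  // pictographs, emoticons, transport, flags, skin tones, ...
            + "\\x{2600}-\\x{27BF}"    // Miscellaneous Symbols and Dingbats blocks
            + "\\x{E0020}-\\x{E007F}"  // tag characters used in subdivision flags
            + "\\x{200D}"              // zero width joiner that glues multi-part emoji
            + "\\x{FE0E}\\x{FE0F}"     // text/emoji presentation selectors
            + "\\x{20E3}"              // combining enclosing keycap
            + "]");

    // Java regex matches supplementary characters as whole code points,
    // so a surrogate pair is removed together, never half of it.
    public static String removeEmoji(String text) {
        return EMOJI.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // "Hello <grinning face> world <thumbs up><medium skin tone>!"
        String sample = "Hello \uD83D\uDE00 world \uD83D\uDC4D\uD83C\uDFFD!";
        System.out.println(removeEmoji(sample)); // prints "Hello  world !"
    }
}
```

Because this is a blacklist of ranges, emoji that live outside these blocks are left in place, while some non-emoji symbols inside them (for example the stars and check marks in the U+2600 to U+27BF range) are removed too.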
Hopefully it will be of assistance until somebody suggests a simpler solution. It’s possible that the other nodes I tried can be made to work, but that I was just getting the regex wrong.
Hi @takbb, thank you!
Yes, it’s emojis in particular.
Thank you for the detailed answer. You are a lifesaver!
I did some further experimenting with this, and you may be interested in the test workflow I have put on the Hub. I made a “String Emoji Filter” component, which is experimental but you are welcome to use; I’ll try to improve it over time. My test workflow uses it, and the component can be configured to use your choice of the 3 methods that I showed earlier with the Java snippets. As you will see from the outputs of my test, none of the filters is perfect, although the one I show as “Filter Type 2” gives the best overall result. I did, however, find an instance where the presence of a specific emoji can cause the loss of one or more characters when filtering. I don’t know at the moment why this occurs, but I think it is a rare case rather than the norm, and overall “Filter Type 2” is currently my best suggestion for emoji removal.
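A different way to approach it, sketched here under the same caveats, is to test each code point’s Unicode general category instead of listing ranges. Walking the string with `codePoints()` means surrogate pairs are never split, and the filter only ever drops the code points it explicitly tests, which makes it easier to check exactly what was removed. This is not one of the three filter types in the component, and the `isEmojiLike` rules below are an illustrative assumption.

```java
public class EmojiCodePointFilter {

    // Illustrative rule set (an assumption, not one of the component's filters):
    // treat a code point as emoji-like if the JDK classifies it as an
    // "Other Symbol" (So), or if it is one of the helper characters that
    // combine with emoji into longer sequences.
    static boolean isEmojiLike(int cp) {
        return Character.getType(cp) == Character.OTHER_SYMBOL // most pictographs, also copyright/registered/trademark signs
                || (cp >= 0x1F3FB && cp <= 0x1F3FF)          // skin tone modifiers
                || cp == 0x200D                              // zero width joiner
                || cp == 0xFE0E || cp == 0xFE0F              // presentation selectors
                || cp == 0x20E3                              // combining enclosing keycap
                || (cp >= 0xE0020 && cp <= 0xE007F);         // tag characters
    }

    public static String removeEmoji(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        // codePoints() yields whole code points, so surrogate pairs are never
        // split, and only code points that pass isEmojiLike are dropped.
        text.codePoints()
                .filter(cp -> !isEmojiLike(cp))
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // "I <red heart><presentation selector> KNIME <grinning face>"
        String sample = "I \u2764\uFE0F KNIME \uD83D\uDE00";
        System.out.println(removeEmoji(sample)); // prints "I  KNIME "
    }
}
```

The trade-off is roughly the opposite of the range-based regex: this version also strips symbols such as the copyright, registered and trademark signs (they share the “Other Symbol” category), and it relies on the Unicode version of the JDK that KNIME runs on, so emoji newer than that version are reported as unassigned and pass through.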
I hope this is of use.