Hi @jorgemartcaam, I put something together that seems to work.
Final results:
Workflow looks like this:
Explanation:
First, I strip any leading and trailing spaces. This matters because I am going to work with regex and with hex values:
I then remove the emojis with a regex. This is where the main magic happens:
I looked at some documentation and at suggestions from other people. Mainly I looked at:
https://unicode.org/Public/emoji/13.1/emoji-sequences.txt
And here, someone who explains that there are characters, Unicode scalars and glyphs:
And here, someone who tried to handle all the Unicode code points of the emojis:
In the end, I went with:
strip(regexReplace(column("NAME"), "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]", ""))
which I got from:
Each of the expressions is explained there.
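For anyone who wants to try the same idea outside KNIME, here is a rough Python sketch of what the expression does. Python's built-in re module does not support \p{...} classes, so this version checks each character's Unicode general category with unicodedata instead of using a regex. The function name remove_symbols is made up for illustration, and the whitespace set is my assumption of what \s matches in a Java-style regex; it is not the Column Expressions implementation.

```python
import unicodedata

# Categories kept by the expression: letters, marks, numbers,
# punctuation, separators (major classes), plus Cf and Cs exactly.
KEPT_MAJOR = {"L", "M", "N", "P", "Z"}
KEPT_EXACT = {"Cf", "Cs"}
# Assumption: \s in a Java-style regex matches these ASCII characters.
ASCII_WHITESPACE = " \t\n\x0b\f\r"


def remove_symbols(text):
    """Drop every character outside the kept categories, then strip."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if (
            category[0] in KEPT_MAJOR
            or category in KEPT_EXACT
            or ch in ASCII_WHITESPACE
        ):
            kept.append(ch)
    return "".join(kept).strip()


if __name__ == "__main__":
    # The grinning face (U+1F600) is category So, so it is removed.
    print(repr(remove_symbols("Caf\u00e9 \U0001F600")))
    # The heart (U+2764) is removed, but U+FE0F is a mark and survives.
    print(repr(remove_symbols("Heart \u2764\ufe0f!")))
```

The second call shows the kind of leftover discussed next: the emoji itself is gone, but an invisible character that belonged to it stays in the string.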
However, even after applying the above expression, a few cases remained where a special character was still left in the string.
As always, I use hex values to see what that character was, since I could not see it in ASCII. After converting and comparing the results with your Emoji Free column, the extra characters that appeared were “fe0f”, with variations such as “20fe0f”, “20fe0f20”, “200dfe0f20”, etc… “fe0f” is U+FE0F (Variation Selector-16), an invisible character that requests emoji-style rendering. Unicode classifies it as a mark, so \p{M} in the expression keeps it.
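Hex “20” is a space, and hex “0d” is a CR (Carriage Return). Note, though, that “200d” read as a single code point is U+200D, the zero-width joiner that glues emoji sequences together, so that may be what shows up here instead; \p{Cf} in the expression keeps it.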
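If you want to do the same kind of inspection outside KNIME, a small Python sketch like the one below lists each character's code point in hex together with its Unicode name, which makes invisible characters easy to identify. The helper name describe is hypothetical, and the sample string is made up to resemble the leftovers described above.

```python
import unicodedata


def describe(text):
    """Return one (hex code point, Unicode name) pair per character."""
    return [
        # Control characters such as CR have no name, hence the fallback.
        (format(ord(ch), "04x"), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # Made-up sample: a letter, a space, and two invisible characters.
    for code, name in describe("A \u200d\ufe0f"):
        print(code, name)
```

Printing the names next to the hex values helps tell a real space or CR apart from a code point that merely contains the same hex digits.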
So I removed the “fe0f” first, since the strip() function can then take care of the remaining spaces and CR/NL (Carriage Return / New Line).
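In Python, that cleanup step could look roughly like this. It is only a sketch: the function name drop_leftovers and the optional extra parameter are my own additions for illustration, and Python's strip() is not guaranteed to treat every character the same way as strip() in Column Expressions.

```python
VARIATION_SELECTOR_16 = "\ufe0f"


def drop_leftovers(text, extra=""):
    """Remove U+FE0F (plus any characters in `extra`), then strip.

    `extra` is a hypothetical hook for other invisible characters,
    for example "\u200d" (zero-width joiner).
    """
    for ch in VARIATION_SELECTOR_16 + extra:
        text = text.replace(ch, "")
    # strip() removes leading/trailing whitespace, including CR and NL.
    return text.strip()


if __name__ == "__main__":
    print(repr(drop_leftovers("Heart \ufe0f")))
    print(repr(drop_leftovers("Family \u200d\ufe0f ", extra="\u200d")))
```

Removing the selector before stripping matters because an invisible character at the end of the string would otherwise shield the whitespace in front of it from strip().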
There you have it.
Here’s the workflow:
Remove Emojis from string.knwf (26.0 KB)
Note: We could get rid of the conversion to hex if we could remove hex “fe0f” directly in the Column Expressions node. It is too bad that Column Expressions does not offer functions to convert to and from a hex string.