How to deal with unicode / special / invisible characters?

Hi @mwiegand

I looked up this specific character, and it falls into the Unicode general category “Format” (Cf), which covers invisible formatting characters. I was recently investigating how to strip emoji and other characters from strings, and in the process discovered some quite useful info.

Regex has a set of patterns describing Unicode “classes” or “categories”. Once you know the class you want to find or filter, you can use the matching pattern.

The pattern for this particular class is \p{Cf}, so if you change your regex “SUCCESS” node to look for that class instead, you will find SUCCESS! :wink:
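If you want to check which invisible characters a string contains outside the workflow, here is a minimal Python sketch. Note that Python's built-in `re` module does not support `\p{Cf}`, so this uses `unicodedata.category` to test for the same category instead. The function name `find_format_chars` and the sample string (with a zero width space, U+200B, appended) are made up for illustration; the character in your data may be a different one.

```python
import unicodedata

def find_format_chars(text):
    """Return (index, code point, name) for every character in category Cf."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"  # same class as \p{Cf}
    ]

# Hypothetical sample: looks like "SUCCESS" but ends with an invisible character
sample = "SUCCESS\u200b"
hits = find_format_chars(sample)
print(hits)
```

A string that looks identical but has no hidden characters gives an empty list, which makes this handy for comparing two values that "should" match.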

I had a go at putting together a component for filtering based on these classes, with some success. It can be found here, and is useful if the aim is simply to remove characters of particular classes.

(If the config screen for the component initially lists only a small number of classes, try executing the component to “refresh” it. I probably need to fix something, but I'm not quite sure where.)
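For anyone who wants the same kind of category-based filtering in a script, a rough Python sketch could look like the one below. This is not how the component is built, just an illustration of the idea; `strip_categories` and its `prefixes` parameter are hypothetical names. A two-letter code such as `Cf` targets one category, while a single letter such as `C` targets every category starting with that letter.

```python
import unicodedata

def strip_categories(text, prefixes=("Cf",)):
    """Remove every character whose Unicode category starts with a given prefix."""
    prefixes = tuple(prefixes)  # str.startswith needs a tuple, not a list
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(prefixes)
    )

# Zero width space and soft hyphen are both in category Cf
cleaned = strip_categories("a\u200bb\u00adc")

# "So" (Symbol, other) covers many emoji, e.g. U+1F600
no_emoji = strip_categories("hi \U0001F600!", prefixes=("So",))
```

Be careful with broad prefixes: `C` also removes control characters such as tabs and line breaks, which may or may not be what you want.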

Some information about the different Unicode categories, such as the “Format” category, can be found here:
