Cleansing Text Using Regex

armingrudd · April 10, 2020, 5:48am

The regex pattern suggested by @AlexanderFillbrunn works perfectly fine on your example data set.

You have to use it like this in the String Manipulation node:

regexReplace($Content$, "#.*\\}\\);", "")
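To see what the expression does outside KNIME, here is a minimal Java sketch. It assumes that regexReplace behaves like java.lang.String.replaceAll with the same pattern, and the sample input is hypothetical. Note that the double backslashes are string escaping: the regex engine sees #.*\}\), where \} and \) match a literal brace and parenthesis.

```java
public class RegexCleanDemo {
    // Same pattern as the String Manipulation expression above.
    // In Java source, "\\}" and "\\)" become the regex escapes \} and \).
    static final String PATTERN = "#.*\\}\\);";

    // Removes every match of PATTERN from the content.
    static String clean(String content) {
        return content.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        // Hypothetical sample: useful text around one undesired snippet.
        String content = "Useful text #tracking({id: 1}); more useful text";
        System.out.println(clean(content));
    }
}
```

On the sample, the whole "#tracking({id: 1});" snippet is removed and the surrounding text (with its two adjacent spaces) is kept.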

But please be careful: although this approach solves your issue here, I do not recommend it in general, because other issues are likely to arise. Since .* is greedy, a match starts at the first # in the text and extends to the last }); sequence. For example, if your text contains a # character before the undesired section, the removal will start at that earlier #. If your text contains another }); sequence after the undesired section, everything up to that later sequence will be removed. And if your text has two or more undesired sections, everything from the start of the first one to the end of the last one will be removed, including any useful text between them.
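The pitfalls above can be reproduced with a short Java sketch (the inputs are made up for illustration). It also shows a reluctant variant, #.*?\}\);, which is not from the original suggestion: the ? makes .* stop at the first }); it reaches, so separate sections are removed individually. It does not fix the case of a stray # earlier in the text, because the match still starts at the leftmost #.

```java
public class GreedyPitfalls {
    // Pattern suggested in the thread: greedy .* runs to the LAST "});".
    static final String GREEDY = "#.*\\}\\);";
    // Alternative for comparison (not from the thread): reluctant .*? stops at the FIRST "});".
    static final String RELUCTANT = "#.*?\\}\\);";

    static String greedy(String text) {
        return text.replaceAll(GREEDY, "");
    }

    static String reluctant(String text) {
        return text.replaceAll(RELUCTANT, "");
    }

    public static void main(String[] args) {
        // A '#' before the undesired section: removal starts at "#3".
        String hashBefore = "Item #3 is good. #junk({a: 1}); end";
        System.out.println(greedy(hashBefore));    // both variants remove "#3 ... });"
        System.out.println(reluctant(hashBefore));

        // Two undesired sections: greedy also removes "keep this" between them.
        String twoSections = "A #x({}); keep this #y({}); B";
        System.out.println(greedy(twoSections));
        System.out.println(reluctant(twoSections)); // each section removed separately
    }
}
```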

If possible, I suggest finding a better way to get the content from the source.

Another approach could be to check the alphabetic density of the text and remove sections where it becomes sparse.
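A rough idea of the density check, as a minimal Java sketch under several assumptions: "density" here means the share of letters in each whitespace-separated token, the 0.6 threshold is a hypothetical value you would tune on your data, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class AlphaDensityFilter {
    // Hypothetical threshold: minimum share of letters a token needs to be kept.
    static final double MIN_DENSITY = 0.6;

    // Fraction of characters in the token that are letters (0.0 for an empty token).
    static double density(String token) {
        if (token.isEmpty()) return 0.0;
        long letters = token.chars().filter(Character::isLetter).count();
        return (double) letters / token.length();
    }

    // Keeps only tokens whose letter density reaches minDensity, joined by single spaces.
    static String filter(String text, double minDensity) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (density(token) >= minDensity) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Hypothetical sample with a symbol-heavy fragment in the middle.
        String text = "Great product. {1:[2,3]}); Ships fast.";
        System.out.println(filter(text, MIN_DENSITY));
    }
}
```

This works token by token and collapses the original whitespace; a real implementation would probably average the density over neighbouring tokens so that whole sparse sections are dropped while short numbers in normal sentences survive.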

However, regarding your current data set, you are good to go for now.