Replace ~8000 terms in forum DB extraction (Nickname Anonymization)

So I have the following problem:

I have around 8000 nicknames that I need to replace with random equivalents (basically a list of English and Spanish names). It’s easy for the users table, but not so easy for the comments table.

Quotes are embedded as text in the message body, such as “Written by John in Dec 19: bla bla bla”. I need to search and replace all those nicknames.

Problem? The dictionary replacer seems to (1) ignore wildcards and (2) replace the whole string, so it wipes out the entire message, which is not what I’m looking for.

Is there any way to do this in KNIME?

Hello @iagovar,

You could do this with a loop: in each iteration, replace one nickname using the regexReplace() function (or plain replace()) from the String Manipulation node. With around 8000 nicknames, though, that will probably take a long time.
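If a Python scripting node is available in your installation, one alternative to looping over nicknames is to compile them all into a single regular expression and replace them in one pass per message. The following is only a minimal sketch of that idea in plain Python, not a KNIME built-in: the `build_replacer` helper and the sample `mapping` are made up for illustration, and it assumes nicknames should match case-sensitively and only as whole tokens.

```python
import re


def build_replacer(mapping):
    """Return a function that swaps every nickname in `mapping` in one pass."""
    # Longest nicknames first, so "John_Doe" is preferred over "John".
    names = sorted(mapping, key=len, reverse=True)
    # re.escape keeps characters like "." or "*" in a nickname literal.
    # The lookarounds stop "John" from matching inside "Johnny".
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)"
    )

    def replace(text):
        # Each match is looked up in the dictionary; the rest of the
        # message body is left untouched.
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return replace


# Hypothetical sample data: nickname -> random replacement name.
mapping = {"John": "Carlos", "John_Doe": "Emily", "xX_maria_Xx": "Lucia"}
anonymize = build_replacer(mapping)

message = "Written by John in Dec 19: bla bla bla. John_Doe agreed with xX_maria_Xx."
result = anonymize(message)
print(result)
```

Because the pattern is compiled once, each message is scanned a single time instead of 8000 times. If nicknames can appear with different capitalisation, the dictionary lookup would need to be normalised as well (for example by lower-casing both the keys and the match).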

There is a feature request to add wildcard and regex support to the Cell Replacer node, so hopefully this will be easier to do in KNIME one day (internal reference: AP-13269). I have given it a +1 for you.

Br,
Ivan

Can you provide a dataset / sample?
