I try to slove the problem by not using external data, because I want to analyze different articles with very different content. Therefore it would b e the more efficient way to let KNIME replace the words by itself.
you could solve this using a density based clustering.
Therefore use the String Distances (i used the Levenshtein distance, with weight 0 on insertion) and afterwards the DBSCAN node gave me the following results:
music
Cluster_0
musik
Cluster_0
porn
Cluster_1
porno
Cluster_1
pornographisch
Cluster_1
After identifiying the Clusters you would need to set them to one of the names. E.g. by taking one of the values in the cluster.
that sounds like the perfect solution I was searching for. The only problem I have is that I have no fucking clue how to build the proper workflow because I never worked with the DBSCAN. Is it possible to attach your workflow?