Replacing tagged content

Hello guys,

I just started working with KNIME and I like it a lot.

I have a workflow whose goal is to filter countries from documents.

I parsed the documents and then created tags with a Dictionary Tagger (using a list of countries).

However, terms like Iran/Iranian are counted separately, each with its own frequency. I would like to count these kinds of occurrences as instances of a single country. I am sending a picture of the workflow.

I tried almost all of the “replacer” nodes but they did not work. I think that is because of the tagger nodes.

Thanks for your attention.

Rodrigo.

Hi Rodrigo,

that is a typical problem in text mining. You need to normalize the words before you count them. Stemming is an option (or lemmatization, but we don't have a lemmatizer node). In your case, however, the words Iran and Iranian would not be reduced to the same stem, since the first is a noun and the second an adjective. To replace these words using a dictionary you can use the Dictionary Replacer (2 inports) node. As its second input, the node takes a dictionary of search terms and their replacements. Use the replacer node before the stemmer. Have you tried that?
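To illustrate the idea outside KNIME, here is a minimal Python sketch of dictionary-based normalization before counting. It is not how the node is implemented; the dictionary entries, the function names and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical search -> replacement dictionary (illustrative entries only).
REPLACEMENTS = {"Iranian": "Iran", "Iranians": "Iran", "Brazilian": "Brazil"}


def normalize_terms(text, replacements):
    """Split text into word tokens and map each one through the dictionary."""
    tokens = re.findall(r"[A-Za-z]+", text)
    # Tokens without a dictionary entry are kept unchanged.
    return [replacements.get(token, token) for token in tokens]


def count_countries(text, replacements, countries):
    """Count country mentions after the dictionary replacement."""
    normalized = normalize_terms(text, replacements)
    return Counter(token for token in normalized if token in countries)


sample = "Iran signed the deal. Iranian officials met Brazilian diplomats in Brazil."
counts = count_countries(sample, REPLACEMENTS, {"Iran", "Brazil"})
```

Because the replacement runs before the counting step, "Iran" and "Iranian" end up in the same bucket, which is the effect the replacer node should give you in the workflow.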

Cheers, Kilian

 

Hi Kilian,

Thank you for the answer.

I did what you suggested and it worked, but there was one small but important detail: I had to use the Case Converter node before the replacement, because the dictionary is case sensitive.
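For anyone reading later, here is a small Python sketch of why the case conversion matters. It is not KNIME code; the dictionary and the function names are hypothetical and only mimic an exact-match lookup.

```python
# Hypothetical dictionary with lower-case search terms (illustrative only).
REPLACEMENTS_LOWER = {"iranian": "iran", "iranians": "iran"}


def replace_exact(tokens, replacements):
    """Case-sensitive lookup: a token is replaced only on an exact match."""
    return [replacements.get(token, token) for token in tokens]


def replace_after_lowercasing(tokens, replacements):
    """Convert every token to lower case first, then do the same lookup."""
    return replace_exact([token.lower() for token in tokens], replacements)


tokens = ["Iranian", "IRANIAN", "iranian", "Iran"]
without_conversion = replace_exact(tokens, REPLACEMENTS_LOWER)
with_conversion = replace_after_lowercasing(tokens, REPLACEMENTS_LOWER)
```

Without the conversion only the token that already matches the dictionary's case is replaced; after lower-casing, all four tokens map to the same term.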

Cheers,

Rodrigo