Replacing tagged content

Hello guys,

I just started working with KNIME and I like it a lot.

I have a workflow whose goal is to filter countries from documents.

I parsed the documents and then created tags with a Dictionary Tagger (using a list of countries).

However, terms like Iran/Iranian are counted separately, each with its own frequency. I would like to count these kinds of occurrences as instances of a single country. I am attaching a picture of the workflow.

I tried almost all of the “replacer” nodes, but they did not work. I suspect that is because of the tagger nodes.

Thanks for your attention,

Rodrigo.

Hi Rodrigo,

that is a typical problem in text mining. You need to normalize the words before you count them. Stemming is an option (or lemmatization, but we don't have a lemmatizer node). In your case, however, the words Iran and Iranian would not be reduced to the same stem, since the first is a noun and the second an adjective. To replace these words using a dictionary, you can use the Dictionary Replacer (2 inports) node. As its second input, the node takes a dictionary of search and replacement words. Use the replacer node before the stemmer. Have you tried that?
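Outside of KNIME, the replace-then-count idea can be sketched in a few lines of Python. This is only an illustration of the principle, not what the node does internally; the dictionary entries, the sample sentence, and the function names `normalize_terms` and `count_countries` are made up for the example.

```python
from collections import Counter

# Hypothetical search -> replacement dictionary (illustrative entries only).
REPLACEMENTS = {
    "Iranian": "Iran",
    "Iranians": "Iran",
    "Brazilian": "Brazil",
}


def normalize_terms(tokens, replacements):
    """Map each token to its canonical form; unknown tokens pass through unchanged."""
    return [replacements.get(tok, tok) for tok in tokens]


def count_countries(tokens, countries):
    """Count only the tokens that appear in the country list."""
    return Counter(tok for tok in tokens if tok in countries)


# Made-up sample text, split on whitespace for simplicity.
tokens = "The Iranian delegation left Iran and met Brazilian officials".split()

# Replace first, then count, so that variants collapse into one country.
normalized = normalize_terms(tokens, REPLACEMENTS)
counts = count_countries(normalized, {"Iran", "Brazil"})
print(counts)
```

Because the replacement runs before the counting, "Iranian" and "Iran" end up in the same bucket instead of getting separate frequencies.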

Cheers, Kilian

 

Hi Kilian,

Thank you for the answer.

I did what you suggested and it worked, but there was one small but important detail: I had to use the Case Converter node before the replacement, because the dictionary is case sensitive.
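To show why the case conversion matters, here is a small Python sketch of a case-sensitive lookup. It is an assumption-laden illustration, not KNIME code: the one-entry dictionary and the helper names `replace_exact` and `replace_after_lowercasing` are hypothetical.

```python
# Hypothetical all-lowercase dictionary (illustrative entry only).
REPLACEMENTS = {"iranian": "iran"}


def replace_exact(tokens, replacements):
    """Case-sensitive lookup: a token is replaced only on an exact match."""
    return [replacements.get(tok, tok) for tok in tokens]


def replace_after_lowercasing(tokens, replacements):
    """Lowercase each token first, then apply the same exact-match lookup."""
    lowered = [tok.lower() for tok in tokens]
    return [replacements.get(tok, tok) for tok in lowered]


tokens = ["Iranian", "IRANIAN", "iranian"]

# Without case conversion, only the token that matches exactly is replaced.
print(replace_exact(tokens, REPLACEMENTS))

# With case conversion first, every variant hits the dictionary entry.
print(replace_after_lowercasing(tokens, REPLACEMENTS))
```

The same applies in the other direction: if the dictionary is written in title case, the terms have to be converted to match it, or the dictionary itself has to be lowercased.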

Cheers,

Rodrigo
