Term frequency

anon33357744 · May 2, 2018, 10:48am

Hey KNIME Community,

I have to classify texts … In order to minimize the dimension of my vector space, terms were conceived … Now I have the problem that my texts contain spelling mistakes, so that, for example, the words ‘proces’ and ‘process’ are not considered to be the same word. How can I handle such problems? Where in my workflow would I have to place the node?

Thank you,
Canan

kilian.thiel · May 2, 2018, 1:55pm

Hi Canan,

There is no node that handles spelling mistakes. You can replace these words yourself, e.g. using the Dict. Replacer node, but besides that there is no way to handle this. How many mistakes do you have? Is it a significantly large number?

Cheers, Kilian
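
For readers who want to try the dictionary-based replacement mentioned above, here is a minimal sketch of how a replacement dictionary could be drafted outside KNIME. It is not part of the Dict. Replacer node and not something proposed in this thread: it assumes you already have a list of correctly spelled words (the hypothetical `vocabulary`), and it uses Python's standard `difflib` module to map each unknown term to its closest vocabulary entry. The function name `build_replacement_dict` and the `cutoff` value of 0.85 are illustrative choices only.

```python
from difflib import get_close_matches


def build_replacement_dict(terms, vocabulary, cutoff=0.85):
    """Map each term that is not in the vocabulary to its closest vocabulary word.

    `terms` are the terms found in the texts, `vocabulary` is an assumed list of
    correctly spelled words, and `cutoff` is the minimum similarity (0..1).
    Terms with no sufficiently similar vocabulary word are left out.
    """
    vocab = sorted(set(vocabulary))
    known = set(vocab)
    replacements = {}
    for term in sorted(set(terms)):
        if term in known:
            continue  # already spelled correctly
        matches = get_close_matches(term, vocab, n=1, cutoff=cutoff)
        if matches:
            replacements[term] = matches[0]
    return replacements


# Illustrative data only, not taken from the original workflow.
terms = ["proces", "process", "workflow", "workflw", "node"]
vocabulary = ["process", "workflow", "node"]

replacements = build_replacement_dict(terms, vocabulary)
for wrong, right in replacements.items():
    print(f"{wrong} -> {right}")
```

The resulting pairs would still need a manual review, since fuzzy matching can merge distinct words that merely look alike. The exact dictionary file format that the Dict. Replacer node expects should be checked in the node description before exporting the pairs.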

anon33357744 · May 2, 2018, 2:20pm

Hi Kilian,

OK, thank you.
I have 871 data sets, and each of them contains misspellings.

Regards,
Canan

kilian.thiel · May 2, 2018, 4:25pm

If the number of misspelled terms is not significantly large, I would simply ignore it. It takes a lot of effort to handle this with dictionary-based replacements.

Cheers, Kilian

anon33357744 · May 3, 2018, 10:40am

OK, thank you very much, Kilian.
