Topic Extractor in non-English languages

How well does the Topic Extractor node perform in languages other than English?


This is closely related to my original question: topic extraction relies on lemmatization. As far as I understand, KNIME Text Processing only offers lemmatization for certain languages. Lemmatization is provided by the Stanford lemmatizer, which in turn depends on the output of the Stanford POS tagger (available for English, French and German), so I see no way of doing this for Spanish. Are there any alternatives?



Hi Peleitor,

the Topic Extractor node should be language independent; lemmatization is not necessarily required for that node. Simply apply it to a preprocessed set of documents and specify the number of topics you want to extract.
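To illustrate why lemmatization is optional, here is a minimal Python sketch of the kind of model a topic extractor fits, assuming it is based on Latent Dirichlet Allocation (LDA). This is not KNIME code: the tiny stop-word list and the helpers `preprocess`, `lda_gibbs` and `top_words` are hypothetical, written only for illustration. The model only ever sees token counts per document, so Spanish text works without any language-specific step beyond basic preprocessing.

```python
import random
import re

# Hypothetical, deliberately tiny Spanish stop-word list (illustration only).
STOPWORDS_ES = {"el", "la", "los", "las", "de", "y", "en", "un", "una", "que",
                "es", "son", "por", "con", "para", "del", "al", "a"}


def preprocess(text, stopwords=STOPWORDS_ES):
    """Lowercase, keep runs of letters (accented letters included), drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in stopwords and len(t) > 2]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (topic-word counts, vocabulary)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * num_topics for _ in docs]          # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]      # topic-word counts
    nk = [0] * num_topics                           # tokens per topic
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return nkw, vocab


def top_words(nkw, vocab, n=3):
    """Return the n most frequent words of each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]


if __name__ == "__main__":
    corpus = [
        "El gato y el perro juegan en el jardín",
        "Los perros y los gatos son mascotas",
        "El mercado de valores cayó por la inflación",
        "La inflación afecta al mercado y a los bancos",
    ]
    docs = [preprocess(t) for t in corpus]
    nkw, vocab = lda_gibbs(docs, num_topics=2)
    print(top_words(nkw, vocab))
```

Note the trade-off: without lemmatization, "gato" and "gatos" are counted as separate words, which can spread a topic's vocabulary thinner. Normalizing word forms tends to help, but the model runs fine without it.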

Cheers, Kilian

