How to filter for a specific language?

Hello. In my KNIME workflow I load some data taken from the internet. Most of it is in German, but it seems that there are some Russian and English texts mixed in with the data as well.

 

Is there any node that lets me separate the German words from words in other languages, so that I can filter out the other languages? The final result should be a Tag Cloud, but only with German words.

 

Thanks in advance.

Hi,

There is no node that detects the language of a text or filters words based on their language. Do your documents mix languages, or does each document contain words of only one language? If each document is in a single language, you can try counting the number of German stop words in each document. If the number is high, the document is likely to be in German, and you can keep only those documents.
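To illustrate the stop-word idea outside of KNIME, here is a minimal Python sketch. It is only an illustration under assumptions: the stop-word list is a tiny hand-picked sample, the 0.2 threshold is an arbitrary starting value, and the function names `german_stop_word_ratio` and `is_probably_german` are made up for this example. It uses the share of stop words rather than the raw count so that long and short documents are comparable.

```python
import re

# Tiny illustrative sample of German stop words (assumption: a real
# list would be much longer and should avoid words shared with English).
GERMAN_STOP_WORDS = {
    "der", "das", "und", "ist", "nicht", "ein", "eine", "ich", "zu",
    "mit", "auf", "für", "von", "den", "dem", "sich", "auch", "im",
}


def german_stop_word_ratio(text: str) -> float:
    """Return the fraction of tokens in `text` that are German stop words."""
    # \w+ matches Unicode word characters, so Cyrillic text is tokenized too.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in GERMAN_STOP_WORDS)
    return hits / len(tokens)


def is_probably_german(text: str, threshold: float = 0.2) -> bool:
    """Guess whether a document is German (threshold is an assumed value)."""
    return german_stop_word_ratio(text) >= threshold


documents = [
    "Das ist ein kurzer Text, und er ist nicht auf Englisch.",
    "This is a short text and it is not in German.",
    "Это короткий текст на русском языке.",
]

# Keep only the documents that look German.
german_only = [doc for doc in documents if is_probably_german(doc)]
```

With these three sample sentences, only the first one passes the filter. The threshold would need tuning on real data, and very short texts can easily be misclassified because they may contain no stop words at all.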

Cheers, Kilian
