Hello there. I am absolutely new to KNIME and pretty new to text processing in general.
I know there is some difference between stemming and lemmatization of words, although I'm not sure I totally understand it. I found the Snowball Stemmer node, but it does not work that well with the language my files are in (German). That's why I wanted to try lemmatization, but so far I haven't found any node for it, and Google doesn't help either.
Also, the POS Tagger node tags some of the verbs in the text as NN or NNS. Does anyone know why that could happen?
Please keep in mind that I am new to all of this, and sorry for the basic questions.
Thanks in advance for any help.
Stemming is the reduction of words to their word stem, e.g. swims and swimming to swim. Lemmatization instead maps each word to its dictionary form (lemma), which also covers irregular forms that suffix stripping cannot reach, e.g. swam to swim or better to good. There are no lemmatizer nodes available in KNIME Text Processing. The Snowball Stemmer node uses the Snowball stemming library (http://snowball.tartarus.org/), which provides stemmers for various languages. You need to select the language of your texts in the dialog of the node. To open the dialog of a node, double-click the node, or open the context menu (right-click on the node) and click Configure.
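To make the difference between stemming and lemmatization a bit more concrete, here is a small Python sketch. It is only a toy: the suffix list, the lemma dictionary, and the function names toy_stem and toy_lemmatize are made up for illustration, and none of this is the actual Snowball algorithm or anything KNIME runs internally. It just shows that a stemmer strips endings by rule, while a lemmatizer looks words up and so can handle irregular forms.

```python
# Toy illustration only: neither function is the Snowball algorithm or a KNIME API.

# Hypothetical suffix list for the toy stemmer, checked in this order.
SUFFIXES = ("ming", "ing", "s")

# Hypothetical hand-written lemma dictionary; real lemmatizers use large lexicons.
LEMMAS = {"swims": "swim", "swimming": "swim", "swam": "swim", "better": "good"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def toy_lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    lower = word.lower()
    return LEMMAS.get(lower, lower)


if __name__ == "__main__":
    # Print each word with its toy stem and toy lemma side by side.
    for w in ("swims", "swimming", "swam", "better"):
        print(w, toy_stem(w), toy_lemmatize(w))
```

The rule-based stemmer handles the regular forms but leaves swam and better unchanged, while the dictionary lookup resolves them. The price is that a lemmatizer needs a lexicon for each language, which is why it is the harder tool to come by.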
The POS Tagger node works only on English texts, which is most likely why some German verbs end up tagged as NN or NNS. For German texts you can use the Stanford Tagger node instead. It provides models for English, German, and French, and in the dialog of the node you can specify the model (language) you want to use.