Taggers vs. Tokenizers? How options are named in the POS Tagger node configuration window

Greetings! I’m a newbie, learning on my own through full immersion in KNIME. (So forgive me if I’m in the wrong place or not doing something right or…)

I noticed in the POS Tagger node configuration window that there is an option called “Word tokenizer”. Why isn’t this option called, say, a “Token tagger” or, just simply, a “POS tagger”?

If I have already converted a string-type to a document-type, the document-type has already been tokenized (but then may need some sort of tagging). Yes? In fact, the last option in the String to Document node is to choose a tokenizer and this option is labeled correctly. Then I could use the POS Tagger node to tag the tokens in each of the documents, but the option is labeled incorrectly (IMHO).

Am I missing something, or is the “Word tokenizer” option in the POS Tagger node just confusing me?

Hi @rdarrenstanley,

the option is labelled correctly. The tagger nodes sometimes need to re-tokenize some words or tokens, which is why the setting is there, but it is largely irrelevant for the POS Tagger, so you can just ignore it. The POS Tagger has no additional option for the tagging itself because it simply uses the English model provided by the OpenNLP library. Another part-of-speech tagger is the Stanford Tagger node, which provides multiple models for POS tagging based on the StanfordNLP library.
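In case it helps to see tokenizing and tagging as two separate steps, here is a minimal Python sketch. It is not KNIME or OpenNLP code: the `tokenize` and `tag` functions and the tiny `TOY_LEXICON` are made up purely for illustration, and a real tagger would use a trained model rather than a lookup table.

```python
import re

# Hypothetical toy lexicon for illustration only; a real POS tagger
# (e.g. one backed by a trained model) does not work from a fixed table.
TOY_LEXICON = {
    "the": "DT", "cat": "NN", "sat": "VBD",
    "on": "IN", "mat": "NN", ".": ".",
}

def tokenize(text):
    """Step 1: split raw text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def tag(tokens):
    """Step 2: attach a POS tag to each existing token.

    Unknown words fall back to 'NN' (an arbitrary choice for this sketch).
    The tokens themselves are not changed.
    """
    return [(tok, TOY_LEXICON.get(tok.lower(), "NN")) for tok in tokens]

tokens = tokenize("The cat sat on the mat.")
tagged = tag(tokens)

print(tokens)
print(tagged)
```

The tagger only consumes tokens that already exist, which matches your intuition about the document type. A tokenizer setting on a tagger node only matters in the cases where terms have to be split again.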

Hope this helps.

Best regards,
Julian
