How to filter for tags created by Stanford tagger node?

myusername · February 12, 2015, 3:26pm

Hello there. The title basically asks the question: I have a workflow in which, at some point, I have to use the Stanford tagger to assign part-of-speech tags to my texts. My texts are in German. The Stanford tagger node is using the "German fast" tagger model.

Now my problem is that I want to filter the tagged words, for example to keep only nouns and verbs.

In a previous setup for English I used the POS tagger node and later applied the POS filter node. As this approach does not work for German texts, I think I have to use the Stanford tagger node, but is there something like the POS filter node for Stanford tags as well?

I am fairly new to text mining and this field, so sorry if the answer to this question is commonly known already. Google didn't get me any answers.

Also one more question: I’m running KNIME on 64-bit Windows with 16 GB of RAM, of which KNIME has 10 GB available through the -Xmx setting in knime.ini.
Still, it always freezes when I use some of the preprocessing nodes, such as the punctuation filter or bag of words. I am running a data set of about 98 thousand forum posts through them. Shouldn’t that be possible without any problem?

Thanks in advance.

myusername · February 16, 2015, 11:10am

I just found the General Tag Filter node. It seems to offer filtering for STTS tags, although I haven't tried it yet. Is this node the way to go?

kilian.thiel · February 18, 2015, 12:13pm

Hi,

The Stanford tagger node supports tagging of English, German, and French texts. Each language has its own tag set, e.g. the Penn Treebank tag set for English and the STTS tag set for German. For each language a different Stanford POS tagging model is used, which applies the proper tag set.

To filter German POS tags, use the STTS Filter node; for English tags, the POS Filter node; and for French tags, the French Treebank Filter node. You can also use the General Tag Filter node, which can filter based on all available tag sets.
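
To make concrete what such a filter does with STTS tags, here is a minimal sketch in plain Python. It is not KNIME code and does not use any KNIME API. It assumes the tagger output is available as (word, tag) pairs, and the function name `filter_by_tags` and the sample sentence are made up for illustration. The tag names themselves (NN, NE, VVFIN, and so on) are standard STTS tags.

```python
# Conceptual sketch only: plain Python, not KNIME node code.
# Tokens are assumed to arrive as (word, STTS tag) pairs.

# STTS noun tags: common noun (NN) and proper noun (NE).
NOUN_TAGS = {"NN", "NE"}

# STTS verb tags: full (VV*), auxiliary (VA*), and modal (VM*) verbs.
VERB_TAGS = {
    "VVFIN", "VVIMP", "VVINF", "VVIZU", "VVPP",
    "VAFIN", "VAIMP", "VAINF", "VAPP",
    "VMFIN", "VMINF", "VMPP",
}


def filter_by_tags(tagged_tokens, allowed_tags):
    """Keep only the (word, tag) pairs whose tag is in allowed_tags."""
    return [(word, tag) for word, tag in tagged_tokens if tag in allowed_tags]


# Hypothetical tagger output for the sentence "Der Hund bellt laut".
tagged = [("Der", "ART"), ("Hund", "NN"), ("bellt", "VVFIN"), ("laut", "ADJD")]

# Keep nouns and verbs only, as asked in the original question.
nouns_and_verbs = filter_by_tags(tagged, NOUN_TAGS | VERB_TAGS)
print(nouns_and_verbs)
```

In the STTS Filter or General Tag Filter node, the equivalent step would be selecting the noun and verb tags in the node dialog rather than writing code.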

Cheers, Kilian
