I can't locate German language models in the taggers / tokenizers

Hi there,

I'm trying a bit of text mining in German. However, contrary to what I have read in several forum posts, it seems that no German language models are installed by default.

I find neither a German tokenizer in the Strings to Document node nor a German model in the Stanford Tagger, although this post indicated it should be there: POS Tagger German - #2 by Kathrin

See these screenshots:

[Screenshot: KNIME preferences]

[Screenshot: Stanford Tagger (only “English”)]

I am not sure I remember this correctly (the last German text mining I did was a while back), but do I have to install these languages somehow through the preferences?

Thank you in advance.

In Germany we have a proverb:

You have an advantage if you’re able to read.

The description of the Stanford Tagger Node describes it pretty clearly:

[Screenshot: description of the Stanford Tagger node]

I installed the German language pack (KNIME Textprocessing German Language Pack – KNIME Community Hub) and everything now works just fine.

Sorry for this unnecessary post.

What are you mining?
br

Basically purchase order texts for some colleagues who want to find some patterns in what they order from vendors.

The workflow is pretty basic, though: it only calculates the term frequency (TF) and then filters out every term below a certain threshold.
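For anyone who wants to see the idea outside of KNIME, here is a rough Python sketch of that TF-then-threshold step. It is not what the KNIME nodes do internally. The function names, the whitespace tokenization, the sample order text and the 0.2 threshold are all made up for illustration, and it assumes relative TF (term count divided by document length).

```python
from collections import Counter


def relative_tf(tokens):
    """Relative term frequency: occurrences of a term / number of tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


def filter_by_tf(tf, min_tf):
    """Keep only the terms whose TF is at or above the threshold."""
    return {term: value for term, value in tf.items() if value >= min_tf}


# Hypothetical purchase order text, tokenized naively on whitespace.
tokens = "schraube m8 schraube mutter m8 schraube dichtung".split()

tf = relative_tf(tokens)
kept = filter_by_tf(tf, min_tf=0.2)  # 0.2 is an arbitrary example threshold
print(sorted(kept))
```

With this sample only the terms that make up at least 20% of the tokens survive the filter. A real workflow would use a proper German tokenizer instead of `split()`.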
