Text Mining for Dutch Language

Hi Knimers,

I was just wondering whether there are any possibilities (yet) for mining Dutch texts?

Thanks,

Kees

Hi Kees,

Which use case are you looking at specifically? A number of nodes in the Text Processing group can already work with Dutch (e.g. the Snowball Stemmer node or the Hyphenator node) or are language-agnostic. Other nodes are indeed English-specific. Is there anything you are missing in particular?

Cheers,
Marco.

Hi Marco,

Most important, I think, would be the enrichment possibilities: part-of-speech tagging and named entity recognition. With most other things you can somehow find your way, but these two I think are essential.

Thanks,

Kees

Hi Kees,

Unless you can get it done with a generic node, like the Dictionary Tagger, I am afraid that implementing specific Dutch support for those functionalities would require some custom coding (i.e. creating some sort of custom tagger node). Is that a possibility?
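
To give a rough idea of what dictionary-based tagging amounts to, here is a minimal Python sketch. It is not how the Dictionary Tagger node is implemented; the dictionary contents, the tag names and the `tag_terms` helper are all made up for illustration.

```python
import re

# Hypothetical mini-dictionary mapping lowercase terms to tags.
# A usable dictionary for Dutch would be much larger.
DUTCH_ENTITIES = {
    "amsterdam": "LOCATION",
    "rotterdam": "LOCATION",
    "philips": "ORGANIZATION",
}


def tag_terms(text, dictionary):
    """Return (token, tag) pairs; tokens not in the dictionary get tag None."""
    tokens = re.findall(r"\w+", text)
    return [(tok, dictionary.get(tok.lower())) for tok in tokens]


tagged = tag_terms(
    "Philips is opgericht in Eindhoven, niet in Amsterdam.", DUTCH_ENTITIES
)
for token, tag in tagged:
    print(token, tag)
```

Note that "Eindhoven" stays untagged because it is not in the dictionary, which is exactly the limitation of this approach: it only finds what you list up front.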

Cheers,
Marco.

Hi Marco,

You mean having someone build it for me? That is not an option, I am afraid.

Thanks,

Kees

Hi Kees,

Just an idea, but maybe one of the many good Dutch IT students could work on this as part of their Bachelor's or Master's thesis.

Cheers,
Marco.

The Snowball Stemmer supports the Dutch language. Dutch stopword lists can be found on the internet and added to the stop word filter node. I think these are the most important elements. I haven't found a Dutch POS tagger or free Dutch sentiment dictionaries so far.
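
For anyone who wants to see the stopword step outside KNIME, this is roughly what it boils down to, as a small Python sketch. The stopword set here is a tiny hand-picked sample, not a complete list, and `remove_stopwords` is just an illustrative helper name. Stemming is left out, since it would need an extra library.

```python
import re

# Tiny illustrative sample; a real Dutch stopword list has many more entries.
DUTCH_STOPWORDS = {"de", "het", "een", "en", "van", "in", "is", "op", "dat", "niet"}


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into word tokens and drop the stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


filtered = remove_stopwords("De kat zit op de mat en het is niet koud", DUTCH_STOPWORDS)
print(filtered)
```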

Thanks for your suggestions. 

Kees