Categorize text documents

Hi everybody,

So far I have been a fairly basic user of KNIME: I have done some simple things and I am familiar with general usage, such as building workflows. Now I want to dig a bit deeper and try out more of its incredible capabilities.

I have a bunch of text files that are all “categorized”, meaning each document has exactly one topic assigned.

I now want to train a model on the existing files and then have KNIME use that model to classify a bunch of untagged files.

I tried to find a good starting point to learn how to do this, but to be honest my research turned up so many results, plugins, and scenarios that I was kind of lost as to what fits my problem.

Would you experts have any recommendation on where to start? This would be highly appreciated!
(I don’t expect a full solution; a link or tutorial would be enough for me to learn from. But if there is some kind of blueprint or something similar, that would be nice too…)

Thanks a lot in advance!

Best wishes,
Jo

Hi Jo,

Here are two pointers to existing threads that should help you get started:

Let us know how it goes!

Philipp

First of all, you will want to be familiar with the KNIME Textprocessing extension. We recently released a series of short videos on our KNIMETV YouTube channel to help familiarize folks with the extension:

You can also find several examples of workflows for document classification on the Hub. Here’s one such example:
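Separately from the Hub workflows, and just for orientation: the train-then-predict idea behind document classification can be sketched in a few lines of plain Python. This is not a KNIME workflow and not what the Textprocessing nodes do internally. It is a hand-rolled bag-of-words Naive Bayes classifier, chosen here only because it is short, and the function names and the four toy documents are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Crude preprocessing: lowercase, keep letters only, split on whitespace.
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def train(docs):
    """docs is a list of (text, topic) pairs; returns the counts Naive Bayes needs."""
    topic_counts = Counter()            # number of documents per topic
    word_counts = defaultdict(Counter)  # word frequencies per topic
    vocab = set()
    for text, topic in docs:
        topic_counts[topic] += 1
        for tok in tokenize(text):
            word_counts[topic][tok] += 1
            vocab.add(tok)
    return topic_counts, word_counts, vocab

def classify(model, text):
    """Return the topic with the highest log-probability for an untagged text."""
    topic_counts, word_counts, vocab = model
    total_docs = sum(topic_counts.values())
    best_topic, best_score = None, -math.inf
    for topic, n_docs in topic_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[topic].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no information
            # Add-one (Laplace) smoothing avoids log(0) for unseen topic/word pairs.
            score += math.log((word_counts[topic][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic

# Toy labeled documents (invented for this sketch), one topic each.
labeled = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell after the report", "finance"),
]
model = train(labeled)
print(classify(model, "a goal in the final match"))
print(classify(model, "interest rates and stock prices"))
```

In a KNIME workflow the same steps (preprocess the text, turn documents into word counts, train a learner, apply a predictor to the untagged files) would be separate nodes rather than functions.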

That was a great series. Would love to see more on the KNIME channel.

Dear Philipp, dear Scott,

This seems very helpful and helps me a lot in narrowing down where to start!
Thanks a lot, I highly appreciate it!
