Text Processing Workflow

SteveO · June 4, 2019, 11:39am

My apologies if this is in the wrong category; please feel free to move it if so.

I have been doing Udemy courses on KNIME and watched the webinar on text mining, but the screen in the webinar video is too small to see what’s being done.

Anyway, I wonder if anyone can point me in the direction of a good workflow for the following:

I have 10,000 phrases; some are very similar to each other, some very different. We have three levels of categories that the phrases fit into. We have manually categorised 1,000 of the phrases and plan to do more. This would allow (I assume) a training and test set using just those 1,000 phrases.

From what I understand, I can start from an Excel file reader that shows the phrase, the level 1 category, the level 2 subcategory, and the level 3 subcategory, then partition that and perhaps use a decision tree learner and predictor? I just can’t see how to represent a hierarchy of categories in there. Also, how a phrase should be classified depends more on the phrase itself (e.g. character count, occurrences of certain words, etc.) than on any other data.

Any pointers would be much appreciated.

Thank you kindly,

Steve

ScottF · June 4, 2019, 2:13pm

Hi @SteveO and welcome to the forum!

Here is a relatively simple workflow on our Workflow Hub that might get you started. There are several other workflows on the Hub for this type of analysis: some use word vectors, some use neural networks, and there are many other approaches.

You might also check out a very good book on text analysis we have available in KNIME Press called From Words to Wisdom, which includes several workflows to reinforce NLP concepts.
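
To make the hierarchy question above a little more concrete, here is a minimal sketch in plain Python (not KNIME, and not the linked workflow). It assumes one simple way to handle three category levels: join them into a single "path" label, train an ordinary text classifier on word occurrences, and split the predicted path back into levels. The separator, the function names, the sample phrases, and the choice of a small Naive Bayes classifier instead of a decision tree are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

SEP = " > "  # hypothetical separator used to join the three category levels


def tokenize(phrase):
    """Lowercase the phrase and split on whitespace (deliberately naive)."""
    return phrase.lower().split()


def train(rows):
    """Fit a multinomial Naive Bayes model.

    rows: iterable of (phrase, level1, level2, level3) tuples.
    """
    label_counts = Counter()            # number of phrases per path label
    word_counts = defaultdict(Counter)  # word occurrences per path label
    vocab = set()
    for phrase, l1, l2, l3 in rows:
        label = SEP.join((l1, l2, l3))  # flatten the hierarchy into one label
        label_counts[label] += 1
        for tok in tokenize(phrase):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, phrase):
    """Return the most likely (level1, level2, level3) tuple for a phrase."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior of this path label
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(phrase):
            if tok in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothed log likelihood
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return tuple(best_label.split(SEP))


# Made-up stand-ins for the manually categorised phrases.
labelled = [
    ("reset my password", "Account", "Login", "Password"),
    ("forgot password again", "Account", "Login", "Password"),
    ("cannot login with username", "Account", "Login", "Username"),
    ("refund for my order", "Billing", "Payments", "Refund"),
    ("order refund not received", "Billing", "Payments", "Refund"),
    ("invoice shows wrong amount", "Billing", "Invoices", "Amount"),
]

model = train(labelled)
print(predict(model, "password reset please"))
```

Because the path label always carries all three levels together, a prediction can never pair a level 3 subcategory with the wrong parent. The trade-off is that rare paths get few training examples, so with only 1,000 labelled phrases it may be worth checking how many phrases each full path actually has before relying on this approach.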

SteveO · June 4, 2019, 2:22pm

Oh brilliant, thank you so much, that’s really helpful!

May I ask: if I get stuck beyond my understanding, is there a way to hire someone for consulting?

ScottF · June 4, 2019, 2:25pm

We have trusted consulting partners available that we can point you to, depending on your location. A short list is here: https://www.knime.com/knime-trusted-partners

If you decide that’s something you need, please reach out and we can get you connected.

SteveO · June 4, 2019, 2:27pm

Great stuff, thanks!
