My apologies if this is in the wrong category; please feel free to move it if so.
I have been doing Udemy courses on KNIME and watched the webinar on text mining, but the screen in the webinar video is too small to see what's being done.
Anyway, I wonder if anyone can point me in the direction of a good workflow for the following:
I have 10,000 phrases; some are very similar to each other, some very different. We have three levels of categories that the phrases fit into. We have manually categorised 1,000 of the phrases and plan to do more. This would allow (I assume) a training and test set built from just those 1,000 phrases.
From what I understand, I can start with an Excel file reader that shows the phrase, the level 1 category, the level 2 subcategory, and the level 3 subcategory, then partition that and perhaps use a decision tree learner and predictor? It's just that I can't see how to represent a hierarchy of categories in there. Also, how a phrase should be classified depends more on the phrase itself (i.e. character count, occurrences of certain words, etc.) than on any other data.
Here is a relatively simple workflow at our Workflow Hub that might get you started. There are several other workflows on the Hub that perform this type of analysis. Some of them use word vectors, some use neural networks; there are many different approaches.
You might also check out a very good book on text analysis we have available in KNIME Press called From Words to Wisdom, which includes several workflows to reinforce NLP concepts.
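On the question of how to handle the three-level hierarchy, one common pattern is to train one classifier per node: a model for level 1, then a separate model for the level 2 choices under each level 1 category, and so on. Below is a minimal sketch of that idea in plain Python rather than KNIME nodes. It uses a small word-count naive Bayes classifier instead of a decision tree, purely to keep the example free of dependencies. The example rows, category names, and the `NaiveBayes` and `HierarchicalClassifier` classes are all made up for illustration; they do not come from any KNIME workflow.

```python
import math
from collections import Counter, defaultdict


def tokenize(phrase):
    # Very simple bag-of-words tokenizer: lowercase and split on whitespace.
    return phrase.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, phrases, labels):
        for phrase, label in zip(phrases, labels):
            self.label_counts[label] += 1
            for word in tokenize(phrase):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, phrase):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(phrase):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class HierarchicalClassifier:
    """One classifier per parent path: () for level 1, (level1,) for level 2, etc."""

    def fit(self, rows):
        groups = defaultdict(lambda: ([], []))
        for phrase, *cats in rows:
            for depth in range(len(cats)):
                parent = tuple(cats[:depth])
                groups[parent][0].append(phrase)
                groups[parent][1].append(cats[depth])
        self.models = {
            parent: NaiveBayes().fit(phrases, labels)
            for parent, (phrases, labels) in groups.items()
        }
        return self

    def predict(self, phrase, depth=3):
        # Walk down the hierarchy, letting each prediction pick the next model.
        path = ()
        while len(path) < depth and path in self.models:
            path += (self.models[path].predict(phrase),)
        return path


# Hypothetical labelled rows: (phrase, level 1, level 2, level 3).
rows = [
    ("refund for broken phone", "Billing", "Refunds", "Damaged item"),
    ("refund for late delivery", "Billing", "Refunds", "Late delivery"),
    ("invoice shows wrong amount", "Billing", "Invoices", "Wrong amount"),
    ("cannot log in to account", "Technical", "Login", "Password"),
    ("password reset email missing", "Technical", "Login", "Password"),
    ("app crashes on start", "Technical", "Crashes", "Startup"),
]

model = HierarchicalClassifier().fit(rows)
print(model.predict("refund for broken screen"))
```

Because each lower-level model only sees the phrases that belong to its parent category, a predicted subcategory can never contradict the predicted level 1 category. The trade-off is that a mistake at level 1 carries through to the levels below it, so it is worth checking accuracy per level on the held-out part of the 1,000 labelled phrases.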