My apologies if this is in the wrong category; please feel free to move it if so.
I have been doing Udemy courses on KNIME and watched the webinar on text mining, but the screen in the webinar video is too small to see what's being done.
Anyway, I wonder if anyone can point me in the direction of a good workflow for the following:
I have 10,000 phrases; some are very similar to each other, some very different. We have three levels of categories that the phrases fit into. We have manually categorised 1,000 of the phrases and plan to do more. I assume this would allow a training and test set built from just those 1,000 phrases.
From what I understand, I can start from an Excel file reader that shows the phrase, the level 1 category, the level 2 subcategory, and the level 3 subcategory, then partition that and perhaps use a decision tree learner and predictor? It's just that I can't see how to have a hierarchy of categories in there, and how a phrase should be classified depends on the phrase itself (e.g. character count, occurrences of certain words, etc.) rather than on other data.
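To make the hierarchy part of the question concrete, here is a minimal sketch in plain Python (not a KNIME workflow) of one possible approach: train one classifier for level 1, then a separate classifier for each parent category at levels 2 and 3, and predict top-down. Everything in it is an assumption for illustration: the toy phrases and category names are made up, `NaiveBayes` and `HierarchicalClassifier` are hypothetical helper classes, and a simple word-count Naive Bayes stands in for the decision tree mentioned above.

```python
import math
from collections import Counter, defaultdict


def tokenize(phrase):
    """Lower-case bag of words; the phrase itself is the only feature source."""
    return phrase.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, phrases, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for phrase, label in zip(phrases, labels):
            self.word_counts[label].update(tokenize(phrase))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, phrase):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in tokenize(phrase):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class HierarchicalClassifier:
    """One model per parent node: level 1, then one per level-1 value, and so on."""

    def fit(self, rows):
        # rows: (phrase, level1, level2, level3)
        groups = defaultdict(list)
        for phrase, *levels in rows:
            for depth, label in enumerate(levels):
                parent = tuple(levels[:depth])  # () for level 1
                groups[parent].append((phrase, label))
        self.models = {
            parent: NaiveBayes().fit(*zip(*pairs))
            for parent, pairs in groups.items()
        }
        return self

    def predict(self, phrase, depth=3):
        path = ()
        while len(path) < depth and path in self.models:
            path += (self.models[path].predict(phrase),)
        return path


# Made-up toy data standing in for the manually categorised phrases.
rows = [
    ("reset my password", "Account", "Login", "Password"),
    ("forgot password again", "Account", "Login", "Password"),
    ("cannot log in with username", "Account", "Login", "Username"),
    ("username not recognised", "Account", "Login", "Username"),
    ("refund for late delivery", "Orders", "Delivery", "Late"),
    ("parcel delivery is late", "Orders", "Delivery", "Late"),
    ("delivery arrived damaged", "Orders", "Delivery", "Damaged"),
    ("box was damaged on arrival", "Orders", "Delivery", "Damaged"),
]

clf = HierarchicalClassifier().fit(rows)
print(clf.predict("my parcel is late"))  # ('Orders', 'Delivery', 'Late')
```

A simpler alternative would be to join the three levels into a single label such as "Orders/Delivery/Late" and train one classifier on that, at the cost of needing enough examples for every full path.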
Any pointers would be much appreciated.
Thank you kindly,
Steve