Hi Peter,
I definitely suggest also trying the Palladian Text Classifier nodes. They are (at least) a strong baseline and, compared to KNIME's rather sophisticated and heavyweight Text Processing nodes, super simple to set up (two nodes: one learner, one predictor) and fast. The preprocessing can be configured for different n-gram settings, and the classifier uses an optimized Naive Bayes (NB) scoring algorithm.
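To give a rough feel for the general technique (n-gram features plus Naive Bayes scoring), here is a minimal Python sketch. It is not Palladian's actual implementation or its optimized scoring; the function names `char_ngrams`, `train` and `predict`, the use of character n-grams of length 1 to 3, and the add-one smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (lowercased)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def train(samples, n_min=1, n_max=3):
    """'Learner' step: count n-gram occurrences per category."""
    counts = defaultdict(Counter)  # category -> n-gram -> count
    docs = Counter()               # category -> number of training texts
    for text, category in samples:
        counts[category].update(char_ngrams(text, n_min, n_max))
        docs[category] += 1
    vocab = {gram for counter in counts.values() for gram in counter}
    return {"counts": counts, "docs": docs, "vocab": vocab, "n": (n_min, n_max)}


def predict(model, text):
    """'Predictor' step: return the category with the highest log-probability."""
    n_min, n_max = model["n"]
    grams = char_ngrams(text, n_min, n_max)
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best, best_score = None, -math.inf
    for category, counter in model["counts"].items():
        total = sum(counter.values())
        # Log prior of the category.
        score = math.log(model["docs"][category] / total_docs)
        # Log likelihood of each n-gram, with add-one (Laplace) smoothing
        # so that unseen n-grams do not zero out the whole score.
        for gram in grams:
            score += math.log((counter[gram] + 1) / (total + vocab_size))
        if score > best_score:
            best, best_score = category, score
    return best


# Toy usage: a tiny, made-up language identification training set.
samples = [
    ("this is a simple english sentence", "en"),
    ("i do not have any time today", "en"),
    ("das ist ein einfacher deutscher satz", "de"),
    ("ich habe heute keine zeit", "de"),
]
model = train(samples)
print(predict(model, "where is the train station"))
print(predict(model, "wo ist der bahnhof"))
```

With only four training sentences the predictions are not reliable; the point is just the two-step learner/predictor structure and how the n-gram range acts as the main preprocessing knob.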
More details here:
We (and our customers) use this classifier for a wide variety of text classification tasks (e.g. sentiment analysis, product classification, language identification, …). It’s certainly not the right tool if you want to win today’s Kaggle challenges, where you optimize for the last tenth of a percent of accuracy, but it is definitely a pragmatic tool for real-world use cases.
If you have any questions about the classifier, let me know.
– Philipp
PS: I built a workflow to train a simple language detection model a while ago. It’s still available here:
https://www.knime.com/book/text-classifier