I am trying to classify text (ranging from one word to a few sentences) into ~8 different classes. I am using the textclassifierlearner and textclassifierpredictor nodes. The classification isn't very accurate, even though my learner has ~1300 examples of classified text. Do you have any tips to increase my accuracy, or any suggestions for another node that would work better?
In the following, I assume that you're talking about the Palladian text classifier.
- Have you played with different feature settings?
- Have you tried different preprocessing options?
- Have you tried different scorers?
- Are the classes equally balanced?
- How do you assess accuracy? Do you just look over the results, or do you evaluate on a separate test set?
- Have you checked for frequent confusions? (Understanding which categories are often confused with each other helps you optimize your settings.)
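To make the last three checks concrete, here is a minimal Python sketch that runs outside KNIME. It assumes you have exported the actual and predicted labels of a held-out test set as two lists. The label values and function names are made up for illustration; none of this is part of the Palladian nodes.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the labelled examples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def accuracy(actual, predicted):
    """Fraction of test examples whose predicted class matches the actual one."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def top_confusions(actual, predicted, n=3):
    """Most frequent (actual, predicted) pairs among the misclassified examples."""
    errors = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return errors.most_common(n)


# Hypothetical labels from a held-out test set (not real data).
actual = ["billing", "billing", "shipping", "shipping",
          "returns", "returns", "returns", "billing"]
predicted = ["billing", "shipping", "shipping", "shipping",
             "billing", "returns", "billing", "billing"]

print(class_balance(actual))
print(accuracy(actual, predicted))
print(top_confusions(actual, predicted))
```

If one pair dominates the confusion list, those two categories are the place to focus when tuning features or adding training examples.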
In general, despite its simplicity, the Palladian text classifier (PTC) works very well compared to much more complicated setups. But if your goal is to squeeze out the last few percent or per mille of accuracy, the PTC is probably not the right tool for you.
Can you describe in more detail what kind of data you want to classify?
Can we use the Palladian Text Classifier when we have 5 classes? For example, we want to predict whether a sentence is assertive, directive, commissive, declarative, or expressive.