I have created a text classification model. Initially, I used the Naive Bayes Learner, but it appears I can only include 20 prediction classes, and I have 56. Does that rule it out?
So, I turned to the Decision Tree Learner. It has high predictive accuracy on the training set. However, on the unseen set, it makes a lot of predictions that really should be NULL. The unseen set contains a lot of job titles that are not in the training set. I had hoped that the presence of words already seen in the training set would allow for an accurate prediction. Is there a way for me to not classify a title if the score is low? I would prefer no classification to an incorrect one. That said, the model is also making a lot of accurate classifications in the unseen set. Is there a method to apply a threshold?
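To make the thresholding idea concrete, here is a minimal sketch of what I have in mind, written in plain Python rather than in the tool itself. It assumes the learner can output a probability (or confidence score) for each class; the function name `predict_or_abstain`, the example job-title labels, and the 0.6 cutoff are all made up for illustration.

```python
from typing import Dict, Optional


def predict_or_abstain(class_probs: Dict[str, float],
                       threshold: float = 0.6) -> Optional[str]:
    """Return the most probable class, or None if its score is below threshold.

    class_probs is assumed to map each class label to the model's
    probability for one title. The 0.6 default is an arbitrary example.
    """
    if not class_probs:
        return None  # nothing to choose from, so abstain
    best_label, best_prob = max(class_probs.items(), key=lambda kv: kv[1])
    return best_label if best_prob >= threshold else None


# Hypothetical scores for two unseen titles.
confident = predict_or_abstain({"Engineer": 0.85, "Nurse": 0.10, "Teacher": 0.05})
unsure = predict_or_abstain({"Engineer": 0.40, "Nurse": 0.35, "Teacher": 0.25})
```

In this sketch the first title would keep its predicted class, and the second would be left unclassified (NULL) because no class reaches the cutoff. The cutoff itself would presumably need tuning on a held-out set.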
Thanks in advance.