Model is not working on same kind of new data

text-processing
#1

Hi
I’m working on text classification. I have successfully trained my model and I’m getting about 91% accuracy. However, when I run that model on new data of the same kind, it doesn’t work and gives only one constant prediction.
I’m uploading my workflow, please have a look at it and tell me where I’m going wrong.
Any help would be appreciated.
https://mega.nz/#!HypV0a4A!-FTyMPbnrndkbG0SeVql0FctS3k_aCeSw-iSqV8Wo-I

0 Likes

#2

@armingrudd @ipazin @kilian.thiel, please look at the mentioned problem.

0 Likes

#3

Hi @MajidAbbasi

I took a look at your workflow, and to me it looks OK. I tested your model (added a validation partition) and I don’t think it’s overfitting. But when I looked at the column names from your Document Vector, I found that your model is built on 1290 features. Only 956 of those are present in the file you apply your model to (334 are missing). On top of that, the file you apply your model to contains 1630 new/unknown columns/features. I’m not sure, but that may be a cause of the poor result when you apply your model.

So the two datasets are not that similar.
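To make the feature mismatch concrete, here is a small sketch in plain Python (not KNIME nodes, just an illustration). It builds the vocabulary from the training documents only, reports which terms are shared, missing, or unseen in the new documents, and then vectorizes the new documents against the training vocabulary so the columns line up with what the model was trained on. The function names and the tiny example texts are made up for this illustration, and the whitespace tokenizer is an assumption that stands in for your real preprocessing.

```python
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase + whitespace split stands in for the
    # real preprocessing chain used in the workflow.
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of distinct terms; each term is one feature column.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def compare_vocabularies(train_vocab, new_vocab):
    # Report how the feature sets of the two datasets overlap.
    train, new = set(train_vocab), set(new_vocab)
    return {
        "shared": sorted(train & new),
        "missing_in_new": sorted(train - new),
        "unseen_in_training": sorted(new - train),
    }


def vectorize(doc, vocabulary):
    # Count terms using ONLY the given vocabulary: terms the vocabulary
    # does not contain are dropped, absent terms become 0.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]


# Hypothetical example data.
train_docs = ["cheap flights to rome", "rome hotel deals"]
new_docs = ["cheap hotel in paris"]

train_vocab = build_vocabulary(train_docs)
report = compare_vocabularies(train_vocab, build_vocabulary(new_docs))

# New documents are mapped onto the training columns, in the same order.
new_vectors = [vectorize(doc, train_vocab) for doc in new_docs]

print(report)
print(new_vectors)
```

The point of the sketch is that the new data gets exactly the columns the model was trained on, in the same order. If the new data is vectorized on its own, it ends up with its own set of columns, which would match the kind of mismatch described above.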
Hope this helps (a little…),
gr. Hans

2 Likes

#4

Thank you so much @HansS.
But what should I do to make the two datasets match? I trained my model on one text, and I am processing the new data in the same way as I processed the training data. Where could I be going wrong?

0 Likes