Sorry for the basic question, I am a complete beginner. I created a logistic regression model (90%). I tried to use it on new data, but it doesn't really work correctly. In the test data I used 70000 lines, and 4200 of them belonged to one of the groups (binary). The new data contains 380000 lines, and only 50 were categorised as the mentioned category. There should be approximately 20000…
I used the same format for the test data and the new data.
Many thanks in advance for your suggestions.
Your binary event appears to be fairly rare (6% in the training dataset). Rare events can be tough to model because the algorithm can essentially do nothing, that is, always predict the majority class, and still be correct 94% of the time, which can still be considered a good model (depending on the circumstances, of course).
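To make that arithmetic concrete, here is a rough Python sketch using only the counts from the question. The variable names are mine, and the "model" is just a hypothetical baseline that always predicts the majority class, not the actual logistic regression:

```python
# Counts taken from the question (70000 rows, 4200 in the rare class).
n_rows = 70_000
n_positive = 4_200

# Share of rows that belong to the rare class.
prevalence = n_positive / n_rows

# Hypothetical baseline: always predict the majority (negative) class.
# It gets every negative right and every positive wrong.
baseline_accuracy = (n_rows - n_positive) / n_rows
baseline_recall = 0 / n_positive  # no rare-class row is ever found

# The new data: 380000 rows, 50 predicted positive, roughly 20000 expected.
new_rows = 380_000
predicted_rate = 50 / new_rows
expected_rate = 20_000 / new_rows

print(f"prevalence:        {prevalence:.1%}")
print(f"baseline accuracy: {baseline_accuracy:.1%}")
print(f"baseline recall:   {baseline_recall:.1%}")
print(f"predicted rate on new data: {predicted_rate:.3%}")
print(f"expected rate on new data:  {expected_rate:.1%}")
```

This is why accuracy alone can be misleading here: it is worth also checking how many of the rare-class rows the model actually finds (recall) in a confusion matrix.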
I think you may want to consider oversampling your data set. KNIME includes a node called SMOTE that can do this (see the SMOTE page on NodePit).
There are also many articles on the concept if you type “oversampling logistic regression” into your search engine of choice.
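If it helps to see the idea outside of KNIME, below is a minimal Python sketch of plain random oversampling. Note that this is a simpler technique than SMOTE: it duplicates existing rare-class rows instead of synthesizing new ones. The function name `random_oversample` and the toy data are made up for illustration, and the 94/6 split only mimics the 6% rate from the question:

```python
import random


def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size.

    Illustrative helper (not a KNIME or library function); assumes the
    minority class really is the smaller one.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no rows with the minority label")

    # Draw (with replacement) as many extra minority rows as are missing.
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]

    # Shuffle so the duplicated rows are not all at the end.
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy training set: 94 negatives and 6 positives, one made-up feature per row.
labels = [0] * 94 + [1] * 6
rows = [[float(i)] for i in range(100)]

X_bal, y_bal = random_oversample(rows, labels)
print(len(y_bal), sum(y_bal))  # total rows, and how many are positive
```

Oversampling should only be applied to the data used for training. The test data and the new data should keep their original class balance, otherwise the evaluation no longer reflects what the model will see in practice.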