logistic regression model for new data

Hi,

Sorry for the basic question, I am a complete beginner. I created a logistic regression model (90%). I tried to use it on new data, but it doesn't really work correctly. In the test data I used 70000 lines, and 4200 of them belonged to one of the groups (binary). The new data contains 380000 lines, and only 50 were categorised as the mentioned category. There should be approx. 20000…
I used the same format for the test data and the new data.
Many thanks in advance for your suggestions :slight_smile:

You might want to check whether your model is overfitted to the training data.
Also make sure the new data is preprocessed the same way as the data the model was trained on.
br
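
One rough way to spot a preprocessing mismatch is to compare a column's mean in the new data against its mean and spread in the training data. The sketch below is only an illustration in plain Python, not a KNIME workflow: the function name `column_drift`, the example feature, and its values are all made up, and it assumes one feature was scaled to [0, 1] for training but left unscaled in the new data.

```python
from statistics import mean, pstdev


def column_drift(train_col, new_col):
    """Distance of the new column's mean from the training mean,
    measured in training standard deviations."""
    mu = mean(train_col)
    sigma = pstdev(train_col)
    if sigma == 0:
        # Constant training column: any shift at all is suspicious.
        return 0.0 if mean(new_col) == mu else float("inf")
    return abs(mean(new_col) - mu) / sigma


# Hypothetical values: the feature was scaled to [0, 1] before training ...
train_feature = [0.1, 0.4, 0.5, 0.9]
# ... but arrives unscaled in the new data.
new_feature = [23.0, 41.0, 37.0, 65.0]

drift = column_drift(train_feature, new_feature)
same = column_drift(train_feature, train_feature)
print(f"drift vs. training data: {drift:.1f} standard deviations")
```

A drift of many standard deviations on a column that should look similar in both tables suggests that a normalization or encoding step was applied to one table and not the other.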

Your binary event appears to be fairly rare (6% in the training dataset)… It can be tough to model these events because the algorithm can essentially do nothing (always predict the majority class) and still be correct 94% of the time, which can still be considered a good model (depending on the circumstances of course).
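
To make the arithmetic concrete, here is a small Python sketch using only the counts quoted in this thread (70000 rows with 4200 positives, and 380000 new rows with 50 predicted versus roughly 20000 expected). The function names are made up for illustration, and it assumes the rare group is the positive class.

```python
def majority_baseline_accuracy(n_total, n_positive):
    """Accuracy of a model that always predicts the majority (negative) class."""
    return (n_total - n_positive) / n_total


def positive_rate(n_total, n_positive):
    """Share of rows that belong to (or are predicted as) the rare class."""
    return n_positive / n_total


baseline = majority_baseline_accuracy(70_000, 4_200)  # always answer "no"
train_rate = positive_rate(70_000, 4_200)             # rare class in the labelled data
predicted_rate = positive_rate(380_000, 50)           # what the model flagged in the new data
expected_rate = positive_rate(380_000, 20_000)        # roughly what was expected

print(f"do-nothing accuracy:     {baseline:.1%}")
print(f"positives when training: {train_rate:.1%}")
print(f"positives predicted:     {predicted_rate:.3%}")
print(f"positives expected:      {expected_rate:.1%}")
```

An accuracy figure close to the do-nothing baseline says little about how well the rare class is detected, so it helps to look at how many rows of that class are actually predicted.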

I think you may want to consider oversampling your data set. KNIME includes a node called SMOTE that can do this (see the SMOTE page on NodePit).

There are also many articles on the concept if you type “oversampling logistic regression” into your favorite search engine.
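
For a feel of what oversampling does, here is a minimal Python sketch of plain random oversampling, which simply duplicates rare-class rows until the classes are balanced. This is not SMOTE (SMOTE creates new synthetic rows by interpolating between neighbouring rare-class rows) and it is not how the KNIME node is implemented. The function name `random_oversample` and the toy data are made up for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]

    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        if label == majority_label:
            continue
        # All original rows of this minority class.
        pool = [r for r, l in zip(rows, labels) if l == label]
        # Sample with replacement to fill the gap to the majority count.
        extra = rng.choices(pool, k=majority_count - count)
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels


# Toy data with the same 6% / 94% split as in this thread.
rows = [[float(i)] for i in range(100)]
labels = [1 if i < 6 else 0 for i in range(100)]

new_rows, new_labels = random_oversample(rows, labels)
balanced_counts = Counter(new_labels)
print(balanced_counts)
```

Oversampling should be applied to the training partition only. The test data and any new data should keep their natural class proportions, otherwise the evaluation no longer reflects what the model will see in practice.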

Thank you so much! The results look much better now. :slight_smile:
