I am having a real problem with unbalanced data. The models end up with good scores (i.e. “accuracy”), but essentially all the prediction models do is “put all chips on black”, and the cases that are actually white are simply misclassified.
I tried three different predictor models, plus SMOTE and Row Sampling, but no luck.
Anyone have any ideas or suggestions for very unbalanced data?
Thank you in advance!
I will attach the workflow and database in the next posts.
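To make the accuracy problem described above concrete, here is a minimal sketch in plain Python. The 95/5 class split and the "black"/"white" labels are made-up illustration values, not taken from the attached data. It shows how a model that always predicts the majority class can score high accuracy while finding none of the minority cases.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fn, fp, tn


# Hypothetical data: 95 "black" rows and 5 "white" rows (assumed split).
y_true = ["black"] * 95 + ["white"] * 5
# A model that puts all chips on black.
y_pred = ["black"] * 100

tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive="white")

accuracy = (tp + tn) / len(y_true)
recall_white = tp / (tp + fn) if (tp + fn) else 0.0
specificity = tn / (tn + fp) if (tn + fp) else 0.0
balanced_accuracy = (recall_white + specificity) / 2

print(f"accuracy:          {accuracy:.2f}")
print(f"recall (white):    {recall_white:.2f}")
print(f"balanced accuracy: {balanced_accuracy:.2f}")
```

Judging the models by recall on the rare class, or by balanced accuracy, instead of plain accuracy would at least make this failure visible in the scores.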
The data needs some cleaning and preparation (different spellings for the same thing).
Why are you converting the number fields to string?
Can you explain the features? It is hard to tell why you chose some and not others; the manual feature selection is unclear.
SMOTE is crap, don’t use it
Try XGBoost with the scale_pos_weight parameter, which gives the positive class more weight.
Some data simply doesn’t have a good signal, which makes a lot of sense here: fire has a very large random component. Still, depending on what the model’s goal is, even a poor model can be of some help (reduced risk).
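As a rough sketch of the scale_pos_weight suggestion above: the XGBoost documentation suggests the ratio of negative to positive rows as a typical starting value. The helper names `suggested_scale_pos_weight` and `build_classifier` are hypothetical, the 950/50 split is invented for illustration, and the code assumes the rare class (the "white" / fire cases) is encoded as 1.

```python
from collections import Counter


def suggested_scale_pos_weight(labels, positive=1):
    """Return n_negative / n_positive, a common starting value for scale_pos_weight."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive examples in labels")
    return n_neg / n_pos


def build_classifier(labels, positive=1):
    """Build an XGBoost classifier weighted for the class imbalance in `labels`."""
    # Imported lazily so the helper above works without xgboost installed.
    from xgboost import XGBClassifier

    return XGBClassifier(
        scale_pos_weight=suggested_scale_pos_weight(labels, positive),
        # Assumption: PR-AUC is a more useful metric than accuracy here.
        eval_metric="aucpr",
    )


# Hypothetical training labels: 950 negatives, 50 positives (assumed split).
y_train = [0] * 950 + [1] * 50
weight = suggested_scale_pos_weight(y_train)
print(weight)
```

The ratio is only a starting point; it is worth tuning the value on a validation set and checking recall and precision for the rare class rather than accuracy.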