I have an unbalanced dataset with 7 classes. I read that people recommend SMOTE to balance it without having to delete samples from the overrepresented classes. And indeed, the accuracy of the models I am building with random forest or SVM increased significantly.
But I feel that this improvement is fake and is kind of cheating. Actually, there are 2 options in the SMOTE node (oversample the whole dataset or oversample the minority class).
Without SMOTE the accuracy is 0.34; with oversampling the minority I get 0.64, and with oversampling the whole dataset I get 0.75.
When I analysed the confusion matrix for each case, I noticed that the model is good at predicting the oversampled classes. So I suspect that this ease of prediction comes from the way the synthetic samples are created (I guess by interpolation). These synthetic samples are similar to each other, and since they can fall in either the training or the test set, the machine learning algorithm finds the prediction very easy.
The dangerous part comes when testing an external dataset with this deceptively robust model, which will probably fail.
Has anybody had a similar or different experience with SMOTE, or success with other tools that balance the dataset without discarding samples? My dataset is already small.
The issue in this case is probably the metric.
Accuracy often suggests a higher generalization performance than is warranted on unbalanced data, so I would recommend monitoring the precision and recall of the minority class; these metrics usually give you a much better idea of what your model is actually doing.
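As a sketch of what that monitoring looks like (in Python with scikit-learn rather than KNIME, and on a synthetic dataset), `classification_report` prints precision and recall for every class separately, so a weak minority class can't hide behind a good overall accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Unbalanced 3-class toy problem (class weights 85% / 10% / 5%)
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           weights=[0.85, 0.10, 0.05], random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=1)
clf = RandomForestClassifier(random_state=1).fit(Xtr, ytr)

# Per-class precision/recall instead of a single accuracy number
rep = classification_report(yte, clf.predict(Xte), digits=2)
print(rep)
```

The minority class rows of this report are the numbers worth tracking, not the aggregate accuracy.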
Concerning the SMOTE node, I'd recommend oversampling only the minority class if your dataset is unbalanced, because the other option won't remedy the imbalance in your data.
Please also note that SMOTE should be applied only to your training data; your validation (or test) data should follow the actual data distribution in order to obtain a valid estimate of the model's generalization ability.
I prepared a simple workflow with SMOTE. Do you think this is the correct way to use it with minority oversampling?
I am not comfortable telling you that it is correct, but I can tell you that that's how I would do it ;).
In the end, what is correct will depend on your results, and in most cases correct is what works best.