How to deal with unbalanced data: SMOTE vs. Equalizer?

Vincenzo · September 29, 2017, 2:55am

Hi samer_aamar,

Could you give more details on the problem you are trying to solve? What classification problem are you working on, and what kind of data is available in the dataset?

Most machine learning algorithms do not work very well with unbalanced datasets, because they tend to favor the majority class. That is why it is better to pick an explicit strategy for handling the imbalance.

Moreover, when you want to evaluate the performance of the models in these cases, you may want to use the following metrics:

Precision: how many of the selected instances are relevant, i.e. TP / (TP + FP).
Recall/Sensitivity: how many of the relevant instances are selected, i.e. TP / (TP + FN).
AUC: the area under the ROC curve, which plots the true positive rate against the false positive rate.
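
To make these definitions concrete, here is a minimal sketch in plain Python. The labels, scores, and the 0.5 threshold are made-up toy values, and the helper names (confusion_counts, roc_auc) are my own, not from any library. Specificity is included only to show that it is a different quantity from precision.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives, 7 negatives, with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)    # share of predicted positives that are correct
recall = tp / (tp + fn)       # share of actual positives that are found
specificity = tn / (tn + fp)  # share of actual negatives that are found
auc = roc_auc(y_true, scores)

print(precision, recall, specificity, auc)
```

Note that precision and recall depend on the chosen threshold, while AUC is computed from the raw scores and summarizes all thresholds at once.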

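The performance of machine learning algorithms is typically evaluated using predictive accuracy. With unbalanced classes, however, accuracy is misleading: a model that always predicts the majority class reaches a high accuracy while never detecting the minority class.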
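
A quick worked example of why accuracy misleads here, using made-up numbers (10 positives out of 1000 instances) and a "model" that always predicts the majority class:

```python
# Hypothetical dataset: 10 positives (minority) and 990 negatives (majority).
y_true = [1] * 10 + [0] * 990

# A trivial classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.2f}")  # high, although the model learned nothing
print(f"recall   = {recall:.2f}")    # no minority instance is ever detected
```

The accuracy looks excellent, yet the recall on the minority class shows the model is useless for the class you usually care about.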

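In general, when you use techniques such as SMOTE, I would suggest first partitioning your dataset and then applying SMOTE only to the training set. SMOTE creates synthetic instances by interpolating between real ones, so if you oversample before partitioning, the test set ends up containing points derived from training instances and the measured performance will be too optimistic.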
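
The order of operations can be sketched as follows. This is a simplified, hand-written SMOTE-like oversampler in plain Python, meant only to illustrate the workflow, not a reference implementation of the algorithm. The toy data, the helper names (stratified_split, smote_like), the 25% test fraction, and k=5 are all assumptions of mine. In practice you would use the SMOTE implementation of your tool of choice.

```python
import random


def stratified_split(X, y, test_fraction, rng):
    """Split per class so that both parts keep the original class ratio."""
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


def smote_like(minority, n_new, k, rng):
    """Simplified SMOTE: interpolate between a random minority point and one
    of its k nearest minority neighbours (squared Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


rng = random.Random(0)

# Hypothetical toy data: 200 majority (label 0) and 20 minority (label 1) points.
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
     + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)])
y = [0] * 200 + [1] * 20

# 1) Partition first, so the test set contains only real observations.
X_train, y_train, X_test, y_test = stratified_split(X, y, 0.25, rng)

# 2) Oversample the minority class in the training set only.
minority = [x for x, t in zip(X_train, y_train) if t == 1]
n_new = y_train.count(0) - y_train.count(1)
X_train_bal = X_train + smote_like(minority, n_new, k=5, rng=rng)
y_train_bal = y_train + [1] * n_new

print("train before:", y_train.count(0), "vs", y_train.count(1))
print("train after: ", y_train_bal.count(0), "vs", y_train_bal.count(1))
print("test (untouched):", y_test.count(0), "vs", y_test.count(1))
```

The model is then trained on X_train_bal / y_train_bal and evaluated on the untouched X_test / y_test, which still has the original class ratio.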

I would also suggest having a look at the following paper: https://www.jair.org/media/953/live-953-2037-jair.pdf

Hope this is helpful,

Best,

Vincenzo