This workflow compares the performance of three setups for a classification model that detects fraud in credit card data. In the first, the model is trained on the original, imbalanced data. In the second, it is trained on resampled, balanced data. In the third, it is trained on resampled, balanced data, and the predicted class probabilities are then adjusted to reflect the class distribution in the original data. Performance is measured as the cost reduction achieved relative to using no model at all.
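To make the third setup more concrete, here is a minimal Python sketch of one common way to adjust probabilities after resampling: a Bayes prior-shift correction. The description above does not say which formula the workflow uses, so this is an assumption for illustration only. The function name `adjust_probability` and its parameters are hypothetical, and the default of 0.5 assumes the resampled training data was perfectly balanced.

```python
def adjust_probability(p_balanced, prior_original, prior_resampled=0.5):
    """Map a fraud probability from a model trained on resampled data
    back to the class distribution of the original data.

    p_balanced      -- predicted fraud probability from the balanced model
    prior_original  -- fraction of fraud cases in the original data
    prior_resampled -- fraction of fraud cases in the resampled training
                       data (assumed 0.5, i.e. perfectly balanced)
    """
    if not 0.0 < prior_original < 1.0 or not 0.0 < prior_resampled < 1.0:
        raise ValueError("priors must lie strictly between 0 and 1")

    # Reweight each class by the ratio of its original to resampled prior.
    fraud = p_balanced * prior_original / prior_resampled
    legit = (1.0 - p_balanced) * (1.0 - prior_original) / (1.0 - prior_resampled)

    # Renormalize so the two class probabilities sum to 1.
    return fraud / (fraud + legit)


# Illustrative values only, not taken from the workflow's data.
adjusted = adjust_probability(0.5, prior_original=0.01)
print(adjusted)
```

Under this correction, a score of 0.5 from the balanced model maps back to the original fraud rate, so the adjusted probabilities are much smaller than the raw ones when fraud is rare. Any decision threshold applied afterwards needs to be chosen with that in mind.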
This is a companion discussion topic for the original entry at https://kni.me/w/0ufkiBeS8F8x6bhW