I’ve created a model for imbalanced data using the SMOTE node and trained it with the XGBoost Tree Ensemble learner. When I partition the data (cross-validation) to check how the model predicts, I get fairly good results, up to 99.998% accuracy. However, when I create a separate workflow and predict there using the XGBoost Predictor, I can’t get any proper classification, even on the dataset I used for training, which makes no sense. Has anyone ever had a similar issue? Any help is much appreciated.
I use the Model Writer and Model Reader nodes; the data manipulation nodes are identical in both workflows.
For SMOTE, the number of nearest neighbors was set to 5 and the Oversample Minority Classes box was checked. No static seed was set, since I want to see performance on different subsets.
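For readers unfamiliar with what the SMOTE node does with those settings, here is a minimal numpy sketch of the underlying idea (synthetic minority samples interpolated between a point and one of its k nearest minority neighbors, k=5 as configured). This is an illustration of the algorithm, not the KNIME node's exact implementation; all names here are my own.

```python
import numpy as np

def smote_sample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    between each base sample and one of its k nearest minority
    neighbours (the core idea behind SMOTE, k=5 as in the node)."""
    rng = np.random.default_rng(rng)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # never pick yourself
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest neighbours
    base = rng.integers(0, len(X_min), n_new)  # pick base points
    nbr = nn[base, rng.integers(0, k, n_new)]  # pick one neighbour each
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# tiny minority class of 10 points in 2-D, doubled to 30 total
rng = np.random.default_rng(0)
X_min = rng.normal(size=(10, 2))
synth = smote_sample(X_min, n_new=20, k=5, rng=1)
print(synth.shape)  # (20, 2)
```

Because every synthetic point is a convex combination of two real minority points, the new samples always lie inside the minority class's bounding box, which is also why SMOTE can struggle when minority points are scattered among the majority.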
To assess model performance I looked at the confusion matrix, in particular the error rate/accuracy and Cohen’s kappa (though the latter isn’t that useful in this case).
As for my workflows, I can’t post them; the data is sensitive. I can say that the ratio of the positive class to the negative class is approximately 1/30,000 (binary classification), and the whole dataset is roughly 1 million observations.
The workflows below are simplified examples of the original ones:
Regarding unbalanced data, you might want to consider this article and the hints from KNIME team members in previous threads, especially concerning SMOTE.
Then I added another balancing attempt with R and the ROSE algorithm, although I am a little wary about using it. You might also consider not fully balancing your dataset but instead bringing the minority group up to 10% or so, and then looking at AUC and other metrics rather than just the Scorer, which counts everything above 0.5 as a success.
Another option is to use the H2O nodes, which offer some balancing settings:
I don’t see the issue in your example; only 2 instances are wrongly classified.
Accuracy is probably the worst metric to choose for an imbalanced data set; it’s basically unusable in that scenario. Choose a different metric.
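To make that concrete with the thread's own numbers (roughly 1 positive per 30,000 negatives on ~1 million rows, counts below are my approximation): a model that never predicts the positive class already scores near-perfect accuracy, while a class-aware metric like balanced accuracy (the mean of per-class recalls) shows it is worthless.

```python
# With ~1 positive per 30,000 negatives, a model that predicts
# "negative" for everything is already ~99.997% accurate:
n_pos, n_neg = 33, 999_967             # roughly 1:30000 on 1M rows
acc_all_negative = n_neg / (n_pos + n_neg)
print(f"{acc_all_negative:.5f}")       # 0.99997 -- looks great, finds nothing

# Balanced accuracy exposes the failure immediately:
recall_pos = 0 / n_pos                 # the model catches zero positives
recall_neg = 1.0                       # and trivially all negatives
balanced_acc = (recall_pos + recall_neg) / 2
print(balanced_acc)                    # 0.5 -- no better than chance
```

Precision/recall on the positive class, F1, or PR-AUC would tell a similarly honest story here.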
You’re using XGBoost. Try class weights instead of SMOTE (which I would avoid like the plague).
One question remains: how unbalanced is your data? If you are doing anomaly detection with very low anomaly rates, standard ML probably won’t cut it. If it is more like a 10/90 imbalance, class weights should be able to handle it.
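For the weights approach in XGBoost, the usual starting point is the `scale_pos_weight` parameter, with the common rule of thumb `n_neg / n_pos` from the XGBoost docs. A sketch of computing it for the thread's rough class counts (my approximation) and passing it in a parameter dict:

```python
# Rule of thumb from the XGBoost docs: scale_pos_weight = n_neg / n_pos.
n_pos, n_neg = 33, 999_967          # roughly the thread's 1:30000 on 1M rows
spw = n_neg / n_pos
print(round(spw))                   # ~30302

# Passed to the booster instead of resampling with SMOTE, e.g.:
params = {
    "objective": "binary:logistic",
    "eval_metric": "aucpr",         # PR-AUC suits rare positives
    "scale_pos_weight": spw,
}
```

With an imbalance this extreme, treat `scale_pos_weight` as a hyperparameter to tune (values far below the rule-of-thumb ratio often work better), and evaluate on a metric like PR-AUC rather than accuracy.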
And as an additional comment: if you normalize on all your data, calculate features from it, and only then split into train/test, you are leaking information into the training set. The test set should not in any way be used for anything involved in training.
Normalization should only be fitted on the training data, although this alone doesn’t leak much. The real problem in your case is using a mean as a feature when that mean was calculated over all the data, not just the training set. That is a major issue regardless of whether and how it impacts your results.
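The leak-free pattern described above can be sketched in a few lines of numpy: fit the normalization statistics on the training split only, then apply those same statistics to the test split. The data here is synthetic, just to make the pattern runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))  # stand-in dataset
X_train, X_test = X[:800], X[800:]                  # split FIRST

# Fit normalization statistics on the TRAINING split only...
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)

# ...then apply the same statistics to both splits. The test rows
# never contribute to mu/sigma, so nothing leaks into training.
X_train_n = (X_train - mu) / sigma
X_test_n = (X_test - mu) / sigma

print(X_train_n.mean(axis=0).round(6))  # ~0 on train by construction
print(X_test_n.mean(axis=0).round(3))   # near 0, but not exactly
```

The same rule applies to any derived feature (like the mean-based feature mentioned above): compute it from training rows only, then reuse those statistics for the test rows.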