Problem with unbalanced data with examples attached

You could try the following.

Read this article about imbalanced data

Use the H2O AutoML approach with the KNIME wrapper and choose AUCPR (area under the precision-recall curve) as the sort metric:
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/sort_metric.html
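
To give a feeling for what AUCPR measures, here is a minimal pure-Python sketch that computes it as average precision over a ranked list. This is not H2O's implementation (H2O computes the metric internally and may handle ties and thresholds differently); the function name and the toy data are made up for illustration.

```python
def average_precision(y_true, scores):
    """Approximate the area under the precision-recall curve as average precision.

    Illustrative only: assumes binary labels (1 = positive) and no tied scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this positive
    return total / n_pos


# Toy data: 2 positives among 10 rows, ranked 1st and 4th by the model.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ap = average_precision(y_true, scores)
```

A model that always predicts the majority class would reach 80% accuracy on this toy data without finding a single positive, whereas the metric above only rewards ranking the rare positives near the top.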

You could also tell the algorithm to balance the classes. Be careful with that: balance only your training data, not the validation data, so that the validation metrics still reflect the real class distribution.
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/balance_classes.html
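
The "balance only the training data" idea can be sketched in plain Python as follows. This is an assumption-laden illustration using simple random oversampling, not what H2O's balance_classes option does internally; the function name, the 80/20 split and the toy rows are hypothetical.

```python
import random


def split_then_oversample(rows, label_of, train_fraction=0.8, seed=42):
    """Split first, then oversample minority classes in the training part only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, valid = shuffled[:cut], shuffled[cut:]

    # Group the training rows by class label.
    by_class = {}
    for row in train:
        by_class.setdefault(label_of(row), []).append(row)

    # Duplicate random rows of each smaller class up to the largest class size.
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    rng.shuffle(balanced)

    # The validation rows are returned untouched, with their original class ratio.
    return balanced, valid


# Toy data: (id, label) tuples with 10% positives.
rows = [(i, 1 if i < 10 else 0) for i in range(100)]
balanced_train, valid = split_then_oversample(rows, label_of=lambda row: row[1])
```

Splitting before oversampling matters: if you duplicate rows first, copies of the same row can end up in both parts and the validation score becomes too optimistic.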

See if H2O comes up with a good cutoff point (classification threshold) for separating the two classes.
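
If you want to check a cutoff yourself, one common choice is the threshold that maximizes F1 on validation data. Below is a minimal sketch under that assumption; the function name and toy data are invented for illustration, and other criteria (for example a cost-based one) may suit your problem better.

```python
def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) for the score cutoff with the highest F1.

    Illustrative only: a row is predicted positive when score >= threshold.
    """
    best_threshold, best_f1 = None, 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) without any true positive
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Toy validation data: 3 positives among 10 rows.
y_valid = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
p_valid = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
threshold, f1 = best_f1_threshold(y_valid, p_valid)
```

With imbalanced data the best cutoff is often far from 0.5, so it is worth comparing it against whatever threshold the model reports.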

In addition, you could try the R package vtreat, tell it which value is the positive class, and see whether this helps in combination with the other measures.

I will see if I can put together an ‘unbalanced’ version of my H2O.ai AutoML wrapper.
