Hi, I’m trying to do an analysis on binary, very rare events (0.3% probability of occurrence). I’ve tried logistic and random forest predictors, and they both predict 0 events with 99.7% accuracy. I’m assuming the algorithms are optimizing on accuracy. Are there other options to optimize? The two things I care about are:
- Number of true positives (the more the merrier)
- True positive/positive ratio, i.e. precision (the closer to 1 the better)
I don’t really care about accuracy all that much.
It's clear you are dealing with an imbalanced dataset. There are some techniques to handle this; see e.g. this article on KDnuggets. There is also a lot more on the KNIME forum if you search for "unbalanced data".
I'll just add a few KNIME-specific comments here as well; perhaps they will help you start your search!
Some common options for dealing with imbalanced data like this, as the article @HansS linked suggests, include over-sampling your minority class, under-sampling your majority class, or adjusting the classification threshold of your model.
1) You could try oversampling your minority class with the SMOTE node, which generates new synthetic data points instead of simply re-sampling existing ones.
2) Another thing you may want to try is a new component we've recently released, the Classification Threshold Analysis. You can use it with any model that outputs a probability alongside a classification, such as the random forest.
It generates statistics for your model across a range of classification thresholds. For example, you may want to classify a row as your minority class even if the model only gives it a 5% probability; this is another way of dealing with imbalanced data without changing your sampling.
3) Finally, to address the question of optimizing for metrics other than accuracy: you won't be able to adjust that in a learner node, but the Parameter Optimization Loop nodes can be set to optimize hyperparameters by any metric you choose, such as precision; see this example on the hub!
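Outside of KNIME, the core idea behind SMOTE can be sketched in a few lines of Python. This is only a toy illustration of the interpolation step (synthetic points are drawn on the line segment between a minority point and one of its nearest minority neighbours); the data and function name are made up for the example, and the real SMOTE node handles neighbour selection and scaling more carefully:

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Toy SMOTE sketch: create synthetic minority samples by interpolating
    between a random minority point and one of its k nearest minority
    neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest neighbours of a within the minority class
        neighbours = sorted(
            (p for p in minority if p is not a),
            key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)),
        )[:k]
        b = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Three hypothetical minority-class points in 2D
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_sample(minority)
print(new_points)  # four synthetic points inside the minority region
```

Because each synthetic point is a convex combination of two real minority points, the new samples always land inside the region the minority class already occupies, rather than being exact duplicates.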
Best of luck!
Hi! Using SMOTE helped with the random forest. What's worked best so far, though, is logistic regression: instead of looking at the prediction, I just look at the predicted probability of an event. I found a substantial subset where events had a 1.5% probability, significantly higher than the average. I'm hoping to tune the model and get that up to 5%.
For the Classification Threshold Analysis, it's not clear to me which nodes it goes between?
You attach the Classification Threshold Analysis component right after your predictor node, where you might otherwise put a scorer.
Then you just go through the configuration to set which columns hold the true target, the predicted target, and the probability.
Here’s what that might look like:
It's just a quick way to see how some common metrics change with your classification threshold, i.e. the probability cutoff for choosing one class over the other.
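The effect of sweeping that cutoff can be sketched in plain Python. The probabilities and labels below are made up for illustration, not the component's actual output, but they show why lowering the threshold trades precision for true positives on rare events:

```python
# Hypothetical predicted event probabilities and true labels (1 = rare event)
probs = [0.02, 0.10, 0.04, 0.60, 0.01, 0.30, 0.05, 0.90]
truth = [0,    1,    0,    1,    0,    0,    1,    1]

def metrics_at(threshold):
    """Classify at the given probability cutoff and report TP count,
    precision (TP / predicted positives), and recall (TP / actual positives)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(p and t for p, t in zip(preds, truth))
    fp = sum(p and not t for p, t in zip(preds, truth))
    fn = sum((not p) and t for p, t in zip(preds, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, precision, recall

for thr in (0.5, 0.1, 0.05):
    tp, prec, rec = metrics_at(thr)
    print(f"threshold={thr:0.2f}  TP={tp}  precision={prec:.2f}  recall={rec:.2f}")
```

At the default 0.5 cutoff only the most confident events are caught; dropping the cutoff to 0.05 catches every event at the cost of some false positives, which is exactly the trade-off the component lets you inspect.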