Upweight the minority class in the dataset

Hi!

I would like to build a classification model to predict a disease. However, my dataset is highly imbalanced, so I am applying some techniques to balance it and improve my predictive model. One of them is to upweight the minority class, but NOT by duplicating existing cases or by using the SMOTE node to generate new synthetic cases. Instead, I would like to give my rows different weights depending on the class column (two groups). To do that, I split my data into the two groups and added a column called “Weight” that holds the weight I want to give to each group (two possible values). I would like to end up with a dataset containing all the rows and columns needed to create my model (except the “Weight” column).
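A minimal sketch of that weighting step outside KNIME, in plain Python. The column names `Disease` and `Weight` and the inverse-frequency rule are assumptions for illustration; the post only says each of the two groups gets one fixed weight. The rule used here, `n_rows / (n_classes * n_rows_in_class)`, is the same heuristic scikit-learn uses for `class_weight="balanced"`, and it gives both classes the same total weight.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_rows / (n_classes * n_rows_in_class).

    With this rule, every class contributes the same total weight.
    """
    counts = Counter(labels)
    n_rows = len(labels)
    n_classes = len(counts)
    return {cls: n_rows / (n_classes * cnt) for cls, cnt in counts.items()}


def add_weight_column(rows, label_key="Disease", weight_key="Weight"):
    """Return copies of the rows with a per-row weight taken from the row's class."""
    weights = class_weights([row[label_key] for row in rows])
    return [{**row, weight_key: weights[row[label_key]]} for row in rows]


def split_features_and_weights(rows, weight_key="Weight"):
    """Separate the weight column so it is never used as a model feature."""
    features = [{k: v for k, v in row.items() if k != weight_key} for row in rows]
    weights = [row[weight_key] for row in rows]
    return features, weights


# Hypothetical toy data: 8 healthy rows, 2 disease rows.
rows = [{"Age": 40 + i, "Disease": "no"} for i in range(8)] + [
    {"Age": 60 + i, "Disease": "yes"} for i in range(2)
]
weighted = add_weight_column(rows)
features, weights = split_features_and_weights(weighted)
```

Here the 8 "no" rows get weight 0.625 and the 2 "yes" rows get 2.5, so each class sums to 5. Any other pair of values (for example 1 and 4) expresses the same 4:1 ratio; the scale mostly matters for learners that also use weights in things like minimum-leaf-size checks.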

I would like to use my “Weight” column not as a variable (feature) for my model, but as an input that specifies how significant each row is for predicting my results. Is it possible to do that in some way?
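What that request amounts to is a weighted loss: the weight never enters the model as an input, it only scales how much each row's error counts during training. A minimal sketch, assuming a one-feature logistic regression fitted by plain gradient descent; the toy data, learning rate and epoch count are hypothetical, and this is not how any particular KNIME node is implemented.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Predicted probability of the positive class for a single feature value x."""
    return sigmoid(w * x + b)


def weighted_log_loss(w, b, xs, ys, weights):
    """Weighted mean log-loss: each row's loss is multiplied by its weight."""
    total = 0.0
    for x, y, s in zip(xs, ys, weights):
        # Clip to avoid log(0) for extreme predictions.
        p = min(max(predict_proba(w, b, x), 1e-12), 1 - 1e-12)
        total += -s * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


def fit_weighted_logistic(xs, ys, weights, lr=0.5, epochs=5000):
    """Fit a one-feature logistic regression by gradient descent on the weighted log-loss.

    The weight is not an input feature; it only scales each row's
    contribution to the gradient.
    """
    w, b = 0.0, 0.0
    total_weight = sum(weights)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y, s in zip(xs, ys, weights):
            err = predict_proba(w, b, x) - y
            grad_w += s * err * x
            grad_b += s * err
        w -= lr * grad_w / total_weight
        b -= lr * grad_b / total_weight
    return w, b


# Hypothetical toy data: 8 negatives, 2 positives (overlapping, so not separable).
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 0.25, 2.0]
ys = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
uniform = [1.0] * len(xs)
balanced = [0.625] * 8 + [2.5] * 2  # minority class upweighted

w_u, b_u = fit_weighted_logistic(xs, ys, uniform)
w_b, b_b = fit_weighted_logistic(xs, ys, balanced)
```

With the balanced weights, the fitted model assigns noticeably higher probabilities to the two positive rows than the unweighted fit does, which is the intended effect of upweighting the minority class.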

@helfortuny

You might want to look at the H2O Gradient Boosting Machine Learner (and other H2O nodes). These have an option to weight the samples according to a weighting column. I have previously done something similar to what you are suggesting. There may be other nodes that allow you to weight samples, though a quick look at some of the other available nodes didn’t turn up any with a weighting option.

The option to add a weight column is in the advanced settings (the second-from-last option).
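Why a weight column can stand in for duplicating rows: in a loss that sums per-row terms scaled by their weights, an integer weight k contributes exactly what k copies of that row would. A minimal check in plain Python with hypothetical predictions and labels; it illustrates the general property of a weighted loss and says nothing about how the H2O nodes implement weighting internally (their documentation is the authority there).

```python
import math


def row_log_loss(p, y):
    """Log-loss of one row with predicted probability p and label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def weighted_total_loss(preds, labels, weights):
    """Sum of per-row losses, each multiplied by that row's weight."""
    return sum(w * row_log_loss(p, y) for p, y, w in zip(preds, labels, weights))


def duplicate_rows(preds, labels, weights):
    """Expand integer weights into repeated rows (each copy has weight 1)."""
    out_p, out_y = [], []
    for p, y, w in zip(preds, labels, weights):
        out_p.extend([p] * w)
        out_y.extend([y] * w)
    return out_p, out_y


# Hypothetical predictions and labels; minority rows (label 1) get weight 3.
preds = [0.1, 0.3, 0.2, 0.6]
labels = [0, 0, 1, 1]
weights = [1, 1, 3, 3]

weighted = weighted_total_loss(preds, labels, weights)
dup_p, dup_y = duplicate_rows(preds, labels, weights)
duplicated = weighted_total_loss(dup_p, dup_y, [1] * len(dup_p))
```

The two totals agree up to floating-point rounding, but the weighted version keeps the table at its original size and also accepts fractional weights, which duplication cannot express.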

DiaAzul

The Learner nodes in the KNIME XGBoost Integration also support row weighting. This was a relatively recent addition, in version 4.6.
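As a sketch of what row weights do inside gradient-boosted trees of the XGBoost kind: a row weight $s_i$ (symbol chosen here for illustration) multiplies that row's gradient $g_i$ and Hessian $h_i$. In the standard formulation, the optimal value of leaf $j$, with row set $I_j$ and L2 penalty $\lambda$, then becomes:

```latex
w_j^{*} = -\frac{\sum_{i \in I_j} s_i \, g_i}{\sum_{i \in I_j} s_i \, h_i + \lambda}
```

Upweighted minority rows therefore pull each leaf's value, and the split gains computed from the same sums, toward fitting those rows, without any new rows being added.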
