Upweight the minority class in the dataset


I would like to build a classification model to predict a disease. However, my dataset is very unbalanced, so I am applying some techniques to balance it and improve my predictive model. One technique I would like to apply is to upweight the minority class, but NOT by duplicating my existing cases or by using the SMOTE node to generate new synthetic cases. I would like to give different weights to my rows depending on the class column (two groups). For that, I split my data into two groups and added a column called “Weight”, which defines the weight I want to give to each group (two possible values). I would like to end up with a dataset containing all the rows and columns needed to create my model (except the “Weight” column).


I would like to use my “Weight” column not as a variable in my model, but as an input that specifies how significant each row is for predicting my results. Is it possible to do that in some way?


You might want to look at the H2O Gradient Boosted Machine Learner (and other H2O nodes). These have an option to weight the samples according to a weighting column. I’ve done something similar to what you are describing before. There may be other nodes that allow you to weight samples, though a quick look at some of the other available nodes didn’t turn up any with a weighting option.

The option to add a weight is in the advanced settings, second from the last option.
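
To make the idea of a weighting column concrete, here is a minimal sketch in plain Python, outside KNIME. It assumes the weight acts as a multiplier on each row's contribution to the training loss, which is the usual meaning of sample weights; it is not taken from the H2O node's implementation. The function names `balanced_row_weights` and `weighted_log_loss` and the toy data are made up for illustration.

```python
from collections import Counter
import math


def balanced_row_weights(labels):
    """Return one weight per row so that every class has the same total weight."""
    counts = Counter(labels)
    n_rows, n_classes = len(labels), len(counts)
    return [n_rows / (n_classes * counts[y]) for y in labels]


def weighted_log_loss(labels, probs, weights):
    """Weighted mean binary log loss; labels are 0/1, probs are P(label == 1)."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


# Toy data (hypothetical): 8 healthy rows (0) and 2 diseased rows (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_row_weights(labels)

# A lazy model that always predicts a 10% disease probability.
probs = [0.1] * len(labels)

unweighted = weighted_log_loss(labels, probs, [1.0] * len(labels))
weighted = weighted_log_loss(labels, probs, weights)

print("weights:", weights)
print("unweighted loss:", unweighted)
print("weighted loss:", weighted)
```

With the weights applied, the two minority rows count as much in total as the eight majority rows, so a model that ignores the minority class is penalised more heavily. The weight never enters the model as a feature; it only changes how much each row matters during training.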



The Learner nodes in the KNIME XGBoost Integration also support weighting of rows. This was a relatively recent addition in version 4.6.

