Penalizing misclassification from imbalanced data

I am new to KNIME. Recently I trained a classifier using SVM. My data is imbalanced and skewed toward one of the four labels, so the classifier seems to be misclassifying many samples as that overrepresented label. Is there a way to configure the SVM to penalize these misclassifications?

Best regards,


Hi there @acsmtl and welcome to the forum.

Your question about class imbalance is a common one that comes up from time to time. You might check the threads I’ve linked below:

Having said that, are you restricted to using SVM only, or can you try other algorithms? You’ll see that SMOTE is discussed in the threads above, but some folks really advocate against it in favor of something like XGBoost with class weights applied.


Thank you, Scott. I can also use the XGBoost Tree Ensemble Learner and Predictor nodes. Could you please let me know how class weights can be applied? Thanks in advance.

The option you’re looking for in the XGBoost Tree Ensemble Learner is on the Booster tab - it’s called “Scale positive weight”. From the node description:

Scale positive weight

Controls the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: sum(negative instances) / sum(positive instances).
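
If it helps to see the arithmetic, here is a minimal sketch outside KNIME with the Python xgboost package, where the corresponding parameter is called scale_pos_weight (the labels and features below are made up purely for illustration):

```python
import numpy as np
import xgboost as xgb

# Toy binary labels: 1 = positive (minority) class, 0 = negative (majority) class
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X = np.random.rand(len(y), 5)  # dummy features, just for illustration

# Rule of thumb from the node description:
# scale_pos_weight = sum(negative instances) / sum(positive instances)
scale_pos_weight = (y == 0).sum() / (y == 1).sum()

model = xgb.XGBClassifier(scale_pos_weight=scale_pos_weight)
model.fit(X, y)
```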

Note that with this approach you have to be careful to set your positive class appropriately as described here:

Check here for another possible alternative using the H2O nodes:


Thanks again. Found it. Does Scale positive weight only work for binary classification? I am asking because both examples talk about binary classification. My data has 4 possible classes.

:man_facepalming:

You explicitly mentioned multiple classes. My fault for not paying closer attention.

You could still apply SMOTE to deal with the problem though. Alternatively, you could do some manual oversampling or bootstrapping to artificially increase the counts of your minority classes.
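
Outside KNIME, a naive version of that manual oversampling could be sketched in a few lines of Python with pandas and scikit-learn (the column names and class labels below are purely hypothetical):

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical data frame with a 'label' column holding the four classes
df = pd.DataFrame({
    "feature": range(10),
    "label": ["A"] * 6 + ["B"] * 2 + ["C"] + ["D"],
})

majority_size = df["label"].value_counts().max()

# Bootstrap (sample with replacement) every class up to the majority count
balanced = pd.concat(
    resample(group, replace=True, n_samples=majority_size, random_state=42)
    for _, group in df.groupby("label")
)
print(balanced["label"].value_counts())  # all four classes now equally represented
```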


Great, thanks so much for all your suggestions and responses.

Hi @acsmtl & @ScottF

Very interesting thread with plenty of hints about the XGBoost node. Thanks to both!

As explained in the XGBoost node’s help and recalled by @ScottF:

@acsmtl if you still want to use “Scale positive weight” with XGBoost classification, you could convert your 4-class problem into four 2-class (one-vs-rest) problems: train four different XGBoost models, each classifying one “Positive” class against the other three, which together form the complementary Negative class. In that case you would set the “Scale positive weight” specifically for each of the four models. In the end, you could use the P(class = Positive) returned by each model to determine, for every sample, its final predicted class among your 4 classes (for example, by taking the class whose model returns the highest probability).
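
A minimal sketch of that one-vs-rest idea in Python (outside KNIME, using the xgboost package’s scale_pos_weight; the class names and data below are made up for illustration) might look like this:

```python
import numpy as np
import xgboost as xgb

classes = ["A", "B", "C", "D"]              # hypothetical class labels
X = np.random.rand(200, 5)                  # dummy features for illustration
y = np.random.choice(classes, size=200, p=[0.7, 0.1, 0.1, 0.1])

# One binary (one-vs-rest) model per class, each with its own scale_pos_weight
models = {}
for cls in classes:
    y_bin = (y == cls).astype(int)          # this class = positive, the other 3 = negative
    spw = (y_bin == 0).sum() / max((y_bin == 1).sum(), 1)
    models[cls] = xgb.XGBClassifier(scale_pos_weight=spw).fit(X, y_bin)

# Final prediction: the class whose model returns the highest P(positive)
probs = np.column_stack([models[c].predict_proba(X)[:, 1] for c in classes])
predictions = np.array(classes)[probs.argmax(axis=1)]
```

In KNIME this would translate to four separate Learner/Predictor branches, each trained on a target column recoded to “this class vs. the rest”, with its own Scale positive weight value.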

Hope this helps

Best,

Ael

Interesting proposal. Do you already have a workflow for that?

br

Not in KNIME yet. I’ll let you know if I eventually implement it as a KNIME workflow.

Best

Ael

Thanks a lot, really appreciate it!
br and enjoy your weekend

