How to classify a highly unbalanced dataset? Does KNIME have a node for "Tomek links" under-sampling?

Meihong · April 5, 2021, 9:13am

Hi,

I am facing a challenge classifying a highly unbalanced dataset, where the False class makes up only around 1% of the total dataset. I have tried SMOTE over-sampling, but it didn’t solve the problem.
Now I would like to try under-sampling with Tomek links, and I am wondering whether there is a node in KNIME that I could use for Tomek links under-sampling?
And for this kind of highly unbalanced dataset, are there any other methods to increase the classification accuracy?

Thanks in advance for your helpful answers!

BR
Mei

iperez · April 5, 2021, 1:12pm

Hi @Meihong. I’ve used the SMOTE node. It works well but heavily increases the computation time.
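For context on where the extra computation comes from, here is a minimal pure-Python sketch of the SMOTE idea. It is not the KNIME node's implementation; the function name `smote_sketch` and its parameters are made up for illustration. Each synthetic row needs a nearest-neighbor search among the minority rows, and the training table grows by every row generated.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Hypothetical helper for illustration only, not a library API.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbors of x (excluding x itself).
        others = [j for j in range(len(minority)) if j != i]
        neighbors = sorted(others, key=lambda j: dist(x, minority[j]))[:k]
        nb = minority[rng.choice(neighbors)]
        u = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class with four rows; generate six synthetic rows.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
new_rows = smote_sketch(minority, n_new=6)
```

Every synthetic row lies on a line segment between two existing minority rows, so SMOTE adds no new information if the minority class is not separable to begin with.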

Daniel_Weikert · April 6, 2021, 5:49pm

What does your workflow look like? Is it built in Python or entirely in KNIME?

mlauber71 · April 6, 2021, 5:54pm

@Meihong you could take a look at this debate

Meihong · April 6, 2021, 9:51pm

It is built entirely with KNIME nodes.

The purpose of this task is to find the key features that cause the failures.

Meihong · April 6, 2021, 9:55pm

Is that because the settings I chose for SMOTE are not suitable? This is what I am using in SMOTE now

Meihong · April 6, 2021, 10:28pm

So there is no node for Tomek links in KNIME? Do we need to use Python to build it?

iperez · April 7, 2021, 12:17am

No, those are reasonable values for the parameters. The issue must be that the signal is weak, as explained by @mlauber71 in the thread

ipazin · April 7, 2021, 1:22pm

Hello @Meihong,

To my knowledge, there is no dedicated node in KNIME that performs under-sampling with the Tomek links method. Python is probably the way to go.

Adding a link to the KNIME Python integration guide in case you don’t have it configured or are not aware of it:
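As a rough idea of what such a Python script could look like, here is a minimal pure-Python sketch of Tomek links under-sampling for a binary target. A Tomek link is a pair of rows from different classes that are each other's nearest neighbor; the majority-class row of each pair is dropped. The helper names and the toy data are made up for illustration, and the brute-force neighbor search is only suitable for small tables. For real data, the imbalanced-learn package offers a `TomekLinks` class in `imblearn.under_sampling`, assuming that package is installed in the Python environment KNIME uses.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(i, X):
    """Index of the row in X closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i), key=lambda j: dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]


def undersample_tomek(X, y, majority_label):
    """Drop the majority-class member of every Tomek link (binary target assumed)."""
    drop = set()
    for i, j in tomek_links(X, y):
        drop.add(i if y[i] == majority_label else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy data: True is the majority class, False the rare "fail" class.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
y = [True, True, True, False, True]
X_res, y_res = undersample_tomek(X, y, majority_label=True)
```

In this toy example only rows 2 and 3 form a Tomek link, so the majority row at (1.0, 1.0) is removed and the rare False row is kept. Tomek links only clean up the class boundary; they remove few rows, so the class ratio stays close to the original.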
https://docs.knime.com/latest/python_installation_guide/index.html

And a link to a new blog post with steps on how to configure it quickly and easily:

Br,
Ivan

system · October 7, 2021, 1:23am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.