SMOTE - more efficient way for oversampling

Hi, I have a minor question. I am using the SMOTE node to balance an unbalanced data set:

  • the target class has two outcomes: 0 and 1
  • 0 occurs in ca. 99.9% of cases, 1 in ca. 0.1%

The data set has ca. 2M rows.

I am using SMOTE in two alternative ways:
#1 - add the node and configure it to “oversample minority class”
#2 - use a Row Splitter to split on the target class, oversample the “1” rows with SMOTE x1000, and then concatenate the result with the rows that were split off

Solution #1 is much more time-consuming but gives very similar outcomes. Is solution #2 a proper way to use SMOTE?

You could try your luck with R’s ROSE library.

kn_example_rose_balanced.knwf (905.8 KB)

Hi @zarniak,

thanks for reporting this!

Both options are correct. So you are fine to use the faster option (#2, i.e. filter the minority class, oversample it with a fixed factor and then add it back to the majority class).
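
For readers wondering why #2 gives the same kind of result: in the standard SMOTE algorithm, both the neighbour search and the interpolation use minority-class rows only, so the majority rows do not need to pass through the oversampling step. Below is a minimal pure-Python sketch of that idea. It is not the KNIME node's implementation, and the names `smote_minority` and `oversample_split` as well as the parameters `n_new_per_row`, `k` and `seed` are made up for illustration.

```python
import math
import random


def smote_minority(rows, n_new_per_row, k=5, seed=0):
    """Return synthetic rows built only from `rows` (the minority class).

    Each synthetic row lies on the line segment between an original row
    and one of its k nearest neighbours. Hypothetical helper, for
    illustration only.
    """
    if len(rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(rows):
        # Brute-force neighbour search inside the minority class only.
        others = [r for j, r in enumerate(rows) if j != i]
        neighbours = sorted(others, key=lambda r: math.dist(x, r))[:k]
        for _ in range(n_new_per_row):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position between x and its neighbour
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def oversample_split(features, labels, minority_label=1,
                     n_new_per_row=3, k=5, seed=0):
    """Option #2: split off the minority rows, oversample, concatenate."""
    minority = [f for f, y in zip(features, labels) if y == minority_label]
    synthetic = smote_minority(minority, n_new_per_row, k, seed)
    # The original rows are kept unchanged; synthetic rows are appended.
    new_features = list(features) + synthetic
    new_labels = list(labels) + [minority_label] * len(synthetic)
    return new_features, new_labels
```

The brute-force neighbour search is quadratic in the number of minority rows, which should be acceptable for a few thousand rows. The majority class never enters that search, so its size does not add to the cost.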

We identified the source of this performance difference and will improve the code to eliminate it in the future.

