SMOTE for unbalanced data

kilincali35 · December 9, 2018, 2:56pm

Hi, I need to do oversampling for a dataset with unbalanced classes.

I created the workflow but I need to ask you a question.

SMOTE uses a distance-based model.

1- Should I one-hot encode my categorical variables before the SMOTE node, or does the node do this automatically?

2- How can I determine the number of nearest neighbors in the SMOTE node?

SimonS · December 10, 2018, 5:32pm

The SMOTE node currently handles only numerical columns. We will probably extend the functionality in a future release. So yes, you have to transform your categorical variables beforehand.
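To illustrate what "transform beforehand" means, here is a minimal plain-Python sketch of one-hot encoding a single categorical column. In a KNIME workflow this would normally be done with a node such as One to Many rather than with code; the function name `one_hot_encode` and the sample rows are made up for illustration.

```python
def one_hot_encode(rows, column):
    """Replace the categorical value at index `column` with 0/1 indicator values.

    Hypothetical helper for illustration only, not part of KNIME.
    """
    # One indicator column per distinct category, in sorted order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        indicators = [1.0 if row[column] == c else 0.0 for c in categories]
        encoded.append(row[:column] + indicators + row[column + 1:])
    return encoded, categories


# Made-up sample data: one numeric column and one categorical column.
rows = [
    [5.1, "red"],
    [4.9, "blue"],
    [6.2, "red"],
]
encoded, categories = one_hot_encode(rows, column=1)
print(categories)
print(encoded)
```

After this step every column is numeric, so a distance between two rows can be computed.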
There is a setting in the node dialog called “# Nearest neighbor”. Is this what you are asking about?
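For context on what the nearest-neighbor count controls, here is a rough sketch of the general SMOTE idea in plain Python: each synthetic row is placed on the line between a minority-class row and one of its k nearest minority-class neighbors. This is not the KNIME node's actual implementation; the names `k_nearest` and `smote`, the parameters, and the sample data are all assumptions for illustration.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` by Euclidean distance."""
    others = [c for c in candidates if c is not point]
    others.sort(key=lambda c: math.dist(point, c))
    return others[:k]


def smote(minority, n_new, k, seed=0):
    """Create n_new synthetic rows from numeric minority-class rows (sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k limits how far away the interpolation partner may be.
        neighbor = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()  # position between base (0) and neighbor (1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Made-up minority-class rows, already fully numeric.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [3.0, 3.0]]
new_rows = smote(minority, n_new=5, k=2)
print(new_rows)
```

In this sketch a small k keeps synthetic rows close to existing ones, while a larger k lets them spread across more of the minority class, which is the usual trade-off to keep in mind when choosing the value.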

Let me know if you need further information.

Cheers,
Simon