SMOTE for unbalanced data

Hi, I need to do oversampling for a dataset with unbalanced classes.

I created the workflow, but I have two questions.

SMOTE uses a distance-based model.

1- Should I one-hot encode my categorical variables before the SMOTE node, or does the node do this automatically?

2- How can I determine the number of nearest neighbors in the SMOTE node?

Hi @kilincali35,

  1. The SMOTE node currently handles only numerical columns. We will probably extend the functionality in a future release. So yes, you have to transform your categorical variables beforehand.
  2. There is a setting in the node dialog called “# Nearest neighbor”. Is this what you are asking for?
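
To illustrate both points, here is a minimal sketch in plain Python of the general SMOTE idea: encode the categorical column first, then create synthetic rows by interpolating between a minority row and one of its k nearest neighbors. This is not the KNIME node's implementation. The helper names (`one_hot`, `smote`), the parameter `k`, and the sample data are all hypothetical and only for illustration.

```python
import math
import random

def one_hot(values):
    """Map each categorical value to a 0/1 vector, one position per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbors (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority rows by distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical minority-class rows: one numeric and one categorical column.
amounts = [1.0, 1.2, 0.9, 1.1]
colours = ["red", "blue", "red", "green"]

# Encode the categorical column first, then join it with the numeric one.
encoded = one_hot(colours)
minority = [[a] + e for a, e in zip(amounts, encoded)]

new_rows = smote(minority, n_new=3, k=2)
for row in new_rows:
    print(row)
```

Note that in this sketch the one-hot columns of a synthetic row can take fractional values whenever the two interpolated rows belong to different categories, so you may need to decide how to map them back to a single category afterwards.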

Let me know if you need further information.

Cheers,
Simon