Hi, I need to do oversampling on a dataset with unbalanced classes.
I created the workflow, but I have two questions.
SMOTE uses a distance-based model.
1- Should I one-hot encode my categorical variables before the SMOTE node, or does the node do this automatically?
2- How can I determine the number of nearest neighbors in the SMOTE node?
- The SMOTE node currently handles only numerical columns. We will probably extend this functionality in a future release. So yes, you have to transform your categorical variables (for example, by one-hot encoding them) beforehand.
- There is a setting in the node dialog called “# Nearest neighbor”. Is this what you are asking for?
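To illustrate both points, here is a minimal sketch in plain Python of what happens conceptually: the categorical column is one-hot encoded first, and then new minority rows are created by interpolating between a row and one of its k nearest neighbors. This is not the code of the SMOTE node; the helper names `one_hot` and `smote_like`, the sample data, and the column names are made up for the example, and `k` only plays the role of the “# Nearest neighbor” setting.

```python
import math
import random


def one_hot(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1.0 if row[column] == category else 0.0
        encoded.append(new_row)
    return encoded


def smote_like(points, k, n_new, rng):
    """Create n_new synthetic points by interpolating between a randomly
    chosen point and one of its k nearest neighbors (Euclidean distance)."""
    if len(points) < 2:
        raise ValueError("need at least two minority points to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # Sort the other points by distance to the base point, keep the k closest.
        neighbors = sorted(
            (p for p in points if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbors)
        t = rng.random()  # position on the line segment between base and other
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority-class rows with one numeric and one categorical column.
minority = [
    {"age": 25.0, "color": "red"},
    {"age": 30.0, "color": "blue"},
    {"age": 28.0, "color": "red"},
    {"age": 35.0, "color": "blue"},
]

encoded = one_hot(minority, "color")
columns = list(encoded[0].keys())
points = [[row[c] for c in columns] for row in encoded]

new_points = smote_like(points, k=2, n_new=3, rng=random.Random(0))
for point in new_points:
    print(dict(zip(columns, point)))
```

Note that interpolating between two rows with different categories gives fractional values in the indicator columns, so you may want to round them or otherwise map them back to a category afterwards. A smaller `k` keeps synthetic rows close to existing ones, while a larger `k` spreads them over a wider part of the minority class.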
Let me know if you need further information.