I want to build three models after cleaning my data: KNN, Decision Tree and Random Forest. However, only KNN requires normalisation; the other two don't. Please advise how I should structure my modelling process so that I don't need two different table partitioners, but normalisation is still applied to KNN only.
Hi @Mntyho,
Welcome to the forum.
You can use a single partitioning step to split your data into training and test sets. For KNN, fit the normalization on the training set only, then use that same normalization model (for example, the per-column min and max learned from the training rows) to transform the test set (example). For Decision Tree and Random Forest, use the unnormalized training and test sets directly. Avoid normalizing the entire dataset before partitioning: the scaling parameters would then be computed partly from the test rows, which leaks test-set information into training.
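To make the flow concrete outside a visual workflow, here is a minimal plain-Python sketch (no scikit-learn). It assumes min-max scaling as the normalization method, and the function names (`partition`, `fit_min_max`, `apply_min_max`, `knn_predict`) and the toy data are hypothetical. One partition feeds two branches: the KNN branch learns scaling parameters from the training rows and reuses them on the test rows, while the tree branch keeps the raw rows.

```python
import math
import random
from collections import Counter


def partition(n_rows, test_fraction=0.3, seed=42):
    """Single partitioning step: shuffle row indices once and split them."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * (1 - test_fraction))
    return indices[:cut], indices[cut:]


def fit_min_max(rows):
    """Learn per-column (min, max) from the rows passed in (training rows only)."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, params):
    """Rescale rows with previously learned (min, max); constant columns map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi != lo else 0.0 for v, (lo, hi) in zip(row, params)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy data: two features on very different scales.
X = [[1.0, 200.0], [1.2, 180.0], [0.9, 220.0], [3.1, 90.0], [2.8, 110.0],
     [3.3, 95.0], [1.1, 210.0], [2.9, 100.0], [1.0, 190.0], [3.0, 105.0]]
y = ["a", "a", "a", "b", "b", "b", "a", "b", "a", "b"]

# One partition shared by all three models.
train_idx, test_idx = partition(len(X))
X_train = [X[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_train = [y[i] for i in train_idx]

# KNN branch: fit scaling on training rows only, reuse the same parameters on test rows.
params = fit_min_max(X_train)
X_train_knn = apply_min_max(X_train, params)
X_test_knn = apply_min_max(X_test, params)
knn_preds = [knn_predict(X_train_knn, y_train, row) for row in X_test_knn]

# Tree branch: a Decision Tree / Random Forest would take X_train and X_test unchanged.
```

Scaled test values can fall slightly outside [0, 1] because the test rows never influenced `params`; that is expected and is exactly what avoiding leakage looks like. The tree-based models can skip this branch because they split on per-feature thresholds, so a monotonic per-column rescaling does not change which splits are possible.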
Best,
Keerthan