How to use normalization in a data science solution

Hub · July 18, 2020, 7:33pm

This is how to apply normalization correctly in a data science problem. The normalization model is built on the training set only and is then applied, unchanged, to the test set. The same normalization model is also used to denormalize the numerical attributes back into their original ranges.
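Outside of the workflow, the same idea can be sketched in a few lines of Python. This is only an illustration under assumptions: it assumes min-max normalization to the range 0 to 1, and the names `MinMaxModel`, `fit_min_max`, `normalize` and `denormalize` as well as the sample values are hypothetical, not taken from the workflow.

```python
from dataclasses import dataclass


@dataclass
class MinMaxModel:
    """Hypothetical container for the parameters learned on the training set."""
    lo: float
    hi: float


def fit_min_max(values):
    # Learn the minimum and maximum from the training values only.
    return MinMaxModel(lo=min(values), hi=max(values))


def normalize(values, model):
    # Map values to [0, 1] relative to the range stored in the model.
    span = model.hi - model.lo
    if span == 0:
        raise ValueError("cannot normalize a constant column")
    return [(v - model.lo) / span for v in values]


def denormalize(values, model):
    # Invert normalize() using the same stored range.
    span = model.hi - model.lo
    return [v * span + model.lo for v in values]


# Made-up sample data for illustration.
train = [1.0, 2.0, 3.0, 5.0]
test = [2.0, 4.0]

model = fit_min_max(train)                      # built on the training set
train_norm = normalize(train, model)
test_norm = normalize(test, model)              # only applied to the test set
test_restored = denormalize(test_norm, model)   # back to the original range
```

The point of the sketch is that `fit_min_max` never sees the test values; the test set only passes through `normalize` and `denormalize` with the model learned from the training data.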

This is a companion discussion topic for the original entry at https://kni.me/w/l9AcqvFQbXp3AfNS

evert.homan_scilifelab.se · November 14, 2023, 8:05pm

I have never seen this before, but I tried it on my own data. When I apply the training normalization to the test set I get:

WARN Normalizer (Apply) 3:43 Normalized value is out of bounds. Original value: 4.0 Transformed value: 1.3333333333333333 Upper Bound: 1.0

This leads me to believe that if you want to do normalization it should be done on the entire data set (i.e. train and test set together) before splitting.
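For reference, the arithmetic behind the warning can be sketched as below. This assumes min-max normalization to the range 0 to 1 and a hypothetical training range of 0.0 to 3.0, which is not stated in the log and is chosen only because it is consistent with the numbers shown.

```python
# Assumed (hypothetical) range learned from the training set.
train_min, train_max = 0.0, 3.0

# Value from the test set, as reported in the warning.
original = 4.0

# Min-max formula using the training range, not the test range.
transformed = (original - train_min) / (train_max - train_min)

# A test value above the training maximum maps above the upper bound of 1.0.
out_of_bounds = transformed > 1.0

print(transformed, out_of_bounds)  # about 1.3333, True
```

Under these assumptions the warning appears whenever a test value lies outside the minimum and maximum seen in the training data.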

BW/Evert