Statistics, data distribution, outliers and data normalization

This exercise goes through statistics, data distribution, outliers and different scales.

Steps:
1. Read the Excel file with the Boston house prices, available in the folder data/.
2. Get rid of the first index column.
3. Check the data distribution of the prices (column MEDV). Are there any outliers?
4. Check the values of the distribution for all columns and check whether there are any missing values. Learn about the basic statistics indicators.
5. Remove the rows containing outliers of MEDV and check the result (to avoid having any outliers, the IQR multiplier should be set to 1).
6. Use the box plot to check the outliers.
7. Normalize your data by using the z-score.
8. Check the box plots and statistics again (the resulting standard deviation should be 1 for all columns).
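For readers who want to see the arithmetic behind the outlier-removal and z-score steps above, here is a minimal Python sketch that uses only the standard library. It is not part of the workflow and not how the workflow's nodes are implemented. The helper names (`iqr_bounds`, `remove_outlier_rows`, `z_score`), the sample values and the second column `RM` are made up for illustration, and the quartile method and the use of the sample standard deviation are assumptions, so the numbers may differ slightly from what the workflow reports.

```python
import statistics


def iqr_bounds(values, k=1.0):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k is the IQR multiplier; the exercise suggests 1.
    The "inclusive" quartile method is an assumption of this sketch.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outlier_rows(rows, column, k=1.0):
    """Keep only the rows whose value in `column` lies inside the fences."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


def z_score(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / std for v in values]


# Made-up illustrative values, NOT the real Boston house price data.
sample = [(21.0, 6.1), (22.5, 6.3), (19.8, 5.9), (24.0, 6.5),
          (20.4, 6.0), (50.0, 8.2), (23.1, 6.4), (18.9, 5.8)]
rows = [{"MEDV": medv, "RM": rm} for medv, rm in sample]

# Drop rows whose MEDV is an outlier, then normalize every column.
cleaned = remove_outlier_rows(rows, "MEDV", k=1.0)
normalized = {
    col: z_score([row[col] for row in cleaned]) for col in ("MEDV", "RM")
}

for col, values in normalized.items():
    print(col, "mean:", round(statistics.fmean(values), 6),
          "std:", round(statistics.stdev(values), 6))
```

A smaller multiplier narrows the fences, so more rows are treated as outliers and removed. After the z-score step every column has a standard deviation of 1, which is what the final check in the exercise looks for.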


This is a companion discussion topic for the original entry at https://kni.me/w/eUR6RQT5Gm8wFV_M