#machinelearning

Machine learning

Machine learning algorithms generally do not perform well when the input numerical attributes have very different scales.

There are two common ways to get all attributes onto the same scale: min-max scaling and standardization.

Min-max scaling

Min-max scaling is sometimes called normalization: the values are shifted and rescaled so they end up ranging from 0 to 1, by subtracting the minimum and dividing by the range (max minus min).

Using sklearn

from sklearn.preprocessing import MinMaxScaler

# feature_range defaults to (0, 1); here the values are rescaled to (-1, 1) instead
min_max_scaler = MinMaxScaler(feature_range=(-1, 1))
housing_num_min_max_scaled = min_max_scaler.fit_transform(housing_num)
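To make the formula concrete, here is a minimal sketch on a toy array (standing in for `housing_num`, which comes from the surrounding context), checking that `MinMaxScaler` with the default range matches the hand-computed `(x - min) / (max - min)`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data standing in for housing_num (a numeric table from the surrounding context)
X = np.array([[1.0], [3.0], [5.0]])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaled = scaler.fit_transform(X)

# Same thing by hand: subtract the column minimum, divide by the column range
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(scaled.ravel())  # [0.  0.5 1. ]
assert np.allclose(scaled, manual)
```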

Standardization

Standardization subtracts the mean (so standardized values have zero mean), then divides by the standard deviation (so they have unit standard deviation). Unlike min-max scaling, it does not restrict values to a specific range, but it is much less affected by outliers.

from sklearn.preprocessing import StandardScaler

std_scaler = StandardScaler()
housing_num_std_scaled = std_scaler.fit_transform(housing_num)
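As a quick sanity check, a toy array (again standing in for `housing_num`) shows that `StandardScaler` produces columns with zero mean and unit standard deviation, matching the hand-computed `(x - mean) / std`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for housing_num
X = np.array([[1.0], [2.0], [3.0]])

std_scaler = StandardScaler()
scaled = std_scaler.fit_transform(X)

# Same thing by hand (sklearn uses the population standard deviation, ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
assert np.allclose(scaled, manual)
print(scaled.mean(), scaled.std())  # zero mean, unit standard deviation
```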