Normalization
Machine learning algorithms incrementally update the model parameters to minimize the error between the real value and the value predicted with the previous iteration's parameters. This prediction error is measured with a loss function. For a given algorithm, using different loss functions creates different variants of the algorithm. The most common loss functions use the L2 or the L1 norm to measure the error:
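● L2 norm: $\lVert y - \hat{y} \rVert_2 = \sqrt{\sum_i (y_i - \hat{y}_i)^2}$
● L1 norm: $\lVert y - \hat{y} \rVert_1 = \sum_i \lvert y_i - \hat{y}_i \rvert$
Where $y_i$ and $\hat{y}_i$ are the real and predicted values of sample $i$.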
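As a minimal sketch, the two norms can be computed in a few lines of Python. It assumes the usual definitions (the L2 norm as the square root of the summed squared errors, the L1 norm as the sum of absolute errors). The function names and sample values are illustrative only.

```python
import math

def l1_norm(y_true, y_pred):
    """Sum of the absolute errors between real and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred))

def l2_norm(y_true, y_pred):
    """Square root of the sum of the squared errors."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)))

# Hypothetical real and predicted values for three samples
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 6.0]

print(l1_norm(y_true, y_pred))  # errors are 1, 0 and 4
print(l2_norm(y_true, y_pred))  # the error of 4 weighs more once squared
```

Because the L2 norm squares each error, a single large error contributes more to it than to the L1 norm.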
The measure of the prediction error can end up skewed when the predictors differ by an order of magnitude or more. The large-valued predictors obscure the smaller-valued ones, making it difficult to infer the relative importance of each predictor in the model. This affects how the respective weights of the linear model converge to their optimal values and, as a consequence, the performance of the algorithm. The predictors with the highest magnitude end up dominating the model even if they have little predictive power with regard to the real outcome value. Normalizing the data mitigates that problem by forcing all the predictors onto the same scale.
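One way to see this effect is to look at the gradient of the squared error for a linear model: the component for each weight is proportional to the value of the corresponding predictor. The sketch below uses a hypothetical sample with an age in years and an income in dollars. The function name and the values are assumptions made for illustration.

```python
def squared_error_gradient(weights, features, target):
    """Gradient of (prediction - target)**2 with respect to each weight
    of a linear model without intercept."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    return [2.0 * error * x for x in features]

# Hypothetical sample: age in years, income in dollars
features = [35.0, 52000.0]
weights = [0.0, 0.0]   # weights at the start of training
target = 1.0

gradient = squared_error_gradient(weights, features, target)
print(gradient)                    # one component per weight
print(gradient[1] / gradient[0])   # ratio equals income / age
```

The income weight receives an update more than a thousand times larger than the age weight, purely because of the units, not because income is more predictive.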
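There are two common ways to rescale the data, normalization and standardization:
- The min-max normalization, or normalization, which rescales all values to the range [0, 1]: $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$
- The z-score normalization, or standardization, which centers the values on the mean and scales them by the standard deviation, so that all predictors have a mean of 0 and a standard deviation of 1: $z = \frac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the predictor.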
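Both transformations can be sketched with the Python standard library. This is an illustration rather than a reference implementation: the function names and the sample values are hypothetical, the population standard deviation is assumed for the z-score, and constant columns are mapped to zeros by choice.

```python
import statistics

def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale, return zeros by convention
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical predictor: yearly income in dollars
incomes = [20000.0, 35000.0, 50000.0, 80000.0]

normalized = min_max_normalize(incomes)
standardized = standardize(incomes)

print(normalized)                        # smallest value maps to 0, largest to 1
print(statistics.mean(standardized))     # close to 0
print(statistics.pstdev(standardized))   # close to 1
```

Min-max normalization is sensitive to outliers, since a single extreme value sets one end of the range, whereas standardization does not bound the values to a fixed interval.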
Tree-based methods (decision trees, random forests, boosted trees) are the main family of machine learning models whose performance is not improved by normalization or standardization, because they split on one predictor at a time and only the ordering of its values matters. Distance-based and variance-based predictive algorithms may benefit from normalization. It has been shown that standardization is particularly useful for stochastic gradient descent (SGD), as it helps all the weights adapt at a similar speed.
Yann A. LeCun et al., "Efficient BackProp," in Neural Networks: Tricks of the Trade, pp. 9-48, Springer Verlag.
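The effect of standardization on gradient descent can be sketched on a toy problem. The example below uses plain batch gradient descent rather than SGD, to keep the run deterministic. The data, the function names, and both learning rates are assumptions chosen for illustration. On the raw predictors, the learning rate has to be tiny (1e-7 here) to keep the large-valued predictor from making the updates diverge, so the remaining weights barely move.

```python
import statistics

def standardize(column):
    """Z-score a column using the population standard deviation."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

def gradient_descent_mse(rows, targets, learning_rate, steps):
    """Fit y ~ w.x + b by batch gradient descent on the mean squared error
    and return the final mean squared error."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0

    def residuals():
        return [sum(w * x for w, x in zip(weights, row)) + bias - y
                for row, y in zip(rows, targets)]

    for _ in range(steps):
        errors = residuals()  # computed once, so all updates are simultaneous
        for j in range(d):
            grad_j = 2.0 / n * sum(e * row[j] for e, row in zip(errors, rows))
            weights[j] -= learning_rate * grad_j
        bias -= learning_rate * 2.0 / n * sum(errors)

    return sum(e * e for e in residuals()) / n

# Hypothetical predictors on very different scales
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1000.0, 3000.0, 2000.0, 4000.0]
y = [6.0, 13.0, 14.0, 21.0]   # generated as 3*x1 + 0.002*x2 + 1

raw_rows = list(zip(x1, x2))
std_rows = list(zip(standardize(x1), standardize(x2)))

# Same number of steps; each learning rate is within its stable range
mse_raw = gradient_descent_mse(raw_rows, y, learning_rate=1e-7, steps=200)
mse_std = gradient_descent_mse(std_rows, y, learning_rate=0.1, steps=200)

print(mse_raw)   # remains far from zero
print(mse_std)   # very close to zero
```

With the same number of steps, the run on standardized predictors fits the data almost exactly, while the run on raw predictors is still far from the optimum.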
Amazon ML offers z-score standardization as part of the available data transformations.