Standardization
How does one apply the standard scaler (also known as “standardization”)?
1. First calculate the median and interquartile range of the predictor values. Then subtract off the median from the predictor values. Then divide by the interquartile range.
2. First calculate the mean and standard deviation of the predictor values. Then subtract off the mean from the predictor values. Then divide by the standard deviation.
3. First calculate the min and max of the predictor values. Then subtract off the min from the predictor values. Then divide by the range (i.e. max minus min).
Solutions
The standard scaler uses the second method. But, as an aside, the other two are actually fine too. The first one is called “RobustScaler” in sklearn and the third one is called “MinMaxScaler.” The interested student might look up the sklearn preprocessing documentation for details.
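As a quick illustration, here’s a minimal sketch using the three corresponding classes from sklearn.preprocessing (the toy data is made up here):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Toy predictor column (made-up values).
X = np.array([[1.0], [2.0], [3.0], [10.0]])

# StandardScaler: subtract the mean, divide by the standard deviation.
print(StandardScaler().fit_transform(X).ravel())

# RobustScaler: subtract the median, divide by the interquartile range.
print(RobustScaler().fit_transform(X).ravel())

# MinMaxScaler: subtract the min, divide by the range (max minus min).
print(MinMaxScaler().fit_transform(X).ravel())
```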
Standardization using all available data
True or false: when using the standard scaler (also known as “standardization”), it is considered best practice to fit the mean and standard deviation only once, using all available data. This is true even if you will be making test/train splits later for the purposes of assessing performance.
Solutions
False. In most circles, it is considered best practice to include the standardization process as part of the estimator and fit it only on training data if you are making test/train splits. That’s why sklearn has this “pipeline” architecture. Doing so means that every time you fit the estimator on some subset of the data (in the context of a test/train split or cross-validation), you end up refitting the mean and standard deviation on that subset alone.
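Here’s a minimal sketch of that pattern (the dataset and the choice of Ridge as the downstream estimator are just for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

# The scaler is part of the estimator, so every call to fit() refits the
# mean and standard deviation on whatever data that call receives.
model = make_pipeline(StandardScaler(), Ridge())

# Test/train split: the scaler sees only the training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))

# Cross-validation: the scaler is refit on each training fold.
print(cross_val_score(model, X, y, cv=5))
```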
Standardization with LDA
True or false: assuming you could use a computer with infinite numerical precision, the predictions made by LDA based on the raw features are the same as the predictions of LDA based on rescaled features.
Solutions
True! LDA is invariant to rescaling. The same goes for QDA, logistic regression (assuming you don’t use any ridge or LASSO penalties), and linear regression (assuming you don’t use any penalties). By contrast, if you use KNN or any sort of ridge/LASSO penalties, your predictions may indeed change with rescaling. Note: the condition about infinite numerical precision is necessary because in practice scaling can help with round-off errors (even for LDA, QDA, etc.).
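A quick numerical check of the invariance (toy data; in finite precision the agreement could in principle fail on borderline points, but it typically holds):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Fit LDA on the raw features and on the standardized features.
preds_raw = LinearDiscriminantAnalysis().fit(X, y).predict(X)
preds_scaled = LinearDiscriminantAnalysis().fit(X_scaled, y).predict(X_scaled)

# Up to round-off error, the predictions agree.
print(np.array_equal(preds_raw, preds_scaled))
```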
Standardization in linear regression
True or false: assuming you could use a computer with infinite numerical precision and you do not use any ridge or LASSO penalties, the coefficients β estimated by linear regression will be the same regardless of whether you preprocess your data by scaling.
Solutions
False! Rescaling will change the estimated coefficients. But it won’t change the predictions. Here’s an example of how this works, in the simple case with one raw feature.
Say that you train linear regression on the unscaled data and you get coefficients $\hat{\beta}_0, \hat{\beta}_1$. Then, given a new test query $x$, you’ll predict $\hat{\beta}_0 + \hat{\beta}_1 x$.
If you preprocess the data by multiplying the predictors by $\alpha$, assuming infinite numerical precision, you’ll get coefficients $\hat{\beta}_0, \hat{\beta}_1/\alpha$. Then, given a new test query $x$, you’ll obtain the transformed version $\alpha x$ and apply the rule $\hat{\beta}_0 + (\hat{\beta}_1/\alpha)(\alpha x) = \hat{\beta}_0 + \hat{\beta}_1 x$.
Thus for linear regression without penalties, the predictions won’t be affected by rescaling but the coefficients will.
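Here’s a numerical illustration of this (the data and the choice $\alpha = 10$ are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 2.0 + 3.0 * X.ravel() + rng.normal(scale=0.1, size=50)

alpha = 10.0  # made-up rescaling factor

fit_raw = LinearRegression().fit(X, y)
fit_scaled = LinearRegression().fit(alpha * X, y)

# The slope coefficients differ by a factor of alpha ...
print(fit_raw.coef_, fit_scaled.coef_)  # roughly [3.] and [0.3]

# ... but the predictions at a new query agree, because the scaled model
# is applied to the transformed query alpha * x_new.
x_new = np.array([[1.5]])
print(fit_raw.predict(x_new), fit_scaled.predict(alpha * x_new))
```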
Standardization with nonlinear featurization
Given a dataset with one numerical predictor and one numerical response, consider the following procedure for estimating the regression function.
Will this estimator lead to different predictions if you rescale the data?
Solutions
Yes. There are some featurizations, such as this one, where a least squares approach will be affected by scaling.
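The procedure itself isn’t reproduced above, so here is a made-up featurization in the same spirit: map the predictor through $\sin(x)$ and then fit by least squares. Because $\sin(\alpha x)$ is not simply a rescaled copy of $\sin(x)$, the fitted function, and hence the predictions, change under rescaling:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=100)
y = np.sin(x) + rng.normal(scale=0.1, size=100)

def fit_and_predict(x_train, y_train, x_query):
    # Hypothetical featurization: phi(x) = sin(x), then least squares.
    model = LinearRegression().fit(np.sin(x_train)[:, None], y_train)
    return model.predict(np.sin(x_query)[:, None])

alpha = 2.0  # made-up rescaling factor
x_query = np.array([1.0])

# The two predictions differ: sin(alpha * x) spans a different set of
# functions of x than sin(x) does, so the fit is not scale-invariant.
print(fit_and_predict(x, y, x_query))
print(fit_and_predict(alpha * x, y, alpha * x_query))
```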
Rescaling with quartic coefficient penalties
Given a dataset with one numerical predictor and one numerical response, consider the following procedure for estimating the regression function.
Will this estimator lead to different predictions if you rescale the data?
Solutions
Yes. This estimator is much like a ridge regression estimator, except with a penalty of $\sum_j \beta_j^4$ instead of $\sum_j \beta_j^2$. As with ridge, this estimation procedure will be affected by scaling.
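Here’s a sketch of what such an estimator might look like; since the procedure isn’t reproduced above, the objective below (squared error plus $\lambda \hat{\beta}_1^4$ on the slope) is an assumed form:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = 3.0 * x + rng.normal(scale=0.5, size=60)

def quartic_penalty_fit(x_train, y_train, lam=1.0):
    # Hypothetical objective: squared-error loss plus a quartic penalty
    # on the slope, sum_i (y_i - b0 - b1*x_i)^2 + lam * b1**4.
    def objective(beta):
        b0, b1 = beta
        return np.sum((y_train - b0 - b1 * x_train) ** 2) + lam * b1**4
    return minimize(objective, x0=np.zeros(2)).x

alpha = 10.0  # made-up rescaling factor
b0, b1 = quartic_penalty_fit(x, y)
b0s, b1s = quartic_penalty_fit(alpha * x, y)

# If the procedure were scale-invariant, the two predictions below would
# agree (the second fit would satisfy b1s == b1 / alpha). They do not,
# because the penalty term does not rescale along with the data.
x_query = 1.0
print(b0 + b1 * x_query)
print(b0s + b1s * (alpha * x_query))
```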