
Standardization

How does one apply the standard scaler (also known as “standardization”)?

  1. First calculate the median and interquartile range of the predictor values. Then subtract the median from the predictor values. Then divide by the interquartile range.

  2. First calculate the mean and standard deviation of the predictor values. Then subtract the mean from the predictor values. Then divide by the standard deviation.

  3. First calculate the min and max of the predictor values. Then subtract the min from the predictor values. Then divide by the range (i.e., max minus min).
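The three candidate procedures can be written out directly. Below is a sketch assuming NumPy, with `option_1`, `option_2`, and `option_3` as placeholder names matching the numbering above; each option also corresponds to one of scikit-learn's built-in scaler transformers.

```python
import numpy as np

def option_1(x):
    # Subtract the median, then divide by the interquartile range.
    q1, q3 = np.percentile(x, [25, 75])
    return (x - np.median(x)) / (q3 - q1)

def option_2(x):
    # Subtract the mean, then divide by the standard deviation.
    return (x - np.mean(x)) / np.std(x)

def option_3(x):
    # Subtract the min, then divide by the range (max minus min).
    return (x - np.min(x)) / (np.max(x) - np.min(x))

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
print(option_1(x))
print(option_2(x))  # this output has mean 0 and standard deviation 1
print(option_3(x))  # this output lies in [0, 1]
```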

Standardization using all available data

True or false: when using the standard scaler (also known as “standardization”), it is considered best practice to fit the mean and standard deviation only once, using all available data. This is true even if you will be making test/train splits later for the purposes of assessing performance.
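For concreteness, the two workflows the question contrasts can be sketched with scikit-learn's `StandardScaler` and `train_test_split` (the data and variable names here are illustrative); note that the two workflows generally produce different transformed test sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Workflow A: fit the scaler once, on all available data.
scaler_all = StandardScaler().fit(X)
X_test_a = scaler_all.transform(X_test)

# Workflow B: fit the scaler on the training split only,
# then apply that fitted transform to the test split.
scaler_train = StandardScaler().fit(X_train)
X_test_b = scaler_train.transform(X_test)

# The estimated means/SDs differ between the workflows,
# so the transformed test features differ too.
print(np.abs(X_test_a - X_test_b).max())
```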

Standardization with LDA

True or false: assuming you could use a computer with infinite numerical precision, the predictions made by LDA based on the raw features are the same as the predictions of LDA based on rescaled features.
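One way to probe this claim empirically (at ordinary floating-point precision) is to fit scikit-learn's `LinearDiscriminantAnalysis` on raw and on standardized copies of a synthetic dataset and compare the two sets of predictions. A sketch, with illustrative synthetic data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two Gaussian classes in two dimensions.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.repeat([0, 1], 50)
X_scaled = StandardScaler().fit_transform(X)

pred_raw = LinearDiscriminantAnalysis().fit(X, y).predict(X)
pred_scaled = LinearDiscriminantAnalysis().fit(X_scaled, y).predict(X_scaled)

# Fraction of points on which the two fits agree.
print((pred_raw == pred_scaled).mean())
```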

Standardization in linear regression

True or false: assuming you could use a computer with infinite numerical precision, and assuming you do not use any ridge or LASSO penalties, the coefficients β estimated by linear regression will be the same regardless of whether you preprocess your data by scaling.
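The claim is easy to test numerically. A sketch using scikit-learn's `LinearRegression` on synthetic data (the scale factors and coefficients are illustrative), comparing both the estimated coefficients and the fitted values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Two features on very different scales.
X = rng.normal(size=(100, 2)) * np.array([1.0, 100.0])
y = X @ np.array([2.0, 0.5]) + rng.normal(size=100)

fit_raw = LinearRegression().fit(X, y)
fit_scaled = LinearRegression().fit(X / X.std(axis=0), y)

print(fit_raw.coef_)     # coefficients on the raw features
print(fit_scaled.coef_)  # coefficients on the rescaled features
# Compare the fitted values from the two parameterizations.
print(np.allclose(fit_raw.predict(X),
                  fit_scaled.predict(X / X.std(axis=0))))
```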

Standardization with nonlinear featurization

Given a dataset $\left(\left(X_{1},Y_{1}\right),\ldots,\left(X_{n},Y_{n}\right)\right)$ with $p=1$ predictor and one numerical response, consider the following procedure for estimating the regression function.

$$
\begin{aligned}
\hat{\beta} & = \arg\min_{\beta}\left(\sum_{i=1}^{n}\left(Y_{i}-\beta_{0}-\beta_{1}e^{X_{i}}\right)^{2}\right)\\
\hat{f}(x;\hat{\beta}) & = \hat{\beta}_{0}+\hat{\beta}_{1}e^{x}
\end{aligned}
$$

Will this estimator lead to different predictions if you rescale the data?
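Reading the displayed equations as using the same feature $e^{x}$ in both the fit and the prediction, the question can be checked numerically. The sketch below (NumPy only; the data-generating constants and the rescaling constants are illustrative) fits by least squares, then refits after an affine rescaling of $x$ of the kind a scaler would apply, and compares predictions.

```python
import numpy as np

def fit_predict(x_train, y_train, x_eval):
    # Least-squares fit of y ≈ b0 + b1 * exp(x), then predict at x_eval.
    A = np.column_stack([np.ones_like(x_train), np.exp(x_train)])
    b0, b1 = np.linalg.lstsq(A, y_train, rcond=None)[0]
    return b0 + b1 * np.exp(x_eval)

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * np.exp(x) + rng.normal(size=50)

# Predictions from the raw data, at the raw evaluation points.
pred_raw = fit_predict(x, y, x)

# An affine rescaling of x (illustrative shift and scale constants),
# refit on the rescaled data, predict at the rescaled points.
z = (x - 1.0) / 3.0
pred_scaled = fit_predict(z, y, z)

print(np.allclose(pred_raw, pred_scaled))
```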

Rescaling with quartic coefficient penalties

Given a dataset $\left(\left(X_{1},Y_{1}\right),\ldots,\left(X_{n},Y_{n}\right)\right)$ with $p=1$ predictor and one numerical response, consider the following procedure for estimating the regression function.

$$
\begin{aligned}
\hat{\beta} & = \arg\min_{\beta}\left(\sum_{i=1}^{n}\left(Y_{i}-\beta_{0}-\beta_{1}X_{i}\right)^{2}+\beta_{1}^{4}\right)\\
\hat{f}(x;\hat{\beta}) & = \hat{\beta}_{0}+\hat{\beta}_{1}x
\end{aligned}
$$

Will this estimator lead to different predictions if you rescale the data?
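Reading the penalty term as a quartic penalty on the slope coefficient, the estimator can be implemented with `scipy.optimize.minimize` and checked under a rescaling of $x$. A sketch, with illustrative data and rescaling constants:

```python
import numpy as np
from scipy.optimize import minimize

def fit_predict(x_train, y_train, x_eval):
    # Minimize the sum of squared residuals plus a quartic
    # penalty on the slope (reading the penalty as beta_1 ** 4).
    def objective(beta):
        b0, b1 = beta
        return np.sum((y_train - b0 - b1 * x_train) ** 2) + b1 ** 4
    b0, b1 = minimize(objective, x0=[0.0, 0.0]).x
    return b0 + b1 * x_eval

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)

pred_raw = fit_predict(x, y, x)

# Rescale x by a constant factor, refit, predict at the rescaled points.
pred_scaled = fit_predict(10.0 * x, y, 10.0 * x)

print(np.allclose(pred_raw, pred_scaled))
```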