Standardization
How does one apply the standard scaler (also known as “standardization”)?
1. First calculate the median and interquartile range of the predictor values. Then subtract off the median from the predictor values. Then divide by the interquartile range.
2. First calculate the mean and standard deviation of the predictor values. Then subtract off the mean from the predictor values. Then divide by the standard deviation.
3. First calculate the min and max of the predictor values. Then subtract off the min from the predictor values. Then divide by the range (i.e. max minus min).
Solutions
The standard scaler uses the second method. But, as an aside, the other two are actually fine too. The first one is called “RobustScaler” in sklearn and the third one is called “MinMaxScaler.” The interested student might look up the sklearn preprocessing documentation for details.
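As a quick illustration, here’s a minimal sketch using the three corresponding classes from sklearn.preprocessing (the toy data is made up here):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Toy predictor column (made-up values).
X = np.array([[1.0], [2.0], [3.0], [10.0]])

# StandardScaler: subtract the mean, divide by the standard deviation.
print(StandardScaler().fit_transform(X).ravel())

# RobustScaler: subtract the median, divide by the interquartile range.
print(RobustScaler().fit_transform(X).ravel())

# MinMaxScaler: subtract the min, divide by the range (max minus min).
print(MinMaxScaler().fit_transform(X).ravel())
```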
Standardization using all available data
True or false: when using the standard scaler (also known as “standardization”), it is considered best practice to fit the mean and standard deviation only once, using all available data. This is true even if you will be making test/train splits later for the purposes of assessing performance.
Solutions
False. In most circles, it is considered best practice to include the standardization process as part of the estimator and fit it only on training data if you are making test/train splits. That’s why sklearn has this “pipeline” architecture. Doing so means that every time you fit the estimator on some subset of the data (in the context of a test/train split or cross-validation), you end up refitting the mean and standard deviation on that subset alone.
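Here’s a minimal sketch of that pattern (the dataset and the choice of Ridge as the downstream estimator are just for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

# The scaler is part of the estimator, so every call to fit() refits the
# mean and standard deviation on whatever data that call receives.
model = make_pipeline(StandardScaler(), Ridge())

# Test/train split: the scaler sees only the training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))

# Cross-validation: the scaler is refit on each training fold.
print(cross_val_score(model, X, y, cv=5))
```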
Standardization with LDA
True or false: assuming you could use a computer with infinite numerical precision, the predictions made by LDA based on the raw features are the same as the predictions of LDA based on rescaled features.
Solutions
True! LDA is invariant to rescaling. The same goes for QDA, logistic regression (assuming you don’t use any ridge or LASSO penalties), and linear regression (assuming you don’t use any penalties). By contrast, if you use KNN or any sort of ridge/LASSO penalties, your predictions may indeed change with rescaling. Note: the condition about infinite numerical precision is necessary because in practice scaling can help with round-off errors (even for LDA, QDA, etc.).
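A quick numerical check of the invariance (toy data; in finite precision the agreement could in principle fail on borderline points, but it typically holds):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Fit LDA on the raw features and on the standardized features.
preds_raw = LinearDiscriminantAnalysis().fit(X, y).predict(X)
preds_scaled = LinearDiscriminantAnalysis().fit(X_scaled, y).predict(X_scaled)

# Up to round-off error, the predictions agree.
print(np.array_equal(preds_raw, preds_scaled))
```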
Standardization in linear regression
True or false: assuming you could use a computer with infinite numerical precision and you do not use any ridge or LASSO penalties, the coefficients β estimated by linear regression will be the same regardless of whether you preprocess your data by scaling.
Solutions
False! Rescaling will change the estimated coefficients. But it won’t change the predictions. Here’s an example of how this works, in the simple case with one raw feature.
Say that you train linear regression on the unscaled data and you get coefficients $\hat{\beta}_0, \hat{\beta}_1$. Then, given a new test query $x$, you’ll predict $\hat{\beta}_0 + \hat{\beta}_1 x$.
If you preprocess the data by multiplying the predictors by $\alpha$, assuming infinite numerical precision, you’ll get coefficients $\hat{\beta}_0, \hat{\beta}_1/\alpha$. Then, given a new test query $x$, you’ll obtain the transformed version $\alpha x$ and apply the rule $\hat{\beta}_0 + (\hat{\beta}_1/\alpha)(\alpha x) = \hat{\beta}_0 + \hat{\beta}_1 x$.
Thus for linear regression without penalties, the predictions won’t be affected by rescaling but the coefficients will.
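Here’s a numerical illustration of this (the data and the choice $\alpha = 10$ are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 2.0 + 3.0 * X.ravel() + rng.normal(scale=0.1, size=50)

alpha = 10.0  # made-up rescaling factor

fit_raw = LinearRegression().fit(X, y)
fit_scaled = LinearRegression().fit(alpha * X, y)

# The slope coefficients differ by a factor of alpha ...
print(fit_raw.coef_, fit_scaled.coef_)  # roughly [3.] and [0.3]

# ... but the predictions at a new query agree, because the scaled model
# is applied to the transformed query alpha * x_new.
x_new = np.array([[1.5]])
print(fit_raw.predict(x_new), fit_scaled.predict(alpha * x_new))
```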
Standardization with nonlinear featurization
Given a dataset with one numerical predictor and one numerical response, consider the following procedure for estimating the regression function.
Will this estimator lead to different predictions if you rescale the data?
Solutions
Yes. There are some featurizations, such as this one, where a least squares approach will be affected by scaling.
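The procedure itself isn’t reproduced above, so here is a made-up featurization in the same spirit: map the predictor through $\sin(x)$ and then fit by least squares. Because $\sin(\alpha x)$ is not simply a rescaled copy of $\sin(x)$, the fitted function, and hence the predictions, change under rescaling:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=100)
y = np.sin(x) + rng.normal(scale=0.1, size=100)

def fit_and_predict(x_train, y_train, x_query):
    # Hypothetical featurization: phi(x) = sin(x), then least squares.
    model = LinearRegression().fit(np.sin(x_train)[:, None], y_train)
    return model.predict(np.sin(x_query)[:, None])

alpha = 2.0  # made-up rescaling factor
x_query = np.array([1.0])

# The two predictions differ: sin(alpha * x) spans a different set of
# functions of x than sin(x) does, so the fit is not scale-invariant.
print(fit_and_predict(x, y, x_query))
print(fit_and_predict(alpha * x, y, alpha * x_query))
```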
Rescaling with quartic coefficient penalties
Given a dataset with one numerical predictor and one numerical response, consider the following procedure for estimating the regression function.
Will this estimator lead to different predictions if you rescale the data?
Solutions
Yes. This estimator is much like a ridge regression estimator, except with a penalty of $\sum_j \beta_j^4$ instead of $\sum_j \beta_j^2$. As with ridge, this estimation procedure will be affected by scaling.
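Here’s a sketch of what such an estimator might look like; since the procedure isn’t reproduced above, the objective below (squared error plus $\lambda \hat{\beta}_1^4$ on the slope) is an assumed form:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = 3.0 * x + rng.normal(scale=0.5, size=60)

def quartic_penalty_fit(x_train, y_train, lam=1.0):
    # Hypothetical objective: squared-error loss plus a quartic penalty
    # on the slope, sum_i (y_i - b0 - b1*x_i)^2 + lam * b1**4.
    def objective(beta):
        b0, b1 = beta
        return np.sum((y_train - b0 - b1 * x_train) ** 2) + lam * b1**4
    return minimize(objective, x0=np.zeros(2)).x

alpha = 10.0  # made-up rescaling factor
b0, b1 = quartic_penalty_fit(x, y)
b0s, b1s = quartic_penalty_fit(alpha * x, y)

# If the procedure were scale-invariant, the two predictions below would
# agree (the second fit would satisfy b1s == b1 / alpha). They do not,
# because the penalty term does not rescale along with the data.
x_query = 1.0
print(b0 + b1 * x_query)
print(b0s + b1s * (alpha * x_query))
```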