KNN
KNN with one neighbor¶
Which of these choices best describes the nearest neighbor prediction rule with $k = 1$? Why?
(a) low bias, low variance
(b) low bias, high variance
(c) high bias, low variance
(d) high bias, high variance
Solutions
(b). With $k = 1$, the prediction at a point is just the response of the single nearest training point, so the fitted rule can follow the training data arbitrarily closely (low bias) but changes substantially whenever the training set changes (high variance).
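To see this concretely, here is a small simulation sketch (not part of the original solution; the function and numbers are made up for illustration): refit a 1-NN regressor on many noisy training sets drawn from the same underlying function and track the prediction at a single fixed query point.

```python
# Simulation sketch (made-up setup, not from the exercise): refit 1-NN on many
# noisy training sets drawn from the same function and track the prediction at
# one fixed query point.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
x_query = np.array([[0.5]])                    # fixed query point

def true_f(x):
    return np.sin(2 * np.pi * x)

preds = []
for _ in range(200):
    X = rng.uniform(0, 1, size=(50, 1))                 # fresh training inputs
    y = true_f(X).ravel() + rng.normal(0, 0.3, 50)      # noisy responses
    preds.append(KNeighborsRegressor(n_neighbors=1).fit(X, y).predict(x_query)[0])

preds = np.array(preds)
print("true value f(0.5):      ", true_f(0.5))
print("average 1-NN prediction:", preds.mean())  # close to the true value -> low bias
print("std of 1-NN predictions:", preds.std())   # comparable to the noise sd -> high variance
```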
KNN and standardization¶
Say you have a dataset about houses, and you would like to use KNN with this dataset to predict the price of a house that was not in the dataset. However, before applying KNN, you recall that it is common practice to apply a standardization that involves two steps:
subtract the mean from each predictor feature (also known as centering) and
divide by the standard deviation of each predictor feature (also known in this context as scaling).
However, it would be possible to preprocess the data by performing only one of these steps. We will now consider how different preprocessing choices might affect your prediction. Let
$\hat{y}_{\mathrm{none}}$ denote the KNN prediction if no preprocessing is used,
$\hat{y}_{\mathrm{center}}$ denote the KNN prediction if only the centering preprocessing step is used,
$\hat{y}_{\mathrm{scale}}$ denote the KNN prediction if only the scaling preprocessing step is used, and
$\hat{y}_{\mathrm{both}}$ denote the KNN prediction if both the centering and scaling preprocessing steps are used.
In typical cases, only one of the following is true. Which is it?
Solutions
The second answer is correct. KNN predictions depend on the data only through distances between points, and distances are unchanged when every feature is shifted by a constant, so centering alone does nothing: $\hat{y}_{\mathrm{none}} = \hat{y}_{\mathrm{center}}$. Dividing each feature by its standard deviation, however, changes how much that feature contributes to the distance, so scaling generally changes which neighbors are nearest and therefore changes the prediction.
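As a sanity check, here is a small sketch (made-up data, not the exercise's housing dataset) showing that centering leaves the KNN prediction untouched while scaling can change it:

```python
# Minimal sketch (made-up data): centering does not change KNN predictions,
# but scaling can, because it reweights the features inside the distance.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * [1.0, 100.0]     # second feature on a much larger scale
y = X[:, 0] + rng.normal(0, 0.1, 100)
x_new = np.array([[0.5, 50.0]])

def knn_pred(X_train, y_train, x_query):
    """5-NN prediction for a single query point."""
    return KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train).predict(x_query)[0]

mu, sigma = X.mean(axis=0), X.std(axis=0)

print(knn_pred(X, y, x_new))                                  # no preprocessing
print(knn_pred(X - mu, y, x_new - mu))                        # centered: identical prediction
print(knn_pred(X / sigma, y, x_new / sigma))                  # scaled: generally different
print(knn_pred((X - mu) / sigma, y, (x_new - mu) / sigma))    # centered and scaled
```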
KNN by hand¶
Say you have this dataset.
Predict the satisfaction level of an employee with 3 years of experience and 3 projects using KNN with $k = 3$.
Solutions
The prediction is 0.5. Here are the distances.
Employee 1:
Employee 2:
Employee 3:
Employee 4:
Employee 5:
Employee 6:
So the three closest training samples are #1, #3, and #5, and the average of their satisfaction levels is 0.5.
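For reference, here is a minimal sketch of the same procedure in code. The feature and satisfaction values below are placeholders chosen for illustration, not the exercise's table: compute the Euclidean distance from the query point to each training employee, take the $k$ nearest, and average their responses.

```python
# Sketch of the by-hand KNN regression procedure (placeholder values, not the
# exercise's table): distances, k nearest, then average the responses.
import numpy as np

# columns: years of experience, number of projects   (placeholder values)
X_train = np.array([[2, 3], [5, 7], [3, 2], [8, 9], [4, 4], [9, 6]], dtype=float)
# satisfaction level of each training employee       (placeholder values)
y_train = np.array([0.4, 0.9, 0.5, 0.8, 0.6, 0.7])

x_query = np.array([3.0, 3.0])   # 3 years of experience, 3 projects
k = 3

dists = np.linalg.norm(X_train - x_query, axis=1)   # distance to each training point
nearest = np.argsort(dists)[:k]                     # indices of the k closest employees
print("nearest employees (1-indexed):", nearest + 1)
print("prediction:", y_train[nearest].mean())       # average satisfaction of the k nearest
```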
KNN by hand, III¶
Say you have this dataset.
Predict whether an employee will leave or stay (0 for stay, 1 for leave) based on years of experience and the number of projects, using KNN with $k = 3$.
Solutions
The prediction is stay. As computed above, the three closest training samples are employees #1, #3, and #5. Two out of three of those employees stayed, so the majority vote is stay (0).
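The only change from the regression case is that the $k$ nearest labels are combined by majority vote instead of averaging. A minimal sketch, again with placeholder values rather than the exercise's table:

```python
# Classification variant of the sketch above: majority vote over the k nearest labels.
from collections import Counter
import numpy as np

X_train = np.array([[2, 3], [5, 7], [3, 2], [8, 9], [4, 4], [9, 6]], dtype=float)
y_leave = np.array([0, 1, 0, 1, 1, 0])   # placeholder labels: 0 = stay, 1 = leave

x_query = np.array([3.0, 3.0])
k = 3

dists = np.linalg.norm(X_train - x_query, axis=1)
nearest = np.argsort(dists)[:k]
vote = Counter(y_leave[nearest]).most_common(1)[0][0]   # majority label among the k nearest
print("prediction:", "leave" if vote == 1 else "stay")
```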
Misclassification of 1NN with infinite data¶
As the number of data points approaches ∞, the misclassification error rate of a 1-nearest-neighbor classifier must approach 0. True or false?
Solutions
Nope. Unless the labels are a deterministic function of the features, the Bayes error rate is positive, and no classifier can beat it. With infinite data, the 1NN error rate converges to a value between the Bayes error rate and twice the Bayes error rate, so it stays bounded away from zero whenever the labels are noisy.
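A quick simulation sketch (not part of the original solution; the data-generating process is made up) makes this concrete: when labels are flipped with probability 0.2, the Bayes error is 0.2 and the 1NN test error plateaus near $2 \cdot 0.2 \cdot 0.8 = 0.32$ rather than going to 0 as the training set grows.

```python
# Sketch: labels are flipped with probability 0.2, so the Bayes error is 0.2 and the
# asymptotic 1NN error is roughly 2 * 0.2 * (1 - 0.2) = 0.32 -- it does not go to 0.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def make_data(n, flip=0.2):
    X = rng.uniform(0, 1, size=(n, 1))
    y_clean = (X[:, 0] > 0.5).astype(int)   # true decision rule
    noise = rng.random(n) < flip            # flip 20% of the labels
    return X, np.where(noise, 1 - y_clean, y_clean)

X_test, y_test = make_data(20_000)
for n in [100, 1_000, 10_000, 100_000]:
    X_train, y_train = make_data(n)
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
    err = (clf.predict(X_test) != y_test).mean()
    print(f"n = {n:>6}: 1NN test error = {err:.3f}")   # plateaus near 0.32, not 0
```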
Curse of dimensionality¶
True or false: the curse of dimensionality (as applied to regression problems) states that one can only obtain good estimates of the regression function when the number of predictor features is less than the number of training samples.
Solutions
False. The curse of dimensionality is not a statement about the ratio of features to samples. In general terms, it says that nonparametric prediction gets harder as the number of features grows, because training points become sparse in high-dimensional space, but there is no hard cutoff at fewer features than samples. For example, LASSO can often give good predictions even when the number of predictor features exceeds the number of training samples.
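As an illustration (a sketch with simulated data, not part of the original solution): a sparse linear problem with 200 features and only 50 training samples, where LASSO still predicts well because only a handful of features actually matter.

```python
# Sketch: 50 training samples, 200 features, only 5 of which matter.
# LASSO exploits the sparsity and predicts well even though p > n.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200
beta = np.zeros(p)
beta[:5] = [3, -2, 1.5, 2, -1]                 # only the first 5 features matter

X_train = rng.normal(size=(n, p))
y_train = X_train @ beta + rng.normal(0, 0.5, n)
X_test = rng.normal(size=(1_000, p))
y_test = X_test @ beta + rng.normal(0, 0.5, 1_000)

model = Lasso(alpha=0.1).fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))           # typically high despite p > n
print("nonzero coefficients:", np.sum(model.coef_ != 0))  # a small subset is selected
```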