KNN
KNN with one neighbor¶
Which of these choices best describes the nearest neighbor prediction rule with $k = 1$? Why?
(a) low bias, low variance
(b) low bias, high variance
(c) high bias, low variance
(d) high bias, high variance
Solutions
(b). With $k = 1$, the prediction at a point is just the response of the single nearest training point, so the fitted rule can follow the training data arbitrarily closely (low bias) but changes substantially whenever the training set changes (high variance).
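To see this concretely, here is a small simulation sketch (not part of the original solution; the function and numbers are made up for illustration): refit a 1-NN regressor on many noisy training sets drawn from the same underlying function and track the prediction at a single fixed query point.

```python
# Simulation sketch (made-up setup, not from the exercise): refit 1-NN on many
# noisy training sets drawn from the same function and track the prediction at
# one fixed query point.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
x_query = np.array([[0.5]])                    # fixed query point

def true_f(x):
    return np.sin(2 * np.pi * x)

preds = []
for _ in range(200):
    X = rng.uniform(0, 1, size=(50, 1))                 # fresh training inputs
    y = true_f(X).ravel() + rng.normal(0, 0.3, 50)      # noisy responses
    preds.append(KNeighborsRegressor(n_neighbors=1).fit(X, y).predict(x_query)[0])

preds = np.array(preds)
print("true value f(0.5):      ", true_f(0.5))
print("average 1-NN prediction:", preds.mean())  # close to the true value -> low bias
print("std of 1-NN predictions:", preds.std())   # comparable to the noise sd -> high variance
```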
KNN and standardization¶
Say you have a dataset about houses, and you would like to use KNN with this dataset to predict the price of a house that was not in the dataset. However, before applying KNN, you recall that it is common practice to apply a standardization that involves two steps:
subtract the mean from each predictor feature (also known as centering) and
divide by the standard deviation of each predictor feature (also known in this context as scaling).
However, it would be possible to preprocess the data by performing only one of these steps. We will now consider how different preprocessing choices might affect your prediction. Let
$\hat{y}_{\mathrm{none}}$ denote the KNN prediction if no preprocessing is used,
$\hat{y}_{\mathrm{center}}$ denote the KNN prediction if only the centering preprocessing step is used,
$\hat{y}_{\mathrm{scale}}$ denote the KNN prediction if only the scaling preprocessing step is used, and
$\hat{y}_{\mathrm{both}}$ denote the KNN prediction if both the centering and scaling preprocessing steps are used.
In typical cases, only one of the following is true. Which is it?
Solutions
The second answer is correct. KNN predictions depend on the data only through distances between points, and distances are unchanged when every feature is shifted by a constant, so centering alone does nothing: $\hat{y}_{\mathrm{none}} = \hat{y}_{\mathrm{center}}$. Dividing each feature by its standard deviation, however, changes how much that feature contributes to the distance, so scaling generally changes which neighbors are nearest and therefore changes the prediction.
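As a sanity check, here is a small sketch (made-up data, not the exercise's housing dataset) showing that centering leaves the KNN prediction untouched while scaling can change it:

```python
# Minimal sketch (made-up data): centering does not change KNN predictions,
# but scaling can, because it reweights the features inside the distance.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * [1.0, 100.0]     # second feature on a much larger scale
y = X[:, 0] + rng.normal(0, 0.1, 100)
x_new = np.array([[0.5, 50.0]])

def knn_pred(X_train, y_train, x_query):
    """5-NN prediction for a single query point."""
    return KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train).predict(x_query)[0]

mu, sigma = X.mean(axis=0), X.std(axis=0)

print(knn_pred(X, y, x_new))                                  # no preprocessing
print(knn_pred(X - mu, y, x_new - mu))                        # centered: identical prediction
print(knn_pred(X / sigma, y, x_new / sigma))                  # scaled: generally different
print(knn_pred((X - mu) / sigma, y, (x_new - mu) / sigma))    # centered and scaled
```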
KNN by hand¶
Say you have this dataset.
Predict the satisfaction level of an employee with 3 years of experience and 3 projects using KNN with $k = 3$.
Solutions
The prediction is 0.5. Here are the distances.
Employee 1:
Employee 2:
Employee 3:
Employee 4:
Employee 5:
Employee 6:
So the three closest training samples are #1, #3, and #5, and the average of their satisfaction levels is 0.5.
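For reference, here is a minimal sketch of the same procedure in code. The feature and satisfaction values below are placeholders chosen for illustration, not the exercise's table: compute the Euclidean distance from the query point to each training employee, take the $k$ nearest, and average their responses.

```python
# Sketch of the by-hand KNN regression procedure (placeholder values, not the
# exercise's table): distances, k nearest, then average the responses.
import numpy as np

# columns: years of experience, number of projects   (placeholder values)
X_train = np.array([[2, 3], [5, 7], [3, 2], [8, 9], [4, 4], [9, 6]], dtype=float)
# satisfaction level of each training employee       (placeholder values)
y_train = np.array([0.4, 0.9, 0.5, 0.8, 0.6, 0.7])

x_query = np.array([3.0, 3.0])   # 3 years of experience, 3 projects
k = 3

dists = np.linalg.norm(X_train - x_query, axis=1)   # distance to each training point
nearest = np.argsort(dists)[:k]                     # indices of the k closest employees
print("nearest employees (1-indexed):", nearest + 1)
print("prediction:", y_train[nearest].mean())       # average satisfaction of the k nearest
```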
KNN by hand, III¶
Say you have this dataset.
Predict whether an employee will leave or stay (0 for stay, 1 for leave) based on years of experience and the number of projects, using KNN with $k = 3$.
Solutions
The prediction is stay. As computed above, the three closest training samples are employees #1, #3, and #5. Two out of three of those employees stayed, so the majority vote is stay (0).
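The only change from the regression case is that the $k$ nearest labels are combined by majority vote instead of averaging. A minimal sketch, again with placeholder values rather than the exercise's table:

```python
# Classification variant of the sketch above: majority vote over the k nearest labels.
from collections import Counter
import numpy as np

X_train = np.array([[2, 3], [5, 7], [3, 2], [8, 9], [4, 4], [9, 6]], dtype=float)
y_leave = np.array([0, 1, 0, 1, 1, 0])   # placeholder labels: 0 = stay, 1 = leave

x_query = np.array([3.0, 3.0])
k = 3

dists = np.linalg.norm(X_train - x_query, axis=1)
nearest = np.argsort(dists)[:k]
vote = Counter(y_leave[nearest]).most_common(1)[0][0]   # majority label among the k nearest
print("prediction:", "leave" if vote == 1 else "stay")
```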
Misclassification of 1NN with infinite data¶
As the number of data points approaches ∞, the misclassification error rate of a 1-nearest-neighbor classifier must approach 0. True or false?
Solutions
Nope. Unless the labels are a deterministic function of the features, the Bayes error rate is positive, and no classifier can beat it. With infinite data, the 1NN error rate converges to a value between the Bayes error rate and twice the Bayes error rate, so it stays bounded away from zero whenever the labels are noisy.
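A quick simulation sketch (not part of the original solution; the data-generating process is made up) makes this concrete: when labels are flipped with probability 0.2, the Bayes error is 0.2 and the 1NN test error plateaus near $2 \cdot 0.2 \cdot 0.8 = 0.32$ rather than going to 0 as the training set grows.

```python
# Sketch: labels are flipped with probability 0.2, so the Bayes error is 0.2 and the
# asymptotic 1NN error is roughly 2 * 0.2 * (1 - 0.2) = 0.32 -- it does not go to 0.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def make_data(n, flip=0.2):
    X = rng.uniform(0, 1, size=(n, 1))
    y_clean = (X[:, 0] > 0.5).astype(int)   # true decision rule
    noise = rng.random(n) < flip            # flip 20% of the labels
    return X, np.where(noise, 1 - y_clean, y_clean)

X_test, y_test = make_data(20_000)
for n in [100, 1_000, 10_000, 100_000]:
    X_train, y_train = make_data(n)
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
    err = (clf.predict(X_test) != y_test).mean()
    print(f"n = {n:>6}: 1NN test error = {err:.3f}")   # plateaus near 0.32, not 0
```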
Curse of dimensionality¶
True or false: the curse of dimensionality (as applied to regression problems) states that one can only obtain good estimates of the regression function when the number of predictor features is less than the number of training samples.
Solutions
False. The curse of dimensionality is not a statement about the ratio of features to samples. In general terms, it says that nonparametric prediction gets harder as the number of features grows, because training points become sparse in high-dimensional space, but there is no hard cutoff at fewer features than samples. For example, LASSO can often give good predictions even when the number of predictor features exceeds the number of training samples.
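As an illustration (a sketch with simulated data, not part of the original solution): a sparse linear problem with 200 features and only 50 training samples, where LASSO still predicts well because only a handful of features actually matter.

```python
# Sketch: 50 training samples, 200 features, only 5 of which matter.
# LASSO exploits the sparsity and predicts well even though p > n.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200
beta = np.zeros(p)
beta[:5] = [3, -2, 1.5, 2, -1]                 # only the first 5 features matter

X_train = rng.normal(size=(n, p))
y_train = X_train @ beta + rng.normal(0, 0.5, n)
X_test = rng.normal(size=(1_000, p))
y_test = X_test @ beta + rng.normal(0, 0.5, 1_000)

model = Lasso(alpha=0.1).fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))           # typically high despite p > n
print("nonzero coefficients:", np.sum(model.coef_ != 0))  # a small subset is selected
```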