
Classification decision rules (aka hard classification)

Estimating metrics from data

A binary classifier $\hat{y}$ was applied to test data and its predictions were compared with the true responses. The following was found.

  1. There were 20 cases where the classifier predicted + and the truth was +.

  2. There were 30 cases where the classifier predicted - and the truth was +.

  3. There were 10 cases where the classifier predicted + and the truth was -.

  4. There were 40 cases where the classifier predicted - and the truth was -.

Given these numbers, estimate $\mathbb{P}\left(Y=+|\hat{y}(X)=+\right)$ and $\mathbb{P}\left(\hat{y}(X)=+|Y=+\right)$.
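
As a quick sanity check on estimates like these, here is a minimal Python sketch (the variable names are ours, not part of the problem) that forms the two empirical conditional probabilities directly from the four counts above.

```python
# Counts from the test set, as listed above.
tp = 20  # predicted +, truth +
fn = 30  # predicted -, truth +
fp = 10  # predicted +, truth -
tn = 40  # predicted -, truth -

# Empirical P(Y = + | yhat(X) = +): among cases predicted +, the fraction whose truth is +.
p_truth_given_pred = tp / (tp + fp)

# Empirical P(yhat(X) = + | Y = +): among cases whose truth is +, the fraction predicted +.
p_pred_given_truth = tp / (tp + fn)

print(f"P(Y=+ | yhat=+) ≈ {p_truth_given_pred:.3f}")
print(f"P(yhat=+ | Y=+) ≈ {p_pred_given_truth:.3f}")
```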

A better classifier

We would like to design a method that guesses whether a plant is poison ivy, based on urushiol measurements. Given a measurement $x$, we have a very accurate estimate of the probability that the plant is poison ivy, $p(\mathrm{poison}|x)$. Using this estimate we constructed a hard classifier of the form

$$\hat y(x) = \begin{cases} \mathrm{poison} & \mathrm{if}\ p(\mathrm{poison}|x)>0.2 \\ \mathrm{not\ poison} & \mathrm{otherwise} \end{cases}$$

In tests, we found the classifier had a false positive rate of 30% ($\mathbb{P}(\hat y(X)=\mathrm{poison}|Y\neq \mathrm{poison})=0.3$) and a true positive rate of 70% ($\mathbb{P}(\hat y(X)=\mathrm{poison}|Y=\mathrm{poison})=0.7$). True or false: in typical situations, it will be very difficult to find a new hard classifier that achieves the same false positive rate (30%) while attaining a substantially higher true positive rate (say, 75%).
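
For intuition, here is a minimal simulation sketch; the generative model (a logistic $p(\mathrm{poison}|x)$ with $x$ drawn from a standard normal) is entirely made up for illustration. It sweeps the threshold applied to the true conditional probability and prints the resulting (FPR, TPR) operating points, which shows what a threshold rule on the true probability attains at any given false positive rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative model: x ~ N(0, 1), and the *true* conditional
# probability of poison given x is logistic in x (purely illustrative).
n = 200_000
x = rng.normal(size=n)
p_poison = 1.0 / (1.0 + np.exp(-3.0 * x))   # p(poison | x)
y = rng.random(n) < p_poison                # true labels

def rates(threshold):
    """FPR and TPR of the rule: predict poison iff p(poison | x) > threshold."""
    pred = p_poison > threshold
    fpr = np.mean(pred[~y])  # P(yhat = poison | Y != poison)
    tpr = np.mean(pred[y])   # P(yhat = poison | Y = poison)
    return fpr, tpr

# Each threshold gives one operating point (FPR, TPR); sweep and print them.
for t in np.linspace(0.05, 0.95, 19):
    fpr, tpr = rates(t)
    print(f"threshold = {t:.2f}   FPR = {fpr:.2f}   TPR = {tpr:.2f}")
```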

Decision boundaries

Let $\hat y(x)$ denote a hard classifier. Which is true about the decision boundary of this classifier?

  1. It is not defined; the idea of a “decision boundary” is only well-defined for estimates of the conditional probability.

  2. If $x$ is on the decision boundary, then there are two points $x_1,x_2$ that are very close to $x$ such that $\hat y(x_1)\neq \hat y(x_2)$.

  3. If $x$ is on the decision boundary, then there is some $\epsilon>0$ such that $\hat y(\tilde x)=\hat y(x)$ for all $\tilde x$ such that $\Vert x -\tilde x\Vert <\epsilon$.
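
To make the "nearby points with different labels" idea concrete, here is a toy sketch using a made-up linear hard classifier on $\mathbb{R}^2$: for a point on its decision boundary, arbitrarily small perturbations on either side receive different predictions.

```python
import numpy as np

def yhat(x):
    """A toy hard classifier on R^2: predict '+' iff x1 + x2 > 0."""
    return "+" if x[0] + x[1] > 0 else "-"

# A point on this classifier's decision boundary (where x1 + x2 = 0).
x_boundary = np.array([0.5, -0.5])

# For any small eps, there are points within eps of x_boundary whose predictions differ.
for eps in [1e-1, 1e-3, 1e-6]:
    x1 = x_boundary + np.array([eps / 2, 0.0])  # nudged into the '+' region
    x2 = x_boundary - np.array([eps / 2, 0.0])  # nudged into the '-' region
    print(f"eps = {eps:g}:  yhat(x1) = {yhat(x1)},  yhat(x2) = {yhat(x2)}")
```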

ROC

Let $\hat y(x)$ denote a hard classifier. Given a test dataset, how should we plot an ROC curve for this classifier?

  1. It cannot be plotted from a hard classifier.

  2. Considering all possible thresholds $t$, compute $\mathbb{P}(\hat y(X)=+|Y=+)/t$ and $\mathbb{P}(\hat y(X)=+|Y=-)/t$. Plot the values you obtain.

  3. Considering all possible thresholds, compute $\mathbb{P}(\hat y(X)=+|Y=+)/t$ and $\mathbb{P}(Y=+|\hat y(X)=+)/t$. Plot the values you obtain.
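
As a point of comparison, here is a minimal sketch with made-up labels, hard predictions, and scores: hard predictions on a test set yield a single (FPR, TPR) operating point, whereas tracing out a curve requires a score that can be thresholded at many values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up test labels and a made-up noisy hard classifier, for illustration only.
y_true = rng.random(1000) < 0.4                          # True means class '+'
y_hard = rng.random(1000) < np.where(y_true, 0.7, 0.3)   # hard predictions

# Hard predictions give exactly one operating point: one FPR and one TPR.
fpr = np.mean(y_hard[~y_true])  # P(yhat = + | Y = -)
tpr = np.mean(y_hard[y_true])   # P(yhat = + | Y = +)
print(f"single operating point: FPR = {fpr:.2f}, TPR = {tpr:.2f}")

# By contrast, a curve comes from thresholding a score at many values.
scores = np.where(y_true, rng.normal(1.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))
for t in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    pred = scores > t
    print(f"t = {t:+.1f}   FPR = {np.mean(pred[~y_true]):.2f}   "
          f"TPR = {np.mean(pred[y_true]):.2f}")
```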