Classification decision rules (aka hard classification)
Estimating metrics from data¶
A binary classifier was applied to test data and its predictions were compared with the true responses. The following was found.
There were 20 cases where the classifier predicted + and the truth was +.
There were 30 cases where the classifier predicted - and the truth was +.
There were 10 cases where the classifier predicted + and the truth was -.
There were 40 cases where the classifier predicted - and the truth was -.
Given these numbers, estimate $\mathbb{P}(Y = {+} \mid \hat{Y} = {+})$ and $\mathbb{P}(\hat{Y} = {+} \mid Y = {+})$, where $\hat{Y}$ denotes the classifier's prediction and $Y$ the truth.
Solutions
This question asks two things.
First, there were 30 cases where the prediction was positive. Among those 30 cases, there were 20 cases where the truth was positive. Thus we have reason to suppose that $\mathbb{P}(Y = {+} \mid \hat{Y} = {+}) \approx 20/30 = 2/3$.
Second, there were 50 cases where the truth was positive. Among those cases, there were 20 cases where the prediction was positive. Thus we have reason to suppose that $\mathbb{P}(\hat{Y} = {+} \mid Y = {+}) \approx 20/50 = 2/5$.
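For concreteness, here is a minimal sketch of these two estimates in Python; the variable names (`tp`, `fn`, `fp`, `tn`) are just placeholders for the four counts given above.

```python
# Counts from the problem, grouped by (prediction, truth).
tp = 20  # predicted +, truth +
fn = 30  # predicted -, truth +
fp = 10  # predicted +, truth -
tn = 40  # predicted -, truth -

# P(Y = + | prediction = +): among the 30 predicted positives, 20 were truly positive.
prob_truth_pos_given_pred_pos = tp / (tp + fp)   # 20 / 30 ≈ 0.667

# P(prediction = + | Y = +): among the 50 true positives, 20 were predicted positive.
prob_pred_pos_given_truth_pos = tp / (tp + fn)   # 20 / 50 = 0.4

print(prob_truth_pos_given_pred_pos, prob_pred_pos_given_truth_pos)
```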
A better classifier¶
We would like to design a method that guesses whether a plant is poison ivy, based on urushiol measurements. Given a measurement $x$, we have a very accurate estimate of the probability that the plant is poison ivy, $\hat{p}(x) \approx \mathbb{P}(Y = {+} \mid X = x)$. Using this estimate we constructed a hard classifier of the form

$$f(x) = \begin{cases} + & \text{if } \hat{p}(x) > \tau, \\ - & \text{otherwise.} \end{cases}$$
In tests, we found the classifier had a false positive rate of 30% ($\mathbb{P}(f(X) = {+} \mid Y = {-}) = 0.3$) and a true positive rate of 70% ($\mathbb{P}(f(X) = {+} \mid Y = {+}) = 0.7$). True or false: in typical situations, it will be very difficult to find a new hard classifier that achieves the same false positive rate (30%) while attaining a substantially higher true positive rate (say, 75%).
Solutions
True. We built the hard classifier by thresholding on a high-quality estimate of $\mathbb{P}(Y = {+} \mid X = x)$, so there’s little reason to hope we can get something that has the same FPR yet a higher TPR. (Here’s the same sentiment with a bit more rigor: under mild conditions, the Neyman-Pearson lemma tells us that if we in fact have a perfect estimate of $\mathbb{P}(Y = {+} \mid X = x)$, then there is no way to get a higher TPR while keeping the FPR the same.) Of course, there might be a classifier that has a slightly higher FPR (say, 31%) and a much higher TPR (e.g., 100%).
In general, no one hard classifier can ever be said to be the “best.” It depends on what matters to you. Is higher TPR (good) worth the higher FPR that may come with it (bad)?
If you’re viewing things from a precision/recall point of view, a similar tradeoff appears. Is higher recall (good) worth the lower precision that might come with it (bad)? It just depends on what matters to you.
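To make the tradeoff concrete, here is a small sketch on synthetic data; the measurement distribution and the estimator `p_hat` are made up for illustration, not taken from the problem. Sweeping the threshold $\tau$ moves TPR and FPR up and down together, so no single threshold is "best."

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.binomial(1, 0.5, size=n)                 # synthetic true labels (1 = poison ivy)
x = rng.normal(loc=y.astype(float), scale=1.0)   # hypothetical urushiol measurements
p_hat = 1.0 / (1.0 + np.exp(-2.0 * (x - 0.5)))   # stand-in for a good estimate of P(Y = + | X = x)

for tau in [0.3, 0.5, 0.7]:
    pred = (p_hat > tau).astype(int)             # hard classifier: threshold p_hat at tau
    tpr = np.mean(pred[y == 1] == 1)             # true positive rate at this threshold
    fpr = np.mean(pred[y == 0] == 1)             # false positive rate at this threshold
    print(f"tau={tau:.1f}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```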
Decision boundaries¶
Let $f$ denote a hard classifier. Which is true about the decision boundary of this classifier?
It is not defined; the idea of a “decision boundary” is only well-defined for estimates of the conditional probability.
If $x$ is on the decision boundary, then there are two points $x_1$ and $x_2$ that are very close to $x$ such that $f(x_1) \neq f(x_2)$.
If $x$ is on the decision boundary, then there is some $\epsilon > 0$ such that $f(x') = f(x)$ for all $x'$ such that $\|x' - x\| < \epsilon$.
Solutions
Second option. The decision boundary is where the classifier's output switches between labels, so arbitrarily close to any boundary point there are points that $f$ labels differently.
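Here is a tiny illustration of why the second option is the right characterization, using a hypothetical one-dimensional hard classifier that thresholds at $x = 2$; the point $x = 2$ sits on the decision boundary.

```python
def f(x):
    # Hypothetical hard classifier: label depends only on whether x exceeds 2.
    return "+" if x > 2.0 else "-"

eps = 1e-6
# Two points arbitrarily close to the boundary point x = 2 get different labels.
print(f(2.0 - eps), f(2.0 + eps))   # prints: - +
```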
ROC¶
Let $f$ denote a hard classifier. Given a test dataset, how should we plot an ROC curve for this classifier?
It cannot be plotted from a hard classifier.
Considering all possible thresholds $\tau$, compute the false positive rate $\mathbb{P}(f(X) = {+} \mid Y = {-})$ and the true positive rate $\mathbb{P}(f(X) = {+} \mid Y = {+})$ at each threshold. Plot the values you obtain.
Considering all possible thresholds, compute the precision $\mathbb{P}(Y = {+} \mid f(X) = {+})$ and the recall $\mathbb{P}(f(X) = {+} \mid Y = {+})$ at each threshold. Plot the values you obtain.
Solutions
First option takes it. To make an ROC curve, you need some sort of soft classifier (e.g., an estimate of the log odds). A hard classifier produces only a single (FPR, TPR) point, not a curve.
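As a sketch of the contrast, the snippet below builds an ROC curve from soft scores (a hypothetical estimate of $\mathbb{P}(Y = {+} \mid X)$ on synthetic data) by sweeping thresholds, and shows that hardening the scores at a single threshold yields only one (FPR, TPR) point.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.5, size=500)                                       # synthetic true labels
scores = np.clip(0.25 + 0.5 * y + rng.normal(0, 0.25, size=500), 0, 1)   # soft classifier outputs

# ROC curve: sweep every observed score as a threshold and record (FPR, TPR).
thresholds = np.unique(scores)[::-1]
fpr = np.array([np.mean(scores[y == 0] > t) for t in thresholds])
tpr = np.array([np.mean(scores[y == 1] > t) for t in thresholds])
# plt.plot(fpr, tpr) would draw the curve.

# A hard classifier (scores thresholded once) gives just a single point.
hard_pred = (scores > 0.5).astype(int)
single_point = (np.mean(hard_pred[y == 0] == 1), np.mean(hard_pred[y == 1] == 1))
print("hard classifier gives one (FPR, TPR) point:", single_point)
```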