Classification decision rules (aka hard classification)
Estimating metrics from data¶
A binary classifier was applied to test data and its predictions were compared with the true responses. The following was found.
There were 20 cases where the classifier predicted + and the truth was +.
There were 30 cases where the classifier predicted - and the truth was +.
There were 10 cases where the classifier predicted + and the truth was -.
There were 40 cases where the classifier predicted - and the truth was -.
Given these numbers, estimate $\mathbb{P}(Y = {+} \mid \hat{Y} = {+})$ and $\mathbb{P}(\hat{Y} = {+} \mid Y = {+})$, where $\hat{Y}$ denotes the classifier's prediction and $Y$ the truth.
Solutions
This question asks two things.
First, there were 30 cases where the prediction was positive. Among those 30 cases, there were 20 cases where the truth was positive. Thus we have reason to suppose that $\mathbb{P}(Y = {+} \mid \hat{Y} = {+}) \approx 20/30 = 2/3$.
Second, there were 50 cases where the truth was positive. Among those cases, there were 20 cases where the prediction was positive. Thus we have reason to suppose that $\mathbb{P}(\hat{Y} = {+} \mid Y = {+}) \approx 20/50 = 2/5$.
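For concreteness, here is a minimal sketch of these two estimates in Python; the variable names (`tp`, `fn`, `fp`, `tn`) are just placeholders for the four counts given above.

```python
# Counts from the problem, grouped by (prediction, truth).
tp = 20  # predicted +, truth +
fn = 30  # predicted -, truth +
fp = 10  # predicted +, truth -
tn = 40  # predicted -, truth -

# P(Y = + | prediction = +): among the 30 predicted positives, 20 were truly positive.
prob_truth_pos_given_pred_pos = tp / (tp + fp)   # 20 / 30 ≈ 0.667

# P(prediction = + | Y = +): among the 50 true positives, 20 were predicted positive.
prob_pred_pos_given_truth_pos = tp / (tp + fn)   # 20 / 50 = 0.4

print(prob_truth_pos_given_pred_pos, prob_pred_pos_given_truth_pos)
```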
A better classifier¶
We would like to design a method that guesses whether a plant is poison ivy, based on urushiol measurements. Given a measurement $x$, we have a very accurate estimate of the probability that the plant is poison ivy, $\hat{p}(x) \approx \mathbb{P}(Y = {+} \mid X = x)$. Using this estimate we constructed a hard classifier of the form

$$f(x) = \begin{cases} + & \text{if } \hat{p}(x) > \tau, \\ - & \text{otherwise.} \end{cases}$$
In tests, we found the classifier had a false positive rate of 30% ($\mathbb{P}(f(X) = {+} \mid Y = {-}) = 0.3$) and a true positive rate of 70% ($\mathbb{P}(f(X) = {+} \mid Y = {+}) = 0.7$). True or false: in typical situations, it will be very difficult to find a new hard classifier that achieves the same false positive rate (30%) while attaining a substantially higher true positive rate (say, 75%).
Solutions
True. We built the hard classifier by thresholding on a high-quality estimate of $\mathbb{P}(Y = {+} \mid X = x)$, so there’s little reason to hope we can get something that has the same FPR yet a higher TPR. (Here’s the same sentiment with a bit more rigor: under mild conditions, the Neyman-Pearson lemma tells us that if we in fact have a perfect estimate of $\mathbb{P}(Y = {+} \mid X = x)$, then there is no way to get a higher TPR while keeping the FPR the same.) Of course, there might be a classifier that has a slightly higher FPR (say, 31%) and a much higher TPR (e.g., 100%).
In general, no one hard classifier can ever be said to be the “best.” It depends on what matters to you. Is higher TPR (good) worth the higher FPR that may come with it (bad)?
If you’re viewing things from a precision/recall point of view, a similar tradeoff appears. Is higher recall (good) worth the lower precision that might come with it (bad)? It just depends on what matters to you.
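To make the tradeoff concrete, here is a small sketch on synthetic data; the measurement distribution and the estimator `p_hat` are made up for illustration, not taken from the problem. Sweeping the threshold $\tau$ moves TPR and FPR up and down together, so no single threshold is "best."

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.binomial(1, 0.5, size=n)                 # synthetic true labels (1 = poison ivy)
x = rng.normal(loc=y.astype(float), scale=1.0)   # hypothetical urushiol measurements
p_hat = 1.0 / (1.0 + np.exp(-2.0 * (x - 0.5)))   # stand-in for a good estimate of P(Y = + | X = x)

for tau in [0.3, 0.5, 0.7]:
    pred = (p_hat > tau).astype(int)             # hard classifier: threshold p_hat at tau
    tpr = np.mean(pred[y == 1] == 1)             # true positive rate at this threshold
    fpr = np.mean(pred[y == 0] == 1)             # false positive rate at this threshold
    print(f"tau={tau:.1f}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```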
Decision boundaries¶
Let $f$ denote a hard classifier. Which is true about the decision boundary of this classifier?
It is not defined; the idea of a “decision boundary” is only well-defined for estimates of the conditional probability.
If $x$ is on the decision boundary, then there are two points $x_1$ and $x_2$ that are very close to $x$ such that $f(x_1) \neq f(x_2)$.
If $x$ is on the decision boundary, then there is some $\epsilon > 0$ such that $f(x') = f(x)$ for all $x'$ such that $\|x' - x\| < \epsilon$.
Solutions
Second option. The decision boundary is where the classifier's output switches between labels, so arbitrarily close to any boundary point there are points that $f$ labels differently.
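Here is a tiny illustration of why the second option is the right characterization, using a hypothetical one-dimensional hard classifier that thresholds at $x = 2$; the point $x = 2$ sits on the decision boundary.

```python
def f(x):
    # Hypothetical hard classifier: label depends only on whether x exceeds 2.
    return "+" if x > 2.0 else "-"

eps = 1e-6
# Two points arbitrarily close to the boundary point x = 2 get different labels.
print(f(2.0 - eps), f(2.0 + eps))   # prints: - +
```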
ROC¶
Let $f$ denote a hard classifier. Given a test dataset, how should we plot an ROC curve for this classifier?
It cannot be plotted from a hard classifier.
Considering all possible thresholds $\tau$, compute the false positive rate $\mathbb{P}(f(X) = {+} \mid Y = {-})$ and the true positive rate $\mathbb{P}(f(X) = {+} \mid Y = {+})$ at each threshold. Plot the values you obtain.
Considering all possible thresholds, compute the precision $\mathbb{P}(Y = {+} \mid f(X) = {+})$ and the recall $\mathbb{P}(f(X) = {+} \mid Y = {+})$ at each threshold. Plot the values you obtain.
Solutions
First option takes it. To make an ROC curve, you need some sort of soft classifier (e.g., an estimate of the log odds). A hard classifier produces only a single (FPR, TPR) point, not a curve.
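As a sketch of the contrast, the snippet below builds an ROC curve from soft scores (a hypothetical estimate of $\mathbb{P}(Y = {+} \mid X)$ on synthetic data) by sweeping thresholds, and shows that hardening the scores at a single threshold yields only one (FPR, TPR) point.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.5, size=500)                                       # synthetic true labels
scores = np.clip(0.25 + 0.5 * y + rng.normal(0, 0.25, size=500), 0, 1)   # soft classifier outputs

# ROC curve: sweep every observed score as a threshold and record (FPR, TPR).
thresholds = np.unique(scores)[::-1]
fpr = np.array([np.mean(scores[y == 0] > t) for t in thresholds])
tpr = np.array([np.mean(scores[y == 1] > t) for t in thresholds])
# plt.plot(fpr, tpr) would draw the curve.

# A hard classifier (scores thresholded once) gives just a single point.
hard_pred = (scores > 0.5).astype(int)
single_point = (np.mean(hard_pred[y == 0] == 1), np.mean(hard_pred[y == 1] == 1))
print("hard classifier gives one (FPR, TPR) point:", single_point)
```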