Classification
Lesson 8 ⏱ 14 min

ROC curves and AUC


ROC Curves - Measuring a Classifier Across Every Threshold

Why threshold-dependent metrics miss the big picture, building an ROC curve step-by-step from sorted predictions, the AUC as a probability interpretation, and when to use PR curves instead.


Quick refresher

Precision, recall, and the confusion matrix

Precision = TP/(TP+FP): how reliable positive predictions are. Recall = TP/(TP+FN): what fraction of real positives are caught. Both depend on the threshold used to convert model scores into binary predictions.

Example

A model outputs probability 0.7 for an example.

With threshold 0.5, this is predicted positive.

With threshold 0.8, it's predicted negative.

Changing the threshold changes TP, FP, TN, FN, and therefore changes precision and recall.
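As a quick sketch of that dependence, here is a minimal helper computing both metrics from confusion-matrix counts (the counts are illustrative, previewing the table in the next section, not from any particular model):

# Precision and recall from raw confusion-matrix counts
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

# Same model, two thresholds: a strict threshold trades recall for
# precision, a loose one does the reverse (counts are illustrative)
print(precision_recall(tp=20, fp=1, fn=30))   # ≈ (0.952, 0.400)
print(precision_recall(tp=48, fp=25, fn=2))   # ≈ (0.658, 0.960)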

The Threshold Problem

A trained logistic regression or neural network outputs a score between 0 and 1 for each example. To make a binary prediction, we apply a threshold: if score > τ, predict positive; otherwise predict negative.

ROC curves and AUC are the standard evaluation metrics in medicine, fraud detection, and ad ranking — any domain where the costs of false positives and false negatives differ. A single accuracy number hides exactly the tradeoffs these metrics expose.

The default is often τ = 0.5, but this is arbitrary. The same model with different thresholds produces radically different precision and recall values:

Threshold   TP   FP   TN   FN   Precision   Recall
0.9         20    1   99   30       0.952    0.400
0.5         42    8   92    8       0.840    0.840
0.2         48   25   75    2       0.658    0.960

Which threshold is "best" depends on the application. But comparing two models using metrics at a single threshold is unfair — a model might win at threshold 0.5 and lose at threshold 0.7.
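A minimal sketch of such a threshold sweep, using synthetic labels and scores (so the exact numbers will differ from the table above):

import numpy as np

# Synthetic stand-ins: y_true holds 0/1 labels, y_scores holds model outputs
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=150)
y_scores = np.clip(0.3 + 0.4 * y_true + 0.2 * rng.normal(size=150), 0, 1)

for tau in (0.9, 0.5, 0.2):
    pred = y_scores > tau
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    print(f"tau={tau}: precision={tp / (tp + fp):.3f}, recall={tp / (tp + fn):.3f}")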

ROC curves solve this: they evaluate the model across all thresholds at once.

Building the ROC Curve

For each threshold value, compute:

  • True positive rate (TPR, the recall) = TP / (TP + FN) — plot on the y-axis
  • False positive rate (FPR) = FP / (FP + TN) — plot on the x-axis

As we lower the threshold from 1.0 to 0.0:

  • More examples are predicted positive → TP increases (better recall) but so does FP
  • The curve moves from (0, 0) toward (1, 1)

Worked Example: 5 Examples

Sorted by model score (highest to lowest):

Example   Score   True Label   TP   FP   TPR    FPR
Start       —         —         0    0   0.00   0.00
A         0.92        +         1    0   0.33   0.00
B         0.81        +         2    0   0.67   0.00
C         0.68        −         2    1   0.67   0.50
D         0.45        +         3    1   1.00   0.50
E         0.22        −         3    2   1.00   1.00

(There are 3 positives and 2 negatives total)

The ROC curve passes through: (0,0) → (0, 0.33) → (0, 0.67) → (0.5, 0.67) → (0.5, 1.0) → (1.0, 1.0)
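As a sanity check, scikit-learn's roc_curve reproduces exactly these points from the five scores (a small sketch; the y_true/y_scores names are ours):

from sklearn.metrics import roc_curve

# The five examples above: A, B, D are positive (1); C, E are negative (0)
y_true = [1, 1, 0, 1, 0]
y_scores = [0.92, 0.81, 0.68, 0.45, 0.22]

fpr, tpr, _ = roc_curve(y_true, y_scores)
for x, y in zip(fpr, tpr):
    print(f"({x:.2f}, {y:.2f})")   # (0.00, 0.00), (0.00, 0.33), ... (1.00, 1.00)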

AUC: Area Under the ROC Curve

The AUC is the area under the ROC curve, ranging from 0 to 1.

For the worked example above, the area can be computed geometrically or via the trapezoidal rule:

\text{AUC} = \sum_{i=1}^{n-1} \frac{y_i + y_{i+1}}{2} \, (x_{i+1} - x_i)

where (x_1, y_1), \dots, (x_n, y_n) are the ROC curve points from (0,0) to (1,1), and x_{i+1} - x_i is the change in FPR between consecutive points.

Computing for our 5-example case:

  • Segment (0,0)→(0,0.33): width=0, area=0
  • Segment (0,0.33)→(0,0.67): width=0, area=0
  • Segment (0,0.67)→(0.5,0.67): width=0.5, avg height=2/3 (≈0.67), area=1/3 ≈ 0.333
  • Segment (0.5,0.67)→(0.5,1.0): width=0, area=0
  • Segment (0.5,1.0)→(1.0,1.0): width=0.5, avg height=1.0, area=0.5

AUC = 0 + 0 + 0.333 + 0 + 0.5 = 0.833
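The same computation in code, as a manual trapezoidal sum with sklearn.metrics.auc as a cross-check:

from sklearn.metrics import auc

# ROC points from the worked example: x = FPR, y = TPR
fpr = [0, 0, 0, 0.5, 0.5, 1.0]
tpr = [0, 1/3, 2/3, 2/3, 1.0, 1.0]

# Trapezoidal rule, exactly as in the formula above
manual = sum((tpr[i] + tpr[i + 1]) / 2 * (fpr[i + 1] - fpr[i])
             for i in range(len(fpr) - 1))

print(f"{manual:.3f}")         # 0.833
print(f"{auc(fpr, tpr):.3f}")  # 0.833 (sklearn's trapezoidal implementation)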

The Probability Interpretation

AUC has an elegant probabilistic meaning:

AUC = P(score of a random positive > score of a random negative)

In plain English: imagine picking one patient who has the disease and one who doesn't, completely at random. AUC tells you how often the model gives the sick patient a higher risk score than the healthy one. An AUC of 0.85 means the model ranks the sick patient higher 85% of the time — regardless of where you set the threshold. It measures how well the model separates the two classes, not how well it predicts at any one specific cutoff.

In our example, AUC ≈ 0.83 means: if we pick one positive and one negative at random, 83% of the time the model gives a higher score to the positive. This interpretation holds regardless of class balance or threshold.
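This can be verified by brute force: compare every (positive, negative) pair directly, counting ties (if any) as half. A short sketch on the worked example:

from itertools import product
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 0, 1, 0]
y_scores = [0.92, 0.81, 0.68, 0.45, 0.22]

pos = [s for s, y in zip(y_scores, y_true) if y == 1]
neg = [s for s, y in zip(y_scores, y_true) if y == 0]

# Fraction of (positive, negative) pairs where the positive scores higher;
# a tie contributes 0.5
pairs = list(product(pos, neg))
pairwise = sum((p > n) + 0.5 * (p == n) for p, n in pairs) / len(pairs)

print(pairwise)                          # 0.8333... (5 of 6 pairs ranked correctly)
print(roc_auc_score(y_true, y_scores))   # matches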

AUC        Interpretation
1.0        Perfect: every positive outranks every negative
0.9–1.0    Excellent
0.7–0.9    Good
0.5–0.7    Poor but better than random
0.5        Equivalent to random guessing
< 0.5      Worse than random (flip predictions)

Code: ROC and AUC in Scikit-learn

from sklearn.metrics import roc_curve, roc_auc_score, RocCurveDisplay
import matplotlib.pyplot as plt

# y_true: array of 0/1 labels
# y_scores: array of model probability outputs
# (the five-example data from above, as a runnable stand-in for your own data)
y_true = [1, 1, 0, 1, 0]
y_scores = [0.92, 0.81, 0.68, 0.45, 0.22]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
auc = roc_auc_score(y_true, y_scores)

print(f"AUC: {auc:.3f}")

# Plot the ROC curve with the random-guessing diagonal for reference
RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=auc).plot()
plt.plot([0, 1], [0, 1], 'k--', label='Random')
plt.title(f'ROC Curve (AUC = {auc:.3f})')
plt.legend()
plt.show()

Always plot the ROC curve in addition to reporting AUC — the shape of the curve contains information that the scalar AUC hides.

Quiz

Question 1 of 3

On a ROC curve, what do the two axes measure?