The Doctor, the Judge, and the Algorithm
In 2016, a man named Eric Loomis was sentenced to six years in prison. Part of the evidence? A score produced by COMPAS, a proprietary recidivism risk-assessment algorithm. Loomis's lawyers asked how the score was calculated. The answer: trade secret. The Wisconsin Supreme Court ruled this acceptable.
That same year, a radiology AI reported a chest X-ray as "low risk for malignancy." The patient died of lung cancer eighteen months later. The radiologist who reviewed the scan didn't understand why the AI was confident — so they deferred to it.
These aren't edge cases. They're the central problem of machine learning in high-stakes settings: we build systems that work, but we don't know why they work, which means we don't know when they'll stop working.
What Does "Interpretable" Actually Mean?
Interpretability is not a single thing. Researchers distinguish several axes:
Global vs. Local
- Global interpretability: What does the model do overall? What features matter? What decision logic has it learned? This is like asking "how does this type of medical test work?"
- Local interpretability: Why did the model make this prediction for this patient? This is like asking "why did you recommend surgery for me specifically?"
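A minimal sketch of the two views on one model, assuming scikit-learn and its bundled breast-cancer dataset (purely illustrative, not one of the cases above): the global view reads the fitted weights, while the local view reads the per-feature contributions to one patient's score.

```python
# Sketch: global vs. local views of the same linear model.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coefs = model.named_steps["logisticregression"].coef_[0]

# Global: which features matter to the model overall?
global_view = sorted(zip(X.columns, coefs), key=lambda t: abs(t[1]), reverse=True)
print("Global (largest weights):", global_view[:5])

# Local: why did the model produce this score for patient 0 specifically?
x0 = model.named_steps["standardscaler"].transform(X.iloc[[0]])[0]
contributions = coefs * x0  # per-feature contribution to this patient's logit
local_view = sorted(zip(X.columns, contributions), key=lambda t: abs(t[1]), reverse=True)
print("Local (largest contributions for patient 0):", local_view[:5])
```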
Intrinsic vs. Post-hoc
- Intrinsic: The model is interpretable by design. A linear model or a shallow decision tree is interpretable because the model is the explanation.
- Post-hoc: You train any model you like, then apply a separate method to explain it. LIME and SHAP (covered later in this unit) are post-hoc methods.
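The distinction is easy to see in code. Here is a minimal sketch, again assuming scikit-learn and an illustrative built-in dataset: the shallow tree can simply be printed, while the random forest needs a separate, after-the-fact method (permutation importance here, standing in for the LIME/SHAP methods covered later).

```python
# Sketch: an intrinsically interpretable model vs. a post-hoc explanation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Intrinsic: a shallow tree is its own explanation; just print the rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)
print(export_text(tree, feature_names=list(X.columns)))

# Post-hoc: the forest is opaque, so a separate method is applied after
# training (permutation importance) to summarize what it relies on.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: t[1], reverse=True)
print("Post-hoc importances:", ranked[:5])
```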
The Complexity Spectrum
Models trade off interpretability against expressive power. The intuition is direct: the more flexible a model is, the harder it is to summarize what it's doing in human terms. Where you land on that tradeoff depends on the complexity of the hypothesis class you search over, the approximation error you are willing to accept, and the amount of data you have to fit the model.
The rough spectrum, from most to least interpretable:
- Linear models — each weight has a direct reading: a one-unit increase in a feature changes the output by that feature's weight. Globally interpretable, but can't model non-linear patterns.
- Decision trees — sequences of if-then rules a human can read. Interpretable but brittle.
- Gradient boosted trees / random forests — powerful and accurate, but an ensemble of hundreds of trees is no longer readable; they need feature-importance tools.
- Deep neural networks — state-of-the-art on most tasks, but with millions of parameters, no individual weight means anything to a human. Require dedicated explanation methods.
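To make the readability gap concrete, here is a small sketch (assuming scikit-learn and its bundled breast-cancer dataset, chosen only for illustration) that counts how much a human would have to read to fully understand each model:

```python
# Sketch: how much would a human have to read to "know" each model?
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

linear = LogisticRegression(max_iter=1000).fit(X_scaled, y)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

print("linear model :", linear.coef_.size, "weights")
print("shallow tree :", tree.tree_.node_count, "nodes")
print("random forest:", sum(t.tree_.node_count for t in forest.estimators_), "nodes across all trees")
```

A deep neural network would add millions of parameters on top of this, none of which carries meaning on its own.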
Is the Interpretability–Accuracy Tradeoff Real?
The folklore says: "interpretable models are less accurate." This is increasingly contested.
For truly high-dimensional unstructured data — raw pixels, raw audio waveforms — complex models do seem to have an inherent edge. But for tabular data with well-engineered features, the tradeoff is often smaller than assumed.
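As a quick sanity check of that claim, here is a hedged sketch on one small tabular dataset (scikit-learn's bundled breast-cancer data; results will vary with the dataset and with tuning):

```python
# Sketch: is the accuracy gap large on a small tabular problem?
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    "logistic regression (interpretable)": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "gradient boosting (black box)": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} mean accuracy (5-fold CV)")
```

On this task the two models usually land within a couple of points of each other, which is evidence about one easy tabular problem, not a general law.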
The EU AI Act and the Right to Explanation
Since May 2018, the EU's General Data Protection Regulation (GDPR) has given individuals the right to "meaningful information about the logic involved" in automated decisions. The 2024 EU AI Act goes further, classifying AI systems used in credit scoring, criminal justice, and medical diagnosis as "high risk" — requiring human oversight, documentation, and the ability to explain decisions.
This isn't just European law. Similar bills have passed in US states (Colorado's SB21-169, New York City's Local Law 144), and comparable frameworks exist in Canada, Singapore, and Brazil.
Interpretability is now partly a compliance requirement, not just an engineering virtue.
What Interpretability Cannot Do
Interpretability tools help you:
- Audit models for bias before deployment
- Debug models when they fail unexpectedly
- Communicate model behavior to stakeholders
- Build trust with end users who need to understand a recommendation
They don't give you perfect transparency into the model's internal mechanisms — that requires mechanistic interpretability, a harder and more active research frontier.
Worked Example: Spotting a Spurious Correlation
Suppose a pneumonia risk model trained on hospital data achieves 95% accuracy. Feature importance analysis shows that one of the strongest predictors of low risk is a history of asthma. That should sound odd: asthmatic patients are sicker, so they should be at higher risk, not lower.
The explanation: asthma patients with pneumonia were routinely sent to the ICU. They got better care, and survived at higher rates than the baseline. The model learned "asthmatic patients = lower risk" because the training data reflected care patterns, not biology. Without interpretability analysis, this model would have sent asthmatic patients home.
This exact scenario was documented in Caruana et al. (2015) using a real hospital dataset.
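To see how an interpretability check can surface this kind of confounding, here is a minimal sketch on synthetic data. The data-generating process (an asthma flag, a latent severity score, and an ICU-care rule) is invented for illustration; it is not the dataset from Caruana et al. (2015).

```python
# Sketch: a synthetic stand-in for the asthma/pneumonia trap.
# The data-generating process below is invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
asthma = rng.binomial(1, 0.15, n)        # 15% of patients have asthma
severity = rng.normal(0.0, 1.0, n)       # latent illness severity
icu_care = asthma | (severity > 1.5)     # asthmatics are routinely sent to the ICU

# Ground truth: severity and asthma raise the risk of death; aggressive
# ICU care lowers it substantially.
logit = -2.0 + 1.2 * severity + 0.3 * asthma - 1.5 * icu_care
died = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# The deployed model sees asthma and severity, but not the care delivered.
X = np.column_stack([asthma, severity])
model = LogisticRegression().fit(X, died)
print(dict(zip(["asthma", "severity"], model.coef_[0].round(2))))
# Expect the asthma coefficient to come out negative: the model learns
# "asthma = lower risk" because the data reflect care patterns, not biology.
```

Inspecting the fitted coefficients (or feature importances, for a more complex model) is exactly the step that flags the suspicious "protective" effect of asthma before deployment.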
Summary
- Opaque ML models can cause real harm when deployed in high-stakes settings — criminal justice, medicine, lending.
- Interpretability = ability for humans to understand model reasoning; it has multiple flavors: global/local, intrinsic/post-hoc.
- The interpretability–accuracy tradeoff is real but smaller than often assumed, especially for tabular data.
- Regulation (GDPR, EU AI Act) increasingly mandates explainability for high-risk applications.
- Post-hoc explanations are approximations — useful but not identical to the model's actual computation.