The Surgeon Who Needs a Reason
A hospital deploys a deep neural network that screens chest X-rays for early-stage pneumonia. The model achieves 94% sensitivity — better than the median radiologist. But before the hospital's IRB will approve deployment, clinicians need to understand why the model flags specific scans.
"High confidence: pneumonia" is not a clinical explanation.
This is exactly the problem LIME (Local Interpretable Model-agnostic Explanations) was designed to solve. Introduced by Ribeiro, Singh, and Guestrin in 2016, LIME provides a local linear explanation for any prediction, from any model.
The Core Insight: Global Complexity, Local Simplicity
A deep neural network that classifies millions of images may be extraordinarily complex globally — its decision boundary twists through thousands of dimensions. But near any single input point, the boundary is approximately flat.
Think of the Earth's surface. Globally, it's a sphere. Locally, standing in a field, it looks flat. LIME uses this local flatness to fit a simple linear model near the prediction you want to explain.
The LIME Algorithm
Given a black-box model and an input you want to explain:
Step 1: Sample perturbations. Generate variants of the input. For text, this means randomly masking words. For tabular data, this means sampling feature values from the marginal distribution. For images, this means toggling superpixel segments on/off.
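A minimal sketch of the text case, assuming whitespace tokenization; the function name, the 30% masking probability, and the fixed seed are illustrative choices, not prescribed by LIME.

```python
import numpy as np

def perturb_text(text, n_samples=2000, mask_prob=0.3, rng=None):
    """Generate perturbed variants of a text by randomly dropping words.

    Returns the perturbed strings plus a binary keep-mask: each row is the
    simplified representation z' used later when fitting the linear
    surrogate (1 = word kept, 0 = word masked out).
    """
    rng = rng or np.random.default_rng(0)
    words = text.split()
    keep = rng.random((n_samples, len(words))) > mask_prob   # True = keep the word
    keep[0] = True                                           # sample 0 = original input
    variants = [" ".join(w for w, k in zip(words, row) if k) for row in keep]
    return variants, keep.astype(int)
```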
Step 2: Query the black box. For each perturbed sample $z$, record the model's prediction $f(z)$.
Step 3: Compute proximity weights. Close examples should matter more than distant ones — a perturbed sample that is nearly identical to the original should have a large influence on our explanation, while a very different sample should barely matter. Assign each perturbed sample $z$ a weight based on its distance to the original input $x$ (a small numpy sketch follows the symbol list):

$$\pi_x(z) = \exp\left(-\frac{D(x, z)^2}{\sigma^2}\right)$$

- $\pi_x(z)$: proximity weight for sample $z$ relative to $x$
- $D(x, z)$: distance between $x$ and $z$ in the interpretable feature space
- $\sigma$: kernel width — controls how quickly weight decays with distance
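A direct translation of the kernel above, continuing the text sketch from Step 1; cosine distance on the keep-mask and the value of sigma are common but not mandatory choices.

```python
import numpy as np

def proximity_weights(keep_mask, sigma=0.75):
    """pi_x(z) = exp(-D(x, z)^2 / sigma^2), where D is the cosine distance
    between each sample's keep-mask z' and the all-ones mask of the original x."""
    x = np.ones(keep_mask.shape[1])                       # original: every word present
    dots = keep_mask @ x
    norms = np.linalg.norm(keep_mask, axis=1) * np.linalg.norm(x)
    cosine_distance = 1.0 - dots / np.maximum(norms, 1e-12)
    return np.exp(-(cosine_distance ** 2) / sigma ** 2)
```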
Step 4: Fit an interpretable model. We want the simplest explanation that still matches what the black-box model predicts on our nearby samples. The formula below finds the sparse linear model that minimizes weighted prediction error (close samples contribute more) plus a penalty for using too many features. Use the weighted samples to fit a simple model — usually a sparse linear model with at most $K$ non-zero features (a practical sketch follows the symbol list):

$$\xi(x) = \arg\min_{g \in G} \; \sum_{z} \pi_x(z)\,\bigl(f(z) - g(z')\bigr)^2 + \Omega(g)$$

- $g$: the local interpretable model
- $z'$: simplified binary feature representation of $z$
- $w_g$: weights (coefficients) of the interpretable model
- $\Omega(g)$: complexity penalty, e.g. L1 for sparsity
- $\pi_x(z)$: proximity weight
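One practical way to realize this objective is a proximity-weighted Lasso, sketched below on the representations from Steps 1–3. The original LIME paper uses K-LASSO to cap the number of selected features at exactly $K$; the `alpha` knob here only approximates that.

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_local_model(keep_mask, preds, weights, alpha=0.01):
    """Fit the sparse local surrogate g by L1-penalized weighted least squares.

    keep_mask : binary matrix of z' rows (n_samples x n_words)
    preds     : black-box outputs f(z), e.g. P(negative) for each variant
    weights   : proximity weights pi_x(z)
    alpha     : L1 strength, standing in for the complexity penalty Omega(g)
    """
    g = Lasso(alpha=alpha)
    g.fit(keep_mask, preds, sample_weight=weights)   # closer samples count more
    return g
```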
Step 5: Return the explanation. The coefficients of $g$ are the explanation: positive coefficients = features that pushed the prediction up; negative coefficients = features that pushed it down.
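Tying the pieces together, under the same assumptions as the sketches above; `model` and `review` are placeholders for your classifier and input.

```python
def explanation(g, words):
    """Rank words by the magnitude of their coefficient in the surrogate g."""
    for word, weight in sorted(zip(words, g.coef_), key=lambda p: -abs(p[1])):
        direction = "pushes the prediction up" if weight > 0 else "pushes it down"
        print(f"{word:>12s}  {weight:+.3f}  ({direction})")

# End-to-end, assuming model.predict_proba returns P(negative) in column 1:
# variants, keep_mask = perturb_text(review)              # Step 1
# preds = model.predict_proba(variants)[:, 1]             # Step 2
# w = proximity_weights(keep_mask)                        # Step 3
# g = fit_local_model(keep_mask, preds, w)                # Step 4
# explanation(g, review.split())                          # Step 5
```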
Worked Example: A Sentiment Classifier
A review: "The food was great but the service was absolutely terrible and I'll never return."
Model prediction: negative (probability 0.83).
LIME perturbs this review by randomly dropping words and querying the classifier on ~2000 variants. It then fits a linear model on the bag-of-words representation. The explanation emerges:
| Word | LIME weight |
|---|---|
| "terrible" | +0.41 (pushes toward negative) |
| "never" | +0.22 |
| "great" | -0.28 (pushes toward positive) |
| "absolutely" | +0.09 |
| "food" | +0.02 |
The explanation tells us the model is doing something reasonable: the negatives outweigh the positive sentiment about food. A clinician or content moderator can sanity-check this.
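A table like this can be reproduced with the reference `lime` package. The sketch below assumes a scikit-learn-style `pipeline` whose `predict_proba` accepts a list of strings, with class index 1 corresponding to "negative"; both are assumptions about your setup, not requirements of LIME.

```python
from lime.lime_text import LimeTextExplainer

review = ("The food was great but the service was absolutely terrible "
          "and I'll never return.")

explainer = LimeTextExplainer(class_names=["positive", "negative"])
exp = explainer.explain_instance(
    review,
    pipeline.predict_proba,   # any callable: list of strings -> class probabilities
    num_features=6,           # the K in the sparsity constraint
    num_samples=2000,         # number of perturbed variants to query
)
print(exp.as_list())          # [(word, weight), ...] similar to the table above
```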
Limitations of LIME
1. Instability. Because LIME uses random sampling, running it twice on the same input often yields different explanations (a quick way to check this on your own model is sketched after this list). This instability is dangerous in high-stakes settings. Alvarez-Melis and Jaakkola (2018) quantified this and proposed more stable alternatives.
2. What is "local"? The kernel width $\sigma$ determines how large the "neighborhood" is, and there's no principled way to choose it. Too small: very few samples have significant weight. Too large: the linear approximation is poor. (A toy calculation after this list makes the trade-off concrete.)
3. The linear approximation may be wrong. In regions where $f$ is highly non-linear (near a decision boundary, for example), no linear model is a good local fit. LIME will still return an explanation, but it may be meaningless.
4. Distribution shift. The perturbations LIME generates may not correspond to realistic inputs. Replacing words in a sentence with their absence can produce grammatically odd fragments that the model has never seen in training.
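On instability: a rough self-check is to run the explainer twice with different seeds and compare the top-k features. `explain_fn` here is a hypothetical wrapper returning (feature, weight) pairs, e.g. built from the sketches above or from the `lime` package.

```python
def topk_stability(text, explain_fn, k=5, seeds=(0, 1)):
    """Jaccard overlap of the top-k features across two runs of the explainer."""
    tops = []
    for seed in seeds:
        pairs = sorted(explain_fn(text, seed), key=lambda p: -abs(p[1]))
        tops.append({feature for feature, _ in pairs[:k]})
    return len(tops[0] & tops[1]) / len(tops[0] | tops[1])   # 1.0 = identical top-k
```

On kernel width: a toy calculation makes the trade-off concrete (the uniform grid of distances is purely illustrative). With a tiny $\sigma$ only a handful of samples carry weight, so the fit is noisy; with a large $\sigma$ every sample counts almost equally, so the fit stops being local.

```python
import numpy as np

distances = np.linspace(0.0, 1.0, 2000)          # stand-in for D(x, z) over 2000 samples
for sigma in (0.05, 0.25, 1.0):
    w = np.exp(-(distances ** 2) / sigma ** 2)
    n_eff = w.sum() ** 2 / (w ** 2).sum()        # effective number of weighted samples
    print(f"sigma={sigma:<4}  effective sample size ~ {n_eff:.0f}")
```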
When to Use LIME
LIME is most useful when:
- You have a black-box model you cannot modify (no access to gradients or internal structure)
- You need to explain individual predictions to non-technical stakeholders
- You want a quick sanity check that the model is attending to the right features
It's less useful when you need:
- Stable, auditable explanations that a regulator can rely on
- Theoretically guaranteed properties (use SHAP instead — next lesson)
- Explanations that reflect the model's global behavior
Interactive example
Interactive LIME: Explain a Text Classification Prediction
Coming soon
Summary
- LIME explains any single prediction by fitting a locally linear model to perturbed variants of the input.
- The proximity kernel $\pi_x$ ensures the linear fit is anchored to the neighborhood of $x$.
- LIME is model-agnostic — it only needs query access to $f$.
- Key limitations: instability from random sampling, arbitrary kernel width choice, potential unrealism of perturbations.
- Best used for individual prediction explanations and quick sanity checks, not for auditable compliance explanations.