When a model makes mistakes, there are really only three sources of error. Two of them you can control. One you cannot. Understanding which is which changes how you debug and improve your models.
Bias: Systematic Wrongness
Imagine trying to fit a curved line through curved data using only a straight line. No matter how much data you collect or how long you train, the line will always be wrong in the same predictable way. That is bias.
Bias is the error that comes from wrong assumptions baked into your model's structure. A linear model on data shaped like a parabola has high bias. The model is not flexible enough to represent the truth.
Low-capacity models have high bias. They pre-decide too much about the shape of the relationship before even looking at data.
Variance: Sensitivity to Your Training Set
Now imagine a degree-15 polynomial fit to only 5 data points. It passes perfectly through all 5 - training error is zero. But change two points slightly and you get a completely different wiggly curve. The model memorized your specific training data rather than the underlying pattern.
That is variance. A high-variance model would change dramatically if trained on a different sample from the same distribution.
High-capacity models have high variance. They are flexible enough to memorize noise, not just signal.
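You can see this sensitivity directly by refitting a model on many fresh samples from the same distribution. A minimal sketch with synthetic data (the parabola, noise level, and sample size are my assumptions, not from any particular dataset): fit a line and a degree-15 polynomial to repeated noisy samples and measure how much each one's prediction at a single test point jumps around.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_and_predict(degree, x_test):
    """Fit a polynomial to a fresh noisy sample of a parabola, predict at x_test."""
    x = rng.uniform(-1, 1, 20)
    y = x ** 2 + rng.normal(0, 0.3, x.size)  # truth is x^2 plus noise
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x_test)

# Predictions at x = 0.5 across 200 independent training samples
preds_line = [fit_and_predict(1, 0.5) for _ in range(200)]
preds_wiggly = [fit_and_predict(15, 0.5) for _ in range(200)]

print("degree-1  spread (std):", np.std(preds_line))   # small: low variance
print("degree-15 spread (std):", np.std(preds_wiggly))  # large: high variance
```

The line's prediction barely moves between samples (it is biased, but stable); the degree-15 fit swings wildly, because it chased the noise in each particular sample.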
The Archery Analogy
Picture two archers shooting at a bullseye:
- High-bias archer: every arrow lands 2 feet to the left. Consistently wrong in the same direction - tight grouping, off target.
- High-variance archer: arrows scatter all over - sometimes left, sometimes right. Unpredictable.
Both are bad. What you want is tight grouping in the center: low bias AND low variance. The problem is that fixing one tends to worsen the other.
Interactive example
Bias vs. variance target diagram - drag complexity slider to see arrow scatter change
Coming soon
The Decomposition
The total expected prediction error decomposes into three parts:

Total Error = Bias² + Variance + σ²

- Bias²: how far off your average prediction is from the truth - the error from wrong model assumptions.
- Variance: how spread out predictions are around that average across different training sets - sensitivity to the training sample.
- σ²: the irreducible noise inherent in the data. Even a perfect model cannot predict the exact flip of a coin.
You cannot eliminate irreducible noise. It is your floor.
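The decomposition can be checked numerically. A sketch under assumed conditions (synthetic parabola data, noise σ = 0.3, a single test point at x = 0.9): estimate bias² and variance for several polynomial degrees by averaging over many resampled training sets.

```python
import numpy as np

rng = np.random.default_rng(1)
noise_sd = 0.3
x_test = 0.9
f_true = x_test ** 2  # noiseless truth at the test point

def predict_once(degree):
    """Fit a polynomial to one fresh noisy sample, predict at x_test."""
    x = rng.uniform(-1, 1, 30)
    y = x ** 2 + rng.normal(0, noise_sd, x.size)
    return np.polyval(np.polyfit(x, y, degree), x_test)

results = {}
for degree in (1, 3, 9):
    preds = np.array([predict_once(degree) for _ in range(500)])
    bias_sq = (preds.mean() - f_true) ** 2   # squared gap: average prediction vs truth
    variance = preds.var()                   # spread around that average
    total = bias_sq + variance + noise_sd ** 2  # expected error on a noisy test label
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias²={bias_sq:.4f} var={variance:.4f} total≈{total:.4f}")
```

Degree 1 shows large bias² and tiny variance; degree 9 flips that. The σ² = 0.09 floor is the same in every row - no choice of degree touches it.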
The Complexity Spectrum
Think about fitting polynomial curves to data:
| Complexity | Bias | Variance | Behavior |
|---|---|---|---|
| Degree 1 (line) | High | Low | Misses curves, looks similar across datasets |
| Degree 3-5 | Balanced | Balanced | Usually the sweet spot |
| Degree 15 | Low | High | Memorizes training points, wildly different across datasets |
This pattern generalizes to all model families. Shallow decision trees have high bias; deep unconstrained trees have high variance. Small networks underfit; large networks without regularization overfit.
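The table's pattern shows up as the classic training-vs-test error gap. A minimal sketch (synthetic parabola data again; the sample sizes and noise level are illustrative assumptions): sweep the degree and compare mean squared error on the data the model saw versus held-out data.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    """Draw n noisy points from the same underlying parabola."""
    x = rng.uniform(-1, 1, n)
    return x, x ** 2 + rng.normal(0, 0.3, n)

x_train, y_train = sample(20)   # one fixed training set
x_test, y_test = sample(200)    # one fixed held-out test set

train_err, test_err = {}, {}
for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err[degree]:.4f}  "
          f"test MSE {test_err[degree]:.4f}")
```

Training error only goes down as complexity grows; test error is U-shaped, bottoming out near the sweet spot in the middle of the table.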
Interactive example
Polynomial degree slider - watch training error vs. test error as complexity grows
Coming soon
Where Regularization Comes In
Regularization techniques deliberately add a small amount of bias in exchange for reducing variance significantly. You constrain the model, pushing it toward simpler solutions so it does not memorize noise.
Think of it as a penalty: "Yes, you could memorize every training example with a wildly complex function, but complexity costs you." The model settles for a slightly worse training fit in exchange for better generalization.
- λ: the regularization strength - controls how much complexity is penalized
- the penalty term (e.g. λ‖w‖² in L2/ridge regularization): a complexity penalty on the model's weights
Adding bias is usually a good trade when your model is overfitting. The bias you add is small and controlled; the variance you reduce can be enormous.
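The trade is easy to demonstrate with ridge regression, which adds an L2 penalty λ‖w‖² to a least-squares fit. A minimal sketch, not a production implementation (the closed-form solve below penalizes the intercept too, which a careful implementation would exempt): fit an overparameterized degree-15 polynomial across many resampled training sets, with and without the penalty, and compare prediction spread.

```python
import numpy as np

rng = np.random.default_rng(3)

def poly_features(x, degree):
    """Vandermonde features: columns 1, x, x², ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    """Closed-form ridge: w = (XᵀX + λI)⁻¹ Xᵀy."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def predict_once(lam, degree=15, x_test=0.5):
    """Fit on one fresh noisy sample of a parabola, predict at x_test."""
    x = rng.uniform(-1, 1, 20)
    y = x ** 2 + rng.normal(0, 0.3, x.size)
    w = ridge_fit(poly_features(x, degree), y, lam)
    return (poly_features(np.array([x_test]), degree) @ w)[0]

# Nearly unregularized (tiny λ for numerical stability) vs. regularized
preds_unreg = [predict_once(1e-6) for _ in range(200)]
preds_ridge = [predict_once(1.0) for _ in range(200)]

print("λ≈0   prediction std:", np.std(preds_unreg))  # high variance
print("λ=1.0 prediction std:", np.std(preds_ridge))  # variance collapses
```

The penalized model's predictions cluster tightly between training sets - the small bias the penalty introduces buys a large drop in variance, which is exactly the trade described above.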
What Is More Dangerous in Practice
For complex models, overfitting (high variance) is usually more dangerous than underfitting (high bias). High-bias failures are obvious - the model just does not work well. High-variance failures are subtle - the model looks amazing on training data and only fails in deployment.
Regularization exists specifically to make high-capacity models trustworthy, not just impressive on training benchmarks.