Slope at a Point
You already know how to compute the slope of a straight line:
slope = rise / run, where:
- rise: the change in output
- run: the change in input
But what about a curve? The slope of a curve changes as you move along it. A derivative answers the question: what is the slope at this exact point?
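For a straight line, any two points give the same answer. A quick sketch in Python (the line y = 2x + 1 is just an illustrative choice):

```python
# Slope of a straight line: rise over run between any two points.
def line_slope(x1, y1, x2, y2):
    rise = y2 - y1  # change in output
    run = x2 - x1   # change in input
    return rise / run

# For y = 2x + 1, every pair of points gives slope 2.
f = lambda x: 2 * x + 1
print(line_slope(0, f(0), 5, f(5)))    # → 2.0
print(line_slope(-3, f(-3), 1, f(1)))  # → 2.0
```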
Picture a tangent line sliding along the curve: the straight line that just touches the curve at one point. Its slope is the derivative there. On a parabola, at x = 0 the derivative is 0: that's the minimum.
The big idea: zoom in far enough on any smooth curve, and it starts to look like a straight line. The slope of that line is the derivative.
Derivatives are the engine behind training every ML model. When a neural network learns, it computes the derivative of its error with respect to each parameter — potentially billions of them — and nudges each one in the direction that reduces that error. This process is called gradient descent, and it runs on calculus you are about to understand.
The Formal Definition (Intuition First)
Draw a chord connecting two points on a curve: (x, f(x)) and (x + h, f(x + h)). The slope of this chord is:

(f(x + h) - f(x)) / h, where:
- h: a small nudge in x, how far apart the two points are
- f(x + h) - f(x): how much the function value changes over that nudge
Now shrink h: from 0.1, to 0.01, to 0.001. The chord rotates, getting closer and closer to the tangent line at x. In the limit:

f'(x) = lim(h → 0) (f(x + h) - f(x)) / h, where:
- f'(x): the derivative, the exact slope at x
- lim(h → 0): "the limit as h approaches zero," the theoretical endpoint of shrinking h forever
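You can watch this limit happen numerically. A minimal sketch using f(x) = x² at x = 1, where the exact answer turns out to be 2:

```python
# Chord slope (f(x + h) - f(x)) / h approaches the derivative as h shrinks.
f = lambda x: x ** 2
x = 1.0
for h in [0.1, 0.01, 0.001]:
    chord = (f(x + h) - f(x)) / h
    print(f"h = {h:>6}: chord slope = {chord:.4f}")
# The chord slopes (2.1, 2.01, 2.001) close in on 2, the exact derivative at x = 1.
```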
A Concrete Example: f(x) = x²
The derivative of f(x) = x² is f'(x) = 2x. (We'll learn why in the next lesson; for now, use it.)
| x | f(x) = x² | f'(x) = 2x | What this means |
|---|---|---|---|
| -2 | 4 | -4 | Steeply falling |
| 0 | 0 | 0 | Flat — bottom of the bowl |
| 1 | 1 | 2 | Rising moderately |
| 3 | 9 | 6 | Rising steeply |
The derivative is 0 at x = 0 because that's where the parabola turns from falling to rising.
Four Ways to Write a Derivative
All of these mean exactly the same thing:
- f'(x) — "f prime of x." Compact, clean.
- df/dx — "the derivative of f with respect to x." Explicitly names what's being differentiated.
- dL/dw — you'll see this constantly. "How does loss L change when we nudge weight w?"
- ∂L/∂w — partial derivative notation. Used when the function has multiple inputs (which every real ML model does).
The last two, dL/dw and ∂L/∂w, are the beating heart of gradient descent.
The Sign Tells You the Direction
The sign of the derivative tells you which way the function is heading:
- f'(x) > 0: function is increasing. Moving right goes uphill.
- f'(x) < 0: function is decreasing. Moving right goes downhill.
- f'(x) = 0: momentarily flat. Could be a minimum, maximum, or inflection point.
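This sign rule is exactly what a minimizer exploits: step against the sign of the derivative, and the function value drops. A small sketch with f(x) = x² (the step size 0.1 is an arbitrary choice):

```python
# Stepping opposite the derivative's sign decreases f(x) = x**2.
f = lambda x: x ** 2
df = lambda x: 2 * x  # known derivative of x**2

for x in [-2.0, 1.0, 3.0]:
    # Move a small step against the sign of f'(x).
    step = -0.1 if df(x) > 0 else (0.1 if df(x) < 0 else 0.0)
    print(f"x = {x:+.1f}: f'(x) = {df(x):+.1f}, f(x) = {f(x):.2f}, f(x + step) = {f(x + step):.2f}")
```

At x = -2 the derivative is negative, so the step is to the right; at x = 1 and x = 3 it is positive, so the step is to the left. In every case f decreases.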
The ML Connection
Your model has parameters: weights w and biases b. The loss function L measures prediction error. You want to minimize L.
The derivative tells you exactly how to adjust each parameter:

w_new = w - learning_rate · dL/dw, where:
- w: a parameter (weight or bias)
- learning_rate: how big a step to take (a small positive number)
- dL/dw: the gradient of loss with respect to w, the derivative
Without derivatives, you'd have no way to know which direction to adjust billions of weights. Derivatives are what make learning from data possible.
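The update rule above is the whole loop. A minimal gradient descent sketch for a one-parameter "loss" L(w) = w², using its known derivative 2w (the starting point 3.0 and learning rate 0.1 are arbitrary choices):

```python
# Gradient descent: repeatedly step opposite the derivative.
def gradient_descent(grad, w0, lr=0.1, steps=50):
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)  # w_new = w - learning_rate * dL/dw
    return w

w_min = gradient_descent(grad=lambda w: 2 * w, w0=3.0)
print(w_min)  # very close to 0, the minimum of w**2
```

Each step shrinks w by a factor of 0.8, so after 50 steps it has all but reached the minimum at w = 0.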
```python
# Numerical derivative: approximate f'(x) using a tiny step h
def numerical_deriv(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2
print(numerical_deriv(f, 1))  # → ≈ 2.0 (f'(1) = 2·1 = 2)
print(numerical_deriv(f, 3))  # → ≈ 6.0 (f'(3) = 2·3 = 6)
```
```python
# In practice, PyTorch computes exact derivatives automatically
import torch

w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
x, y = 2.0, 3.0

loss = (y - w * x - b) ** 2
loss.backward()  # applies the chain rule through every operation

print(f"∂L/∂w = {w.grad.item():.4f}")  # exact gradient, no hand math
print(f"∂L/∂b = {b.grad.item():.4f}")
```
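You can sanity-check those autograd numbers by hand with the chain rule. For L = (y - wx - b)², the derivatives are ∂L/∂w = -2x(y - wx - b) and ∂L/∂b = -2(y - wx - b). A pure-Python check with the same values (w = 0.5, b = 0.0, x = 2.0, y = 3.0):

```python
# Analytic gradients of L = (y - w*x - b)**2 via the chain rule.
w, b, x, y = 0.5, 0.0, 2.0, 3.0
residual = y - w * x - b   # 3.0 - 1.0 - 0.0 = 2.0
dL_dw = -2 * x * residual  # -2 * 2.0 * 2.0 = -8.0
dL_db = -2 * residual      # -2 * 2.0 = -4.0
print(dL_dw, dL_db)  # → -8.0 -4.0, matching the autograd result
```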