Math Foundation Derivatives
Lesson 1 ⏱ 12 min

What does a derivative measure?

Video coming soon — Derivatives Explained: Slope at a Single Point (⏱ ~7 min). Animated visual showing secant lines converging to the tangent line as h approaches zero.

Slope at a Point

You already know how to compute the slope of a straight line:

\text{slope} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}

  • Δy: rise - the change in output
  • Δx: run - the change in input
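
For example, the line through (1, 3) and (4, 9) rises 6 over a run of 3, so its slope is 2. A quick sketch (the two points are made up for illustration):

# Slope of a straight line: rise over run
x1, y1 = 1, 3
x2, y2 = 4, 9
slope = (y2 - y1) / (x2 - x1)
print(slope)  # → 2.0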

But what about a curve? The slope of a curve changes as you move along it. A derivative answers the question: what is the slope at this exact point?

Interactive: drag the point along f(x) = x² and watch the tangent line. The readout shows x, f(x), and the slope f'(x) = 2x at the current point, and labels whether the function is rising or falling there.

Drag left or right. The orange dashed line is the tangent — its slope is the derivative. At x = 0 the derivative is 0: that's the minimum.

The big idea: zoom in far enough on any smooth curve, and it starts to look like a straight line. The slope of that line is the derivative.

Derivatives are the engine of every trained ML model. When a neural network learns, it computes the derivative of its error with respect to each parameter — potentially billions of them — and nudges each one in the direction that reduces that error. This process is called gradient descent, and it runs on calculus you are about to understand.

The Formal Definition (Intuition First)

Draw a chord connecting two points on a curve: (x, f(x)) and (x + h, f(x + h)). The slope of this chord is:

\text{chord slope} = \frac{f(x+h) - f(x)}{h}

  • h: a small nudge in x - how far apart the two points are
  • f(x+h) - f(x): how much the function value changes over that nudge

Now shrink h — from 0.1, to 0.01, to 0.001. The chord rotates, getting closer and closer to the tangent line at x. In the limit:

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
  • f'(x): the derivative - the exact slope at x
  • lim: "the limit as h approaches zero" - the theoretical endpoint of shrinking h forever
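
You can watch this limit emerge numerically. A minimal sketch, using f(x) = x² at x = 1 and a few arbitrary values of h (the exact derivative there is 2):

# Chord slope for f(x) = x² at x = 1, with a shrinking nudge h
f = lambda x: x**2
x = 1.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    chord = (f(x + h) - f(x)) / h
    print(f"h = {h:<6}  chord slope = {chord:.4f}")
# the chord slopes approach 2.0, the exact slope f'(1)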

A Concrete Example: f(x) = x²

The derivative of f(x) = x² is f'(x) = 2x. (We'll learn why in the next lesson — for now, use it.)

 x | f(x) = x² | f'(x) = 2x | What this means
-2 |     4     |     -4     | Steeply falling
 0 |     0     |      0     | Flat — bottom of the bowl
 1 |     1     |      2     | Rising moderately
 3 |     9     |      6     | Rising steeply
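
As a numerical check on the table, the chord slope with a very small h lands right next to 2x at each row (this is just the limit definition above, applied with a finite h):

# Check the table rows: chord slope with a tiny h ≈ f'(x) = 2x
f = lambda x: x**2
h = 1e-6
for x in [-2, 0, 1, 3]:
    chord = (f(x + h) - f(x)) / h
    print(f"x = {x:>2}   chord ≈ {chord:7.3f}   2x = {2 * x}")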

The derivative is 0 at x = 0 because that's where the parabola turns from falling to rising.

Four Ways to Write a Derivative

All of these mean exactly the same thing:

  • f'(x) — "f prime of x." Compact, clean.
  • df/dx — "the derivative of f with respect to x." Explicitly names what's being differentiated.
  • dL/dw — you'll see this constantly. "How does loss L change when we nudge weight w?"
  • ∂L/∂w — partial derivative notation. Used when the function has multiple inputs (which every real ML model does).

The last two — dL/dw and ∂L/∂w — are the beating heart of gradient descent.

The Sign Tells You the Direction

The sign of the derivative tells you which way the function is heading:

  • f'(x) > 0: function is increasing. Moving right goes uphill.
  • f'(x) < 0: function is decreasing. Moving right goes downhill.
  • f'(x) = 0: momentarily flat. Could be a minimum, maximum, or inflection point.
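
A quick way to see this with f(x) = x², whose derivative is 2x (the sample points are arbitrary):

# Read the direction of f(x) = x² from the sign of f'(x) = 2x
def fprime(x):
    return 2 * x

for x in [-1.5, 0.0, 2.0]:
    d = fprime(x)
    if d > 0:
        print(f"x = {x}: f'(x) = {d} > 0, increasing (uphill to the right)")
    elif d < 0:
        print(f"x = {x}: f'(x) = {d} < 0, decreasing (downhill to the right)")
    else:
        print(f"x = {x}: f'(x) = 0, flat (for x², this is the minimum)")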

The ML Connection

Your model has parameters — weights and biases. The loss function L measures prediction error. You want to minimize L.

The derivative tells you exactly how to adjust each parameter:

w \leftarrow w - \alpha \cdot \frac{\partial L}{\partial w}

  • w: a parameter (a weight or bias)
  • α: the learning rate - how big a step to take (a small positive number)
  • ∂L/∂w: the gradient of the loss with respect to w - the derivative

Without derivatives, you'd have no way to know which direction to adjust billions of weights. Derivatives are what make learning from data possible.
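
Here is a minimal sketch of that update rule in action on a one-parameter squared-error loss, L(w) = (y - w·x)², with made-up data and a made-up learning rate; repeating the update walks w toward the value that minimizes the loss:

# Gradient descent on a one-parameter loss: L(w) = (y - w*x)**2
x, y = 2.0, 3.0     # toy data point
w = 0.0             # initial guess for the weight
alpha = 0.05        # learning rate

for step in range(20):
    grad = 2 * (y - w * x) * (-x)   # dL/dw, worked out by hand for this loss
    w = w - alpha * grad            # the update rule: w ← w - α · dL/dw

print(w)  # → ≈ 1.5, the weight that makes w*x match y exactly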

# Numerical derivative: approximate f'(x) using a tiny step h
# (central difference: averages the forward and backward chords for better accuracy)
def numerical_deriv(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2
print(numerical_deriv(f, 1))  # → ≈ 2.0   (f'(1) = 2·1 = 2)
print(numerical_deriv(f, 3))  # → ≈ 6.0   (f'(3) = 2·3 = 6)

# In practice, PyTorch computes exact derivatives automatically
import torch

w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
x, y = 2.0, 3.0

loss = (y - w * x - b) ** 2
loss.backward()        # applies chain rule through every operation

print(f"∂L/∂w = {w.grad.item():.4f}")   # exact gradient, no hand math
print(f"∂L/∂b = {b.grad.item():.4f}")

Quiz

1 / 4

The derivative of f(x) at a point tells us...