The One Rule That Unlocks Most of Calculus
The power rule is the workhorse of differentiation. Learn it once and you can differentiate any polynomial instantly.
d/dx xⁿ = n·xⁿ⁻¹
- n - the exponent, any real number
- x - the variable
The exponent slides down to become a coefficient, and the original exponent decreases by one. That is the entire rule.
Every ML loss function you will use — mean squared error, regularization terms, polynomial activations — is built from expressions the power rule can differentiate exactly. Learning this rule once makes gradient computation for all of them trivial.
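Before the examples, here is a minimal numeric sanity check (plain Python, nothing assumed beyond the standard library): the power-rule formula should agree with a centered finite-difference estimate of the slope.
# Numeric check of d/dx xⁿ = n·xⁿ⁻¹ at x = 2, for a few exponents
h = 1e-6
for n in [3, 0.5, -2]:
    numeric = ((2 + h) ** n - (2 - h) ** n) / (2 * h)  # centered finite difference
    exact = n * 2 ** (n - 1)                           # the power rule
    print(f"n = {n:>4}: numeric {numeric:.5f}, power rule {exact:.5f}")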
Four Core Examples
Example 1: d/dx x³ = 3x²
Bring the 3 down, subtract 1: 3x³⁻¹ = 3x².
Example 2: d/dx x⁻² = -2x⁻³
A negative exponent works the same way: bring the -2 down, subtract 1.
Example 3: d/dx x = 1
The derivative of x is always 1. Makes sense: y = x is a straight line with slope 1.
Example 4: d/dx √x = d/dx x^(1/2) = (1/2)·x^(-1/2) = 1/(2√x)
- x^(1/2) - the square root written as a power
The power rule works for any real exponent - fractional, negative, or zero.
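If SymPy is available, the same four examples can be checked symbolically; a quick sketch, assuming the exponents used above:
import sympy as sp

# Symbolic derivatives of the four examples
x = sp.symbols('x')
for f in [x**3, x**-2, x, sp.sqrt(x)]:
    print(f, "→", sp.diff(f, x))
# prints 3*x**2, -2/x**3, 1, and 1/(2*sqrt(x))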
[Interactive demo: drag the point along the loss curve. The orange dashed line is the tangent; its slope is the derivative. At x = 0 the derivative is 0: that's the minimum.]
The Constant Rule
The derivative of any standalone constant is zero:
d/dx c = 0
- c - any constant number
Why? A constant function is a flat horizontal line with slope 0. There is nothing to change, so the rate of change is zero.
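A one-line numeric illustration (a minimal sketch, no libraries needed): the slope of a flat function is zero at any point.
# f(x) = 7 is flat, so its finite-difference slope is zero everywhere
f = lambda x: 7.0
h = 1e-6
print((f(3.0 + h) - f(3.0 - h)) / (2 * h))  # → 0.0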
The Coefficient Rule: Constants Slide Out
A constant coefficient passes cleanly through differentiation:
d/dx (c·xⁿ) = c·n·xⁿ⁻¹
- c - the constant coefficient
- n - the exponent
Examples:
- 5x²: 10x
- 3x⁴: 12x³
- -2x³: -6x²
- 100x: 100
The intuition: if a function is scaled by a constant, its rate of change is scaled by the same constant.
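The scaling intuition is easy to check numerically; this sketch compares the slope of x³ with the slope of 5x³ at the same point (x = 2 is an arbitrary choice):
# Scaling a function by 5 scales its slope by 5
def numeric_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

base = numeric_derivative(lambda t: t ** 3, 2.0)        # slope of x³ at x = 2 (≈ 12)
scaled = numeric_derivative(lambda t: 5 * t ** 3, 2.0)  # slope of 5x³ at x = 2 (≈ 60)
print(scaled / base)                                    # → 5.0: the coefficient slides out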
The Sum Rule: Differentiate Term by Term
The sum rule says each term of a polynomial can be differentiated independently:
d/dx [f(x) + g(x)] = f'(x) + g'(x)
- f(x) - the first function
- g(x) - the second function
Full example: f(x) = x³ + 5x² - 2x + 7
- x - the variable
Walk each term: 3x², 10x, -2, and 0 (constant). So f'(x) = 3x² + 10x - 2.
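The rules can also be applied mechanically in code. This sketch represents a polynomial as (coefficient, exponent) pairs and differentiates it term by term, using the example polynomial above:
# Coefficient rule + power rule on each term, sum rule across terms
def differentiate(terms):
    return [(c * n, n - 1) for c, n in terms if n != 0]  # constant terms (n == 0) vanish

f = [(1, 3), (5, 2), (-2, 1), (7, 0)]  # x³ + 5x² - 2x + 7
print(differentiate(f))                # → [(3, 2), (10, 1), (-2, 0)], i.e. 3x² + 10x - 2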
A Real ML Example: Differentiating a Loss Function
Your model predicts ŷ = wx + b with input x, true label y, and squared-error loss L = (y - ŷ)²:
- w - the weight parameter
- b - the bias parameter
- L - the scalar loss value
Substitute x = 2 and y = 3: L = (3 - 2w - b)² = c² - 4cw + 4w²
Now differentiate with respect to w using the sum rule (one term at a time): ∂L/∂w = -4c + 8w
- c - shorthand for 3 - b, treated as constant w.r.t. w
At w = 0.5, b = 0: ∂L/∂w = -4(3 - 0) + 8·0.5 = -12 + 4 = -8.
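If SymPy is available, the hand algebra is easy to double-check symbolically:
import sympy as sp

# Symbolic check of ∂L/∂w for L = (3 - 2w - b)²
w, b = sp.symbols('w b')
L = (3 - 2 * w - b) ** 2
dL_dw = sp.diff(L, w)
print(sp.expand(dL_dw))                           # expands to 4b + 8w - 12, same as -4(3 - b) + 8w
print(dL_dw.subs({w: sp.Rational(1, 2), b: 0}))   # → -8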
The update with learning rate η: w ← w - η·∂L/∂w
- η - the learning rate
- w - the current weight
The gradient is negative, so the update pushes w upward - exactly the direction that reduces this loss. That is learning.
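To watch the update actually shrink the loss, here is a short plain-Python sketch that iterates the rule with b held at 0 and an assumed learning rate of 0.1 (the lesson does not fix a specific value):
# A few gradient-descent steps on L(w) = (3 - 2w)², using the hand-derived gradient
w, lr = 0.5, 0.1
for step in range(5):
    grad = -4 * (3 - 0) + 8 * w    # ∂L/∂w = -4(3 - b) + 8w with b = 0
    w = w - lr * grad              # the update: w ← w - η·∂L/∂w
    loss = (3 - 2 * w) ** 2
    print(f"step {step}: w = {w:.3f}, loss = {loss:.4f}")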
With these four rules you can differentiate any polynomial. Most ML loss functions - MSE, L2 regularization - are polynomials or closely related. The next lessons extend this to nested functions (chain rule) and multi-variable functions (partial derivatives).
import torch
# PyTorch applies all these rules automatically via autograd
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
x, y = 2.0, 3.0
loss = (y - w * x - b) ** 2 # polynomial in w and b
loss.backward()
print(f"∂L/∂w = {w.grad.item():.2f}") # → -8.0 (matches our hand calc: -4(3-0)+8·0.5)
print(f"∂L/∂b = {b.grad.item():.2f}") # → -6.0