The Power Rule: Unlocking Polynomial Derivatives

Step-by-step walkthrough of the power rule, coefficient rule, and sum rule. Ends with a concrete ML loss-gradient example.

Quick refresher

What a derivative measures

The derivative f'(x) measures the slope of f at a given point - how fast the function is changing. Positive = increasing, negative = decreasing, zero = flat.

Example

For f(x) = x², f'(x) = 2x.

At x=3, the slope is 6.
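If you want to sanity-check that slope numerically, a central finite difference gets close. A minimal sketch in plain Python (the step size h = 1e-6 is an arbitrary choice, not part of the lesson):

def f(x):
    return x ** 2

h = 1e-6
x = 3.0
print((f(x + h) - f(x - h)) / (2 * h))   # ≈ 6.0, matching f'(3) = 2·3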

The One Rule That Unlocks Most of Calculus

The power rule is the workhorse of differentiation. Learn it once and you can differentiate any polynomial instantly.

\frac{d}{dx} x^n = n \cdot x^{n-1}

where n is the exponent (any real number) and x is the variable.

The exponent slides down to become a coefficient, and the original exponent decreases by one. That is the entire rule.

Every ML loss function you will use — mean squared error, regularization terms, polynomial activations — is built from expressions the power rule can differentiate exactly. Learning this rule once makes gradient computation for all of them trivial.
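If you have SymPy available, you can also confirm the rule symbolically for a general exponent. This is just a sketch, not part of the lesson:

import sympy as sp

# Symbolic check of the power rule for a general exponent n (assumes sympy is installed).
x, n = sp.symbols('x n', positive=True)
lhs = sp.diff(x**n, x)            # what sympy computes
rhs = n * x**(n - 1)              # what the power rule predicts
print(sp.simplify(lhs - rhs))     # 0: the two agree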

Four Core Examples

Example 1: \frac{d}{dx} x^3

Bring the 3 down, subtract 1: 3 \cdot x^{3-1} = 3x^2.

Example 2: \frac{d}{dx} x^4 = 4x^3

Example 3: \frac{d}{dx} x = \frac{d}{dx} x^1 = 1 \cdot x^0 = 1

The derivative of x is always 1. Makes sense: f(x) = x is a straight line with slope 1.

Example 4: \frac{d}{dx} \sqrt{x} = \frac{d}{dx} x^{0.5}

\frac{d}{dx} x^{0.5} = 0.5 \cdot x^{-0.5} = \frac{1}{2\sqrt{x}}

where x^{0.5} is the square root written as a power.

The power rule works for any real exponent - fractional, negative, or zero.
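As a numeric cross-check, PyTorch's autograd (used again at the end of this lesson) applies these same rules. A rough sketch evaluating all four examples at x = 4 (the point is an arbitrary choice; any x > 0 works):

import torch

# Check each example's derivative at x = 4 with autograd.
x = torch.tensor(4.0, requires_grad=True)
cases = [
    (lambda t: t ** 3,        "x^3",     3 * 4.0 ** 2),      # 48
    (lambda t: t ** 4,        "x^4",     4 * 4.0 ** 3),      # 256
    (lambda t: t,             "x",       1.0),
    (lambda t: torch.sqrt(t), "sqrt(x)", 0.5 * 4.0 ** -0.5), # 0.25
]
for f, name, expected in cases:
    x.grad = None                      # clear the previous gradient
    f(x).backward()
    print(f"{name}: autograd {x.grad.item():.3f} vs power rule {expected:.3f}")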

Interactive demo (drag the point, watch the tangent line): the orange dashed line is the tangent to f(x) = x²; its slope is the derivative f'(x) = 2x. At x = 0 the derivative is 0: that's the minimum.

The Constant Rule

The derivative of any standalone constant is zero:

\frac{d}{dx}\thinspace c = 0

where c is any constant number.

Why? A constant function is a flat horizontal line. Slope = 0. There is nothing to change, so the rate of change is zero.
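A tiny symbolic check (a sketch, assuming SymPy is installed):

import sympy as sp

x = sp.symbols('x')
print(sp.diff(7, x))       # 0: a standalone constant has zero rate of change
print(sp.diff(sp.pi, x))   # 0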

The Coefficient Rule: Constants Slide Out

The coefficient rule says a constant factor passes cleanly through differentiation:

\frac{d}{dx}(c \cdot x^n) = c \cdot n \cdot x^{n-1}

where c is the constant coefficient and n is the exponent.

Examples:

  • 5x²: 5 \cdot 2x = 10x
  • 3x⁴: 3 \cdot 4x^3 = 12x^3
  • -2x³: -2 \cdot 3x^2 = -6x^2
  • 100x: 100 \cdot 1 = 100

The intuition: if a function is scaled by a constant, its rate of change is scaled by the same constant.
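The same bullets, verified symbolically (again a sketch assuming SymPy is available):

import sympy as sp

x = sp.symbols('x')
print(sp.diff(5 * x**2, x))    # 10*x
print(sp.diff(3 * x**4, x))    # 12*x**3
print(sp.diff(-2 * x**3, x))   # -6*x**2
print(sp.diff(100 * x, x))     # 100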

The Sum Rule: Differentiate Term by Term

The sum rule says each term of a polynomial can be differentiated independently:

\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)
where f(x) and g(x) are the two functions being added.

Full example:

\frac{d}{dx}(4x^5 - 3x^2 + 7x - 2) = 20x^4 - 6x + 7

Walk each term: 4 \cdot 5x^4 = 20x^4, -3 \cdot 2x = -6x, 7 \cdot 1 = 7, and -2 \to 0 (constant).
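The whole polynomial in one call (a SymPy sketch), matching the term-by-term result above:

import sympy as sp

x = sp.symbols('x')
print(sp.diff(4*x**5 - 3*x**2 + 7*x - 2, x))   # 20*x**4 - 6*x + 7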

A Real ML Example: Differentiating a Loss Function

Your model predicts \hat{y} = wx + b with input x = 2, true label y = 3, and squared-error loss:

L(w, b) = (y - \hat{y})^2 = (3 - 2w - b)^2

where w is the weight parameter, b is the bias parameter, and L is the scalar loss value.

Substitute \hat{y} = wx + b = 2w + b and write k = 3 - b as shorthand, so the loss expands to a polynomial in w:

L = (k - 2w)^2 = k^2 - 4kw + 4w^2

Now differentiate with respect to w using the sum rule (one term at a time):

\frac{\partial L}{\partial w} = \underbrace{\frac{d}{dw}(k^2)}_{0} + \underbrace{\frac{d}{dw}(-4kw)}_{-4k} + \underbrace{\frac{d}{dw}(4w^2)}_{8w} = -4k + 8w = -4(3 - b) + 8w

where k = 3 - b is treated as a constant with respect to w.

At w = 0.5,\thinspace b = 0: \partial L/\partial w = -12 + 4 = -8.

The gradient-descent update with learning rate \alpha:

w \leftarrow w - \alpha \cdot (-8) = w + 8\alpha

where \alpha is the learning rate and w is the current weight.

The negative gradient pushes w upward - exactly the direction that reduces this loss. That is learning.
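To see that play out, here is a rough sketch of a few gradient-descent steps using the hand-derived gradient, with b held at 0 and an arbitrary learning rate alpha = 0.1:

# Repeated updates with the analytic gradient dL/dw = -4*(3 - b) + 8*w (b fixed at 0).
w, b, alpha = 0.5, 0.0, 0.1
for step in range(5):
    grad_w = -4 * (3 - b) + 8 * w          # the formula derived above
    w = w - alpha * grad_w                 # gradient-descent update
    loss = (3 - 2 * w - b) ** 2
    print(f"step {step}: w = {w:.3f}, loss = {loss:.4f}")
# w climbs from 0.5 toward 1.5, where the prediction 2w equals y = 3 and the loss reaches 0.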

With these four rules you can differentiate any polynomial. Most ML loss functions - MSE, L2 regularization - are polynomials or closely related. The next lessons extend this to nested functions (chain rule) and multi-variable functions (partial derivatives).

import torch

# PyTorch applies all these rules automatically via autograd
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

x, y = 2.0, 3.0
loss = (y - w * x - b) ** 2   # polynomial in w and b

loss.backward()
print(f"∂L/∂w = {w.grad.item():.2f}")   # → -8.0  (matches our hand calc: -4(3-0)+8·0.5)
print(f"∂L/∂b = {b.grad.item():.2f}")   # → -6.0
