Lessons
1. The loss landscape
2. The gradient
3. The update rule: w ← w − α∇L
4. Learning rate
5. Stochastic & mini-batch GD
6. Convergence
7. Momentum: adding velocity to gradient steps
8. RMSprop: adaptive per-parameter rates
9. Adam: the full derivation
10. Learning rate schedules
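As a preview of the update rule in lesson 3, here is a minimal sketch of plain gradient descent, assuming a toy scalar loss L(w) = (w − 3)² chosen purely for illustration (the loss, starting point, and step count are not from the lessons themselves):

```python
# Gradient descent on the toy loss L(w) = (w - 3)^2,
# illustrating the update rule w <- w - alpha * grad(L).

def grad(w):
    return 2.0 * (w - 3.0)  # dL/dw for L(w) = (w - 3)^2

w = 0.0        # initial weight
alpha = 0.1    # learning rate
for step in range(50):
    w = w - alpha * grad(w)  # the core update from lesson 3

print(w)  # approaches the minimizer w = 3
```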