Lessons
1. The loss landscape
2. The gradient
3. The update rule: w ← w − α∇L
4. Learning rate
5. Stochastic & mini-batch GD
6. Convergence
7. Momentum: adding velocity to gradient steps
8. RMSprop: adaptive per-parameter rates
9. Adam: the full derivation
10. Learning rate schedules
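As a preview of the update rule in lesson 3, here is a minimal sketch of plain gradient descent, assuming a toy scalar loss L(w) = (w − 3)² chosen purely for illustration (the loss, starting point, and step count are not from the lessons themselves):

```python
# Gradient descent on the toy loss L(w) = (w - 3)^2,
# illustrating the update rule w <- w - alpha * grad(L).

def grad(w):
    return 2.0 * (w - 3.0)  # dL/dw for L(w) = (w - 3)^2

w = 0.0        # initial weight
alpha = 0.1    # learning rate
for step in range(50):
    w = w - alpha * grad(w)  # the core update from lesson 3

print(w)  # approaches the minimizer w = 3
```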