Lessons
1. The single neuron
2. Activation functions
3. Layers: h = σ(Wx + b)
4. The forward pass
5. The loss surface for deep networks
6. Batch normalization
7. Layer normalization
8. Weight initialization: Xavier and He
9. Modern activations: GELU, Swish, GLU