The chain rule, applied
Which weights caused the error?
The chain rule in networks
Computing gradients layer by layer
The full training loop
Vanishing and exploding gradients