Making deep networks trainable
The activation distribution problem
Batch normalization: the algorithm
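The training-time transform under this heading can be sketched in a few lines: normalize each feature over the mini-batch, then apply a learned scale and shift. A minimal NumPy sketch (function and argument names are illustrative, not from the source):

```python
import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    """Batch norm forward pass at training time.

    x: (batch, features); gamma, beta: learned per-feature scale and shift.
    Statistics are computed over the batch axis, one pair per feature.
    """
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta, mu, var     # batch stats are also returned for
                                             # the running-average update
```

With `gamma = 1` and `beta = 0`, each output feature has (approximately) zero mean and unit variance over the batch; the learned parameters let the layer undo the normalization if that helps training.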
BatchNorm at inference time
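At inference there may be no batch to compute statistics over, so the standard recipe is to replace batch statistics with running averages accumulated during training. A minimal sketch of both pieces, assuming the usual exponential-moving-average update (names are illustrative):

```python
import numpy as np

def batchnorm_infer(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Inference-time batch norm: fixed population estimates replace batch
    statistics, so a single example yields a deterministic output."""
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta

def update_running(running, batch_stat, momentum=0.9):
    """Exponential moving average of a batch statistic, maintained during
    training and frozen at inference."""
    return momentum * running + (1.0 - momentum) * batch_stat
```

Because the normalization uses fixed constants at inference, the whole transform collapses into a single affine map that can be folded into the preceding layer's weights.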
Layer normalization
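Layer norm differs from batch norm only in the axis it normalizes over: statistics are taken across the features of each individual example, so it is independent of batch size and identical at training and inference. A minimal sketch (names illustrative):

```python
import numpy as np

def layernorm(x, gamma, beta, eps=1e-5):
    """Layer norm: per-example statistics over the last (feature) axis.

    x: (..., features); gamma, beta: per-feature scale and shift.
    """
    mu = x.mean(axis=-1, keepdims=True)      # one mean per example
    var = x.var(axis=-1, keepdims=True)      # one variance per example
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```

Because no cross-example statistics are involved, layer norm works with batch size 1 and with variable-length sequences, which is why it is the default in transformer architectures.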
Instance, group, and weight normalization
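These variants again differ mainly in which axes the statistics run over. Group norm, which contains instance norm (`groups == channels`) and layer norm (`groups == 1`) as special cases, can be sketched as follows for image-shaped activations (a NumPy sketch with illustrative names, assuming NCHW layout):

```python
import numpy as np

def groupnorm(x, gamma, beta, groups, eps=1e-5):
    """Group norm: per-example statistics within each group of channels.

    x: (N, C, H, W); gamma, beta: (1, C, 1, 1); C must be divisible by groups.
    """
    n, c, h, w = x.shape
    xg = x.reshape(n, groups, c // groups, h, w)
    mu = xg.mean(axis=(2, 3, 4), keepdims=True)   # one mean per (example, group)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mu) / np.sqrt(var + eps)
    return gamma * xg.reshape(n, c, h, w) + beta
```

Like layer norm, this uses no cross-example statistics, so it behaves identically at training and inference and is insensitive to batch size.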
Why weight initialization matters
Xavier and He initialization: the math
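The variance formulas behind these schemes translate directly into code: Xavier targets Var(W) = 2/(fan_in + fan_out) to keep signal variance stable through tanh-like units, while He targets Var(W) = 2/fan_in to compensate for ReLU zeroing half its inputs. A minimal NumPy sketch of the Gaussian variants (function names are illustrative):

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng):
    """Xavier/Glorot init: Var(W) = 2 / (fan_in + fan_out),
    suited to tanh/sigmoid or linear units."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng):
    """He/Kaiming init: Var(W) = 2 / fan_in,
    the extra factor of 2 compensating for ReLU's zeroed half."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))
```

Both schemes also have uniform variants with the same target variance; only the sampling distribution changes, not the math.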