Lessons
1. The single neuron
2. Activation functions
3. Layers: h = σ(Wx + b)
4. The forward pass
5. The loss surface for deep networks
6. Batch normalization
7. Layer normalization
8. Weight initialization: Xavier and He
9. Modern activations: GELU, Swish, GLU