The math behind modern AI
The sequence problem
Attention as weighted averaging
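The weighted-averaging view named by this heading can be sketched minimally: each output is a convex combination of value vectors, with weights produced by a softmax over raw relevance scores. This is an illustrative NumPy sketch; the specific vectors and scores are made-up examples, not from the original text.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating, for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Three token "value" vectors (d = 4); the output will be their weighted average.
values = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])

# Hypothetical raw relevance scores for one query position against the three tokens.
scores = np.array([2.0, 0.5, -1.0])

weights = softmax(scores)   # non-negative and sums to 1
output = weights @ values   # convex combination of the value vectors
print(output.shape)  # (4,)
```

Because the softmax weights are non-negative and sum to one, the output always lies inside the convex hull of the value vectors, which is what makes "weighted averaging" an accurate description.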
Q, K, V: the full attention formula
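The standard scaled dot-product formula, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, can be written directly in NumPy. This is a minimal sketch; the shapes and random inputs are assumptions for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (n_q, n_k) similarities
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                                     # (n_q, d_v)

rng = np.random.default_rng(0)
n, d_k, d_v = 5, 8, 8
Q = rng.standard_normal((n, d_k))
K = rng.standard_normal((n, d_k))
V = rng.standard_normal((n, d_v))

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (5, 8)
```

The division by √d_k keeps the dot products from growing with dimension, which would otherwise push the softmax into a near-one-hot regime with vanishing gradients.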
Multi-head attention
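Multi-head attention runs the formula above several times in parallel on lower-dimensional projections, then concatenates and mixes the results. A minimal NumPy sketch, assuming a single input sequence and randomly initialized projection matrices (all names here are illustrative):

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    n, d_model = X.shape
    d_head = d_model // n_heads
    # Project once, then split the feature dimension into heads: (h, n, d_head).
    Q = (X @ W_q).reshape(n, n_heads, d_head).transpose(1, 0, 2)
    K = (X @ W_k).reshape(n, n_heads, d_head).transpose(1, 0, 2)
    V = (X @ W_v).reshape(n, n_heads, d_head).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)    # (h, n, n) per head
    out = softmax(scores) @ V                              # (h, n, d_head)
    # Concatenate heads back along features and mix with the output projection.
    out = out.transpose(1, 0, 2).reshape(n, d_model)
    return out @ W_o

rng = np.random.default_rng(0)
n, d_model, h = 6, 16, 4
X = rng.standard_normal((n, d_model))
W = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
Y = multi_head_attention(X, *W, n_heads=h)
print(Y.shape)  # (6, 16)
```

Each head attends with its own learned projections, so different heads can specialize in different relations, at the same total cost as one full-width attention.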
Positional encoding
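Since attention itself is permutation-invariant, position information is injected separately. The classic sinusoidal scheme, PE[pos, 2i] = sin(pos / 10000^(2i/d)) and PE[pos, 2i+1] = cos(pos / 10000^(2i/d)), can be built in a few lines (a sketch; the sizes below are arbitrary):

```python
import numpy as np

def sinusoidal_positional_encoding(n_positions, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(n_positions)[:, None]        # (n, 1)
    i = np.arange(0, d_model, 2)[None, :]              # (1, d/2), even indices
    angles = positions / np.power(10000.0, i / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(50, 16)
print(pe.shape)            # (50, 16)
print(pe[0, 0], pe[0, 1])  # 0.0 1.0  (sin(0) and cos(0) at position 0)
```

The encoding is typically added elementwise to the token embeddings, and the geometric range of frequencies lets the model represent both nearby and distant offsets.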
The full transformer block
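Putting the pieces together, a transformer block is self-attention plus a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. A minimal single-head, post-norm sketch with made-up weight shapes (real implementations add multiple heads, masking, and dropout):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(X, W_q, W_k, W_v, W_o, W1, W2):
    # Self-attention sub-layer with residual connection and layer norm.
    d_k = W_q.shape[1]
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k)) @ V
    X = layer_norm(X + attn @ W_o)
    # Position-wise feed-forward sub-layer (ReLU), again residual + norm.
    ff = np.maximum(0.0, X @ W1) @ W2
    return layer_norm(X + ff)

rng = np.random.default_rng(0)
n, d, d_ff = 4, 8, 32
X = rng.standard_normal((n, d))
W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
W1 = rng.standard_normal((d, d_ff)) * 0.1
W2 = rng.standard_normal((d_ff, d)) * 0.1
Y = transformer_block(X, W_q, W_k, W_v, W_o, W1, W2)
print(Y.shape)  # (4, 8)
```

The block maps (n, d) to (n, d), so identical blocks can be stacked dozens of times; the residual paths are what keep gradients flowing through that depth.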