Lessons
1. Language modeling: predicting the next token
2. Tokenization: BPE and SentencePiece
3. Autoregressive generation
4. Decoding strategies: greedy, beam, nucleus
5. Instruction tuning and RLHF
6. Scaling laws: more compute, more capability
7. Emergent abilities and in-context learning
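To make the decoding lesson concrete, here is a minimal NumPy sketch contrasting greedy selection with nucleus (top-p) sampling over a toy logit vector. The vocabulary size, logit values, and the `p=0.9` threshold are illustrative assumptions, not taken from the lessons themselves.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def greedy(logits):
    # Greedy decoding: always pick the single most likely token.
    return int(np.argmax(logits))

def nucleus(logits, p=0.9, rng=None):
    # Nucleus (top-p) sampling: keep the smallest set of tokens whose
    # cumulative probability reaches p, renormalize, then sample.
    rng = rng or np.random.default_rng(0)
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]          # tokens, most probable first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # include token that crosses p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()     # renormalize the nucleus
    return int(rng.choice(keep, p=kept))

# Toy 4-token vocabulary (hypothetical logits).
logits = np.array([2.0, 1.0, 0.5, -1.0])
print(greedy(logits))        # always the argmax token
print(nucleus(logits, 0.9))  # a sample restricted to the top-p nucleus
```

With these logits the nucleus at p=0.9 contains the top three tokens, so the least likely token is never sampled; greedy always returns the same token, which is why sampling-based strategies are preferred for diverse generation.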