Lessons
1. Language modeling: predicting the next token
2. Tokenization: BPE and SentencePiece
3. Autoregressive generation
4. Decoding strategies: greedy, beam, nucleus
5. Instruction tuning and RLHF
6. Scaling laws: more compute, more capability
7. Emergent abilities and in-context learning
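To make the decoding lesson concrete, here is a minimal NumPy sketch contrasting greedy selection with nucleus (top-p) sampling over a toy logit vector. The vocabulary size, logit values, and the `p=0.9` threshold are illustrative assumptions, not taken from the lessons themselves.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def greedy(logits):
    # Greedy decoding: always pick the single most likely token.
    return int(np.argmax(logits))

def nucleus(logits, p=0.9, rng=None):
    # Nucleus (top-p) sampling: keep the smallest set of tokens whose
    # cumulative probability reaches p, renormalize, then sample.
    rng = rng or np.random.default_rng(0)
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]          # tokens, most probable first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # include token that crosses p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()     # renormalize the nucleus
    return int(rng.choice(keep, p=kept))

# Toy 4-token vocabulary (hypothetical logits).
logits = np.array([2.0, 1.0, 0.5, -1.0])
print(greedy(logits))        # always the argmax token
print(nucleus(logits, 0.9))  # a sample restricted to the top-p nucleus
```

With these logits the nucleus at p=0.9 contains the top three tokens, so the least likely token is never sampled; greedy always returns the same token, which is why sampling-based strategies are preferred for diverse generation.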