Learning from rewards, not labels
The RL framework: agents, states, rewards
Markov Decision Processes
Q-learning: learning action values
Deep Q-Networks (DQN)
Policy gradients: REINFORCE
Actor-critic methods
RL in modern ML: RLHF and AlphaGo