Reinforcement Learning

Learning from rewards, not labels

7 lessons · 96 min total · Prereq: Probability & Statistics, Neural Networks

Lessons

The RL framework: agents, states, rewards

Markov Decision Processes

Q-learning: learning action values

Deep Q-Networks (DQN)

Policy gradients: REINFORCE

Actor-critic methods

RL in modern ML: RLHF and AlphaGo