6.7920 Fall 2023      

Reinforcement Learning: Foundations and Methods

 

Note: The schedule is subject to minor changes, and will be updated periodically with lecture slides and readings.
Date Tuesday Date Thursday Friday
PART 1: Dynamic Programming
09/07 L01. What is sequential decision making?
Introduction, finite-horizon problem formulation, Markov decision processes, dynamic programming algorithm, sequential decision making as shortest path, course overview
Readings: N1 §3, N2 §1-3, N3 §1; DPOC 1.1-1.3, 2.1; SB Ch1 (skim) [L] [R]
R01.
09/12 L02. Dynamic Programming: What makes sequential decision-making problems hard?
DP algorithm, sequential decision making as shortest path
Readings: DPOC 3.3-3.4 [L] [R]
09/14 L03. Special structures: What makes some sequential decision-making problems easy?
DP arguments, optimal stopping, linear quadratic regulator
Readings: N3 §2; DPOC 3.1 [L] [R]
R02.
HW0 (not due)
09/19 L04. Discounted infinite horizon problems: tl;dr: DP still works
Bellman equations, value iteration
Readings: DPOC2 1.1-1.2, 1.5, 2.1-2.3 [L] [R]
09/21 L05. Discounted infinite horizon problems: tl;dr: DP still works (Part 2)
Policy iteration, geometric interpretation
Readings: DPOC2 1.1-1.2, 1.5, 2.1-2.3 [R]
HW1 [1,2,3]
Student Holiday
PART 2: Reinforcement Learning
09/26 L06. MDPs and (PO)MDPs: Nuances, simplifications, generalizations
MDP assumptions, policy classes, imperfect state information, separation principle
Readings: DPOC 1.4, 4.1-4.2 [L] [R]
09/28 L07. Model-free methods - From DP to RL
Monte Carlo, stochastic approximation of a mean, temporal differences
Readings: NDP 5.1-5.3, SB 12.1-12.2 [L] [R]
R03.
10/03 L08. Value-based reinforcement learning: All about "Q"
Q-learning, stochastic approximation of a fixed point
Readings: NDP 5.6, 4.1-4.3 [L] [R]
10/05 L09. Value-based reinforcement learning: All about "Q" (Part 2)
Stochastic approximation of a fixed point (cont.)
Readings: NDP 4.1-4.3, 6.1-6.2; DPOC2 6.3 [R]
R04.
HW2 [4,5,6]
10/10 NO CLASS (Student Holiday)
10/12 L10. Advanced value-based RL methods
Function approximation, approximate policy evaluation
Readings: DPOC2 2.5.3; NDP 3.1-3.2, SB 16.5 [L] [R]
R05.
10/17 L11. Advanced value-based RL methods
Approximate VI, fitted Q iteration, DQN, DDQN and friends
Readings: DPOC2 2.5.3; NDP 3.1-3.2, SB 13 [R]
10/19 L12. Policy gradient - Simplicity at the cost of variance
Approximate policy iteration, policy gradient, variance reduction, actor-critic
Readings: NDP 6.1; SB Ch13 [L] [R]
R06.
HW3 [7,8,9,10,11]
10/24 L13. Advanced policy gradient methods - Managing exploration vs exploitation
Conservative policy space methods, TRPO, PPO
Readings: RLTA 11.1-11.2, Ch12 [L] [R]
10/26 L14. Design and analysis of experiments in RL
DOX process, factors
Readings: Montgomery (Ch1-2), Cookbook [L] [R]
R07.
Project proposal
PART 3: Special topics
10/31 L15. Design and analysis of experiments in RL (Part 2)
Statistical testing, implementation best practices, response variables, choice of designs
Readings: Montgomery (Ch1-2), Cookbook [R]
11/02 L16. Monte Carlo Tree Search
Online planning, MCTS, AlphaGo, AlphaGo Zero, MuZero
Readings: SB 16.6 [L] [R]
R08.
HW4 [12, 13]
11/07 L17. Quiz
11/09 L18. Multi-armed bandits - The prototypical exploration-exploitation dilemma
Readings: Bandits Ch 1, Ch 8 [L] [R]
11/14 L19. Recommendation Systems - Between the Theory & Practice of Contextual Bandits
(Guest Lecture: Lihong Li, Amazon)
Readings: News article recommendation [L] [R]
11/16 L20. Connections between Control and Learning
(Guest Lecture: Prof. Anuradha Annaswamy, MIT)
Readings: Adaptive control and RL [L] [R]
R09.
11/21 L21. Learning for discrete optimization
Discrete optimization, integer programming, attention, learning-guided vs construction
[L] [R]
HW5 [14, 15, 18]
11/23 NO CLASS (Thanksgiving)
11/28 L22. Distributional RL
(Guest Lecture: Marc G. Bellemare, Reliant AI; formerly, Google Brain)
Readings: Bellemare, Dabney, Rowland [L] [R]
11/30 L23. Reinforcement learning from human feedback
(Guest Lecture: Dylan Hadfield-Menell, MIT)
Inverse RL, RLHF
Readings: Open problems in RLHF [L] [R]
12/05 NO CLASS
See merged class on 12/07
12/07 L24+25. Project presentations (double lecture)
Slides due (before class), Reports due (Wed 12/13 5pm)

R10. Project presentations (2x)