6.7920 Fall 2025      

Reinforcement Learning: Foundations and Methods

 

Note: The schedule is subject to minor changes and will be updated periodically with lecture slides and readings.
Note: All lecture slides will be posted in Canvas.
Note: Lectures meet on Tuesdays and Thursdays; recitations (R01-R11) meet on Fridays.
PART 1: Dynamic Programming
09/04 L01. What is sequential decision making?
Introduction, finite-horizon problem formulation, Markov decision processes, course overview
Readings: N1 §3, N2 §1-3, N3 §1; DPOC 1.1-1.3, 2.1; SB Ch1 (skim)
R01. Inventory control.
Readings: N3 §3; DPOC 3.2
09/09 L02. Dynamic programming: What makes sequential decision-making hard?
Finite horizon dynamic programming (DP) algorithm, sequential decision making as shortest path
Readings: N2 §1-3; DPOC 3.3-3.4
HW1 assigned [1,2]
09/11 L03. Special structures: What makes some sequential decision-making problems easy?
DP arguments, inventory problem, optimal stopping
Readings: N2 §1-3, N3 §2-3; DPOC 3.2
R02. Combinatorial optimization as DP.
Readings: DPOC App-B
09/16 L04. Special structures
Linear quadratic regulator
Readings: N3 §1-2; DPOC 3.1
HW1 due, HW2 assigned [3,4]
09/18 L05. Non-discounted infinite horizon problems
Linear quadratic regulator
Readings: N3 §1-2; DPOC 1.4, 4.1-4.2
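For quick reference, a minimal NumPy sketch (illustrative only, not course material) of the finite-horizon LQR solution from L04-L05 via the backward Riccati recursion; the toy double-integrator system is an assumed example:

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, Q_T, T):
    """Backward Riccati recursion for x_{t+1} = A x_t + B u_t with stage
    cost x'Qx + u'Ru and terminal cost x'Q_T x. Returns gains K_0..K_{T-1}
    so that u_t = -K_t x_t is optimal."""
    P = Q_T
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()                      # gains[t] is the gain at stage t
    return gains

# Assumed toy example: a double integrator
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = lqr_finite_horizon(A, B, Q=np.eye(2), R=np.array([[0.1]]),
                       Q_T=10 * np.eye(2), T=50)
```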
No class (student holiday)
09/23 L06. Discounted infinite horizon problems - tl;dr: DP still works
Bellman equations, value iteration
Readings: N5 §1-8; DPOC2 1.1-1.2, 1.5, 2.1-2.3
HW2 due, HW3 assigned [5,6]
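A minimal sketch (not course material) of the value iteration algorithm from L06, assuming a tabular MDP given as a NumPy (S, A, S) transition array and an (S, A) reward array:

```python
import numpy as np

def value_iteration(P, r, gamma=0.95, tol=1e-8):
    """P: (S, A, S) transition probabilities; r: (S, A) expected rewards.
    Iterates the Bellman optimality operator until the sup-norm residual
    falls below tol, then extracts the greedy policy."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = r + gamma * (P @ V)          # (S, A) state-action values
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```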
09/25 L07. Discounted infinite horizon problems - tl;dr: DP still works (Part 2)
Policy iteration, geometric interpretation
Readings: N5 §1-8; DPOC2 1.1-1.2, 1.5, 2.1-2.3
R03. Modified policy iteration.
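A companion sketch for L07 (same assumed tabular interface as above): exact policy iteration, alternating policy evaluation by a linear solve with greedy improvement:

```python
import numpy as np

def policy_iteration(P, r, gamma=0.95):
    """Exact policy iteration for a tabular discounted MDP.
    P: (S, A, S) transitions; r: (S, A) expected rewards."""
    S, A, _ = P.shape
    pi = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly
        P_pi = P[np.arange(S), pi]                 # (S, S)
        r_pi = r[np.arange(S), pi]                 # (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to V
        pi_new = (r + gamma * (P @ V)).argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return V, pi
        pi = pi_new
```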
PART 2: Reinforcement Learning
09/30 L08. Model-free policy evaluation - Policy evaluation without knowing how the world works
Monte Carlo, policy evaluation, stochastic approximation of a mean, temporal differences (TD), TD(λ)
Readings: NDP 5.1-5.3; SB 12.1-12.2
HW3 due, HW4 assigned [7,8]
Project guidelines posted
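A minimal sketch of TD(0) policy evaluation from L08; the env_step(s, a) -> (next_state, reward, done) interface and the start state are assumptions for illustration:

```python
import numpy as np

def td0(env_step, pi, n_states, alpha=0.1, gamma=0.95, episodes=500):
    """TD(0) evaluation of a fixed policy pi (array mapping state -> action).
    env_step(s, a) -> (next_state, reward, done) is an assumed interface."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        s, done = 0, False                   # assumed start state
        while not done:
            s2, rew, done = env_step(s, pi[s])
            target = rew + (0.0 if done else gamma * V[s2])
            V[s] += alpha * (target - V[s])  # TD-error update
            s = s2
    return V
```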
10/02 L09. Model-free policy learning - Policy learning without knowing how the world works
State-action value function, Q-learning
Readings: NDP 5.6, 4.1-4.3
R04. Convergence of stochastic value iteration
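A minimal sketch of tabular Q-learning from L09 with epsilon-greedy exploration; the env_step interface is again an assumption:

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, alpha=0.1, gamma=0.95,
               eps=0.1, episodes=1000, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.
    env_step(s, a) -> (next_state, reward, done) is an assumed interface."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False                     # assumed start state
        while not done:
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2, rew, done = env_step(s, a)
            target = rew + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])  # off-policy TD update
            s = s2
    return Q
```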
10/07 L10. Convergence of TD methods - tl;dr: Noisy, bootstrapped updates work
Stochastic approximation of a fixed point
Readings: NDP 4.1-4.3, 6.1-6.2; DPOC2 6.3
HW4 due, HW5 assigned [9,10]
10/09 L11. Convergence of TD methods - tl;dr: Noisy, bootstrapped updates work (Part 2)
Stochastic approximation of a fixed point
Readings: NDP 4.1-4.3, 6.1-6.2; DPOC2 6.3
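To illustrate the stochastic approximation viewpoint of L10-L11, a tiny sketch of a Robbins-Monro fixed-point iteration with noisy updates and 1/n step sizes (the contraction H here is a made-up example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Robbins-Monro iteration for the fixed point x* = E[H(x*, W)].
# Made-up example: H(x, w) = 0.5 * x + w with E[W] = 1, so x* = 2.
x = 0.0
for n in range(1, 100_001):
    w = rng.normal(loc=1.0, scale=1.0)        # noisy sample
    x += (1.0 / n) * ((0.5 * x + w) - x)      # steps sum to infinity,
                                              # squared steps are summable
print(x)  # converges to 2.0
```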
R05. Approximate value iteration.
10/14 L12. Approximate value-based RL - How to approximately solve an RL problem
Function approximation, approximate policy evaluation
Readings: DPOC2 2.5.3; NDP 3.1-3.2; SB 16.5
HW5 due, HW6 assigned [11,12]
10/16 L13. Approximate value-based RL - How to approximately solve an RL problem (Part 2)
Approximate VI, fitted Q iteration, DQN, DDQN and friends
Readings: DPOC2 2.5.3; NDP 3.1-3.2; SB 16.5
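A minimal sketch of fitted Q iteration from L13 on a fixed batch of transitions; the choice of scikit-learn's ExtraTreesRegressor is an assumption, and any regressor would do:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor   # assumed choice of regressor

def fitted_q_iteration(transitions, n_actions, gamma=0.95, iters=50):
    """Fitted Q iteration on a fixed batch of (s, a, r, s2, done) tuples,
    where states are feature vectors. A fresh regressor approximates
    Q(s, a) at every iteration."""
    s, a, r, s2, done = (np.array(x) for x in zip(*transitions))
    X = np.column_stack([s, a])            # regress on (state, action) pairs
    y = r.astype(float)                    # first target: immediate reward
    for _ in range(iters):
        model = ExtraTreesRegressor(n_estimators=50).fit(X, y)
        # Bootstrapped target: r + gamma * max_a' Qhat(s', a')
        q_next = np.column_stack([
            model.predict(np.column_stack([s2, np.full(len(s2), act)]))
            for act in range(n_actions)])
        y = r + gamma * (1 - done) * q_next.max(axis=1)
    return model
```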
R06. Performance difference lemma.
10/21 L14. Policy gradient - Simplicity at the cost of variance
Approximate policy iteration, policy gradient, variance reduction
Readings: NDP 6.1; SB Ch13.1-13.4
HW6 due, HW7 assigned [13,14]
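A minimal sketch of the basic REINFORCE policy gradient estimator from L14 for a softmax-over-features policy; the episode interface is an assumption, and no baseline is used, so the estimate is unbiased but high-variance (matching the lecture title):

```python
import numpy as np

def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
    """One REINFORCE update. theta: (d, n_actions) parameters;
    episode: list of (features, action, reward) tuples from one rollout
    (assumed interface)."""
    grads, rewards = [], []
    for x, a, rew in episode:
        logits = x @ theta
        p = np.exp(logits - logits.max())
        p /= p.sum()
        g = np.outer(x, -p)                # grad log pi(a|x) = x (e_a - p)'
        g[:, a] += x
        grads.append(g)
        rewards.append(rew)
    G = 0.0
    for t in reversed(range(len(rewards))):        # returns-to-go
        G = rewards[t] + gamma * G
        theta = theta + alpha * (gamma ** t) * G * grads[t]
    return theta
```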
10/23 L15. Actor-critic methods - Bringing together value-based and policy-based RL
Compatible function approximation, A2C, A3C, DDPG, SAC
Readings: SB Ch13.5-13.8
R07.
Project proposal due
10/28 L16. Advanced policy gradient methods - Managing exploration vs exploitation
Conservative policy iteration, natural policy gradient (NPG), TRPO, PPO
Readings: RLTA 11.1-11.2, Ch12
HW7 due, HW8 assigned [15,16,18]
10/30 L17. Multi-armed bandits - The prototypical exploration-exploitation dilemma
Readings: Bandits Ch 1, Ch 8
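A minimal sketch of UCB1 for the stochastic multi-armed bandit setting of L17; the pull(arm) -> reward interface is an assumption:

```python
import numpy as np

def ucb1(pull, n_arms, horizon):
    """UCB1 for a stochastic bandit with rewards in [0, 1].
    pull(arm) -> reward is an assumed interface."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                                 # play each arm once
        else:
            arm = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
        rew = pull(arm)
        counts[arm] += 1
        means[arm] += (rew - means[arm]) / counts[arm]  # running average
    return means, counts
```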
R08.
PART 3: Special topics
11/04 L18. Quiz
11/06 L19. Evaluation in RL - Is the RL method working?
Sensitivity analysis of RL, benchmarking, statistical methods, overfitting to benchmarks, model-based transfer learning
Readings: Lecture Appendices A-C
R09.
11/11 No class (student holiday)
11/13 L20. Guest lecture

HW8 due
R10.
11/18 L21. Guest lecture
11/20 L22. Guest lecture

R11
11/25 L23. Monte Carlo Tree Search
Online planning, MCTS, AlphaGo, AlphaGoZero, MuZero
Readings: SB 16.6
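A minimal sketch of UCT-style Monte Carlo tree search from L23 (selection by the UCB rule, one-node expansion, random rollout, mean-return backup); the deterministic step(state, action) simulator is an assumption:

```python
import math, random

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}                 # action -> Node
        self.visits, self.value = 0, 0.0

def uct_search(root_state, step, actions, n_sims=1000, c=1.4, depth=20):
    """Minimal UCT. step(state, action) -> (next_state, reward) is an
    assumed deterministic simulator; returns are undiscounted here."""
    root = Node(root_state)
    for _ in range(n_sims):
        node, path, total = root, [root], 0.0
        while True:                        # selection / expansion
            untried = [a for a in actions if a not in node.children]
            if untried:                    # expand one new child, then stop
                a = random.choice(untried)
                s2, rew = step(node.state, a)
                node.children[a] = Node(s2)
                node, total = node.children[a], total + rew
                path.append(node)
                break
            # all children tried: descend by the UCB selection rule
            a = max(actions, key=lambda act: node.children[act].value
                    + c * math.sqrt(math.log(node.visits)
                                    / node.children[act].visits))
            s2, rew = step(node.state, a)
            node, total = node.children[a], total + rew
            path.append(node)
        s = node.state                     # random rollout from the new node
        for _ in range(depth):
            s, rew = step(s, random.choice(actions))
            total += rew
        for n in path:                     # backup: mean return along path
            n.visits += 1
            n.value += (total - n.value) / n.visits
    return max(root.children, key=lambda a: root.children[a].visits)
```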
11/27 No class (Thanksgiving)
No class (Friday)
12/02 L24. Guest lecture

12/04 Final Project Presentations (~3 hours)
Presentations due before class; reports due Wed 12/10, 5pm

12/09 Final Project Presentations (~3 hours)