Note: The schedule is subject to minor changes and will be updated periodically with lecture slides and readings.
Note: All lecture slides will be posted in Canvas.
Date | Tuesday | Date | Thursday | Friday
---|---|---|---|---
 | | 09/04 | L01. What is sequential decision making? Introduction, finite-horizon problem formulation, Markov decision processes, course overview. Readings: N1 §3, N2 §1-3, N3 §1; DPOC 1.1-1.3, 2.1; SB Ch1 (skim) | R01. Inventory control. Readings: N3 §3; DPOC 3.2
09/09 | L02. Dynamic programming: What makes sequential decision making hard? Finite-horizon dynamic programming (DP) algorithm, sequential decision making as shortest path. Readings: N2 §1-3; DPOC 3.3-3.4. HW1 assigned [1,2] | 09/11 | L03. Special structures: What makes some sequential decision-making problems easy? DP arguments, inventory problem, optimal stopping. Readings: N2 §1-3, N3 §2-3; DPOC 3.2 | R02. Combinatorial optimization as DP. Readings: DPOC App-B
09/16 | L04. Special structures: Linear quadratic regulator. Readings: N3 §1-2; DPOC 3.1. HW1 due, HW2 assigned [3,4] | 09/18 | L05. Non-discounted infinite-horizon problems: Linear quadratic regulator. Readings: N3 §1-2; DPOC 1.4, 4.1-4.2 | No class (student holiday)
09/23 | L06. Discounted infinite-horizon problems (tl;dr: DP still works): Bellman equations, value iteration. Readings: N5 §1-8; DPOC2 1.1-1.2, 1.5, 2.1-2.3. HW2 due, HW3 assigned [5,6] | 09/25 | L07. Discounted infinite-horizon problems (tl;dr: DP still works), Part 2: Policy iteration, geometric interpretation. Readings: N5 §1-8; DPOC2 1.1-1.2, 1.5, 2.1-2.3 | R03. Modified policy iteration
09/30 | L08. Model-free policy evaluation (policy evaluation without knowing how the world works): Monte Carlo policy evaluation, stochastic approximation of a mean, temporal differences (TD), TD(λ). Readings: NDP 5.1-5.3; SB 12.1-12.2. HW3 due, HW4 assigned [7,8]. Project guidelines posted | 10/02 | L09. Model-free policy learning (policy learning without knowing how the world works): State-action value function, Q-learning. Readings: NDP 5.6, 4.1-4.3 | R04. Convergence of stochastic value iteration
10/07 | L10. Convergence of TD methods (tl;dr: noisy, bootstrapped updates work): Stochastic approximation of a fixed point. Readings: NDP 4.1-4.3, 6.1-6.2; DPOC2 6.3. HW4 due, HW5 assigned [9,10] | 10/09 | L11. Convergence of TD methods (tl;dr: noisy, bootstrapped updates work), Part 2: Stochastic approximation of a fixed point. Readings: NDP 4.1-4.3, 6.1-6.2; DPOC2 6.3 | R05. Approximate value iteration
10/14 | L12. Approximate value-based RL (how to approximately solve an RL problem): Function approximation, approximate policy evaluation. Readings: DPOC2 2.5.3; NDP 3.1-3.2; SB 16.5. HW5 due, HW6 assigned [11,12] | 10/16 | L13. Approximate value-based RL (how to approximately solve an RL problem), Part 2: Approximate VI, fitted Q iteration, DQN, DDQN and friends. Readings: DPOC2 2.5.3; NDP 3.1-3.2; SB 16.5 | R06. Performance difference lemma
10/21 | L14. Policy gradient (simplicity at the cost of variance): Approximate policy iteration, policy gradient, variance reduction. Readings: NDP 6.1; SB 13.1-13.4. HW6 due, HW7 assigned [13,14] | 10/23 | L15. Actor-critic methods (bringing together value-based and policy-based RL): Compatible function approximation, A2C, A3C, DDPG, SAC. Readings: SB 13.5-13.8 | R07. Project proposal due
10/28 | L16. Advanced policy gradient methods (managing exploration vs. exploitation): Conservative policy iteration, NPG, TRPO, PPO. Readings: RLTA 11.1-11.2, Ch12. HW7 due, HW8 assigned [15,16,18] | 10/30 | L17. Multi-armed bandits: The prototypical exploration-exploitation dilemma. Readings: Bandits Ch1, Ch8 | R08
11/04 | L18. Quiz | 11/06 | L19. Evaluation in RL (is the RL method working?): Sensitivity analysis of RL, benchmarking, statistical methods, overfitting to benchmarks, model-based transfer learning. Readings: Lecture Appendices A-C | R09
11/11 | No class (student holiday) | 11/13 | L20. Guest lecture. HW8 due | R10
11/18 | L21. Guest lecture | 11/20 | L22. Guest lecture | R11
11/25 | L23. Monte Carlo tree search: Online planning, MCTS, AlphaGo, AlphaGo Zero, MuZero. Readings: SB 16.6 | 11/27 | No class (Thanksgiving) | No class
12/02 | L24. Guest lecture | 12/04 | Final project presentations (~3 hours). Presentations due before class; reports due Wed 12/10, 5pm |
12/09 | Final project presentations (~3 hours) | | |