Note: The schedule is subject to minor changes and will be updated periodically with lecture slides and readings.
| Date | Tuesday | Date | Thursday | Friday |
|---|---|---|---|---|
| | | 09/05 | L01. What is sequential decision making?<br>Introduction, finite-horizon problem formulation, Markov decision processes, course overview<br>Readings: N1 §3, N2 §1-3, N3 §1; DPOC 1.1-1.3, 2.1; SB Ch1 (skim) [R] | R01. Inventory control.<br>Readings: N3 §3; DPOC 3.2 |
| 09/10 | L02. Dynamic Programming: What makes sequential decision-making hard?<br>Finite horizon dynamic programming algorithm, sequential decision making as shortest path<br>Readings: DPOC 3.3-3.4 [R]<br>HW1 assigned [1, 2] | 09/12 | L03. Special structures: What makes some sequential decision-making problems easy?<br>DP arguments, inventory problem, optimal stopping<br>Readings: N3 §3; DPOC 3.2 [R] | R02. Combinatorial optimization as DP.<br>Readings: DPOC App-B |
| 09/17 | L04. Special Structures<br>Linear quadratic regulator<br>Readings: N3 §2; DPOC 3.1 [R]<br>HW1 due, HW2 assigned [3, 4] | 09/19 | L05. Non-discounted Infinite Horizon Problem<br>Linear quadratic regulator<br>Readings: DPOC 1.4, 4.1-4.2 [R] | No class (student holiday) |
| 09/24 | L06. Discounted infinite horizon problems - tl;dr: DP still works<br>Bellman equations, value iteration<br>Readings: DPOC2 1.1-1.2, 1.5, 2.1-2.3 [R]<br>HW2 due, HW3 assigned [5, 6] | 09/26 | L07. Discounted infinite horizon problems - tl;dr: DP still works (Part 2)<br>Policy iteration, geometric interpretation<br>Readings: DPOC2 1.1-1.2, 1.5, 2.1-2.3 [R] | R03. Linear quadratic regulator<br>Project ideas posted |
| 10/01 | L08. Model-free methods - From DP to RL<br>Monte Carlo, policy evaluation, stochastic approximation of a mean, temporal differences<br>Readings: NDP 5.1-5.3, SB 12.1-12.2 [L] [R]<br>HW3 due, HW4 assigned [7, 8] | 10/03 | L09. Value-based reinforcement learning - Policy learning without knowing how the world works<br>State-action value function, Q-learning<br>Readings: NDP 5.6, 4.1-4.3 [L] [R] | R04. |
| 10/08 | L10. Value-based reinforcement learning - Policy learning without knowing how the world works (Part 2)<br>Stochastic approximation of a fixed point<br>Readings: NDP 4.1-4.3, 6.1-6.2; DPOC2 6.3 [L] [R]<br>HW4 due, HW5 assigned [9, 10] | 10/10 | L11. Approximate value-based RL - How to approximately solve an RL problem<br>Function approximation, approximate policy evaluation<br>Readings: DPOC2 2.5.3; NDP 3.1-3.2, SB 16.5 [L] [R] | R05. |
| 10/15 | NO CLASS (Student Holiday) | 10/17 | L12. Approximate value-based RL - How to approximately solve an RL problem (Part 2)<br>Approximate VI, fitted Q iteration, DQN, DDQN and friends<br>Readings: DPOC2 2.5.3; NDP 3.1-3.2, SB 16.5 [L] [R]<br>HW5 due, HW6 assigned [11, 12] | R06. |
| 10/22 | L13. Policy gradient - Simplicity at the cost of variance<br>Approximate policy iteration, policy gradient, variance reduction<br>Readings: NDP 6.1; SB Ch13.1-13.4 [L] [R] | 10/24 | L14. Actor-critic Methods - Bringing together value-based and policy-based RL<br>Compatible function approximation, A2C, A3C, DDPG, SAC<br>Readings: SB Ch13.5-13.8 [L] [R]<br>HW6 due, HW7 assigned [13, 14] | R07. Project proposal |
| 10/29 | L15. Advanced policy gradient methods - Managing exploration vs exploitation<br>Conservative policy iteration, NPG, TRPO, PPO<br>Readings: RLTA 11.1-11.2, Ch12 [L] [R] | 10/31 | L16. Multi-armed bandits - The prototypical exploration-exploitation dilemma<br>Readings: Bandits Ch 1, Ch 8 [L] [R]<br>HW7 due | R08. |
| 11/05 | L17. Quiz | 11/07 | L18. Evaluation in RL - Is the RL method working?<br>Sensitivity analysis of RL, benchmarking, statistical methods, overfitting to benchmarks, model-based transfer learning<br>Readings: Lecture Appendices A-C [L] [R]<br>HW8 assigned [15, 16, 18] | No class |
| 11/12 | L19. Applications of Reinforcement Learning in Criminal Justice and Healthcare (Guest lecture: Pengyi Shi, Purdue)<br>Readings: (a) (b) [L] [R] | 11/14 | L20. Monte Carlo Tree Search<br>Online planning, MCTS, AlphaGo, AlphaGo Zero, MuZero<br>Readings: SB 16.6 [L] [R]<br>HW8 due | No class |
| 11/19 | L21. Rethinking the Theoretical Foundation of Reinforcement Learning (Guest lecture: Nan Jiang, UIUC)<br>Readings: (a) (b) [L] [R] | 11/21 | L22. Representation-based Reinforcement Learning (Guest lecture: Bo Dai, Georgia Tech)<br>Readings: (a) (b) [L] [R] | R09. |
| 11/26 | L23. Finite-time Guarantees of Contractive Stochastic Approximation (Guest lecture: Siva Theja Maguluri, Georgia Tech)<br>Readings: (a) (b) [L] [R] | 11/28 | NO CLASS (Thanksgiving) | No class |
| 12/03 | NO CLASS<br>See merged class on Thursday | 12/05 | L24+25. Project presentations (double lecture)<br>Slides due (before class)<br>Reports due (Wed 12/11, 5pm) | R10. Project presentations (2x) |
| 12/10 | NO CLASS<br>See session on previous Friday | | | |