6.7920 Fall 2023      

Reinforcement Learning: Foundations and Methods


Instructor: Cathy Wu < cathywu at mit dot edu >

TAs: Chanwoo Park, Gilhyun Ryou

Staff email: 6-7920-staff at mit dot edu. Please include “[6.7920]” in your email subject line.

Cross-listed course numbers: 1.127/IDS.140J

Course numbers for earlier versions of the course: 6.7950, 6.246

Lectures: TR 2:30 - 4:00 pm (4-237)
Recitations: F 10:00 - 11:00 am (32-155), 1:00 - 2:00 pm (32-155)

Office hours:

Cathy Wu: TR 4:00 - 4:30 pm (4-237)
Chanwoo Park: W 2:45 - 3:45 pm (32-D 6th floor lounge)
Gilhyun Ryou: Thu 1:00 - 2:00 pm (32-D 6th floor lounge)

Note on Doctoral Requirements: This course satisfies requirements for the Systems Engineering Core in CEE and the Performance and Optimization Area in the MST/PhD.

Course overview and scope

With a fast-moving field like reinforcement learning (RL), what is an appropriate foundational course to advance research and practice in sequential decision making? This course will be 2/3 exploitation, that is, topics in RL that are well established and well understood, and 1/3 exploration, that is, selected up-and-coming topics.

Exploitation will consist of a mathematical introduction to RL. Topics include: dynamic programming, special structures, finite and infinite horizon Markov Decision Processes, value and policy iteration, Monte Carlo methods, temporal differences, Q-learning, stochastic approximation, bandits, and finite sample analysis. We will also cover approximate dynamic programming, including value-based methods and policy space methods. While the focus is mathematical, we will supplement it with computational exercises.

The exploration topics may change from offering to offering. In addition to empirical rigor in RL, a variety of applications, and recent theoretical results, this year we are exploring a mini-theme: how RL can best cope with problem scale and diversity (think: many agents, generalization, resource allocation problems).

This class is most suitable for graduate or advanced undergraduate students who are interested in advancing the research and practice of reinforcement learning, whether in theory, methods, or applications. Please note that this is not primarily a deep RL course, though we will cover some deep RL topics.

Big picture

The schedule is subject to minor changes.


Students should have a command of probability at the level of an A grade in 6.041 (or equivalent) and should be comfortable programming in Python. An analysis prerequisite is suggested but not required; mathematical maturity is necessary.

Textbooks and readings

Useful references (recommended but not required)
  1. Dynamic Programming and Optimal Control (2017), Vol. I, 4th Edition by Dimitri P. Bertsekas. [DPOC]
  2. Dynamic Programming and Optimal Control (2012), Vol. II, 4th Edition by Dimitri P. Bertsekas. [DPOC2]
  3. Neuro-Dynamic Programming (1996) by Dimitri P. Bertsekas and John N. Tsitsiklis. [NDP]

Readings: We will give pointers to these references. Some additional readings / notes may be posted.

A note on notation. We will use contemporary notation (e.g., s, a, V), which differs from the notation in these texts (e.g., x, u, J). We will also maximize rewards rather than minimize costs, among other differences.
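As a rough illustration of these notational differences, the discounted Bellman optimality equation can be written in both conventions as below. This is a schematic sketch only: the symbols r, g, γ, α, and p and the exact form are assumptions based on common usage, not taken from the course notes, and each text states the equation in its own way. Substituting r = -g, V = -J, and γ = α turns the maximization form into the minimization form.

```latex
% Contemporary RL convention (states s, actions a, values V; maximize reward)
V^*(s) = \max_{a \in \mathcal{A}(s)} \Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V^*(s') \Big]

% DP-text convention (states x, controls u, costs-to-go J; minimize cost)
J^*(x) = \min_{u \in U(x)} \Big[ g(x,u) + \alpha \sum_{y} p_{xy}(u)\, J^*(y) \Big]
```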

Course pointers

  1. Website: https://web.mit.edu/6.7920/www/ (for class materials & info)
  2. Piazza: https://piazza.com/mit/fall2023/67920 (for announcements, HW, solutions, readings). Piazza is a great resource for collaborating with one another. Our staff is small and cannot address every question, so please also bring your questions to office hours. For obvious reasons, do not post homework answers on Piazza before the due date.
  3. Gradescope: https://www.gradescope.com/courses/605198 (for HW/quiz submissions)
  4. Email: You can reach the staff during office hours or at 6-7920-staff at mit dot edu. Please include “[6.7920]” in your email subject line.


Grades will be determined according to the following weights:


Submissions are through Gradescope and due dates will be provided. Please register for the course on Gradescope with your MIT email.

You have a total of 4 late days to use across all homework assignments. Homework solutions will be released shortly after the deadline; late submitters must abide by the honor code.

If you are interested in finding pset partners, check out https://psetpartners.mit.edu. Sign up early; matching will be done at the end of the first week of classes.


There will be one in-class quiz, tentatively on Tuesday 11/07.

Final class project

A class project is required. Projects can be done individually or in 2-person teams; naturally, the expectations for 2-person projects will be higher. We have a strong preference for projects that apply RL to a concrete setting: for example, formulate a problem motivated by an application that interests you, and study it analytically or computationally. This year, we are also trying out providing sample project topics, some with volunteer project mentors.

A one-page project proposal is due on a date TBD. Project presentations will take place during the last week of classes (12/07). Depending on the number of groups presenting, this may require an extended class session. Presentations must be submitted before class on the day of the presentation. Project reports are due at 5 pm on Wednesday 12/13; this is a hard deadline, and we cannot accommodate extensions.

Class participation

Class participation includes:
  1. Live participation during lectures.
  2. Answering questions for fellow students on Piazza.
  3. Attending office hours and recitation.

Statement on collaboration and academic honesty

If you collaborate on homework, you must name your collaborators in your written solution. Also, if you use sources beyond the course materials in one of your solutions, e.g., a “friendly expert,” another text, or a “bible,” be sure to cite the source. There is no penalty for such collaboration or use of other sources, as long as it is disclosed.

We encourage you to collaborate on homework. Study groups can be an excellent means to master course material. However, you must write up solutions on your own, neither copying solutions nor providing solutions to be copied. Duplicating a solution that someone else has written (verbatim or edited), or providing solutions for a fellow student to copy, is not acceptable.

In general, we expect students to adhere to basic, common-sense principles of academic honesty. Presenting somebody else’s work as your own, or cheating on exams, is of course unacceptable.