6.7920

Note on Doctoral Requirements: This course satisfies requirements for the Systems Engineering Core in CEE, Operations Research Center (ORC) doctoral students, the SES PhD program in IDSS, and the Performance and Optimization Area in MST/PhD in Transportation.

Course overview and scope

With a fast-moving field like reinforcement learning (RL), what is an appropriate foundational course to advance research and practice in sequential decision making? This course will be 2/3 exploitation, that is, topics in RL that are well understood, and 1/3 exploration, that is, selected up-and-coming topics.

Exploitation will consist of a mathematical introduction to RL. Topics include: dynamic programming, special structures, finite and infinite horizon Markov Decision Processes, value and policy iteration, Monte Carlo methods, temporal differences, Q-learning, stochastic approximation, bandits, and finite sample analysis. We will also cover approximate dynamic programming, including value-based methods and policy space methods. While the focus is mathematical, we will supplement with computational exercises.
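To give a flavor of the computational exercises, here is a minimal sketch of value iteration, one of the topics listed above. The two-state MDP, its rewards, and the discount factor of 0.5 are all invented for illustration and are not taken from course material; the function names are hypothetical.

```python
# Hypothetical 2-state, 2-action deterministic MDP (illustration only).
# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the reward.
P = {
    0: {0: [(1.0, 0)], 1: [(1.0, 1)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}
GAMMA = 0.5  # assumed discount factor


def q_value(P, R, gamma, V, s, a):
    """One-step lookahead: immediate reward plus discounted expected value."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Repeat the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        V_new = {s: max(q_value(P, R, gamma, V, s, a) for a in P[s]) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new
    return V


def greedy_policy(P, R, gamma, V):
    """Pick, in each state, the action with the largest one-step lookahead."""
    return {s: max(P[s], key=lambda a: q_value(P, R, gamma, V, s, a)) for s in P}


V_star = value_iteration(P, R, GAMMA)
pi_star = greedy_policy(P, R, GAMMA, V_star)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 earns 2 per step, so V(1) = 2 / (1 - 0.5) = 4, and moving from state 0 to state 1 gives V(0) = 1 + 0.5 * 4 = 3.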

Exploration may change from offering to offering. In addition to empirical rigor in RL, a variety of applications, and recent theoretical results, a mini-theme we are exploring this year is how RL can best cope with problem scale and diversity (think: many agents, generalization, resource allocation problems).

This class is most suitable for graduate or advanced undergraduate students who are interested in advancing the research and practice of reinforcement learning, be it theory, methods, or applications. Please note that this is not primarily a deep RL course, though we will cover some deep RL topics.

Big picture

Dynamic programming (7 lectures)
Core reinforcement learning (9 lectures)
Special topics (~2 lectures)
Guest lectures (~4 lectures)
Final project presentations

The schedule is subject to minor changes.

Prerequisites

Students should be comfortable with probability at the level of an A grade in 6.041 or an equivalent course, and should be familiar with programming in Python. An analysis prerequisite is suggested but not required; mathematical maturity is necessary.

Textbooks and readings

Useful references (recommended but not required)

Dynamic Programming and Optimal Control (2017), Vol. I, 4th Edition by Dimitri P. Bertsekas. [DPOC]
Dynamic Programming and Optimal Control (2012), Vol. II, 4th Edition by Dimitri P. Bertsekas. [DPOC2]
Neuro-Dynamic Programming (1996) by Dimitri P. Bertsekas and John N. Tsitsiklis. [NDP] Available online: https://web.mit.edu/dimitrib/www/NDP.pdf

Readings: We will give pointers to these references. Some additional readings / notes may be posted.

A note on notation. We will be using contemporary notation (e.g. s, a, V), which differs from the notation in these texts (e.g. x, u, J). We will be maximizing instead of minimizing, etc.
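As a rough illustration of the translation, the discounted Bellman optimality equation can be written side by side in both conventions. The symbols beyond s, a, V and x, u, J (the stage cost g, disturbance w, dynamics f, discount factors \alpha and \gamma, reward r, and transition kernel P) are assumptions made for this sketch; the exact forms used in lecture may differ.

```latex
% Notation of the reference texts: state x, control u, cost-to-go J, minimization
J^*(x) = \min_{u \in U(x)} \mathbb{E}_{w}\left[ g(x, u, w) + \alpha \, J^*\big(f(x, u, w)\big) \right]

% Contemporary notation: state s, action a, value V, maximization
V^*(s) = \max_{a \in \mathcal{A}} \left[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \right]
```

Under these assumptions the two agree when the reward is the negative expected cost, r(s, a) = -\mathbb{E}_{w}[g(s, a, w)], and \gamma = \alpha, which gives V^* = -J^*.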

Course pointers

Website (you are here): https://web.mit.edu/6.7920/www/ (for class info, schedule)
Canvas: https://canvas.mit.edu/courses/33930 (for announcements, HW, solutions, readings)
Piazza: https://piazza.com/mit/fall2025/67920 (for Q&A, discussions). Piazza is a great resource for you to collaborate with one another. We have a small staff who cannot address every question there, so please come to office hours with questions. For obvious reasons, don't post homework answers on Piazza before the due date.
Gradescope: https://www.gradescope.com/courses/1111679 (for HW/quiz submissions)
Email: You can reach the staff generally via office hours or via 6-7920-staff at mit dot edu. Please include “[6.7920]” in your email subject line.

Grading

Grades will be determined according to the following weights:

8 homework assignments (30%)
1 in-class quiz (25%)
Class project (35%)
Class participation (10%)
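The weights above combine into a final score as a simple weighted sum. The sketch below assumes each component is scored on a 0-100 scale, which the syllabus does not specify; the function name and the example scores are hypothetical.

```python
# Weights in percent, taken from the list above.
WEIGHTS = {"homework": 30, "quiz": 25, "project": 35, "participation": 10}


def weighted_score(scores):
    """Combine per-component scores (assumed 0-100) into a final 0-100 score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up example scores, for illustration only.
example = {"homework": 90, "quiz": 80, "project": 85, "participation": 100}
final = weighted_score(example)  # 27 + 20 + 29.75 + 10 = 86.75
```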

Homework

Submissions are through Gradescope and due dates will be provided. Please register for the course on Gradescope with your MIT email.

We allow 4 late days in total across all homework assignments. After those are used, late homework will be penalized 10% for every 24 hours it is late. Solutions for homework will be released shortly after the deadline. Late submitters must abide by the honor code.
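One possible reading of the late policy is sketched below. Several details are assumptions not stated in the syllabus: lateness is rounded up to whole 24-hour days, free late days are consumed automatically in order of submission, and the penalty is a fraction of the assignment score capped at 100%. Check with the staff for the actual rules.

```python
import math

FREE_LATE_DAYS = 4      # from the syllabus: total across all homeworks
PENALTY_PER_DAY = 0.10  # from the syllabus: 10% per 24 hours


def late_penalties(hours_late_per_hw):
    """Return (penalty fraction per homework, free late days left over).

    Assumptions (not from the syllabus): partial days round up, and free
    late days are used automatically in submission order.
    """
    remaining = FREE_LATE_DAYS
    penalties = []
    for hours in hours_late_per_hw:
        days_late = math.ceil(hours / 24) if hours > 0 else 0
        free_used = min(days_late, remaining)
        remaining -= free_used
        penalized_days = days_late - free_used
        penalties.append(min(1.0, PENALTY_PER_DAY * penalized_days))
    return penalties, remaining


# Made-up example: five homeworks submitted 0, 30, 0, 50, and 10 hours late.
penalties, days_left = late_penalties([0, 30, 0, 50, 10])
```

In this example the second homework uses two free days, the fourth uses the last two and is penalized for one further day, and the fifth is penalized for one day.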

If you are interested in finding pset partners, check out https://psetpartners.mit.edu. Sign up early; matching will be done at the end of the first week of classes.

Quiz

There will be one in-class quiz: Tuesday 11/04 (tentatively).

Final class project

A class project will be required. Projects can be done individually or in 2-person teams. Of course, the expectations for 2-person projects will be higher. We have a strong preference for projects that apply RL to a concrete setting: for example, formulate a problem motivated by an application that interests you, and study it analytically or computationally. We will provide some sample project topics, some with volunteer project mentors.

A one-page project proposal is required (see schedule). There will be project presentations during the last couple of class sessions. Depending on the number of groups presenting, this may take place during an extended class session and/or a poster session. Presentations are due before class on the day of the presentation. Project reports are due at 5 pm on the last day of classes; this is a hard deadline, and we cannot accommodate extensions.

Class participation

Class participation includes:

Live participation during lectures.
Answering questions for fellow students on Piazza.
Attending office hours and recitation.
Attending guest lectures.

Statement on collaboration and academic honesty

If you do collaborate on homework, you must cite your collaborators in your written solution. Also, if you use sources beyond the course materials in one of your solutions, e.g., a “friendly expert,” another text, a “bible,” or generative AI/LLM tools, be sure to cite the source. There is no penalty for such collaboration or use of other sources, as long as it is disclosed.

We encourage you to collaborate on homework. Study groups can be an excellent means to master course material. However, you must write up solutions on your own, neither copying solutions nor providing solutions to be copied. Duplicating a solution that someone else has written (verbatim or edited), or providing solutions for a fellow student to copy, is not acceptable.

In general, we expect students to adhere to basic, common sense concepts of academic honesty: see the MIT Academic Integrity Handbook. Presenting somebody else’s work as if it were your own, or cheating in exams, is of course unacceptable.

Instructors:
Cathy Wu < cathywu at mit dot edu >

Munther A Dahleh < dahleh at mit dot edu >

Instructor office hours: TR 4-4:30pm (34-101)

Lectures: TR 2:30 - 4:00 pm (34-101)

Recitations:

Cross-listed course numbers: 1.127/IDS.140J

Course numbers for earlier versions of the course:
6.7950, 6.246

Staff email: 6-7920-staff at mit dot edu. Please include “[6.7920]” in your email subject line.
