6.7950

Note: This subject is approved for TQE substitution for 6.231/6.7940 in EECS. Pending confirmation, this course is permitted for the Systems Engineering Core in CEE.

Course overview and scope

With a fast-moving field like reinforcement learning (RL), what is an appropriate foundational course to advance research and practice in sequential decision making? This course will be 2/3 exploitation, meaning topics in RL that are well established and understood, and 1/3 exploration, meaning selected up-and-coming topics.

Exploitation will consist of a mathematical introduction to RL. Topics include: dynamic programming, special structures, finite and infinite horizon Markov Decision Processes, value and policy iteration, Monte Carlo methods, temporal differences, Q-learning, stochastic approximation, bandits, and finite sample analysis. We will also cover approximate dynamic programming, including value-based methods and policy space methods. While the focus is mathematical, we will supplement with computational exercises.

Exploration may change from offering to offering. In addition to empirical rigor in RL, a variety of applications, and recent theoretical results, a mini-theme we are exploring this year is how RL can best cope with problem scale and diversity (think: many agents, generalization, resource allocation problems).

This class is most suitable for graduate or advanced undergraduate students who are interested in advancing the research and practice of reinforcement learning---be it theory, methods, or applications. Please note that this is not primarily a deep RL course, though we will have coverage of some deep RL topics.

Big picture

Dynamic programming (5 lectures)
Approximate dynamic programming (9 lectures)
Special topics (~7 lectures)
Guest lectures (~3 lectures)
Final project presentations

The schedule is subject to minor changes.

Prerequisites

Students should be comfortable with probability at the level of an A grade in 6.041 (or equivalent) and should be familiar with programming (Python). While there is no analysis prerequisite, mathematical maturity is required.

Textbooks and readings

Useful references (recommended but not required)

Dynamic Programming and Optimal Control (2017), Vol. I, 4th Edition, ISBN-13: 978-1-886529-43-4 by Dimitri P. Bertsekas. [DPOC]
The second volume of the text is a useful and comprehensive reference. [DPOC2]
Neuro-Dynamic Programming (1996) by Dimitri P. Bertsekas and John N. Tsitsiklis. [NDP]

Readings: We will give pointers to these references. Some additional readings / notes may be posted.

A note on notation: We will be using contemporary notation (e.g. s, a, V), which differs from the notation in these texts (e.g. x, u, J). We will be maximizing instead of minimizing, etc.
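As a rough illustration of the notation gap, here is the discounted Bellman optimality equation written both ways. This is a sketch rather than the course's official convention: the syllabus names only s, a, V and x, u, J, so the remaining symbols are assumptions. The first line follows the cost-minimization style of the Bertsekas texts (stage cost g, disturbance w, dynamics f, discount factor \alpha); the second follows the reward-maximization style common in the RL literature (reward r, transition probabilities P, discount factor \gamma).

```latex
% Control-theoretic style (as in the reference texts): minimize expected cost
J^*(x) = \min_{u \in U(x)} \mathbb{E}_{w}\left[ g(x, u, w) + \alpha \, J^*\big(f(x, u, w)\big) \right]

% Contemporary RL style (as used in this course): maximize expected reward
V^*(s) = \max_{a \in A(s)} \left[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \right]
```

Reading across the two lines: state x corresponds to s, control u to action a, and cost-to-go J to value V, with costs replaced by rewards and the min replaced by a max.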

Course pointers

Website (you are here 🙂): https://web.mit.edu/6.7950/www/ (for lecture materials & general info)
Piazza: https://piazza.com/mit/fall2022/67950 (for class announcements, HW, solutions, readings). Piazza is a great resource for you to collaborate with one another. We have a small staff who cannot address every question. Please come to office hours with questions. For obvious reasons, don't post homework answers on Piazza before the due date.
Gradescope: https://www.gradescope.com/courses/439106/ (for HW/quiz submissions)
Email: You can reach the staff generally via office hours or via 6-7950-staff at mit dot edu. Please include “[6.7950]” in your email subject line.

Grading

Grades will be determined according to the following weights:

7 homework assignments (30%)
1 in-class quiz (25%)
Class project + final project presentation (35%)
Class participation (10%)
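For a quick sense of how these weights combine, the arithmetic can be sketched in Python. The weights come from the list above; the 0-100 score scale, the function name `weighted_grade`, and the example scores are illustrative assumptions, not course policy.

```python
# Weights copied from the syllabus, in percent.
# Assumption: each component is scored on a 0-100 scale (the syllabus does not say).
WEIGHTS = {
    "homework": 30,
    "quiz": 25,
    "project": 35,
    "participation": 10,
}


def weighted_grade(scores):
    """Return the weighted course score for per-component scores on a 0-100 scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must have exactly the keys: " + ", ".join(WEIGHTS))
    # Integer percent weights keep the arithmetic easy to check by hand.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {"homework": 90, "quiz": 80, "project": 85, "participation": 100}
print(weighted_grade(example))
```

With the hypothetical scores shown, the components contribute 27, 20, 29.75, and 10 points respectively, for a total of 86.75.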

Homework

Submissions are through Gradescope and due dates will be provided. Please register for the course on Gradescope with your MIT email.

We allow 4 late days in total across all homework assignments. Homework solutions will be released shortly after each deadline. Late submitters must abide by the honor code.

If you are interested in finding pset partners, check out https://psetpartners.mit.edu. Sign up early; matching will be done at the end of the first week of classes.

Quiz

There will be one in-class quiz: Tuesday 11/08 (tentatively).

Final class project

A class project will be required. Projects can be done individually or in 2-person teams. Of course, the expectations for 2-person projects will be higher. We have a strong preference for projects that apply RL to a concrete setting: for example, formulate a problem motivated by an application that interests you, and study it analytically or computationally. This year, we are piloting sample project topics, some with volunteer project mentors.

A one-page project proposal is due TBD. There will be project presentations during the last week of classes (12/13). Depending on the number of groups presenting, this may take place during an extended class session. Presentations are due before class on the day of the presentation. Project reports are due at 5 pm on Wednesday 12/14. This is a hard deadline; we cannot accommodate extensions.

Class participation

Class participation includes:

Live participation during lectures.
Answering questions for fellow students on Piazza.
Attending office hours and recitation.

Statement on collaboration and academic honesty

If you do collaborate on homework, you must cite your collaborators in your written solution. Also, if you use sources beyond the course materials in one of your solutions, e.g., a “friendly expert,” another text, or a “bible,” be sure to cite the source. There is no penalty for such collaboration or use of other sources, as long as it is disclosed.

We encourage you to collaborate on homework. Study groups can be an excellent means to master course material. However, you must write up solutions on your own, neither copying solutions nor providing solutions to be copied. Duplicating a solution that someone else has written (verbatim or edited), or providing solutions for a fellow student to copy, is not acceptable.

In general, we expect students to adhere to basic, common sense concepts of academic honesty. Presenting somebody else’s work as if it were your own, or cheating in exams, is of course unacceptable.

Instructor: Cathy Wu < cathywu at mit dot edu >

TA: Guilherme Cavalheiro < guivenca at mit dot edu >

Staff email: <6-7950-staff at mit dot edu>. Please include “[6.7950]” in your email subject line.

Lectures: TR 2:30 - 4:00 pm (4-237)

Recitations:

Office hours: