6.246 Spring 2021

Reinforcement Learning: Foundations and Methods


Instructor: Cathy Wu < cathywu at mit dot edu >

TAs: Tiancheng Yu, Sirui Li

Staff email: 6-246-staff at mit dot edu. Please include “[6.246]” in your email subject line.

Lectures: TR 1:00 - 2:30 pm (Zoom)


Recitations:

F 9:00 - 10:00 am (Zoom)
F 3:00 - 4:00 pm (Zoom)

Office hours:

Prof. Wu: TR 2:30-3pm (Zoom)
Sirui: M 3-4pm (Zoom)
Tiancheng: W 4-5pm (Zoom)

Course description

This subject counts as a Control concentration subject. Reinforcement learning (RL) as a methodology for approximately solving sequential decision-making problems under uncertainty, with foundations in optimal control and machine learning. Finite-horizon and infinite-horizon dynamic programming, focusing on discounted Markov decision processes. Value and policy iteration. Monte Carlo, temporal differences, Q-learning, and stochastic approximation. Approximate dynamic programming, including value-based methods and policy-space methods. Special topics at the boundary of theory and practice in RL. Applications and examples drawn from diverse domains. While an analysis prerequisite is not required, mathematical maturity is necessary. Enrollment limited.
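To give a flavor of the course material, here is a minimal sketch of value iteration, one of the core dynamic programming methods listed above, on a made-up two-state, two-action discounted MDP (all transition probabilities and rewards below are invented for illustration and are not from the course):

```python
import numpy as np

# Illustrative 2-state, 2-action discounted MDP (numbers are made up).
# P[a][s, s'] = probability of moving from s to s' under action a.
# R[a][s] = expected one-step reward for taking action a in state s.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),
     np.array([[0.5, 0.5],
               [0.4, 0.6]])]
R = [np.array([1.0, 0.0]),
     np.array([0.5, 2.0])]
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Repeatedly apply the Bellman optimality operator until convergence."""
    n = P[0].shape[0]
    V = np.zeros(n)
    while True:
        # Q[a, s] = R[a][s] + gamma * sum_{s'} P[a][s, s'] * V[s']
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)           # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values, greedy policy
        V = V_new

V_star, pi_star = value_iteration(P, R, gamma)
print("V* =", V_star, " greedy policy =", pi_star)
```

Because the Bellman operator is a gamma-contraction, the loop converges geometrically; the returned V_star approximately satisfies the Bellman optimality equation, and pi_star is a greedy (hence optimal) policy with respect to it.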

Course format and scope

This course will be half mathematical foundations of RL and half exploration of the interface between those foundations and special topics. This experimental course is meant to be an advanced graduate course that explores alternative ways of, and perspectives on, studying reinforcement learning.

Big picture

Expectations and prerequisites

There is a large class participation component, including student lecture presentations and asynchronous post-lecture discussion on Piazza. In terms of prerequisites, students should be comfortable at the level of receiving an A grade in probability (6.041 or equivalent), machine learning (6.867 or equivalent), convex optimization (from 6.255 / 6.036 / 6.867 or equivalent), linear algebra (18.06 or equivalent), and programming (Python). Mathematical maturity is required. This is not a Deep RL course. This class is most suitable for PhD students who have already been exposed to the basics of reinforcement learning and deep learning (as in 6.036 / 6.867 / 1.041 / 1.200), and are conducting or have conducted research in these topics.

Textbooks and readings

  1. Required: Dynamic Programming and Optimal Control (2007), Vol. I, 4th Edition, ISBN-13: 978-1-886529-43-4 by Dimitri P. Bertsekas. [DPOC]
  2. The second volume of the text is a useful and comprehensive reference. It is recommended, but not required.
  3. Required: Neuro-Dynamic Programming (1996) by Dimitri P. Bertsekas and John N. Tsitsiklis. [NDP]

Course pointers

  1. Website: https://web.mit.edu/6.246/www/ (for class materials & info)
  2. Canvas: https://canvas.mit.edu/courses/7560 (for Zoom links/recordings)
  3. Piazza: http://www.piazza.com/mit/spring2021/6246 (for class announcements, solutions)
    Piazza is also a resource for you to collaborate with one another. For obvious reasons, don't post answers on Piazza. The staff will monitor but have minimal involvement; please come to office hours with questions.
  4. Gradescope: https://www.gradescope.com/courses/246411 (for HW/quiz submissions)
  5. Email: You can reach the staff generally via office hours or via 6-246-staff at mit dot edu. Please include “[6.246]” in your email subject line.


Grades will be determined according to the following weights:


Assignments will be released ~1.5 weeks ahead of the due date (typically covering 2-3 lectures). Submissions are through Gradescope, and due dates will be provided. Please register for the course on Gradescope with your MIT email. Late homework will be penalized 10% every 24 hours. The solutions for homework will be released shortly after the deadline (those submitting late must abide by the honor code).

If you are interested in finding pset partners, check out https://psetpartners.mit.edu. Sign up early; matching will be done at the end of the first week of classes.


There will be one in-class quiz: Tuesday 04/13 (tentatively).

Final class project

A class project will be required. Projects can be done either individually or in 2-person teams. Of course, the expectations for 2-person projects will be higher. We have a strong preference for projects that apply (approximate) dynamic programming to a concrete setting: for example, formulate a problem motivated by some application that interests you, and study it, analytically or computationally. More detailed instructions, together with pointers to the literature and possible topics, will be provided in due time.

A one-page project proposal is due on Tuesday 04/06. There will be project presentations during the last weeks of classes (05/18, 05/20). Depending on the number of groups presenting, we may opt to do a (virtual) poster session instead. Presentation slides are due before class the day of your presentation. Project reports will be due 5 pm on Thursday 5/20.

Student lecture presentations & panels

In the second half of the course, each student will give a short lecture presentation (~10 minutes) which extends the core material or connects the core material to more recent advances. Students will have an opportunity to inspect modern topics from a foundational lens and/or explore the boundary of theory and practice in reinforcement learning. The presentations will be grouped by topic and the speakers will engage in a “panel” discussion with the rest of the class following the presentations.

For this, students should be prepared to read and synthesize theoretical and/or empirical research papers and materials into informative lectures (and recitations, as needed). To aid in preparation, there will be a TA mentoring process for each lecture, and multiple checkpoints. Details will be provided in due time. Specific topics will be selected through a combination of staff and student interest, and may include deep RL, state abstractions, hierarchy, exploration, off-policy learning, transfer learning, finite sample analysis, combinatorial optimization, model-based RL, multi-agent RL, and game theory.

Class participation

Class participation includes:
  1. Asynchronous discussion (on Piazza) following student lecture presentations.
  2. Live participation during lectures.
  3. Answering questions for fellow students on Piazza.
  4. Attending office hours and recitation.

Lecture scribing (extra credit)

You may sign up to scribe lectures in LaTeX. The TAs will send out a spreadsheet allowing students to sign up to scribe for lectures. Up to three people can sign up to scribe each lecture as a team, on a first-come, first-served basis. We expect teams to coordinate on which parts of the lecture to scribe to avoid overlap. We ask each team to provide a list of contributions by each team member, as well as the scribe for each section.

Each lecture is worth up to 5% extra credit, depending on scribe quality. We credit each student with the portions of the lecture that they scribe. For example, if three people contribute equally to a well-scribed lecture, they each receive about 1.67% extra credit. Each student may receive extra credit from multiple lectures, but we cap the total extra credit at 5%. Students who have not yet scribed will receive priority over those who have already scribed. In the event of a dispute, please contact the staff.

Statement on collaboration and academic honesty

We encourage you to collaborate on homework. Study groups can be an excellent means to master course material. However, you must write up solutions on your own, neither copying solutions nor providing solutions to be copied. Duplicating a solution that someone else has written (verbatim or edited), or providing solutions for a fellow student to copy, is not acceptable.

If you do collaborate on homework, you must cite your collaborators in your written solution. Also, if you use sources beyond the course materials in one of your solutions, e.g., a “friendly expert,” another text, or a “bible,” be sure to cite the source. There is no penalty for such collaboration or use of other sources, as long as it is disclosed.

In general, we expect students to adhere to basic, common sense concepts of academic honesty. Presenting somebody else’s work as if it were your own, or cheating in exams, is of course unacceptable.