Just Published by Athena Scientific: August 2020


The book is now available from the publishing company Athena Scientific, and from

This is a research monograph at the forefront of research on reinforcement learning, also referred to by other names such as approximate dynamic programming and neuro-dynamic programming. The purpose of the monograph is to develop in greater depth some of the methods from the author's recently published textbook on Reinforcement Learning (Athena Scientific, 2019). In particular, we present new research, relating to systems involving multiple agents, partitioned architectures, and distributed asynchronous computation. We pay special attention to the contexts of dynamic programming/policy iteration and control theory/model predictive control. We also discuss in some detail the application of the methodology to challenging discrete/combinatorial optimization problems, such as routing, scheduling, assignment, and mixed integer programming, including the use of neural network approximations within these contexts.

The book focuses on the fundamental idea of policy iteration, i.e., start from some policy, and successively generate one or more improved policies. If just one improved policy is generated, this is called rollout, which, based on broad and consistent computational experience, appears to be one of the most versatile and reliable of all reinforcement learning methods. Among other things, it can be applied on-line using easily implementable simulation, and it can be used for discrete deterministic combinatorial optimization, as well as for stochastic Markov decision problems. Moreover, rollout can make on-line use of the policy produced off-line by policy iteration or by any other method (including a policy gradient method), and improve on the performance of that policy.
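To make the rollout idea concrete, here is a minimal sketch for a deterministic combinatorial problem, a small traveling salesman instance. It is an illustration under stated assumptions, not code from the book: the city coordinates, the nearest-neighbor base policy, and the names `tour_cost`, `base_policy_complete`, and `rollout_tour` are all hypothetical. At each step, rollout tries every possible next city, completes the tour with the base policy, and commits to the city whose completed tour is cheapest.

```python
import math

def tour_cost(tour, dist):
    """Cost of a closed tour that returns to its starting city."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def base_policy_complete(partial, dist):
    """Base policy (assumed here): nearest-neighbor completion of a partial tour."""
    tour = list(partial)
    unvisited = set(range(len(dist))) - set(tour)
    while unvisited:
        last = tour[-1]
        # Break ties by city index so the heuristic is deterministic.
        nxt = min(unvisited, key=lambda c: (dist[last][c], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def rollout_tour(dist, start=0):
    """One policy improvement step: choose each next city by simulating the base policy."""
    n = len(dist)
    tour = [start]
    while len(tour) < n:
        candidates = [c for c in range(n) if c not in tour]
        # Score each candidate by the cost of the tour the base policy builds after it.
        best = min(
            candidates,
            key=lambda c: (tour_cost(base_policy_complete(tour + [c], dist), dist), c),
        )
        tour.append(best)
    return tour

# Hypothetical city coordinates, used only for illustration.
points = [(0, 0), (1, 5), (5, 2), (6, 6), (2, 1), (7, 0), (3, 4)]
dist = [[math.dist(p, q) for q in points] for p in points]

base_tour = base_policy_complete([0], dist)
improved_tour = rollout_tour(dist, start=0)
print("base policy cost:", round(tour_cost(base_tour, dist), 3))
print("rollout cost:    ", round(tour_cost(improved_tour, dist), 3))
```

Because this base policy always continues a partial tour the same way it would have from the start, the rollout tour in this sketch costs no more than the base tour. Repeating the step with the improved policy as the new base policy would give further iterations of policy iteration.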

Much of the new research is inspired by the remarkable AlphaZero chess program, where policy iteration, value and policy networks, approximate lookahead minimization, and parallel computation all play an important role. In addition to the fundamental process of successive policy iteration/improvement, this program includes the use of deep neural networks for representation of both value functions and policies, the extensive use of large-scale parallelization, and the simplification of lookahead minimization, through methods involving Monte Carlo tree search and pruning of the lookahead tree. In this book, we also focus on policy iteration, value and policy neural network representations, parallel and distributed computation, and lookahead simplification. Thus, while there are significant differences, the principal design ideas that form the core of this monograph are shared by the AlphaZero architecture, except that we develop these ideas in a broader and less application-specific framework.

Among its special features, the book:

  • Presents new research relating to distributed asynchronous computation, partitioned architectures, and multiagent systems, with application to challenging large-scale optimization problems, such as partially observed Markov decision problems.
  • Describes variants of rollout and policy iteration for problems with a multiagent structure, which allow a dramatic reduction of the computational requirements for lookahead minimization.
  • Establishes a connection of rollout with model predictive control, one of the most prominent control system design methodologies.
  • Describes the application of constrained and multiagent forms of rollout to challenging discrete and combinatorial optimization problems.
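The computational reduction mentioned above for multiagent problems can be illustrated with a small sketch. It rests on an assumption not spelled out on this page: that the lookahead minimization is done one agent at a time, with the remaining agents held at their base-policy choices, instead of jointly over all agents. With m agents and n choices each, joint minimization evaluates n^m combinations, while the agent-by-agent scheme evaluates n·m. The function `q_factor` is a hypothetical stand-in for the lookahead cost of a joint decision, and all names and numbers are illustrative.

```python
from itertools import product

def joint_minimization(q, choices, m):
    """Minimize q over all joint decisions: n**m evaluations."""
    evals = 0
    best_u, best_q = None, float("inf")
    for u in product(choices, repeat=m):
        evals += 1
        val = q(u)
        if val < best_q:
            best_u, best_q = u, val
    return best_u, best_q, evals

def agent_by_agent(q, choices, base_u):
    """Minimize q one agent at a time: n*m evaluations.

    Agents already processed keep their new choices; agents not yet
    processed stay at the base-policy choices in base_u.
    """
    u = list(base_u)
    evals = 0
    for i in range(len(u)):
        best_c, best_q = u[i], float("inf")
        for c in choices:
            evals += 1
            val = q(tuple(u[:i] + [c] + u[i + 1:]))
            if val < best_q:
                best_c, best_q = c, val
        u[i] = best_c
    return tuple(u), q(tuple(u)), evals

# Hypothetical coupled cost: each agent wants to be near its target,
# and every pair of agents making the same choice pays a penalty.
targets = (1, 1, 2, 3)

def q_factor(u):
    distance = sum((ui - ti) ** 2 for ui, ti in zip(u, targets))
    collisions = sum(
        1 for i in range(len(u)) for j in range(i + 1, len(u)) if u[i] == u[j]
    )
    return distance + 2 * collisions

choices = range(4)
base_u = (0, 0, 0, 0)  # decisions of a hypothetical base policy

u_joint, q_joint, evals_joint = joint_minimization(q_factor, choices, len(base_u))
u_seq, q_seq, evals_seq = agent_by_agent(q_factor, choices, base_u)
print("joint:         ", u_joint, q_joint, "evaluations:", evals_joint)
print("agent-by-agent:", u_seq, q_seq, "evaluations:", evals_seq)
```

In this sketch the agent-by-agent result is never worse than the base decision, since each agent's minimization includes its current choice, but it need not match the joint minimum. That is the trade-off accepted for the reduction in evaluations.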

    Click here for preface and table of contents.


    ASU, 2020

    Dimitri P. Bertsekas


    Lecture slides from a course (2020) on Topics in Reinforcement Learning at Arizona State University (abbreviated due to the coronavirus health crisis):

    Slides-Lecture 1, Slides-Lecture 2, Slides-Lecture 3, Slides-Lecture 4, Slides-Lecture 5, Slides-Lecture 6, Slides-Lecture 8.

    Video of an Overview Lecture on Distributed RL from IPAM workshop at UCLA, Feb. 2020 (Slides).

    Video of an Overview Lecture on Multiagent RL from a lecture at ASU, Oct. 2020 (Slides).


    The following papers and reports have a strong connection to material in the book, and amplify on its analysis and its range of applications.

  • Bertsekas, D., "Multiagent Reinforcement Learning: Rollout and Policy Iteration," ASU Report, Sept. 2020; to be published in IEEE/CAA Journal of Automatica Sinica.

  • Bertsekas, D., "Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning," arXiv preprint, arXiv:2005.01627, April 2020.

  • Bertsekas, D., "Multiagent Rollout Algorithms and Reinforcement Learning," arXiv preprint, arXiv:1910.00120, September 2019 (revised April 2020).

  • Bertsekas, D., "Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm," arXiv preprint, arXiv:2002.07407, February 2020.

  • Bhattacharya, S., Badyal, S., Wheeler, W., Gil, S., Bertsekas, D., "Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems," IEEE Robotics and Automation Letters, Vol. 5, pp. 3967-3974, 2020.

  • Bertsekas, D., and Yu, H., "Distributed Asynchronous Policy Iteration in Dynamic Programming," Proc. of 2010 Allerton Conference on Communication, Control, and Computing, Allerton Park, IL, Sept. 2010. (Related Lecture Slides) (An extended version with additional algorithmic analysis) (A counterexample by Williams and Baird that motivates in part this paper).


    Reinforcement Learning and Optimal Control

