The book is available from the publishing company Athena Scientific and from Amazon.com. It is also available as an ebook from Google Books.
This is a research monograph at the forefront of research on reinforcement learning, also referred to by other names such as approximate dynamic programming and neuro-dynamic programming. The purpose of the monograph is to develop in greater depth some of the methods from the author's recently published textbook on Reinforcement Learning (Athena Scientific, 2019). In particular, we present new research relating to systems involving multiple agents, partitioned architectures, and distributed asynchronous computation. We pay special attention to the contexts of dynamic programming/policy iteration and control theory/model predictive control. We also discuss in some detail the application of the methodology to challenging discrete/combinatorial optimization problems, such as routing, scheduling, assignment, and mixed integer programming, including the use of neural network approximations within these contexts.
The book focuses on the fundamental idea of policy iteration, i.e., start from some policy and successively generate one or more improved policies. If just one improved policy is generated, this is called rollout, which, based on broad and consistent computational experience, appears to be one of the most versatile and reliable of all reinforcement learning methods. Among other settings, it can be applied on-line using easily implementable simulation, and it can be used for discrete deterministic combinatorial optimization as well as for stochastic Markov decision problems. Moreover, rollout can make on-line use of the policy produced off-line by policy iteration or by any other method (including a policy gradient method), and improve on the performance of that policy.
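To make the rollout idea concrete, the following is a minimal Python sketch (not taken from the book) of one-step rollout for a simple deterministic problem. The problem interface (`step`, `controls`, `base_policy`, `horizon`) and the toy example are hypothetical and only illustrate the idea: at each state, every control is scored by simulating the base policy from the resulting next state, and the control with the best estimated cost-to-go is applied, which yields the improved (rollout) policy.

```python
def simulate_base_policy(state, base_policy, step, horizon):
    """Estimate the cost-to-go of the base policy by running it to the horizon."""
    total = 0.0
    for _ in range(horizon):
        u = base_policy(state)
        state, cost = step(state, u)
        total += cost
    return total


def rollout_control(state, controls, base_policy, step, horizon):
    """One-step lookahead: pick the control whose successor state has the
    lowest stage cost plus base-policy cost-to-go."""
    best_u, best_q = None, float("inf")
    for u in controls(state):
        next_state, stage_cost = step(state, u)
        q = stage_cost + simulate_base_policy(next_state, base_policy, step, horizon)
        if q < best_q:
            best_u, best_q = u, q
    return best_u


if __name__ == "__main__":
    # Hypothetical toy problem: drive an integer state toward 0.
    def step(x, u):              # next state and stage cost (distance from 0)
        return x + u, abs(x + u)

    def controls(x):             # larger step available only far from the goal
        return (-1, -2) if x > 1 else (-1,)

    base = lambda x: -1 if x > 0 else 0   # heuristic base policy: stop at 0

    # Rollout prefers the larger step -2, improving on the base policy's -1.
    print(rollout_control(5, controls, base, step, horizon=10))
```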
Much of the new research is inspired by the remarkable AlphaZero chess program, where policy iteration, value and policy networks, approximate lookahead minimization, and parallel computation all play an important role. In addition to the fundamental process of successive policy iteration/improvement, this program includes the use of deep neural networks for representation of both value functions and policies, the extensive use of large-scale parallelization, and the simplification of lookahead minimization through methods involving Monte Carlo tree search and pruning of the lookahead tree. In this book, we also focus on policy iteration, value and policy neural network representations, parallel and distributed computation, and lookahead simplification. Thus, while there are significant differences, the principal design ideas that form the core of this monograph are shared by the AlphaZero architecture, except that we develop these ideas in a broader and less application-specific framework.
Among its special features, the book:
The book was developed during the 2020 offering of the Topics in Reinforcement Learning course at Arizona State University (abbreviated due to the coronavirus health crisis).
Lecture slides from the 2020 course, and two related video lectures:
Slides-Lecture 1, Slides-Lecture 2, Slides-Lecture 3, Slides-Lecture 4, Slides-Lecture 5, Slides-Lecture 6, Slides-Lecture 8.
Video of an Overview Lecture on Distributed RL from IPAM workshop at UCLA, Feb. 2020 (Slides).
Video of an Overview Lecture on Multiagent RL from a lecture at ASU, Oct. 2020 (Slides).
The following papers and reports have a strong connection to material in the book, and amplify on its analysis and its range of applications.
Reinforcement Learning and Optimal Control
Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control