The book is a comprehensive and theoretically sound treatment of the mathematical foundations of stochastic optimal control of discrete-time systems, including a treatment of the intricate measure-theoretic issues.
"Bertsekas and Shreve have written a fine book. The exposition is extremely clear, and a helpful introductory chapter provides orientation and a guide to the rather intimidating mass of literature on the subject. Apart from anything else, the book serves as an excellent introduction to the arcane world of analytic sets and other lesser known byways of measure theory."
Mark H. A. Davis, Imperial College, in IEEE Transactions on Automatic Control
A quotation relating to the book, from p. 14 of the author's monograph "Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control," published 45 years later:
The rigorous mathematical theory of stochastic optimal control, including the development of an appropriate measure-theoretic framework, dates to the 1960s and 70s. It culminated in the monograph [BeS78], which provides the now "standard" framework, based on the formalism of Borel spaces, lower semianalytic functions, and universally measurable policies. This development involves daunting mathematical complications, which stem, among other things, from the fact that when a Borel measurable function F(x,u) of the two variables x and u is minimized with respect to u, the resulting function
G(x) = minimum over u of F(x,u)
need not be Borel measurable (it is lower semianalytic). Moreover, even if the minimum is attained by one or more policies m, i.e., G(x) = F(x, m(x)) for all x, it is possible that none of these policies is Borel measurable (however, there does exist a minimizing policy that belongs to the broader class of universally measurable policies). Thus, starting with a Borel measurability framework for cost functions and policies, we quickly get outside that framework when executing DP algorithms, such as value and policy iteration. The broader framework of universal measurability is required to correct this deficiency, in the absence of additional (fairly strong) assumptions.
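The selection facts just quoted can be restated compactly in symbols; the display below only paraphrases the passage above and introduces no new claims:

```latex
% F : X \times U \to \mathbb{R} Borel measurable, with X and U Borel spaces.
\[
  G(x) \;=\; \inf_{u \in U} F(x,u)
  \qquad \text{is lower semianalytic, but need not be Borel measurable.}
\]
% Nevertheless, for every \epsilon > 0 there exists a universally
% measurable policy m with
\[
  F\bigl(x, m(x)\bigr) \;\le\; G(x) + \epsilon \quad \text{for all } x,
\]
% and when the infimum is attained for every x, the policy m can be
% chosen universally measurable with F(x, m(x)) = G(x) for all x.
```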
The monograph [BeS78] provides an extensive treatment of these issues, while Appendix A of the DP textbook [Ber12] provides a tutorial introduction. The followup work by Huizhen Yu and the author [YuB15] resolves the special measurability issues that relate to policy iteration, and provides additional analysis relating to value iteration. In the RL literature, the mathematical difficulties around measurability are usually neglected (as they are in the present book), and this is fine because they do not play an important role in applications. Moreover, measurability issues do not arise for problems involving finite or countably infinite state and control spaces. We note, however, that there are quite a few published works in RL as well as exact DP, which purport to address measurability issues with a mathematical narrative that is either confusing or plain incorrect.
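As the paragraph notes, these measurability subtleties do not arise for finite state and control spaces: the minimization over u is a finite min, and value iteration can be run without any selection-theoretic machinery. A minimal sketch of discounted value iteration for a finite MDP (the transition matrices and costs below are invented purely for illustration):

```python
import numpy as np

def value_iteration(P, g, alpha=0.9, tol=1e-8):
    """Value iteration for a finite-state, finite-control discounted MDP.

    P[u] is an (n, n) transition matrix under control u,
    g[u] is an (n,) one-stage cost vector under control u,
    alpha is the discount factor.
    Returns the (approximately) optimal cost vector J and a greedy policy.
    """
    n = P[0].shape[0]
    J = np.zeros(n)
    while True:
        # Bellman operator: (TJ)(x) = min_u [ g(x,u) + alpha * sum_y P(y|x,u) J(y) ]
        Q = np.array([g[u] + alpha * P[u] @ J for u in range(len(P))])
        J_new = Q.min(axis=0)       # a finite min -- no measurability issues
        if np.max(np.abs(J_new - J)) < tol:
            policy = Q.argmin(axis=0)   # an exact minimizing policy always exists
            return J_new, policy
        J = J_new

# Two states, two controls (hypothetical numbers for illustration).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # transitions under control 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]   # transitions under control 1
g = [np.array([1.0, 2.0]),                 # costs under control 0
     np.array([0.5, 3.0])]                 # costs under control 1

J_opt, mu = value_iteration(P, g)
```

Because T is an alpha-contraction in the sup-norm, the iteration converges geometrically, and the returned J_opt satisfies Bellman's equation to within the tolerance.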
Stochastic Optimal Control: The Discrete-Time Case
Monotone Mappings Underlying Dynamic Programming Models
Infinite Horizon Models under a Contraction Assumption
Infinite Horizon Models under Monotonicity Assumptions
A Generalized Abstract Dynamic Programming Model
Borel Spaces and their Probability Measures
The Finite Horizon Borel Model
The Infinite Horizon Borel Models
The Imperfect State Information Model
Appendix A: The Outer Integral
Appendix B: Additional Measurability Properties of Borel Spaces
Appendix C: The Hausdorff Metric and the Exponential Topology
References
Index