Stochastic Optimal Control: The Discrete-Time Case

Dimitri P. Bertsekas and Steven E. Shreve


This book was originally published by Academic Press in 1978 and republished in paperback by Athena Scientific in 1996. It can be purchased from Athena Scientific or freely downloaded from this page.

The book is a comprehensive and theoretically sound treatment of the mathematical foundations of stochastic optimal control of discrete-time systems, including the intricate measure-theoretic issues.


For a summary of the mathematical framework of the book:

See D. P. Bertsekas and S. E. Shreve, "Mathematical Issues in Dynamic Programming," an unpublished expository paper that provides orientation on the central mathematical issues for a comprehensive and rigorous theory of dynamic programming and stochastic control, as developed in the authors' book "Stochastic Optimal Control: The Discrete-Time Case" (Academic Press, 1978; republished by Athena Scientific, 1996).

For an extended version of the summary, which also describes recent research by H. Yu and D. Bertsekas, see the Appendix of the book Dynamic Programming and Optimal Control, Vol. II, 4th Edition (by D. Bertsekas, Athena Scientific, 2012).


A quotation relating to the book from p. 14 of the author's monograph "Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control," 2022:

The rigorous mathematical theory of stochastic optimal control, including the development of an appropriate measure-theoretic framework, dates to the 60s and 70s. It culminated in the monograph [BeS78], which provides the now "standard" framework, based on the formalism of Borel spaces, lower semianalytic functions, and universally measurable policies. This development involves daunting mathematical complications, which stem, among others, from the fact that when a Borel measurable function F(x,u), of the two variables x and u, is minimized with respect to u, the resulting function

G(x) = minimum over u of F(x,u)

need not be Borel measurable (it is lower semianalytic). Moreover, even if the minimum is attained by several functions/policies m, i.e., G(x)=F(x,m(x)) for all x, it is possible that none of these policies is Borel measurable (however, there does exist a minimizing policy that belongs to the broader class of universally measurable policies). Thus, starting with a Borel measurability framework for cost functions and policies, we quickly get outside that framework when executing DP algorithms, such as value and policy iteration. The broader framework of universal measurability is required to correct this deficiency, in the absence of additional (fairly strong) assumptions.

The monograph [BeS78] provides an extensive treatment of these issues, while Appendix A of the DP textbook [Ber12] provides a tutorial introduction. The followup work by Huizhen Yu and the author [YuB15] resolves the special measurability issues that relate to policy iteration, and provides additional analysis relating to value iteration. In the RL literature, the mathematical difficulties around measurability are usually neglected (as they are in the present book), and this is fine because they do not play an important role in applications. Moreover, measurability issues do not arise for problems involving finite or countably infinite state and control spaces. We note, however, that there are quite a few published works in RL as well as exact DP, which purport to address measurability issues with a mathematical narrative that is either confusing or plain incorrect.
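As the quotation notes, the measurability difficulties disappear for finite (or countably infinite) state and control spaces, where DP algorithms such as value iteration apply directly. The following is a minimal illustrative sketch of value iteration under the discounted (contraction) setting of Chapter 4; the cost array F(x,u) and transition probabilities are random placeholder data, not taken from the book.

```python
import numpy as np

# Finite-state, finite-control discounted problem: no measurability issues arise.
# F[x, u]  : one-stage cost of applying control u at state x (illustrative data)
# P[u, x, y]: probability of moving from state x to state y under control u
n_states, n_controls = 4, 2
rng = np.random.default_rng(0)
F = rng.uniform(0.0, 1.0, size=(n_states, n_controls))
P = rng.uniform(size=(n_controls, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into probability distributions

alpha = 0.9  # discount factor; makes the DP operator a contraction

J = np.zeros(n_states)
for _ in range(2000):
    # Bellman operator: (TJ)(x) = min_u [ F(x,u) + alpha * sum_y P(y|x,u) J(y) ]
    Q = F + alpha * np.einsum('uxy,y->xu', P, J)
    J_new = Q.min(axis=1)
    if np.max(np.abs(J_new - J)) < 1e-10:
        J = J_new
        break
    J = J_new

# Here the pointwise minimum is attained by a measurable (indeed trivially
# well-defined) stationary policy, unlike in the general Borel-space setting.
policy = Q.argmin(axis=1)
print(J, policy)
```

In this finite setting the minimizing policy is simply an argmin per state; the subtleties discussed above concern what happens when this pointwise minimization is carried out over uncountable Borel spaces.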


Review:

"Bertsekas and Shreve have written a fine book. The exposition is extremely clear and a helpful introductory chapter provides orientation and a guide to the rather intimidating mass of literature on the subject. Apart from anything else, the book serves as an excellent introduction to the arcane world of analytic sets and other lesser known byways of measure theory."
Mark H. A. Davis, Imperial College, in IEEE Trans. on Automatic Control


Table of Contents: Stochastic Optimal Control: The Discrete-Time Case


  1. Introduction

    1. Structure of Sequential Decision Problems
    2. Discrete-Time Optimal Control Problems - Measurability Questions
    3. The Present Work Related to the Literature

  2. Monotone Mappings Underlying Dynamic Programming Models

    1. Notation and Assumptions
    2. Main Results
    3. Application to Specific Models
      1. Deterministic Optimal Control
      2. Stochastic Optimal Control - Countable Disturbance Space
      3. Stochastic Optimal Control - Outer Integral Formulation
      4. Stochastic Optimal Control - Multiplicative Cost Functional
      5. Minimax Control

  3. Finite Horizon Models

    1. General Results and Assumptions
    2. Main Results
    3. Application to Specific Models

  4. Infinite Horizon Models under a Contraction Assumption

    1. General Results and Assumptions
    2. Convergence and Existence Results
    3. Computational Methods
      1. Successive Approximation
      2. Policy Iteration
      3. Mathematical Programming
    4. Application to Specific Models

  5. Infinite Horizon Models under Monotonicity Assumptions

    1. General Results and Assumptions
    2. The Optimality Equation
    3. Characterization of Optimal Policies
    4. Convergence of the Dynamic Programming Algorithm - Existence of Stationary Policies
    5. Application to Specific Models

  6. A Generalized Abstract Dynamic Programming Model

    1. General Results and Assumptions
    2. Analysis of Finite Horizon Models
    3. Analysis of Infinite Horizon Models under a Contraction Assumption

  7. Borel Spaces and their Probability Measures

    1. Notation
    2. Metrizable Spaces
    3. Borel Spaces
    4. Probability Measures on Borel Spaces
      1. Characterization of Probability Measures
      2. The Weak Topology
      3. Stochastic Kernels
      4. Integration
    5. Semicontinuous Functions and Borel-Measurable Selection
    6. Analytic Sets
      1. Equivalent Definitions of Analytic Sets
      2. Measurability Properties of Analytic Sets
      3. An Analytic Set of Probability Measures
    7. Lower Semianalytic Functions and Universally Measurable Selection

  8. The Finite Horizon Borel Model

    1. The Model
    2. The Dynamic Programming Algorithm - Existence of Optimal and epsilon-Optimal Policies
    3. The Semicontinuous Models

  9. The Infinite Horizon Borel Models

    1. The Stochastic Model
    2. The Deterministic Model
    3. Relations Between the Models
    4. The Optimality Equation - Characterization of Optimal Policies
    5. Convergence of the Dynamic Programming Algorithm - Existence of Stationary Optimal Policies
    6. Existence of epsilon-Optimal Policies

  10. The Imperfect State Information Model

    1. Reduction of the Nonstationary Model - State Augmentation
    2. Reduction of the Imperfect State Information Model - Sufficient Statistics
    3. Existence of Sufficient Statistics for Control
      1. Filtering and the Conditional Distribution of the States
      2. The Identity Mappings

  11. Miscellaneous

    1. Limit-Measurable Policies
    2. Analytically Measurable Policies
    3. Models with Multiplicative Cost

  • Appendix A: The Outer Integral

  • Appendix B: Additional Measurability Properties of Borel Spaces

  • Appendix C: The Hausdorff Metric and the Exponential Topology

  • References

  • Index