The fourth edition (February 2017) contains a substantial amount of new material, particularly on approximate DP in Chapter 6. This chapter was thoroughly reorganized and rewritten to bring it in line both with the contents of Vol. II, whose latest edition appeared in 2012, and with recent developments that have propelled approximate DP to the forefront of attention.

Some of the highlights of the revision of Chapter 6 are an increased emphasis on one-step and multistep lookahead methods, parametric approximation architectures, neural networks, rollout, and Monte Carlo tree search. Among other applications, these methods have been instrumental in the recent spectacular success of computer Go programs. The material on approximate DP also provides an introduction and some perspective for the more analytically oriented treatment of Vol. II.
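As a rough illustration of the one-step lookahead and rollout ideas emphasized in the revised Chapter 6, here is a minimal sketch; the toy shortest-path-style model and all function names are hypothetical, not taken from the book:

```python
# Toy shortest-path-style problem: states 0..4, state 4 is terminal
# (cost-free). From state s, control a in {1, 2} moves to min(s + a, 4)
# at stage cost a*a. All of this model is a made-up illustration.
ACTIONS = [1, 2]
TERMINAL = 4

def step(s, a):
    """Return (next_state, stage_cost) for the hypothetical model."""
    return min(s + a, TERMINAL), a * a

def base_policy(s):
    """Heuristic base policy: always take the small step."""
    return 1

def rollout_cost(s, horizon=20):
    """Cost of following the base policy from s (deterministic here;
    with stochastic transitions this would be a Monte Carlo average)."""
    total = 0
    for _ in range(horizon):
        if s == TERMINAL:
            break
        a = base_policy(s)
        s, c = step(s, a)
        total += c
    return total

def one_step_lookahead(s):
    """Rollout policy: minimize stage cost plus the base policy's
    cost-to-go from the resulting next state."""
    return min(ACTIONS,
               key=lambda a: step(s, a)[1] + rollout_cost(step(s, a)[0]))
```

The rollout policy is guaranteed to perform no worse than the base policy it simulates (the cost improvement property); multistep lookahead and Monte Carlo tree search refine the same template.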

Click here for direct ordering from the publisher, and for the preface, table of contents, supplementary educational material, lecture slides, videos, etc.

Dynamic Programming and Optimal Control, Vol. I, ISBN-13: 978-1-886529-43-4, 576 pp., hardcover, 2017

The fourth edition of Vol. II of the two-volume DP textbook was published in June 2012. This is a major revision of Vol. II and contains a substantial amount of new material, as well as a reorganization of old material. The length has increased by more than 60% from the third edition, and most of the old material has been restructured and/or revised. Volume II now numbers more than 700 pages and is larger in size than Vol. I. It can arguably be viewed as a new book!

Approximate DP has become the central focal point of this volume, and occupies more than half of the book (the last two chapters, and large parts of Chapters 1-3). Thus one may also view this new edition as a follow-up to the author's 1996 book "Neuro-Dynamic Programming" (coauthored with John Tsitsiklis). A lot of new material, the outgrowth of research conducted in the six years since the previous edition, has been included.

A new printing of the fourth edition (January 2018) contains some updated material, particularly on undiscounted problems in Chapter 4 and approximate DP in Chapter 6. References were also added to the contents of the 2017 edition of Vol. I, and to high-profile developments in deep reinforcement learning, which have brought approximate DP to the forefront of attention.

Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming, ISBN-13: 978-1-886529-44-1, 712 pp., hardcover, 2012

Click here for an updated version of Chapter 4, which incorporates recent research on a variety of undiscounted problem topics.

Click here for preface and detailed information.

Click here to order at Amazon.com

Lectures on Exact and Approximate Finite Horizon DP: Videos from a 4-lecture, 4-hour short course at the University of Cyprus on finite horizon DP, Nicosia, 2017. Videos from YouTube. (Lecture Slides: Lecture 1, Lecture 2, Lecture 3, Lecture 4.)

Videos from a 6-lecture, 12-hour short course at Tsinghua Univ., Beijing, China, 2014. From the Tsinghua course site, and from YouTube. Click here to download the Approximate Dynamic Programming lecture slides for this 12-hour video course.

Click here to download lecture slides for a 7-lecture short course on Approximate Dynamic Programming, Cadarache, France, 2012.

Click here to download lecture slides for the MIT course "Dynamic Programming and Stochastic Control" (6.231), Dec. 2015. The last six lectures cover a lot of the approximate dynamic programming material.

Click here to download research papers and other material on Dynamic Programming and Approximate Dynamic Programming.

**Contents:**

**Discounted Problems - Theory**
- Minimization of Total Cost - Introduction
  - The Finite-Horizon DP Algorithm
  - Shorthand Notation and Monotonicity
  - A Preview of Infinite Horizon Results
  - Randomized and History-Dependent Policies
- Discounted Problems - Bounded Cost per Stage
- Scheduling and Multiarmed Bandit Problems
- Discounted Continuous-Time Problems
- The Role of Contraction Mappings
  - Sup-Norm Contractions
  - Discounted Problems - Unbounded Cost per Stage
- General Forms of Discounted Dynamic Programming
  - Basic Results Under Contraction and Monotonicity
  - Discounted Dynamic Games
- Notes, Sources, and Exercises

**Discounted Problems - Computational Methods**
- Markovian Decision Problems
- Value Iteration
  - Monotonic Error Bounds for Value Iteration
  - Variants of Value Iteration
  - Q-Learning
- Policy Iteration
  - Policy Iteration for Costs
  - Policy Iteration for Q-Factors
  - Optimistic Policy Iteration
  - Limited Lookahead Policies and Rollout
- Linear Programming Methods
- Methods for General Discounted Problems
  - Limited Lookahead Policies and Approximations
  - Generalized Value Iteration
  - Approximate Value Iteration
  - Generalized Policy Iteration
  - Generalized Optimistic Policy Iteration
  - Approximate Policy Iteration
  - Mathematical Programming
- Asynchronous Algorithms
  - Asynchronous Value Iteration
  - Asynchronous Policy Iteration
  - Policy Iteration with a Uniform Fixed Point
- Notes, Sources, and Exercises

**Stochastic Shortest Path Problems**
- Problem Formulation
- Main Results
- Underlying Contraction Properties
- Value Iteration
  - Conditions for Finite Termination
  - Asynchronous Value Iteration
- Policy Iteration
  - Optimistic Policy Iteration
  - Approximate Policy Iteration
  - Policy Iteration with Improper Policies
  - Policy Iteration with a Uniform Fixed Point
- Countable State Spaces
- Notes, Sources, and Exercises

**Undiscounted Problems**
- Unbounded Costs per Stage
  - Main Results
  - Value Iteration
  - Other Computational Methods
- Linear Systems and Quadratic Cost
- Inventory Control
- Optimal Stopping
- Optimal Gambling Strategies
- Nonstationary and Periodic Problems
- Notes, Sources, and Exercises

**Average Cost per Stage Problems**
- Finite-Spaces Average Cost Models
  - Relation with the Discounted Cost Problem
  - Blackwell Optimal Policies
  - Optimality Equations
- Conditions for Equal Average Cost for All Initial States
- Value Iteration
  - Single-Chain Value Iteration
  - Multi-Chain Value Iteration
- Policy Iteration
  - Single-Chain Policy Iteration
  - Multi-Chain Policy Iteration
- Linear Programming
- Infinite-Spaces Problems
  - A Sufficient Condition for Optimality
  - Finite State Space and Infinite Control Space
  - Countable States - Vanishing Discount Approach
  - Countable States - Contraction Approach
  - Linear Systems with Quadratic Cost
- Notes, Sources, and Exercises

**Approximate Dynamic Programming - Discounted Models**
- General Issues of Simulation-Based Cost Approximation
  - Approximation Architectures
  - Simulation-Based Approximate Policy Iteration
  - Direct and Indirect Approximation
  - Monte Carlo Simulation
  - Simplifications
- Direct Policy Evaluation - Gradient Methods
- Projected Equation Methods for Policy Evaluation
  - The Projected Bellman Equation
  - The Matrix Form of the Projected Equation
  - Simulation-Based Methods
  - LSTD, LSPE, and TD(0) Methods
  - Optimistic Versions
  - Multistep Simulation-Based Methods
  - A Synopsis
- Policy Iteration Issues
  - Exploration Enhancement by Geometric Sampling
  - Exploration Enhancement by Off-Policy Methods
  - Policy Oscillations - Chattering
- Aggregation Methods
  - Cost Approximation via the Aggregate Problem
  - Cost Approximation via the Enlarged Problem
  - Multistep Aggregation
  - Asynchronous Distributed Aggregation
- Q-Learning
  - Q-Learning: A Stochastic VI Algorithm
  - Q-Learning and Policy Iteration
  - Q-Factor Approximation and Projected Equations
  - Q-Learning for Optimal Stopping Problems
  - Q-Learning and Aggregation
  - Finite Horizon Q-Learning
- Notes, Sources, and Exercises

**Approximate Dynamic Programming - Nondiscounted Models and Generalizations**
- Stochastic Shortest Path Problems
- Average Cost Problems
  - Approximate Policy Evaluation
  - Approximate Policy Iteration
  - Q-Learning for Average Cost Problems
- General Problems and Monte Carlo Linear Algebra
  - Projected Equations
  - Matrix Inversion and Iterative Methods
  - Multistep Methods
  - Extension of Q-Learning for Optimal Stopping
  - Equation Error Methods
  - Oblique Projections
  - Generalized Aggregation
  - Deterministic Methods for Singular Linear Systems
  - Stochastic Methods for Singular Linear Systems
- Approximation in Policy Space
  - The Gradient Formula
  - Computing the Gradient by Simulation
  - Essential Features for Gradient Evaluation
  - Approximations in Policy and Value Space
- Notes, Sources, and Exercises

**Appendix A: Measure-Theoretic Issues in Dynamic Programming**
- A Two-Stage Example
- Resolution of the Measurability Issues
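For readers sampling the contents above, the value iteration method that opens the computational chapters can be sketched in a few lines: repeatedly apply the Bellman operator, which for discounted bounded-cost problems is a sup-norm contraction and hence converges to the optimal cost. The tiny two-state, two-control model below is hypothetical, not taken from the book:

```python
# Value iteration for a discounted MDP: repeatedly apply the Bellman
# operator  J(s) <- min_a [ g(s,a) + alpha * sum_s' p(s'|s,a) J(s') ].
# The two-state, two-control model here is a made-up illustration.

ALPHA = 0.9  # discount factor

# COST[s][a] = stage cost; PROB[s][a][t] = transition prob. to state t
COST = [[2.0, 0.5], [1.0, 3.0]]
PROB = [
    [[0.8, 0.2], [0.1, 0.9]],  # transition probabilities from state 0
    [[0.5, 0.5], [0.9, 0.1]],  # transition probabilities from state 1
]

def bellman(J):
    """One application of the Bellman operator (a sup-norm contraction)."""
    return [
        min(
            COST[s][a] + ALPHA * sum(PROB[s][a][t] * J[t] for t in range(2))
            for a in range(2)
        )
        for s in range(2)
    ]

def value_iteration(tol=1e-10):
    """Iterate to (numerical) convergence from the zero function."""
    J = [0.0, 0.0]
    while True:
        J_new = bellman(J)
        if max(abs(x - y) for x, y in zip(J, J_new)) < tol:
            return J_new
        J = J_new
```

Since the contraction modulus is the discount factor, each sweep shrinks the sup-norm distance to the optimal cost by a factor of at least ALPHA; the error bounds and the many variants of this scheme are the subject of the chapters listed above.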