The purpose of this book is to develop in greater depth some of the methods from the author's recently published textbook Reinforcement Learning and Optimal Control (Athena Scientific, 2019). In particular, we present new research relating to systems involving multiple agents, partitioned architectures, and distributed asynchronous computation. We pay special attention to the contexts of dynamic programming/policy iteration and control theory/model predictive control. We also discuss in some detail the application of the methodology to challenging discrete/combinatorial optimization problems, such as routing, scheduling, assignment, and mixed integer programming, including the use of neural network approximations within these contexts.

The book focuses on the fundamental idea of policy iteration: start from some policy, and successively generate one or more improved policies. If just one improved policy is generated, this is called rollout, which, based on broad and consistent computational experience, appears to be one of the most versatile and reliable of all reinforcement learning methods. In this book, rollout algorithms are developed for both discrete deterministic and stochastic DP problems, and distributed implementations are developed for both multiagent and multiprocessor settings, with the aim of taking advantage of parallelism.

Approximate policy iteration is more ambitious than rollout, but it is a strictly off-line method, and it is generally far more computationally intensive; this motivates the use of parallel and distributed computation. One of the purposes of the monograph is to discuss distributed (possibly asynchronous) methods that relate to rollout and policy iteration, both in the context of exact implementations and of approximate implementations involving neural networks or other approximation architectures. Much of the new research is inspired by the remarkable AlphaZero chess program, where policy iteration, value and policy networks, approximate lookahead minimization, and parallel computation all play an important role.
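As a rough illustration of the one-step policy improvement that rollout performs, the following Python sketch computes the rollout control at a single state of a deterministic, finite-horizon problem. The interfaces it assumes (controls, step, base_policy, and the State/Control aliases) are hypothetical placeholders introduced here for illustration; they are not the book's notation or code.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]   # placeholder state type (assumption)
Control = int             # placeholder control type (assumption)


def rollout_control(
    state: State,
    horizon: int,
    controls: Callable[[State], List[Control]],
    step: Callable[[State, Control], Tuple[State, float]],
    base_policy: Callable[[State], Control],
) -> Control:
    """Return the rollout control at `state`: try each feasible control,
    then follow the base policy for the remaining stages, and keep the
    control with the smallest total (stage plus base-policy) cost."""

    def base_policy_cost(s: State, stages_left: int) -> float:
        # Simulate the base policy from s to the end of the horizon.
        total = 0.0
        for _ in range(stages_left):
            s, cost = step(s, base_policy(s))
            total += cost
        return total

    best_control, best_q = None, float("inf")
    for u in controls(state):
        next_state, stage_cost = step(state, u)
        # Q-factor of (state, u): first-stage cost plus base-policy cost-to-go.
        q = stage_cost + base_policy_cost(next_state, horizon - 1)
        if q < best_q:
            best_control, best_q = u, q
    return best_control
```

Repeating this computation at each state encountered on-line defines the rollout policy, which, under appropriate conditions discussed in the book, performs no worse than the base policy.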
This book considers large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP), but whose exact solution is computationally intractable.
Rollout, Policy Iteration, and Distributed Reinforcement Learning (Chinese edition subtitle, in pinyin: Ce Lüe Qian Zhan, Ce Lüe Die Dai Yu Fen Bu Shi Qiang...)
Historically, this is the first book to fully explain the neuro-dynamic programming/reinforcement learning methodology, a breakthrough in the practical application of neural networks and dynamic programming to complex problems of ...
This is the 3rd edition of a research monograph providing a synthesis of classical research on the foundations of dynamic programming (DP) with the modern theory of approximate DP and new research on semicontractive models.
The book may be used as a text for a theoretical convex optimization course; the author has taught several variants of such a course at MIT and elsewhere over the last ten years.
This book introduces multiagent planning under uncertainty as formalized by decentralized partially observable Markov decision processes (Dec-POMDPs).
This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming.