Rollout, Policy Iteration, and Distributed Reinforcement Learning

ISBN-10: 1886529078
ISBN-13: 9781886529076
Category: Computers
Pages: 498
Language: English
Published: 2021-08-20
Publisher: Athena Scientific
Author: Dimitri Bertsekas

Description

The purpose of this book is to develop in greater depth some of the methods from the author's recently published textbook Reinforcement Learning and Optimal Control (Athena Scientific, 2019). In particular, we present new research relating to systems involving multiple agents, partitioned architectures, and distributed asynchronous computation. We pay special attention to the contexts of dynamic programming/policy iteration and control theory/model predictive control. We also discuss in some detail the application of the methodology to challenging discrete/combinatorial optimization problems, such as routing, scheduling, assignment, and mixed integer programming, including the use of neural network approximations within these contexts.

The book focuses on the fundamental idea of policy iteration: start from some policy, and successively generate one or more improved policies. If just one improved policy is generated, the method is called rollout, which, based on broad and consistent computational experience, appears to be one of the most versatile and reliable of all reinforcement learning methods. In this book, rollout algorithms are developed for both discrete deterministic and stochastic DP problems, and distributed implementations are developed for both multiagent and multiprocessor settings, aiming to take advantage of parallelism.

Approximate policy iteration is more ambitious than rollout, but it is a strictly off-line method, and it is generally far more computationally intensive. This motivates the use of parallel and distributed computation. One of the purposes of the monograph is to discuss distributed (possibly asynchronous) methods that relate to rollout and policy iteration, both in the context of exact implementations and of approximate implementations involving neural networks or other approximation architectures. Much of the new research is inspired by the remarkable AlphaZero chess program, in which policy iteration, value and policy networks, approximate lookahead minimization, and parallel computation all play an important role.
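
As a concrete illustration of the rollout idea described above, the following minimal Python sketch applies one-step lookahead to a deterministic, finite-horizon problem, using a base policy to complete each trajectory. The helper names (successors, base_policy, terminal_cost) are illustrative assumptions, not interfaces from the book.

    def base_policy_cost(state, horizon, successors, base_policy, terminal_cost):
        """Simulate the base policy from `state` for `horizon` stages and
        return the accumulated cost, i.e., the heuristic cost-to-go."""
        total = 0.0
        for _ in range(horizon):
            chosen = base_policy(state)
            # Follow the transition that the base policy selects.
            for control, next_state, stage_cost in successors(state):
                if control == chosen:
                    total += stage_cost
                    state = next_state
                    break
        return total + terminal_cost(state)

    def rollout_control(state, horizon, successors, base_policy, terminal_cost):
        """One-step lookahead: try every control at `state`, complete the
        trajectory with the base policy, and select the control minimizing
        stage cost plus the base policy's cost-to-go. Applying this at each
        state encountered defines the (improved) rollout policy."""
        best_control, best_cost = None, float("inf")
        for control, next_state, stage_cost in successors(state):
            cost = stage_cost + base_policy_cost(
                next_state, horizon - 1, successors, base_policy, terminal_cost)
            if cost < best_cost:
                best_control, best_cost = control, cost
        return best_control

The essential property, which the book develops in far greater generality, is cost improvement: the rollout policy performs at least as well as the base policy it is built on.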

Similar books

  • Reinforcement Learning and Optimal Control
    By Dimitri Bertsekas

    This book considers large and challenging multistage decision problems, which can in principle be solved by dynamic programming (DP), but whose exact solution is computationally intractable.

  • Rollout, Policy Iteration, and Distributed Reinforcement Learning: Ce Lüe Qian Zhan, Ce Lüe Die Dai Yu Fen Bu Shi Qiang...
    By Dimitri P. Bertsekas

    The Chinese edition of this book.

  • Neuro-Dynamic Programming
    By Dimitri P. Bertsekas, John N. Tsitsiklis

    This is historically the first book that fully explained the neuro-dynamic programming/reinforcement learning methodology, a breakthrough in the practical application of neural networks and dynamic programming to complex problems of ...

  • Reinforcement Learning, second edition: An Introduction
    By Richard S. Sutton, Andrew G. Barto

    The significantly expanded second edition of the standard introductory text on reinforcement learning, covering tabular solution methods, function approximation, and policy-gradient methods.

  • Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control
    By Dimitri Bertsekas

    This monograph relates the design principles underlying the AlphaZero program to approximation in value space, rollout, and model predictive control.

  • Abstract Dynamic Programming: 3rd Edition
    By Dimitri Bertsekas

    This is the 3rd edition of a research monograph providing a synthesis of old research on the foundations of dynamic programming (DP), with the modern theory of approximate DP and new research on semicontractive models.

  • Efficient Reinforcement Learning Using Gaussian Processes
    By Marc Peter Deisenroth

    This book presents a data-efficient approach to reinforcement learning that uses probabilistic Gaussian process models to reduce the amount of system interaction needed to learn good controllers.

  • Convex Optimization Theory
    By Dimitri Bertsekas

    The book may be used as a text for a theoretical convex optimization course; the author has taught several variants of such a course at MIT and elsewhere over the last ten years.

  • A Concise Introduction to Decentralized POMDPs
    By Christopher Amato, Frans A. Oliehoek

    This book introduces multiagent planning under uncertainty as formalized by decentralized partially observable Markov decision processes (Dec-POMDPs).

  • Algorithms for Reinforcement Learning
    By Csaba Szepesvári

    This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming.