Dynamic programming is a powerful optimization approach for problems that can be formulated with a serial stage-state structure. However, many design problems are not serial; they have highly connected, interdependent structures. Existing methods for solving nonserial problems either require the problem to possess a particular structure or limit its size because of storage and computation time requirements. The aim of this paper is to show that nonserial problems can be solved by dynamic programming that incorporates heuristic algorithms. Two such algorithms are developed using the artificial intelligence concept of estimating the likely future results of present decisions. The algorithms are explained in detail, a small problem is solved, and the results of testing the algorithms on large-scale problems are given. The method is then used to solve a problem drawn from the literature.
Although this textbook is intended for use in a two-semester sequence of courses introducing the mathematical methods of operations research, Part I can also be used alone for a one-semester course on linear programming.
Rollout, Policy Iteration, and Distributed Reinforcement Learning
Stochastic Optimal Control: The Discrete Time Case