Simulation‐based Uniform Value Function Estimates of Markov Decision Processes

2006 ◽  
Vol 45 (5) ◽  
pp. 1633-1656 ◽  
Author(s):  
Rahul Jain ◽  
Pravin P. Varaiya
2006 ◽  
Vol 43 (3) ◽  
pp. 603-621 ◽  
Author(s):  
Huw W. James ◽  
E. J. Collins

This paper is concerned with the analysis of Markov decision processes in which a natural form of termination ensures that the expected future costs are bounded, at least under some policies. Whereas most previous analyses have restricted attention to the case where the set of states is finite, this paper analyses the case where the set of states is not necessarily finite or even countable. It is shown that all the existence, uniqueness, and convergence results of the finite-state case hold when the set of states is a general Borel space, provided we make the additional assumption that the optimal value function is bounded below. We give a sufficient condition for the optimal value function to be bounded below which holds, in particular, if the set of states is countable.


2020 ◽  
Vol 34 (10) ◽  
pp. 13845-13846
Author(s):  
Nishanth Kumar ◽  
Michael Fishman ◽  
Natasha Danas ◽  
Stefanie Tellex ◽  
Michael Littman ◽  
...  

We propose an abstraction method for open-world environments expressed as Factored Markov Decision Processes (FMDPs) with very large state and action spaces. Our method prunes state and action variables that are irrelevant to the optimal value function on the state subspace the agent would visit when following any optimal policy from the initial state. This method thus enables tractable, fast planning within large open-world FMDPs.


2007 ◽  
Vol 7 (1) ◽  
pp. 59-92 ◽  
Author(s):  
Hyeong Soo Chang ◽  
Michael C. Fu ◽  
Jiaqiao Hu ◽  
Steven I. Marcus

Author(s):  
Hyeong Soo Chang ◽  
Jiaqiao Hu ◽  
Michael C. Fu ◽  
Steven I. Marcus

2020 ◽  
Vol 68 (4) ◽  
pp. 1231-1237
Author(s):  
Avery Haviv

Markov decision processes are commonly used to model forward-looking behavior. However, cyclic terms, including seasonality, are often omitted from these models because of the increase in computational burden. This paper develops a cyclic value function iteration (CVFI), an adjustment to the standard value function iteration. By updating states in a specific order, CVFI allows cyclic variables to be included in the state space with no increase in the computational cost. This result is proved theoretically and shown to hold closely in Monte Carlo simulations.
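
The ordering idea in this abstract can be illustrated with a small sketch. This is one reading of the abstract, not the paper's algorithm: it assumes a period variable t that advances deterministically around a cycle of length T, and it updates the periods in reverse order so that each period reads the freshly updated values of the period that follows it. The function name `cyclic_value_iteration`, its parameters, and the toy problem are all hypothetical.

```python
def cyclic_value_iteration(rewards, transitions, beta, tol=1e-10, max_sweeps=10_000):
    """Value iteration with a deterministic cyclic period variable (illustrative only).

    rewards[t][x][a]     : payoff in period t, base state x, action a
    transitions[a][x][y] : probability of moving from base state x to y under action a
    The period advances as t -> (t + 1) % T; this structure is an assumption
    made for the sketch, not something taken from the paper.
    """
    n_periods = len(rewards)
    n_states = len(rewards[0])
    n_actions = len(rewards[0][0])
    values = [[0.0] * n_states for _ in range(n_periods)]

    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        # Walk the cycle backwards: period t reads the values of period t + 1,
        # which were already refreshed in this sweep (except at the wrap-around).
        for t in reversed(range(n_periods)):
            next_values = values[(t + 1) % n_periods]
            new_row = []
            for x in range(n_states):
                best = max(
                    rewards[t][x][a]
                    + beta * sum(p * v for p, v in zip(transitions[a][x], next_values))
                    for a in range(n_actions)
                )
                new_row.append(best)
            row_change = max(abs(new - old) for new, old in zip(new_row, values[t]))
            max_change = max(max_change, row_change)
            values[t] = new_row
        if max_change < tol:
            return values, sweep
    return values, max_sweeps


# Hypothetical toy problem: one base state, one action, two seasons.
rewards = [[[1.0]], [[2.0]]]
transitions = [[[1.0]]]
values, sweeps = cyclic_value_iteration(rewards, transitions, beta=0.5)
```

In this toy model each sweep performs one single-period update per season, and the updated values travel around the whole cycle once per sweep. Whether this matches the cost argument made in the paper is not something the abstract alone settles.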

