Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes

Author(s): Junhong Xu, Kai Yin, Lantao Liu


2020, Vol 34 (06), pp. 10069-10076
Author(s): Pablo Samuel Castro

We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that captures behavioral equivalence between states and provides strong theoretical guarantees on differences in optimal behavior. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss that allows us to learn an approximation even for continuous-state MDPs, which prior to this work had not been possible.
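The deterministic recursion underlying these algorithms can be made concrete in a few lines. Below is a minimal tabular sketch of the on-policy fixed point d(s, t) = |r(s) - r(t)| + gamma * d(s', t'), which the paper's sampling algorithm approximates; `rewards` and `next_state` are hypothetical stand-ins for a tabular MDP with the behavior policy already folded into rewards and transitions, not notation from the paper.

```python
import numpy as np

# Jacobi-style fixed-point iteration for the deterministic, on-policy
# bisimulation metric: d(s, t) = |r(s) - r(t)| + gamma * d(next(s), next(t)).
def bisimulation_metric(rewards, next_state, gamma=0.9, iters=200, tol=1e-8):
    n = len(rewards)
    d = np.zeros((n, n))
    for _ in range(iters):
        # Update every state pair from the corresponding successor pair.
        d_new = np.abs(rewards[:, None] - rewards[None, :]) \
                + gamma * d[np.ix_(next_state, next_state)]
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d

# Toy 3-state chain: distinct rewards, deterministic successors.
rewards = np.array([0.0, 0.0, 1.0])
next_state = np.array([1, 2, 2])
print(bisimulation_metric(rewards, next_state))
```

Because the update is a gamma-contraction, the iteration converges to the unique fixed point; the paper's contribution is approximating this limit by sampling, and by a differentiable loss when the state space is continuous.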


2006, Vol 43 (3), pp. 603-621
Author(s): Huw W. James, E. J. Collins

This paper is concerned with the analysis of Markov decision processes in which a natural form of termination ensures that the expected future costs are bounded, at least under some policies. Whereas most previous analyses have restricted attention to the case where the set of states is finite, this paper analyses the case where the set of states is not necessarily finite or even countable. It is shown that all the existence, uniqueness, and convergence results of the finite-state case hold when the set of states is a general Borel space, provided we make the additional assumption that the optimal value function is bounded below. We give a sufficient condition for the optimal value function to be bounded below, which holds, in particular, if the set of states is countable.
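For intuition, the finite-state special case that the paper generalizes can be sketched as undiscounted value iteration with a cost-free absorbing terminal state, so that termination keeps expected future costs bounded. All names below (`P`, `c`, `terminal`) are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def value_iteration(P, c, terminal, iters=1000, tol=1e-10):
    """P[a][s, s'] are transition probabilities, c[a][s] one-step costs
    (undiscounted); `terminal` marks the absorbing, cost-free states."""
    n = P[0].shape[0]
    V = np.zeros(n)
    for _ in range(iters):
        # Bellman backup: minimize expected one-step cost plus cost-to-go.
        Q = np.stack([c[a] + P[a] @ V for a in range(len(P))])
        V_new = Q.min(axis=0)
        V_new[terminal] = 0.0  # termination: no future cost accrues
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Two states: state 0 moves to terminal state 1 at unit cost.
P = [np.array([[0.0, 1.0], [0.0, 1.0]])]
c = [np.array([1.0, 0.0])]
print(value_iteration(P, c, terminal=np.array([False, True])))  # [1., 0.]
```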


2010, Vol 39, pp. 483-532
Author(s): M. Geist, O. Pietquin

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, the Kalman Temporal Differences (KTD) framework, which exhibits the following features: sample efficiency, non-linear approximation, non-stationarity handling, and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed in special cases for both deterministic and stochastic transitions. The related algorithms are evaluated on classical benchmarks, where they compare favorably to the state of the art while exhibiting the announced features.
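To make the parameter-uncertainty idea concrete, here is a hedged sketch of the linear, deterministic-transition special case: the value-function weights theta are treated as the hidden state of a Kalman filter whose scalar observation is the reward, r_t ≈ theta^T (phi(s_t) - gamma * phi(s_{t+1})). The noise settings and class name are illustrative assumptions; the full KTD/XKTD framework additionally uses unscented transforms for non-linear approximation and colored observation noise for stochastic transitions.

```python
import numpy as np

class LinearKTD:
    """Minimal sketch: Kalman filtering over linear value-function weights."""

    def __init__(self, dim, gamma=0.99, p0=1.0, p_proc=1e-4, p_obs=1.0):
        self.theta = np.zeros(dim)      # value-function weights (hidden state)
        self.P = p0 * np.eye(dim)       # parameter uncertainty (covariance)
        self.gamma = gamma
        self.p_proc = p_proc            # random-walk process noise
        self.p_obs = p_obs              # observation (reward) noise

    def update(self, phi_s, phi_next, reward):
        self.P += self.p_proc * np.eye(len(self.theta))  # prediction step
        h = phi_s - self.gamma * phi_next                # observation vector
        innovation = reward - h @ self.theta             # TD-style residual
        s = h @ self.P @ h + self.p_obs                  # innovation variance
        k = self.P @ h / s                               # Kalman gain
        self.theta += k * innovation                     # correction step
        self.P -= np.outer(k, h @ self.P)                # covariance update
```

Maintaining the full covariance `P` is what gives the scheme its sample efficiency and uncertainty estimates, at the cost of quadratic memory in the number of features.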


2020, Vol 34 (10), pp. 13845-13846
Author(s): Nishanth Kumar, Michael Fishman, Natasha Danas, Stefanie Tellex, Michael Littman, ...

We propose an abstraction method for open-world environments expressed as Factored Markov Decision Processes (FMDPs) with very large state and action spaces. Our method prunes state and action variables that are irrelevant to the optimal value function on the state subspace the agent would visit when following any optimal policy from the initial state. This enables tractable, fast planning in large open-world FMDPs.
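One common ingredient of this kind of abstraction, sketched below under simplifying assumptions (this is not the paper's full reachability-based method), is closing the reward function's variable scope under the transition model's parent sets and pruning every variable outside that closure.

```python
# Illustrative relevance closure over an FMDP's dynamic Bayesian network:
# a variable is kept if the reward depends on it, or if it influences the
# dynamics of some other kept variable. All names here are hypothetical.
def relevant_variables(reward_scope, parents):
    """parents[v] = set of state variables that v's transition depends on."""
    relevant = set(reward_scope)
    frontier = list(relevant)
    while frontier:
        v = frontier.pop()
        for p in parents.get(v, ()):   # variables influencing v's dynamics
            if p not in relevant:
                relevant.add(p)
                frontier.append(p)
    return relevant

# Toy DBN: reward depends on 'goal'; 'goal' depends on 'pos'; 'weather'
# influences nothing relevant, so it is pruned.
parents = {"goal": {"pos"}, "pos": {"pos"}, "weather": {"weather"}}
print(relevant_variables({"goal"}, parents))   # {'goal', 'pos'}
```

The paper's method goes further by restricting attention to the state subspace reachable under optimal policies from the initial state, which can prune variables that a scope-only closure like this would keep.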

