Approximate Dynamic Programming with (min; +) linear function approximation for Markov decision processes

53rd IEEE Conference on Decision and Control; doi:10.1109/cdc.2014.7039626; 2014

Cited By ~ 1

Author(s): L. Chandrashekar, Shalabh Bhatnagar

Keyword(s): Dynamic Programming, Linear Function, Markov Decision Processes, Function Approximation, Approximate Dynamic Programming, Decision Processes, Linear Function Approximation, Markov Decision

A Performance Gradient Perspective on Approximate Dynamic Programming and its Application to Partially Observable Markov Decision Processes

IEEE International Symposium on Intelligent Control; doi:10.1109/isic.2006.285595; 2006

Cited By ~ 3

Author(s): James Dankert, Lei Yang, Jennie Si

Keyword(s): Dynamic Programming, Markov Decision Processes, Approximate Dynamic Programming, Decision Processes, Markov Decision, Partially Observable Markov, Partially Observable, A Performance

New approximate dynamic programming algorithms for large-scale undiscounted Markov decision processes and their application to optimize a production and distribution system

European Journal of Operational Research; doi:10.1016/j.ejor.2015.07.026; 2016; Vol 249 (1); pp. 22-31

Cited By ~ 13

Author(s): Katsuhisa Ohno, Toshitaka Boh, Koichi Nakade, Takayoshi Tamura

Keyword(s): Dynamic Programming, Markov Decision Processes, Distribution System, Large Scale, Approximate Dynamic Programming, Decision Processes, Production And Distribution, Markov Decision, Programming Algorithms

A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes

doi:10.1109/cacsd-cca-isic.2006.4776689; 2006

Cited By ~ 3

Author(s): James Dankert, Lei Yang, Jennie Si

Keyword(s): Dynamic Programming, Markov Decision Processes, Approximate Dynamic Programming, Decision Processes, Markov Decision, Partially Observable Markov, Partially Observable, A Performance

A dynamic programming algorithm for decentralized Markov decision processes with a broadcast structure

49th IEEE Conference on Decision and Control (CDC); doi:10.1109/cdc.2010.5718187; 2010

Cited By ~ 12

Author(s): Jeff Wu, Sanjay Lall

Keyword(s): Dynamic Programming, Markov Decision Processes, Dynamic Programming Algorithm, Decision Processes, Programming Algorithm, Markov Decision

A novel Q-learning algorithm with function approximation for constrained Markov decision processes

2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton); doi:10.1109/allerton.2012.6483246; 2012

Cited By ~ 3

Author(s): K. Lakshmanan, Shalabh Bhatnagar

Keyword(s): Markov Decision Processes, Function Approximation, Learning Algorithm, Decision Processes, Q Learning, Constrained Markov Decision Processes, Markov Decision

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Technometrics; doi:10.1080/00401706.1995.10484354; 1995; Vol 37 (3); pp. 353-353

Cited By ~ 6

Author(s): Laurence A. Baxter

Keyword(s): Dynamic Programming, Markov Decision Processes, Stochastic Dynamic Programming, Decision Processes, Stochastic Dynamic, Markov Decision

Learning and Optimal Control of Imprecise Markov Decision Processes by Dynamic Programming Using the Imprecise Dirichlet Model

Soft Methodology and Random Information Systems; doi:10.1007/978-3-540-44465-7_16; 2004; pp. 141-148

Cited By ~ 1

Author(s): Matthias C. M. Troffaes

Keyword(s): Optimal Control, Dynamic Programming, Markov Decision Processes, Decision Processes, Markov Decision, Imprecise Dirichlet Model, Dirichlet Model

Kalman Temporal Differences

Journal of Artificial Intelligence Research; doi:10.1613/jair.3077; 2010; Vol 39; pp. 483-532

Cited By ~ 29

Author(s): M. Geist, O. Pietquin

Keyword(s): Markov Decision Processes, Function Approximation, Approximation Scheme, State Of The Art, Decision Processes, Temporal Differences, Special Cases, Markov Decision, Biased Estimates, Q Function

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed for special cases for both deterministic and stochastic transitions. The related algorithms are evaluated on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.
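
To make the filtering view in this abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a linear value function V(s) = phi(s) . theta, a deterministic MDP, and no process noise (a stationary target), in which case the update reduces to a standard Kalman filter whose observation is the reward r = (phi(s) - gamma * phi(s')) . theta + noise. The function name, the parameter names and the default values are all hypothetical choices for illustration.

```python
def kalman_td_linear(transitions, n_features, gamma,
                     prior_var=10.0, obs_noise=1e-3, sweeps=50):
    """Kalman-filter value estimation for a linear value function.

    transitions: list of (phi, reward, phi_next) tuples, where phi and
    phi_next are feature vectors (lists of floats) of the current and
    next state. All names here are illustrative assumptions.
    """
    n = n_features
    theta = [0.0] * n                      # parameter estimate
    # Parameter covariance, initialised to prior_var * identity.
    P = [[prior_var if i == j else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(sweeps):
        for phi, reward, phi_next in transitions:
            # Observation vector: reward is modelled as h . theta + noise.
            h = [a - gamma * b for a, b in zip(phi, phi_next)]
            Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
            innov_var = sum(h[i] * Ph[i] for i in range(n)) + obs_noise
            gain = [x / innov_var for x in Ph]
            # Innovation: observed reward minus predicted reward.
            td_error = reward - sum(h[i] * theta[i] for i in range(n))
            theta = [theta[i] + gain[i] * td_error for i in range(n)]
            # Covariance update P <- P - K h^T P (P is symmetric).
            P = [[P[i][j] - gain[i] * Ph[j] for j in range(n)]
                 for i in range(n)]
    return theta


# Two-state deterministic cycle with one-hot features:
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
demo_transitions = [([1.0, 0.0], 1.0, [0.0, 1.0]),
                    ([0.0, 1.0], 0.0, [1.0, 0.0])]
demo_theta = kalman_td_linear(demo_transitions, n_features=2, gamma=0.5)
```

For this toy cycle the Bellman equations give V(0) = 4/3 and V(1) = 2/3, so `demo_theta` should end up close to those values. The covariance `P` is what the abstract's "uncertainty management" refers to in this simplified setting; handling non-linear parameterizations or stochastic transitions requires the machinery described in the paper and is not attempted here.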

Cost rate heuristics for semi-Markov decision processes

Journal of Applied Probability; doi:10.1017/s002190020004345x; 1992; Vol 29 (03); pp. 633-644

Author(s): K. D. Glazebrook, Michael P. Bailey, Lyn R. Whitaker

Keyword(s): Dynamic Programming, Markov Decision Processes, Preventive Maintenance, Decision Processes, Cost Rate, Backwards Induction, Optimal Policies, Markov Decision, Speed Of Evolution

In response to the computational complexity of the dynamic programming/backwards induction approach to the development of optimal policies for semi-Markov decision processes, we propose a class of heuristics resulting from an inductive process which proceeds forwards in time. These heuristics always choose actions in such a way as to minimize some measure of the current cost rate. We describe a procedure for calculating such cost rate heuristics. The quality of the performance of such policies is related to the speed of evolution (in a cost sense) of the process. A simple model of preventive maintenance is described in detail. Cost rate heuristics for this problem are calculated and assessed computationally.
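
The idea of choosing actions by current cost rate can be illustrated with a very small sketch. The abstract does not say which measure of cost rate the paper uses, so the ratio of expected cost to expected duration below is an assumption, and the function name, action names and numbers are hypothetical.

```python
def cost_rate_action(actions):
    """Pick the action with the smallest expected cost per unit time.

    actions: dict mapping an action name to a pair
    (expected_cost, expected_duration), with expected_duration > 0.
    The cost/duration ratio is an assumed stand-in for the paper's
    'measure of the current cost rate'.
    """
    return min(actions, key=lambda name: actions[name][0] / actions[name][1])


# Hypothetical preventive-maintenance decision in a single state.
maintenance_options = {
    "repair": (5.0, 2.0),    # cost 5 over 2 time units, rate 2.5
    "replace": (12.0, 6.0),  # cost 12 over 6 time units, rate 2.0
}
chosen = cost_rate_action(maintenance_options)
```

Here `chosen` is `"replace"`, since its rate of 2.0 is lower than the repair rate of 2.5. The rule looks only at the current decision, which is what makes it cheap compared with backwards induction over the whole horizon.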

Dynamic Programming and Markov Decision Processes

The New Palgrave Dictionary of Economics; doi:10.1057/978-1-349-95189-5_80; 2018; pp. 3158-3164

Author(s): Steven A. Lippman

Keyword(s): Dynamic Programming, Markov Decision Processes, Decision Processes, Markov Decision
