Kalman Temporal Differences

Journal of Artificial Intelligence Research ◽

10.1613/jair.3077 ◽

2010 ◽

Vol 39 ◽

pp. 483-532 ◽

Cited By ~ 29

Author(s):

M. Geist ◽

O. Pietquin

Keyword(s):

Markov Decision Processes ◽

Function Approximation ◽

Approximation Scheme ◽

State Of The Art ◽

Decision Processes ◽

Temporal Differences ◽

Special Cases ◽

Markov Decision ◽

Biased Estimates ◽

Q Function

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed for special cases for both deterministic and stochastic transitions. The resulting algorithms are evaluated on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.
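
As a rough illustration of the idea in the abstract above, value-function parameters can be treated as the hidden state of a Kalman filter and each observed reward as a noisy measurement. The sketch below is not taken from the paper: it assumes linear features, a random-walk model for the parameters, and deterministic transitions, and every name in it (`ktd_linear_step`, `process_var`, `obs_var`, the two-state chain) is hypothetical.

```python
def ktd_linear_step(theta, P, phi, phi_next, reward,
                    gamma=0.5, process_var=0.0, obs_var=1e-3):
    """One Kalman update for a linear value function V(s) = phi(s) . theta.

    Assumed model (illustrative only):
      theta_t = theta_{t-1} + process noise           (random walk)
      reward  = (phi - gamma * phi_next) . theta + observation noise
    """
    n = len(theta)
    # Prediction: the mean is unchanged, the covariance grows by the process noise.
    P = [[P[i][j] + (process_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Observation vector derived from the Bellman evaluation equation.
    h = [phi[i] - gamma * phi_next[i] for i in range(n)]
    predicted_reward = sum(h[i] * theta[i] for i in range(n))
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    innovation_var = sum(h[i] * Ph[i] for i in range(n)) + obs_var
    gain = [Ph[i] / innovation_var for i in range(n)]
    # Correction: move theta along the gain, shrink the covariance.
    td_error = reward - predicted_reward
    theta = [theta[i] + gain[i] * td_error for i in range(n)]
    P = [[P[i][j] - gain[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta, P


# Hypothetical deterministic chain: state 0 -> state 1 (reward 0),
# state 1 -> state 1 (reward 1). With gamma = 0.5 the true values are
# V(1) = 1 / (1 - 0.5) = 2 and V(0) = 0 + 0.5 * 2 = 1.
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # one-hot features
transitions = [(0, 1, 0.0), (1, 1, 1.0)]   # (state, next_state, reward)

theta = [0.0, 0.0]
P = [[10.0, 0.0], [0.0, 10.0]]             # broad prior over the parameters
for _ in range(50):
    for s, s_next, r in transitions:
        theta, P = ktd_linear_step(theta, P, features[s], features[s_next], r)

print(theta)  # close to [1.0, 2.0]
```

The covariance `P` is what gives the filter an uncertainty estimate for the parameters, and a non-zero `process_var` would let the estimate keep tracking a value function that drifts over time.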

A novel Q-learning algorithm with function approximation for constrained Markov decision processes

2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton) ◽

10.1109/allerton.2012.6483246 ◽

2012 ◽

Cited By ~ 3

Author(s):

K. Lakshmanan ◽

Shalabh Bhatnagar

Keyword(s):

Markov Decision Processes ◽

Function Approximation ◽

Learning Algorithm ◽

Decision Processes ◽

Q Learning ◽

Constrained Markov Decision Processes ◽

Markov Decision

Simple Regret Optimization in Online Planning for Markov Decision Processes

Journal of Artificial Intelligence Research ◽

10.1613/jair.4432 ◽

2014 ◽

Vol 51 ◽

pp. 165-205 ◽

Cited By ~ 5

Author(s):

Z. Feldman ◽

C. Domshlak

Keyword(s):

Markov Decision Processes ◽

State Of The Art ◽

Search Algorithm ◽

Empirical Evaluation ◽

Decision Processes ◽

Monte Carlo Tree Search ◽

Performance Loss ◽

Online Planning ◽

Markov Decision ◽

High Level

We consider online planning in Markov decision processes (MDPs). In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. Formally, the performance of algorithms for online planning is assessed in terms of simple regret, the agent's expected performance loss when the chosen action, rather than an optimal one, is followed. To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. Here we introduce a new Monte-Carlo tree search algorithm, BRUE, that guarantees exponential-rate and smooth reduction of simple regret. At a high level, BRUE is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. We further extend BRUE with a variant of "learning by forgetting." The resulting parametrized algorithm, BRUE(alpha), exhibits even more attractive formal guarantees than BRUE. Our empirical evaluation shows that both BRUE and its generalization, BRUE(alpha), are also very effective in practice and compare favorably to the state of the art.
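
The definition of simple regret in the abstract above can be made concrete with a small sketch. The action names, the numbers and the `simple_regret` helper are hypothetical, and the true action values are assumed to be known purely for illustration; a planner would not have access to them.

```python
def simple_regret(q_values, recommendation_probs):
    """Expected loss of the recommended action relative to an optimal one.

    q_values: true value of each action at the current state (assumed known).
    recommendation_probs: probability that the planner recommends each action
    when it is interrupted.
    """
    best_value = max(q_values.values())
    return sum(prob * (best_value - q_values[action])
               for action, prob in recommendation_probs.items())


# Hypothetical three-action state.
q = {"left": 1.0, "right": 0.5, "stay": 0.75}
recommended = {"left": 0.5, "right": 0.25, "stay": 0.25}

print(simple_regret(q, recommended))  # 0.1875
```

A planner that always recommends "left" here has simple regret 0, so a rate guarantee such as the one claimed for BRUE describes how quickly the probability mass on suboptimal recommendations shrinks as deliberation time grows.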

Function approximation for large Markov decision processes using self-organizing neural networks

2015 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2015.7280608 ◽

2015 ◽

Author(s):

Teck-Hou Teng

Keyword(s):

Neural Networks ◽

Markov Decision Processes ◽

Function Approximation ◽

Decision Processes ◽

Markov Decision ◽

Self Organizing

An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Journal of Optimization Theory and Applications ◽

10.1007/s10957-012-9989-5 ◽

2012 ◽

Vol 153 (3) ◽

pp. 688-708 ◽

Cited By ~ 7

Author(s):

Shalabh Bhatnagar ◽

K. Lakshmanan

Keyword(s):

Markov Decision Processes ◽

Function Approximation ◽

Decision Processes ◽

Constrained Markov Decision Processes ◽

Markov Decision

Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes

Robotics: Science and Systems XVI ◽

10.15607/rss.2020.xvi.050 ◽

2020 ◽

Author(s):

Junhong Xu ◽

Kai Yin ◽

Lantao Liu

Keyword(s):

Markov Decision Processes ◽

Function Approximation ◽

Value Function ◽

Decision Processes ◽

Value Function Approximation ◽

Continuous State ◽

Markov Decision

An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes

Systems & Control Letters ◽

10.1016/j.sysconle.2010.08.013 ◽

2010 ◽

Vol 59 (12) ◽

pp. 760-766 ◽

Cited By ~ 13

Author(s):

Shalabh Bhatnagar

Keyword(s):

Markov Decision Processes ◽

Function Approximation ◽

Decision Processes ◽

Discounted Cost ◽

Constrained Markov Decision Processes ◽

Markov Decision

Risk-sensitive semi-Markov decision processes with general utilities and multiple criteria

Advances in Applied Probability ◽

10.1017/apr.2018.36 ◽

2018 ◽

Vol 50 (3) ◽

pp. 783-804

Author(s):

Yonghui Huang ◽

Zhaotong Lian ◽

Xianping Guo

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Finite Horizon ◽

Performance Criteria ◽

Occupation Measure ◽

Constrained Problems ◽

Constrained Problem ◽

Risk Sensitive ◽

Special Cases ◽

Markov Decision

In this paper we investigate risk-sensitive semi-Markov decision processes with a Borel state space, unbounded cost rates, and general utility functions. The performance criteria are several expected utilities of the total cost in a finite horizon. Our analysis is based on a type of finite-horizon occupation measure. We express the distribution of the finite-horizon cost in terms of the occupation measure for each policy, wherein the discount is not needed. For unconstrained and constrained problems, we establish the existence and computation of optimal policies. In particular, we develop a linear program and its dual program for the constrained problem and, moreover, establish the strong duality between the two programs. Finally, we provide two special cases of our results, one of which concerns the discrete-time model, and the other the chance-constrained problem.

Approximate Dynamic Programming with (min; +) linear function approximation for Markov decision processes

53rd IEEE Conference on Decision and Control ◽

10.1109/cdc.2014.7039626 ◽

2014 ◽

Cited By ~ 1

Author(s):

L. Chandrashekar ◽

Shalabh Bhatnagar

Keyword(s):

Dynamic Programming ◽

Linear Function ◽

Markov Decision Processes ◽

Function Approximation ◽

Approximate Dynamic Programming ◽

Decision Processes ◽

Linear Function Approximation ◽

Markov Decision

State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms

Management Science ◽

10.1287/mnsc.28.1.1 ◽

1982 ◽

Vol 28 (1) ◽

pp. 1-16 ◽

Cited By ~ 425

Author(s):

George E. Monahan

Keyword(s):

Markov Decision Processes ◽

State Of The Art ◽

Decision Processes ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

Learning Control of Dynamical Systems Based on Markov Decision Processes: Research Frontiers and Outlooks

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2012.00673 ◽

2012 ◽

Vol 38 (5) ◽

pp. 673-687 ◽

Cited By ~ 1

Author(s):

Xin XU ◽

Dong SHEN ◽

Yan-Qing GAO ◽

Kai WANG

Keyword(s):

Dynamical Systems ◽

Markov Decision Processes ◽

Learning Control ◽

Decision Processes ◽

Markov Decision ◽

Research Frontiers
