Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)

2017, Vol. 59, pp. 229-264
Author(s): Asrar Ahmed, Pradeep Varakantham, Meghna Lowalekar, Yossiri Adulyasak, Patrick Jaillet

Markov Decision Processes (MDPs) are an effective model for representing decision problems in the presence of transition uncertainty and reward tradeoffs. However, because the transition and reward functions of an MDP are difficult to specify exactly, researchers have proposed uncertain MDP models and robustness objectives for solving them. Most approaches to computing robust policies have focused on maximin policies, which maximize the value in the worst case among all realisations of the uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for minimax regret are restricted to models with uncertainty over rewards only, and they are limited in their scalability. We therefore provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions, and that allows the uncertainty to be dependent across different states and decision epochs. We provide a mixed integer linear program (MILP) formulation for minimizing regret given a set of samples of the transition and reward functions of the uncertain MDP. In addition, we introduce two myopic variants of regret, Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR), that can be optimized in a scalable manner: we give a dynamic programming algorithm for CEMR and a policy iteration algorithm for OSR. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from the literature. We observe that optimizing the myopic variants of regret, OSR and CEMR, yields better policies than directly optimizing regret.
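
As a concrete illustration of the sample-based regret objective, the following sketch evaluates the worst-case regret of a fixed policy across sampled realizations of the transition and reward functions, then selects a minimax-regret policy by brute force. All names, the MDP layout, and the exhaustive enumeration are illustrative assumptions; the paper itself uses a MILP formulation rather than enumeration.

```python
import itertools
import numpy as np

# Sketch of the sample-based regret objective (not the paper's MILP):
# each sample is one realization (P, R) of the uncertain transition and
# reward functions, with P of shape (S, A, S) and R of shape (S, A).

def optimal_value(P, R, gamma=0.95, iters=500):
    """Optimal value of one sampled MDP via standard value iteration."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    return V

def policy_value(pi, P, R, gamma=0.95, iters=500):
    """Value of a deterministic policy pi (one action index per state)."""
    s = np.arange(P.shape[0])
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = R[s, pi] + gamma * P[s, pi] @ V
    return V

def max_regret(pi, samples, s0=0):
    """Worst-case regret of pi across the sampled (P, R) realizations."""
    return max(optimal_value(P, R)[s0] - policy_value(pi, P, R)[s0]
               for P, R in samples)

def minimax_regret_policy(samples, n_states, n_actions):
    """Brute-force minimax regret; feasible only for tiny MDPs."""
    candidates = itertools.product(range(n_actions), repeat=n_states)
    return min((np.array(pi) for pi in candidates),
               key=lambda pi: max_regret(pi, samples))
```

Here `samples` would be a list of `(P, R)` pairs drawn from the uncertainty distribution; the enumeration over deterministic policies is exponential in the number of states, which is precisely why a scalable formulation is needed.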

2017, Vol. 26 (03), pp. 1760014
Author(s): Paul Weng, Olivier Spanjaard

Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model that allows rewards to be functional: the value of a history is computed recursively by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. We also discuss the infinite horizon case and the case where a maximum operator does not exist. To show the potential of our framework, we conclude the paper by presenting several illustrative examples.
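
The recursive value computation can be pictured with a small sketch of finite-horizon backward induction in which each state-action pair carries a reward function composed with the value-to-go. The wiring below (applying the function to the expected value-to-go) and the monotonicity remark are assumptions based on the abstract, not the paper's exact formulation.

```python
# Sketch: backward induction with functional rewards. Each (state, action)
# pair carries a function f(s, a, v) composed with the value-to-go v
# rather than added to it. With f(s, a, v) = r[s][a] + gamma * v this
# recovers the standard additive model; other choices (e.g. min, for
# bottleneck objectives) instantiate other MDP variants. Validity of the
# recursion requires, roughly, that f be monotone in v.

def functional_backward_induction(states, actions, P, f, horizon):
    """P[s][a] maps next states to probabilities; f composes the reward
    function of (s, a) with the expected value-to-go."""
    V = {s: 0.0 for s in states}                       # terminal values
    for _ in range(horizon):
        V = {s: max(f(s, a, sum(p * V[t] for t, p in P[s][a].items()))
                    for a in actions)
             for s in states}
    return V
```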


Author(s): Shun Zhang, Edmund H. Durfee, Satinder Singh

As an autonomous agent achieves a goal on behalf of its human user, its actions may have side effects that change features of the environment in ways that negatively surprise the user. An agent that can be trusted to operate safely should therefore only change features the user has explicitly permitted it to change. We formalize this problem and develop a planning algorithm that avoids potentially negative side effects given what the agent knows about (un)changeable features. Further, we formulate a provably minimax-regret querying strategy for the agent to selectively ask the user about features it has not explicitly been told about. We show empirically how much faster it is than a more exhaustive approach, and how much better its queries are than those found by the best known heuristic.
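
The querying objective can be illustrated with a deliberately brute-force sketch: choose the set of features to ask about that minimizes worst-case regret over the possible true sets of changeable features. The regret formalization and the `best_plan_value` stub below are hypothetical stand-ins; the paper's provably minimax-regret strategy is more involved than this enumeration.

```python
import itertools

# Brute-force illustration of the query-selection objective.
# `best_plan_value(changeable)` is a hypothetical stub standing in for a
# safe planner: the value of the best plan that only modifies features
# known to be changeable.

def worst_case_regret(query, unknown, known, best_plan_value):
    """Regret of asking about `query`: the agent learns only which queried
    features are changeable and plans with that partial knowledge, while
    the benchmark plans with the full truth. Worst case over truths."""
    regrets = []
    for r in range(len(unknown) + 1):
        for truth in itertools.combinations(unknown, r):
            learned = known | (set(truth) & set(query))
            full = known | set(truth)
            regrets.append(best_plan_value(full) - best_plan_value(learned))
    return max(regrets)

def best_query(unknown, known, k, best_plan_value):
    """Choose k features to ask about, minimizing worst-case regret."""
    return min(itertools.combinations(unknown, k),
               key=lambda q: worst_case_regret(q, unknown, known,
                                               best_plan_value))
```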


2001, Vol. 14, pp. 29-51
Author(s): N. L. Zhang, W. Zhang

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems.
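
For orientation, a minimal sketch of plain value iteration with a Bellman-residual stopping rule on a fully observable MDP is shown below. POMDP value iteration instead maintains sets of value vectors over the belief simplex, and the acceleration method the paper proposes is not reproduced here.

```python
import numpy as np

# Plain value iteration on a fully observable MDP, for orientation only.
# P: (S, A, S) transition probabilities, R: (S, A) rewards.

def value_iteration(P, R, gamma=0.95, eps=1e-6):
    V = np.zeros(P.shape[0])
    while True:
        V_new = (R + gamma * P @ V).max(axis=1)
        # Standard contraction bound: stopping when the Bellman residual
        # falls below eps*(1-gamma)/(2*gamma) guarantees the returned
        # values are within eps of the optimal value function.
        if np.max(np.abs(V_new - V)) < eps * (1 - gamma) / (2 * gamma):
            return V_new
        V = V_new
```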


1983, Vol. 20 (04), pp. 835-842
Author(s): David Assaf

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function attains its maximum at an extreme point, these conditions may greatly simplify a problem; in some cases a full solution may be obtained once the reduction is made. Some illustrative examples are discussed.
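
The fact being exploited can be stated for reference (a standard result in convex analysis, not specific to the paper):

```latex
% A continuous convex function on a compact convex set attains its
% maximum at an extreme point (Bauer's maximum principle):
\[
  f \text{ continuous and convex on a compact convex } C \subseteq \mathbb{R}^n
  \quad\Longrightarrow\quad
  \max_{x \in C} f(x) \;=\; \max_{x \in \operatorname{ext}(C)} f(x).
\]
% Hence, if an MDP's objective is convex in a parameter ranging over a
% polytope, the search reduces to the polytope's finitely many vertices.
```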

