Solving Transition Independent Decentralized Markov Decision Processes

2004
Vol 22
pp. 423-455
Author(s):  
R. Becker ◽  
S. Zilberstein ◽  
V. Lesser ◽  
C. V. Goldman

Formal treatment of collaborative multi-agent systems has lagged behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a non-trivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
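
As a rough illustration of the structure this class exploits (not the paper's optimal algorithm), the Python sketch below alternates best responses between two transition-independent agents. All sizes, dynamics, and rewards are made-up assumptions, and the coupling is simplified to a per-step reward on the joint state rather than the paper's history-dependent global reward; alternating best response is only guaranteed to reach a person-by-person optimum.

```python
import numpy as np

# Illustrative sketch only: two agents with independent transitions, coupled
# through a simplified per-step joint-state reward (a stand-in for the
# paper's history-dependent global reward). All numbers are made up.

H, S = 5, 3                                   # horizon, local states per agent
rng = np.random.default_rng(0)
P = [rng.dirichlet(np.ones(S), size=(S, 2)) for _ in range(2)]  # P[i][s, a] -> dist over s'
R_local = [rng.random((S, 2)) for _ in range(2)]                # private rewards r_i(s, a)
R_joint = rng.random((S, S))                  # coupling reward on (s1, s2)

def occupancy(P_i, policy):
    """State distribution of agent i at each step under a deterministic policy."""
    occ = np.zeros((H, S))
    occ[0, 0] = 1.0                           # both agents start in state 0
    for t in range(H - 1):
        for s in range(S):
            occ[t + 1] += occ[t, s] * P_i[s, policy[t, s]]
    return occ

def best_response(P_i, R_i, occ_other, agent_is_row):
    """Backward induction for one agent against the other's occupancies.
    Transition independence lets us fold the expected coupling reward into
    an ordinary single-agent MDP and solve it exactly."""
    bonus = occ_other @ (R_joint.T if agent_is_row else R_joint)  # (H, S)
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for t in reversed(range(H)):
        Q = R_i + bonus[t][:, None] + np.einsum('sap,p->sa', P_i, V)
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

pol = [np.zeros((H, S), dtype=int) for _ in range(2)]
for _ in range(10):                           # alternate a few rounds
    pol[0] = best_response(P[0], R_local[0], occupancy(P[1], pol[1]), True)
    pol[1] = best_response(P[1], R_local[1], occupancy(P[0], pol[0]), False)
```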

Author(s):  
Sebastian Junges ◽  
Nils Jansen ◽  
Sanjit A. Seshia

Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs, for instance by restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
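
To give a feel for the underlying fixed-point computation, the sketch below handles only the fully observable special case on a made-up toy model; the paper's SAT- and decision-diagram-based algorithms operate over belief supports of a POMDP, which is much harder.

```python
# Sketch: almost-sure reach-avoid winning region of a fully observable MDP.
# succ[s][a] is the set of states reachable with positive probability
# (exact probabilities are irrelevant for almost-sure reachability).

succ = {
    0: {'a': {0, 1}, 'b': {2}},
    1: {'a': {3}},
    2: {'a': {2}},          # bad sink
    3: {'a': {3}},          # absorbing goal
}
GOAL, BAD = {3}, {2}

def winning_region(succ, goal, bad):
    cand = set(succ) - bad                      # never visit a bad state
    while True:
        # Keep only actions whose whole support stays inside the candidate set.
        safe = {s: [a for a, t in succ[s].items() if t <= cand] for s in cand}
        # Backward positive-probability reachability of the goal within cand.
        reach = set(goal) & cand
        grew = True
        while grew:
            grew = False
            for s in cand - reach:
                if any(succ[s][a] & reach for a in safe[s]):
                    reach.add(s)
                    grew = True
        if reach == cand:                       # fixed point: almost-sure winning
            return cand
        cand = reach                            # shrink and repeat

print(winning_region(succ, GOAL, BAD))          # -> {0, 1, 3}
```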


2019
Vol 1 (2)
pp. 590-610
Author(s):  
Zohreh Akbari ◽  
Rainer Unland

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenario and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems, which either consist of agents with individual goals and decision-making capabilities that are influenced by other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, a survey of the available swarm RL algorithms gives a clear view of the areas that still require attention. Most studies focus on homogeneous swarms, and the systems introduced so far as Heterogeneous Swarms (HetSs) include only a few, i.e., two or three, sub-swarms of homogeneous agents, which either address a specific sub-problem of the general problem according to their capabilities or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents which were originally designed to solve different problems, and hence have a higher degree of heterogeneity, to behave as a swarm when addressing identical sub-problems. Specifically, the affinity between two agents, which measures their compatibility to work together on a specific sub-problem, is used to design a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.
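
The abstract does not define the affinity measure, so the sketch below assumes a hypothetical one: agents are compatible on a sub-problem when their capability profiles agree on the skills that sub-problem requires. `form_swarm` and all numbers are illustrative, not from the paper.

```python
import numpy as np

def affinity(cap_i, cap_j, needed):
    """Hypothetical compatibility score in (0, 1]: closeness of the
    capability components that the sub-problem actually requires."""
    return float(np.exp(-np.linalg.norm((cap_i - cap_j) * needed)))

def form_swarm(capabilities, needed, threshold=0.8):
    """Greedily grow a swarm of mutually compatible agents."""
    swarm = [0]
    for j in range(1, len(capabilities)):
        if all(affinity(capabilities[i], capabilities[j], needed) >= threshold
               for i in swarm):
            swarm.append(j)
    return swarm

caps = np.array([[0.9, 0.1, 0.7],     # three agents with made-up skill profiles
                 [0.8, 0.2, 0.6],
                 [0.1, 0.9, 0.2]])
needed = np.array([1.0, 0.0, 1.0])    # the sub-problem needs skills 0 and 2
print(form_swarm(caps, needed))       # -> [0, 1]; agent 2 would join another swarm
```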


Sequential Decision Making Using Quantiles

2021
Author(s):  
Xiaocheng Li ◽  
Huaiyang Zhong ◽  
Margaret L. Brandeau

The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regime for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In “Quantile Markov Decision Processes,” X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
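
The one-pass algorithm itself is not reproduced here; as a rough illustration of the quantile objective, the sketch below solves a made-up 3-step MDP with integer rewards via the classical threshold-probability DP on the augmented state (s, remaining target), then reads off the 0.10 quantile as the largest total achievable with probability at least 0.90.

```python
from functools import lru_cache

# Toy model (made-up assumptions): P[s][a] = [(next_state, prob), ...],
# r[s][a] an integer reward. 'risky' pays more now but may strand the
# process in a zero-reward sink.

H = 3
P = {
    0: {'safe': [(0, 1.0)], 'risky': [(1, 0.5), (2, 0.5)]},
    1: {'go': [(1, 1.0)]},
    2: {'go': [(2, 1.0)]},
}
r = {0: {'safe': 1, 'risky': 2}, 1: {'go': 2}, 2: {'go': 0}}

@lru_cache(maxsize=None)
def prob_at_least(s, h, t):
    """Max probability that reward collected from step h onward is >= t."""
    if h == H:
        return 1.0 if t <= 0 else 0.0
    return max(sum(p * prob_at_least(s2, h + 1, t - r[s][a])
                   for s2, p in P[s][a])
               for a in P[s])

def quantile(s0, tau):
    """Largest t with P(total reward >= t) >= 1 - tau (the tau-quantile)."""
    return max(t for t in range(0, 2 * H + 1)
               if prob_at_least(s0, 0, t) >= 1 - tau)

print(quantile(0, 0.10))   # -> 4: guaranteed with probability >= 0.90
print(quantile(0, 0.50))   # -> 6: a median objective tolerates the gamble
```

Note that the optimal behavior depends on the remaining target, which is exactly why the policy must be a function of the augmented state rather than of s alone.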


2011
Vol 48 (4)
pp. 954-967
Author(s):  
Chin Hon Tan ◽  
Joseph C. Hartman

Sequential decision problems can often be modeled as Markov decision processes. Classical solution approaches assume that the parameters of the model are known. However, model parameters are usually estimated and uncertain in practice. As a result, managers are often interested in how estimation errors affect the optimal solution. In this paper we illustrate how sensitivity analysis can be performed directly for a Markov decision process with uncertain reward parameters using the Bellman equations. In particular, we consider problems involving (i) a single stationary parameter, (ii) multiple stationary parameters, and (iii) multiple nonstationary parameters. We illustrate the applicability of this work through a capacitated stochastic lot-sizing problem.
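
As one concrete reading of the Bellman-equation idea for case (i), a single stationary reward parameter (not necessarily the paper's exact derivation): with the optimal policy pi fixed, values and Q-values are affine in the perturbation delta, so each optimality inequality Q(s, pi(s)) >= Q(s, a) bounds delta linearly, and intersecting the bounds gives the range over which pi remains optimal. The MDP below is made up.

```python
import numpy as np

S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] -> dist over s'
R = rng.random((S, A))
s_hat, a_hat = 0, 1                           # the uncertain reward parameter

# 1. Solve the nominal MDP and evaluate its optimal policy exactly.
V = np.zeros(S)
for _ in range(500):
    V = (R + gamma * P @ V).max(axis=1)
pi = (R + gamma * P @ V).argmax(axis=1)
P_pi = P[np.arange(S), pi]
V0 = np.linalg.solve(np.eye(S) - gamma * P_pi, R[np.arange(S), pi])

# 2. Direction of change: V^pi(delta) = V0 + delta * w.
e = np.zeros(S)
e[s_hat] = float(pi[s_hat] == a_hat)          # perturbed reward hit by pi?
w = np.linalg.solve(np.eye(S) - gamma * P_pi, e)

# 3. Q(s, a; delta) = Q0 + delta * slope; intersect the linear bounds.
Q0 = R + gamma * P @ V0
slope = gamma * P @ w
slope[s_hat, a_hat] += 1.0
lo, hi = -np.inf, np.inf
for s in range(S):
    for a in range(A):
        d = slope[s, pi[s]] - slope[s, a]     # delta-coefficient of the gap
        gap = Q0[s, pi[s]] - Q0[s, a]         # nominal optimality gap (>= 0)
        if d > 1e-12:
            lo = max(lo, -gap / d)
        elif d < -1e-12:
            hi = min(hi, -gap / d)
print(f"pi stays optimal for delta in [{lo:.3f}, {hi:.3f}]")
```

Since delta = 0 satisfies every inequality, the interval always contains the nominal estimate, which is what makes it useful as a tolerance range on the estimation error.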

