A Framework for Sequential Planning in Multi-Agent Settings

Journal of Artificial Intelligence Research ◽

10.1613/jair.1579 ◽

2005 ◽

Vol 24 ◽

pp. 49-79 ◽

Cited By ~ 93

Author(s):

P. J. Gmytrasiewicz ◽

P. Doshi

Keyword(s):

Traditional Approach ◽

Value Functions ◽

Value Iteration ◽

Markov Decision ◽

Multi Agent ◽

Carry Over ◽

The Cost ◽

Partially Observable ◽

Belief States ◽

Do So

This paper extends the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian updates to maintain their beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to agent types considered in games of incomplete information. We express the agents' autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and piece-wise linearity and convexity of the value functions carry over to our framework. Our approach complements a more traditional approach to interactive settings which uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria which may be non-unique and do not capture off-equilibrium behaviors. We do so at the cost of having to represent, process and continuously revise models of other agents. Since the agent's beliefs may be arbitrarily nested, the optimal solutions to decision making problems are only asymptotically computable. However, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.

Download Full-text

Optimally Solving Dec-POMDPs as Continuous-State MDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.4623 ◽

2016 ◽

Vol 55 ◽

pp. 443-497 ◽

Cited By ~ 4

Author(s):

Jilles Steeve Dibangoye ◽

Christopher Amato ◽

Olivier Buffet ◽

François Charpillet

Keyword(s):

Heuristic Search ◽

Piecewise Linear ◽

Optimal Solution ◽

Value Iteration ◽

Compact Representations ◽

Continuous State ◽

Markov Decision ◽

Feature Based ◽

Multi Agent ◽

Partially Observable

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.

Download Full-text

IPOMDP-Net: A Deep Neural Network for Partially Observable Multi-Agent Planning Using Interactive POMDPs

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016062 ◽

2019 ◽

Vol 33 ◽

pp. 6062-6069 ◽

Cited By ~ 1

Author(s):

Yanlin Han ◽

Piotr Gmytrasiewicz

Keyword(s):

Neural Network ◽

Network Architecture ◽

State Of The Art ◽

Neural Computing ◽

Neural Network Architecture ◽

Markov Decision ◽

Planning Algorithm ◽

Multi Agent ◽

Partially Observable ◽

Multi Agent Planning

This paper introduces the IPOMDP-net, a neural network architecture for multi-agent planning under partial observability. It embeds an interactive partially observable Markov decision process (I-POMDP) model and a QMDP planning algorithm that solves the model in a neural network architecture. The IPOMDP-net is fully differentiable and allows for end-to-end training. In the learning phase, we train an IPOMDP-net on various fixed and randomly generated environments in a reinforcement learning setting, assuming observable reinforcements and unknown (randomly initialized) model functions. In the planning phase, we test the trained network on new, unseen variants of the environments under the planning setting, using the trained model to plan without reinforcements. Empirical results show that our model-based IPOMDP-net outperforms the other state-of-the-art modelfree network and generalizes better to larger, unseen environments. Our approach provides a general neural computing architecture for multi-agent planning using I-POMDPs. It suggests that, in a multi-agent setting, having a model of other agents benefits our decision-making, resulting in a policy of higher quality and better generalizability.

Download Full-text

Multiobjective genetic algorithm to allocate budgetary resources for condition assessment of water and sewer networks1This paper is one of a selection of papers in this Special Issue on Construction Engineering and Management.

Canadian Journal of Civil Engineering ◽

10.1139/l2012-049 ◽

2012 ◽

Vol 39 (9) ◽

pp. 978-992 ◽

Cited By ~ 11

Author(s):

Ahmed Atef ◽

Hesham Osman ◽

Osama Moselhi

Keyword(s):

Genetic Algorithm ◽

Indirect Costs ◽

Cost Effective ◽

Condition Assessment ◽

Multiobjective Genetic Algorithms ◽

Markov Decision ◽

The Cost ◽

Condition State ◽

Partially Observable ◽

Computational Platform

This paper presents a framework for optimizing condition assessment policies by balancing the revealed value of information with the cost of obtaining such information. The computational platform is based on augmenting the asset condition state with an expected level of accuracy. Inaccuracies due to condition assessment reliability are evaluated using the partially observable Markov decision process. The single objective genetic algorithm is used to select the most cost-effective assets to assess considering information inaccuracy under a fixed budget. The model is extended using multiobjective genetic algorithms and fuzzy set theory to include minimizing the risk exposure based on asset consequence of failure. This methodology takes into consideration direct and indirect costs of sudden infrastructure failure and reduced level of service costs. A case study is presented using the City of Hamilton, Canada, water network to demonstrate the capabilities of the model.

Download Full-text

COG-DICE: An Algorithm for Solving Continuous-Observation Dec-POMDPs

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/638 ◽

2017 ◽

Author(s):

Madison Clark-Turner ◽

Christopher Amato

Keyword(s):

Markov Decision Process ◽

Real World ◽

Decision Process ◽

Extended Version ◽

Continuous Observation ◽

Solution Methods ◽

Markov Decision ◽

Multi Agent ◽

Partially Observable Markov ◽

Partially Observable

The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need for finding an adequate discretization.

Download Full-text

Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/662 ◽

2018 ◽

Cited By ~ 1

Author(s):

Karel Horák ◽

Branislav Bošanský ◽

Krishnendu Chatterjee

Keyword(s):

Heuristic Search ◽

Infinite Horizon ◽

Decision Processes ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Total Cost ◽

Markov Decision ◽

Standard Models ◽

Target States ◽

Partially Observable

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.

Download Full-text

How to Do Things with Words: A Bayesian Approach

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11951 ◽

2020 ◽

Vol 68 ◽

pp. 753-776

Author(s):

Piotr Gmytrasiewicz

Keyword(s):

Bayes Theorem ◽

Theoretic Approach ◽

Value Iteration ◽

Bayesian Decision ◽

Optimality Principle ◽

Communicative Acts ◽

Decision Theoretic Approach ◽

Multi Agent ◽

Communicative Act ◽

Belief States

Communication changes the beliefs of the listener and of the speaker. The value of a communicative act stems from the valuable belief states which result from this act. To model this we build on the Interactive POMDP (IPOMDP) framework, which extends POMDPs to allow agents to model others in multi-agent settings, and we include communication that can take place between the agents to formulate Communicative IPOMDPs (CIPOMDPs). We treat communication as a type of action and therefore, decisions regarding communicative acts are based on decision-theoretic planning using the Bellman optimality principle and value iteration, just as they are for all other rational actions. As in any form of planning, the results of actions need to be precisely specified. We use the Bayes’ theorem to derive how agents update their beliefs in CIPOMDPs; updates are due to agents’ actions, observations, messages they send to other agents, and messages they receive from others. The Bayesian decision-theoretic approach frees us from the commonly made assumption of cooperative discourse – we consider agents which are free to be dishonest while communicating and are guided only by their selfish rationality. We use a simple Tiger game to illustrate the belief update, and to show that the ability to rationally communicate allows agents to improve efficiency of their interactions.

Download Full-text

Perception-Aware Point-Based Value Iteration for Partially Observable Markov Decision Processes

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/329 ◽

2019 ◽

Author(s):

Mahsa Ghasemi ◽

Ufuk Topcu

Keyword(s):

Markov Decision Processes ◽

Active Role ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Greedy Strategy ◽

Markov Decision ◽

Partially Observable Markov ◽

Observation Selection ◽

Partially Observable

In conventional partially observable Markov decision processes, the observations that the agent receives originate from fixed known distributions. However, in a variety of real-world scenarios, the agent has an active role in its perception by selecting which observations to receive. We avoid combinatorial expansion of the action space from integration of planning and perception decisions, through a greedy strategy for observation selection that minimizes an information-theoretic measure of the state uncertainty. We develop a novel point-based value iteration algorithm that incorporates this greedy strategy to pick perception actions for each sampled belief point in each iteration. As a result, not only the solver requires less belief points to approximate the reachable subspace of the belief simplex, but it also requires less computation per iteration. Further, we prove that the proposed algorithm achieves a near-optimal guarantee on value function with respect to an optimal perception strategy, and demonstrate its performance empirically.

Download Full-text

An Algorithm for Belief States Space Compression Using Non-negative Matrix Factorization Updating Rules in Partially Observable Markov Decision Processes

JOURNAL OF ELECTRONICS INFORMATION TECHNOLOGY ◽

10.3724/sp.j.1146.2012.01670 ◽

2014 ◽

Vol 35 (12) ◽

pp. 2901-2907

Author(s):

Bo Wu ◽

Xin Chen ◽

Hong-yan Zheng ◽

Yan-peng Feng

Keyword(s):

Markov Decision Processes ◽

Matrix Factorization ◽

Decision Processes ◽

Markov Decision ◽

Space Compression ◽

Partially Observable Markov ◽

Partially Observable ◽

Belief States ◽

Non Negative Matrix Factorization

Download Full-text

CHQ: a multi-agent reinforcement learning scheme for partially observable markov decision processes

Proceedings. IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2004. (IAT 2004). ◽

10.1109/iat.2004.1342918 ◽

2004 ◽

Author(s):

H. Osada ◽

S. Fujita

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Decision Processes ◽

Learning Scheme ◽

Markov Decision ◽

Multi Agent ◽

Partially Observable Markov ◽

Partially Observable

Download Full-text

Point-Based Monte Carto Online Planning in POMDPs

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.846-847.1388 ◽

2013 ◽

Vol 846-847 ◽

pp. 1388-1391

Author(s):

Bo Wu ◽

Yan Peng Feng ◽

Hong Yan Zheng

Keyword(s):

Mean Squared Error ◽

Search Algorithm ◽

Search Tree ◽

Real Time System ◽

Monte Carlo Tree Search ◽

Online Planning ◽

Markov Decision ◽

Partially Observable ◽

Belief States ◽

Tree Search Algorithm

The online planning and learning in partially observable Markov decision processes are often intractable because belief states space has two curses: dimensionality and history. In order to address this problem, this paper proposes a point-based Monte Carto online planning approach in POMDPs. This approach involves performing value backup at specific reachable belief points, rather than over the entire belief simplex, to speed up computation processes. Then Monte Carlo tree search algorithm is exploited to share the value of actions across each subtree of the search tree so as to minimise the mean squared error. The experimental results show that the proposed algorithm is effective in real-time system.

Download Full-text