Monte Carlo Sampling Methods for Approximating Interactive POMDPs

2009 ◽  
Vol 34 ◽  
pp. 297-337 ◽  
Author(s):  
P. Doshi ◽  
P. J. Gmytrasiewicz

Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single-agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems which represent an agent’s belief about the physical world, about beliefs of other agents, and about their beliefs about others’ beliefs. This modification makes the difficulties of obtaining solutions due to complexity of the belief and policy spaces even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchies and samples and propagates beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity – sometimes also called the curse of history – we utilize a complementary method based on sampling likely observations while building the look-ahead reachability tree. While this approach does not completely address the curse of history, it beats back the curse’s impact substantially. We provide experimental results and chart future work.
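The single-agent particle-filtering step that the interactive PF nests at each level of the belief hierarchy can be sketched as a minimal bootstrap filter (propagate, weight, resample). This is an illustration only, not the paper's interactive PF; the scalar state, Gaussian transition, and observation model are hypothetical:

```python
import math
import random

def bootstrap_pf(particles, observation, transition_std=0.5, obs_std=1.0):
    """One bootstrap particle-filter step: propagate, weight, resample."""
    # Propagate each particle through the (assumed) random-walk transition.
    proposed = [x + random.gauss(0.0, transition_std) for x in particles]
    # Weight each proposal by the Gaussian observation likelihood.
    weights = [math.exp(-0.5 * ((observation - x) / obs_std) ** 2) for x in proposed]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample with replacement, proportional to the weights.
    return random.choices(proposed, weights=weights, k=len(particles))

random.seed(0)
particles = [random.uniform(-5.0, 5.0) for _ in range(500)]
for obs in [1.0, 1.2, 0.9]:
    particles = bootstrap_pf(particles, obs)
estimate = sum(particles) / len(particles)  # posterior mean near the observations
```

The interactive PF repeats this recursion one level down: each particle at level *l* carries its own particle set over the other agent's level *l-1* beliefs.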

Author(s):  
Daxue Liu ◽  
Jun Wu ◽  
Xin Xu

Multi-agent reinforcement learning (MARL) provides a useful and flexible framework for multi-agent coordination in uncertain dynamic environments. However, the generalization ability and scalability of algorithms to large problem sizes, already problematic in single-agent RL, is an even more formidable obstacle in MARL applications. In this paper, a new MARL method based on ordinal action selection and approximate policy iteration, called OAPI (Ordinal Approximate Policy Iteration), is presented to address the scalability issue of MARL algorithms in common-interest Markov Games. In OAPI, an ordinal action selection and learning strategy is integrated with distributed approximate policy iteration not only to simplify the policy space and eliminate the conflicts in multi-agent coordination, but also to realize the approximation of near-optimal policies for Markov Games with large state spaces. Based on the simplified policy space using ordinal action selection, the OAPI algorithm implements distributed approximate policy iteration utilizing online least-squares policy iteration (LSPI). This results in multi-agent coordination with good convergence properties and reduced computational complexity. The simulation results of a coordinated multi-robot navigation task illustrate the feasibility and effectiveness of the proposed approach.
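The policy-evaluation core of LSPI is least-squares temporal difference learning (LSTD), which fits value weights by solving a linear system built from sampled transitions. A minimal sketch on a hypothetical two-state chain with one-hot features (illustrative only, not the OAPI algorithm itself):

```python
def lstd_policy_evaluation(samples, gamma=0.9):
    """LSTD on a 2-state chain with one-hot features: accumulate A and b
    from (state, reward, next_state) samples, then solve A w = b."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s, r, s_next in samples:
        phi = [1.0 if i == s else 0.0 for i in range(2)]
        phi_next = [1.0 if i == s_next else 0.0 for i in range(2)]
        for i in range(2):
            for j in range(2):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
            b[i] += phi[i] * r
    # Solve the 2x2 system by Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    w0 = (b[0] * A[1][1] - b[1] * A[0][1]) / det
    w1 = (b[1] * A[0][0] - b[0] * A[1][0]) / det
    return [w0, w1]

# Deterministic chain: state 0 yields reward 1 and moves to 1; state 1 yields 0, back to 0.
values = lstd_policy_evaluation([(0, 1.0, 1), (1, 0.0, 0)])
# values recovers V(0) = 1 / (1 - 0.81) and V(1) = 0.9 * V(0).
```

With one-hot features LSTD reduces to exact tabular policy evaluation; LSPI alternates this evaluation with greedy policy improvement.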


Author(s):  
J. D. Annan ◽  
J. C. Hargreaves

In this paper, we review progress towards efficiently estimating parameters in climate models. Since the general problem is inherently intractable, a range of approximations and heuristic methods have been proposed. Simple Monte Carlo sampling methods, although easy to implement and very flexible, are rather inefficient, making implementation possible only in the very simplest models. More sophisticated methods based on random walks and gradient-descent methods can provide more efficient solutions, but it is often unclear how to extract probabilistic information from such methods and the computational costs are still generally too high for their application to state-of-the-art general circulation models (GCMs). The ensemble Kalman filter is an efficient Monte Carlo approximation which is optimal for linear problems, but we show here how its accuracy can degrade in nonlinear applications. Methods based on particle filtering may provide a solution to this problem but have yet to be studied in any detail in the realm of climate models. Statistical emulators show great promise for future research and their computational speed would eliminate much of the need for efficient sampling techniques. However, emulation of a full GCM has yet to be achieved and the construction of such represents a substantial computational task in itself.
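The ensemble Kalman filter's analysis step can be sketched for a scalar parameter with a perturbed-observation update: the Kalman gain is estimated from ensemble covariances, which is exact for linear observation operators (the source of the nonlinear degradation noted above). The parameter, observation operator, and noise levels here are hypothetical:

```python
import random

def enkf_update(ensemble, y_obs, h, obs_var=0.1):
    """One EnKF analysis step for a scalar parameter (linear-Gaussian sketch)."""
    n = len(ensemble)
    preds = [h(t) for t in ensemble]
    t_mean = sum(ensemble) / n
    p_mean = sum(preds) / n
    # Sample covariance between parameter and prediction, and prediction variance.
    cov_tp = sum((t - t_mean) * (p - p_mean) for t, p in zip(ensemble, preds)) / (n - 1)
    var_p = sum((p - p_mean) ** 2 for p in preds) / (n - 1)
    gain = cov_tp / (var_p + obs_var)
    # Perturbed-observation update: each member assimilates a jittered observation.
    return [t + gain * (y_obs + random.gauss(0.0, obs_var ** 0.5) - p)
            for t, p in zip(ensemble, preds)]

random.seed(1)
prior = [random.gauss(0.0, 2.0) for _ in range(200)]          # diffuse prior ensemble
posterior = enkf_update(prior, y_obs=6.0, h=lambda t: 2.0 * t)  # true parameter is 3
post_mean = sum(posterior) / len(posterior)
```

When `h` is nonlinear, the gain estimated from these linear covariances is no longer optimal, which is the accuracy degradation discussed in the abstract.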


2007 ◽  
Author(s):  
John T. O. Kirk

How did the cosmos, and our own special part of it, come to be? How did life emerge and how did we arise within it? What can we say about the essential nature of the physical world? What can be said about the physical basis of consciousness? What can science tell or not tell us about the nature and origin of physical and biological reality? Science and Certainty clears away the many misunderstandings surrounding these questions. The book addresses why certain areas of science cause concern to many people today – in particular, those which seem to have implications for the meaning of human existence, and for our significance on this planet and in the universe as a whole. It also examines the tension that can exist between scientific and religious belief systems. Science and Certainty offers an account of what science does, in fact, ask us to believe about the most fundamental aspects of reality and, therefore, the implications of accepting the scientific world view. The author also includes a historical and philosophical background to a number of environmental issues and argues that it is only through science that we can hope to solve these problems. This book will appeal to popular science readers, those with an interest in the environment and the implications of science for the meaning of human existence, as well as students of environmental studies, philosophy, ethics and theology.


Author(s):  
Gregory Bartram ◽  
Sankaran Mahadevan

This paper proposes a methodology for probabilistic prognosis of a system using a dynamic Bayesian network (DBN). Dynamic Bayesian networks are suitable for probabilistic prognosis because of their ability to integrate information in a variety of formats from various sources and give a probabilistic representation of the system state. Further, DBNs provide a platform naturally suited for seamless integration of diagnosis, uncertainty quantification, and prediction. In the proposed methodology, a DBN is used for online diagnosis via particle filtering, providing a current estimate of the joint distribution over the system variables. The information available in the state estimate also helps to quantify the uncertainty in diagnosis. Next, based on this probabilistic state estimate, future states of the system are predicted using the DBN and sequential or recursive Monte Carlo sampling. Prediction in this manner provides the necessary information to estimate the distribution of remaining useful life (RUL). The prognosis procedure, which is system specific, is validated using a suite of offline hierarchical metrics. The prognosis methodology is demonstrated on a hydraulic actuator subject to a progressive seal wear that results in internal leakage between the chambers of the actuator.
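The forward Monte Carlo step that turns a diagnosed state estimate into an RUL distribution can be sketched as follows: each particle from the current state estimate is propagated through an (assumed) stochastic degradation model until it crosses a failure threshold. The wear model, threshold, and diagnosis output below are hypothetical, not the paper's actuator model:

```python
import random

def sample_rul(state_particles, growth_mean=0.1, growth_std=0.02,
               failure_threshold=1.0, max_steps=100):
    """Forward Monte Carlo: propagate each wear particle until the failure
    threshold is crossed; the step counts form the RUL distribution."""
    ruls = []
    for wear in state_particles:
        steps = 0
        while wear < failure_threshold and steps < max_steps:
            # Assumed degradation model: monotone stochastic wear growth.
            wear += max(0.0, random.gauss(growth_mean, growth_std))
            steps += 1
        ruls.append(steps)
    return ruls

random.seed(2)
# Hypothetical diagnosis output: current wear estimated around 0.5 of threshold.
particles = [random.gauss(0.5, 0.05) for _ in range(300)]
ruls = sample_rul(particles)
mean_rul = sum(ruls) / len(ruls)  # roughly (1.0 - 0.5) / 0.1 = 5 steps
```

Because the whole particle set is propagated, the output is a full RUL distribution, so confidence bounds come for free rather than only a point estimate.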


2019 ◽  
Vol 48 (4) ◽  
pp. 335-350 ◽  
Author(s):  
Akuro Big-Alabo

This paper presents approximate periodic solutions to the anharmonic (i.e. non-sinusoidal) response of a simple pendulum undergoing moderate- to large-amplitude oscillations. The approximate solutions were derived by using a modified continuous piecewise linearization method that enabled very accurate solutions to the pendulum oscillations for the entire range of possible amplitudes, i.e. [Formula: see text]. The present solution method is very simple and can be used to obtain amplitude-frequency solutions as well as the displacement and velocity histories of the simple pendulum without the need for a complementary method. The purpose of this paper is to present simple and accurate approximate analytical solutions to the large-amplitude oscillations of the simple pendulum that can be applied by undergraduates.
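The exact large-amplitude period, against which such approximate solutions are typically benchmarked, is T = (4/ω₀)·K(k) with k = sin(θ₀/2), where K is the complete elliptic integral of the first kind. A short numerical sketch (not the paper's piecewise linearization method) computes K via the arithmetic-geometric mean:

```python
import math

def pendulum_period(theta0, omega0=1.0):
    """Exact simple-pendulum period T = (4/omega0) * K(k), k = sin(theta0/2),
    with K evaluated by the arithmetic-geometric mean; theta0 in radians."""
    k = math.sin(theta0 / 2.0)
    a, b = 1.0, math.sqrt(1.0 - k * k)
    while abs(a - b) > 1e-12:
        a, b = (a + b) / 2.0, math.sqrt(a * b)  # AGM iteration
    K = math.pi / (2.0 * a)
    return 4.0 * K / omega0

small = pendulum_period(0.01)                 # recovers the small-angle limit 2*pi/omega0
large = pendulum_period(math.radians(150.0))  # noticeably longer than the linear prediction
```

At 150° amplitude the exact period is roughly 76% longer than the small-angle value, which is why a sinusoidal (harmonic) approximation fails for large swings.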


2008 ◽  
Vol 32 ◽  
pp. 169-202 ◽  
Author(s):  
C. V. Goldman ◽  
S. Zilberstein

Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions.
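The final reduction above, choosing the best time to communicate, can be illustrated with a toy trade-off model (purely illustrative; this is not the paper's Dec-SMDP-Com formulation). Suppose each message costs a fixed amount and expected miscoordination cost grows linearly between synchronizations; the best period balances the two:

```python
def best_sync_period(comm_cost, drift_cost, max_period=50):
    """Toy model: amortized per-step cost of communicating every k steps is
    comm_cost / k (message cost) plus drift_cost * (k - 1) / 2 (average
    miscoordination accumulated since the last sync)."""
    costs = {k: comm_cost / k + drift_cost * (k - 1) / 2.0
             for k in range(1, max_period + 1)}
    return min(costs, key=costs.get)

# Hypothetical costs: a message costs 8 units; drift costs 1 unit per elapsed step.
k_star = best_sync_period(comm_cost=8.0, drift_cost=1.0)
```

Communicating every step wastes message cost, while never communicating lets miscoordination grow unboundedly; the minimum lies in between, here at a period of 4 steps.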


2021 ◽  
Vol 24 (2) ◽  
pp. 1814-1820
Author(s):  
Brenda Ng ◽  
Carol Meyers ◽  
Kofi Boakye ◽  
John Nitao

We examine the suitability of using decision processes to model real-world systems of intelligent adversaries. Decision processes have long been used to study cooperative multiagent interactions, but their practical applicability to adversarial problems has received minimal study. We address the pros and cons of applying sequential decision-making in this area, using the crime of money laundering as a specific example. Motivated by case studies, we abstract out a model of the money laundering process, using the framework of interactive partially observable Markov decision processes (I-POMDPs). We address why this framework is well suited for modeling adversarial interactions. Particle filtering and value iteration are used to solve the model, with the application of different pruning and look-ahead strategies to assess the tradeoffs between solution quality and algorithmic run time. Our results show that there is a large gap in the level of realism that can currently be achieved by such decision models, largely due to computational demands that limit the size of problems that can be solved. While these results represent solutions to a simplified model of money laundering, they illustrate nonetheless the kinds of agent interactions that cannot be captured by standard approaches such as anomaly detection. This implies that I-POMDP methods may be valuable in the future, when algorithmic capabilities have further evolved.

