Efficient Planning under Uncertainty with Macro-actions

2011 ◽  
Vol 40 ◽  
pp. 523-570 ◽  
Author(s):  
R. He ◽  
E. Brunskill ◽  
N. Roy

Deciding how to act in partially observable environments remains an active area of research. Identifying good sequences of decisions is particularly challenging when good control performance requires planning multiple steps into the future in domains with many states. Towards addressing this challenge, we present an online, forward-search algorithm called the Posterior Belief Distribution (PBD). PBD leverages a novel method for calculating the posterior distribution over beliefs that result after a sequence of actions is taken, given the set of observation sequences that could be received during this process. This method allows us to efficiently evaluate the expected reward of sequences of primitive actions, which we refer to as macro-actions. We present a formal analysis of our approach and examine its performance in two very large simulation experiments: a scientific exploration domain and a target monitoring domain. We also demonstrate our algorithm controlling a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in real-world, large partially observable domains where a multi-step lookahead is required to achieve good performance.
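As a rough illustration of the computation a forward search must perform, the sketch below evaluates the expected discounted reward of a fixed primitive-action sequence on a hypothetical two-state POMDP by enumerating every observation sequence. The T, Z, and R matrices are illustrative assumptions, not the paper's domains, and this exhaustive enumeration is exactly the exponential cost that PBD's posterior-over-beliefs method is designed to avoid.

```python
import numpy as np

# Hypothetical 2-state, 2-action, 2-observation POMDP (illustrative values).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[a][s][s']: transition model
              [[0.5, 0.5], [0.4, 0.6]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],   # Z[a][s'][o]: observation model
              [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])    # R[a][s]: immediate reward

def belief_update(b, a, o):
    """Bayes filter: predict with T, correct with the observation likelihood Z."""
    pred = b @ T[a]                 # P(s' | b, a)
    post = pred * Z[a][:, o]        # unnormalised posterior
    return post / post.sum()

def macro_action_value(b, actions, gamma=0.95):
    """Expected discounted reward of a primitive-action sequence,
    enumerating all observation sequences (exact, but exponential)."""
    if not actions:
        return 0.0
    a, rest = actions[0], actions[1:]
    value = float(b @ R[a])         # expected immediate reward under b
    pred = b @ T[a]
    for o in range(Z.shape[2]):
        p_o = float(pred @ Z[a][:, o])
        if p_o > 0:
            value += gamma * p_o * macro_action_value(belief_update(b, a, o), rest, gamma)
    return value

b0 = np.array([0.5, 0.5])
print(macro_action_value(b0, [0, 1, 0]))
```

The branching over observations is what makes naive lookahead infeasible at depth; representing the distribution over posterior beliefs analytically, as PBD does, sidesteps this enumeration.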

Author(s):  
Roman Andriushchenko ◽  
Milan Češka ◽  
Sebastian Junges ◽  
Joost-Pieter Katoen

This paper presents a novel method for the automated synthesis of probabilistic programs. The starting point is a program sketch representing a finite family of finite-state Markov chains with related but distinct topologies, and a reachability specification. The method builds on a novel inductive oracle that greedily generates counter-examples (CEs) for violating programs and uses them to prune the family. These CEs leverage the semantics of the family in the form of bounds on its best- and worst-case behaviour provided by a deductive oracle using an MDP abstraction. The method further monitors the performance of the synthesis and adaptively switches between inductive and deductive reasoning. Our experiments demonstrate that the novel CE construction provides a significantly faster and more effective pruning strategy leading to an accelerated synthesis process on a wide range of benchmarks. For challenging problems, such as the synthesis of decentralized partially-observable controllers, we reduce the run-time from a day to minutes.


2016 ◽  
Vol 64 (12) ◽  
Author(s):  
Sven Bodenburg ◽  
Jan Lunze

This paper proposes a novel method to organise the reconfiguration process of decentralised controllers after actuator failures have occurred in an interconnected system. If an actuator fails in a subsystem, only the corresponding control station should be reconfigured, although the fault has effects on other subsystems through the physical couplings. The focus of this paper is on the organisation of the reconfiguration process without a central coordinator. A design agent exists for each subsystem and stores the subsystem model. A local algorithm is presented to gather models from neighbouring design agents with the aim of setting up a model which describes the behaviour of the faulty subsystem including its neighbours. Furthermore, local reconfiguration conditions are proposed to design a virtual actuator so as to guarantee stability of the overall system. As a consequence, the design agents “play” together to gather the model of the faulty subsystem before the reconfigured control station is “plugged into” the control hardware. Plug-and-play reconfiguration is illustrated by an interconnected tank system.


Author(s):  
Cong Chen ◽  
Changhe Yuan

Much effort has been directed at developing algorithms for learning optimal Bayesian network structures from data. When given limited or noisy data, however, the optimal Bayesian network often fails to capture the true underlying network structure. One can potentially address the problem by finding the multiple most likely Bayesian networks (K-Best) in the hope that one of them recovers the true model. However, it is often the case that some of the best models come from the same peak(s) and are very similar to each other, so they tend to fail together. Moreover, many of these models are not even optimal with respect to any causal ordering, and are thus unlikely to be useful. This paper proposes a novel method for finding a set of diverse top Bayesian networks, called modes, such that each network is guaranteed to be optimal in a local neighborhood. Such mode networks are expected to provide much better coverage of the true model. Based on a global-local theorem showing that a mode Bayesian network must be optimal in all local scopes, we introduce an A* search algorithm to efficiently find the top M Bayesian networks, which are highly probable and naturally diverse. Empirical evaluations show that our top mode models have much better diversity as well as accuracy in discovering the true underlying models than those found by K-Best.
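The local-optimality notion behind a mode can be made concrete: a network is a mode if no single edge addition, deletion, or reversal yields a strictly better-scoring DAG. The sketch below checks this for a toy score function, which is an assumption for illustration only, not a real Bayesian network score such as BDeu or BIC.

```python
import itertools

def is_acyclic(adj, n):
    """Kahn's algorithm over an edge set `adj` on nodes 0..n-1."""
    indeg = [0] * n
    for u, v in adj:
        indeg[v] += 1
    queue = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in adj:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == n

def edge_neighbors(adj, n):
    """Candidate structures one edit away: add, delete, or reverse one
    edge. Acyclicity is checked by the caller."""
    for u, v in itertools.permutations(range(n), 2):
        e = (u, v)
        if e in adj:
            yield adj - {e}                      # delete
            yield (adj - {e}) | {(v, u)}         # reverse
        else:
            yield adj | {e}                      # add

def is_mode(adj, n, score):
    """True iff no single-edge edit produces a strictly better DAG."""
    s = score(adj)
    return all(score(nb) <= s
               for nb in edge_neighbors(adj, n) if is_acyclic(nb, n))

# Toy score (an assumption, not a real structure score): reward the
# edge (0, 1) and penalise edge count.
score = lambda adj: (1 if (0, 1) in adj else 0) - 0.1 * len(adj)
print(is_mode(frozenset({(0, 1)}), 2, score))   # no single edit improves it
```

In the paper's setting, the global-local theorem lets this neighborhood check be decomposed into local scopes rather than enumerated naively as here.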


2014 ◽  
Vol 644-650 ◽  
pp. 2169-2172
Author(s):  
Zhi Kong ◽  
Guo Dong Zhang ◽  
Li Fu Wang

This paper develops an improved novel global harmony search (INGHS) algorithm for solving optimization problems. INGHS employs a novel method for generating new solution vectors that enhances the accuracy and convergence rate of the novel global harmony search (NGHS) algorithm. Simulations on five benchmark test functions show that INGHS has a better ability to find the global optimum than the harmony search (HS) algorithm. Compared with NGHS and HS, INGHS is also better in terms of robustness and efficiency.
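The abstract does not spell out the INGHS update rule, so the sketch below shows only the baseline harmony search loop that NGHS and INGHS build on: a memory of candidate solutions, recombination governed by the harmony memory considering rate (hmcr), and occasional pitch adjustment (par). All parameter values here are illustrative assumptions.

```python
import random

def harmony_search(f, dim, bounds, hms=10, hmcr=0.9, par=0.3, iters=2000, seed=0):
    """Minimal harmony search (minimisation): build new harmonies by
    recombining memory entries with occasional pitch adjustment, and
    replace the worst memory entry whenever the new harmony improves on it."""
    rng = random.Random(seed)
    lo, hi = bounds
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    for _ in range(iters):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:              # consider the memory
                x = rng.choice(memory)[d]
                if rng.random() < par:           # pitch adjustment
                    x += rng.uniform(-1, 1) * 0.01 * (hi - lo)
            else:                                # random exploration
                x = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, x)))
        worst = max(memory, key=f)
        if f(new) < f(worst):
            memory[memory.index(worst)] = new
    return min(memory, key=f)

sphere = lambda x: sum(v * v for v in x)
best = harmony_search(sphere, dim=5, bounds=(-10, 10))
print(sphere(best))
```

NGHS and INGHS replace the memory-consideration step with position-updating rules biased toward the best harmony; the surrounding loop structure is the same.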


Energies ◽  
2018 ◽  
Vol 11 (6) ◽  
pp. 1328 ◽  
Author(s):  
Thang Nguyen ◽  
Dieu Vo ◽  
Nguyen Vu Quynh ◽  
Le Van Dai

Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 71
Author(s):  
Zhiyu Xia ◽  
Zhengyi Xu ◽  
Dan Li ◽  
Jianming Wei

Chemical industrial parks, which act as critical infrastructures in many cities, need to be responsive to chemical gas leakage accidents. Once a chemical gas leakage accident occurs, risks of poisoning, fire, and explosion follow. To meet the primary emergency response demands in chemical gas leakage accidents, source tracking technology for chemical gas leakage has been proposed and has evolved. This paper proposes a novel method, the Outlier Mutation Optimization (OMO) algorithm, that aims to quickly and accurately track the source of a chemical gas leak. The OMO algorithm introduces a random walk exploration mode and, building on Swarm Intelligence (SI), increases the probability of individual mutation. Compared with other optimization algorithms, the OMO algorithm has the advantages of a wider exploration range and more convergence modes. In the algorithm test session, a series of chemical gas leakage accident application examples with random parameters is first constructed from the Gaussian plume model; qualitative experiments and analysis of the OMO algorithm are then conducted on these examples. The test results show that the OMO algorithm with default parameters has superior overall performance, including extremely high average calculation accuracy: the optimal value, which represents the error between the final objective function value obtained by the optimization algorithm and the ideal value, reaches 2.464e-15 with 16 sensors, 2.356e-13 with 9 sensors, and 5.694e-23 with 4 sensors. The calculation time is also satisfactory: 12.743 s per 50 runs with 16 sensors, 10.304 s per 50 runs with 9 sensors, and 8.644 s per 50 runs with 4 sensors. An analysis of the OMO algorithm's characteristic parameters demonstrates the flexibility and robustness of the method. In addition, compared with other algorithms, the OMO algorithm obtains excellent leakage source tracing results in the application examples with 16, 9, and 4 sensors, and its accuracy exceeds that of the direct search algorithm, the evolutionary algorithm, and other swarm intelligence algorithms.
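A minimal sketch of the setup the test session describes, assuming a simplified ground-level Gaussian plume with fixed dispersion coefficients (in practice they grow downwind, and the paper's exact model and objective are not given here): sensor readings are generated from a hypothetical source, and the objective an optimizer such as OMO would minimize is the squared error between modelled and measured concentrations.

```python
import math

def plume_concentration(q, xs, ys, x, y, u=2.0, sy=5.0, sz=3.0, h=1.0):
    """Simplified ground-level Gaussian plume concentration at (x, y) for a
    source of strength q at (xs, ys); wind blows along +x at speed u.
    sy, sz are held fixed here for illustration; h is the release height."""
    dx, dy = x - xs, y - ys
    if dx <= 0:
        return 0.0                      # nothing upwind of the source
    return (q / (2 * math.pi * u * sy * sz)
            * math.exp(-dy * dy / (2 * sy * sy))
            * math.exp(-h * h / (2 * sz * sz)))

def tracing_objective(params, sensors):
    """Sum of squared errors between modelled and measured concentrations;
    a source-tracing optimizer would minimise this over (q, xs, ys)."""
    q, xs, ys = params
    return sum((plume_concentration(q, xs, ys, x, y) - c) ** 2
               for x, y, c in sensors)

# Hypothetical true source and synthetic, noise-free sensor readings.
true = (50.0, 0.0, 0.0)
sensors = [(x, y, plume_concentration(true[0], true[1], true[2], x, y))
           for x, y in [(10, 0), (20, 5), (30, -5), (40, 10)]]
print(tracing_objective(true, sensors))   # exactly zero at the true source
```

The "optimal value" figures quoted in the abstract measure how close an optimizer gets this objective to its ideal value of zero.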


Author(s):  
Karel Horák ◽  
Branislav Bošanský ◽  
Krishnendu Chatterjee

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizons. Besides the well-known discounted-sum objective, the indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) we discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging; (2) we present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees; and (3) we show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.


2018 ◽  
Vol 38 (2-3) ◽  
pp. 162-181 ◽  
Author(s):  
Yuanfu Luo ◽  
Haoyu Bai ◽  
David Hsu ◽  
Wee Sun Lee

The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT’s performance when there are critical but rare events, which are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.
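The benefit of importance sampling for rare critical events can be shown in miniature, independent of DESPOT itself: sample a hypothetical rare event at an inflated proposal probability and reweight each sample by the likelihood ratio, so the estimate stays unbiased while the rare outcome is hit far more often. The collision probability, costs, and proposal below are all invented for illustration.

```python
import random

def mc_estimate(reward, p_event, n, rng):
    """Plain Monte Carlo: sample the rare event at its true probability,
    so most runs never observe it."""
    return sum(reward(rng.random() < p_event) for _ in range(n)) / n

def is_estimate(reward, p_event, q_event, n, rng):
    """Importance sampling: sample the event at inflated probability
    q_event, then reweight by the likelihood ratio p/q (or the
    complementary ratio on a miss) to keep the estimate unbiased."""
    total = 0.0
    for _ in range(n):
        hit = rng.random() < q_event
        w = (p_event / q_event) if hit else (1 - p_event) / (1 - q_event)
        total += w * reward(hit)
    return total / n

# Hypothetical task: a rare collision (p = 0.001) costs -1000, else +1,
# so the true expected reward is 0.001 * -1000 + 0.999 * 1 = -0.001.
reward = lambda collided: -1000.0 if collided else 1.0
rng = random.Random(0)
print(mc_estimate(reward, 0.001, 20000, rng))
print(is_estimate(reward, 0.001, 0.5, 20000, rng))
```

The plain estimate's variance is dominated by how many collisions happen to be sampled; the weighted estimate concentrates tightly around the true value, which is the effect IS-DESPOT exploits inside its search tree.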


2011 ◽  
Vol 40 ◽  
pp. 95-142 ◽  
Author(s):  
J. Veness ◽  
K.S. Ng ◽  
M. Hutter ◽  
W. Uther ◽  
D. Silver

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a variety of stochastic and partially observable domains. We conclude by proposing a number of directions for future research.

