Column Generation Algorithms for Constrained POMDPs

2018, Vol. 62, pp. 489-533
Author(s):  
Erwin Walraven ◽  
Matthijs T. J. Spaan

In several real-world domains, one must plan ahead while only finite resources are available for executing the plan. The limited availability of resources imposes constraints on the plans that can be executed, which must be taken into account during planning. A Constrained Partially Observable Markov Decision Process (Constrained POMDP) can be used to model resource-constrained planning problems that involve uncertainty and partial observability. Constrained POMDPs provide a framework for computing policies that maximize expected reward while respecting constraints on a secondary objective such as cost or resource consumption. Column generation for linear programming can be used to obtain Constrained POMDP solutions. This method incrementally adds columns to a linear program, in which each column corresponds to a POMDP policy obtained by solving an unconstrained subproblem. Column generation requires solving a potentially large number of POMDPs, as well as exact evaluation of the resulting policies, which is computationally difficult. We propose a method that solves subproblems in a two-stage fashion using approximation algorithms. First, we use a tailored point-based POMDP algorithm to obtain an approximate subproblem solution. Next, we convert this approximate solution into a policy graph, which we can evaluate efficiently. The resulting algorithm is a new approximate method for Constrained POMDPs in single-agent settings, as well as in settings in which multiple independent agents share a global constraint. Experiments in several domains show that our method outperforms the current state of the art.
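To make the master problem of the column-generation scheme concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes each generated policy column already has a known expected reward R[i] and expected cost C[i] (the numbers and the budget below are hypothetical), and optimizes a randomized mixture over the columns.

```python
# Minimal sketch of the column-generation master LP for a Constrained POMDP.
# Each column i is a POMDP policy with expected reward R[i] and expected
# cost C[i]; the LP picks a probability mixture over columns. scipy's
# linprog minimizes, so we negate the rewards.
import numpy as np
from scipy.optimize import linprog

R = np.array([10.0, 7.5, 3.0])   # expected reward per policy column (assumed)
C = np.array([6.0, 4.0, 1.0])    # expected resource cost per column (assumed)
budget = 4.5                     # cost limit from the secondary-objective constraint

res = linprog(
    c=-R,                                  # maximize expected reward
    A_ub=C.reshape(1, -1), b_ub=[budget],  # cost constraint: C @ x <= budget
    A_eq=np.ones((1, 3)), b_eq=[1.0],      # x is a distribution over columns
    bounds=[(0, None)] * 3,
)
x = res.x  # mixture weights over the policies generated so far
# The dual price of the cost constraint (with scipy's HiGHS backend it is
# exposed as res.ineqlin.marginals) scalarizes the next subproblem: solve an
# unconstrained POMDP with reward r - lambda * c, and add the resulting
# policy as a new column if it has positive reduced cost.
```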

Author(s):  
John Aslanides ◽  
Jan Leike ◽  
Marcus Hutter

State-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of experiments that qualitatively illustrate properties of the resulting policies and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms, which we hope will facilitate further understanding of, and experimentation with, these ideas.
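The common core of these URL agents is a Bayesian mixture over a class of environment models, with posterior weights updated after every percept. The toy Python sketch below illustrates just that belief step under a strong simplifying assumption: the two Bernoulli "environments" are hypothetical stand-ins for the general history-dependent model classes the survey covers.

```python
# Toy sketch of the Bayesian mixture underlying AIXI-style URL agents:
# a posterior over a finite class of environment models, updated after
# each observation.
class BernoulliEnv:
    """Hypothetical memoryless environment emitting obs in {0, 1}."""
    def __init__(self, p):
        self.p = p
    def prob(self, obs):
        return self.p if obs == 1 else 1 - self.p

def update_mixture(weights, models, obs):
    """w_i <- w_i * P_i(obs), renormalized: the agent's belief update."""
    post = [w * m.prob(obs) for w, m in zip(weights, models)]
    z = sum(post)
    return [p / z for p in post]

models = [BernoulliEnv(0.2), BernoulliEnv(0.8)]
weights = [0.5, 0.5]              # uniform prior over the model class
for obs in [1, 1, 0, 1]:          # hypothetical observation stream
    weights = update_mixture(weights, models, obs)
print(weights)                    # posterior concentrates on the p=0.8 model
```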


Author(s):  
Yanlin Han ◽  
Piotr Gmytrasiewicz

This paper introduces the IPOMDP-net, a neural network architecture for multi-agent planning under partial observability. It embeds an interactive partially observable Markov decision process (I-POMDP) model and a QMDP planning algorithm that solves the model in a neural network architecture. The IPOMDP-net is fully differentiable and allows for end-to-end training. In the learning phase, we train an IPOMDP-net on various fixed and randomly generated environments in a reinforcement learning setting, assuming observable reinforcements and unknown (randomly initialized) model functions. In the planning phase, we test the trained network on new, unseen variants of the environments under the planning setting, using the trained model to plan without reinforcements. Empirical results show that our model-based IPOMDP-net outperforms a state-of-the-art model-free network and generalizes better to larger, unseen environments. Our approach provides a general neural computing architecture for multi-agent planning using I-POMDPs. It suggests that, in a multi-agent setting, having a model of other agents benefits our decision-making, resulting in a policy of higher quality and better generalizability.
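For readers unfamiliar with QMDP, the following is a minimal non-neural sketch of the approximation that the IPOMDP-net realizes as differentiable layers: solve the underlying MDP exactly, then score each action against the current belief as if the state became fully observable after one step. The 2-state, 2-action model below is hypothetical.

```python
# Minimal sketch of the QMDP approximation (here in plain numpy, not as
# the paper's differentiable layers).
import numpy as np

def qmdp_q_values(T, R, gamma=0.95, iters=200):
    """T: (A, S, S') transition tensor, R: (S, A) rewards -> Q of shape (S, A)."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                           # V(s) = max_a Q(s, a)
        Q = R + gamma * np.einsum('ast,t->sa', T, V)
    return Q

def qmdp_action(belief, Q):
    """QMDP rule: argmax_a sum_s b(s) Q(s, a)."""
    return int(np.argmax(belief @ Q))

# Hypothetical 2-state, 2-action problem:
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])            # T[a, s, s']
R = np.array([[1.0, 0.0], [0.0, 1.0]])              # R[s, a]
Q = qmdp_q_values(T, R)
print(qmdp_action(np.array([0.7, 0.3]), Q))
```

In the I-POMDP setting, the belief additionally ranges over models of the other agents, but the planning layer follows the same value-iteration-plus-belief-weighting pattern.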


2021
Author(s):  
Shirin Akbarinasaji

Background: Bug tracking systems receive many bug reports daily. Although the software quality team aims to identify and resolve these bugs, it is never able to fix all of the reported bugs in the issue tracking system before the release deadline. However, postponing bug fixing has consequences. Prioritization of bug reports helps the software manager decide which bugs to fix and which to postpone. Typically, bug reports are prioritized based on severity, priority, time and effort for fixing, customer pressure, etc. Aim: Previous studies have shown that these factors may not be appropriate for prioritization, so relying on them to automate bug prioritization might be misleading. In this dissertation, we aim to prioritize bug reports with respect to the consequence of not fixing the bugs, in terms of their relative importance in the issue tracking system. Method: To measure the relative importance of bugs in the issue tracking system, we propose constructing a dependency graph based on the reported blocking-dependency information in the issue tracking system. Two metrics, namely depth and degree, are used to measure the relative importance of the bugs. However, there is uncertainty in the dependency graph structure, as the dependency information is discovered manually and gradually. Owing to this uncertainty, prioritizing bugs in descending order of depth and degree may be misleading. To handle the uncertainty, we propose a novel approach based on a partially observable Markov decision process (POMDP) solved with partially observable Monte Carlo planning (POMCP). Result: To check the feasibility of the proposed approach, we analyzed seven years of data from an open source project, Firefox, and a commercial project. We compared the proposed policy with the developer policy, maximum policy, and random policy. Conclusion: The results suggest that software practitioners do not consider the relative importance of bugs in their current practice. The proposed framework can be combined with practitioners' expertise to prioritize bugs more effectively, taking the depth and degree of bugs into account. In practice, the POMDP framework with the POMCP planner can help practitioners sequentially select bugs so as to minimize the connectivity of the dependency graph.
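The two graph metrics are straightforward to compute once the blocking-dependency graph is built. The Python sketch below, using networkx on a hypothetical set of bug IDs and edges, illustrates the deterministic ranking that the POMDP/POMCP layer then makes robust to uncertain edges.

```python
# Sketch of the depth and degree metrics on a toy blocking-dependency graph;
# an edge u -> v means "bug u blocks bug v". Bug IDs and edges are hypothetical.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("B1", "B2"), ("B1", "B3"), ("B3", "B4"), ("B2", "B4")])

degree = {b: G.out_degree(b) for b in G}  # bugs each bug blocks directly

def depth(g, bug):
    """Length of the longest blocking chain starting at `bug` (DAG assumed)."""
    succ = list(g.successors(bug))
    return 0 if not succ else 1 + max(depth(g, s) for s in succ)

depths = {b: depth(G, b) for b in G}
# Ranking by descending (depth, degree) approximates relative importance;
# fixing high-ranked bugs first reduces the graph's connectivity fastest.
print(sorted(G, key=lambda b: (depths[b], degree[b]), reverse=True))
```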


Author(s):  
Joaquim AP Braga ◽  
António R Andrade

This article models the decision problem of maintaining railway wheelsets as a Markov decision process, with the aim of supporting condition-based maintenance for railway wheelsets. A discussion of the role of railway wheelsets is provided, as well as some background on the technical standards that guide maintenance decisions. A practical example is explored, with the estimation of Markov transition matrices for different condition states that depend on the wheelset diameter, its mileage since the last turning action (or renewal), and the occurrence of damage. Taking all possible maintenance actions into account, an optimal strategy is derived, providing a map of best actions depending on the current state of the wheelset.
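As an illustration of how such a map of best actions falls out of an MDP once the transition matrices are estimated, here is a toy value-iteration sketch in Python. The three condition states, the transition probabilities, and the costs are hypothetical placeholders, not the article's estimates.

```python
# Toy sketch of a wheelset maintenance MDP: condition states abstracting
# diameter/mileage/damage, actions {do nothing, turn, renew}. All numbers
# below are assumed for illustration.
import numpy as np

states = 3                                   # 0 = good, 1 = worn, 2 = damaged
P = {                                        # P[a][s, s']: transitions per action
    "nothing": np.array([[0.80, 0.15, 0.05],
                         [0.00, 0.70, 0.30],
                         [0.00, 0.00, 1.00]]),
    "turn":    np.array([[0.90, 0.10, 0.00],
                         [0.85, 0.10, 0.05],
                         [0.00, 0.70, 0.30]]),
    "renew":   np.array([[1.00, 0.00, 0.00]] * 3),
}
cost = {"nothing": [0, 2, 20], "turn": [1, 3, 25], "renew": [10, 10, 10]}

gamma, V = 0.95, np.zeros(states)
for _ in range(500):                         # value iteration on expected cost
    V = np.min([np.array(cost[a]) + gamma * P[a] @ V for a in P], axis=0)

policy = {s: min(P, key=lambda a: cost[a][s] + gamma * P[a][s] @ V)
          for s in range(states)}
print(policy)                                # map: condition state -> best action
```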

