Probabilistic Relational Planning with First Order Decision Diagrams

Journal of Artificial Intelligence Research ◽

10.1613/jair.3205 ◽

2011 ◽

Vol 41 ◽

pp. 231-266 ◽

Cited By ~ 6

Author(s):

S. Joshi ◽

R. Khardon

Keyword(s):

Dynamic Programming ◽

Planning System ◽

Compact Representation ◽

Iteration Algorithm ◽

Decision Diagrams ◽

Value Functions ◽

Value Iteration ◽

First Order ◽

Stochastic Planning ◽

Planning Problems

Dynamic programming algorithms have been successfully applied to propositional stochastic planning problems by using compact representations, in particular algebraic decision diagrams, to capture domain dynamics and value functions. Work on symbolic dynamic programming lifted these ideas to first order logic using several representation schemes. Recent work introduced a first order variant of decision diagrams (FODD) and developed a value iteration algorithm for this representation. This paper develops several improvements to the FODD algorithm that make the approach practical. These include, new reduction operators that decrease the size of the representation, several speedup techniques, and techniques for value approximation. Incorporating these, the paper presents a planning system, FODD-Planner, for solving relational stochastic planning problems. The system is evaluated on several domains, including problems from the recent international planning competition, and shows competitive performance with top ranking systems. This is the first demonstration of feasibility of this approach and it shows that abstraction through compact representation is a promising approach to stochastic planning.

Download Full-text

First Order Decision Diagrams for Relational MDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.2489 ◽

2008 ◽

Vol 31 ◽

pp. 431-472 ◽

Cited By ~ 18

Author(s):

C. Wang ◽

S. Joshi ◽

R. Khardon

Keyword(s):

Markov Decision Processes ◽

Optimal Policy ◽

Decision Processes ◽

Compact Representation ◽

Iteration Algorithm ◽

Decision Diagrams ◽

Value Iteration ◽

First Order ◽

Relational Structures ◽

Markov Decision

Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.

Download Full-text

Optimal Switching in Anti-Lock Brake Systems of Ground Vehicles Based on Approximate Dynamic Programming

Volume 3: Multiagent Network Systems; Natural Gas and Heat Exchangers; Path Planning and Motion Control; Powertrain Systems; Rehab Robotics; Robot Manipulators; Rollover Prevention (AVS); Sensors and Actuators; Time Delay Systems; Tracking Control Systems; Uncertain Systems and Robustness; Unmanned, Ground and Surface Robotics; Vehicle Dynamics Control; Vibration and Control of Smart Structures/Mech Systems; Vibration Issues in Mechanical Systems ◽

10.1115/dscc2015-9893 ◽

2015 ◽

Cited By ~ 2

Author(s):

Tohid Sardarmehni ◽

Ali Heydari

Keyword(s):

Dynamic Programming ◽

Approximate Dynamic Programming ◽

Infinite Horizon ◽

Brake System ◽

Iteration Algorithm ◽

Value Iteration ◽

Ground Vehicles ◽

Optimal Switching ◽

Hydraulic Brake ◽

And Control

Approximate dynamic programming, also known as reinforcement learning, is applied for optimal control of Antilock Brake Systems (ABS) in ground vehicles. As an accurate and control oriented model of the brake system, quarter vehicle model with hydraulic brake system is selected. Due to the switching nature of hydraulic brake system of ABS, an optimal switching solution is generated through minimizing a performance index that penalizes the braking distance and forces the vehicle velocity to go to zero, while preventing wheel lock-ups. Towards this objective, a value iteration algorithm is selected for ‘learning’ the infinite horizon solution. Artificial neural networks, as powerful function approximators, are utilized for approximating the value function. The training is conducted offline using least squares. Once trained, the converged neural network is used for determining optimal decisions for the actuators on the fly. Numerical simulations show that this approach is very promising while having low real-time computational burden, hence, outperforms many existing solutions in the literature.

Download Full-text

Adaptive dynamic programming with stable value iteration algorithm for discrete-time nonlinear systems

The 2012 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2012.6252512 ◽

2012 ◽

Cited By ~ 12

Author(s):

Qinglai Wei ◽

Derong Liu

Keyword(s):

Dynamic Programming ◽

Nonlinear Systems ◽

Discrete Time ◽

Adaptive Dynamic Programming ◽

Iteration Algorithm ◽

Value Iteration ◽

Adaptive Dynamic ◽

Value Iteration Algorithm

Download Full-text

The value iteration algorithm is not strongly polynomial for discounted dynamic programming

Operations Research Letters ◽

10.1016/j.orl.2013.12.011 ◽

2014 ◽

Vol 42 (2) ◽

pp. 130-131 ◽

Cited By ~ 12

Author(s):

Eugene A. Feinberg ◽

Jefferson Huang

Keyword(s):

Dynamic Programming ◽

Iteration Algorithm ◽

Value Iteration ◽

Discounted Dynamic Programming ◽

Value Iteration Algorithm ◽

Strongly Polynomial

Download Full-text

Approximate Value Iteration in the Reinforcement Learning Context. Application to Electrical Power System Control.

International Journal of Emerging Electric Power Systems ◽

10.2202/1553-779x.1066 ◽

2005 ◽

Vol 3 (1) ◽

Cited By ~ 14

Author(s):

Damien Ernst ◽

Mevludin Glavic ◽

Pierre Geurts ◽

Louis Wehenkel

Keyword(s):

Reinforcement Learning ◽

Power System ◽

Control Problem ◽

Learning Algorithm ◽

Electrical Power ◽

Complex Case ◽

Iteration Algorithm ◽

Value Iteration ◽

Learning Context ◽

Power System Control

In this paper we explain how to design intelligent agents able to process the information acquired from interaction with a system to learn a good control policy and show how the methodology can be applied to control some devices aimed to damp electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem and the information acquired from interaction with the system is a set of samples, where each sample is composed of four elements: a state, the action taken while being in this state, the instantaneous reward observed and the successor state of the system. To process this information we consider reinforcement learning algorithms that determine an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried on a benchmark power system modeled with two state variables. Then we present a more complex case study on a four-machine power system where the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) aimed to damp power system oscillations.

Download Full-text

A Model-Based Factored Bayesian Reinforcement Learning Approach

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1092 ◽

2014 ◽

Vol 513-517 ◽

pp. 1092-1095

Author(s):

Bo Wu ◽

Yan Peng Feng ◽

Hong Yan Zheng

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Iteration Algorithm ◽

Value Iteration ◽

Practical Applications ◽

Model Based ◽

Online Planning ◽

Bayesian Reinforcement Learning ◽

Bayesian Inference Method ◽

Unknown Structure

Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the learning parameters with exponential growth are the main impediment for online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. Firstly, we exploit a factored representation to describe the states to reduce the size of learning parameters, and adopt Bayesian inference method to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach is an effective way for improving the learning efficiency in large-scale state spaces.

Download Full-text

Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm

Journal of Dynamic Systems Measurement and Control ◽

10.1115/1.4032875 ◽

2016 ◽

Vol 138 (6) ◽

Author(s):

Thai Duong ◽

Duong Nguyen-Huu ◽

Thinh Nguyen

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Transition Probability ◽

Transition Probability Matrix ◽

Rate Of Change ◽

Optimal Decision ◽

Iteration Algorithm ◽

Value Iteration ◽

Markov Decision ◽

Value Iteration Algorithm

Markov decision process (MDP) is a well-known framework for devising the optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment which is characterized by a time-invariant transition probability matrix. However, in many real-world scenarios, this assumption is not justified, thus the optimal strategy might not provide the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem under nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired from quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework.

Download Full-text

Prädiktive Regelung mit Dynamischer Programmierung für nichtlineare Systeme erster Ordnung (Predictive Control with Dynamic Programming for First Order Nonlinear Systems)

at - Automatisierungstechnik ◽

10.1524/auto.51.12.547.22700 ◽

2003 ◽

Vol 51 (12-2003) ◽

pp. 547-554 ◽

Cited By ~ 3

Author(s):

Michael Back ◽

Stephan Terwen

Keyword(s):

Dynamic Programming ◽

Nonlinear Systems ◽

Predictive Control ◽

First Order ◽

Nichtlineare Systeme

Download Full-text

nso-HSVI: A Not-So-Optimistic Heuristic Search Value Iteration Algorithm for POMDPs

2014 IEEE 26th International Conference on Tools with Artificial Intelligence ◽

10.1109/ictai.2014.108 ◽

2014 ◽

Author(s):

Feng Liu ◽

Haibo Li ◽

Chongjun Wang

Keyword(s):

Heuristic Search ◽

Iteration Algorithm ◽

Value Iteration ◽

Value Iteration Algorithm

Download Full-text

Lifted Fact-Alternating Mutex Groups and Pruned Grounding of Classical Planning Problems

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i06.6536 ◽

2020 ◽

Vol 34 (06) ◽

pp. 9835-9842

Author(s):

Daniel Fišer

Keyword(s):

Positive Impact ◽

Planning System ◽

Inference Algorithm ◽

Planning Problems ◽

Pruning Techniques

In this paper, we focus on the inference of mutex groups in the lifted (PDDL) representation. We formalize the inference and prove that the most commonly used translator from the Fast Downward (FD) planning system infers a certain subclass of mutex groups, called fact-alternating mutex groups (fam-groups). Based on that, we show that the previously proposed fam-groups-based pruning techniques for the STRIPS representation can be utilized during the grounding process with lifted fam-groups, i.e., before the full STRIPS representation is known. Furthermore, we propose an improved inference algorithm for lifted fam-groups that produces a richer set of fam-groups than the FD translator and we demonstrate a positive impact on the number of pruned operators and overall coverage.

Download Full-text