Continuous State-Action Space Advantage-Learning Using Interval Analysis and Neural Networks

In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises; namely, that of how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.

Download Full-text

Q-Learning in Continuous State-Action Space with Noisy and Redundant Inputs by Using a Selective Desensitization Neural Network

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2015.p0825 ◽

2015 ◽

Vol 19 (6) ◽

pp. 825-832 ◽

Cited By ~ 2

Author(s):

Takaaki Kobayashi ◽

◽

Takeshi Shibuya ◽

Masahiko Morita

Keyword(s):

Neural Network ◽

Real World ◽

Value Function ◽

Action Space ◽

Sensor Noise ◽

Q Learning ◽

State Action ◽

Continuous State ◽

Real World Applications ◽

Function Approximator

When applying reinforcement learning (RL) algorithms such as Q-learning to real-world applications, we must consider the influence of sensor noise. The simplest way to reduce such noise influence is to additionally use other types of sensors, but this may require more state space -- and probably increase redundancy. Conventional value-function approximators used to RL in continuous state-action space do not deal appropriately with such situations. The selective desensitization neural network (SDNN) has high generalization ability and robustness against noise and redundant input. We therefore propose an SDNN-based value-function approximator for Q-learning in continuous state-action space, and evaluate its performance in terms of robustness against redundant input and sensor noise. Results show that our proposal is strongly robust against noise and redundant input and enables the agent to take better actions by using additional inputs without degrading learning efficiency. These properties are eminently advantageous in real-world applications such as in robotic systems.

Download Full-text

Compositional RL Agents That Follow Language Commands in Temporal Logic

Frontiers in Robotics and AI ◽

10.3389/frobt.2021.689550 ◽

2021 ◽

Vol 8 ◽

Author(s):

Yen-Ling Kuo ◽

Boris Katz ◽

Andrei Barbu

Keyword(s):

Neural Networks ◽

Temporal Logic ◽

Recurrent Neural Networks ◽

Prior Work ◽

State Action ◽

Compositional Structure ◽

Learning Agent ◽

Continuous State ◽

Compositional Structures ◽

Action Spaces

We demonstrate how a reinforcement learning agent can use compositional recurrent neural networks to learn to carry out commands specified in linear temporal logic (LTL). Our approach takes as input an LTL formula, structures a deep network according to the parse of the formula, and determines satisfying actions. This compositional structure of the network enables zero-shot generalization to significantly more complex unseen formulas. We demonstrate this ability in multiple problem domains with both discrete and continuous state-action spaces. In a symbolic domain, the agent finds a sequence of letters that satisfy a specification. In a Minecraft-like environment, the agent finds a sequence of actions that conform to a formula. In the Fetch environment, the robot finds a sequence of arm configurations that move blocks on a table to fulfill the commands. While most prior work can learn to execute one formula reliably, we develop a novel form of multi-task learning for RL agents that allows them to learn from a diverse set of tasks and generalize to a new set of diverse tasks without any additional training. The compositional structures presented here are not specific to LTL, thus opening the path to RL agents that perform zero-shot generalization in other compositional domains.

Download Full-text