Safe Deployment of a Reinforcement Learning Robot Using Self Stabilization

2021 ◽  
Author(s):  
Nanda Kishore Sreenivas ◽  
Shrisha Rao

In toy environments like video games, a reinforcement learning agent is deployed and operates within the same state space in which it was trained. However, in robotics applications such as industrial systems or autonomous vehicles, this cannot be guaranteed. A robot can be pushed out of its training space by some unforeseen perturbation, which may leave it in an unknown state from which it has not been trained to move towards its goal. While most prior work on RL safety focuses on ensuring safety during the training phase, this paper focuses on ensuring the safe deployment of a robot that has already been trained to operate within a safe space. This work defines a condition on the state and action spaces that, if satisfied, guarantees that the robot can independently recover to safety. We also propose a strategy and design that facilitate this recovery within a finite number of steps after a perturbation. This is implemented and tested against a standard RL model, and the results indicate much-improved performance.
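The recovery condition and strategy can be pictured with a small sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes a hypothetical 2D continuous state space whose trained-safe region is an axis-aligned box, and a fallback controller that steps toward the nearest safe state after a perturbation.

```python
# A minimal sketch of deployment-time self-stabilization. All names
# (SafeBox, step_with_recovery) are illustrative assumptions, not the
# paper's implementation.
import numpy as np

class SafeBox:
    """Axis-aligned box approximating the region the policy was trained on."""
    def __init__(self, low, high):
        self.low, self.high = np.asarray(low), np.asarray(high)

    def contains(self, s):
        return bool(np.all(s >= self.low) and np.all(s <= self.high))

    def project(self, s):
        # Nearest point inside the box; the direction toward it defines
        # a stabilizing move back toward safety.
        return np.clip(s, self.low, self.high)

def step_with_recovery(s, policy, safe, max_recovery_steps=50, step_size=0.1):
    """Follow the trained policy inside the safe set; otherwise stabilize."""
    for _ in range(max_recovery_steps):
        if safe.contains(s):
            return policy(s), s          # back in known territory
        direction = safe.project(s) - s  # move toward the safe set
        s = s + step_size * direction / (np.linalg.norm(direction) + 1e-9)
    raise RuntimeError("recovery condition violated: safe set not reached")
```

The key property mirrors the paper's condition: as long as some action always reduces the distance to the safe set, recovery terminates within a finite number of steps.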


Author(s):  
Nancy Fulda ◽  
Daniel Ricks ◽  
Ben Murdoch ◽  
David Wingate

Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance extraction is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a tagged Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance in most cases. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.
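As a rough illustration of querying word vectors with linear algebra, the sketch below prunes verbs by cosine similarity to an object noun. The embeddings here are random stand-ins; a faithful reproduction would load vectors trained on the tagged Wikipedia corpus described above.

```python
# A toy sketch of affordance-based action pruning with word vectors,
# assuming embeddings are already trained (random stand-ins here).
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50)
       for w in ["open", "eat", "read", "door", "apple", "book"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def afforded_verbs(noun, verbs, threshold=0.0):
    """Keep only verbs whose embedding is close to the object's embedding."""
    return [v for v in verbs if cosine(emb[v], emb[noun]) > threshold]

# e.g., prune the action set for "door" before the RL agent searches it
print(afforded_verbs("door", ["open", "eat", "read"]))
```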


2003 ◽  
Vol 19 ◽  
pp. 569-629 ◽  
Author(s):  
B. Price ◽  
C. Boutilier

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model: one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors that have different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
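The core idea, extracting value information from a mentor's observed transitions, can be sketched in a few lines of tabular code. The snippet below assumes the homogeneous case (learner and mentor share abilities) and uses illustrative names; the paper's full model integrates this with prioritized sweeping.

```python
# A minimal tabular sketch of implicit imitation: the learner observes a
# mentor's state transitions (not its actions) and folds the implied
# backup into its own Bellman updates. Illustrative only.
from collections import defaultdict

gamma = 0.95
V = defaultdict(float)
mentor_next = {}  # s -> most recently observed mentor successor state

def observe_mentor(s, s_next):
    mentor_next[s] = s_next

def backup(s, own_model, reward):
    """own_model: dict action -> (s_next, prob); reward: callable r(s)."""
    own = max((reward(s) + gamma * V[sn] for sn, _ in own_model.values()),
              default=float("-inf"))
    # Mentor-implied value: assume some action of ours duplicates the
    # mentor's observed transition (the identical-abilities case).
    mentored = (reward(s) + gamma * V[mentor_next[s]]
                if s in mentor_next else float("-inf"))
    V[s] = max(own, mentored)
```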


2013 ◽  
Vol 2013 ◽  
pp. 1-10
Author(s):  
Víctor Uc-Cetina

We introduce a reinforcement learning architecture designed for problems with an infinite number of states, where each state can be seen as a vector of real numbers, and a finite number of actions, where each action requires a vector of real numbers as parameters. The main objective of this architecture is to distribute the work required to learn the final policy across two actors: one actor decides which action must be performed, while a second actor determines the right parameters for the selected action. We tested our architecture, and one algorithm based on it, on the robot dribbling problem, a challenging robot control problem taken from the RoboCup competitions. Our experimental work with three different function approximators provides strong evidence that the proposed architecture can be used to implement fast, robust, and reliable reinforcement learning algorithms.
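A minimal sketch of the two-actor split follows, assuming linear function approximators and illustrative names; the paper's algorithm adds the learning rules that train both actors.

```python
# A sketch of the two-actor architecture for parameterized actions,
# assuming untrained linear approximators. Names are illustrative.
import numpy as np

class DiscreteActor:
    """Decides which discrete action to perform in a given state."""
    def __init__(self, n_actions, state_dim, rng):
        self.W = rng.normal(scale=0.1, size=(n_actions, state_dim))

    def act(self, s):
        return int(np.argmax(self.W @ s))

class ParameterActor:
    """Determines the real-valued parameters for the chosen action."""
    def __init__(self, n_actions, state_dim, param_dim, rng):
        self.W = rng.normal(scale=0.1, size=(n_actions, param_dim, state_dim))

    def params(self, s, a):
        return self.W[a] @ s

rng = np.random.default_rng(0)
actor1 = DiscreteActor(n_actions=3, state_dim=4, rng=rng)
actor2 = ParameterActor(n_actions=3, state_dim=4, param_dim=2, rng=rng)
s = rng.normal(size=4)
a = actor1.act(s)
theta = actor2.params(s, a)   # e.g., kick direction and power for dribbling
```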


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2032
Author(s):  
Sampo Kuutti ◽  
Richard Bowden ◽  
Saber Fallah

The use of neural networks and reinforcement learning has become increasingly popular in autonomous vehicle control. However, the opaqueness of the resulting control policies presents a significant barrier to deploying neural network-based control in autonomous vehicles. In this paper, we present a reinforcement learning based approach to autonomous vehicle longitudinal control, in which rule-based safety cages provide enhanced safety for the vehicle as well as weak supervision to the reinforcement learning agent. By guiding the agent towards meaningful states and actions, this weak supervision improves convergence during training and enhances the safety of the final trained policy. The rule-based supervisory controller has the further advantage of being fully interpretable, thereby enabling traditional validation and verification approaches to ensure the safety of the vehicle. We compare models with and without safety cages, as well as models with optimal and constrained model parameters, and show that the weak supervision consistently improves the safety of exploration, the speed of convergence, and model performance. Additionally, we show that when the model parameters are constrained or sub-optimal, the safety cages can enable a model to learn a safe driving policy even when it could not be trained to drive through reinforcement learning alone.
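The interplay between the cage and the agent can be sketched as follows. This assumes a simple time-headway rule with an illustrative threshold, not the paper's exact cage design; the intervention flag is what can be fed back to the agent as weak supervision (e.g., as a penalty during training).

```python
# A sketch of a rule-based safety cage for longitudinal control, assuming
# a hypothetical time-headway rule. Thresholds are illustrative.
def safety_cage(ego_speed, gap, rl_accel, min_headway=2.0, max_brake=-6.0):
    """Return (accel, intervened): override the RL action if the time
    headway to the lead vehicle falls below a safe bound."""
    headway = gap / max(ego_speed, 0.1)   # seconds to reach the lead vehicle
    if headway < min_headway:
        return max_brake, True            # cage takes over; the flag is the
                                          # weak supervisory signal
    return rl_accel, False
```

Because the rule is a few interpretable lines rather than a neural network, it can be validated and verified with traditional methods, which is the point the abstract emphasizes.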


2020 ◽  
Vol 34 (04) ◽  
pp. 4328-4336
Author(s):  
Vishal Jain ◽  
William Fedus ◽  
Hugo Larochelle ◽  
Doina Precup ◽  
Marc G. Bellemare

Text-based games are a natural challenge domain for deep reinforcement learning algorithms. Their state and action spaces are combinatorially large, their reward function is sparse, and they are partially observable: the agent is informed of the consequences of its actions through textual feedback. In this paper we emphasize this latter point and consider the design of a deep reinforcement learning agent that can play from feedback alone. Our design recognizes and takes advantage of the structural characteristics of text-based games. We first propose a contextualisation mechanism, based on accumulated reward, which simplifies the learning problem and mitigates partial observability. We then study different methods that rely on the notion that most actions are ineffectual in any given situation, following Zahavy et al.'s idea of an admissible action. We evaluate these techniques in a series of text-based games of increasing difficulty based on the TextWorld framework, as well as the iconic game Zork. Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.
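A sketch of admissibility-based pruning combined with reward-based contextualisation might look as follows, assuming hypothetical admissible_prob and q_value predictors (the paper trains such components; the names here are illustrative).

```python
# A sketch of admissible-action pruning in a text game. admissible_prob
# and q_value stand in for learned predictors; both names are assumptions.
def select_command(feedback, commands, admissible_prob, q_value,
                   accumulated_reward, threshold=0.5):
    # Contextualise on accumulated reward, a cheap proxy for game progress
    # that mitigates partial observability.
    context = (feedback, accumulated_reward)
    candidates = [c for c in commands
                  if admissible_prob(context, c) > threshold]
    if not candidates:            # fall back rather than act on nothing
        candidates = commands
    return max(candidates, key=lambda c: q_value(context, c))
```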


Author(s):  
Anderson Rocha Tavares ◽  
Sivasubramanian Anbalagan ◽  
Leandro Soriano Marcolino ◽  
Luiz Chaimowicz

Large state and action spaces are very challenging for reinforcement learning. However, in many domains there is a set of algorithms available, each of which estimates the best action given a state. Hence, agents can either directly learn a performance-maximizing mapping from states to actions, or from states to algorithms. We investigate several aspects of this dilemma, showing sufficient conditions under which learning over algorithms outperforms learning over actions for a finite number of training iterations. We present synthetic experiments to further study such systems. Finally, we propose a function approximation approach, demonstrating the effectiveness of learning over algorithms in real-time strategy games.
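Learning over algorithms reduces to ordinary RL whose action set is the portfolio. The sketch below shows tabular Q-learning over algorithms under an assumed minimal env interface (reset/step returning state, reward, done); all names are illustrative.

```python
# A minimal sketch of learning over algorithms: tabular Q-learning whose
# "actions" are members of an algorithm portfolio. Each portfolio entry
# is assumed to be a callable state -> primitive action.
import random
from collections import defaultdict

def learn_over_algorithms(env, portfolio, episodes=100,
                          alpha=0.1, gamma=0.99, eps=0.1):
    Q = defaultdict(lambda: [0.0] * len(portfolio))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy choice of which algorithm to delegate to
            i = (random.randrange(len(portfolio)) if random.random() < eps
                 else max(range(len(portfolio)), key=lambda j: Q[s][j]))
            s2, r, done = env.step(portfolio[i](s))
            Q[s][i] += alpha * (r + gamma * max(Q[s2]) - Q[s][i])
            s = s2
    return Q
```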


2021 ◽  
Vol 8 ◽  
Author(s):  
Yen-Ling Kuo ◽  
Boris Katz ◽  
Andrei Barbu

We demonstrate how a reinforcement learning agent can use compositional recurrent neural networks to learn to carry out commands specified in linear temporal logic (LTL). Our approach takes as input an LTL formula, structures a deep network according to the parse of the formula, and determines satisfying actions. This compositional structure of the network enables zero-shot generalization to significantly more complex unseen formulas. We demonstrate this ability in multiple problem domains with both discrete and continuous state-action spaces. In a symbolic domain, the agent finds a sequence of letters that satisfies a specification. In a Minecraft-like environment, the agent finds a sequence of actions that conform to a formula. In the Fetch environment, the robot finds a sequence of arm configurations that move blocks on a table to fulfill the commands. While most prior work can learn to execute one formula reliably, we develop a novel form of multi-task learning for RL agents that allows them to learn from a diverse set of tasks and generalize to a new set of diverse tasks without any additional training. The compositional structures presented here are not specific to LTL, thus opening the path to RL agents that perform zero-shot generalization in other compositional domains.
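The compositional wiring can be sketched without deep learning machinery: each subformula owns a module, and operators combine child outputs. The snippet below uses min/max for conjunction/disjunction as a stand-in; the paper uses compositional recurrent networks and handles temporal operators, which are omitted here.

```python
# A sketch of structuring a model by the parse of a formula. Boolean
# operators only; temporal operators (Until, Eventually) are omitted.
# predicate_nets maps predicate names to (placeholder) learned modules.
import numpy as np

def build(formula, predicate_nets):
    """formula: nested tuples, e.g. ('and', ('pred', 'red'), ('pred', 'blue'))."""
    op = formula[0]
    if op == "pred":
        return predicate_nets[formula[1]]          # leaf: learned predicate
    children = [build(f, predicate_nets) for f in formula[1:]]
    if op == "and":
        return lambda s: min(c(s) for c in children)
    if op == "or":
        return lambda s: max(c(s) for c in children)
    if op == "not":
        return lambda s: 1.0 - children[0](s)
    raise ValueError(f"unsupported operator: {op}")

nets = {"red": lambda s: float(s[0] > 0), "blue": lambda s: float(s[1] > 0)}
sat = build(("and", ("pred", "red"), ("not", ("pred", "blue"))), nets)
print(sat(np.array([1.0, -1.0])))   # 1.0: satisfies red AND NOT blue
```

Because the network mirrors the parse, an unseen formula yields a new composition of already-trained modules, which is what enables the zero-shot generalization described above.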


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent's performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people across experiments introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users, which allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users, showing how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulated users in evaluating agent performance when assisted by different types of trainers. Experimental results show that this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users, and that simulated users with varying characteristics allow the impact of those characteristics on the behaviour of the learning agent to be evaluated.
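A simulated user can be as simple as two tunable probabilities. The sketch below, with illustrative names, models availability (how often advice is offered) and accuracy (how often it matches an oracle's best action).

```python
# A sketch of a simulated user for interactive RL. SimulatedUser and its
# parameters are illustrative assumptions, not the paper's exact model.
import random

class SimulatedUser:
    def __init__(self, availability=0.5, accuracy=0.8, oracle=None):
        self.availability = availability   # P(user offers advice at all)
        self.accuracy = accuracy           # P(advice is correct when offered)
        self.oracle = oracle               # callable state -> best action

    def advise(self, state, actions):
        """Return an advised action, or None when the user stays silent."""
        if random.random() > self.availability:
            return None
        best = self.oracle(state)
        if random.random() < self.accuracy:
            return best
        wrong = [a for a in actions if a != best]
        return random.choice(wrong) if wrong else best
```

Sweeping availability and accuracy reproduces different trainer types without recruiting new people for every experimental run, which is the methodology's central appeal.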


2021 ◽  
Vol 2 (1) ◽  
pp. 1-25
Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of the CSI data. The state machine learns temporal dependency information from the history of classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations and orientations, and multiple persons. It achieves 97% average accuracy when the testing devices and persons are not seen during training, and is also evaluated on two public datasets, achieving accuracies of 80% and 83%. The proposed design needs very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
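For concreteness, a minimal PyTorch sketch of the recognition branch is shown below: a small 2D CNN over CSI windows (subcarriers x time). The layer sizes are illustrative; in the paper the recognition architecture is selected by the RL-based neural architecture search rather than fixed by hand.

```python
# A minimal sketch of a 2D CNN over CSI windows. All layer sizes are
# illustrative assumptions, not the architecture found by the NAS agent.
import torch
import torch.nn as nn

class CsiCnn(nn.Module):
    def __init__(self, n_activities=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_activities)

    def forward(self, x):            # x: (batch, 1, subcarriers, time)
        return self.classifier(self.features(x).flatten(1))

logits = CsiCnn()(torch.randn(8, 1, 30, 100))   # 8 windows of 30x100 CSI
```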

