Safe Exploration of State and Action Spaces in Reinforcement Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.3761 ◽

2012 ◽

Vol 45 ◽

pp. 515-564 ◽

Cited By ~ 20

Author(s):

J. Garcia ◽

F. Fernandez

Keyword(s):

Reinforcement Learning ◽

Learning System ◽

Action Space ◽

High Dimensional ◽

State Action ◽

Continuous State ◽

Additional Challenge ◽

Efficient Exploration ◽

Action Spaces ◽

Selection Of

In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises; namely, that of how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.

Swarm Reinforcement Learning Methods for Problems with Continuous State-action Space

Transactions of the Society of Instrument and Control Engineers ◽

10.9746/sicetr.48.790 ◽

2012 ◽

Vol 48 (11) ◽

pp. 790-798

Author(s):

Hitoshi IIMA ◽

Yasuaki KUROE

Keyword(s):

Reinforcement Learning ◽

Action Space ◽

Learning Methods ◽

State Action ◽

Continuous State

Count-Based Exploration in Feature Space for Reinforcement Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/344 ◽

2017 ◽

Cited By ~ 7

Author(s):

Jarryd Martin ◽

Suraj Narayanan S. ◽

Tom Everitt ◽

Marcus Hutter

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Function Approximation ◽

Feature Space ◽

Feature Representation ◽

High Dimensional ◽

Training Experience ◽

Approximation Techniques ◽

State Action ◽

Efficient Exploration

We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our \phi-pseudocount achieves generalisation by exploiting the same feature representation of the state space that is used for value function approximation. States whose features have been observed less frequently are deemed more uncertain. The \phi-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.
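The idea that states with rarely observed features earn a larger exploration bonus can be illustrated with a minimal sketch. This is not the paper's \phi-pseudocount construction: the class name `FeatureCountBonus`, the independence assumption across features, the `beta` scale, and the 0.01 smoothing constant are all assumptions made here for illustration.

```python
import math


class FeatureCountBonus:
    """Simplified feature-based novelty bonus (illustrative, not the paper's method)."""

    def __init__(self, beta=0.05):
        self.beta = beta   # bonus scale (assumed value)
        self.t = 0         # number of feature vectors observed so far
        self.counts = {}   # (feature index, feature value) -> times observed

    def pseudo_count(self, phi):
        """Generalised visit count: product of per-feature frequencies, times t."""
        if self.t == 0:
            return 0.0
        prob = 1.0
        for i, value in enumerate(phi):
            prob *= self.counts.get((i, value), 0) / self.t
        return prob * self.t

    def bonus(self, phi):
        """Exploration bonus that shrinks as the pseudo-count grows."""
        return self.beta / math.sqrt(self.pseudo_count(phi) + 0.01)

    def update(self, phi):
        """Record one observation of the feature vector phi."""
        self.t += 1
        for i, value in enumerate(phi):
            key = (i, value)
            self.counts[key] = self.counts.get(key, 0) + 1
```

In use, the agent would add `bonus(phi)` to the environment reward before its value update and then call `update(phi)`, so a state never visited before can still receive a small bonus if its individual features are already familiar.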

Automated Driving Highway Traffic Merging using Deep Multi-Agent Reinforcement Learning in Continuous State-Action Spaces

10.1109/iv48863.2021.9575676 ◽

2021 ◽

Author(s):

Larry Schester ◽

Luis E. Ortiz

Keyword(s):

Reinforcement Learning ◽

Highway Traffic ◽

Automated Driving ◽

State Action ◽

Continuous State ◽

Multi Agent ◽

Action Spaces

HRLB^2: A Reinforcement Learning Based Framework for Believable Bots

Applied Sciences ◽

10.3390/app8122453 ◽

2018 ◽

Vol 8 (12) ◽

pp. 2453 ◽

Cited By ~ 5

Author(s):

Christian Arzate Cruz ◽

Jorge Ramirez Uresti

Keyword(s):

Reinforcement Learning ◽

High Dimensional ◽

State Action ◽

Hierarchical Reinforcement Learning ◽

Learning Framework ◽

Novel Approach ◽

The Creation ◽

Action Spaces ◽

Human Player

The creation of believable behaviors for Non-Player Characters (NPCs) is key to improving the players’ experience while playing a game. To achieve this objective, we need to design NPCs that appear to be controlled by a human player. In this paper, we propose a hierarchical reinforcement learning framework for believable bots (HRLB^2). This novel approach is designed to overcome two main challenges currently faced in the creation of human-like NPCs. The first is exploring domains with high-dimensional state–action spaces while satisfying constraints imposed by traits that characterize human-like behavior. The second is generating behavior diversity while also adapting to the opponent’s playing style. We evaluated the effectiveness of our framework in the 2D fighting game Street Fighter IV. The results of our tests demonstrate that our bot behaves in a human-like manner.

Hashing over Predicted Future Frames for Informed Exploration of Deep Reinforcement Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/420 ◽

2018 ◽

Author(s):

Haiyan Yin ◽

Jianda Chen ◽

Sinno Jialin Pan

Keyword(s):

Reinforcement Learning ◽

Prediction Model ◽

High Dimensional ◽

State Action ◽

Dimensional Image ◽

The Future ◽

Convolutional Autoencoder ◽

Future Return ◽

Future Direction ◽

Efficient Exploration

In deep reinforcement learning (RL) tasks, an efficient exploration mechanism should be able to encourage an agent to take actions that lead to less frequent states which may yield higher accumulative future return. However, both knowing about the future and evaluating the frequentness of states are non-trivial tasks, especially for deep RL domains, where a state is represented by high-dimensional image frames. In this paper, we propose a novel informed exploration framework for deep RL, where we build the capability for an RL agent to predict over the future transitions and evaluate the frequentness for the predicted future frames in a meaningful manner. To this end, we train a deep prediction model to predict future frames given a state-action pair, and a convolutional autoencoder model to hash over the seen frames. In addition, to utilize the counts derived from the seen frames to evaluate the frequentness for the predicted frames, we tackle the challenge of matching the predicted future frames and their corresponding seen frames at the latent feature level. In this way, we derive a reliable metric for evaluating the novelty of the future direction pointed by each action, and hence inform the agent to explore the least frequent one.
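The selection step described above, counting hashed frames and steering toward the action whose predicted future frame is least visited, might be sketched as follows. The paper learns its hash with a convolutional autoencoder and predicts frames with a deep model; here a random-projection sign hash stands in for the learned hash, and the predicted latent vectors are assumed to be supplied by some external model. The names `HashCounter` and `least_visited_action` are hypothetical.

```python
import random
from collections import Counter


class HashCounter:
    """Counts visits per hash code of a latent vector (sign of random projections)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per hash bit (stand-in for a learned hash).
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.counts = Counter()

    def code(self, latent):
        """Binary code: which side of each hyperplane the latent vector lies on."""
        return tuple(
            int(sum(w * x for w, x in zip(plane, latent)) >= 0.0)
            for plane in self.planes
        )

    def observe(self, latent):
        """Record one visit to the bucket containing this latent vector."""
        self.counts[self.code(latent)] += 1

    def count(self, latent):
        """Number of recorded visits to this latent vector's bucket."""
        return self.counts[self.code(latent)]


def least_visited_action(counter, predicted_latents):
    """Pick the action whose predicted future latent has the lowest visit count.

    predicted_latents maps each action to the latent vector of the frame a
    prediction model expects after taking it (assumed to be given).
    """
    return min(predicted_latents, key=lambda a: counter.count(predicted_latents[a]))
```

For example, after observing the latent `[1.0, 0.0]` several times, an action predicted to lead to `[-1.0, 0.0]` falls in an unvisited bucket and would be preferred over one predicted to lead back to `[1.0, 0.0]`.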

Swarm reinforcement learning methods for problems with continuous state-action space

2011 IEEE International Conference on Systems, Man, and Cybernetics ◽

10.1109/icsmc.2011.6083999 ◽

2011 ◽

Cited By ~ 2

Author(s):

Hitoshi Iima ◽

Yasuaki Kuroe ◽

Kazuo Emoto

Keyword(s):

Reinforcement Learning ◽

Action Space ◽

Learning Methods ◽

State Action ◽

Continuous State

Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics

Complex & Intelligent Systems ◽

10.1007/s40747-021-00366-1 ◽

2021 ◽

Author(s):

Jie Zhong ◽

Tao Wang ◽

Lianglun Cheng

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Free Path ◽

Inverse Kinematics ◽

Multiple Dimensions ◽

Continuous State ◽

Planning Algorithm ◽

Convergence Performance ◽

Path Planner ◽

Action Spaces

In actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for a welding manipulator surrounded by obstacles. However, the state-of-the-art sampling-based planner satisfies only probabilistic completeness, and its computational complexity is sensitive to the state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and less sensitive to the state dimension. In detail, to improve learning efficiency, we introduce an inverse kinematics module to provide prior knowledge, and we design a gain module to avoid locally optimal policies; both are integrated into the training algorithm. To evaluate our proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves convergence performance but is also superior in optimality and robustness of planning compared with most other planning algorithms.

A Deep Reinforcement Learning-Based MPPT Control for PV Systems under Partial Shading Condition

Sensors ◽

10.3390/s20113039 ◽

2020 ◽

Vol 20 (11) ◽

pp. 3039

Author(s):

Bao Chau Phan ◽

Ying-Chih Lai ◽

Chin E. Lin

Keyword(s):

Reinforcement Learning ◽

Maximum Power ◽

Maximum Power Point ◽

Partial Shading ◽

Discrete State ◽

Efficient Operation ◽

Pv Systems ◽

Continuous State ◽

Power Point ◽

Action Spaces

Renewable energy systems have been widely considered as a response to global environmental protection concerns. The photovoltaic (PV) system converts solar power into electricity and significantly reduces the fossil fuel consumption that causes environmental pollution. Besides the introduction of new solar cell materials to improve energy conversion efficiency, maximum power point tracking (MPPT) algorithms have been developed to ensure the efficient operation of PV systems at the maximum power point (MPP) under various weather conditions. The integration of reinforcement learning and deep learning, named deep reinforcement learning (DRL), is proposed in this paper as a future tool to deal with optimization control problems. Following the success of DRL in several fields, the deep Q network (DQN) and deep deterministic policy gradient (DDPG) are proposed to harvest the MPP in PV systems, especially under a partial shading condition (PSC). Unlike the reinforcement learning (RL)-based method, which operates only with discrete state and action spaces, the methods adopted in this paper deal with continuous state spaces. In this study, DQN solves the problem with discrete action spaces, while DDPG handles continuous action spaces. The proposed methods are simulated in MATLAB/Simulink for feasibility analysis. Further tests under various input conditions, with comparisons to the classical perturb and observe (P&O) MPPT method, are carried out for validation. Based on the simulation results in this study, the proposed methods perform well and efficiently, showing their potential for further applications.
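For context, the classical perturb and observe baseline mentioned above can be sketched in a few lines: nudge the operating voltage, keep going if power rose, reverse if it fell. This is a generic textbook form, not the paper's MATLAB/Simulink setup; the function names, step size, and the single-peak `toy_pv_power` curve (peak of 100 W at 30 V) are assumptions for illustration.

```python
def perturb_and_observe(power_at, v_start, step=0.5, iterations=100):
    """Hill-climb the power-voltage curve by repeated small voltage perturbations.

    power_at: callable returning output power for a given operating voltage.
    Returns the operating voltage after the final perturbation.
    """
    v = v_start
    p_prev = power_at(v)
    direction = 1.0                  # +1 raises the voltage, -1 lowers it
    for _ in range(iterations):
        v += direction * step        # perturb
        p = power_at(v)              # observe
        if p < p_prev:               # power dropped: reverse the perturbation
            direction = -direction
        p_prev = p
    return v


def toy_pv_power(v):
    """Hypothetical single-peak P-V curve with its maximum at 30 V."""
    return 100.0 - 0.5 * (v - 30.0) ** 2
```

On this toy curve, starting from 20 V, the loop climbs to the peak and then oscillates within one step of 30 V. Under partial shading the P-V curve can have several peaks, and a hill climber of this kind may settle on a local one, which is the situation the learning-based controllers target.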
