Exploiting Domain Symmetries in Reinforcement Learning with Continuous State and Action Spaces

AbstractIn actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for the welding manipulator with obstacles around. However, as a state-of-the-art method, the sampling-based planner only satisfies the probability completeness and its computational complexity is sensitive with state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and is less sensitive with state dimension. In detail, to improve the learning efficiency, we introduce the inverse kinematics module to provide prior knowledge while a gain module is also designed to avoid the local optimal policy, we integrate them into the training algorithm. To evaluate our proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves the convergence performance but also is superior in terms of optimality and robustness of planning compared with most other planning algorithms.

Download Full-text

A Deep Reinforcement Learning-Based MPPT Control for PV Systems under Partial Shading Condition

Sensors ◽

10.3390/s20113039 ◽

2020 ◽

Vol 20 (11) ◽

pp. 3039

Author(s):

Bao Chau Phan ◽

Ying-Chih Lai ◽

Chin E. Lin

Keyword(s):

Reinforcement Learning ◽

Maximum Power ◽

Maximum Power Point ◽

Partial Shading ◽

Discrete State ◽

Efficient Operation ◽

Pv Systems ◽

Continuous State ◽

Power Point ◽

Action Spaces

On the issues of global environment protection, the renewable energy systems have been widely considered. The photovoltaic (PV) system converts solar power into electricity and significantly reduces the consumption of fossil fuels from environment pollution. Besides introducing new materials for the solar cells to improve the energy conversion efficiency, the maximum power point tracking (MPPT) algorithms have been developed to ensure the efficient operation of PV systems at the maximum power point (MPP) under various weather conditions. The integration of reinforcement learning and deep learning, named deep reinforcement learning (DRL), is proposed in this paper as a future tool to deal with the optimization control problems. Following the success of deep reinforcement learning (DRL) in several fields, the deep Q network (DQN) and deep deterministic policy gradient (DDPG) are proposed to harvest the MPP in PV systems, especially under a partial shading condition (PSC). Different from the reinforcement learning (RL)-based method, which is only operated with discrete state and action spaces, the methods adopted in this paper are used to deal with continuous state spaces. In this study, DQN solves the problem with discrete action spaces, while DDPG handles the continuous action spaces. The proposed methods are simulated in MATLAB/Simulink for feasibility analysis. Further tests under various input conditions with comparisons to the classical Perturb and observe (P&O) MPPT method are carried out for validation. Based on the simulation results in this study, the performance of the proposed methods is outstanding and efficient, showing its potential for further applications.

Download Full-text

Reinforcement Learning with Particle Swarm Optimization Policy (PSO-P) in Continuous State and Action Spaces

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2016070102 ◽

2016 ◽

Vol 7 (3) ◽

pp. 23-42 ◽

Cited By ~ 5

Author(s):

Daniel Hein ◽

Alexander Hentschel ◽

Thomas A. Runkler ◽

Steffen Udluft

Keyword(s):

Particle Swarm Optimization ◽

Reinforcement Learning ◽

A Priori ◽

Particle Swarm ◽

Optimization Techniques ◽

Swarm Optimization ◽

Continuous State ◽

On Line ◽

The Rich ◽

Action Spaces

This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods try to find closed-form policies, the approach taken here employs numerical on-line optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world inspired RL benchmarks is made available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on the two standard benchmarks: mountain car and cart pole.

Download Full-text

A reinforcement learning algorithm developed to model GenCo strategic bidding behavior in multidimensional and continuous state and action spaces

2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) ◽

10.1109/adprl.2013.6614997 ◽

2013 ◽

Cited By ~ 1

Author(s):

Alfred Yong Fu Lau ◽

Dipti Srinivasan ◽

Thomas Reindl

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Bidding Behavior ◽

Strategic Bidding ◽

Continuous State ◽

Action Spaces ◽

Reinforcement Learning Algorithm

Download Full-text

Docking Control of an Autonomous Underwater Vehicle Using Reinforcement Learning

Applied Sciences ◽

10.3390/app9173456 ◽

2019 ◽

Vol 9 (17) ◽

pp. 3456 ◽

Cited By ~ 1

Author(s):

Enrico Anderlini ◽

Gordon G. Parker ◽

Giles Thomas

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Learning Strategies ◽

Autonomous Underwater Vehicle ◽

Autonomous Underwater Vehicles ◽

Computational Cost ◽

Control Input ◽

Control Effort ◽

Continuous State ◽

Action Spaces

To achieve persistent systems in the future, autonomous underwater vehicles (AUVs) willneed to autonomously dock onto a charging station. Here, reinforcement learning strategies wereapplied for the first time to control the docking of an AUV onto a fixed platform in a simulationenvironment. Two reinforcement learning schemes were investigated: one with continuous stateand action spaces, deep deterministic policy gradient (DDPG), and one with continuous state butdiscrete action spaces, deep Q network (DQN). For DQN, the discrete actions were selected as stepchanges in the control input signals. The performance of the reinforcement learning strategies wascompared with classical and optimal control techniques. The control actions selected by DDPG sufferfrom chattering effects due to a hyperbolic tangent layer in the actor. Conversely, DQN presents thebest compromise between short docking time and low control effort, whilst meeting the dockingrequirements. Whereas the reinforcement learning algorithms present a very high computational costat training time, they are five orders of magnitude faster than optimal control at deployment time,thus enabling an on-line implementation. Therefore, reinforcement learning achieves a performancesimilar to optimal control at a much lower computational cost at deployment, whilst also presentinga more general framework.

Download Full-text

A Novel Reinforcement Learning Architecture for Continuous State and Action Spaces

Advances in Artificial Intelligence ◽

10.1155/2013/492852 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10

Author(s):

Víctor Uc-Cetina

Keyword(s):

Reinforcement Learning ◽

Finite Number ◽

Control Problem ◽

Experimental Work ◽

Infinite Number ◽

Robot Control ◽

Real Numbers ◽

Continuous State ◽

The Right ◽

Action Spaces

We introduce a reinforcement learning architecture designed for problems with an infinite number of states, where each state can be seen as a vector of real numbers and with a finite number of actions, where each action requires a vector of real numbers as parameters. The main objective of this architecture is to distribute in two actors the work required to learn the final policy. One actor decides what action must be performed; meanwhile, a second actor determines the right parameters for the selected action. We tested our architecture and one algorithm based on it solving the robot dribbling problem, a challenging robot control problem taken from the RoboCup competitions. Our experimental work with three different function approximators provides enough evidence to prove that the proposed architecture can be used to implement fast, robust, and reliable reinforcement learning algorithms.

Download Full-text

Reinforcement Learning in Continuous State and Action Spaces

Adaptation, Learning, and Optimization - Reinforcement Learning ◽

10.1007/978-3-642-27645-3_7 ◽

2012 ◽

pp. 207-251 ◽

Cited By ~ 58

Author(s):

Hado van Hasselt

Keyword(s):

Reinforcement Learning ◽

Continuous State ◽

Action Spaces

Download Full-text

Automated Driving Highway Traffic Merging using Deep Multi-Agent Reinforcement Learning in Continuous State-Action Spaces

10.1109/iv48863.2021.9575676 ◽

2021 ◽

Author(s):

Larry Schester ◽

Luis E. Ortiz

Keyword(s):

Reinforcement Learning ◽

Highway Traffic ◽

Automated Driving ◽

State Action ◽

Continuous State ◽

Multi Agent ◽

Action Spaces

Download Full-text

Power Management for a Plug-in Hybrid Electric Vehicle Based on Reinforcement Learning with Continuous State and Action Spaces

Energy Procedia ◽

10.1016/j.egypro.2017.12.629 ◽

2017 ◽

Vol 142 ◽

pp. 2270-2275 ◽

Cited By ~ 12

Author(s):

Yuecheng Li ◽

Hongwen He ◽

Jiankun Peng ◽

Hailong Zhang

Keyword(s):

Reinforcement Learning ◽

Electric Vehicle ◽

Power Management ◽

Hybrid Electric Vehicle ◽

Continuous State ◽

Hybrid Electric ◽

Action Spaces

Download Full-text

Safe Exploration of State and Action Spaces in Reinforcement Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.3761 ◽

2012 ◽

Vol 45 ◽

pp. 515-564 ◽

Cited By ~ 20

Author(s):

J. Garcia ◽

F. Fernandez

Keyword(s):

Reinforcement Learning ◽

Learning System ◽

Action Space ◽

High Dimensional ◽

State Action ◽

Continuous State ◽

Additional Challenge ◽

Efficient Exploration ◽

Action Spaces ◽

Selection Of

In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises; namely, that of how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.

Download Full-text