Docking Control of an Autonomous Underwater Vehicle Using Reinforcement Learning

2019 ◽  
Vol 9 (17) ◽  
pp. 3456 ◽  
Author(s):  
Enrico Anderlini ◽  
Gordon G. Parker ◽  
Giles Thomas

To achieve persistent systems in the future, autonomous underwater vehicles (AUVs) will need to autonomously dock onto a charging station. Here, reinforcement learning strategies were applied for the first time to control the docking of an AUV onto a fixed platform in a simulation environment. Two reinforcement learning schemes were investigated: one with continuous state and action spaces, deep deterministic policy gradient (DDPG), and one with continuous state but discrete action spaces, deep Q network (DQN). For DQN, the discrete actions were selected as step changes in the control input signals. The performance of the reinforcement learning strategies was compared with classical and optimal control techniques. The control actions selected by DDPG suffer from chattering effects due to a hyperbolic tangent layer in the actor. Conversely, DQN presents the best compromise between short docking time and low control effort, whilst meeting the docking requirements. Whereas the reinforcement learning algorithms present a very high computational cost at training time, they are five orders of magnitude faster than optimal control at deployment time, thus enabling an on-line implementation. Therefore, reinforcement learning achieves a performance similar to optimal control at a much lower computational cost at deployment, whilst also presenting a more general framework.
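The DQN action encoding described above (discrete actions as step changes in the control input signals) can be sketched as follows; the step size, actuator limits, and action indexing are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative constants (assumptions, not the paper's values).
STEP = 0.1          # magnitude of one step change in a control input
U_MIN, U_MAX = -1.0, 1.0  # actuator limits

def apply_discrete_action(u, action, n_inputs=2):
    """Map a discrete action index to a step change on one control input.

    Action 2*i increments input i by STEP; action 2*i + 1 decrements it.
    The result is clipped to the actuator limits.
    """
    assert 0 <= action < 2 * n_inputs
    u = np.asarray(u, dtype=float).copy()
    i, sign = divmod(action, 2)
    u[i] += STEP if sign == 0 else -STEP
    return np.clip(u, U_MIN, U_MAX)
```

With this encoding the Q network only needs `2 * n_inputs` output heads, at the cost of quantizing how fast the control signals can change.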

Author(s):  
Jie Zhong ◽  
Tao Wang ◽  
Lianglun Cheng

In actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for a welding manipulator with obstacles around it. However, the sampling-based planner, a state-of-the-art method, only satisfies probabilistic completeness, and its computational complexity is sensitive to the state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and less sensitive to the state dimension. In detail, to improve learning efficiency, we introduce an inverse kinematics module to provide prior knowledge, and we design a gain module to avoid locally optimal policies; both are integrated into the training algorithm. To evaluate the proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves convergence performance but is also superior in terms of optimality and robustness of planning compared with most other planning algorithms.
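One common way to inject an inverse-kinematics prior of the kind described above is to blend the IK-derived action with the policy output and anneal the prior away; the schedule and blending rule below are illustrative assumptions, not the paper's mechanism.

```python
import numpy as np

def guided_action(policy_action, ik_action, episode, anneal_episodes=500):
    """Blend an inverse-kinematics prior with the learned policy's output.

    The prior's weight decays linearly from 1 (pure IK guidance) to 0
    (pure learned policy) over the first `anneal_episodes` episodes,
    so early exploration is steered toward kinematically sensible motions.
    """
    gain = max(0.0, 1.0 - episode / anneal_episodes)
    return gain * np.asarray(ik_action, dtype=float) \
        + (1.0 - gain) * np.asarray(policy_action, dtype=float)
```

The appeal of this design is that the prior only shapes early training; the final policy is free to outperform the IK seed once the blend weight reaches zero.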


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3039
Author(s):  
Bao Chau Phan ◽  
Ying-Chih Lai ◽  
Chin E. Lin

On the issue of global environmental protection, renewable energy systems have been widely considered. The photovoltaic (PV) system converts solar power into electricity and significantly reduces the consumption of fossil fuels and the resulting environmental pollution. Besides introducing new materials for solar cells to improve the energy conversion efficiency, maximum power point tracking (MPPT) algorithms have been developed to ensure the efficient operation of PV systems at the maximum power point (MPP) under various weather conditions. The integration of reinforcement learning and deep learning, named deep reinforcement learning (DRL), is proposed in this paper as a tool for such optimization control problems. Following the success of DRL in several fields, the deep Q network (DQN) and deep deterministic policy gradient (DDPG) are proposed to harvest the MPP in PV systems, especially under a partial shading condition (PSC). Different from reinforcement learning (RL)-based methods, which operate only with discrete state and action spaces, the methods adopted in this paper can deal with continuous state spaces. In this study, DQN solves the problem with discrete action spaces, while DDPG handles continuous action spaces. The proposed methods are simulated in MATLAB/Simulink for feasibility analysis. Further tests under various input conditions, with comparisons to the classical perturb and observe (P&O) MPPT method, are carried out for validation. Based on the simulation results in this study, the performance of the proposed methods is outstanding and efficient, showing their potential for further applications.
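The classical P&O baseline the study compares against fits in a few lines: perturb the operating voltage, observe the power change, and keep perturbing in whichever direction increased power. The fixed voltage step and the tie-break on zero power change below are illustrative choices, not the paper's exact implementation.

```python
def perturb_and_observe(v, p, v_prev, p_prev, step=0.5):
    """One iteration of classical perturb and observe (P&O) MPPT.

    If the last perturbation increased power, keep moving the operating
    voltage in the same direction; otherwise reverse. Returns the next
    voltage reference.
    """
    dv = v - v_prev
    dp = p - p_prev
    if dp == 0:
        return v  # at (or oscillating around) the MPP: hold
    if (dp > 0) == (dv > 0):
        return v + step  # power rose in this direction: continue
    return v - step      # power fell: reverse direction
```

The well-known weakness this exposes is exactly what motivates the DRL methods: under partial shading the P-V curve has multiple local maxima, and a hill-climber like P&O can lock onto the wrong one.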


Author(s):  
Zhuo Wang ◽  
Shiwei Zhang ◽  
Xiaoning Feng ◽  
Yancheng Sui

The environmental adaptability of autonomous underwater vehicles has always been a problem for path planning. Although reinforcement learning can improve environmental adaptability, its slow convergence is caused by multi-behavior coupling, making it difficult for an autonomous underwater vehicle to avoid moving obstacles. This article proposes a multi-behavior critic reinforcement learning algorithm applied to autonomous underwater vehicle path planning to overcome problems associated with oscillating amplitudes and low learning efficiency in the early stages of training, which are common in traditional actor–critic algorithms. Behavior critic reinforcement learning assesses the actions of the actor from perspectives such as energy saving and security, combining these aspects into a whole evaluation of the actor. In this article, the policy gradient method is selected as the actor part, and the value function method is selected as the critic part. The policy gradient and value function methods for actor and critic, respectively, are approximated by a backpropagation neural network, the parameters of which are updated using the gradient descent method. The simulation results show that the method has the ability to optimize learning in the environment and can improve learning efficiency, which meets the real-time and adaptability needs of autonomous underwater vehicle dynamic obstacle avoidance.
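The actor–critic pairing described above (policy gradient actor, value function critic, gradient-descent updates) can be sketched with linear function approximation standing in for the backpropagation networks; the feature map, learning rates, and Gaussian policy form are assumptions for illustration.

```python
import numpy as np

def actor_critic_step(w, theta, phi_s, phi_s_next, action, reward,
                      gamma=0.99, alpha_w=0.1, alpha_theta=0.01, sigma=1.0):
    """One-step actor-critic update from a single transition.

    Critic: linear value function V(s) = w . phi(s), semi-gradient TD(0).
    Actor: Gaussian policy with mean theta . phi(s) and fixed std sigma,
    updated along the policy gradient weighted by the TD error.
    Returns the updated (w, theta) and the TD error.
    """
    td_error = reward + gamma * w @ phi_s_next - w @ phi_s
    w = w + alpha_w * td_error * phi_s                    # critic update
    mu = theta @ phi_s
    grad_log_pi = (action - mu) / sigma**2 * phi_s        # Gaussian score function
    theta = theta + alpha_theta * td_error * grad_log_pi  # actor update
    return w, theta, td_error
```

A multi-behavior critic in the spirit of the article would compute one such TD error per behavior (e.g. energy saving, security) and combine them into the actor's update signal.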


2016 ◽  
Vol 7 (3) ◽  
pp. 23-42 ◽  
Author(s):  
Daniel Hein ◽  
Alexander Hentschel ◽  
Thomas A. Runkler ◽  
Steffen Udluft

This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods try to find closed-form policies, the approach taken here employs numerical on-line optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high-dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world inspired RL benchmarks is made available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on two standard benchmarks: mountain car and cart pole.
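A minimal sketch of the PSO-P idea, assuming a generic rollout cost to minimize over an open-loop action sequence (the model rollout itself is abstracted into `rollout_cost`); the inertia and acceleration coefficients are conventional PSO defaults, not the paper's settings.

```python
import numpy as np

def pso_plan(rollout_cost, horizon, n_particles=20, iters=50,
             bounds=(-1.0, 1.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for an action sequence minimizing rollout_cost, PSO-P style.

    Each particle is a candidate length-`horizon` action sequence; the
    swarm minimizes the model-based rollout cost. Returns the best
    sequence found (in receding-horizon use, only its first action
    would be applied before re-planning).
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, horizon))  # positions = action sequences
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([rollout_cost(xi) for xi in x])
    g = pbest[pbest_cost.argmin()].copy()            # global best sequence
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        cost = np.array([rollout_cost(xi) for xi in x])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], cost[improved]
        g = pbest[pbest_cost.argmin()].copy()
    return g
```

Because the policy is implicit in the optimizer, no policy representation has to be chosen in advance, which is the property the abstract highlights.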


2016 ◽  
Vol 04 (01) ◽  
pp. 51-60 ◽  
Author(s):  
Bahare Kiumarsi ◽  
Wei Kang ◽  
Frank L. Lewis

This paper presents a completely model-free H∞ optimal tracking solution to the control of a general class of nonlinear nonaffine systems in the presence of input constraints. The proposed method is motivated by the nonaffine unmanned aerial vehicle (UAV) system as a real application. First, a general class of nonlinear nonaffine system dynamics is presented as an affine system in terms of a nonlinear function of the control input. It is shown that the optimal control of nonaffine systems may not have an admissible solution if the utility function is not defined properly. Moreover, the boundedness of the optimal control input cannot be guaranteed for standard performance functions. A new performance function is defined and used in the L2-gain condition for this class of nonaffine systems. This performance function guarantees the existence of an admissible solution (if one exists) and the boundedness of the control input solution. Off-policy reinforcement learning (RL) is employed to iteratively solve the H∞ optimal tracking control problem online, using measured data along the system trajectories. The proposed off-policy RL does not require any knowledge of the system dynamics. Moreover, the disturbance input does not need to be adjustable in a specific manner.
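For reference, the disturbance-attenuation requirement that such a performance function enters has the standard L2-gain form (a generic statement of the condition; the weight Q, the input penalty U(u), and the attenuation level γ are placeholders, not the paper's specific performance function):

```latex
\int_0^{\infty} \left( e^{\top} Q\, e + U(u) \right) dt
\;\le\;
\gamma^{2} \int_0^{\infty} d^{\top} d \; dt
```

Here e is the tracking error, U(u) is a (possibly nonquadratic) input penalty chosen so that the resulting optimal control stays bounded under the input constraints, d is the disturbance, and γ is the prescribed attenuation level; the paper's contribution is a choice of U(u) that preserves admissibility and boundedness where a standard quadratic penalty does not.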


2013 ◽  
Vol 2013 ◽  
pp. 1-10
Author(s):  
Víctor Uc-Cetina

We introduce a reinforcement learning architecture designed for problems with an infinite number of states, where each state can be seen as a vector of real numbers, and with a finite number of actions, where each action requires a vector of real numbers as parameters. The main objective of this architecture is to distribute the work required to learn the final policy between two actors. One actor decides which action must be performed, while a second actor determines the right parameters for the selected action. We tested our architecture and one algorithm based on it by solving the robot dribbling problem, a challenging robot control problem taken from the RoboCup competitions. Our experimental work with three different function approximators provides enough evidence to show that the proposed architecture can be used to implement fast, robust, and reliable reinforcement learning algorithms.
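The two-actor split described above can be sketched with linear "networks" standing in for the function approximators: one actor scores the discrete actions, and a second produces the continuous parameters for whichever action is chosen. All weights and shapes here are illustrative assumptions.

```python
import numpy as np

def select_parameterized_action(state, W_disc, W_param_list):
    """Two-actor selection for parameterized actions.

    Actor 1 (W_disc, one row per discrete action) scores the discrete
    actions and the highest-scoring one is chosen. Actor 2 (one weight
    matrix per action in W_param_list) then maps the state to that
    action's continuous parameter vector.
    Returns (action_index, parameter_vector).
    """
    state = np.asarray(state, dtype=float)
    scores = W_disc @ state                 # actor 1: discrete choice
    a = int(np.argmax(scores))
    params = W_param_list[a] @ state        # actor 2: parameters for action a
    return a, params
```

Splitting the policy this way keeps each learner's output space simple: a small discrete set for one actor, and a low-dimensional continuous vector per action for the other.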


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-25
Author(s):  
Tao Liu ◽  
Yuli Hu ◽  
Hui Xu

Autonomous underwater vehicles (AUVs) are widely used to accomplish various missions in the complex marine environment; the design of a control system for AUVs is particularly difficult due to high nonlinearity, variations in hydrodynamic coefficients, and external forces from ocean currents. In this paper, we propose a controller based on deep reinforcement learning (DRL) in a simulation environment for studying the control performance of a vectored-thruster AUV. RL is an important method of artificial intelligence that can learn behavior through trial-and-error interactions with the environment, so it does not require an accurate AUV control model, which is very hard to establish. The proposed RL algorithm uses only information that can be measured by sensors inside the AUV as input parameters, and the outputs of the designed controller are continuous control actions, namely the commands sent to the vectored thruster. Moreover, a reward function is developed for the deep RL controller that considers the different factors affecting the accuracy of AUV navigation control. To confirm the algorithm's effectiveness, a series of simulations is carried out in the designed simulation environment, which saves time and improves efficiency. Simulation results prove the feasibility of the deep RL algorithm applied to the AUV control system. Furthermore, our work provides an alternative method for robot control problems facing increasing technology requirements and complicated application environments.
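A multi-factor reward of the kind described can be sketched as a negative weighted cost; the specific terms and weights below are illustrative assumptions, not the paper's reward design.

```python
import numpy as np

def auv_reward(pos_error, heading_error, control_effort,
               w_pos=1.0, w_head=0.5, w_ctrl=0.1):
    """Illustrative multi-term reward for AUV navigation control.

    Penalizes position error, heading error, and actuator effort, so the
    agent trades tracking accuracy against energy use. Weights are
    hypothetical tuning parameters.
    """
    return -(w_pos * abs(pos_error)
             + w_head * abs(heading_error)
             + w_ctrl * float(np.sum(np.square(control_effort))))
```

In practice the relative weights shape the learned behavior directly: raising the control-effort weight yields smoother, more energy-efficient thruster commands at the cost of slower error convergence.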

