End-to-End AUV Motion Planning Method Based on Soft Actor–Critic

Xin Yu; Yushan Sun; Xiangbin Wang; Guocheng Zhang

doi:10.3390/s21175893

End-to-End AUV Motion Planning Method Based on Soft Actor–Critic

Sensors ◽

10.3390/s21175893 ◽

2021 ◽

Vol 21 (17) ◽

pp. 5893

Author(s):

Xin Yu ◽

Yushan Sun ◽

Xiangbin Wang ◽

Guocheng Zhang

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Autonomous Underwater Vehicle ◽

Planning System ◽

Optimal Decision ◽

Target Point ◽

Planning Problem ◽

Training Time ◽

Reward Function ◽

End To End

This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks and to overcome certain difficulties, such as multiple constraints and a sparse reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into the control instructions of the AUV. The system is based on the soft actor–critic (SAC) algorithm, which enhances the exploration ability and robustness to the AUV environment. We also use generative adversarial imitation learning (GAIL) to assist training, because learning a policy from scratch is difficult and time-consuming in reinforcement learning. A comprehensive external reward function is then designed to help the AUV smoothly reach the target point, and the distance and time are optimized as much as possible. Finally, the end-to-end motion planning algorithm proposed in this research is tested and compared on the Unity simulation platform. Results show that the algorithm has an optimal decision-making ability during navigation, a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL speeds up AUV training and shortens the training time without affecting the planning effect of the SAC algorithm.
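The abstract does not state the form of the external reward, so the following is only a minimal sketch of what a composite reward of this kind might look like. It assumes a progress term (distance gained toward the target), a constant per-step time penalty, a smoothness penalty on changes in yaw rate, and terminal bonuses or penalties. The function name `auv_reward` and every weight are hypothetical and are not taken from the paper.

```python
import math

def auv_reward(prev_pos, pos, target, prev_yaw_rate, yaw_rate,
               collided=False, reach_radius=1.0,
               w_progress=1.0, w_time=0.01, w_smooth=0.1,
               goal_bonus=10.0, collision_penalty=10.0):
    """Hypothetical composite reward; all weights are illustrative assumptions."""
    if collided:
        # Terminal penalty for hitting an obstacle.
        return -collision_penalty
    d_prev = math.dist(prev_pos, target)
    d_now = math.dist(pos, target)
    if d_now <= reach_radius:
        # Terminal bonus once the vehicle is within reach_radius of the target.
        return goal_bonus
    progress = d_prev - d_now              # positive when moving toward the target
    jerk = abs(yaw_rate - prev_yaw_rate)   # penalize abrupt steering changes
    return w_progress * progress - w_time - w_smooth * jerk
```

A dense progress term like this is one common way to soften a sparse-reward setting, while the time and smoothness penalties push toward shorter and smoother trajectories.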

Mapless Motion Planning System for an Autonomous Underwater Vehicle Using Policy Gradient-based Deep Reinforcement Learning

Journal of Intelligent & Robotic Systems ◽

10.1007/s10846-019-01004-2 ◽

2019 ◽

Vol 96 (3-4) ◽

pp. 591-601 ◽

Cited By ~ 4

Author(s):

Yushan Sun ◽

Junhan Cheng ◽

Guocheng Zhang ◽

Hao Xu

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Autonomous Underwater Vehicle ◽

Underwater Vehicle ◽

Planning System ◽

Policy Gradient ◽

Gradient Based

Generative Adversarial Immitation Learning for Steering an Unmanned Surface Vehicle

Proceedings of the Northern Lights Deep Learning Workshop ◽

10.7557/18.5147 ◽

2020 ◽

Vol 1 ◽

pp. 6

Author(s):

Alexandra Vedeler ◽

Narada Warakagoda

Keyword(s):

Reinforcement Learning ◽

Obstacle Avoidance ◽

Complex Dynamics ◽

Imitation Learning ◽

Inverse Reinforcement Learning ◽

Radar Sensor ◽

Single Action ◽

Reward Function ◽

End To End ◽

Insight Into

The task of obstacle avoidance using maritime vessels, such as Unmanned Surface Vehicles (USV), has traditionally been solved using specialized modules that are designed and optimized separately. However, this approach requires a deep insight into the environment, the vessel, and their complex dynamics. We propose an alternative method using Imitation Learning (IL) through Deep Reinforcement Learning (RL) and Deep Inverse Reinforcement Learning (IRL) and present a system that learns an end-to-end steering model capable of mapping radar-like images directly to steering actions in an obstacle avoidance scenario. The USV used in the work is equipped with a Radar sensor and we studied the problem of generating a single action parameter, heading. We apply an IL algorithm known as generative adversarial imitation learning (GAIL) to develop an end-to-end steering model for a scenario where avoidance of an obstacle is the goal. The performance of the system was studied for different design choices and compared to that of a system that is based on pure RL. The IL system produces results that indicate it is able to grasp the concept of the task and that in many ways are on par with the RL system. We deem this to be promising for future use in tasks that are not as easily described by a reward function.
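In GAIL the policy is trained on a surrogate reward derived from a discriminator rather than on a hand-written reward. As a rough illustration, the sketch below uses one common convention, assuming the discriminator output `d_prob` is the estimated probability that a state-action pair came from the expert. The function name `gail_reward` and the clipping constant `eps` are assumptions for illustration; the abstract does not say which convention the authors used.

```python
import math

def gail_reward(d_prob, eps=1e-8):
    """Surrogate reward -log(1 - D(s, a)), assuming D -> 1 means 'expert-like'."""
    # Clip to keep the logarithm finite at the extremes.
    d = min(max(d_prob, eps), 1.0 - eps)
    return -math.log(1.0 - d)
```

Under this convention the reward grows as the policy's behavior becomes harder to distinguish from the demonstrations.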

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/326 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yang Gao ◽

Christian M. Meyer ◽

Mohsen Mesgar ◽

Iryna Gurevych

Keyword(s):

Reinforcement Learning ◽

Learning To Rank ◽

Poor Performance ◽

Parameter Tuning ◽

Test Time ◽

Sequential Decision ◽

Time Data ◽

Training Time ◽

Search Spaces ◽

Reward Function

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.

Deep Reinforcement Learning for Multi-contact Motion Planning of Hexapod Robots

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/328 ◽

2021 ◽

Author(s):

Huiqiao Fu ◽

Kaiqiang Tang ◽

Peng Li ◽

Wenqi Zhang ◽

Xinpeng Wang ◽

...

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

State Transition ◽

Center Of Mass ◽

Legged Locomotion ◽

Careful Planning ◽

Physical Systems ◽

Reward Function ◽

Markov Decision ◽

Hexapod Robots

Legged locomotion in a complex environment requires careful planning of the footholds of legged robots. In this paper, a novel Deep Reinforcement Learning (DRL) method is proposed to implement multi-contact motion planning for hexapod robots moving on uneven plum-blossom piles. First, the motion of hexapod robots is formulated as a Markov Decision Process (MDP) with a specified reward function. Second, a transition feasibility model is proposed for hexapod robots, which describes the feasibility of the state transition under the condition of satisfying kinematics and dynamics, and in turn determines the rewards. Third, the footholds and Center-of-Mass (CoM) sequences are sampled from a diagonal Gaussian distribution and the sequences are optimized through learning the optimal policies using the designed DRL algorithm. Both the simulation results and the experimental results on physical systems demonstrate the feasibility and efficiency of the proposed method. Videos are shown at https://videoviewpage.wixsite.com/mcrl.
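Sampling from a diagonal Gaussian means each action dimension is drawn independently from its own normal distribution. The sketch below shows the sampling step and the matching log-density in plain Python. The function names and the `mean` / `log_std` parameterization are assumptions for illustration, not the authors' implementation.

```python
import math
import random

def sample_diag_gaussian(mean, log_std, rng=random):
    """Draw one sample; each dimension i uses N(mean[i], exp(log_std[i])**2)."""
    return [rng.gauss(m, math.exp(ls)) for m, ls in zip(mean, log_std)]

def log_prob_diag_gaussian(x, mean, log_std):
    """Log-density of x; dimensions are independent, so per-dimension terms add."""
    total = 0.0
    for xi, m, ls in zip(x, mean, log_std):
        z = (xi - m) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total
```

The log-density is what a policy-gradient method would differentiate with respect to the policy parameters that produce `mean` and `log_std`.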

Self-Generation of Reward by Moderate-Based Index for Senor Inputs

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2015.p0057 ◽

2015 ◽

Vol 27 (1) ◽

pp. 57-63

Author(s):

Kentarou Kurashige ◽

Kaoru Nikaido

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Research Participant ◽

Planning Problem ◽

Human Being ◽

Task Knowledge ◽

Reward Function ◽

Input Prediction ◽

Path Planning Problem ◽

Biological Organisms

[Figure: Moderate-based reward generator] In conventional reinforcement learning, a reward function influences the learning results, and therefore, the reward function is very important. To design this function considering a task, knowledge of reinforcement learning is required. In addition to this, a reward function must be designed for each task. These requirements make the design of a reward function unfeasible. We focus on this problem and aim at realizing a method to generate a reward without the design of a special reward function. In this paper, we propose a universal evaluation for sensor inputs, which is independent of a task and is modeled on the basis of the indicator of pleasure and pain in biological organisms. This evaluation estimates the trend of sensor inputs based on the ease of input prediction. Instead of the design of a reward function, our approach assists a human being in learning how to interact with an agent and teaching it his/her demand. We recruited a research participant and attempted to solve the path planning problem. The results show that a participant can teach an agent his/her demand by interacting with the agent and the agent can generate an adaptive route by interacting with the participant and the environment.

Deep Reinforcement Learning for Vectored Thruster Autonomous Underwater Vehicle Control

Complexity ◽

10.1155/2021/6649625 ◽

2021 ◽

Vol 2021 ◽

pp. 1-25

Author(s):

Tao Liu ◽

Yuli Hu ◽

Hui Xu

Keyword(s):

Control System ◽

Reinforcement Learning ◽

Autonomous Underwater Vehicle ◽

Autonomous Underwater Vehicles ◽

Continuous Control ◽

Trial And Error ◽

Simulation Environment ◽

Reward Function ◽

Underwater Vehicle Control ◽

Control Accuracy

Autonomous underwater vehicles (AUVs) are widely used to accomplish various missions in the complex marine environment; the design of a control system for AUVs is particularly difficult due to the high nonlinearity, variations in hydrodynamic coefficients, and external force from ocean currents. In this paper, we propose a controller based on deep reinforcement learning (DRL) in a simulation environment for studying the control performance of the vectored thruster AUV. RL is an important method of artificial intelligence that can learn behavior through trial-and-error interactions with the environment, so it does not need to provide an accurate AUV control model that is very hard to establish. The proposed RL algorithm only uses the information that can be measured by sensors inside the AUVs as the input parameters, and the outputs of the designed controller are the continuous control actions, which are the commands that are set to the vectored thruster. Moreover, a reward function is developed for deep RL controller considering different factors which actually affect the control accuracy of AUV navigation control. To confirm the algorithm’s effectiveness, a series of simulations are carried out in the designed simulation environment, which is a method to save time and improve efficiency. Simulation results prove the feasibility of the deep RL algorithm applied to the control system for AUV. Furthermore, our work also provides an optional method for robot control problems to deal with improving technology requirements and complicated application environments.

Motion planning algorithm for nonholonomic autonomous underwater vehicle in disturbance using reinforcement learning and teaching method

Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292) ◽

10.1109/robot.2002.1014368 ◽

2003 ◽

Cited By ~ 1

Author(s):

H. Kawano ◽

T. Ura

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Autonomous Underwater Vehicle ◽

Teaching Method ◽

Underwater Vehicle ◽

Learning And Teaching ◽

Planning Algorithm

Fast reinforcement learning algorithm for motion planning of nonholonomic autonomous underwater vehicle in disturbance

IEEE/RSJ International Conference on Intelligent Robots and System ◽

10.1109/irds.2002.1041505 ◽

2003 ◽

Cited By ~ 7

Author(s):

H. Kawano ◽

T. Ura

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Autonomous Underwater Vehicle ◽

Learning Algorithm ◽

Underwater Vehicle ◽

Reinforcement Learning Algorithm

Reinforcement learning control of robot manipulator

Revista Brasileira de Computação Aplicada ◽

10.5335/rbca.v13i3.12091 ◽

2021 ◽

Vol 13 (3) ◽

pp. 42-53

Author(s):

Lucas Pereira Cotrim ◽

Marcos Menon José ◽

Eduardo Lobo Lustosa Cabral

Keyword(s):

Reinforcement Learning ◽

Robot Manipulator ◽

Industrial Robot ◽

Industrial Applications ◽

Value Iteration ◽

Robot Arm ◽

Training Time ◽

Reward Function ◽

Simulated Environment ◽

Reward Functions

Since the establishment of robotics in industrial applications, industrial robot programming involves the repetitive and time-consuming process of manually specifying a fixed trajectory, which results in machine idle time in terms of production and the necessity of completely reprogramming the robot for different tasks. The increasing number of robotics applications in unstructured environments requires not only intelligent but also reactive controllers, due to the unpredictability of the environment and safety measures respectively. This paper presents a comparative analysis of two classes of Reinforcement Learning algorithms, value iteration (Q-Learning/DQN) and policy iteration (REINFORCE), applied to the discretized task of positioning a robotic manipulator in an obstacle-filled simulated environment, with no previous knowledge of the obstacles’ positions or of the robot arm dynamics. The agent’s performance and algorithm convergence are analyzed under different reward functions and on four increasingly complex test projects: 1-Degree of Freedom (DOF) robot, 2-DOF robot, Kuka KR16 Industrial robot, Kuka KR16 Industrial robot with random setpoint/obstacle placement. The DQN algorithm presented significantly better performance and reduced training time across all test projects and the third reward function generated better agents for both algorithms.
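For readers unfamiliar with the value-based class compared in this abstract, the sketch below shows a single tabular Q-learning update on a discretized task. It is a generic textbook form, not the authors' code; the table layout (a `defaultdict` keyed by `(state, action)`), the function name, and the default `alpha` and `gamma` values are assumptions.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    # No bootstrapping from the next state when the episode has ended.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Usage: an empty table where unseen (state, action) pairs default to 0.0.
q_table = defaultdict(float)
q_learning_update(q_table, state=0, action="left", reward=1.0,
                  next_state=1, actions=["left", "right"])
```

DQN replaces the table with a neural network but keeps the same bootstrapped target, whereas REINFORCE updates the policy directly from sampled returns.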

Motion Planning with Energy Reduction for a Floating Robotic Platform Under Disturbances and Measurement Noise Using Reinforcement Learning

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213018600059 ◽

2018 ◽

Vol 27 (04) ◽

pp. 1860005 ◽

Cited By ~ 1

Author(s):

Konstantinos Tziortziotis ◽

Nikolaos Tziortziotis ◽

Kostas Vlachos ◽

Konstantinos Blekas

Keyword(s):

Reinforcement Learning ◽

Energy Consumption ◽

Degrees Of Freedom ◽

Optimal Path ◽

Iteration Scheme ◽

Measurement Noise ◽

Target Point ◽

Level Control ◽

Reward Function ◽

Marine Platform

This paper investigates the use of reinforcement learning for the navigation of an over-actuated marine platform, i.e., one with more control inputs than degrees of freedom, in an unknown environment. The proposed approach uses an online least-squares policy iteration scheme for value function approximation in order to estimate the optimal policy, in conjunction with a low-level control system that controls the magnitude of the linear velocity and the orientation of the platform. The primary goal of the proposed scheme is the reduction of the consumed energy. To that end, we propose a variable reward function that depends on the energy consumption of the platform. We evaluate our approach in a complex and realistic simulation environment and report results concerning its performance on estimating optimal navigation policies under different environmental disturbances and GPS position measurement noise. The proposed framework is compared, in terms of energy consumption, to a baseline approach based on virtual potential fields. The results show that the marine platform successfully discovers the target point following a sub-optimal path, maintaining reduced energy consumption.
