Deep Reinforcement Learning Approach with Multiple Experience Pools for UAV’s Autonomous Motion Planning in Complex Unknown Environments

Zijian Hu; Kaifang Wan; Xiaoguang Gao; Yiwei Zhai; Qianglong Wang

doi:10.3390/s20071890

Sensors ◽

2020 ◽

Vol 20 (7) ◽

pp. 1890 ◽

Cited By ~ 6

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Predictive Control ◽

Experimental Testing ◽

The Novel ◽

Simulation Environment ◽

Unknown Environments ◽

Autonomous Motion ◽

Policy Gradient ◽

Speed Up

Autonomous motion planning (AMP) of unmanned aerial vehicles (UAVs) aims to enable a UAV to fly safely to its target without human intervention. Recently, several emerging deep reinforcement learning (DRL) methods have been employed to address the AMP problem in simplified environments, with good results. This paper proposes a multiple experience pools (MEPs) framework that leverages human expert experiences for DRL to speed up the learning process. Based on the deep deterministic policy gradient (DDPG) algorithm, an MEP–DDPG algorithm was designed that uses model predictive control and simulated annealing to generate expert experiences. When this algorithm was applied to a complex, unknown simulation environment constructed from the parameters of a real UAV, the training results showed that the novel DRL algorithm improved performance by more than 20% compared with the state-of-the-art DDPG. The experimental test results indicate that UAVs trained using MEP–DDPG can stably complete a variety of tasks in complex, unknown environments.
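The abstract does not specify how the multiple experience pools are combined during training. As a minimal sketch, assuming just two pools (one holding expert transitions, one holding the agent's own) and a fixed expert share in every minibatch, sampling could look like the following. The class name `MultiPoolReplay` and the `expert_ratio` parameter are hypothetical, and the MPC/simulated-annealing expert generator is not modeled here.

```python
import random
from collections import deque
from typing import Any, List, Tuple

# A transition is (state, action, reward, next_state, done); kept generic here.
Transition = Tuple[Any, Any, float, Any, bool]


class MultiPoolReplay:
    """Hypothetical two-pool replay buffer: expert pool + agent pool.
    The fixed expert_ratio is an illustrative assumption, not the
    sampling rule used in the MEP-DDPG paper."""

    def __init__(self, capacity: int, expert_ratio: float = 0.25, seed: int = 0):
        self.expert = deque(maxlen=capacity)  # demonstration transitions
        self.agent = deque(maxlen=capacity)   # the learner's own transitions
        self.expert_ratio = expert_ratio
        self.rng = random.Random(seed)

    def add_expert(self, t: Transition) -> None:
        self.expert.append(t)

    def add_agent(self, t: Transition) -> None:
        self.agent.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Expert share of the batch, capped by how many expert samples exist.
        n_expert = min(int(round(batch_size * self.expert_ratio)), len(self.expert))
        # The rest comes from the agent pool; may return fewer if pools are small.
        n_agent = min(batch_size - n_expert, len(self.agent))
        batch = (self.rng.sample(list(self.expert), n_expert)
                 + self.rng.sample(list(self.agent), n_agent))
        self.rng.shuffle(batch)  # mix expert and agent samples within the batch
        return batch


# Usage: expert transitions carry reward 1.0, agent transitions reward 0.0.
buf = MultiPoolReplay(capacity=1000, expert_ratio=0.25)
for i in range(100):
    buf.add_expert((i, 0.0, 1.0, i + 1, False))
    buf.add_agent((i, 0.5, 0.0, i + 1, False))
batch = buf.sample(32)
print(len(batch), sum(t[2] == 1.0 for t in batch))  # 32 8
```

A fixed ratio is the simplest choice; a schedule that shrinks the expert share as the agent improves would be an equally plausible assumption.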

Relevant experience learning: A Deep Reinforcement Learning method for UAV Autonomous Motion Planning in complex unknown environments

Chinese Journal of Aeronautics ◽

10.1016/j.cja.2020.12.027 ◽

2021 ◽

Author(s):

Zijian HU ◽

Xiaoguang GAO ◽

Kaifang WAN ◽

Yiwei ZHAI ◽

Qianglong WANG

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Learning Method ◽

Unknown Environments ◽

Autonomous Motion

A RDA-Based Deep Reinforcement Learning Approach for Autonomous Motion Planning of UAV in Dynamic Unknown Environments

Journal of Physics Conference Series ◽

10.1088/1742-6596/1487/1/012006 ◽

2020 ◽

Vol 1487 ◽

pp. 012006

Author(s):

Kaifang WAN ◽

Xiaoguang GAO ◽

Zijian HU ◽

Wei ZHANG

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Learning Approach ◽

Unknown Environments ◽

Autonomous Motion

Deep reinforcement learning based control for Autonomous Vehicles in CARLA

Multimedia Tools and Applications ◽

10.1007/s11042-021-11437-3 ◽

2022 ◽

Author(s):

Óscar Pérez-Gil ◽

Rafael Barea ◽

Elena López-Guillén ◽

Luis M. Bergasa ◽

Carlos Gómez-Huélamo ◽

...

Keyword(s):

Reinforcement Learning ◽

Autonomous Vehicles ◽

Autonomous Vehicle ◽

Vehicle Control ◽

Data Sources ◽

Simulation Environment ◽

Urban Simulation ◽

Policy Gradient ◽

Almost All ◽

Control Layer

Nowadays, Artificial Intelligence (AI) is advancing by leaps and bounds in almost all fields of technology, and Autonomous Vehicle (AV) research is one of them. This paper proposes the use of algorithms based on Deep Learning (DL) in the control layer of an autonomous vehicle. More specifically, Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) are implemented in order to compare their results. The aim of this work is to obtain a trained model, using a DRL algorithm, capable of sending control commands to the vehicle so that it navigates properly and efficiently along a given route. In addition, for each algorithm, several agents are presented as solutions, each of which uses different data sources to produce the vehicle control commands. For this purpose, the open-source simulator CARLA is used, giving the system the ability to perform a multitude of tests without any risk in a hyper-realistic urban simulation environment, something that is unthinkable in the real world. The results show that both DQN and DDPG reach the goal, but DDPG obtains better performance. DDPG produces trajectories very similar to those of a classic controller such as LQR. In both cases the RMSE is lower than 0.1 m when following trajectories ranging from 180 to 700 m. Finally, conclusions and future work are discussed.
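The abstract reports tracking accuracy as an RMSE below 0.1 m. As an illustrative sketch only, assuming the driven and reference trajectories are sampled at matching instants as 2-D points in metres, the metric can be computed as below. `tracking_rmse` is a hypothetical helper; the paper may define the error differently (for example, as lateral distance to the route).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in metres


def tracking_rmse(driven: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square Euclidean error between point-wise matched trajectories.
    Assumes both sequences are sampled at the same instants (an assumption)."""
    if len(driven) != len(reference) or not driven:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Squared distance between each driven point and its matched reference point.
    sq = [(x - xr) ** 2 + (y - yr) ** 2
          for (x, y), (xr, yr) in zip(driven, reference)]
    return math.sqrt(sum(sq) / len(sq))


# Usage: a vehicle that stays a constant 5 cm off a straight reference path.
reference = [(float(i), 0.0) for i in range(5)]
driven = [(float(i), 0.05) for i in range(5)]
print(tracking_rmse(driven, reference))  # approximately 0.05
```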

Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards

International Journal of Advanced Robotic Systems ◽

10.1177/1729881419898342 ◽

2020 ◽

Vol 17 (1) ◽

pp. 172988141989834

Author(s):

Guoyu Zuo ◽

Qishen Zhao ◽

Jiahao Lu ◽

Jiangeng Li

Keyword(s):

Reinforcement Learning ◽

Gradient Algorithm ◽

Learning To Learn ◽

Model Free ◽

Learning Speed ◽

Policy Gradient ◽

Experience Replay ◽

Speed Up ◽

Reward Functions ◽

Robotic Tasks

The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is difficult. In this article, we propose a general, model-free approach that enables reinforcement learning to learn robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function in order to minimize vibration of the output action while maximizing the value of the action. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. Results show that our method can effectively solve the sparse-reward problem and achieve a high learning speed.
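Curious and Aggressive Hindsight Experience Replay builds on standard Hindsight Experience Replay (HER), and the abstract does not give its details. The sketch below shows only the underlying standard HER idea: each transition is stored with its original goal and again with goals the agent actually reached later in the same episode (the "future" strategy), using a sparse 0/-1 reward. The function names, the tolerance `tol`, and the tuple layout of an episode are assumptions for illustration, not the authors' code.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, ...]


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    # 0 on success, -1 otherwise: the usual sparse-reward convention in HER.
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel(episode: List[tuple], k: int = 4, seed: int = 0) -> List[tuple]:
    """episode: list of (state, action, achieved_goal_after_step, desired_goal).
    Returns (state, action, goal, reward) tuples: the original goal plus k goals
    sampled from achieved goals at the same or later steps ('future' strategy)."""
    rng = random.Random(seed)
    out = []
    T = len(episode)
    for t, (s, a, ag_next, g) in enumerate(episode):
        out.append((s, a, g, sparse_reward(ag_next, g)))  # original goal
        for _ in range(k):
            future = rng.randint(t, T - 1)       # inclusive on both ends
            new_goal = episode[future][2]        # a goal actually reached
            out.append((s, a, new_goal, sparse_reward(ag_next, new_goal)))
    return out


# Usage: the agent creeps along x but never reaches the desired goal (1.0, 0.0).
episode = [(i, 0.1, (0.1 * (i + 1), 0.0), (1.0, 0.0)) for i in range(3)]
transitions = her_relabel(episode, k=4)
print(sum(r == 0.0 for *_, r in transitions), "relabelled successes")
```

The point of relabelling is visible in the usage: every original transition has reward -1, yet the relabelled copies from the final step all become successes, giving the learner a non-zero signal despite the sparse reward.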

Data-Based Predictive Control via Multistep Policy Gradient Reinforcement Learning

IEEE Transactions on Cybernetics ◽

10.1109/tcyb.2021.3121078 ◽

2021 ◽

pp. 1-11

Author(s):

Xindi Yang ◽

Hao Zhang ◽

Zhuping Wang ◽

Huaicheng Yan ◽

Changzhu Zhang

Keyword(s):

Reinforcement Learning ◽

Predictive Control ◽

Policy Gradient

Mapless Motion Planning System for an Autonomous Underwater Vehicle Using Policy Gradient-based Deep Reinforcement Learning

Journal of Intelligent & Robotic Systems ◽

10.1007/s10846-019-01004-2 ◽

2019 ◽

Vol 96 (3-4) ◽

pp. 591-601 ◽

Cited By ~ 4

Author(s):

Yushan Sun ◽

Junhan Cheng ◽

Guocheng Zhang ◽

Hao Xu

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Autonomous Underwater Vehicle ◽

Underwater Vehicle ◽

Planning System ◽

Policy Gradient ◽

Gradient Based

A review of motion planning algorithms for intelligent robots

Journal of Intelligent Manufacturing ◽

10.1007/s10845-021-01867-z ◽

2021 ◽

Author(s):

Chengmin Zhou ◽

Bingding Huang ◽

Pasi Fränti

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Q Learning ◽

Learning Network ◽

Gradient Algorithms ◽

Optimal Value ◽

Policy Gradient ◽

Planning Algorithms

Principles of typical motion planning algorithms are investigated and analyzed in this paper. These algorithms include traditional planning algorithms, classical machine learning algorithms, optimal value reinforcement learning, and policy gradient reinforcement learning. The traditional planning algorithms investigated include graph search algorithms, sampling-based algorithms, interpolating curve algorithms, and reaction-based algorithms. Classical machine learning algorithms include multiclass support vector machines, long short-term memory, Monte-Carlo tree search, and convolutional neural networks. Optimal value reinforcement learning algorithms include Q-learning, deep Q-learning network, double deep Q-learning network, and dueling deep Q-learning network. Policy gradient algorithms include the policy gradient method, actor-critic, asynchronous advantage actor-critic, advantage actor-critic, deterministic policy gradient, deep deterministic policy gradient, trust region policy optimization, and proximal policy optimization. New general criteria are also introduced to evaluate the performance and application of motion planning algorithms through analytical comparisons. The convergence speed and stability of optimal value and policy gradient algorithms are specifically analyzed. Future directions are presented based on the principles and analytical comparisons of motion planning algorithms. This paper provides researchers with a clear and comprehensive understanding of the advantages, disadvantages, relationships, and future of motion planning algorithms in robots, and paves the way for better motion planning algorithms in academia, engineering, and manufacturing.
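To make the "optimal value" family concrete, here is a minimal sketch of the tabular Q-learning update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), on a hypothetical five-cell corridor whose right-most cell is the goal. The environment, reward of 1 at the goal, and hyperparameters are illustrative assumptions and are not taken from the review.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the terminal goal
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(s, a_idx):
    s2 = min(max(s + ACTIONS[a_idx], 0), N_STATES - 1)  # walls clamp the move
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]

    def greedy(s):
        best = max(Q[s])
        # Break ties randomly so early episodes behave like a random walk.
        return rng.choice([i for i, q in enumerate(Q[s]) if q == best])

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy behaviour policy.
            a = rng.randrange(2) if rng.random() < eps else greedy(s)
            s2, r, done = step(s, a)
            # Off-policy TD target: bootstrap from the best next action.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


Q = q_learning()
# Greedy action in each non-terminal cell; should be 1 (move right) everywhere.
policy = [max(range(2), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(policy)
```

Deep Q-learning networks replace the table `Q` with a neural network and add experience replay and a target network, but the bootstrapped target has the same form.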

Coordinated Motion Planning of Dual-arm Space Robot with Deep Reinforcement Learning

2019 IEEE International Conference on Unmanned Systems (ICUS) ◽

10.1109/icus48101.2019.8996069 ◽

2019 ◽

Author(s):

Mengying Tang ◽

Xiaofei Yue ◽

Zhan Zuo ◽

Xiaoping Huang ◽

Yanfang Liu ◽

...

Keyword(s):

Reinforcement Learning ◽

Motion Planning ◽

Space Robot ◽

Coordinated Motion ◽

Dual Arm

Autonomous Motion Planning and Learning Control of a Biped Locomotive Robot

IFAC Proceedings Volumes ◽

10.1016/s1474-6670(17)51736-x ◽

1990 ◽

Vol 23 (8) ◽

pp. 205-210 ◽

Cited By ~ 1

Author(s):

S. Kitamura ◽

Y. Kurematsu

Keyword(s):

Motion Planning ◽

Learning Control ◽

Autonomous Motion

Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method

2021 American Control Conference (ACC) ◽

10.23919/acc50511.2021.9482765 ◽

2021 ◽

Author(s):

Sebastien Gros ◽

Mario Zanon

Keyword(s):

Reinforcement Learning ◽

Gradient Method ◽

Policy Gradient