Navigation in Unknown Dynamic Environments Based on Deep Reinforcement Learning

Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 3837 ◽  
Author(s):  
Junjie Zeng ◽  
Rusheng Ju ◽  
Long Qin ◽  
Yue Hu ◽  
Quanjun Yin ◽  
...  

In this paper, we propose a novel Deep Reinforcement Learning (DRL) algorithm that navigates non-holonomic robots with continuous control in unknown dynamic environments with moving obstacles. We call the approach MK-A3C (Memory and Knowledge-based Asynchronous Advantage Actor-Critic) for short. As its first component, MK-A3C builds a GRU-based memory neural network to enhance the robot’s capability for temporal reasoning. Robots without such memory tend to behave irrationally in the face of incomplete and noisy observations of complex environments. Moreover, the memory ability endowed by MK-A3C lets robots avoid local-minimum traps by estimating the environmental model. Secondly, MK-A3C combines a domain-knowledge-based reward function with a transfer-learning-based training task architecture, which solves the policy non-convergence problems caused by sparse rewards. These improvements allow MK-A3C to efficiently navigate robots in unknown dynamic environments, satisfying kinetic constraints while handling moving objects. Simulation experiments show that, compared with existing methods, MK-A3C realizes successful robotic navigation in unknown and challenging environments by outputting continuous acceleration commands.
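At the core of A3C-style learners such as MK-A3C are the n-step bootstrapped return and the advantage that scales the policy gradient. A minimal sketch (illustrative names, not the paper's code):

```python
def nstep_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step returns R_t = r_t + gamma * R_{t+1}, bootstrapped
    from the critic's value estimate at the final state of the rollout."""
    returns, running = [], bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def advantages(returns, values):
    """A_t = R_t - V(s_t): positive when an action beat the critic's
    expectation, which is what drives the actor's gradient update."""
    return [r - v for r, v in zip(returns, values)]
```

In an asynchronous setup, each worker computes these targets over a short rollout and pushes gradients to the shared network.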

Author(s):  
K. A. A. Mustafa ◽  
N. Botteghi ◽  
B. Sirmacek ◽  
M. Poel ◽  
S. Stramigioli

Abstract. We introduce a new autonomous path planning algorithm for mobile robots reaching target locations in an unknown environment, where the robot relies only on its on-board sensors. In particular, we describe the design and evaluation of a deep reinforcement learning motion planner with continuous linear and angular velocities, based on the deep deterministic policy gradient (DDPG), that navigates to a desired target location. Additionally, the algorithm exploits the available knowledge of the environment provided by a grid-based SLAM with a Rao-Blackwellized particle filter to shape the reward function, in an attempt to improve the convergence rate, escape local optima, and reduce the number of collisions with obstacles. A comparison is made between a reward function shaped using the map provided by the SLAM algorithm and a reward function with no knowledge of the map. Results show that, after adopting the proposed approach, the learning time required to converge decreased to 560 episodes, compared with 1450 episodes for the standard RL algorithm, and the number of obstacle collisions was reduced as well, with a success ratio of 83% compared with 56% for the standard RL algorithm. The results are validated in a simulated experiment on a skid-steering mobile robot.
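Reward shaping of this kind is often done in the potential-based form, where the shaped term gamma*phi(s') - phi(s) rewards progress toward the goal. A minimal sketch using a goal-distance potential as a stand-in for the SLAM-map-derived term (the potential function here is an assumption, not the paper's exact design):

```python
import math

def potential(pos, goal):
    """phi(s): negative Euclidean distance to the goal."""
    return -math.dist(pos, goal)

def shaped_reward(base_reward, pos, next_pos, goal, gamma=0.99):
    """Potential-based shaping, r + gamma*phi(s') - phi(s), which rewards
    progress toward the goal without changing the optimal policy."""
    return base_reward + gamma * potential(next_pos, goal) - potential(pos, goal)
```

With a map available, `potential` could instead use a shortest-path distance computed over the SLAM grid, which is what makes map knowledge useful for escaping local optima.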


Robotica ◽  
2018 ◽  
Vol 37 (3) ◽  
pp. 445-468 ◽  
Author(s):  
Rupeng Yuan ◽  
Fuhai Zhang ◽  
Yu Wang ◽  
Yili Fu ◽  
Shuguo Wang

SUMMARY: A Q-learning approach is often used for navigation in static environments, where the state space is easy to define. In this paper, a new Q-learning approach is proposed for navigation in dynamic environments by imitating human reasoning. As a model-free method, Q-learning does not require an environmental model in advance. The state space and the reward function in the proposed approach are defined according to human perception and evaluation, respectively. Specifically, approximate regions instead of accurate measurements are used to define states. Moreover, to respect the limits of robot dynamics, the actions for each state are calculated by introducing a dynamic window that takes robot dynamics into account. The conducted tests show that the obstacle avoidance rate of the proposed approach reaches 90.5% after training, and the robot always operates below its dynamics limits.
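The tabular Q-learning update underlying such an approach, over coarse region-style states, might look like this (the state and action names are illustrative assumptions; in the full method, the actions available per state would come from the dynamic window):

```python
from collections import defaultdict

def new_table():
    """Q-table defaulting to 0 for unseen state-action pairs."""
    return defaultdict(lambda: defaultdict(float))

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference step toward reward + gamma * max_a Q(s', a)."""
    best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q[state][action]
```

Because the method is model-free, only observed (state, action, reward, next state) transitions are needed; no environment model is consulted.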


2021 ◽  
Vol 11 (9) ◽  
pp. 3938
Author(s):  
Shusheng Bi ◽  
Chang Yuan ◽  
Chang Liu ◽  
Jun Cheng ◽  
Wei Wang ◽  
...  

By moving a commercial 2D LiDAR, 3D maps of the environment can be built from the 2D LiDAR's data and its movements. Compared with a commercial 3D LiDAR, a moving 2D LiDAR is more economical. A series of problems must be solved for a moving 2D LiDAR to perform well, among them improving accuracy and real-time performance. Solving these problems requires estimating the movements of the 2D LiDAR and identifying and removing moving objects in the environment. More specifically, this involves calibrating the installation error between the 2D LiDAR and the moving unit, estimating the motion of the moving unit, and identifying moving objects at low scanning frequencies. As actual applications are mostly dynamic, with a moving 2D LiDAR operating among multiple moving objects, we believe that accurately constructing 3D maps in dynamic environments with a moving 2D LiDAR will be an important future research topic. Moreover, how to handle moving objects in a dynamic environment with a moving 2D LiDAR has not been solved by previous research.
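The basic geometry of lifting a 2D scan into 3D can be sketched as follows, assuming a simple tilt-about-the-x-axis mounting model (an assumption for illustration; real setups additionally need the installation-error calibration discussed above):

```python
import math

def scan_to_3d(ranges, angle_min, angle_step, tilt):
    """Project one 2D scan into 3D: each return (r, theta) lies in the scan
    plane, which is rotated about the x-axis by the unit's current tilt."""
    points = []
    for i, r in enumerate(ranges):
        theta = angle_min + i * angle_step
        x, y = r * math.cos(theta), r * math.sin(theta)
        # rotate the in-plane point about the x-axis by the tilt angle
        points.append((x, y * math.cos(tilt), y * math.sin(tilt)))
    return points
```

As the tilt sweeps over time, successive scans trace out the 3D surface; accurate tilt estimation is exactly the movement-estimation problem the text highlights.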


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-25
Author(s):  
Tao Liu ◽  
Yuli Hu ◽  
Hui Xu

Autonomous underwater vehicles (AUVs) are widely used to accomplish various missions in the complex marine environment; the design of a control system for AUVs is particularly difficult due to high nonlinearity, variations in hydrodynamic coefficients, and external forces from ocean currents. In this paper, we propose a controller based on deep reinforcement learning (DRL), studied in a simulation environment, for controlling a vectored-thruster AUV. RL is an important branch of artificial intelligence that learns behavior through trial-and-error interactions with the environment, so it does not need an accurate AUV control model, which is very hard to establish. The proposed RL algorithm uses only information measurable by the AUV's onboard sensors as input parameters, and the outputs of the designed controller are continuous control actions, which are the commands sent to the vectored thruster. Moreover, a reward function is developed for the deep RL controller that considers the different factors affecting the accuracy of AUV navigation control. To confirm the algorithm's effectiveness, a series of simulations is carried out in the designed simulation environment, which saves time and improves efficiency. Simulation results prove the feasibility of applying the deep RL algorithm to the AUV control system. Furthermore, our work provides an optional method for robot control problems facing rising technology requirements and complicated application environments.
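A multi-term reward of the kind described, penalizing navigation errors together with actuator effort, might be sketched as follows (the error terms and weights are illustrative assumptions, not the paper's exact function):

```python
def auv_reward(heading_err, depth_err, action_effort,
               w_heading=1.0, w_depth=1.0, w_effort=0.1):
    """Negative weighted cost: tracking errors plus a quadratic penalty on
    actuator effort, so the agent trades accuracy against control energy."""
    return -(w_heading * abs(heading_err)
             + w_depth * abs(depth_err)
             + w_effort * action_effort ** 2)
```

The effort penalty discourages the kind of aggressive thruster commands that a pure tracking reward would otherwise tolerate.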


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Xiaoyun Lei ◽  
Zhian Zhang ◽  
Peifang Dong

Dynamic path planning in unknown environments has always been a challenge for mobile robots. In this paper, we apply double deep Q-network (DDQN) reinforcement learning, proposed by DeepMind in 2016, to dynamic path planning in unknown environments. The reward and punishment function and the training method are designed to address the instability of the training stage and the sparsity of the environment's state space. In different training stages, we dynamically adjust the starting position and target position. With the updating of the neural network and the increase of the greedy-rule probability, the local space searched by the agent expands. The Pygame module in Python is used to build dynamic environments. Taking the lidar signal and the local target position as inputs, convolutional neural networks (CNNs) are used to generalize the environmental state. The Q-learning algorithm enhances the agent's ability for dynamic obstacle avoidance and local planning. The results show that, after training in different dynamic environments and testing in a new environment, the agent is able to reach the local target position successfully in an unknown dynamic environment.
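The double DQN target that distinguishes DDQN from plain DQN decouples action selection (online network) from action evaluation (target network), which reduces overestimation bias. A minimal sketch:

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Online net picks the argmax action; target net supplies its value."""
    if done:
        return reward
    a_star = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[a_star]
```

In plain DQN, the max would be taken directly over `next_q_target`, letting estimation noise inflate the target.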


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252754
Author(s):  
Nesma M. Ashraf ◽  
Reham R. Mostafa ◽  
Rasha H. Sakr ◽  
M. Z. Rashad

Deep Reinforcement Learning (DRL) enables agents to make decisions based on a well-designed reward function that suits a particular environment, without any prior knowledge of that environment. The choice of hyperparameters has a great impact on the overall learning process and the learning time. Hyperparameters should be accurately estimated while training DRL algorithms, which is one of the key challenges we attempt to address. This paper employs a swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm and achieve the optimum control strategy in an autonomous driving control problem. DDPG is capable of handling complex environments with continuous action spaces. To evaluate the proposed algorithm, The Open Racing Car Simulator (TORCS), a realistic autonomous driving simulation environment, was chosen for its ease of design and implementation. Using TORCS, a DDPG agent with optimized hyperparameters was compared with a DDPG agent with reference hyperparameters. The experimental results showed that optimizing the DDPG's hyperparameters maximizes the total rewards across testing episodes while maintaining a stable driving policy.
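A compact, simplified sketch of WOA's encircling-prey update, applied to a toy two-parameter objective; in the real use case the objective would evaluate a DDPG training run under candidate hyperparameters. Everything here is illustrative, and full WOA's spiral and random-search phases are omitted:

```python
import random

def woa_minimize(objective, bounds, n_whales=10, iters=50, seed=0):
    """Simplified Whale Optimization: every whale 'encircles' the best
    solution found so far, with exploration shrinking as a decays to 0."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = list(min(whales, key=objective))    # copy so mutation can't corrupt it
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters              # decreases linearly from 2 to 0
        for w in whales:
            for d, (lo, hi) in enumerate(bounds):
                A = 2.0 * a * rng.random() - a
                C = 2.0 * rng.random()
                # encircling-prey move toward the current best
                w[d] = best[d] - A * abs(C * best[d] - w[d])
                w[d] = min(max(w[d], lo), hi)  # clamp to the search bounds
        candidate = min(whales, key=objective)
        if objective(candidate) < objective(best):
            best = list(candidate)
    return best
```

Since each objective call here would mean training a DDPG agent, population sizes and iteration counts are kept small in practice.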


2019 ◽  
Vol 9 (7) ◽  
pp. 1384 ◽  
Author(s):  
Junjie Zeng ◽  
Long Qin ◽  
Yue Hu ◽  
Quanjun Yin ◽  
Cong Hu

Since an individual approach can hardly navigate robots through complex environments, we present a novel two-level hierarchical framework called JPS-IA3C (Jump Point Search improved Asynchronous Advantage Actor-Critic) in this paper for robot navigation in dynamic environments through continuous control signals. Its global planner, JPS+ (P), is a variant of JPS (Jump Point Search), which efficiently computes an abstract path of neighboring jump points. These nodes, which serve as subgoals, completely rid Deep Reinforcement Learning (DRL)-based controllers of notorious local minima. To satisfy kinetic constraints and adapt to changing environments, we propose an improved A3C (IA3C) algorithm to learn the control policies for the robots' local motion. Moreover, the combination of modified curriculum learning and reward shaping helps IA3C build a novel reward-function framework that avoids learning inefficiency caused by sparse rewards. We additionally strengthen the robots' temporal reasoning about the environment with a memory-based network. These improvements make the IA3C controller converge faster and become more adaptive to the incomplete, noisy information caused by partial observability. Simulated experiments show that, compared with existing methods, the JPS-IA3C hierarchy successfully outputs continuous commands to accomplish large-range navigation tasks with shorter paths and less time through reasonable subgoal selection and rational motions.
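The hand-off from the global JPS path to the local DRL controller can be sketched as consuming jump-point subgoals once the robot comes within a reach radius; the radius and interface here are assumptions for illustration:

```python
import math

def next_subgoal(path, pos, reach_radius=0.5):
    """Consume jump-point subgoals (in place) as the robot reaches them;
    the head of the remaining path is the local controller's target."""
    while path and math.dist(pos, path[0]) < reach_radius:
        path.pop(0)
    return path[0] if path else None
```

Because each subgoal is locally reachable along the abstract path, the local controller never has to plan around large-scale dead ends, which is what eliminates the local-minimum traps.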


Author(s):  
Gerald Eaglin ◽  
Joshua Vaughan

Abstract: While many model-based methods have been proposed for optimal control, it is often difficult to generate model-based optimal controllers for nonlinear systems. One model-free method for finding optimal control policies is reinforcement learning, which iteratively trains an agent to optimize a reward function. However, agents often perform poorly at the beginning of training and require a large number of trials to converge to a successful policy. A method is proposed to incorporate domain knowledge of dynamics and control into reinforcement-learning controllers to reduce the required training time. Simulations are presented comparing the performance of agents that utilize domain knowledge against those that do not. The results show that agents with domain knowledge can accomplish the desired task with less training time than those without it.


Author(s):  
Syed Ihtesham Hussain Shah ◽  
Antonio Coronato

Reinforcement Learning (RL) methods provide a solution for decision-making problems under uncertainty: an agent finds a suitable policy, guided by a reward function, through interaction with a dynamic environment. However, for complex and large problems it is very difficult to specify and tune the reward function. Inverse Reinforcement Learning (IRL) can mitigate this problem by learning the reward function from expert demonstrations. This work exploits an IRL method named the Max-Margin Algorithm (MMA) to learn the reward function for a robotic navigation problem. The learned reward function explains the demonstrated (expert) policy better than all other policies. Results show that this method converges better and that the reward functions learned with it represent expert behavior more faithfully.
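The max-margin idea can be sketched as scoring feature expectations under candidate reward weights and measuring the gap between the expert and the best competing policy (a simplified check, not the full MMA optimization loop):

```python
def margin(w, mu_expert, mu_others):
    """Smallest gap between the expert's scored feature expectations and the
    best competing policy's, under candidate reward weights w. Max-margin
    IRL seeks the w that makes this gap as large as possible."""
    score = lambda mu: sum(wi * mi for wi, mi in zip(w, mu))
    return min(score(mu_expert) - score(mu) for mu in mu_others)
```

A positive margin means the candidate reward already ranks the expert's behavior above every alternative considered so far.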

