UAV Maneuvering Target Tracking in Uncertain Environments Based on Deep Reinforcement Learning and Meta-Learning

2020
Vol 12 (22)
pp. 3789
Author(s):
Bo Li
Zhigang Gan
Daqing Chen
Dyachenko Sergey Aleksandrovich

This paper combines deep reinforcement learning (DRL) with meta-learning and proposes a novel approach, named meta twin delayed deep deterministic policy gradient (Meta-TD3), to realize the control of an unmanned aerial vehicle (UAV), allowing a UAV to quickly track a target in an environment where the target's motion is uncertain. This approach can be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing. We design a multi-task experience replay buffer to provide data for the multi-task learning of the DRL algorithm, and we combine it with meta-learning to develop a multi-task reinforcement learning update method that ensures the generalization capability of the learned policy. Compared with the state-of-the-art algorithms, namely the deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3), experimental results show that Meta-TD3 achieves a marked improvement in both convergence value and convergence rate. In the UAV target tracking problem, Meta-TD3 requires only a few training steps to adapt quickly to a new target movement mode and maintains better tracking effectiveness.
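As a rough illustration of the two ingredients named above, the sketch below pairs a per-task replay buffer with a Reptile-style outer update over policy parameters. The buffer layout, the Reptile-style averaging, and all names (MultiTaskReplayBuffer, meta_update, outer_lr) are assumptions made for illustration, not the paper's implementation.

import random
from collections import defaultdict, deque

import numpy as np

class MultiTaskReplayBuffer:
    """One FIFO buffer per task, i.e., per target movement mode."""

    def __init__(self, capacity_per_task=100_000):
        self.buffers = defaultdict(lambda: deque(maxlen=capacity_per_task))

    def add(self, task_id, transition):
        self.buffers[task_id].append(transition)

    def sample(self, task_id, batch_size):
        buffer = self.buffers[task_id]
        return random.sample(list(buffer), min(batch_size, len(buffer)))

def meta_update(theta, task_adapted_params, outer_lr=0.1):
    # Reptile-style outer step: move the meta-parameters toward the mean
    # of the parameters adapted on each task's replay data.
    direction = np.mean([phi - theta for phi in task_adapted_params], axis=0)
    return theta + outer_lr * direction

# Usage: adapt a copy of theta on each task's samples, then meta-update.
theta = np.zeros(8)
adapted = [theta + np.random.randn(8) * 0.01 for _ in range(4)]
theta = meta_update(theta, adapted)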

Information
2020
Vol 11 (2)
pp. 77
Author(s):
Juan Chen
Zhengxuan Xue
Daiqian Fan

To reduce vehicle delay caused by stops at signalized intersections, this paper designs a micro-control method for a left-turning connected and automated vehicle (CAV) based on an improved deep deterministic policy gradient (DDPG) algorithm. The method controls the whole process of a left-turn vehicle approaching, entering, and leaving a signalized intersection. In addition, to address the low sampling efficiency and the overestimation of the critic network in DDPG, a positive-and-negative-reward experience replay buffer sampling mechanism and a multi-critic network structure are incorporated into the algorithm. Finally, the effectiveness of the signal control method, six DDPG-based methods (DDPG, PNRERB-1C-DDPG, PNRERB-3C-DDPG, PNRERB-5C-DDPG, PNRERB-5CNG-DDPG, and PNRERB-7C-DDPG), and four DQN-based methods (DQN, Dueling DQN, Double DQN, and Prioritized Replay DQN) is verified at saturation degrees of 0.2, 0.5, and 0.7 for left-turning vehicles at a signalized intersection in a VISSIM simulation environment. The results show that, compared with the traditional signal control method, the proposed deep reinforcement learning method reduces the number of stops by 5% to 94%, stop time by 1% to 99%, and delay by −17% to 93%.
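A minimal sketch of the two modifications follows, under the assumption that the positive-and-negative-reward buffer keeps separate pools and draws balanced mini-batches, and that multiple critics are combined by averaging; the paper's exact sampling ratio and critic aggregation may differ.

import random

class PNReplayBuffer:
    """Separate pools for positive- and negative-reward transitions."""

    def __init__(self, capacity_per_pool=50_000):
        self.capacity = capacity_per_pool
        self.pos, self.neg = [], []

    def add(self, transition, reward):
        pool = self.pos if reward > 0 else self.neg
        pool.append(transition)
        if len(pool) > self.capacity:
            pool.pop(0)  # drop the oldest transition

    def sample(self, batch_size):
        # Draw roughly half of the batch from each pool so sparse
        # positive experiences are not drowned out.
        half = batch_size // 2
        batch = random.sample(self.pos, min(half, len(self.pos)))
        batch += random.sample(self.neg, min(batch_size - len(batch), len(self.neg)))
        random.shuffle(batch)
        return batch

def multi_critic_target(q_estimates):
    # Combine several critics' estimates of Q(s', a'); averaging damps the
    # overestimation bias of any single critic (taking the minimum, as in
    # TD3, is an alternative).
    return sum(q_estimates) / len(q_estimates)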


Symmetry
2019
Vol 11 (11)
pp. 1352
Author(s):
Kim
Park

In deep reinforcement learning (RL), exploration is highly significant for achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can exhibit efficient exploration behavior. A random ε-greedy policy can exploit additional replay buffers in environments with sparse and binary rewards, such as the real-time online detection of network security anomalies, where the task is to verify whether the network is “normal or anomalous.” Prior studies have shown that prioritized replay memory based on the temporal-difference error provides superior theoretical results. However, other implementations have shown that, in certain environments, prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, a key idea of hindsight experience replay, using an additional buffer for each different goal, inspires our objective. We therefore exploit multiple random ε-greedy buffers to enhance exploration toward near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning from our method through an experimental comparison of DQN and the deep deterministic policy gradient in discrete-action as well as continuous-control settings for fully symmetric environments.
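The sketch below maintains one replay buffer per exploration rate and mixes them at sampling time; the particular ε values, the even mixing rule, and the class name are illustrative assumptions rather than the paper's exact design.

import random

class MultiEpsilonBuffers:
    """One replay buffer per ε, filled by its own ε-greedy behavior policy."""

    def __init__(self, epsilons=(0.05, 0.1, 0.2, 0.4), capacity=20_000):
        self.capacity = capacity
        self.buffers = {eps: [] for eps in epsilons}

    def act(self, eps, greedy_action, n_actions):
        # ε-greedy action under the rate attached to this buffer.
        if random.random() < eps:
            return random.randrange(n_actions)
        return greedy_action

    def add(self, eps, transition):
        buffer = self.buffers[eps]
        buffer.append(transition)
        if len(buffer) > self.capacity:
            buffer.pop(0)

    def sample(self, batch_size):
        # Draw evenly across the per-ε buffers so each update sees data from
        # both conservative and aggressive exploration.
        share = max(1, batch_size // len(self.buffers))
        batch = []
        for buffer in self.buffers.values():
            batch += random.sample(buffer, min(share, len(buffer)))
        random.shuffle(batch)
        return batch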


2020
Vol 17 (1)
pp. 172988141989834
Author(s):
Guoyu Zuo
Qishen Zhao
Jiahao Lu
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is difficult. In this article, we propose a general, model-free approach for reinforcement learning on robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize the vibration of the output action while maximizing its value. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. Results show that our method effectively solves the sparse-reward problem and achieves high learning speed.
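Two of the three components can be shown compactly: hindsight relabeling of an episode and an actor objective with an added action-magnitude penalty. The "final-goal" relabeling strategy, the penalty weight beta, and all function names are assumptions; the paper's curious-and-aggressive goal selection is not reproduced here.

import numpy as np

def her_relabel(episode, reward_fn):
    # Hindsight relabeling ("final" strategy): pretend the goal was the
    # state actually achieved at the end, so failed episodes still yield
    # useful reward signal.
    achieved_goal = episode[-1]["achieved_goal"]
    return [
        {**t, "goal": achieved_goal,
         "reward": reward_fn(t["achieved_goal"], achieved_goal)}
        for t in episode
    ]

def actor_loss(q_value, action, beta=0.1):
    # Maximize the critic's value while penalizing large actions, which
    # suppresses vibration (jitter) in the output action.
    return -np.mean(q_value) + beta * np.mean(np.square(action))

# Usage with a sparse reward: 0 if the goal is reached, -1 otherwise.
reward_fn = lambda achieved, goal: 0.0 if np.allclose(achieved, goal) else -1.0
episode = [{"achieved_goal": np.array([0.1 * i]), "reward": -1.0} for i in range(5)]
relabeled = her_relabel(episode, reward_fn)   # last transition now has reward 0
loss = actor_loss(np.array([1.2, 0.9]), np.array([[0.1], [0.2]]))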


2011
Vol 110-116
pp. 4415-4423
Author(s):
Hodjat Rahmati
Hamid Khaloozadeh
Moosa Ayati

In this paper, a new method for maneuvering target tracking (MTT) based on nonlinear input estimation (IE) is proposed and employed to track maneuvering targets. The proposed method augments the states and the unknown inputs (maneuvers) in a higher-order state-space realization and estimates both simultaneously. This concurrent estimation of states and inputs eliminates the maneuver-detection delay common in conventional IE methods. The proposed method performs well in both maneuvering and non-maneuvering stages. In addition, a model with highly nonlinear dynamics for maneuvering targets is given and used in the numerical simulations to analyze the performance of the proposed tracking method. Furthermore, the proposed method is compared with a conventional IE method, and the simulation results show its effectiveness.
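The core idea, augmenting the unknown input into the state so a single filter estimates both, can be sketched on a linear constant-velocity model with an unknown acceleration input; the matrices, the random-walk input model, and the full-state measurement are illustrative assumptions, and the paper itself works with a nonlinear realization.

import numpy as np

def augment_model(F, G, n_inputs):
    # Stack the unknown input u onto the state x and model u as a random
    # walk; a single filter over [x; u] then estimates states and maneuver
    # inputs simultaneously, with no separate maneuver-detection stage.
    n = F.shape[0]
    F_aug = np.block([
        [F, G],
        [np.zeros((n_inputs, n)), np.eye(n_inputs)],
    ])
    H_aug = np.hstack([np.eye(n), np.zeros((n, n_inputs))])  # full state measured (assumption)
    return F_aug, H_aug

def kf_step(x, P, z, F, H, Q, R):
    # One Kalman predict/update step on the augmented model.
    x, P = F @ x, F @ P @ F.T + Q                      # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                     # Kalman gain
    return x + K @ (z - H @ x), (np.eye(len(x)) - K @ H) @ P

# 1-D constant-velocity target; unknown acceleration is the input.
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
G = np.array([[0.5 * dt**2], [dt]])
F_aug, H_aug = augment_model(F, G, n_inputs=1)
Q, R = 1e-3 * np.eye(3), 1e-2 * np.eye(2)
x, P = np.zeros(3), np.eye(3)
x, P = kf_step(x, P, z=np.array([1.0, 0.5]), F=F_aug, H=H_aug, Q=Q, R=R)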


2020
Vol 12 (4)
pp. 640
Author(s):
Kaifang Wan
Xiaoguang Gao
Zijian Hu
Gaofeng Wu

In this paper, a novel deep reinforcement learning (DRL) method, robust deep deterministic policy gradient (Robust-DDPG), is proposed for developing a controller that allows robust flight of an unmanned aerial vehicle (UAV) in dynamic, uncertain environments. The technique is applicable in many fields, such as penetration and remote surveillance. The learning-based controller is built on an actor-critic framework and performs dual-channel continuous control (roll and speed) of the UAV. To overcome the fragility and volatility of the original DDPG, three critical learning tricks are introduced in Robust-DDPG: (1) a delayed-learning trick, providing stable learning in dynamic environments; (2) an adversarial-attack trick, improving the policy's adaptability to uncertain environments; and (3) a mixed-exploration trick, enabling faster convergence of the model. Training experiments show great improvement in convergence speed, convergence effect, and stability. Exploitation experiments demonstrate high efficiency in providing the UAV with a shorter, smoother path, and generalization experiments verify its better adaptability to complicated, dynamic, and uncertain environments compared with the Deep Q-Network (DQN) and DDPG algorithms.
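The three tricks can be gestured at with small helpers: an actor update applied only every few critic updates, a sign-based observation perturbation, and an exploration rule mixing Gaussian policy noise with uniformly random actions. The perturbation form (FGSM-like), the schedule, and every name here are illustrative assumptions, not the paper's exact design.

import numpy as np

rng = np.random.default_rng(0)

def should_update_actor(step, actor_delay=2):
    # Delayed-learning trick: refresh the actor (and target networks) only
    # every actor_delay critic updates, which stabilizes learning.
    return step % actor_delay == 0

def perturb_observation(obs, grad_sign, epsilon=0.01):
    # Adversarial-attack trick: train on slightly perturbed observations so
    # the policy tolerates observation error (FGSM-like sign step).
    return obs + epsilon * grad_sign

def mixed_exploration(policy_action, low, high, sigma=0.1, p_random=0.1):
    # Mixed-exploration trick: mostly Gaussian noise around the policy's
    # action, occasionally a fully random action.
    if rng.random() < p_random:
        return rng.uniform(low, high, size=policy_action.shape)
    noisy = policy_action + rng.normal(0.0, sigma, size=policy_action.shape)
    return np.clip(noisy, low, high)

# Example: a dual-channel (roll, speed) action explored around the policy output.
action = mixed_exploration(np.array([0.2, 0.8]), low=-1.0, high=1.0)
perturbed = perturb_observation(np.array([0.5, 1.0]), grad_sign=np.sign(rng.normal(size=2)))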


2019
Vol 5 (1)
pp. 5-8
Author(s):
Tobias Behr
Tim Philipp Pusch
Marius Siegfarth
Dominik Hüsener
Tobias Mörschel
...

Endovascular catheters are necessary for state-of-the-art treatments of life-threatening and time-critical diseases like strokes and heart attacks. Navigating them through the vascular tree is a highly challenging task. We present our preliminary results on the autonomous control of a guidewire through a vessel phantom with the help of deep reinforcement learning. We trained Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) agents on a simulated vessel phantom and evaluated the training performance. We also investigated the effect of two enhancements, Hindsight Experience Replay (HER) and Human Demonstration (HD), on the training speed of our agents. The results show that the agents are capable of learning to navigate a guidewire from a random start point in the vessel phantom to a random goal, with an average success rate of 86.5% for DQN and 89.6% for DDPG. The use of HER and HD significantly increases the training speed. The results are promising, and future research should address more complex vessel phantoms and the combined use of guidewire and catheter.
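A minimal sketch of how the HD enhancement typically enters training: the replay buffer is pre-seeded with human-demonstration transitions, and every mini-batch keeps a fixed share of demonstration data. The names, the demo fraction, and the transition fields are assumptions; the authors' simulator and agents are not reproduced here.

import random

def seed_with_demonstrations(buffer, demo_episodes):
    # Human Demonstration (HD): preload the replay buffer with expert
    # transitions so early updates already see successful navigation.
    for episode in demo_episodes:
        buffer.extend(episode)

def sample_mixed(agent_buffer, demo_buffer, batch_size, demo_fraction=0.25):
    # Keep a fixed share of demonstration data in every mini-batch so the
    # expert signal is not washed out as agent experience accumulates.
    n_demo = min(int(batch_size * demo_fraction), len(demo_buffer))
    batch = random.sample(demo_buffer, n_demo)
    batch += random.sample(agent_buffer, min(batch_size - n_demo, len(agent_buffer)))
    random.shuffle(batch)
    return batch

# Usage with hypothetical guidewire transitions.
demo_buffer, agent_buffer = [], []
seed_with_demonstrations(demo_buffer, [[{"position": 3, "goal": 7, "reward": 1.0}]])
agent_buffer.append({"position": 5, "goal": 7, "reward": 0.0})
batch = sample_mixed(agent_buffer, demo_buffer, batch_size=4)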


Author(s):
Ritesh Noothigattu
Djallel Bouneffouf
Nicholas Mattei
Rachita Chandra
Piyush Madan
...

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents not only to maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations, and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
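As a sketch of the orchestration layer, the LinUCB-style contextual bandit below chooses at each step between arm 0 (the constraint-following policy) and arm 1 (the reward-maximizing policy), given a context feature vector. LinUCB is a standard contextual-bandit algorithm chosen here for illustration; the paper's orchestrator need not use this exact rule, and the feature vector is hypothetical.

import numpy as np

class LinUCBOrchestrator:
    """Picks between policies (arms) given a context vector x."""

    def __init__(self, n_arms=2, dim=4, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward sums

    def pick(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge-regression estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))                    # upper-confidence arm

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Usage: arm 0 = constraint policy, arm 1 = environment-reward policy.
orchestrator = LinUCBOrchestrator()
x = np.array([1.0, 0.2, 0.0, 0.5])                       # hypothetical state features
arm = orchestrator.pick(x)
orchestrator.update(arm, x, reward=1.0)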


2018
Vol 51 (18)
pp. 31-36
Author(s):
Yuan Wang
Kirubakaran Velswamy
Biao Huang
