End-to-End Deep Reinforcement Learning for Image-Based UAV Autonomous Control

Jiang Zhao; Jiaming Sun; Zhihao Cai; Longhong Wang; Yingxun Wang

doi:10.3390/app11188419

Applied Sciences ◽ 10.3390/app11188419 ◽ 2021 ◽ Vol 11 (18) ◽ pp. 8419

Keyword(s): Reinforcement Learning ◽ Network Architecture ◽ Control Method ◽ Control Policy ◽ Input Image ◽ Autonomous Control ◽ Policy Network ◽ Model Free ◽ Control Command ◽ End To End

To achieve the perception-based autonomous control of UAVs, schemes with onboard sensing and computing are popular in state-of-the-art work, which often consist of several separated modules with respective complicated algorithms. Most methods depend on handcrafted designs and prior models with little capacity for adaptation and generalization. Inspired by the research on deep reinforcement learning, this paper proposes a new end-to-end autonomous control method to simplify the separate modules in the traditional control pipeline into a single neural network. An image-based reinforcement learning framework is established, depending on the design of the network architecture and the reward function. Training is performed with model-free algorithms developed according to the specific mission, and the control policy network can map the input image directly to the continuous actuator control command. A simulation environment for the scenario of UAV landing was built. In addition, the results under different typical cases, including both the small and large initial lateral or heading angle offsets, show that the proposed end-to-end method is feasible for perception-based autonomous control.
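
The mapping described in the abstract, from an input image straight to a bounded continuous actuator command, can be illustrated with a minimal sketch. The listing does not give the paper's network architecture, so the single linear layer with tanh squashing below is only a hypothetical stand-in for the policy network, and the names `make_policy`, `num_pixels`, and `num_actions` are assumptions introduced for illustration.

```python
import math
import random


def make_policy(num_pixels, num_actions, seed=0):
    """Build a toy image-to-command policy (hypothetical stand-in, not the paper's network)."""
    rng = random.Random(seed)
    # One weight row per actuator command; small random initial values.
    weights = [[rng.gauss(0.0, 0.01) for _ in range(num_pixels)]
               for _ in range(num_actions)]
    biases = [0.0] * num_actions

    def policy(image):
        # Flatten the 2D grayscale image and scale pixel values to [0, 1].
        pixels = [p / 255.0 for row in image for p in row]
        if len(pixels) != num_pixels:
            raise ValueError("image size does not match the policy input size")
        # tanh keeps every command inside [-1, 1], a common convention for
        # normalized continuous actuator commands.
        return [math.tanh(sum(w * x for w, x in zip(row, pixels)) + b)
                for row, b in zip(weights, biases)]

    return policy


# Usage: a 4x4 image mapped to two continuous commands.
policy = make_policy(num_pixels=16, num_actions=2)
image = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
commands = policy(image)
```

In a real end-to-end setup the linear layer would presumably be replaced by a convolutional network trained with a model-free algorithm; the point here is only the interface: pixels in, bounded continuous commands out.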

Robust Walking Control of a Lower Limb Rehabilitation Exoskeleton Coupled with a Musculoskeletal Model via Deep Reinforcement Learning

10.21203/rs.3.rs-1212542/v1 ◽ 2021

Author(s): Shuzhen Luo ◽ Ghaith Androwis ◽ Sergei Adamovich ◽ Erick Nunez ◽ Hao Su ◽ ...

Keyword(s): Reinforcement Learning ◽ Neuromuscular Disorders ◽ Control Policy ◽ Policy Network ◽ Control Parameters ◽ Robust Controller ◽ Interaction Forces ◽ Learning Framework ◽ Limb Rehabilitation ◽ Lower Limb Rehabilitation

Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges for developing such a robust controller is to handle different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers either are patient-condition specific or involve tuning of many control parameters, which could behave unreliably and even fail to maintain balance.
Methods: We present a novel and robust controller for an LLRE based on a decoupled deep reinforcement learning framework with three independent networks, which aims to provide reliable walking assistance against various and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE’s proprioceptive signals, including joint kinematic states, and subsequently predicts real-time position control targets for the actuated joints. To handle uncertain human-interaction forces, the control policy is trained intentionally with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected with the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy, we employ domain randomization during training that includes not only randomization of exoskeleton dynamics properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient’s disability. Through this decoupled deep reinforcement learning framework, the trained controller of LLREs is able to provide reliable walking assistance to users with different degrees of neuromuscular disorders.
Results and Conclusion: A universal, RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegic), muscle weakness, or hemiplegic conditions. An ablation study demonstrates strong robustness of the control policy under large exoskeleton dynamic property ranges and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer since it uses only proprioception information of the LLRE (joint sensory state) as the input. Furthermore, the controller is shown to be able to handle different patient conditions without the need for patient-specific tuning of control parameters.
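
The domain randomization mentioned in the abstract (randomizing both exoskeleton dynamics properties and human muscle strength during training) can be sketched as a per-episode parameter draw. The parameter names, the uniform distributions, and the numeric ranges below are all assumptions chosen for illustration; the abstract does not state which properties or ranges were used.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    """Scale factors applied to nominal simulation values (hypothetical names)."""
    exo_mass_scale: float         # multiplies the nominal exoskeleton link masses
    exo_friction_scale: float     # multiplies the nominal joint friction
    muscle_strength_scale: float  # 0.0 = passive muscles, 1.0 = full strength


def sample_episode_params(rng, dynamics_range=(0.8, 1.2), strength_range=(0.0, 1.0)):
    """Draw one set of randomized parameters at the start of a training episode.

    The ranges are illustrative assumptions, not values from the paper.
    """
    return EpisodeParams(
        exo_mass_scale=rng.uniform(*dynamics_range),
        exo_friction_scale=rng.uniform(*dynamics_range),
        muscle_strength_scale=rng.uniform(*strength_range),
    )


# Usage: a seeded generator makes the sequence of episodes reproducible.
rng = random.Random(0)
params_per_episode = [sample_episode_params(rng) for _ in range(3)]
```

Sampling a fresh set of scales each episode exposes the policy to a spread of simulated patients and hardware, which is the mechanism the abstract credits for robustness across disability conditions.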

End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽ 10.1109/cvpr42600.2020.00718 ◽ 2020

Author(s): Marin Toromanoff ◽ Emilie Wirbel ◽ Fabien Moutarde

Keyword(s): Reinforcement Learning ◽ Model Free ◽ End To End

Model-Free Real-Time Autonomous Control for a Residential Multi-Energy System Using Deep Reinforcement Learning

IEEE Transactions on Smart Grid ◽ 10.1109/tsg.2020.2976771 ◽ 2020 ◽ Vol 11 (4) ◽ pp. 3068-3082 ◽ Cited By ~ 2

Author(s): Yujian Ye ◽ Dawei Qiu ◽ Xiaodong Wu ◽ Goran Strbac ◽ Jonathan Ward

Keyword(s): Reinforcement Learning ◽ Real Time ◽ Energy System ◽ Autonomous Control ◽ Model Free ◽ Time Autonomous

Stochastic Actor-Executor-Critic for Image-to-Image Translation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2021/382 ◽ 2021

Author(s): Ziwei Luo ◽ Jing Hu ◽ Xin Wang ◽ Siwei Lyu ◽ Bin Kong ◽ ...

Keyword(s): Reinforcement Learning ◽ Control Policy ◽ High Dimensional ◽ Continuous Control ◽ Continuous Space ◽ Model Free ◽ Recent Success ◽ Image Translation ◽ Continuous State ◽ And Control

Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult since it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework designed for challenging continuous control problems to develop stochastic policies over high dimensional continuous spaces including image representation, generation, and control simultaneously. Central to this method is the Stochastic Actor-Executor-Critic (SAEC) which is an off-policy actor-critic model with an additional executor to generate realistic images. Specifically, the actor focuses on the high-level representation and control policy by a stochastic latent action, as well as explicitly directs the executor to generate low-level actions to manipulate the state. Experiments on several image-to-image translation tasks have demonstrated the effectiveness and robustness of the proposed SAEC when facing high-dimensional continuous space problems.
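
For readers unfamiliar with the maximum entropy reinforcement learning framework the abstract refers to, its objective is usually written as the expected return plus an entropy bonus. The form below is the standard one from that literature; the symbols (temperature \alpha, state-action distribution \rho_\pi, horizon T) are the conventional ones and are not taken from this abstract, which does not state the exact objective SAEC optimizes.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

Setting \alpha = 0 recovers the ordinary expected-return objective; a larger \alpha rewards more stochastic policies, which is what makes the framework a natural fit for learning stochastic policies over continuous spaces.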

Digital Twin and Reinforcement Learning-Based Resilient Production Control for Micro Smart Factory

Applied Sciences ◽ 10.3390/app11072977 ◽ 2021 ◽ Vol 11 (7) ◽ pp. 2977

Author(s): Kyu Tae Park ◽ Yoo Ho Son ◽ Sang Wook Ko ◽ Sang Do Noh

Keyword(s): Reinforcement Learning ◽ Production Control ◽ Control Method ◽ Production Systems ◽ Smart Factory ◽ Production Cycle ◽ Policy Network ◽ Digital Twin ◽ Event Logs ◽ Architectural Framework

To achieve efficient personalized production at an affordable cost, a modular manufacturing system (MMS) can be utilized. MMS enables restructuring of its configuration to accommodate product changes and is thus an efficient solution to reduce the costs involved in personalized production. A micro smart factory (MSF) is an MMS with heterogeneous production processes to enable personalized production. Similar to MMS, MSF also enables the restructuring of production configuration; additionally, it comprises cyber-physical production systems (CPPSs) that help achieve resilience. However, MSFs need to overcome performance hurdles with respect to production control. Therefore, this paper proposes a digital twin (DT) and reinforcement learning (RL)-based production control method. This method replaces the existing dispatching rule in the type and instance phases of the MSF. In this method, the RL policy network is learned and evaluated by coordination between DT and RL. The DT provides virtual event logs that include states, actions, and rewards to support learning. These virtual event logs are returned based on vertical integration with the MSF. As a result, the proposed method provides a resilient solution to the CPPS architectural framework and achieves appropriate actions to the dynamic situation of MSF. Additionally, applying DT with RL helps decide what-next/where-next in the production cycle. Moreover, the proposed concept can be extended to various manufacturing domains because the priority rule concept is frequently applied.

Controlling colloidal crystals via morphing energy landscapes and reinforcement learning

Science Advances ◽ 10.1126/sciadv.abd6716 ◽ 2020 ◽ Vol 6 (48) ◽ pp. eabd6716

Author(s): Jianli Zhang ◽ Junyan Yang ◽ Yuanxing Zhang ◽ Michael A. Bevan

Keyword(s): Reinforcement Learning ◽ Electric Fields ◽ Large Scale ◽ Colloidal Particles ◽ Control Method ◽ Relaxation Times ◽ Colloidal Crystals ◽ Hierarchical Structures ◽ Control Policy ◽ Energy Landscapes

We report a feedback control method to remove grain boundaries and produce circular shaped colloidal crystals using morphing energy landscapes and reinforcement learning–based policies. We demonstrate this approach in optical microscopy and computer simulation experiments for colloidal particles in ac electric fields. First, we discover how tunable energy landscape shapes and orientations enhance grain boundary motion and crystal morphology relaxation. Next, reinforcement learning is used to develop an optimized control policy to actuate morphing energy landscapes to produce defect-free crystals orders of magnitude faster than natural relaxation times. Morphing energy landscapes mechanistically enable rapid crystal repair via anisotropic stresses to control defect and shape relaxation without melting. This method is scalable for up to at least N = 10^3 particles with mean process times scaling as N^0.5. Further scalability is possible by controlling parallel local energy landscapes (e.g., periodic landscapes) to generate large-scale global defect-free hierarchical structures.
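
As a worked illustration of the reported scaling, read as mean process time growing with the square root of the particle number, the ratio of process times between two system sizes follows directly. The symbol \bar{t}(N) for the mean process time is introduced here for convenience, and the choice of 10^2 and 10^3 particles is only an example.

```latex
\bar{t}(N) \propto N^{0.5}
\quad\Longrightarrow\quad
\frac{\bar{t}(N_2)}{\bar{t}(N_1)} = \left(\frac{N_2}{N_1}\right)^{0.5},
\qquad
\frac{\bar{t}(10^{3})}{\bar{t}(10^{2})} = 10^{0.5} \approx 3.16
```

Under this scaling a tenfold increase in particle number costs only about a threefold increase in mean process time.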

Open Loop Position Control of Soft Continuum Arm Using Deep Reinforcement Learning

10.31224/osf.io/n7h9y ◽ 2018 ◽ Cited By ~ 1

Author(s): Sreeshankar Satheeshbabu ◽ Naveen Kumar Uppalapati ◽ Girish Chowdhary ◽ Girish Krishnan

Keyword(s): Reinforcement Learning ◽ Position Control ◽ Numerical Models ◽ Control Policy ◽ Open Loop ◽ External Loading ◽ Q Learning ◽ Model Free ◽ Experience Replay ◽ The Continuum

Soft robots undergo large nonlinear spatial deformations due to both inherent actuation and external loading. The physics underlying these deformations is complex, and often requires intricate analytical and numerical models. The complexity of these models may render traditional model-based control difficult and unsuitable. Model-free methods offer an alternative for analyzing the behavior of such complex systems without the need for elaborate modeling techniques. In this paper, we present a model-free approach for open loop position control of a soft spatial continuum arm, based on deep reinforcement learning. The continuum arm is pneumatically actuated and attains a spatial workspace by a combination of unidirectional bending and bidirectional torsional deformation. We use Deep-Q Learning with experience replay to train the system in simulation. The efficacy and robustness of the control policy obtained from the system are validated both in simulation and on the continuum arm prototype for varying external loading conditions.
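
Experience replay, which the abstract names as part of the training setup, stores past transitions and samples them at random so that consecutive training batches are decorrelated. The sketch below is a generic fixed-capacity buffer, not the authors' implementation; the class name `ReplayBuffer`, the `Transition` fields, and the capacity and batch size in the usage lines are assumptions for illustration.

```python
import random
from collections import deque, namedtuple

# One stored interaction step; field names are illustrative.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Usage: store five steps in a buffer that holds three, then draw a batch of two.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(batch_size=2)
```

Only the three most recent transitions survive in this example, so every sampled state comes from steps 2 to 4.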

Developing End-to-End Control Policies for Robotic Swarms Using Deep Q-learning

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽ 10.20965/jaciii.2019.p0920 ◽ 2019 ◽ Vol 23 (5) ◽ pp. 920-927 ◽ Cited By ~ 3

Author(s): Yufei Wei ◽ Xiaotong Nie ◽ Motoaki Hiraga ◽ Kazuhiro Ohkura ◽ Zlatan Car ◽ ...

Keyword(s): Reinforcement Learning ◽ Learning Algorithm ◽ Evolutionary Robotics ◽ Control Policy ◽ Control Policies ◽ Q Learning ◽ Robotic Swarms ◽ Learning Techniques ◽ End To End ◽ Large Parameter Space

In this study, the use of a popular deep reinforcement learning algorithm – deep Q-learning – in developing end-to-end control policies for robotic swarms is explored. Robots only have limited local sensory capabilities; however, in a swarm, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computation resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as much as possible. Simulation results show that the proposed approach can learn control policies directly from high-dimensional raw camera pixel inputs for robotic swarms.
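
At the core of deep Q-learning is the one-step bootstrapped target that the Q-network is regressed toward. The sketch below shows that standard target in isolation; the function name `td_target`, the discount factor value, and the list-based representation of Q-values are assumptions for illustration and say nothing about the specific network or hyperparameters used in the study.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values holds the Q-value estimates for every action in the next
    state; on a terminal step there is no future value to bootstrap from.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Usage: reward 1.0, two candidate actions in the next state, episode continues.
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.9)
```

In an end-to-end setting the values in `next_q_values` would come from a network evaluated on the next camera frame, so the same target drives learning directly from pixels.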

Control of chaotic systems by deep reinforcement learning

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽ 10.1098/rspa.2019.0351 ◽ 2019 ◽ Vol 475 (2231) ◽ pp. 20190351 ◽ Cited By ~ 3

Author(s): M. A. Bucci ◽ O. Semeraro ◽ A. Allauzen ◽ G. Wisniewski ◽ L. Cordier ◽ ...

Keyword(s): Reinforcement Learning ◽ Bluff Body ◽ Initial Conditions ◽ Control Policy ◽ Chaotic Regime ◽ Model Free ◽ Local Measurements ◽ Target States ◽ The One ◽ Learning Principles

Deep reinforcement learning (DRL) is applied to control a nonlinear, chaotic system governed by the one-dimensional Kuramoto–Sivashinsky (KS) equation. DRL uses reinforcement learning principles for the determination of optimal control solutions and deep neural networks for approximating the value function and the control policy. Recent applications have shown that DRL may achieve superhuman performance in complex cognitive tasks. In this work, we show that using restricted localized actuation, partial knowledge of the state based on limited sensor measurements and model-free DRL controllers, it is possible to stabilize the dynamics of the KS system around its unstable fixed solutions, here considered as target states. The robustness of the controllers is tested by considering several trajectories in the phase space emanating from different initial conditions; we show that DRL is always capable of driving and stabilizing the dynamics around target states. The possibility of controlling the KS system in the chaotic regime by using a DRL strategy solely relying on local measurements suggests the extension of the application of RL methods to the control of more complex systems such as drag reduction in bluff-body wakes or the enhancement/diminution of turbulent mixing.
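
For reference, the one-dimensional Kuramoto–Sivashinsky equation is commonly written in the form below. The left-hand side is the standard unforced equation; the forcing term f(x, t) on the right is an assumption about how localized actuation would enter, since the abstract does not give the controlled form or the normalization used.

```latex
\frac{\partial u}{\partial t}
  + u \frac{\partial u}{\partial x}
  + \frac{\partial^{2} u}{\partial x^{2}}
  + \frac{\partial^{4} u}{\partial x^{4}}
  = f(x, t)
```

The second-derivative term injects energy at large scales while the fourth-derivative term dissipates it at small scales, and the nonlinear term transfers energy between them, which is what produces the chaotic dynamics the controller has to stabilize.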

Early Failure Detection of Deep End-to-End Control Policy by Reinforcement Learning

2019 International Conference on Robotics and Automation (ICRA) ◽ 10.1109/icra.2019.8794189 ◽ 2019 ◽ Cited By ~ 1

Author(s): Keuntaek Lee ◽ Kamil Saigol ◽ Evangelos A. Theodorou

Keyword(s): Reinforcement Learning ◽ Control Policy ◽ Failure Detection ◽ Early Failure ◽ End To End
