Deep Reinforcement Learning for Transfer of Control Policies

Volume 2A: 45th Design Automation Conference ◽

10.1115/detc2019-97689 ◽

2019 ◽

Author(s):

James D. Cunningham ◽

Simon W. Miller ◽

Michael A. Yukish ◽

Timothy W. Simpson ◽

Conrad S. Tucker

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Control Policy ◽

Dynamic State ◽

Test Case ◽

Original Design ◽

Control Policies ◽

Parallel Learning ◽

Rotor Design ◽

Control Knowledge

Abstract: We present a form-aware reinforcement learning (RL) method that extends control knowledge from one design form to another without losing the ability to control the original design. A major challenge in developing control knowledge is creating control policies that generalize across designs of varying form. The proposed RL policy is form-aware because, in addition to dynamic state information about the environment, it receives states that encode the form of the design being controlled. In this paper, we investigate the impact of this mixed state space on transfer learning. We present a transfer learning method for extending a control policy to a different design form while continuing to expose the agent to the original design as it trains on the new one. To demonstrate this concept, we present a case study of a multi-rotor aircraft simulation in which the task is to achieve a stable hover. We show that, by introducing form states, an RL agent can learn a control policy that achieves the hovering task with both a four-rotor and a three-rotor design at once, whereas without the form states it can hover only with the four-rotor design. We also benchmark our method against one test case that removes the transfer learning component and another that removes the continued exposure to the original design, to show the value of each component. We find that form states, transfer learning, and parallel learning all contribute to a more robust control policy for the new design, and that parallel learning is especially important for maintaining control knowledge of the original design.
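
As a rough illustration of the mixed state space described in this abstract, the sketch below appends a form encoding to a dynamic state and picks which design each training episode uses. The rotor-presence encoding, the names `FORMS`, `form_aware_state`, and `sample_design`, and the 50/50 sampling ratio are all assumptions made for illustration; the abstract does not specify how form states are encoded or how episodes are scheduled.

```python
import random

# Hypothetical form encodings: one flag per rotor slot (1.0 = rotor present).
FORMS = {
    "quad": (1.0, 1.0, 1.0, 1.0),  # original four-rotor design
    "tri": (1.0, 1.0, 1.0, 0.0),   # new three-rotor design
}


def form_aware_state(dynamic_state, form_name):
    """Append the design's form encoding to its dynamic state."""
    return tuple(dynamic_state) + FORMS[form_name]


def sample_design(rng, p_original=0.5):
    """Choose the design for the next episode.

    Keeping p_original above zero means the original design keeps
    appearing while the agent trains on the new one (assumed schedule).
    """
    return "quad" if rng.random() < p_original else "tri"


rng = random.Random(0)
dynamic = (0.0, 0.1, -0.2)  # placeholder dynamic state values
state = form_aware_state(dynamic, sample_design(rng))
```

A policy network would then take `state` as its input, so the same weights can condition their output on which design is being flown.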


Developing End-to-End Control Policies for Robotic Swarms Using Deep Q-learning

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2019.p0920 ◽

2019 ◽

Vol 23 (5) ◽

pp. 920-927 ◽

Cited By ~ 3

Author(s):

Yufei Wei ◽

Xiaotong Nie ◽

Motoaki Hiraga ◽

Kazuhiro Ohkura ◽

Zlatan Car ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Evolutionary Robotics ◽

Control Policy ◽

Control Policies ◽

Q Learning ◽

Robotic Swarms ◽

Learning Techniques ◽

End To End ◽

Large Parameter Space

This study explores the use of a popular deep reinforcement learning algorithm, deep Q-learning, to develop end-to-end control policies for robotic swarms. Each robot has only limited local sensory capabilities; in a swarm, however, robots can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computational resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, in which the robots are required to travel between two destinations as many times as possible. Simulation results show that the proposed approach can learn control policies for robotic swarms directly from high-dimensional raw camera pixel inputs.
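
For readers unfamiliar with Q-learning, the sketch below shows the two generic ingredients that deep Q-learning builds on: the bootstrapped target used to train the Q-function and epsilon-greedy action selection. It is not the authors' implementation; the function names and default values are assumptions, and the neural network that maps camera pixels to Q-values is omitted.

```python
import random


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Q-learning target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9)
```

In a deep Q-network, `next_q_values` would come from a forward pass of the (target) network on the next observation, and the network would be trained to move its prediction for the chosen action toward `target`.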


Crime Control Policy

Criminology ◽

10.1093/obo/9780195396607-0039 ◽

2009 ◽

Author(s):

Thomas G. Blomberg ◽

Julie Brancale

Keyword(s):

Public Policy ◽

Control System ◽

Criminal Behavior ◽

Crime Control ◽

Control Policy ◽

Unintended Consequences ◽

Critical Inquiry ◽

Control Policies ◽

Crime Control Policy ◽

Control Knowledge

The literature on crime control policy has developed from several areas of study. Included among these areas are general descriptive studies of the operations of the crime control system (police, courts, and corrections), studies of the causes of criminal behavior in relation to the rehabilitation of offenders, critical inquiry into crime control policies and practices, historical studies of crime control, studies of crime control reforms, studies of get-tough crime control policies, and studies aimed at linking crime control knowledge to public policy. A theme emerging from this literature has been a recognition of the patterned capacity of various crime control policies and reforms to have unintended consequences.


XCS as a reinforcement learning approach to automatic test case prioritization

Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion ◽

10.1145/3377929.3398128 ◽

2020 ◽

Author(s):

Lukas Rosenbauer ◽

Anthony Stein ◽

Roland Maier ◽

David Pätzel ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

Test Case ◽

Learning Approach ◽

Test Case Prioritization ◽

Automatic Test


Experience Sharing Based Memetic Transfer Learning for Multiagent Reinforcement Learning

Memetic Computing ◽

10.1007/s12293-021-00339-4 ◽

2021 ◽

Author(s):

Tonghao Wang ◽

Xingguang Peng ◽

Yaochu Jin ◽

Demin Xu

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Multiagent Reinforcement Learning


Cascade Attribute Network: Decomposing Reinforcement Learning Control Policies using Hierarchical Neural Networks

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2020.12.2317 ◽

2020 ◽

Vol 53 (2) ◽

pp. 8181-8186

Author(s):

Haonan Chang ◽

Zhuo Xu ◽

Masayoshi Tomizuka

Keyword(s):

Neural Networks ◽

Reinforcement Learning ◽

Learning Control ◽

Control Policies ◽

Hierarchical Neural Networks


Reinforcement Learning for Test Case Prioritization

IEEE Transactions on Software Engineering ◽

10.1109/tse.2021.3070549 ◽

2021 ◽

pp. 1-1

Author(s):

Mojtaba Bagherzadeh ◽

Nafiseh Kahani ◽

Lionel Briand

Keyword(s):

Reinforcement Learning ◽

Test Case ◽

Test Case Prioritization


Graphical Minimax Game and Off-Policy Reinforcement Learning for Heterogeneous MASs with Spanning Tree Condition

Guidance, Navigation and Control ◽

10.1142/s2737480721500114 ◽

2021 ◽

pp. 2150011

Author(s):

Wei Dong ◽

Jianan Wang ◽

Chunyan Wang ◽

Zhenqiang Qi ◽

Zhengtao Ding

Keyword(s):

Reinforcement Learning ◽

Spanning Tree ◽

Learning Algorithm ◽

Control Policy ◽

Game Problem ◽

Algebraic Riccati Equation ◽

Multi Agent Systems ◽

Rank Condition ◽

Minimax Game ◽

Tree Condition

In this paper, the optimal consensus control problem is investigated for heterogeneous linear multi-agent systems (MASs) under a spanning tree condition, using game theory and reinforcement learning. First, the graphical minimax game algebraic Riccati equation (ARE) is derived by converting the consensus problem into a zero-sum game between each agent and its neighbors. The asymptotic stability and minimax validation of the closed-loop systems are proved theoretically. Then, a data-driven off-policy reinforcement learning algorithm is proposed to learn the optimal control policy online without knowledge of the system dynamics. A rank condition is established to guarantee that the proposed algorithm converges to the unique solution of the ARE. Finally, the effectiveness of the proposed method is demonstrated through a numerical simulation.
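
To give a feel for what an algebraic Riccati equation is, the sketch below solves the simplest case: the scalar, single-agent LQR ARE, which has a closed-form solution. This is a standard textbook equation used only as an illustration; it is not the graphical minimax game ARE derived in the paper, and the function name and example values are assumptions.

```python
import math


def scalar_lqr(a, b, q, r):
    """Return (p, k) for dx/dt = a*x + b*u with cost integral of q*x^2 + r*u^2.

    p is the positive root of the scalar ARE 2*a*p + q - (b**2 / r) * p**2 = 0,
    and k = b*p/r is the feedback gain in u = -k*x.
    """
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("need b != 0, q > 0, r > 0")
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


# Example with an unstable open-loop system (a > 0).
p, k = scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0)
residual = 2 * 1.0 * p + 1.0 - (1.0 ** 2 / 1.0) * p ** 2  # should be ~0
closed_loop_pole = 1.0 - 1.0 * k  # a - b*k, negative means stable
```

When the model parameters `a` and `b` are known, the ARE can be solved directly as above; the point of a data-driven method such as the one in the abstract is to reach the ARE solution from measured data when the dynamics are not available.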


Methods and Algorithms for Knowledge Reuse in Multiagent Reinforcement Learning

10.5753/ctd.2020.11360 ◽

2020 ◽

Author(s):

Felipe Leno Da Silva ◽

Anna Helena Reali Costa

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Learning Process ◽

Trial And Error ◽

Knowledge Reuse ◽

Previous Knowledge ◽

Learning Methods ◽

Types Of Knowledge ◽

Learning Agent ◽

Multiagent Reinforcement Learning

Reinforcement Learning (RL) is a powerful tool that has been used to solve increasingly complex tasks. RL operates through repeated interactions of the learning agent with the environment, via trial and error. However, this learning process is extremely slow, requiring many interactions. In this thesis, we leverage previous knowledge so as to accelerate learning in multiagent RL problems. We propose knowledge reuse both from previous tasks and from other agents. Several flexible methods are introduced so that each of these two types of knowledge reuse is possible. This thesis adds important steps towards more flexible and broadly applicable multiagent transfer learning methods.


Applying a Deep Q Network for OpenAI's Car Racing Game

10.14293/s2199-1006.1.sor-.ppd7fvs.v1 ◽

2020 ◽

Author(s):

Ali Fakhry

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Transfer Learning ◽

State Of The Art ◽

Learning Techniques ◽

Car Racing ◽

Custom Made ◽

Learning Technique ◽

Reward Threshold

Deep Q-Networks (DQNs) are applied throughout reinforcement learning, a large subset of machine learning. Using a classic 2D car racing environment from OpenAI, CarRacing-v0, alongside a custom modification of that environment, a DQN was created to solve both the classic and custom environments. The environments are tested using custom-made CNN architectures and transfer learning from ResNet18. While DQNs were state of the art years ago, using them for CarRacing-v0 appears somewhat unappealing and not as effective as other reinforcement learning techniques. Overall, while the model did train and the agent learned various parts of the environment, reaching the environment's reward threshold with this technique seems problematic and difficult, and other techniques would be more useful.


Robust Walking Control of a Lower Limb Rehabilitation Exoskeleton Coupled with a Musculoskeletal Model via Deep Reinforcement Learning

10.21203/rs.3.rs-1212542/v1 ◽

2021 ◽

Author(s):

Shuzhen Luo ◽

Ghaith Androwis ◽

Sergei Adamovich ◽

Erick Nunez ◽

Hao Su ◽

...

Keyword(s):

Reinforcement Learning ◽

Neuromuscular Disorders ◽

Control Policy ◽

Policy Network ◽

Control Parameters ◽

Robust Controller ◽

Interaction Forces ◽

Learning Framework ◽

Limb Rehabilitation ◽

Lower Limb Rehabilitation

Abstract. Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges in developing such a robust controller is handling different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers are either specific to a patient condition or require tuning of many control parameters, and they can behave unreliably and even fail to maintain balance.
Methods: We present a novel and robust controller for an LLRE based on a decoupled deep reinforcement learning framework with three independent networks, which aims to provide reliable walking assistance against various and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE's proprioceptive signals, including joint kinematic states, and predicts real-time position control targets for the actuated joints. To handle uncertain human-interaction forces, the control policy is deliberately trained with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected to the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy, we employ domain randomization during training that includes not only randomization of exoskeleton dynamics properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient's disability. Through this decoupled deep reinforcement learning framework, the trained LLRE controller is able to provide reliable walking assistance to humans with different degrees of neuromuscular disorders.
Results and Conclusion: A universal, RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegic), muscle weakness, or hemiplegic conditions. An ablation study demonstrates strong robustness of the control policy under large exoskeleton dynamic property ranges and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer, since it uses only proprioception information of the LLRE (joint sensory state) as input. Furthermore, the controller is shown to handle different patient conditions without patient-specific tuning of control parameters.
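
The domain randomization mentioned in this abstract can be pictured as drawing a fresh set of simulation parameters at the start of every training episode. The sketch below is a minimal illustration under stated assumptions: the parameter names, the uniform distributions, and the numeric ranges are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """Hypothetical per-episode scale factors applied to nominal model values."""
    link_mass_scale: float        # exoskeleton dynamics property (assumed)
    joint_friction_scale: float   # exoskeleton dynamics property (assumed)
    muscle_strength_scale: float  # 0.0 = passive muscles, 1.0 = full strength


def sample_episode_params(rng):
    """Draw one randomized parameter set; all ranges are illustrative only."""
    return EpisodeParams(
        link_mass_scale=rng.uniform(0.8, 1.2),
        joint_friction_scale=rng.uniform(0.5, 1.5),
        muscle_strength_scale=rng.uniform(0.0, 1.0),
    )


rng = random.Random(42)
episodes = [sample_episode_params(rng) for _ in range(5)]
```

Each sampled `EpisodeParams` would be applied to the simulator before the episode starts, so the policy sees a different exoskeleton and a different simulated patient each time and cannot overfit to one configuration.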
