Variable Compliance Control for Robotic Peg-in-Hole Assembly: A Deep-Reinforcement-Learning Approach

2020 ◽  
Vol 10 (19) ◽  
pp. 6923 ◽  
Author(s):  
Cristian C. Beltran-Hernandez ◽  
Damien Petit ◽  
Ixchel G. Ramirez-Alpizar ◽  
Kensuke Harada

Industrial robot manipulators are playing a significant role in modern manufacturing industries. Though peg-in-hole assembly is a common industrial task that has been extensively researched, safely solving complex, high-precision assembly in an unstructured environment remains an open problem. Reinforcement-learning (RL) methods have proven successful in autonomously solving manipulation tasks. However, RL is still not widely adopted in real robotic systems because working with real hardware entails additional challenges, especially when using position-controlled manipulators. The main contribution of this work is a learning-based method to solve peg-in-hole tasks with hole-position uncertainty. We propose the use of an off-policy, model-free reinforcement-learning method, and we bootstrapped training with several transfer-learning techniques (sim2real) and domain randomization. Our proposed learning framework for position-controlled robots was extensively evaluated in contact-rich insertion tasks in a variety of environments.
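To make the training recipe concrete, here is a minimal sketch of episode-level domain randomization wrapped around a generic off-policy update loop. The `env` and `agent` interfaces, the randomized parameters, and their ranges are illustrative assumptions, not the authors' code.

import numpy as np

def randomize_domain(env, rng):
    # Resample physical parameters each episode so the policy cannot
    # overfit to a single simulated configuration (sim2real transfer).
    env.hole_position = env.nominal_hole_position + rng.uniform(-2e-3, 2e-3, size=3)
    env.friction = rng.uniform(0.4, 1.2)
    env.stiffness_scale = rng.uniform(0.5, 2.0)   # variable-compliance gain range

def train(env, agent, episodes=1000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        randomize_domain(env, rng)
        obs = env.reset()
        done = False
        while not done:
            action = agent.act(obs)                       # compliant motion command
            next_obs, reward, done, _ = env.step(action)
            agent.replay_buffer.add(obs, action, reward, next_obs, done)
            agent.update()                                # one off-policy gradient step
            obs = next_obs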

2021 ◽  
Vol 4 ◽  
Author(s):  
Marina Dorokhova ◽  
Christophe Ballif ◽  
Nicolas Wyrsch

In the past few years, the importance of electric mobility has increased in response to growing concerns about climate change. However, limited cruising range and sparse charging infrastructure could restrain the large-scale deployment of electric vehicles (EVs). To mitigate the problem, the need for optimal route-planning algorithms has emerged. In this paper, we propose a mathematical formulation of the EV-specific routing problem in a graph-theoretical context, which incorporates the ability of EVs to recuperate energy. Furthermore, we consider the possibility of recharging en route at intermediary charging stations. As a possible solution method, we present an off-policy, model-free reinforcement-learning approach that aims to generate energy-feasible paths for an EV from source to target. The algorithm was implemented and tested on a case study of a road network in Switzerland. The training procedure has low computing and memory demands and is suitable for online applications. The results demonstrate the algorithm's capability to make recharging decisions and produce the desired energy-feasible paths.
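As an illustration of this kind of approach, the sketch below applies tabular Q-learning to a road graph whose edge weights are energy costs (negative values model recuperation), with the battery state of charge folded into the state. The graph representation, discretisation, and reward shaping are assumptions for the example, not the paper's exact formulation.

import random
from collections import defaultdict

# graph[u][v] holds the energy cost of edge u -> v; negative costs model
# recuperation. A charging station could be modelled as a self-loop edge
# that raises the state of charge at some cost.

def q_learning_route(graph, source, target, capacity,
                     episodes=5000, alpha=0.1, gamma=1.0, eps=0.1):
    Q = defaultdict(float)                        # key: ((node, soc), next_node)
    for _ in range(episodes):
        node, soc = source, capacity
        for _ in range(200):                      # cap episode length
            state = (node, round(soc, 1))
            nbrs = list(graph[node])
            if not nbrs:
                break
            nxt = (random.choice(nbrs) if random.random() < eps
                   else max(nbrs, key=lambda n: Q[(state, n)]))
            soc_next = min(capacity, soc - graph[node][nxt])
            # Reward: negative net energy spent; heavy penalty if the move
            # would deplete the battery (energy-infeasible path).
            r = -graph[node][nxt] if soc_next >= 0 else -1000.0
            done = (nxt == target) or (soc_next < 0)
            s_next = (nxt, round(soc_next, 1))
            best = 0.0 if done else max((Q[(s_next, n)] for n in graph[nxt]),
                                        default=0.0)
            Q[(state, nxt)] += alpha * (r + gamma * best - Q[(state, nxt)])
            if done:
                break
            node, soc = nxt, soc_next
    return Q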


2019 ◽  
Vol 16 (6) ◽  
pp. 172988141988570 ◽  
Author(s):  
Changyun Wei ◽  
Fusheng Ni

This article addresses the robot pathfinding problem with environmental disturbances, where a solution must account for the risks inherent in an uncertain and stochastic environment. For example, the movements of an underwater robot can be seriously disturbed by ocean currents, so applied control actions cannot exactly lead the robot to the desired locations. Reinforcement learning is a formal methodology that has been extensively studied in many sequential decision-making domains with uncertainty, but most reinforcement-learning algorithms consider only a single objective encoded by a scalar reward. However, the robot pathfinding problem with environmental disturbances naturally involves multiple conflicting objectives. Specifically, in this work, the robot has to minimise its travel distance to save energy while keeping as far away from unsafe regions as possible. To this end, we first propose a multiobjective model-free learning framework, and then investigate an appropriate action selection strategy by improving a baseline along two dimensions. To demonstrate the effectiveness of the proposed learning framework and evaluate the performance of three action selection strategies, we also carry out an empirical study in a simulated environment.
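For concreteness, the following minimal sketch shows one standard way to set up multiobjective model-free learning with linear scalarisation: a separate Q-table per objective (path cost and safety), combined by a weight vector at action-selection time. The class, weights, and two-objective setup are illustrative assumptions; the article's own action selection strategies may differ.

import numpy as np

class MultiObjectiveQ:
    def __init__(self, n_states, n_actions, w=(0.5, 0.5),
                 alpha=0.1, gamma=0.95, eps=0.1):
        # One Q-table per objective: index 0 = negative step cost, 1 = safety.
        self.Q = np.zeros((2, n_states, n_actions))
        self.w, self.alpha, self.gamma, self.eps = np.array(w), alpha, gamma, eps

    def select_action(self, s, rng):
        if rng.random() < self.eps:                   # epsilon-greedy exploration
            return int(rng.integers(self.Q.shape[2]))
        scalarised = self.w @ self.Q[:, s, :]         # weighted sum across objectives
        return int(np.argmax(scalarised))

    def update(self, s, a, rewards, s_next):
        # `rewards` is a 2-vector: (negative step cost, safety reward).
        a_next = int(np.argmax(self.w @ self.Q[:, s_next, :]))
        for k in range(2):
            td = rewards[k] + self.gamma * self.Q[k, s_next, a_next] - self.Q[k, s, a]
            self.Q[k, s, a] += self.alpha * td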


Author(s):  
Ernst Moritz Hahn ◽  
Mateo Perez ◽  
Sven Schewe ◽  
Fabio Somenzi ◽  
Ashutosh Trivedi ◽  
...  

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement-learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
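As a sketch of the underlying optimisation, assuming the objective is the expected total payoff (the paper's precise objective and side conditions may differ), the optimal value v_i of a single entity of type i satisfies a Bellman-style fixed point over the spawning dynamics:

\[ v_i \;=\; \max_{a \in A(i)} \Big( c(i,a) \;+\; \sum_{j} \mathbb{E}\big[ N_j \mid i, a \big]\, v_j \Big), \]

where A(i) is the set of options available to a type-i entity, c(i,a) is the payoff it generates under option a, and N_j is the number of type-j offspring it spawns. A model-free learner estimates these maxima and expectations from sampled episodes rather than from known offspring distributions.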


2021 ◽  
Vol 118 (4) ◽  
pp. e2016884118
Author(s):  
Rani Moran ◽  
Peter Dayan ◽  
Raymond J. Dolan

An influential reinforcement learning framework proposes that behavior is jointly governed by model-free (MF) and model-based (MB) controllers. The former learns the values of actions directly from past encounters, and the latter exploits a cognitive map of the task to calculate these prospectively. Considerable attention has been paid to how these systems interact during choice, but how and whether knowledge of a cognitive map contributes to the way MF and MB controllers assign credit (i.e., to how they revaluate actions and states following the receipt of an outcome) remains underexplored. Here, we examine such sophisticated credit assignment using a dual-outcome bandit task. We provide evidence that knowledge of a cognitive map influences credit assignment in both MF and MB systems, mediating subtly different aspects of apparent relevance. Specifically, we show that MF credit assignment is enhanced for rewards related to a choice, whereas choice-unrelated rewards negatively reinforced subsequent choices; this modulation is only possible given knowledge of the task structure. In contrast, MB credit assignment was boosted for outcomes that affected the difference in value between the offered bandits. We consider mechanistic accounts and the normative status of these findings, and suggest that they extend the scope and sophistication of cognitive-map-based credit assignment during reinforcement learning, with implications for understanding behavioral control.
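A common computational formulation in this literature, stated here as an assumption rather than the exact model fitted in the paper, combines the two controllers before choice:

\[ Q_{\mathrm{MF}}(a) \leftarrow Q_{\mathrm{MF}}(a) + \alpha \big( r - Q_{\mathrm{MF}}(a) \big), \qquad Q(a) = w\, Q_{\mathrm{MB}}(a) + (1 - w)\, Q_{\mathrm{MF}}(a), \]

where Q_MB(a) is computed prospectively from the cognitive map, w weights the MB contribution, and choices follow a softmax over Q. Credit-assignment effects of the kind reported here then correspond to modulations of which outcomes r enter the MF update and of how the map shapes Q_MB.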


Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5630
Author(s):  
Jingyi Xie ◽  
Xiaodong Peng ◽  
Haijiao Wang ◽  
Wenlong Niu ◽  
Xiao Zheng

Unmanned aerial vehicle (UAV) autonomous tracking and landing is playing an increasingly important role in military and civil applications. In particular, machine learning has been successfully introduced to robotics-related tasks. A novel UAV autonomous tracking and landing approach based on a deep reinforcement learning strategy is presented in this paper, with the aim of dealing with the UAV motion control problem in an unpredictable and harsh environment. Instead of building a prior model and inferring the landing actions from heuristic rules, a model-free method based on a partially observable Markov decision process (POMDP) is proposed. In the POMDP model, the UAV learns the landing maneuver automatically through an end-to-end neural network that combines the Deep Deterministic Policy Gradient (DDPG) algorithm with heuristic rules. A Modular Open Robots Simulation Engine (MORSE)-based reinforcement learning framework is designed and validated on a continuous UAV tracking and landing task over a randomly moving platform, under high sensor noise and intermittent measurements. The simulation results show that, when the platform moves along different trajectories, the average landing success rate of the proposed algorithm is about 10% higher than that of the Proportional-Integral-Derivative (PID) method. As an indirect result, a state-of-the-art deep reinforcement-learning-based UAV control method is validated, in which the UAV learns a continuous autonomous landing strategy and performs properly in a simulation environment.
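To ground the algorithmic core, here is a minimal sketch of a single DDPG update step, the actor-critic method named in the abstract. The network, buffer, and optimizer objects are placeholders, and the hyperparameters are illustrative; this is not the paper's implementation.

import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_t, critic_t, buffer,
                actor_opt, critic_opt, gamma=0.99, tau=0.005, batch=128):
    s, a, r, s2, done = buffer.sample(batch)        # batched tensors

    # Critic: regress Q(s, a) onto the bootstrapped target from the
    # target networks (no gradient flows through the target).
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * critic_t(s2, actor_t(s2))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, ascend Q through the actor.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks toward the online networks.
    for tgt, src in ((actor_t, actor), (critic_t, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)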


2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
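The sketch below is a toy simulation of the kind of two-stage paradigm described above: an agent chooses among advisors who lead to stocks that pay probabilistic rewards, with MF values learned from direct reward history and MB values planned through the known task map. The task structure, parameters, and hybrid choice rule are invented for illustration, not the study's actual design or fitted model.

import numpy as np

rng = np.random.default_rng(0)
n_advisors, n_stocks, trials = 2, 2, 200
lead_to = np.array([0, 1])            # advisor i deterministically leads to stock i
p_reward = np.array([0.8, 0.2])       # stock payoff probabilities
alpha, w, beta = 0.3, 0.5, 5.0        # learning rate, MB weight, softmax inverse temp.

q_mf = np.zeros(n_advisors)           # value learned from direct reward history
v_stock = np.zeros(n_stocks)          # stock values used by the MB planner

for _ in range(trials):
    q_mb = v_stock[lead_to]           # MB: plan through the known task map
    q = w * q_mb + (1 - w) * q_mf     # hybrid valuation
    p = np.exp(beta * q) / np.exp(beta * q).sum()
    a = rng.choice(n_advisors, p=p)
    r = float(rng.random() < p_reward[lead_to[a]])
    q_mf[a] += alpha * (r - q_mf[a])                          # MF: credit the advisor
    v_stock[lead_to[a]] += alpha * (r - v_stock[lead_to[a]])  # MB: update the map

print("MF values:", q_mf.round(2), " MB values:", v_stock[lead_to].round(2))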

