Deep Reinforcement Learning with Interactive Feedback in a Human–Robot Environment

2020, Vol 10 (16), pp. 5574
Author(s):
Ithan Moreira
Javier Rivas
Francisco Cruz
Richard Dazeley
Angel Ayala
...

Robots are extending their presence in domestic environments every day, and it is increasingly common to see them carrying out tasks in home scenarios. In the future, robots are expected to perform increasingly complex tasks and, therefore, to be able to acquire experience from different sources as quickly as possible. A plausible approach to address this issue is interactive feedback, where a trainer advises a learner on which actions to take from specific states to speed up the learning process. Moreover, deep reinforcement learning has recently been widely used in robotics to learn the environment and acquire new skills autonomously. However, an open issue when using deep reinforcement learning is the excessive time needed to learn a task from raw input images. In this work, we propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a Human–Robot scenario. We compare three different learning methods using a simulated robotic arm on the task of organizing different objects: (i) deep reinforcement learning (DeepRL); (ii) interactive deep reinforcement learning using a previously trained artificial agent as an advisor (agent–IDeepRL); and (iii) interactive deep reinforcement learning using a human advisor (human–IDeepRL). We demonstrate that the interactive approaches provide advantages for the learning process. The obtained results show that a learner agent using either agent–IDeepRL or human–IDeepRL completes the given task earlier and makes fewer mistakes than the autonomous DeepRL approach.
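As a rough illustration of the interactive feedback mechanism described above, the sketch below lets an advisor (a trained agent for agent–IDeepRL, or a human for human–IDeepRL) override the learner's action with some probability during training. The function names and the fixed advice probability are illustrative assumptions, not the paper's actual interface.

```python
import random

def select_with_feedback(learner_policy, advisor_policy, state,
                         feedback_prob=0.3, rng=random):
    """Pick an action, letting an advisor override the learner.

    `feedback_prob` is a hypothetical advice schedule; the paper's exact
    mechanism (when and how the advisor intervenes) may differ. The
    advisor can be a previously trained agent or a human supplying
    actions interactively.
    """
    if advisor_policy is not None and rng.random() < feedback_prob:
        return advisor_policy(state)   # advice overrides exploration
    return learner_policy(state)       # plain DeepRL action selection
```

With `advisor_policy=None` this reduces to the autonomous DeepRL baseline, so the three compared methods differ only in who, if anyone, provides the advice.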

SPIN, 2021
Author(s):
Jiawei Zhu

Adiabatic quantum computing (AQC) is a computation protocol that solves difficult problems by exploiting quantum advantage, and it is directly applicable to optimization problems. In performing AQC, different configurations of the Hamiltonian path can lead to dramatic differences in computation efficiency. It is thus crucial to configure the Hamiltonian path to optimize the computation performance of AQC. Here we apply a reinforcement learning approach to configure AQC for integer programming, where we find that the learning process automatically converges to a quantum algorithm exhibiting a scaling advantage over the trivial AQC using a linear Hamiltonian path. Owing to its built-in flexibility, this reinforcement-learning-based approach to quantum adiabatic algorithm design for integer programming can readily be adapted to the quantum resources available in different quantum computation devices.
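For orientation, the standard AQC ingredients the abstract refers to can be sketched as follows. The reward definition is an assumption on my part about how computation performance might be scored, not necessarily the paper's choice.

```python
import numpy as np

def annealing_hamiltonian(s, H_B, H_P):
    """H(s) = (1 - s) * H_B + s * H_P, the standard AQC interpolation
    between a mixing Hamiltonian H_B and a problem Hamiltonian H_P.

    The trivial baseline is the linear path s(t) = t / T; the RL agent's
    action is instead a sequence of s-values shaping a nonlinear path.
    """
    return (1.0 - s) * H_B + s * H_P

def episode_reward(final_state, ground_state):
    # Hypothetical reward signal: overlap of the evolved state with the
    # problem Hamiltonian's ground state, i.e. the success probability.
    return float(np.abs(np.vdot(ground_state, final_state)) ** 2)
```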


2020, pp. 91-110
Author(s):
John M. McNamara
Olof Leimar

The chapter introduces reinforcement learning in game-theory models. A distinction is made between small-worlds models with Bayesian updating and large-worlds models that implement specific behavioural mechanisms. The actor–critic learning approach is introduced and illustrated with simple examples of learning in a coordination game and in the Hawk–Dove game. Simple versions of a game of investments with joint benefits and a social dominance game are presented, and these games are further developed in Chapter 8. The idea that parameters of the learning process, such as learning rates, can evolve is put forward. For the game examples it is shown that with slow learning over many rounds the outcome can approximate an ESS of a one-shot game, but for higher rates of learning and fewer rounds this need not be the case. The chapter ends with an overview of learning approaches in game theory, including the originally proposed relative-payoff-sum learning rule for games in biology.
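A toy actor–critic simulation can make the slow-learning claim concrete for the Hawk–Dove example. The payoff values and learning rates below are arbitrary illustrations, not the chapter's; with slow learning over many rounds the hawk frequency should approximate the mixed ESS V/C.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hawk-Dove payoffs to the row player with resource value V = 4 and
# injury cost C = 6 (values chosen for illustration only).
# Actions: 0 = Hawk, 1 = Dove.
payoff = np.array([[(4.0 - 6.0) / 2, 4.0],
                   [0.0,             4.0 / 2]])

theta = np.zeros(2)                      # actor: action preferences (logits)
v = 0.0                                  # critic: estimated value of an interaction
alpha_actor, alpha_critic = 0.02, 0.02   # learning rates (the chapter notes these can evolve)

for _ in range(50000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()                       # softmax policy over Hawk/Dove
    a = rng.choice(2, p=pi)              # focal individual's action
    b = rng.choice(2, p=pi)              # opponent drawn from the same learned policy
    delta = payoff[a, b] - v             # prediction error for a one-round interaction
    v += alpha_critic * delta            # critic update
    grad = -pi
    grad[a] += 1.0                       # gradient of log pi(a) for a softmax actor
    theta += alpha_actor * delta * grad  # actor update

print(f"P(Hawk) ~ {pi[0]:.2f}")          # slow learning approaches the ESS V/C = 2/3
```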


Author(s):  
Xu Zhou
Jiucai Zhang
Xiaoli Zhang

Abstract Autonomous aerial manipulators have great potential to assist humans in, or even fully automate, labor-intensive manual tasks such as aerial cleaning, aerial transportation, infrastructure repair, and agricultural inspection and sampling. Reinforcement learning holds the promise of enabling persistent autonomy of aerial manipulators because it can adapt to different situations by automatically learning optimal policies from the interactions between the aerial manipulator and its environments. However, the learning process itself can experience failures that endanger the safety of aerial manipulators in practice and hence hinder persistent autonomy. To solve this problem, we propose a self-reflective learning strategy for the aerial manipulator that can smartly and safely find optimal policies for different new situations. This self-reflective manner consists of three steps: identifying the appearance of new situations, re-seeking the optimal policy with reinforcement learning, and evaluating the termination of self-reflection. Numerical simulations demonstrate that, compared with conventional learning-based autonomy, our strategy can significantly reduce failures while still completing the given task.
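The three-step loop the abstract outlines might be structured as in the sketch below; every callable here is a hypothetical placeholder rather than the paper's actual component.

```python
def self_reflective_loop(observe, act, is_new_situation, reoptimize,
                         has_converged, policy, max_steps=10000):
    """Skeleton of the three-step self-reflective strategy: detect a
    new situation, re-learn, and decide when reflection can stop."""
    reflecting = False
    for _ in range(max_steps):
        obs = observe()
        if is_new_situation(obs):             # step 1: identify a new situation
            reflecting = True
        if reflecting:
            policy = reoptimize(obs, policy)  # step 2: re-seek the optimal policy with RL
            if has_converged(policy):         # step 3: evaluate termination of self-reflection
                reflecting = False
        act(policy(obs))
    return policy
```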


Author(s):  
Petar S. Kormushev
Kohei Nomoto
Fangyan Dong
Kaoru Hirota
...  

A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation provides Time Hopping with abilities similar to those that eligibility traces provide for conventional Reinforcement Learning. It propagates values from one state to all of its temporal predecessors using a state-transition graph. Experiments on a simulated biped crawling robot confirm that Eligibility Propagation accelerates the learning process by more than a factor of three.
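A plausible reading of the propagation step, in the spirit of prioritized sweeping, is sketched below; the paper's exact update rule may differ.

```python
from collections import deque

def propagate(values, predecessors, changed_state, gamma=0.9, tol=1e-6):
    """Push a value change backward to all temporal predecessors.

    `values` is a dict of state values; `predecessors[s]` lists
    (predecessor, reward) pairs taken from the state-transition graph.
    This is a hypothetical reconstruction of the mechanism the abstract
    names, not the paper's exact algorithm.
    """
    queue = deque([changed_state])
    while queue:
        s = queue.popleft()
        for p, r in predecessors.get(s, ()):
            backed_up = r + gamma * values[s]      # one-step backup along edge p -> s
            if backed_up > values.get(p, float("-inf")) + tol:
                values[p] = backed_up              # improvement found: keep propagating
                queue.append(p)
```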


Author(s):  
Parameswaran Kamalaruban
Rati Devidze
Volkan Cevher
Adish Singla

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: how could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework in which the teacher adaptively chooses the next demonstration based on the learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting, where the teacher has full knowledge of the learner's dynamics, and a blackbox setting, where the teacher has minimal knowledge. We then study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees for our teaching algorithm in the omniscient setting. Extensive experiments with a car-driving simulator environment show that the learning progress can be sped up drastically compared with an uninformative teacher.
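One hedged way to picture the omniscient teacher is a greedy selection rule over candidate demonstrations; this is an illustrative stand-in, not the paper's algorithm, and `feature_fn` is an assumed state-action feature map.

```python
import numpy as np

def pick_next_demo(candidate_demos, learner_w, true_w, feature_fn):
    """Greedy teacher heuristic: choose the demonstration on which the
    learner's current reward estimate disagrees most with the true
    reward, measured through cumulative feature counts.

    Each demo is a list of (state, action) pairs; `learner_w` and
    `true_w` are linear reward weight vectors.
    """
    def disagreement(demo):
        phi = sum(np.asarray(feature_fn(s, a)) for s, a in demo)
        return abs(np.dot(true_w - learner_w, phi))   # reward gap on this demo
    return max(candidate_demos, key=disagreement)
```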


Author(s):  
Stuart Armstrong
Jan Leike
Laurent Orseau
Shane Legg

In some agent designs, such as inverse reinforcement learning, an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We consider a continual (``one life'') learning approach in which the agent both learns the reward function and optimises for it at the same time. We show that this comes with a number of pitfalls, such as deliberately manipulating the learning process in one direction, refusing to learn, ``learning'' facts already known to the agent, and making decisions that are strictly dominated (for all relevant reward functions). We formally introduce two desirable properties: the first is `unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise. The second is `uninfluenceability', whereby the reward-function learning process operates by learning facts about the environment. We show that an uninfluenceable process is automatically unriggable, and that if the set of possible environments is sufficiently large, the converse is true too.
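In rough symbols (notation mine; the paper's formal definitions differ in detail), the two properties can be paraphrased as follows.

```latex
% Let \rho^{\pi} be the distribution of the final learned reward
% function R when the agent follows policy \pi (notation is mine,
% not necessarily the paper's).

% Unriggable: the expected learning outcome cannot be steered by the policy:
\mathbb{E}_{R \sim \rho^{\pi}}[R] = \mathbb{E}_{R \sim \rho^{\pi'}}[R]
\quad \text{for all policies } \pi, \pi'.

% Uninfluenceable: the learned reward depends only on facts about the
% true environment \mu, not on how the agent behaved:
\rho^{\pi}(R \mid \text{history}) = P(R \mid \mu).

% Hence uninfluenceable implies unriggable, as the abstract states.
```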


2020, Vol 17 (10), pp. 129-141
Author(s):
Yiwen Nie
Junhui Zhao
Jun Liu
Jing Jiang
Ruijin Ding

Author(s):  
Lea Christy Restu Kinasih
Dewi Fatimah
Veranica Julianti

The selection and determination of appropriate learning strategies can improve the results obtained from applying classroom learning models. This paper aims to help students develop discipline and individual abilities, become more active in the learning process, and improve the quality of learning. In general, the learning process in Indonesia relies only on conventional learning models, which leave students passive and undeveloped. To raise the quality of learning, the Team Assisted Individualization learning model is combined with task and forced learning strategies. The Team Assisted Individualization cooperative learning model is a cooperative learning model that combines individual and group learning. Task and forced learning strategies, meanwhile, focus on giving assignments that students must complete on time so that the learning process runs effectively. Students are required to complete assignments by the given deadline, which accustoms them to the tasks assigned by the teacher. Combining or modifying the Team Assisted Individualization learning model with task and forced learning strategies is expected to make students more active, disciplined, independent, and creative in learning, and responsible for their assigned tasks. This combined method is therefore valuable in the learning process and can be applied to improve the quality of learning in schools.

