Deep Reinforcement Learning with Interactive Feedback in a Human–Robot Environment

2020, Vol 10 (16), pp. 5574
Author(s):
Ithan Moreira
Javier Rivas
Francisco Cruz
Richard Dazeley
Angel Ayala
...

Robots are extending their presence in domestic environments every day, and it is increasingly common to see them carrying out tasks in home scenarios. In the future, robots are expected to perform increasingly complex tasks and, therefore, to be able to acquire experience from different sources as quickly as possible. A plausible approach to address this issue is interactive feedback, where a trainer advises a learner on which actions to take from specific states to speed up the learning process. Moreover, deep reinforcement learning has recently been widely used in robotics to learn the environment and acquire new skills autonomously. However, an open issue when using deep reinforcement learning is the excessive time needed to learn a task from raw input images. In this work, we propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a Human–Robot scenario. We compare three different learning methods using a simulated robotic arm on the task of organizing different objects: (i) deep reinforcement learning (DeepRL); (ii) interactive deep reinforcement learning using a previously trained artificial agent as an advisor (agent–IDeepRL); and (iii) interactive deep reinforcement learning using a human advisor (human–IDeepRL). We demonstrate that the interactive approaches provide advantages for the learning process. The obtained results show that a learner agent using either agent–IDeepRL or human–IDeepRL completes the given task earlier and makes fewer mistakes than the autonomous DeepRL approach.
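As a rough illustration of the interactive feedback mechanism described above, the sketch below lets an advisor (a trained agent for agent–IDeepRL, or a human for human–IDeepRL) override the learner's action with some probability during training. The function names and the fixed advice probability are illustrative assumptions, not the paper's actual interface.

```python
import random

def select_with_feedback(learner_policy, advisor_policy, state,
                         feedback_prob=0.3, rng=random):
    """Pick an action, letting an advisor override the learner.

    `feedback_prob` is a hypothetical advice schedule; the paper's exact
    mechanism (when and how the advisor intervenes) may differ. The
    advisor can be a previously trained agent or a human supplying
    actions interactively.
    """
    if advisor_policy is not None and rng.random() < feedback_prob:
        return advisor_policy(state)   # advice overrides exploration
    return learner_policy(state)       # plain DeepRL action selection
```

With `advisor_policy=None` this reduces to the autonomous DeepRL baseline, so the three compared methods differ only in who, if anyone, provides the advice.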

SPIN, 2021
Author(s):
Jiawei Zhu

Adiabatic quantum computing (AQC) is a computation protocol that solves difficult problems by exploiting quantum advantage, and it is directly applicable to optimization problems. In performing AQC, different configurations of the Hamiltonian path can lead to dramatic differences in computation efficiency. It is thus crucial to configure the Hamiltonian path to optimize the computation performance of AQC. Here we apply a reinforcement learning approach to configure AQC for integer programming, where we find that the learning process automatically converges to a quantum algorithm exhibiting a scaling advantage over the trivial AQC using a linear Hamiltonian path. Owing to its built-in flexibility, this reinforcement-learning-based approach to quantum adiabatic algorithm design for integer programming can readily be adapted to the quantum resources available in different quantum computation devices.
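For orientation, the standard AQC ingredients the abstract refers to can be sketched as follows. The reward definition is an assumption on my part about how computation performance might be scored, not necessarily the paper's choice.

```python
import numpy as np

def annealing_hamiltonian(s, H_B, H_P):
    """H(s) = (1 - s) * H_B + s * H_P, the standard AQC interpolation
    between a mixing Hamiltonian H_B and a problem Hamiltonian H_P.

    The trivial baseline is the linear path s(t) = t / T; the RL agent's
    action is instead a sequence of s-values shaping a nonlinear path.
    """
    return (1.0 - s) * H_B + s * H_P

def episode_reward(final_state, ground_state):
    # Hypothetical reward signal: overlap of the evolved state with the
    # problem Hamiltonian's ground state, i.e. the success probability.
    return float(np.abs(np.vdot(ground_state, final_state)) ** 2)
```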


2020, pp. 91-110
Author(s):
John M. McNamara
Olof Leimar

The chapter introduces reinforcement learning in game-theory models. A distinction is made between small-worlds models with Bayesian updating and large-worlds models that implement specific behavioural mechanisms. The actor–critic learning approach is introduced and illustrated with simple examples of learning in a coordination game and in the Hawk–Dove game. Simple versions of a game of investments with joint benefits and a social dominance game are presented, and these games are further developed in Chapter 8. The idea that parameters of the learning process, such as learning rates, can evolve is put forward. For the game examples it is shown that with slow learning over many rounds the outcome can approximate an ESS of a one-shot game, but for higher rates of learning and fewer rounds this need not be the case. The chapter ends with an overview of learning approaches in game theory, including the originally proposed relative-payoff-sum learning rule for games in biology.
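A toy actor–critic simulation can make the slow-learning claim concrete for the Hawk–Dove example. The payoff values and learning rates below are arbitrary illustrations, not the chapter's; with slow learning over many rounds the hawk frequency should approximate the mixed ESS V/C.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hawk-Dove payoffs to the row player with resource value V = 4 and
# injury cost C = 6 (values chosen for illustration only).
# Actions: 0 = Hawk, 1 = Dove.
payoff = np.array([[(4.0 - 6.0) / 2, 4.0],
                   [0.0,             4.0 / 2]])

theta = np.zeros(2)                      # actor: action preferences (logits)
v = 0.0                                  # critic: estimated value of an interaction
alpha_actor, alpha_critic = 0.02, 0.02   # learning rates (the chapter notes these can evolve)

for _ in range(50000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()                       # softmax policy over Hawk/Dove
    a = rng.choice(2, p=pi)              # focal individual's action
    b = rng.choice(2, p=pi)              # opponent drawn from the same learned policy
    delta = payoff[a, b] - v             # prediction error for a one-round interaction
    v += alpha_critic * delta            # critic update
    grad = -pi
    grad[a] += 1.0                       # gradient of log pi(a) for a softmax actor
    theta += alpha_actor * delta * grad  # actor update

print(f"P(Hawk) ~ {pi[0]:.2f}")          # slow learning approaches the ESS V/C = 2/3
```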


Author(s):  
Xu Zhou
Jiucai Zhang
Xiaoli Zhang

Abstract Autonomous aerial manipulators have great potential to assist humans in, or even fully automate, labor-intensive manual tasks such as aerial cleaning, aerial transportation, infrastructure repair, and agricultural inspection and sampling. Reinforcement learning holds the promise of enabling persistent autonomy of aerial manipulators because it can adapt to different situations by automatically learning optimal policies from the interactions between the aerial manipulator and its environments. However, the learning process itself can experience failures that endanger the safety of aerial manipulators in practice and hence hinder persistent autonomy. To solve this problem, we propose a self-reflective learning strategy for the aerial manipulator that can smartly and safely find optimal policies for different new situations. This self-reflective manner consists of three steps: identifying the appearance of new situations, re-seeking the optimal policy with reinforcement learning, and evaluating the termination of self-reflection. Numerical simulations demonstrate that, compared with conventional learning-based autonomy, our strategy can significantly reduce failures while still completing the given task.
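The three-step loop the abstract outlines might be structured as in the sketch below; every callable here is a hypothetical placeholder rather than the paper's actual component.

```python
def self_reflective_loop(observe, act, is_new_situation, reoptimize,
                         has_converged, policy, max_steps=10000):
    """Skeleton of the three-step self-reflective strategy: detect a
    new situation, re-learn, and decide when reflection can stop."""
    reflecting = False
    for _ in range(max_steps):
        obs = observe()
        if is_new_situation(obs):             # step 1: identify a new situation
            reflecting = True
        if reflecting:
            policy = reoptimize(obs, policy)  # step 2: re-seek the optimal policy with RL
            if has_converged(policy):         # step 3: evaluate termination of self-reflection
                reflecting = False
        act(policy(obs))
    return policy
```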


Author(s):  
Petar S. Kormushev
Kohei Nomoto
Fangyan Dong
Kaoru Hirota
...  

A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation provides Time Hopping with abilities similar to those that eligibility traces provide for conventional Reinforcement Learning. It propagates values from one state to all of its temporal predecessors using a state-transition graph. Experiments on a simulated biped crawling robot confirm that Eligibility Propagation accelerates the learning process by more than a factor of three.
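A plausible reading of the propagation step, in the spirit of prioritized sweeping, is sketched below; the paper's exact update rule may differ.

```python
from collections import deque

def propagate(values, predecessors, changed_state, gamma=0.9, tol=1e-6):
    """Push a value change backward to all temporal predecessors.

    `values` is a dict of state values; `predecessors[s]` lists
    (predecessor, reward) pairs taken from the state-transition graph.
    This is a hypothetical reconstruction of the mechanism the abstract
    names, not the paper's exact algorithm.
    """
    queue = deque([changed_state])
    while queue:
        s = queue.popleft()
        for p, r in predecessors.get(s, ()):
            backed_up = r + gamma * values[s]      # one-step backup along edge p -> s
            if backed_up > values.get(p, float("-inf")) + tol:
                values[p] = backed_up              # improvement found: keep propagating
                queue.append(p)
```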


Author(s):  
Parameswaran Kamalaruban
Rati Devidze
Volkan Cevher
Adish Singla

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: how could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework in which the teacher adaptively chooses the next demonstration based on the learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting, where the teacher has full knowledge of the learner's dynamics, and a blackbox setting, where the teacher has minimal knowledge. We then study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees for our teaching algorithm in the omniscient setting. Extensive experiments with a car-driving simulator environment show that the learning progress can be sped up drastically compared with an uninformative teacher.
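One hedged way to picture the omniscient teacher is a greedy selection rule over candidate demonstrations; this is an illustrative stand-in, not the paper's algorithm, and `feature_fn` is an assumed state-action feature map.

```python
import numpy as np

def pick_next_demo(candidate_demos, learner_w, true_w, feature_fn):
    """Greedy teacher heuristic: choose the demonstration on which the
    learner's current reward estimate disagrees most with the true
    reward, measured through cumulative feature counts.

    Each demo is a list of (state, action) pairs; `learner_w` and
    `true_w` are linear reward weight vectors.
    """
    def disagreement(demo):
        phi = sum(np.asarray(feature_fn(s, a)) for s, a in demo)
        return abs(np.dot(true_w - learner_w, phi))   # reward gap on this demo
    return max(candidate_demos, key=disagreement)
```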


Author(s):  
Stuart Armstrong
Jan Leike
Laurent Orseau
Shane Legg

In some agent designs, such as inverse reinforcement learning, an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We consider a continual (``one life'') learning approach in which the agent both learns the reward function and optimises for it at the same time. We show that this comes with a number of pitfalls, such as deliberately manipulating the learning process in one direction, refusing to learn, ``learning'' facts already known to the agent, and making decisions that are strictly dominated (for all relevant reward functions). We formally introduce two desirable properties: the first is `unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise. The second is `uninfluenceability', whereby the reward-function learning process operates by learning facts about the environment. We show that an uninfluenceable process is automatically unriggable, and that if the set of possible environments is sufficiently large, the converse is true too.
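In rough symbols (notation mine; the paper's formal definitions differ in detail), the two properties can be paraphrased as follows.

```latex
% Let \rho^{\pi} be the distribution of the final learned reward
% function R when the agent follows policy \pi (notation is mine,
% not necessarily the paper's).

% Unriggable: the expected learning outcome cannot be steered by the policy:
\mathbb{E}_{R \sim \rho^{\pi}}[R] = \mathbb{E}_{R \sim \rho^{\pi'}}[R]
\quad \text{for all policies } \pi, \pi'.

% Uninfluenceable: the learned reward depends only on facts about the
% true environment \mu, not on how the agent behaved:
\rho^{\pi}(R \mid \text{history}) = P(R \mid \mu).

% Hence uninfluenceable implies unriggable, as the abstract states.
```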


2020, Vol 17 (10), pp. 129-141
Author(s):
Yiwen Nie
Junhui Zhao
Jun Liu
Jing Jiang
Ruijin Ding

Author(s):  
Lea Christy Restu Kinasih
Dewi Fatimah
Veranica Julianti

The selection and determination of appropriate learning strategies can improve the results obtained from applying classroom learning models. This paper aims to help students develop discipline and individual abilities, become more active in the learning process, and improve the quality of learning. In general, the learning process in Indonesia relies only on conventional learning models, which leave students passive and undeveloped. To raise the quality of learning, the Team Assisted Individualization learning model is combined with task and forced learning strategies. The Team Assisted Individualization cooperative learning model is a cooperative learning model that combines individual and group learning. Task and forced learning strategies, meanwhile, focus on giving assignments that students must complete on time so that the learning process runs effectively. Students are required to complete assignments by the given deadline, which accustoms them to the tasks assigned by the teacher. Combining or modifying the Team Assisted Individualization learning model with task and forced learning strategies is expected to make students more active, disciplined, independent, and creative in learning, and responsible for their assigned tasks. This combined method is therefore valuable in the learning process and can be applied to improve the quality of learning in schools.

