The Bottleneck Simulator: A Model-Based Deep Reinforcement Learning Approach

2020 ◽  
Vol 69 ◽  
pp. 571-612
Author(s):  
Iulian Vlad Serban ◽  
Chinnadhurai Sankar ◽  
Michael Pieper ◽  
Joelle Pineau ◽  
Yoshua Bengio

Deep reinforcement learning has recently shown many impressive successes. However, one major obstacle to applying such methods to real-world problems is their lack of data efficiency. To this end, we propose the Bottleneck Simulator: a model-based reinforcement learning method which combines a learned, factorized transition model of the environment with rollout simulations to learn an effective policy from few examples. The learned transition model employs an abstract, discrete (bottleneck) state, which increases sample efficiency by reducing the number of model parameters and by exploiting structural properties of the environment. We provide a mathematical analysis of the Bottleneck Simulator in terms of fixed points of the learned policy, which reveals how performance is affected by four distinct sources of error: an error related to the abstract space structure, an error related to the transition model estimation variance, an error related to the transition model estimation bias, and an error related to the transition model class bias. Finally, we evaluate the Bottleneck Simulator on two natural language processing tasks: a text adventure game and a real-world, complex dialogue response selection task. On both tasks, the Bottleneck Simulator yields excellent performance, beating competing approaches.
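To make the mechanism concrete, here is a minimal tabular sketch of model-based RL with a discrete bottleneck abstraction; the state/action sizes, the count-based transition estimate, and the Q-learning rollouts are illustrative assumptions, not the paper's learned, factorized architecture.

```python
# Minimal sketch: learn a transition model over abstract (bottleneck)
# states from a few transitions, then train a policy by rolling out in
# the learned model. All sizes and the tabular setting are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_abstract, n_actions = 5, 3  # tiny illustrative sizes

# Count-based estimate of P(z' | z, a) over abstract states, gathered
# from a handful of transitions (random placeholders stand in for data).
counts = np.ones((n_abstract, n_actions, n_abstract))  # Laplace smoothing
for _ in range(200):  # pretend these came from real experience
    z, a = rng.integers(n_abstract), rng.integers(n_actions)
    counts[z, a, rng.integers(n_abstract)] += 1
P = counts / counts.sum(axis=2, keepdims=True)
R = rng.normal(size=(n_abstract, n_actions))  # placeholder reward model

# Policy learning via simulated rollouts in the learned model (Q-learning).
Q = np.zeros((n_abstract, n_actions))
gamma, alpha = 0.95, 0.1
for _ in range(5000):
    z = rng.integers(n_abstract)
    a = rng.integers(n_actions) if rng.random() < 0.1 else Q[z].argmax()
    z_next = rng.choice(n_abstract, p=P[z, a])
    Q[z, a] += alpha * (R[z, a] + gamma * Q[z_next].max() - Q[z, a])
```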

2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have been successfully applied to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only information from past experience but also predictive information about the future state. PPOMM adds the information of the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. This method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict the information of the next state. Evaluated across 49 Atari games in the Arcade Learning Environment (ALE), PPOMM outperforms the state-of-the-art PPO algorithm on most games; the experimental results show that it performs better than or on par with the original algorithm in 33 games.
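The combined objective can be sketched as below, assuming a standard clipped PPO surrogate plus a mean-squared latent prediction loss; the weighting `lambda_model` and the latent dimensions are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a combined objective: PPO's clipped surrogate plus a
# transition-model prediction loss, as in a model-based/model-free fusion.
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

def model_loss(pred_next_latent, true_next_latent):
    """Mean-squared error of the latent transition model's prediction."""
    return ((pred_next_latent - true_next_latent) ** 2).mean()

def ppomm_loss(ratio, advantage, pred_next, true_next, lambda_model=0.5):
    # Total objective = PPO error + weighted model-based prediction error.
    return ppo_clip_loss(ratio, advantage) + lambda_model * model_loss(pred_next, true_next)

# Toy batch: probability ratios, advantages, and latent next-state predictions.
rng = np.random.default_rng(1)
loss = ppomm_loss(rng.uniform(0.8, 1.2, 64), rng.normal(size=64),
                  rng.normal(size=(64, 8)), rng.normal(size=(64, 8)))
print(f"combined loss: {loss:.4f}")
```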


2021 ◽  
Vol 8 ◽  
Author(s):  
S. M. Nahid Mahmud ◽  
Scott A. Nivison ◽  
Zachary I. Bell ◽  
Rushikesh Kamalapurkar

Reinforcement learning has been established over the past decade as an effective tool to find optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed in terms of state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop a safe reinforcement learning method for deterministic nonlinear systems, with parametric uncertainties in the model, to learn approximate constrained optimal policies without relying on stringent excitation conditions. To that end, a model-based reinforcement learning technique that utilizes a novel filtered concurrent learning method, along with a barrier transformation, is developed in this paper to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
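A common form of such a barrier transformation, sketched below for a scalar state constrained to an interval (a, A) with a < 0 < A, maps the constrained state to an unconstrained coordinate; this generic form is an assumption, and the paper's exact transformation may differ.

```python
# A common barrier transformation b: (a, A) -> R (with a < 0 < A) used to
# recast a state-constrained problem as an unconstrained one.
import numpy as np

def barrier(x, a, A):
    """Map constrained state x in (a, A) to an unconstrained coordinate s."""
    return np.log((A * (a - x)) / (a * (A - x)))

def barrier_inverse(s, a, A):
    """Recover x in (a, A) from the unconstrained coordinate s."""
    return a * A * (np.exp(s) - 1) / (a * np.exp(s) - A)

a, A = -2.0, 3.0
x = np.linspace(a + 1e-3, A - 1e-3, 5)
s = barrier(x, a, A)
assert np.allclose(barrier_inverse(s, a, A), x)  # round-trip sanity check
# As x approaches either bound, |s| grows without bound, so a policy
# learned in s-coordinates keeps the original state strictly inside (a, A).
```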


Machines ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 46
Author(s):  
Peng Chang ◽  
Taşkın Padır

Manipulation of deformable objects is a desired skill for making robots ubiquitous in manufacturing, service, healthcare, and security. Common deformable objects (e.g., wires, clothes, bed sheets) are significantly more difficult to model than rigid objects. In this research, we contribute to the model-based manipulation of linear flexible objects such as cables. We propose a 3D geometric model of a linear flexible object subject to gravity, and a physical model consisting of multiple links connected by revolute joints, with identified model parameters. These models enable task automation in manipulating linear flexible objects both in simulation and in the real world. To bridge the gap between simulation and the real world and build a close-to-reality simulation of flexible objects, we propose a new strategy called Simulation-to-Real-to-Simulation (Sim2Real2Sim). We demonstrate the feasibility of our approach by completing, both in simulation and in the real world, the Plug Task used in the 2015 DARPA Robotics Challenge Finals, which involves unplugging a power cable from one socket and plugging it into another. Numerical experiments are implemented to validate our approach.
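A minimal planar sketch of the link-and-revolute-joint idea, assuming uniform link lengths and a fixed base, shows how joint angles determine the cable's shape via forward kinematics; the paper's 3D model under gravity with identified parameters is considerably richer.

```python
# Minimal planar sketch of a cable modeled as rigid links joined by
# revolute joints; link count and lengths are purely illustrative.
import numpy as np

def cable_points(joint_angles, link_length=0.1, base=(0.0, 0.0)):
    """Forward kinematics: absolute positions of each joint of the chain."""
    pts = [np.asarray(base, dtype=float)]
    heading = 0.0
    for theta in joint_angles:
        heading += theta  # revolute joints accumulate orientation
        step = link_length * np.array([np.cos(heading), np.sin(heading)])
        pts.append(pts[-1] + step)
    return np.array(pts)

# A gentle uniform sag: every joint bends slightly downward.
angles = np.full(10, -0.08)
points = cable_points(angles)
print("cable tip:", points[-1])
```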


2014 ◽  
Vol 596 ◽  
pp. 843-846
Author(s):  
Rui Sun

This paper studies the evolution laws of real-world networks and proposes a complex network model, based on node attraction with tunable parameters, to address shortcomings of the BA model and the original node-attraction model. The model accounts for preferential attachment driven by changes in both degree and node attraction during network evolution. Theoretical analysis and simulation show that the evolution of the network can be adjusted more flexibly by tuning the model parameters, making it accord better with the topology and statistical characteristics of real-world networks.
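A rough generative sketch of such a model, assuming attachment probability proportional to a tunable mix `alpha` of degree and a random intrinsic attraction, could look like the following; the exact attachment rule and parameterization in the paper may differ.

```python
# Sketch of network growth where a new node attaches with probability
# proportional to alpha*degree + (1-alpha)*attraction (both illustrative).
import numpy as np

rng = np.random.default_rng(42)

def grow_network(n_nodes=200, m=2, alpha=0.5):
    """Grow a network; each new node links to m existing nodes chosen by
    a tunable mix of degree-based and attraction-based preference."""
    degree = np.zeros(n_nodes)
    attraction = rng.uniform(0.1, 1.0, n_nodes)  # intrinsic appeal
    edges = [(0, 1)]
    degree[[0, 1]] = 1
    for new in range(2, n_nodes):
        existing = np.arange(new)
        score = alpha * degree[existing] + (1 - alpha) * attraction[existing]
        prob = score / score.sum()
        targets = rng.choice(existing, size=min(m, new), replace=False, p=prob)
        for t in targets:
            edges.append((new, int(t)))
            degree[[new, t]] += 1
    return edges, degree

edges, degree = grow_network()
print("max degree:", degree.max())
```

Sweeping `alpha` from 0 to 1 moves the model from purely attraction-driven growth to purely degree-driven (BA-style) growth, which is the tunability the abstract describes.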


2021 ◽  
Author(s):  
Michael Lutter ◽  
Johannes Silberbauer ◽  
Joe Watson ◽  
Jan Peters

Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7003
Author(s):  
Yuri Assayag ◽  
Horácio Oliveira ◽  
Eduardo Souto ◽  
Raimundo Barreto ◽  
Richard Pazzi

Indoor Positioning Systems (IPSs) are used to locate mobile devices in indoor environments. Model-based IPSs have the advantage of not requiring the exhaustive training and signal characterization of the environment demanded by the fingerprint technique. However, most model-based IPSs use fixed model parameters, treating the whole scenario as having uniform signal propagation. This might work for most small-scale experiments, but not for larger scenarios. In this paper, we propose PoDME (Positioning using Dynamic Model Estimation), a model-based IPS that uses dynamic parameters estimated from the location where the signal was sent. More specifically, we use the set of anchor nodes that received the signal sent by the mobile node, and their signal strengths, to estimate the best local values for the log-distance model parameters. Also, since our solution depends heavily on the anchor nodes selected for the position computation, we propose a novel method for choosing the three best anchor nodes. Our method is based on several data analyses performed on a large-scale, Bluetooth-based, real-world experiment, and it chooses not only the nearest anchor but also the ones that benefit our least-squares-based position computation. Our solution achieves a position estimation error of 3 m, which is 17% better than a fixed-parameters model from the literature.
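The two core computational steps can be sketched as follows, assuming fixed illustrative log-distance parameters (`p0`, `n`) and a toy anchor layout; in PoDME these fixed values would be replaced by the dynamically estimated local parameters.

```python
# Sketch: invert a log-distance path-loss model to get ranges, then solve
# a linearized least-squares trilateration with three chosen anchors.
import numpy as np

def rssi_to_distance(rssi, p0=-40.0, n=2.5, d0=1.0):
    """Invert RSSI = P0 - 10*n*log10(d/d0) for distance d (meters)."""
    return d0 * 10 ** ((p0 - rssi) / (10 * n))

def trilaterate(anchors, dists):
    """Linearized least squares: subtract the first anchor's equation."""
    (x0, y0), r0 = anchors[0], dists[0]
    A, b = [], []
    for (xi, yi), ri in zip(anchors[1:], dists[1:]):
        A.append([2 * (xi - x0), 2 * (yi - y0)])
        b.append(r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2)
    sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return sol

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rssi = [-62.0, -58.0, -65.0]  # placeholder readings from the three anchors
pos = trilaterate(anchors, [rssi_to_distance(r) for r in rssi])
print("estimated position:", pos)
```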


Diabetes ◽  
2019 ◽  
Vol 68 (Supplement 1) ◽  
pp. 1243-P
Author(s):  
Jianmin Wu ◽  
Fritha J. Morrison ◽  
Zhenxiang Zhao ◽  
Xuanyao He ◽  
Maria Shubina ◽  
...  

2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
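The hybrid valuation commonly fitted in such paradigms can be sketched as a weighted mixture of model-based and model-free action values passed through a softmax choice rule; the weight `w`, inverse temperature `beta`, and toy values below are illustrative assumptions, not the study's fitted estimates.

```python
# Sketch of a hybrid valuation: mix model-based and model-free action
# values with weight w, then map to choice probabilities via softmax.
import numpy as np

def hybrid_values(q_mb, q_mf, w=0.6):
    """Combine model-based and model-free values; w=1 is purely planned."""
    return w * np.asarray(q_mb) + (1 - w) * np.asarray(q_mf)

def choice_probs(q, beta=3.0):
    """Softmax choice rule with inverse temperature beta."""
    z = beta * (q - np.max(q))
    return np.exp(z) / np.exp(z).sum()

# Two advisors: one promises high future payoffs (model-based value),
# the other delivered large rewards in the past (model-free value).
q_mb, q_mf = [0.8, 0.3], [0.2, 0.9]
print(choice_probs(hybrid_values(q_mb, q_mf)))
```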

