The Bottleneck Simulator: A Model-Based Deep Reinforcement Learning Approach

2020 ◽  
Vol 69 ◽  
pp. 571-612
Author(s):  
Iulian Vlad Serban ◽  
Chinnadhurai Sankar ◽  
Michael Pieper ◽  
Joelle Pineau ◽  
Yoshua Bengio

Deep reinforcement learning has recently shown many impressive successes. However, one major obstacle to applying such methods to real-world problems is their lack of data efficiency. To this end, we propose the Bottleneck Simulator: a model-based reinforcement learning method which combines a learned, factorized transition model of the environment with rollout simulations to learn an effective policy from few examples. The learned transition model employs an abstract, discrete (bottleneck) state, which increases sample efficiency by reducing the number of model parameters and by exploiting structural properties of the environment. We provide a mathematical analysis of the Bottleneck Simulator in terms of fixed points of the learned policy, which reveals how performance is affected by four distinct sources of error: an error related to the abstract space structure, an error related to the transition model estimation variance, an error related to the transition model estimation bias, and an error related to the transition model class bias. Finally, we evaluate the Bottleneck Simulator on two natural language processing tasks: a text adventure game and a real-world, complex dialogue response selection task. On both tasks, the Bottleneck Simulator yields excellent performance, beating competing approaches.
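To make the mechanism concrete, here is a minimal tabular sketch of model-based RL with a discrete bottleneck abstraction; the state/action sizes, the count-based transition estimate, and the Q-learning rollouts are illustrative assumptions, not the paper's learned, factorized architecture.

```python
# Minimal sketch: learn a transition model over abstract (bottleneck)
# states from a few transitions, then train a policy by rolling out in
# the learned model. All sizes and the tabular setting are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_abstract, n_actions = 5, 3  # tiny illustrative sizes

# Count-based estimate of P(z' | z, a) over abstract states, gathered
# from a handful of transitions (random placeholders stand in for data).
counts = np.ones((n_abstract, n_actions, n_abstract))  # Laplace smoothing
for _ in range(200):  # pretend these came from real experience
    z, a = rng.integers(n_abstract), rng.integers(n_actions)
    counts[z, a, rng.integers(n_abstract)] += 1
P = counts / counts.sum(axis=2, keepdims=True)
R = rng.normal(size=(n_abstract, n_actions))  # placeholder reward model

# Policy learning via simulated rollouts in the learned model (Q-learning).
Q = np.zeros((n_abstract, n_actions))
gamma, alpha = 0.95, 0.1
for _ in range(5000):
    z = rng.integers(n_abstract)
    a = rng.integers(n_actions) if rng.random() < 0.1 else Q[z].argmax()
    z_next = rng.choice(n_abstract, p=P[z, a])
    Q[z, a] += alpha * (R[z, a] + gamma * Q[z_next].max() - Q[z, a])
```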

2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have been successfully applied to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only information from past experience but also predictive information about the future state. PPOMM adds the information of the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. This method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict the information of the next state. Evaluated across 49 Atari games in the Arcade Learning Environment (ALE), PPOMM outperforms the state-of-the-art PPO algorithm on most games; the experimental results show that it performs better than or on par with the original algorithm in 33 games.
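The combined objective can be sketched as below, assuming a standard clipped PPO surrogate plus a mean-squared latent prediction loss; the weighting `lambda_model` and the latent dimensions are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a combined objective: PPO's clipped surrogate plus a
# transition-model prediction loss, as in a model-based/model-free fusion.
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

def model_loss(pred_next_latent, true_next_latent):
    """Mean-squared error of the latent transition model's prediction."""
    return ((pred_next_latent - true_next_latent) ** 2).mean()

def ppomm_loss(ratio, advantage, pred_next, true_next, lambda_model=0.5):
    # Total objective = PPO error + weighted model-based prediction error.
    return ppo_clip_loss(ratio, advantage) + lambda_model * model_loss(pred_next, true_next)

# Toy batch: probability ratios, advantages, and latent next-state predictions.
rng = np.random.default_rng(1)
loss = ppomm_loss(rng.uniform(0.8, 1.2, 64), rng.normal(size=64),
                  rng.normal(size=(64, 8)), rng.normal(size=(64, 8)))
print(f"combined loss: {loss:.4f}")
```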


2021 ◽  
Vol 8 ◽  
Author(s):  
S. M. Nahid Mahmud ◽  
Scott A. Nivison ◽  
Zachary I. Bell ◽  
Rushikesh Kamalapurkar

Reinforcement learning has been established over the past decade as an effective tool to find optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed in terms of state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop a safe reinforcement learning method for deterministic nonlinear systems, with parametric uncertainties in the model, to learn approximate constrained optimal policies without relying on stringent excitation conditions. To that end, a model-based reinforcement learning technique that utilizes a novel filtered concurrent learning method, along with a barrier transformation, is developed in this paper to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
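A common form of such a barrier transformation, sketched below for a scalar state constrained to an interval (a, A) with a < 0 < A, maps the constrained state to an unconstrained coordinate; this generic form is an assumption, and the paper's exact transformation may differ.

```python
# A common barrier transformation b: (a, A) -> R (with a < 0 < A) used to
# recast a state-constrained problem as an unconstrained one.
import numpy as np

def barrier(x, a, A):
    """Map constrained state x in (a, A) to an unconstrained coordinate s."""
    return np.log((A * (a - x)) / (a * (A - x)))

def barrier_inverse(s, a, A):
    """Recover x in (a, A) from the unconstrained coordinate s."""
    return a * A * (np.exp(s) - 1) / (a * np.exp(s) - A)

a, A = -2.0, 3.0
x = np.linspace(a + 1e-3, A - 1e-3, 5)
s = barrier(x, a, A)
assert np.allclose(barrier_inverse(s, a, A), x)  # round-trip sanity check
# As x approaches either bound, |s| grows without bound, so a policy
# learned in s-coordinates keeps the original state strictly inside (a, A).
```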


Machines ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 46
Author(s):  
Peng Chang ◽  
Taşkın Padır

Manipulation of deformable objects is a desired skill for making robots ubiquitous in manufacturing, service, healthcare, and security. Common deformable objects (e.g., wires, clothes, bed sheets) are significantly more difficult to model than rigid objects. In this research, we contribute to the model-based manipulation of linear flexible objects such as cables. We propose a 3D geometric model of a linear flexible object subject to gravity, and a physical model consisting of multiple links connected by revolute joints, with identified model parameters. These models enable task automation in manipulating linear flexible objects both in simulation and in the real world. To bridge the gap between simulation and the real world and build a close-to-reality simulation of flexible objects, we propose a new strategy called Simulation-to-Real-to-Simulation (Sim2Real2Sim). We demonstrate the feasibility of our approach by completing, both in simulation and in the real world, the Plug Task used in the 2015 DARPA Robotics Challenge Finals, which involves unplugging a power cable from one socket and plugging it into another. Numerical experiments are implemented to validate our approach.
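A minimal planar sketch of the link-and-revolute-joint idea, assuming uniform link lengths and a fixed base, shows how joint angles determine the cable's shape via forward kinematics; the paper's 3D model under gravity with identified parameters is considerably richer.

```python
# Minimal planar sketch of a cable modeled as rigid links joined by
# revolute joints; link count and lengths are purely illustrative.
import numpy as np

def cable_points(joint_angles, link_length=0.1, base=(0.0, 0.0)):
    """Forward kinematics: absolute positions of each joint of the chain."""
    pts = [np.asarray(base, dtype=float)]
    heading = 0.0
    for theta in joint_angles:
        heading += theta  # revolute joints accumulate orientation
        step = link_length * np.array([np.cos(heading), np.sin(heading)])
        pts.append(pts[-1] + step)
    return np.array(pts)

# A gentle uniform sag: every joint bends slightly downward.
angles = np.full(10, -0.08)
points = cable_points(angles)
print("cable tip:", points[-1])
```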


2014 ◽  
Vol 596 ◽  
pp. 843-846
Author(s):  
Rui Sun

This paper studies the evolution laws of real-world networks and proposes a complex network model, based on node attraction with tunable parameters, to address shortcomings of the BA model and the original node-attraction model. The model accounts for preferential attachment driven by changes in both degree and node attraction during network evolution. Theoretical analysis and simulation show that the evolution of the network can be adjusted more flexibly by tuning the model parameters, making it accord better with the topology and statistical characteristics of real-world networks.
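A rough generative sketch of such a model, assuming attachment probability proportional to a tunable mix `alpha` of degree and a random intrinsic attraction, could look like the following; the exact attachment rule and parameterization in the paper may differ.

```python
# Sketch of network growth where a new node attaches with probability
# proportional to alpha*degree + (1-alpha)*attraction (both illustrative).
import numpy as np

rng = np.random.default_rng(42)

def grow_network(n_nodes=200, m=2, alpha=0.5):
    """Grow a network; each new node links to m existing nodes chosen by
    a tunable mix of degree-based and attraction-based preference."""
    degree = np.zeros(n_nodes)
    attraction = rng.uniform(0.1, 1.0, n_nodes)  # intrinsic appeal
    edges = [(0, 1)]
    degree[[0, 1]] = 1
    for new in range(2, n_nodes):
        existing = np.arange(new)
        score = alpha * degree[existing] + (1 - alpha) * attraction[existing]
        prob = score / score.sum()
        targets = rng.choice(existing, size=min(m, new), replace=False, p=prob)
        for t in targets:
            edges.append((new, int(t)))
            degree[[new, t]] += 1
    return edges, degree

edges, degree = grow_network()
print("max degree:", degree.max())
```

Sweeping `alpha` from 0 to 1 moves the model from purely attraction-driven growth to purely degree-driven (BA-style) growth, which is the tunability the abstract describes.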


2021 ◽  
Author(s):  
Michael Lutter ◽  
Johannes Silberbauer ◽  
Joe Watson ◽  
Jan Peters

Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7003
Author(s):  
Yuri Assayag ◽  
Horácio Oliveira ◽  
Eduardo Souto ◽  
Raimundo Barreto ◽  
Richard Pazzi

Indoor Positioning Systems (IPSs) are used to locate mobile devices in indoor environments. Model-based IPSs have the advantage of not requiring the exhaustive training and signal characterization of the environment demanded by the fingerprint technique. However, most model-based IPSs use fixed model parameters, treating the whole scenario as having uniform signal propagation. This might work for most small-scale experiments, but not for larger scenarios. In this paper, we propose PoDME (Positioning using Dynamic Model Estimation), a model-based IPS that uses dynamic parameters estimated from the location where the signal was sent. More specifically, we use the set of anchor nodes that received the signal sent by the mobile node, and their signal strengths, to estimate the best local values for the log-distance model parameters. Also, since our solution depends heavily on the anchor nodes selected for the position computation, we propose a novel method for choosing the three best anchor nodes. Our method is based on several data analyses performed on a large-scale, Bluetooth-based, real-world experiment, and it chooses not only the nearest anchor but also the ones that benefit our least-squares-based position computation. Our solution achieves a position estimation error of 3 m, which is 17% better than a fixed-parameters model from the literature.
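The two core computational steps can be sketched as follows, assuming fixed illustrative log-distance parameters (`p0`, `n`) and a toy anchor layout; in PoDME these fixed values would be replaced by the dynamically estimated local parameters.

```python
# Sketch: invert a log-distance path-loss model to get ranges, then solve
# a linearized least-squares trilateration with three chosen anchors.
import numpy as np

def rssi_to_distance(rssi, p0=-40.0, n=2.5, d0=1.0):
    """Invert RSSI = P0 - 10*n*log10(d/d0) for distance d (meters)."""
    return d0 * 10 ** ((p0 - rssi) / (10 * n))

def trilaterate(anchors, dists):
    """Linearized least squares: subtract the first anchor's equation."""
    (x0, y0), r0 = anchors[0], dists[0]
    A, b = [], []
    for (xi, yi), ri in zip(anchors[1:], dists[1:]):
        A.append([2 * (xi - x0), 2 * (yi - y0)])
        b.append(r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2)
    sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return sol

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rssi = [-62.0, -58.0, -65.0]  # placeholder readings from the three anchors
pos = trilaterate(anchors, [rssi_to_distance(r) for r in rssi])
print("estimated position:", pos)
```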


Diabetes ◽  
2019 ◽  
Vol 68 (Supplement 1) ◽  
pp. 1243-P
Author(s):  
Jianmin Wu ◽  
Fritha J. Morrison ◽  
Zhenxiang Zhao ◽  
Xuanyao He ◽  
Maria Shubina ◽  
...  

2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
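The hybrid valuation commonly fitted in such paradigms can be sketched as a weighted mixture of model-based and model-free action values passed through a softmax choice rule; the weight `w`, inverse temperature `beta`, and toy values below are illustrative assumptions, not the study's fitted estimates.

```python
# Sketch of a hybrid valuation: mix model-based and model-free action
# values with weight w, then map to choice probabilities via softmax.
import numpy as np

def hybrid_values(q_mb, q_mf, w=0.6):
    """Combine model-based and model-free values; w=1 is purely planned."""
    return w * np.asarray(q_mb) + (1 - w) * np.asarray(q_mf)

def choice_probs(q, beta=3.0):
    """Softmax choice rule with inverse temperature beta."""
    z = beta * (q - np.max(q))
    return np.exp(z) / np.exp(z).sum()

# Two advisors: one promises high future payoffs (model-based value),
# the other delivered large rewards in the past (model-free value).
q_mb, q_mf = [0.8, 0.3], [0.2, 0.9]
print(choice_probs(hybrid_values(q_mb, q_mf)))
```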

