A State Representation for Reinforcement Learning and Decision-Making in the Orbitofrontal Cortex

2017 ◽  
Author(s):  
Nicolas W. Schuck ◽  
Robert Wilson ◽  
Yael Niv

Despite decades of research, the exact ways in which the orbitofrontal cortex (OFC) influences cognitive function have remained mysterious. Anatomically, the OFC is characterized by remarkably broad connectivity to sensory, limbic and subcortical areas, and functional studies have implicated the OFC in a plethora of functions ranging from facial processing to value-guided choice. Notwithstanding such diversity of findings, much research suggests that one important function of the OFC is to support decision making and reinforcement learning. Here, we describe a novel theory that posits that the OFC's specific role in decision-making is to provide an up-to-date representation of task-related information, called a state representation. This representation reflects a mapping between distinct task states and sensory as well as unobservable information. We summarize evidence supporting the existence of such state representations in rodent and human OFC and argue that forming these state representations provides a crucial scaffold that allows animals to efficiently perform decision making and reinforcement learning in high-dimensional and partially observable environments. Finally, we argue that our theory offers an integrating framework for linking the diversity of functions ascribed to the OFC and is in line with its wide-ranging connectivity.
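The abstract's central claim, that a state representation mapping observations together with unobservable task information is what makes reinforcement learning tractable in partially observable settings, can be illustrated with a toy sketch. The task below is invented for illustration: the same cue appears in two hidden phases, and the rewarded action differs by phase, so an agent whose states include the hidden phase can learn the task while one conditioning on the cue alone cannot.

```python
import random

def run(use_phase, episodes=2000, alpha=0.1, eps=0.1):
    """Tabular eps-greedy learning on a toy two-phase task.

    use_phase=True  -> states are (cue, hidden phase): solvable.
    use_phase=False -> states are the cue alone: rewards look random.
    """
    q = {}
    rng = random.Random(0)
    total = 0.0
    for ep in range(episodes):
        phase = ep % 2            # hidden task state, not observable as a cue
        cue = "light"             # identical observation in both phases
        state = (cue, phase) if use_phase else cue
        q.setdefault(state, [0.0, 0.0])
        a = rng.randrange(2) if rng.random() < eps else \
            max(range(2), key=lambda i: q[state][i])
        r = 1.0 if a == phase else 0.0   # the correct action tracks the phase
        q[state][a] += alpha * (r - q[state][a])
        total += r
    return total / episodes

print(run(use_phase=True))   # near-optimal average reward
print(run(use_phase=False))  # near chance level
```

With the hidden phase included in the state, the agent converges to near-optimal behaviour; without it, average reward stays near 0.5, mirroring the abstract's point that the scaffold of task states, not the raw observation, is what supports efficient learning.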

Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Daesub Yoon

Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for its lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner-controller-meta-controller architecture, whose components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, combining the long-term planning capability of symbolic knowledge with end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.
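The core structural idea, relating symbolic actions to options that a controller executes until termination, can be sketched minimally. Everything below (the grid task, the subtask names, the stubbed policies) is illustrative and not taken from the paper; real SDRL learns the option policies from sensory input.

```python
# A symbolic plan is a sequence of subtasks (options). The controller
# runs each option's low-level policy until its termination condition
# holds; a meta-controller would then score the options for the planner.

def make_option(goal):
    """An option on a 1-D corridor: move toward `goal`, stop on arrival."""
    def policy(pos):
        return 1 if pos < goal else -1
    def done(pos):
        return pos == goal
    return policy, done

symbolic_plan = ["reach_key", "reach_door"]       # the planner's output
options = {"reach_key": make_option(3), "reach_door": make_option(7)}

pos, trace = 0, []
for subtask in symbolic_plan:                     # controller loop
    policy, done = options[subtask]
    while not done(pos):
        pos += policy(pos)
    trace.append((subtask, pos))

print(trace)  # [('reach_key', 3), ('reach_door', 7)]
```

The separation is the point: the plan is legible at the symbolic level, while each option's internals can be an arbitrary learned policy.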


2021 ◽  
Vol 11 (21) ◽  
pp. 10337
Author(s):  
Junkai Ren ◽  
Yujun Zeng ◽  
Sihang Zhou ◽  
Yichuan Zhang

Scaling end-to-end learning to control robots with vision inputs is a challenging problem in the field of deep reinforcement learning (DRL). While achieving remarkable success in complex sequential tasks, vision-based DRL remains extremely data-inefficient, especially when dealing with high-dimensional pixel inputs. Many recent studies have tried to leverage state representation learning (SRL) to break through this barrier, and some even help the agent learn from pixels as efficiently as from states. Reproducing existing work, accurately judging the improvements offered by novel methods, and applying these approaches to new tasks are vital for sustaining this progress. However, meeting these three demands is seldom straightforward. Without meaningful criteria and tighter standardization of experimental reporting, it is difficult to determine whether improvements over previous methods are meaningful. For this reason, we conducted ablation studies on hyperparameters, embedding network architecture, embedding dimension, regularization methods, sample quality and SRL methods to systematically compare and analyze their effects on representation learning and reinforcement learning. Three evaluation metrics are summarized, and five baseline algorithms (covering both value-based and policy-based methods) and eight tasks are adopted to avoid the particularity of any single experimental setting. We highlight the variability in reported methods and, based on a wide range of experimental analyses, suggest guidelines to make future results in SRL more reproducible and stable. We aim to spur discussion about how to assure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.
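One of the reporting practices the abstract argues for, aggregating results over multiple seeds rather than quoting a single run, is easy to demonstrate. The "training run" below is a made-up noisy stand-in, not any of the paper's benchmarks; the point is only the mean-and-spread reporting shape.

```python
import random
import statistics

def train_run(seed):
    """Stand-in for one seeded training run returning a final score."""
    rng = random.Random(seed)
    return 100 + rng.gauss(0, 10)   # invented noisy return for illustration

# A single seed can mislead; report the mean and spread across seeds.
scores = [train_run(s) for s in range(10)]
mean = statistics.mean(scores)
std = statistics.stdev(scores)
print(f"return over 10 seeds: {mean:.1f} +/- {std:.1f}")
```

Reporting the spread alongside the mean is what lets a reader judge whether an improvement over a baseline exceeds run-to-run variability.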


2019 ◽  
Vol 65 ◽  
pp. 1-30 ◽  
Author(s):  
Vincent Francois-Lavet ◽  
Guillaume Rabusseau ◽  
Joelle Pineau ◽  
Damien Ernst ◽  
Raphael Fonteneau

This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally shows that, while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding $L_1$ error terms of the associated belief states. The theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting in the partially observable context.
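The state representation studied empirically here, a truncated history of the last h observations, is mechanically simple to construct. The sketch below shows only that construction (names are illustrative); the paper's substance is the bias-overfitting analysis of how the choice of h trades a larger state space against information loss.

```python
from collections import deque

def history_state(observations, h):
    """State after consuming an observation stream: the last h observations."""
    window = deque(maxlen=h)   # deque drops the oldest item automatically
    for obs in observations:
        window.append(obs)
    return tuple(window)

stream = ["o1", "o2", "o3", "o4"]
print(history_state(stream, 2))  # ('o3', 'o4')
print(history_state(stream, 3))  # ('o2', 'o3', 'o4')
```

A larger h makes the state a better proxy for the belief state (lower asymptotic bias) but multiplies the number of distinct states, which with limited data is exactly the overfitting risk the paper quantifies.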


2014 ◽  
Vol 369 (1655) ◽  
pp. 20130472 ◽  
Author(s):  
Jeffrey J. Stott ◽  
A. David Redish

Both orbitofrontal cortex (OFC) and ventral striatum (vStr) have been identified as key structures that represent information about value in decision-making tasks. However, the dynamics of how this information is processed are not yet understood. We recorded ensembles of cells from OFC and vStr in rats engaged in the spatial adjusting delay-discounting task, a decision-making task that involves a trade-off between delay to and magnitude of reward. Ventral striatal neural activity signalled information about reward before the rat's decision, whereas such reward-related signals were absent in OFC until after the animal had committed to its decision. These data support models in which vStr is directly involved in action selection, but OFC processes decision-related information afterwards that can be used to compare the predicted and actual consequences of behaviour.


Author(s):  
Stephen Kelly ◽  
Malcolm Heywood

We propose a Genetic Programming (GP) framework to address high-dimensional Multi-Task Reinforcement Learning (MTRL) through emergent modularity. A bottom-up process is assumed in which multiple programs self-organize into collective decision-making entities, or teams, which then further develop into multi-team policy graphs, or Tangled Program Graphs (TPG). The framework learns to play three Atari video games simultaneously, producing a single control policy that matches or exceeds leading results from (game-specific) deep reinforcement learning in each game. More importantly, unlike the representation assumed for deep learning, TPG policies start simple and adaptively complexify through interaction with the task environment, resulting in agents that are exceedingly simple, operating in real time without specialized hardware support such as GPUs.
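The team mechanism that the abstract describes can be sketched loosely: each program in a team bids on the current state, and the team acts on the suggestion of the highest bidder. The linear random bidders below are purely illustrative of that decision mechanism, not of the evolutionary process or the Atari setup.

```python
import random

random.seed(1)

class Program:
    """A program pairs an action suggestion with a state-dependent bid."""
    def __init__(self, action, n_features=4):
        self.action = action
        self.w = [random.uniform(-1, 1) for _ in range(n_features)]

    def bid(self, state):
        return sum(w * s for w, s in zip(self.w, state))

# A team: the collective decision is the highest bidder's action.
team = [Program(action=a) for a in ("left", "right", "fire")]

def team_act(state):
    return max(team, key=lambda p: p.bid(state)).action

print(team_act([0.2, 0.9, 0.1, 0.4]))
```

In TPG proper, a winning program's "action" may also point at another team, which is how flat teams complexify into the tangled policy graphs the abstract refers to.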


Author(s):  
Soraya Rahma Hayati ◽  
Mesran Mesran ◽  
Taronisokhi Zebua ◽  
Heri Nurdiyanto ◽  
Khasanah Khasanah

The recruitment of journalists at the Waspada Daily Medan always involves several rigorous selection stages before an applicant is accepted as a journalist at the Waspada Daily Medan. There are several criteria that each applicant must meet as a condition for becoming a journalist at the paper. To select the best applicants, the Waspada Daily Medan needed a decision support system. Decision support systems (DSS) are part of computer-based information systems (including knowledge-based systems) used to support decision making within an organization or company. Decision support systems address semi-structured decisions, where no one knows exactly how the decision should be made. In this study, the authors applied VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR) as the method in the decision support system application. The VIKOR method is part of the Multi-Attribute Decision Making (MADM) concept and requires normalization in its calculations. The results of this study are expected to yield the best possible decisions.
Keywords: Journalist Acceptance, Decision Support System, VIKOR
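The VIKOR steps the abstract refers to can be sketched for benefit criteria: find the best and worst value per criterion, compute the normalized group utility S and individual regret R for each alternative, and combine them into the compromise index Q (lower is better). The applicant scores and weights below are invented for illustration.

```python
def vikor(matrix, weights, v=0.5):
    """Rank alternatives (rows) on benefit criteria via the VIKOR index Q."""
    n_crit = len(weights)
    f_best = [max(row[j] for row in matrix) for j in range(n_crit)]
    f_worst = [min(row[j] for row in matrix) for j in range(n_crit)]
    S, R = [], []
    for row in matrix:
        # Weighted, normalized distance from the best value per criterion.
        terms = [weights[j] * (f_best[j] - row[j]) / (f_best[j] - f_worst[j])
                 for j in range(n_crit)]
        S.append(sum(terms))   # group utility
        R.append(max(terms))   # individual regret
    s_best, s_worst = min(S), max(S)
    r_best, r_worst = min(R), max(R)
    # v weighs group utility against individual regret (0.5 = consensus).
    Q = [v * (S[i] - s_best) / (s_worst - s_best)
         + (1 - v) * (R[i] - r_best) / (r_worst - r_best)
         for i in range(len(matrix))]
    return Q  # lower Q = better compromise

candidates = [[80, 70, 90],   # each row: one applicant's criterion scores
              [90, 60, 70],
              [70, 90, 80]]
weights = [0.5, 0.3, 0.2]
q = vikor(candidates, weights)
print(min(range(len(q)), key=q.__getitem__))  # index of the best applicant
```

This sketch omits the acceptable-advantage and acceptable-stability checks of the full method and assumes distinct best and worst values per criterion (otherwise the normalization would divide by zero).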


2017 ◽  
Vol 65 (4) ◽  

Within the clinical sports medicine setting, the discussion about doping is insufficient. In elite sports, the use of pharmaceutical agents is daily business in order to maintain the expected top-level performance. Unfortunately, a similar development can be observed in the general population of leisure athletes, where medical supervision is absent. As a sports physician standing in between, you face pressing ethical questions. We therefore propose the application of a standardised risk score as a tool to promote doping prevention and launch the debate within the athlete-physician relationship. In the long term, such risk stratification systems may support decision-making with regard to «protective» exclusion from sporting competition.

