A State Representation for Reinforcement Learning and Decision-Making in the Orbitofrontal Cortex

2017 ◽  
Author(s):  
Nicolas W. Schuck ◽  
Robert Wilson ◽  
Yael Niv

Despite decades of research, the exact ways in which the orbitofrontal cortex (OFC) influences cognitive function have remained mysterious. Anatomically, the OFC is characterized by remarkably broad connectivity to sensory, limbic and subcortical areas, and functional studies have implicated the OFC in a plethora of functions ranging from facial processing to value-guided choice. Notwithstanding such diversity of findings, much research suggests that one important function of the OFC is to support decision making and reinforcement learning. Here, we describe a novel theory that posits that the OFC's specific role in decision-making is to provide an up-to-date representation of task-related information, called a state representation. This representation reflects a mapping between distinct task states and sensory as well as unobservable information. We summarize evidence supporting the existence of such state representations in rodent and human OFC and argue that forming these state representations provides a crucial scaffold that allows animals to efficiently perform decision making and reinforcement learning in high-dimensional and partially observable environments. Finally, we argue that our theory offers an integrating framework for linking the diversity of functions ascribed to the OFC and is in line with its wide-ranging connectivity.
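The abstract's central claim, that a state representation mapping observations together with unobservable task information is what makes reinforcement learning tractable in partially observable settings, can be illustrated with a toy sketch. The task below is invented for illustration: the same cue appears in two hidden phases, and the rewarded action differs by phase, so an agent whose states include the hidden phase can learn the task while one conditioning on the cue alone cannot.

```python
import random

def run(use_phase, episodes=2000, alpha=0.1, eps=0.1):
    """Tabular eps-greedy learning on a toy two-phase task.

    use_phase=True  -> states are (cue, hidden phase): solvable.
    use_phase=False -> states are the cue alone: rewards look random.
    """
    q = {}
    rng = random.Random(0)
    total = 0.0
    for ep in range(episodes):
        phase = ep % 2            # hidden task state, not observable as a cue
        cue = "light"             # identical observation in both phases
        state = (cue, phase) if use_phase else cue
        q.setdefault(state, [0.0, 0.0])
        a = rng.randrange(2) if rng.random() < eps else \
            max(range(2), key=lambda i: q[state][i])
        r = 1.0 if a == phase else 0.0   # the correct action tracks the phase
        q[state][a] += alpha * (r - q[state][a])
        total += r
    return total / episodes

print(run(use_phase=True))   # near-optimal average reward
print(run(use_phase=False))  # near chance level
```

With the hidden phase included in the state, the agent converges to near-optimal behaviour; without it, average reward stays near 0.5, mirroring the abstract's point that the scaffold of task states, not the raw observation, is what supports efficient learning.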

Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Daesub Yoon

Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for its lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner-controller-meta-controller architecture, whose components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, combining the long-term planning capability of symbolic knowledge with end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.
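The core structural idea, relating symbolic actions to options that a controller executes until termination, can be sketched minimally. Everything below (the grid task, the subtask names, the stubbed policies) is illustrative and not taken from the paper; real SDRL learns the option policies from sensory input.

```python
# A symbolic plan is a sequence of subtasks (options). The controller
# runs each option's low-level policy until its termination condition
# holds; a meta-controller would then score the options for the planner.

def make_option(goal):
    """An option on a 1-D corridor: move toward `goal`, stop on arrival."""
    def policy(pos):
        return 1 if pos < goal else -1
    def done(pos):
        return pos == goal
    return policy, done

symbolic_plan = ["reach_key", "reach_door"]       # the planner's output
options = {"reach_key": make_option(3), "reach_door": make_option(7)}

pos, trace = 0, []
for subtask in symbolic_plan:                     # controller loop
    policy, done = options[subtask]
    while not done(pos):
        pos += policy(pos)
    trace.append((subtask, pos))

print(trace)  # [('reach_key', 3), ('reach_door', 7)]
```

The separation is the point: the plan is legible at the symbolic level, while each option's internals can be an arbitrary learned policy.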


2021 ◽  
Vol 11 (21) ◽  
pp. 10337
Author(s):  
Junkai Ren ◽  
Yujun Zeng ◽  
Sihang Zhou ◽  
Yichuan Zhang

Scaling end-to-end learning to control robots with vision inputs is a challenging problem in the field of deep reinforcement learning (DRL). While achieving remarkable success in complex sequential tasks, vision-based DRL remains extremely data-inefficient, especially when dealing with high-dimensional pixel inputs. Many recent studies have tried to leverage state representation learning (SRL) to break through this barrier, and some even help the agent learn from pixels as efficiently as from states. Reproducing existing work, accurately judging the improvements offered by novel methods, and applying these approaches to new tasks are vital for sustaining this progress. However, meeting these three demands is seldom straightforward. Without meaningful criteria and tighter standardization of experimental reporting, it is difficult to determine whether improvements over previous methods are meaningful. For this reason, we conducted ablation studies on hyperparameters, embedding network architecture, embedding dimension, regularization methods, sample quality and SRL methods to systematically compare and analyze their effects on representation learning and reinforcement learning. Three evaluation metrics are summarized, and five baseline algorithms (covering both value-based and policy-based methods) and eight tasks are adopted to avoid the particularity of any single experimental setting. We highlight the variability in reported methods and, based on a wide range of experimental analyses, suggest guidelines to make future results in SRL more reproducible and stable. We aim to spur discussion about how to assure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.
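One of the reporting practices the abstract argues for, aggregating results over multiple seeds rather than quoting a single run, is easy to demonstrate. The "training run" below is a made-up noisy stand-in, not any of the paper's benchmarks; the point is only the mean-and-spread reporting shape.

```python
import random
import statistics

def train_run(seed):
    """Stand-in for one seeded training run returning a final score."""
    rng = random.Random(seed)
    return 100 + rng.gauss(0, 10)   # invented noisy return for illustration

# A single seed can mislead; report the mean and spread across seeds.
scores = [train_run(s) for s in range(10)]
mean = statistics.mean(scores)
std = statistics.stdev(scores)
print(f"return over 10 seeds: {mean:.1f} +/- {std:.1f}")
```

Reporting the spread alongside the mean is what lets a reader judge whether an improvement over a baseline exceeds run-to-run variability.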


2019 ◽  
Vol 65 ◽  
pp. 1-30 ◽  
Author(s):  
Vincent Francois-Lavet ◽  
Guillaume Rabusseau ◽  
Joelle Pineau ◽  
Damien Ernst ◽  
Raphael Fonteneau

This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally shows that, while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding $L_1$ error terms of the associated belief states. The theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting in the partially observable context.
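The state representation studied empirically here, a truncated history of the last h observations, is mechanically simple to construct. The sketch below shows only that construction (names are illustrative); the paper's substance is the bias-overfitting analysis of how the choice of h trades a larger state space against information loss.

```python
from collections import deque

def history_state(observations, h):
    """State after consuming an observation stream: the last h observations."""
    window = deque(maxlen=h)   # deque drops the oldest item automatically
    for obs in observations:
        window.append(obs)
    return tuple(window)

stream = ["o1", "o2", "o3", "o4"]
print(history_state(stream, 2))  # ('o3', 'o4')
print(history_state(stream, 3))  # ('o2', 'o3', 'o4')
```

A larger h makes the state a better proxy for the belief state (lower asymptotic bias) but multiplies the number of distinct states, which with limited data is exactly the overfitting risk the paper quantifies.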


2014 ◽  
Vol 369 (1655) ◽  
pp. 20130472 ◽  
Author(s):  
Jeffrey J. Stott ◽  
A. David Redish

Both orbitofrontal cortex (OFC) and ventral striatum (vStr) have been identified as key structures that represent information about value in decision-making tasks. However, the dynamics of how this information is processed are not yet understood. We recorded ensembles of cells from OFC and vStr in rats engaged in the spatial adjusting delay-discounting task, a decision-making task that involves a trade-off between delay to and magnitude of reward. Ventral striatal neural activity signalled information about reward before the rat's decision, whereas such reward-related signals were absent in OFC until after the animal had committed to its decision. These data support models in which vStr is directly involved in action selection, but OFC processes decision-related information afterwards that can be used to compare the predicted and actual consequences of behaviour.


Author(s):  
Stephen Kelly ◽  
Malcolm Heywood

We propose a Genetic Programming (GP) framework to address high-dimensional Multi-Task Reinforcement Learning (MTRL) through emergent modularity. A bottom-up process is assumed in which multiple programs self-organize into collective decision-making entities, or teams, which then further develop into multi-team policy graphs, or Tangled Program Graphs (TPG). The framework learns to play three Atari video games simultaneously, producing a single control policy that matches or exceeds leading results from (game-specific) deep reinforcement learning in each game. More importantly, unlike the representation assumed for deep learning, TPG policies start simple and adaptively complexify through interaction with the task environment, resulting in agents that are exceedingly simple, operating in real time without specialized hardware support such as GPUs.
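The team mechanism that the abstract describes can be sketched loosely: each program in a team bids on the current state, and the team acts on the suggestion of the highest bidder. The linear random bidders below are purely illustrative of that decision mechanism, not of the evolutionary process or the Atari setup.

```python
import random

random.seed(1)

class Program:
    """A program pairs an action suggestion with a state-dependent bid."""
    def __init__(self, action, n_features=4):
        self.action = action
        self.w = [random.uniform(-1, 1) for _ in range(n_features)]

    def bid(self, state):
        return sum(w * s for w, s in zip(self.w, state))

# A team: the collective decision is the highest bidder's action.
team = [Program(action=a) for a in ("left", "right", "fire")]

def team_act(state):
    return max(team, key=lambda p: p.bid(state)).action

print(team_act([0.2, 0.9, 0.1, 0.4]))
```

In TPG proper, a winning program's "action" may also point at another team, which is how flat teams complexify into the tangled policy graphs the abstract refers to.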


Author(s):  
Soraya Rahma Hayati ◽  
Mesran Mesran ◽  
Taronisokhi Zebua ◽  
Heri Nurdiyanto ◽  
Khasanah Khasanah

The recruitment of journalists at the Waspada Daily Medan always involves several rigorous selection stages before an applicant is accepted as a journalist at the Waspada Daily Medan. There are several criteria that each applicant must meet as a condition for becoming a journalist at the paper. To select the best applicants, the Waspada Daily Medan needed a decision support system. Decision support systems (DSS) are part of computer-based information systems (including knowledge-based systems) used to support decision making within an organization or company. Decision support systems address semi-structured decisions, where no one knows exactly how the decision should be made. In this study, the authors applied VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR) as the method in the decision support system application. The VIKOR method is part of the Multi-Attribute Decision Making (MADM) concept and requires normalization in its calculations. The results of this study are expected to yield the best possible decisions.
Keywords: Journalist Acceptance, Decision Support System, VIKOR
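The VIKOR steps the abstract refers to can be sketched for benefit criteria: find the best and worst value per criterion, compute the normalized group utility S and individual regret R for each alternative, and combine them into the compromise index Q (lower is better). The applicant scores and weights below are invented for illustration.

```python
def vikor(matrix, weights, v=0.5):
    """Rank alternatives (rows) on benefit criteria via the VIKOR index Q."""
    n_crit = len(weights)
    f_best = [max(row[j] for row in matrix) for j in range(n_crit)]
    f_worst = [min(row[j] for row in matrix) for j in range(n_crit)]
    S, R = [], []
    for row in matrix:
        # Weighted, normalized distance from the best value per criterion.
        terms = [weights[j] * (f_best[j] - row[j]) / (f_best[j] - f_worst[j])
                 for j in range(n_crit)]
        S.append(sum(terms))   # group utility
        R.append(max(terms))   # individual regret
    s_best, s_worst = min(S), max(S)
    r_best, r_worst = min(R), max(R)
    # v weighs group utility against individual regret (0.5 = consensus).
    Q = [v * (S[i] - s_best) / (s_worst - s_best)
         + (1 - v) * (R[i] - r_best) / (r_worst - r_best)
         for i in range(len(matrix))]
    return Q  # lower Q = better compromise

candidates = [[80, 70, 90],   # each row: one applicant's criterion scores
              [90, 60, 70],
              [70, 90, 80]]
weights = [0.5, 0.3, 0.2]
q = vikor(candidates, weights)
print(min(range(len(q)), key=q.__getitem__))  # index of the best applicant
```

This sketch omits the acceptable-advantage and acceptable-stability checks of the full method and assumes distinct best and worst values per criterion (otherwise the normalization would divide by zero).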


2017 ◽  
Vol 65 (4) ◽  

Within the clinical sports medicine setting, the discussion about doping is insufficient. In elite sports, the use of pharmaceutical agents is daily business in order to maintain the expected top-level performance. Unfortunately, a similar development can be observed in the general population of leisure athletes, where medical supervision is absent. As a sports physician standing in between, you face pressing ethical questions. We therefore propose the application of a standardised risk score as a tool to promote doping prevention and launch the debate within the athlete-physician relationship. In the long term, such risk stratification systems may support decision-making with regard to «protective» exclusion from sporting competition.

