Can model-free reinforcement learning operate over information stored in working-memory?

2017 ◽  
Author(s):  
Carolina Feher da Silva ◽  
Yuan-Wei Yao ◽  
Todd A. Hare

Abstract Model-free learning creates stimulus-response associations. But what constitutes a stimulus? Are there limits to types of stimuli a model-free or habitual system can operate over? Most experiments on reward learning in humans and animals have used discrete sensory stimuli, but there is no algorithmic reason that model-free learning should be restricted to external stimuli, and recent theories have suggested that model-free processes may operate over highly abstract concepts and goals. Our study aimed to determine whether model-free learning processes can operate over environmental states defined by information held in working memory. Specifically, we tested whether or not humans can learn explicit temporal patterns of individually uninformative cues in a model-free manner. We compared the data from human participants in a reward learning paradigm using (1) a simultaneous symbol presentation condition or (2) a sequential symbol presentation condition, wherein the same visual stimuli were presented simultaneously or as a temporal sequence that required working memory. We found a significant effect of reward on human behavior in the sequential presentation condition, indicating that model-free learning can operate on information stored in working memory. Further analyses, however, revealed that the behavior of the participants contradicts the basic assumptions of our hypotheses, and it is possible that the observed effect of reward was generated by model-based rather than model-free learning. Thus it is not possible to draw any conclusions from our study regarding model-free learning of temporal sequences held in working memory. We conclude instead that careful thought should be given to how best to explain two-stage tasks to participants.
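The algorithmic point in the abstract above, that nothing restricts a model-free learner to external stimuli, can be illustrated with a minimal sketch. It is not the authors' task or model: the two-cue sequence, the reward rule (action 1 pays off only when the two cues differ), the function names, and all parameter values are assumptions chosen for illustration. The same tabular learner is run twice, once with the state defined as the full remembered sequence and once with only the last cue.

```python
import random
from collections import defaultdict


def greedy(q, state):
    """Return the action (0 or 1) with the highest learned value; ties go to 0."""
    return max((0, 1), key=lambda a: q[state][a])


def train(encode_state, n_trials=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular model-free learner; the state is whatever encode_state returns.

    Hypothetical task: two cues ("A" or "B") are shown in sequence, and
    action 1 is rewarded only when the cues differ, action 0 only when
    they match, so each cue on its own is uninformative.
    """
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # state -> values of actions 0 and 1
    for _ in range(n_trials):
        cues = (rng.choice("AB"), rng.choice("AB"))
        state = encode_state(cues)
        # Epsilon-greedy choice between the two actions.
        action = rng.randrange(2) if rng.random() < epsilon else greedy(q, state)
        correct = 1 if cues[0] != cues[1] else 0
        reward = 1.0 if action == correct else 0.0
        # Reward-prediction-error update of the cached stimulus-response value.
        q[state][action] += alpha * (reward - q[state][action])
    return q


# State = whole sequence held in memory vs. state = last cue only.
q_sequence = train(lambda cues: cues)
q_last_cue = train(lambda cues: cues[1])
```

Under these assumptions, the learner whose state is the full sequence can acquire a correct response for every pattern, whereas the learner that sees only the last cue must give the same response to patterns that require different ones. The update rule is identical in both runs; only the definition of the state changes.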

PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0244822
Author(s):  
Nareg Berberian ◽  
Matt Ross ◽  
Sylvain Chartier

Sensory stimuli endow animals with the ability to generate an internal representation. This representation can be maintained for a certain duration in the absence of previously elicited inputs. The reliance on an internal representation rather than purely on the basis of external stimuli is a hallmark feature of higher-order functions such as working memory. Patterns of neural activity produced in response to sensory inputs can continue long after the disappearance of previous inputs. Experimental and theoretical studies have largely invested in understanding how animals faithfully maintain sensory representations during ongoing reverberations of neural activity. However, these studies have focused on preassigned protocols of stimulus presentation, leaving out by default the possibility of exploring how the content of working memory interacts with ongoing input streams. Here, we study working memory using a network of spiking neurons with dynamic synapses subject to short-term and long-term synaptic plasticity. The formal model is embodied in a physical robot as a companion approach under which neuronal activity is directly linked to motor output. The artificial agent is used as a methodological tool for studying the formation of working memory capacity. To this end, we devise a keyboard listening framework to delineate the context under which working memory content is (1) refined, (2) overwritten or (3) resisted by ongoing new input streams. Ultimately, this study takes a neurorobotic perspective to resurface the long-standing implication of working memory in flexible cognition.


2017 ◽  
Vol 29 (12) ◽  
pp. 2011-2024 ◽  
Author(s):  
Anastasia Kiyonaga ◽  
Emma Wu Dowd ◽  
Tobias Egner

Recent theories assert that visual working memory (WM) relies on the same attentional resources and sensory substrates as visual attention to external stimuli. Behavioral studies have observed competitive tradeoffs between internal (i.e., WM) and external (i.e., visual) attentional demands, and neuroimaging studies have revealed representations of WM content as distributed patterns of activity within the same cortical regions engaged by perception of that content. Although a key function of WM is to protect memoranda from competing input, it remains unknown how neural representations of WM content are impacted by incoming sensory stimuli and concurrent attentional demands. Here, we investigated how neural evidence for WM information is affected when attention is occupied by visual search—at varying levels of difficulty—during the delay interval of a WM match-to-sample task. Behavioral and fMRI analyses suggested that WM maintenance was impacted by the difficulty of a concurrent visual task. Critically, multivariate classification analyses of category-specific ventral visual areas revealed a reduction in decodable WM-related information when attention was diverted to a visual search task, especially when the search was more difficult. This study suggests that the amount of available attention during WM maintenance influences the detection of sensory WM representations.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Julia Friedrich ◽  
Henriette Spaleck ◽  
Ronja Schappert ◽  
Maximilian Kleimaker ◽  
Julius Verrel ◽  
...  

Abstract It is a common phenomenon that somatosensory sensations can trigger actions to alleviate experienced tension. Such “urges” are particularly relevant in patients with Gilles de la Tourette syndrome (GTS) since they often precede tics, the cardinal feature of this common neurodevelopmental disorder. Altered sensorimotor integration processes in GTS as well as evidence for increased binding of stimulus- and response-related features (“hyper-binding”) in the visual domain suggest enhanced perception–action binding also in the somatosensory modality. In the current study, the Theory of Event Coding (TEC) was used as an overarching cognitive framework to examine somatosensory-motor binding. For this purpose, a somatosensory-motor version of a task measuring stimulus–response binding (S-R task) was tested using electro-tactile stimuli. Contrary to the main hypothesis, there were no group differences in binding effects between GTS patients and healthy controls in the somatosensory-motor paradigm. Behavioral data did not indicate differences in binding between the examined groups. These data can be interpreted such that a compensatory “downregulation” of increased somatosensory stimulus saliency, e.g., due to the occurrence of somatosensory urges and hypersensitivity to external stimuli, results in reduced binding with associated motor output, which brings binding to a “normal” level. Therefore, “hyper-binding” in GTS seems to be modality-specific.


2018 ◽  
Vol 29 (9) ◽  
pp. 3687-3701 ◽  
Author(s):  
Belinda P P Lay ◽  
Melissa Nicolosi ◽  
Alexandra A Usypchuk ◽  
Guillem R Esber ◽  
Mihaela D Iordanova

Abstract Behavioral change is paramount to adaptive behavior. Two ways to achieve alterations in previously established behavior are extinction and overexpectation. The infralimbic (IL) portion of the medial prefrontal cortex controls the inhibition of previously established aversive behavioral responses in extinction. The role of the IL cortex in behavioral modification in appetitive Pavlovian associations remains poorly understood. Here, we seek to determine if the IL cortex modulates overexpectation and extinction of reward learning. Using overexpectation or extinction to achieve a reduction in behavior, the present findings uncover a dissociable role for the IL cortex in these paradigms. Pharmacologically inactivating the IL cortex left overexpectation intact. In contrast, pre-training manipulations in the IL cortex prior to extinction facilitated the reduction in conditioned responding but led to disrupted extinction retrieval on a drug-free test. Additional studies confirmed that this effect is restricted to the IL and not dependent on the dorsally located prelimbic cortex. Together, these results show that the IL cortex underlies extinction but not overexpectation-driven reduction in behavior, which may be due to regulating the expression of conditioned responses influenced by stimulus–response associations rather than stimulus–stimulus associations.


2017 ◽  
Author(s):  
Ed David John Berry ◽  
Amanda Waterman ◽  
Alan D. Baddeley ◽  
Graham J. Hitch ◽  
Richard John Allen

Recent research has demonstrated that, when instructed to prioritize a serial position in visual working memory, adults are able to boost performance for this selected item, at a cost to non-prioritized items (e.g. Hu et al., 2014). While executive control appears to play an important role in this ability, the increased likelihood of recalling the most recently presented item (i.e. the recency effect) is relatively automatic, possibly driven by perceptual mechanisms. In three experiments, the ability of 7- to 10-year-olds to prioritize items in working memory was investigated using a sequential visual task (total N = 208). The relationship between individual differences in working memory and performance on the experimental task was also explored. Participants were unable to prioritize the first (Experiments 1 & 2) or final (Experiment 3) item in a 3-item sequence, while large recency effects for the final item were consistently observed across all experiments. The absence of a priority boost across three experiments indicates that children may not have the necessary executive resources to prioritize an item within a visual sequence when directed to do so. In contrast, the consistent recency boosts for the final item indicate that children show automatic memory benefits for the most recently encountered stimulus. Finally, for the baseline condition in which children were instructed to remember all three items equally, additional working memory measures predicted performance at the first and second but not the third serial position, further supporting the proposed automaticity of the recency effect in visual working memory.


2019 ◽  
Author(s):  
David Luque ◽  
Sara Molinero ◽  
Poppy Watson ◽  
Francisco J. López ◽  
Mike Le Pelley

Reward-learning theory views habits as stimulus–response links formed through extended reward training. Accordingly, animal research has shown that actions that are initially goal-directed can become habitual after operant overtraining. However, a similar demonstration is absent in human research, which poses a serious problem for translational models of behavior. We propose that response-time (RT) switch cost after operant training can be used as a new, reliable marker for the operation of the habit system in humans. Using a new method, we show that RT switch cost demonstrates the properties that would be expected of a habitual behavior: (1) it increases with overtraining; (2) it increases when rewards are larger, and (3) it increases when time pressure is added to the task, thereby hindering the competing goal-directed system. These results offer a promising new pathway for studying the operation of the habit system in humans.


2021 ◽  
pp. 1-14
Author(s):  
Khoi D. Vo ◽  
Audrey Siqi-Liu ◽  
Alondra Chaire ◽  
Sophia Li ◽  
Elise Demeter ◽  
...  

Abstract Attention and working memory (WM) have classically been considered as two separate cognitive functions, but more recent theories have conceptualized them as operating on shared representations and being distinguished primarily by whether attention is directed internally (WM) or externally (attention, traditionally defined). Supporting this idea, a recent behavioral study documented a “WM Stroop effect,” showing that maintaining a color word in WM impacts perceptual color-naming performance to the same degree as presenting the color word externally in the classic Stroop task. Here, we employed ERPs to examine the neural processes underlying this WM Stroop task compared to those in the classic Stroop and in a WM-control task. Based on the assumption that holding a color word in WM would (pre-)activate the same color representation as by externally presenting that color word, we hypothesized that the neural cascade of conflict–control processes would occur more rapidly in the WM Stroop than in the classic Stroop task. Our behavioral results replicated equivalent interference behavioral effects for the WM and classic Stroop tasks. Importantly, however, the ERP signatures of conflict detection and resolution displayed substantially shorter latencies in the WM Stroop task. Moreover, delay-period conflict in the WM Stroop task, but not in the WM control task, impacted the ERP and performance measures for the WM probe stimuli. Together, these findings provide new insights into how the brain processes conflict between internal representations and external stimuli, and they support the view of shared representations between internally held WM content and attentional processing of external stimuli.


Author(s):  
Juergen Perl

Particularly in technical contexts, information systems and analysis techniques are a great help in gathering data and making information available. For dynamic behavioral systems such as athletes or teams in sports, however, the situation is difficult: data from training and competition do not give much information about current and future performance without an appropriate model of interaction and adaptation. Physiological adaptation is one major aspect of target-oriented behavior, in physical training as well as in mental learning. In a simplified way, it can be described by a stimulus-response model, in which external stimuli change the situation or status of an organism and so cause activities in order to adapt. This aspect can appear on quite different scales, from individual biochemical adaptation that needs only milliseconds up to selection of the fittest of a species, which can last millions of years. Well-known examples can be taken from learning processes or other mental work as well as from sport and exercise. Most of those examples are characterized by a phenomenon that we call antagonism: the input stimulus causes two contradicting responses, which control each other and, by balancing out, finally make it possible to reach a given target. For example, the movement of a limb is controlled by antagonistic groups of muscles, and the result of a game is controlled by the efforts of competing teams. In order to understand and eventually improve such adaptation, models are necessary that make the processes transparent and help in simulating dynamics such as the increase in heart rate as a reaction to speeding up while jogging. With such models it becomes possible not only to analyze past processes but also to predict and schedule intended future ones.
In the Background section, main aspects of modeling antagonistic adaptation systems are briefly discussed, which is followed by a more detailed description of the developed PerPot-model and a number of examples of application in the Main Focus section.


2020 ◽  
Vol 117 (39) ◽  
pp. 24590-24598
Author(s):  
Freek van Ede ◽  
Alexander G. Board ◽  
Anna C. Nobre

Adaptive behavior relies on the selection of relevant sensory information from both the external environment and internal memory representations. In understanding external selection, a classic distinction is made between voluntary (goal-directed) and involuntary (stimulus-driven) guidance of attention. We have developed a task—the anti-retrocue task—to separate and examine voluntary and involuntary guidance of attention to internal representations in visual working memory. We show that both voluntary and involuntary factors influence memory performance but do so in distinct ways. Moreover, by tracking gaze biases linked to attentional focusing in memory, we provide direct evidence for an involuntary “retro-capture” effect whereby external stimuli involuntarily trigger the selection of feature-matching internal representations. We show that stimulus-driven and goal-directed influences compete for selection in memory, and that the balance of this competition—as reflected in oculomotor signatures of internal attention—predicts the quality of ensuing memory-guided behavior. Thus, goal-directed and stimulus-driven factors together determine the fate not only of perception, but also of internal representations in working memory.

