Robust Pavlovian-to-Instrumental and Pavlovian-to-Metacognitive Transfers in human reinforcement learning

2019 ◽  
Author(s):  
Chih-Chung Ting ◽  
Stefano Palminteri ◽  
Jan B. Engelmann ◽  
Maël Lebreton

Abstract In simple instrumental-learning tasks, humans learn to seek gains and to avoid losses equally well. Yet, two effects of valence are observed. First, decisions in loss-contexts are slower, which is consistent with the Pavlovian-instrumental transfer (PIT) hypothesis. Second, loss contexts decrease individuals’ confidence in their choices – a bias akin to a Pavlovian-to-metacognitive transfer (PMT). Whether these two effects are two manifestations of a single mechanism or whether they can be partially dissociated is unknown. Here, across six experiments, we attempted to disrupt the PIT effects by manipulating the mapping between decisions and actions and imposing constraints on response times (RTs). Our goal was to assess the presence of the metacognitive bias in the absence of the RT bias. We observed both PIT and PMT despite our disruption attempts, establishing that the effects of valence on motor and metacognitive responses are very robust and replicable. Nonetheless, within- and between-individual inferences reveal that the confidence bias resists the disruption of the RT bias. Therefore, although concomitant in most cases, PMT and PIT seem to be – partly – dissociable. These results highlight new important mechanistic constraints that should be incorporated in learning models to jointly explain choice, reaction times and confidence.

2020 ◽  
Vol 20 (6) ◽  
pp. 1184-1199 ◽  
Author(s):  
Chih-Chung Ting ◽  
Stefano Palminteri ◽  
Jan B. Engelmann ◽  
Maël Lebreton

Abstract In simple instrumental-learning tasks, humans learn to seek gains and to avoid losses equally well. Yet, two effects of valence are observed. First, decisions in loss-contexts are slower. Second, loss contexts decrease individuals’ confidence in their choices. Whether these two effects are two manifestations of a single mechanism or whether they can be partially dissociated is unknown. Across six experiments, we attempted to disrupt the valence-induced motor bias effects by manipulating the mapping between decisions and actions and imposing constraints on response times (RTs). Our goal was to assess the presence of the valence-induced confidence bias in the absence of the RT bias. We observed both motor and confidence biases despite our disruption attempts, establishing that the effects of valence on motor and metacognitive responses are very robust and replicable. Nonetheless, within- and between-individual inferences reveal that the confidence bias resists the disruption of the RT bias. Therefore, although concomitant in most cases, valence-induced motor and confidence biases seem to be partly dissociable. These results highlight new important mechanistic constraints that should be incorporated in learning models to jointly explain choice, reaction times and confidence.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Steven Miletić ◽  
Russell J Boag ◽  
Anne C Trutti ◽  
Niek Stevenson ◽  
Birte U Forstmann ◽  
...  

Learning and decision-making are interactive processes, yet cognitive modeling of error-driven learning and decision-making have largely evolved separately. Recently, evidence accumulation models (EAMs) of decision-making and reinforcement learning (RL) models of error-driven learning have been combined into joint RL-EAMs that can in principle address these interactions. However, we show that the most commonly used combination, based on the diffusion decision model (DDM) for binary choice, consistently fails to capture crucial aspects of response times observed during reinforcement learning. We propose a new RL-EAM based on an advantage racing diffusion (ARD) framework for choices among two or more options that not only addresses this problem but captures stimulus difficulty, speed-accuracy trade-off, and stimulus-response-mapping reversal effects. The RL-ARD avoids fundamental limitations imposed by the DDM on addressing effects of absolute values of choices, as well as extensions beyond binary choice, and provides a computationally tractable basis for wider applications.


2020 ◽  
Author(s):  
Steven Miletić ◽  
Russell J. Boag ◽  
Anne C. Trutti ◽  
Birte U. Forstmann ◽  
Andrew Heathcote

Abstract Learning and decision making are interactive processes, yet cognitive modelling of error-driven learning and decision making have largely evolved separately. Recently, evidence accumulation models (EAMs) of decision making and reinforcement learning (RL) models of error-driven learning have been combined into joint RL-EAMs that can in principle address these interactions. However, we show that the most commonly used combination, based on the diffusion decision model (DDM) for binary choice, consistently fails to capture crucial aspects of response times observed during reinforcement learning. We propose a new RL-EAM based on an advantage racing diffusion (ARD) framework for choices among two or more options that not only addresses this problem but captures stimulus difficulty, speed-accuracy trade-off, and stimulus-response-mapping reversal effects. The RL-ARD avoids fundamental limitations imposed by the DDM on addressing effects of absolute values of choices, as well as extensions beyond binary choice, and provides a computationally tractable basis for wider applications.


2018 ◽  
Author(s):  
Ian C. Ballard ◽  
Samuel M. McClure

Abstract
Background: Reinforcement learning models provide excellent descriptions of learning in multiple species across a variety of tasks. Many researchers are interested in relating parameters of reinforcement learning models to neural measures, psychological variables or experimental manipulations. We demonstrate that parameter identification is difficult because a range of parameter values provide approximately equal quality fits to data. This identification problem has a large impact on power: we show that a researcher who wants to detect a medium sized correlation (r = .3) with 80% power between a variable and learning rate must collect 60% more subjects than specified by a typical power analysis in order to account for the noise introduced by model fitting.
New Method: We derive a Bayesian optimal model fitting technique that takes advantage of information contained in choices and reaction times to constrain parameter estimates.
Results: We show using simulation and empirical data that this method substantially improves the ability to recover learning rates.
Comparison with Existing Methods: We compare this method against the use of Bayesian priors. We show in simulations that the combined use of Bayesian priors and reaction times confers the highest parameter identifiability. However, in real data where the priors may have been misspecified, the use of Bayesian priors interferes with the ability of reaction time data to improve parameter identifiability.
Conclusions: We present a simple technique that takes advantage of readily available data to substantially improve the quality of inferences that can be drawn from parameters of reinforcement learning models.
Highlights:
– Parameters of reinforcement learning models are particularly difficult to estimate
– Incorporating reaction times into model fitting improves parameter identifiability
– Bayesian weighting of choice and reaction times improves the power of analyses assessing learning rate


1964 ◽  
Vol 16 (3) ◽  
pp. 216-223 ◽  
Author(s):  
G. H. Mowbray

Previous findings suggested that selective response times might be affected both by the inter-stimulus interval and by the probability of occurrence of the stimulus for reaction. These two factors have been tested independently and have been found to influence reaction times in a fashion that an expectancy hypothesis would predict.


2003 ◽  
Vol 17 (3) ◽  
pp. 113-123 ◽  
Author(s):  
Jukka M. Leppänen ◽  
Mirja Tenhunen ◽  
Jari K. Hietanen

Abstract Several studies have shown faster choice-reaction times to positive than to negative facial expressions. The present study examined whether this effect is exclusively due to faster cognitive processing of positive stimuli (i.e., processes leading up to, and including, response selection), or whether it also involves faster motor execution of the selected response. In two experiments, response selection (onset of the lateralized readiness potential, LRP) and response execution (LRP onset-response onset) times for positive (happy) and negative (disgusted/angry) faces were examined. Shorter response selection times for positive than for negative faces were found in both experiments but there was no difference in response execution times. Together, these results suggest that the happy-face advantage occurs primarily at premotoric processing stages. Implications that the happy-face advantage may reflect an interaction between emotional and cognitive factors are discussed.


Decision ◽  
2016 ◽  
Vol 3 (2) ◽  
pp. 115-131 ◽  
Author(s):  
Helen Steingroever ◽  
Ruud Wetzels ◽  
Eric-Jan Wagenmakers
