reward schedule
Recently Published Documents

TOTAL DOCUMENTS: 41 (FIVE YEARS: 0)

H-INDEX: 13 (FIVE YEARS: 0)

2020
Author(s): Kyra Swanson, Bruno B. Averbeck, Mark Laubach

Abstract: Recent studies have established that one-trial-back decision policies (Win-Stay/Lose-Shift) and measures of reinforcement learning (RL), e.g., learning rate, can explain how animals perform two-armed bandit tasks. In many published studies, outcomes reverse after one option is selected repeatedly (e.g., 8 selections in a row), and the primary measure of performance is the number of reversals completed. Using recent performance to trigger reversals confounds performance with Win-Stay likelihood. An alternative design reverses outcomes across options over fixed blocks of trials. We used this blocked design and tested rats in a spatial two-armed bandit task, analyzing performance with Win-Stay/Lose-Shift (WSLS) metrics and an RL algorithm. We found that WSLS policies remained stable with increasing reward uncertainty, while choice accuracy decreased. Within test sessions, learning rates increased as rats adapted their strategies over the first few reversals, but inverse temperature remained stable. Muscimol inactivation showed that the medial orbital cortex (mOFC) mediates task performance and sensitivity to negative feedback. Finally, we examined the role of the adrenergic system in bandit performance and found that yohimbine (2 mg/kg) dramatically decreased sensitivity to positive feedback, leading to decreases in accuracy and inverse temperature. These effects were partially dependent on α2 adrenergic receptors in OFC. Our findings demonstrate a correspondence among reward schedule, WSLS policies, and RL metrics in a task design free of the confound between Wins and reversals, and show that the noradrenergic influence of mOFC on WSLS policy is dissociable from the region's general role in cognitive flexibility.
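The relationship the abstract describes between WSLS policies and RL parameters (learning rate, inverse temperature) can be illustrated with a minimal simulation. The sketch below is not the authors' analysis code; it assumes a standard delta-rule (Q-learning) agent with a softmax choice rule on a static two-armed bandit, and the reward probabilities and parameter values are illustrative. It simply reports choice accuracy alongside Win-Stay and Lose-Shift frequencies.

```python
import math
import random

def softmax_choice(q, beta, rng):
    # Probability of picking arm 1 under a softmax with inverse temperature beta.
    p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
    return 1 if rng.random() < p1 else 0

def run_bandit(p_reward=(0.8, 0.2), alpha=0.3, beta=3.0, n_trials=2000, seed=0):
    """Simulate a delta-rule agent on a two-armed bandit; return choice
    accuracy plus Win-Stay and Lose-Shift frequencies."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    prev_choice = prev_reward = None
    win_stay = wins = lose_shift = losses = correct = 0
    for _ in range(n_trials):
        choice = softmax_choice(q, beta, rng)
        reward = 1 if rng.random() < p_reward[choice] else 0
        q[choice] += alpha * (reward - q[choice])  # delta-rule value update
        if prev_choice is not None:
            if prev_reward:                        # last trial was a Win
                wins += 1
                win_stay += (choice == prev_choice)
            else:                                  # last trial was a Loss
                losses += 1
                lose_shift += (choice != prev_choice)
        correct += (choice == 0)                   # arm 0 is the richer option
        prev_choice, prev_reward = choice, reward
    return correct / n_trials, win_stay / wins, lose_shift / losses
```

With a moderate inverse temperature the agent mostly stays after wins, so accuracy and Win-Stay likelihood move together; this is exactly why performance-triggered reversals confound the two measures, and why a fixed-block design decouples them.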


Author(s): Fahad Saleh

Abstract: Permissionless blockchains require a protocol to generate consensus. Many prominent permissionless blockchains employ Proof-of-Work (PoW) for that purpose, but PoW possesses significant shortcomings, and various alternatives have been proposed. This paper provides the first formal economic model of the most famous alternative, Proof-of-Stake (PoS), and establishes conditions under which PoS generates consensus. A sufficiently modest reward schedule not only implies the existence of an equilibrium in which consensus obtains as soon as possible but also precludes a persistent forking equilibrium. The latter result arises because PoS, unlike PoW, requires that validators also be stakeholders.


2017
Author(s): Kiyohito Iigaya, Yashar Ahmadian, Leo P. Sugrue, Greg S. Corrado, Yonatan Loewenstein, ...

Abstract: Behavior that deviates from our normative expectations often appears irrational. A classic example concerns how choice should be distributed among multiple alternatives. The so-called matching law predicts that the fraction of choices made to any option should match the fraction of total rewards earned from that option. This choice strategy can maximize reward under a stationary reward schedule. Empirically, however, behavior often deviates from this ideal. While such deviations have often been interpreted as reflecting 'noisy', suboptimal decision-making, here we instead suggest that they reflect a strategy that is adaptive in nonstationary and uncertain environments. We analyze the results of a dynamic foraging task. Animals exhibited significant deviations from matching, and animals collected more rewards when their deviations were larger. We show that this behavior can be understood if one considers that animals had incomplete information about the environment's dynamics. In particular, using computational models, we show that in such nonstationary environments, learning on both fast and slow timescales is beneficial. Learning on fast timescales lets an animal react to sudden changes in the environment, though this inevitably introduces large fluctuations (variance) in value estimates. Concurrently, learning on slow timescales reduces the amplitude of these fluctuations at the price of introducing a bias that causes systematic deviations. We confirm this prediction in the data: monkeys indeed solved the bias-variance tradeoff by combining learning on both fast and slow timescales. Our work suggests that multi-timescale learning could be a biologically plausible mechanism for optimizing decisions under uncertainty.
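The bias-variance tradeoff described above can be demonstrated with a toy simulation (not the paper's model): delta-rule learners with different learning rates track a Bernoulli reward probability, and their mean squared tracking error is compared in stationary versus block-switching environments. The learning rates, block length, and reward probabilities below are illustrative assumptions.

```python
import random

def track_value(alphas, p_of_t, n_trials=4000, seed=1):
    """Run one delta-rule learner per learning rate in `alphas` against a
    (possibly nonstationary) Bernoulli reward schedule; return each
    learner's mean squared error relative to the true reward probability."""
    rng = random.Random(seed)
    value = {a: 0.5 for a in alphas}
    sq_err = {a: 0.0 for a in alphas}
    for t in range(n_trials):
        p = p_of_t(t)
        r = 1.0 if rng.random() < p else 0.0
        for a in alphas:
            value[a] += a * (r - value[a])    # fast alpha reacts quickly but is noisy
            sq_err[a] += (value[a] - p) ** 2  # tracking error vs. true probability
    return {a: sq_err[a] / n_trials for a in alphas}

# Nonstationary schedule: reward probability flips between 0.8 and 0.2 every 200 trials.
switching = lambda t: 0.8 if (t // 200) % 2 == 0 else 0.2
stationary = lambda t: 0.8

mse_switch = track_value([0.02, 0.3], switching)
mse_stable = track_value([0.02, 0.3], stationary)
```

In the switching environment the fast learner's low bias outweighs its high variance, so it tracks better; in the stationary environment the slow learner wins. An agent that combines both estimates, as the abstract argues monkeys do, can perform well in either regime.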


Author(s): Mauricio R. Papini

This review focuses on reward-schedule effects, a family of learning phenomena involving surprising devaluations in reward quality or quantity (as in incentive contrast) and reward omissions (as in appetitive extinction), as studied in three taxonomic groups of vertebrates: mammals, birds, and amphibians. The largest body of dependable data comes from research with mammals in general, and with rats in particular. These experiments show a variety of behavioral adjustments to situations involving reward downshifts. For example, rats show disruption of instrumental and consummatory behavior directed at a small reward after receiving a substantially larger reward (called successive negative contrast, SNC)—a reward-schedule effect. However, instrumental SNC does not seem to occur when animals work for sucrose solutions—a reversed reward-schedule effect. Similar modes of adjustment have been reported in analogous experiments with avian and amphibian species. A review of the evidence suggests that carry-over signals across successive trials can acquire control over behavior under massed practice, but emotional memory is required to account for reward-schedule effects observed under widely spaced practice. There is evidence for an emotional component to reward-schedule effects in mammals, but similar evidence for other vertebrates is scanty and inconsistent. Progress in the comparative analysis of reward-schedule effects will require the intensive study of a set of selected species in selected reward-downshift situations, with the aim of identifying underlying neural mechanisms.


2012
Vol 107 (11), pp. 2996-3007
Author(s): Takashi Mizuhiki, Barry J. Richmond, Munetaka Shidara

The insula, a cortical brain region known to encode information about autonomic, visceral, and olfactory functions, has recently been shown to encode information during reward-seeking tasks in both single-neuron recording and functional magnetic resonance imaging studies. To examine this reward-related activation, we recorded from 170 single neurons in the anterior insula of 2 monkeys during a multitrial reward schedule task, in which the monkeys had to complete a schedule of 1, 2, 3, or 4 trials to earn a reward. In one block of trials, a visual cue indicated whether or not a reward would be delivered in the current trial after the monkey successfully detected a red spot turning green; in other blocks, the visual cue was random with respect to reward delivery. Over one-quarter of the 131 responsive neurons were activated when the current trial, whether reward was certain or uncertain, would be rewarded if performed correctly. These same neurons failed to respond in trials that were certain, as indicated by the cue, to be unrewarded. Another group of neurons responded when the reward was delivered, similar to results reported previously. The dynamics of population activity in the anterior insula also showed strong signals related to knowing when a reward is coming. The most parsimonious explanation is that this activity codes for a type of expected outcome, where the expectation encompasses both certain and uncertain rewards.


2010
Vol 68, pp. e287
Author(s): Tsuyoshi Setogawa, Takashi Mizuhiki, Kiyonori Inaba, Munetaka Shidara

2009
Vol 65, pp. S191
Author(s): Takashi Mizuhiki, Kiyonori Inaba, Kanako Yaguchi, Munetaka Shidara

2009
Vol 65, pp. S190
Author(s): Kiyonori Inaba, Takashi Mizuhiki, Koji Toda, Shigeru Ozaki, Kanako Yaguchi, ...

2008
Vol 20 (4), pp. 563-579
Author(s): Satoe Ichihara-Takeda, Shintaro Funahashi

Recent studies show that task-related activity in the dorsolateral prefrontal cortex (DLPFC) is modulated by the quality and quantity of the reward, suggesting that the subject's motivational state affects cognitive operations in the DLPFC. The orbitofrontal cortex (OFC) is a possible source of motivational inputs to the DLPFC. However, it is not well known whether these two areas exhibit similar motivational effects on task-related activity. We compared motivational effects on task-related activity in these areas while a monkey performed an oculomotor delayed-response (ODR) task under two reward schedules. In the ODR-1 schedule, reward was given only after the successful completion of four consecutive trials, whereas in the ODR-2 schedule, reward was given after every correct trial. Task-related activities in both areas showed spatial selectivity, and the spatial characteristics of task-related activity remained constant across both schedules. Task-related activity in both areas, especially delay-period activity, was also affected by the reward schedule: in the ODR-1 schedule, the magnitude of the activity gradually increased with the proximity of the rewarded trial. A larger share of task-related OFC activity was affected by the reward schedule, whereas task-related DLPFC activity was more often affected by both spatial factors and the reward schedule. These results indicate that the OFC plays a role in monitoring the proximity of the rewarded trial and detecting reward delivery, whereas the DLPFC plays a role in performing cognitive operations and integrating cognitive and motivational information. They also indicate that spatial information and the animal's motivational state independently affect neuronal activity in both areas.

