reward schedule
Recently Published Documents

TOTAL DOCUMENTS: 41 (FIVE YEARS: 0)

H-INDEX: 13 (FIVE YEARS: 0)

2020
Author(s): Kyra Swanson, Bruno B. Averbeck, Mark Laubach

Abstract: Recent studies have established that one-trial-back decision policies (Win-Stay/Lose-Shift) and measures of reinforcement learning (RL), e.g., learning rate, can explain how animals perform two-armed bandit tasks. In many published studies, outcomes reverse after one option is selected repeatedly (e.g., 8 selections in a row), and the primary measure of performance is the number of reversals completed. Using recent performance to trigger reversals confounds performance with Win-Stay likelihood. An alternative design reverses outcomes across options over fixed blocks of trials. We used this blocked design and tested rats in a spatial two-armed bandit task, analyzing performance with Win-Stay/Lose-Shift (WSLS) metrics and an RL algorithm. We found that WSLS policies remained stable with increasing reward uncertainty, while choice accuracy decreased. Within test sessions, learning rates increased as rats adapted their strategies over the first few reversals, but inverse temperature remained stable. Muscimol inactivation showed that the medial orbital cortex (mOFC) mediates task performance and sensitivity to negative feedback. Finally, we examined the role of the adrenergic system in bandit performance and found that yohimbine (2 mg/kg) dramatically decreased sensitivity to positive feedback, leading to decreases in accuracy and inverse temperature. These effects were partially dependent on α2 adrenergic receptors in OFC. Our findings demonstrate a correspondence among reward schedule, WSLS policies, and RL metrics in a task design free of the confound between Wins and reversals, and show that the noradrenergic influence of mOFC on WSLS policy is dissociable from the region's general role in cognitive flexibility.
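The relationship the abstract describes between WSLS policies and RL parameters (learning rate, inverse temperature) can be illustrated with a minimal simulation. The sketch below is not the authors' analysis code; it assumes a standard delta-rule (Q-learning) agent with a softmax choice rule on a static two-armed bandit, and the reward probabilities and parameter values are illustrative. It simply reports choice accuracy alongside Win-Stay and Lose-Shift frequencies.

```python
import math
import random

def softmax_choice(q, beta, rng):
    # Probability of picking arm 1 under a softmax with inverse temperature beta.
    p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
    return 1 if rng.random() < p1 else 0

def run_bandit(p_reward=(0.8, 0.2), alpha=0.3, beta=3.0, n_trials=2000, seed=0):
    """Simulate a delta-rule agent on a two-armed bandit; return choice
    accuracy plus Win-Stay and Lose-Shift frequencies."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    prev_choice = prev_reward = None
    win_stay = wins = lose_shift = losses = correct = 0
    for _ in range(n_trials):
        choice = softmax_choice(q, beta, rng)
        reward = 1 if rng.random() < p_reward[choice] else 0
        q[choice] += alpha * (reward - q[choice])  # delta-rule value update
        if prev_choice is not None:
            if prev_reward:                        # last trial was a Win
                wins += 1
                win_stay += (choice == prev_choice)
            else:                                  # last trial was a Loss
                losses += 1
                lose_shift += (choice != prev_choice)
        correct += (choice == 0)                   # arm 0 is the richer option
        prev_choice, prev_reward = choice, reward
    return correct / n_trials, win_stay / wins, lose_shift / losses
```

With a moderate inverse temperature the agent mostly stays after wins, so accuracy and Win-Stay likelihood move together; this is exactly why performance-triggered reversals confound the two measures, and why a fixed-block design decouples them.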


Author(s): Fahad Saleh

Abstract: Permissionless blockchains require a protocol to generate consensus. Many prominent permissionless blockchains employ Proof-of-Work (PoW) for that purpose, but PoW possesses significant shortcomings, and various alternatives have been proposed. This paper provides the first formal economic model of the most famous alternative, Proof-of-Stake (PoS), and establishes conditions under which PoS generates consensus. A sufficiently modest reward schedule not only implies the existence of an equilibrium in which consensus obtains as soon as possible but also precludes a persistent forking equilibrium. The latter result arises because PoS, unlike PoW, requires that validators also be stakeholders.


2017
Author(s): Kiyohito Iigaya, Yashar Ahmadian, Leo P. Sugrue, Greg S. Corrado, Yonatan Loewenstein, ...

Abstract: Behavior that deviates from our normative expectations often appears irrational. A classic example concerns how choice should be distributed among multiple alternatives. The so-called matching law predicts that the fraction of choices made to any option should match the fraction of total rewards earned from that option. This choice strategy can maximize reward under a stationary reward schedule. Empirically, however, behavior often deviates from this ideal. While such deviations have often been interpreted as reflecting 'noisy', suboptimal decision-making, here we instead suggest that they reflect a strategy that is adaptive in nonstationary and uncertain environments. We analyze the results of a dynamic foraging task. Animals exhibited significant deviations from matching, and animals collected more rewards when their deviations were larger. We show that this behavior can be understood if one considers that animals had incomplete information about the environment's dynamics. In particular, using computational models, we show that in such nonstationary environments, learning on both fast and slow timescales is beneficial. Learning on fast timescales lets an animal react to sudden changes in the environment, though this inevitably introduces large fluctuations (variance) in value estimates. Concurrently, learning on slow timescales reduces the amplitude of these fluctuations at the price of introducing a bias that causes systematic deviations. We confirm this prediction in the data: monkeys indeed solved the bias-variance tradeoff by combining learning on both fast and slow timescales. Our work suggests that multi-timescale learning could be a biologically plausible mechanism for optimizing decisions under uncertainty.
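The bias-variance tradeoff described above can be demonstrated with a toy simulation (not the paper's model): delta-rule learners with different learning rates track a Bernoulli reward probability, and their mean squared tracking error is compared in stationary versus block-switching environments. The learning rates, block length, and reward probabilities below are illustrative assumptions.

```python
import random

def track_value(alphas, p_of_t, n_trials=4000, seed=1):
    """Run one delta-rule learner per learning rate in `alphas` against a
    (possibly nonstationary) Bernoulli reward schedule; return each
    learner's mean squared error relative to the true reward probability."""
    rng = random.Random(seed)
    value = {a: 0.5 for a in alphas}
    sq_err = {a: 0.0 for a in alphas}
    for t in range(n_trials):
        p = p_of_t(t)
        r = 1.0 if rng.random() < p else 0.0
        for a in alphas:
            value[a] += a * (r - value[a])    # fast alpha reacts quickly but is noisy
            sq_err[a] += (value[a] - p) ** 2  # tracking error vs. true probability
    return {a: sq_err[a] / n_trials for a in alphas}

# Nonstationary schedule: reward probability flips between 0.8 and 0.2 every 200 trials.
switching = lambda t: 0.8 if (t // 200) % 2 == 0 else 0.2
stationary = lambda t: 0.8

mse_switch = track_value([0.02, 0.3], switching)
mse_stable = track_value([0.02, 0.3], stationary)
```

In the switching environment the fast learner's low bias outweighs its high variance, so it tracks better; in the stationary environment the slow learner wins. An agent that combines both estimates, as the abstract argues monkeys do, can perform well in either regime.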


Author(s): Mauricio R. Papini

This review focuses on reward-schedule effects, a family of learning phenomena involving surprising devaluations in reward quality or quantity (as in incentive contrast) and reward omissions (as in appetitive extinction), as studied in three taxonomic groups of vertebrates: mammals, birds, and amphibians. The largest body of dependable data comes from research with mammals in general, and with rats in particular. These experiments show a variety of behavioral adjustments to situations involving reward downshifts. For example, rats show disruption of instrumental and consummatory behavior directed at a small reward after receiving a substantially larger reward (called successive negative contrast, SNC)—a reward-schedule effect. However, instrumental SNC does not seem to occur when animals work for sucrose solutions—a reversed reward-schedule effect. Similar modes of adjustment have been reported in analogous experiments with avian and amphibian species. A review of the evidence suggests that carry-over signals across successive trials can acquire control over behavior under massed practice, but emotional memory is required to account for reward-schedule effects observed under widely spaced practice. There is evidence for an emotional component to reward-schedule effects in mammals, but similar evidence for other vertebrates is scanty and inconsistent. Progress in the comparative analysis of reward-schedule effects will require the intensive study of a set of selected species in selected reward-downshift situations, with the aim of identifying underlying neural mechanisms.


2012
Vol 107 (11), pp. 2996-3007
Author(s): Takashi Mizuhiki, Barry J. Richmond, Munetaka Shidara

The insula, a cortical brain region known to encode information about autonomic, visceral, and olfactory functions, has recently been shown to encode information during reward-seeking tasks in both single-neuron recording and functional magnetic resonance imaging studies. To examine this reward-related activation, we recorded from 170 single neurons in the anterior insula of 2 monkeys during a multitrial reward schedule task, in which the monkeys had to complete a schedule of 1, 2, 3, or 4 trials to earn a reward. In one block of trials, a visual cue indicated whether or not a reward would be delivered in the current trial after the monkey successfully detected a red spot turning green; in other blocks, the visual cue was random with respect to reward delivery. Over one-quarter of the 131 responsive neurons were activated when the current trial, whether reward was certain or uncertain, would be rewarded if performed correctly. These same neurons failed to respond in trials that were certain, as indicated by the cue, to be unrewarded. Another group of neurons responded when the reward was delivered, similar to results reported previously. The dynamics of population activity in the anterior insula also showed strong signals related to knowing when a reward is coming. The most parsimonious explanation is that this activity codes for a type of expected outcome, where the expectation encompasses both certain and uncertain rewards.


2010
Vol 68, pp. e287
Author(s): Tsuyoshi Setogawa, Takashi Mizuhiki, Kiyonori Inaba, Munetaka Shidara

2009
Vol 65, pp. S191
Author(s): Takashi Mizuhiki, Kiyonori Inaba, Kanako Yaguchi, Munetaka Shidara

2009
Vol 65, pp. S190
Author(s): Kiyonori Inaba, Takashi Mizuhiki, Koji Toda, Shigeru Ozaki, Kanako Yaguchi, ...

2008
Vol 20 (4), pp. 563-579
Author(s): Satoe Ichihara-Takeda, Shintaro Funahashi

Recent studies show that task-related activity in the dorsolateral prefrontal cortex (DLPFC) is modulated by the quality and quantity of the reward, suggesting that the subject's motivational state affects cognitive operations in the DLPFC. The orbitofrontal cortex (OFC) is a possible source of motivational inputs to the DLPFC. However, it is not well known whether these two areas exhibit similar motivational effects on task-related activity. We compared motivational effects on task-related activity in these areas while a monkey performed an oculomotor delayed-response (ODR) task under two reward schedules. In the ODR-1 schedule, reward was given only after the successful completion of four consecutive trials, whereas in the ODR-2 schedule, reward was given after every correct trial. Task-related activities in both areas showed spatial selectivity, and the spatial characteristics of task-related activity remained constant across both schedules. Task-related activity in both areas, especially delay-period activity, was also affected by the reward schedule: in the ODR-1 schedule, the magnitude of the activity gradually increased with the proximity of the rewarded trial. A larger share of task-related OFC activity was affected by the reward schedule, whereas task-related DLPFC activity was more often affected by both spatial factors and the reward schedule. These results indicate that the OFC plays a role in monitoring the proximity of the rewarded trial and detecting reward delivery, whereas the DLPFC plays a role in performing cognitive operations and integrating cognitive and motivational information. They also indicate that spatial information and the animal's motivational state independently affect neuronal activity in both areas.

