Understanding Human Learning: Estimating Parameters of Temporal Difference Learning

2020 ◽  
Author(s):  
Xiao Yang

Previous work in psychology has demonstrated how to use the Rescorla-Wagner model to estimate learning parameters from experimental data (e.g., the Iowa gambling task). Yet in naturalistic settings the effects of actions on states often occur with a temporal delay, which the Rescorla-Wagner model does not capture. Explaining how humans learn about the time-delayed consequences of their actions requires a temporal difference (TD) learning model, such as the state-action-reward-state-action (SARSA) model, which incorporates how humans learn the temporal relations between states and actions. This paper proposes a SARSA-based algorithm to estimate the learning rate and discount factor of such TD learning processes, in order to quantify the human learning process from behavior sequence data collected in naturalistic settings (e.g., experience sampling). Specifically, the algorithm performs a grid search over the parameter space of the learning rate and discount factor to find the best-fitting values. To evaluate the estimation algorithm, simulations are conducted showing that it can accurately recover the TD learning parameters. The method is then applied to an empirical dataset of exercise and stress. This new estimation method for TD learning parameters opens opportunities for important health-related empirical applications, including explaining individual-level TD learning, specifically how humans change their behaviors to achieve health-related goals. Additionally, the estimated learning parameters can be used to design just-in-time adaptive personalized interventions (control) to induce behavior change.
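
The abstract does not reproduce the estimator itself; the following minimal Python sketch shows one way such a grid search could look, assuming tabular states and actions, a softmax choice rule with an inverse temperature beta, and a negative log-likelihood fit criterion (all illustrative assumptions, not details from the paper).

```python
import numpy as np

def sarsa_nll(episodes, alpha, gamma, beta=1.0, n_states=2, n_actions=2):
    """Negative log-likelihood of observed actions under SARSA with a
    softmax choice rule (beta is an assumed inverse-temperature parameter)."""
    Q = np.zeros((n_states, n_actions))
    nll = 0.0
    for episode in episodes:            # episode: list of (state, action, reward)
        for t in range(len(episode) - 1):
            s, a, r = episode[t]
            s_next, a_next, _ = episode[t + 1]
            logits = beta * Q[s]
            logits -= logits.max()      # numerical stability
            nll -= logits[a] - np.log(np.exp(logits).sum())
            # SARSA update: Q(s,a) <- Q(s,a) + alpha * (r + gamma*Q(s',a') - Q(s,a))
            Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
    return nll

def grid_search(episodes, alphas, gammas):
    """Exhaustive search for the (alpha, gamma) pair that best fits the data."""
    best_nll, best_params = np.inf, None
    for alpha in alphas:
        for gamma in gammas:
            nll = sarsa_nll(episodes, alpha, gamma)
            if nll < best_nll:
                best_nll, best_params = nll, (alpha, gamma)
    return best_params

# e.g., grid_search(episodes, np.linspace(0.05, 0.95, 19), np.linspace(0.05, 0.95, 19))
```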

Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4403
Author(s):  
Ji Woong Paik ◽  
Joon-Ho Lee ◽  
Wooyoung Hong

An enhanced smoothed l0-norm (SL0) algorithm for passive phased-array systems, which uses the covariance matrix of the received signal, is proposed in this paper. The SL0 algorithm is a fast compressive-sensing-based direction-of-arrival (DOA) estimation algorithm that uses a single snapshot of the received signal. The conventional SL0 algorithm is limited in resolution and DOA estimation performance because only a single snapshot is used; using multiple snapshots can improve DOA estimation performance. In this paper, a covariance-fitting-based SL0 algorithm is proposed to further reduce the number of optimization variables when multiple snapshots of the received signal are used. A cost function and a new null-space projection term for the sparse recovery in the proposed scheme are presented. To verify the performance of the proposed algorithm, we present simulation results as well as experimental results based on measured data.
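
For orientation, a minimal sketch of the basic single-snapshot SL0 loop is given below. This is the conventional algorithm the paper builds on, not the proposed covariance-fitting variant; the step size, decay factor, and iteration counts are illustrative.

```python
import numpy as np

def sl0(A, b, sigma_min=1e-3, sigma_decay=0.5, mu=2.0, inner_iters=3):
    """Basic SL0: recover a sparse x with A x = b by maximizing
    sum(exp(-|x_i|^2 / (2*sigma^2))) while shrinking sigma geometrically.
    Each gradient step is followed by a projection back onto the
    affine feasible set {x : A x = b}."""
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ b                        # minimum-l2-norm feasible start
    sigma = 2.0 * np.max(np.abs(x))
    while sigma > sigma_min:
        for _ in range(inner_iters):
            grad = x * np.exp(-np.abs(x) ** 2 / (2 * sigma ** 2))
            x = x - mu * grad             # ascent step on the smooth surrogate
            x = x - A_pinv @ (A @ x - b)  # feasibility projection
        sigma *= sigma_decay
    return x
```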


Author(s):  
Tingting Yin ◽  
Zhong Yang ◽  
Youlong Wu ◽  
Fangxiu Jia

A high-precision method for estimating the roll attitude of decoupled canards relative to the projectile body, based on bipolar Hall-effect sensors, is proposed. First, a baseline engineering positioning method based on edge detection is introduced. Second, a simplified dynamic relative-roll model is established, whose feature parameters are identified by fuzzy algorithms, and a high-precision real-time relative roll attitude estimation algorithm is proposed. Finally, trajectory simulations and ground experiments are conducted to evaluate the advantages of the proposed method. The positioning error is compared with that of the engineering solution method, showing that the proposed estimation method achieves high accuracy and good real-time performance.
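
As a rough illustration of the edge-detection positioning step only (the dynamic relative-roll model and the fuzzy parameter identification are not reproduced here), a sketch might look like the following; the pulse count per revolution and the zero-crossing interpolation are assumptions for illustration.

```python
import numpy as np

def hall_rising_edges(signal, times):
    """Zero-crossing instants (negative-to-positive) of a bipolar
    Hall-effect signal, refined by linear interpolation between samples."""
    s = np.sign(signal)
    idx = np.where((s[:-1] < 0) & (s[1:] > 0))[0]
    t0, t1 = times[idx], times[idx + 1]
    y0, y1 = signal[idx], signal[idx + 1]
    return t0 - y0 * (t1 - t0) / (y1 - y0)

def relative_roll_rate(edge_times, pulses_per_rev=1):
    """Relative roll rate (rad/s) from successive edge times, assuming
    pulses_per_rev magnetic pulses per relative revolution."""
    return 2.0 * np.pi / (pulses_per_rev * np.diff(edge_times))
```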


2001 ◽  
Vol 13 (10) ◽  
pp. 2221-2237 ◽  
Author(s):  
Rajesh P. N. Rao ◽  
Terrence J. Sejnowski

A spike-timing-dependent Hebbian mechanism governs the plasticity of recurrent excitatory synapses in the neocortex: synapses that are activated a few milliseconds before a postsynaptic spike are potentiated, while those that are activated a few milliseconds after are depressed. We show that such a mechanism can implement a form of temporal difference learning for prediction of input sequences. Using a biophysical model of a cortical neuron, we show that a temporal difference rule used in conjunction with dendritic backpropagating action potentials reproduces the temporally asymmetric window of Hebbian plasticity observed physiologically. Furthermore, the size and shape of the window vary with the distance of the synapse from the soma. Using a simple example, we show how a spike-timing-based temporal difference learning rule can allow a network of neocortical neurons to predict an input a few milliseconds before the input's expected arrival.
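
A minimal sketch of such a temporally asymmetric window is shown below; the amplitudes and time constants are illustrative placeholders, not values produced by the paper's biophysical model.

```python
import numpy as np

def stdp_window(dt_ms, a_plus=1.0, a_minus=0.9, tau_plus=15.0, tau_minus=20.0):
    """Temporally asymmetric Hebbian window: weight change as a function of
    dt = t_post - t_pre (ms). Pre-before-post (dt >= 0) potentiates;
    post-before-pre (dt < 0) depresses."""
    dt = np.asarray(dt_ms, dtype=float)
    return np.where(dt >= 0,
                    a_plus * np.exp(-dt / tau_plus),
                    -a_minus * np.exp(dt / tau_minus))
```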


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1685 ◽  
Author(s):  
Chayoung Kim

Owing to the complexity of training an agent in a real-time environment, e.g., one using the Internet of Things (IoT), reinforcement learning (RL) with a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted in an online fashion without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents can be competently trained for real-world applications. The proposed model combines basic RL algorithms, used both online and offline, according to an empirical bias-variance balance. Specifically, we exploit the balance between the offline Monte Carlo (MC) technique and online temporal difference (TD) learning, with an on-policy method (state-action-reward-state-action, Sarsa) and an off-policy method (Q-learning), within a DRL framework. The proposed balance of MC (offline) and TD (online) use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrate that, for a simple control task, the balance between online and offline use alone, without distinguishing on- and off-policy, yields satisfactory results. In complex tasks, however, the results clearly indicate the effectiveness of the combined method in improving the convergence speed and performance of a deep Q-network.
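
The abstract does not specify the exact combination rule; one plausible reading is a convex blend of the offline MC return and the online one-step TD target, sketched below with an assumed mixing weight kappa (kappa=1 gives pure MC, kappa=0 pure TD(0)).

```python
import numpy as np

def blended_targets(rewards, values, gamma=0.99, kappa=0.5):
    """Per-step training targets blending the full (offline) Monte Carlo
    return with the one-step (online) TD target. `values` holds the current
    estimates V(s_t) for t = 0..T-1; the terminal value is taken as zero."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    mc = np.zeros_like(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g                     # discounted return from t
        mc[t] = g
    td = rewards + gamma * np.append(values[1:], 0.0)  # one-step TD target
    return kappa * mc + (1 - kappa) * td
```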

