Understanding Human Learning: Estimating Parameters of Temporal Difference Learning

2020 ◽  
Author(s):  
Xiao Yang

Previous work in psychology has demonstrated how to use the Rescorla-Wagner model to estimate learning parameters from experimental data (e.g., the Iowa gambling task). Yet in naturalistic settings the effects of actions on states often occur with a temporal delay, which the Rescorla-Wagner model does not capture. Explaining how humans learn about the time-delayed consequences of their actions requires a temporal difference (TD) learning model, such as the state-action-reward-state-action (SARSA) model, which incorporates how humans learn the temporal relations between states and actions. This paper proposes a SARSA-based algorithm to estimate the learning rate and discount factor of such TD learning processes, in order to quantify the human learning process from behavior sequence data collected in naturalistic settings (e.g., experience sampling). Specifically, the algorithm performs a grid search over the parameter space of the learning rate and discount factor to find the best-fitting values. To evaluate the estimation algorithm, simulations are conducted showing that it can accurately recover the TD learning parameters. The method is then applied to an empirical dataset of exercise and stress. This new estimation method for TD learning parameters opens opportunities for important health-related empirical applications, including explaining individual-level TD learning, specifically how humans change their behaviors to achieve health-related goals. Additionally, the estimated learning parameters can be used to design just-in-time adaptive personalized interventions (control) to induce behavior change.
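
The abstract does not reproduce the estimator itself; the following minimal Python sketch shows one way such a grid search could look, assuming tabular states and actions, a softmax choice rule with an inverse temperature beta, and a negative log-likelihood fit criterion (all illustrative assumptions, not details from the paper).

```python
import numpy as np

def sarsa_nll(episodes, alpha, gamma, beta=1.0, n_states=2, n_actions=2):
    """Negative log-likelihood of observed actions under SARSA with a
    softmax choice rule (beta is an assumed inverse-temperature parameter)."""
    Q = np.zeros((n_states, n_actions))
    nll = 0.0
    for episode in episodes:            # episode: list of (state, action, reward)
        for t in range(len(episode) - 1):
            s, a, r = episode[t]
            s_next, a_next, _ = episode[t + 1]
            logits = beta * Q[s]
            logits -= logits.max()      # numerical stability
            nll -= logits[a] - np.log(np.exp(logits).sum())
            # SARSA update: Q(s,a) <- Q(s,a) + alpha * (r + gamma*Q(s',a') - Q(s,a))
            Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
    return nll

def grid_search(episodes, alphas, gammas):
    """Exhaustive search for the (alpha, gamma) pair that best fits the data."""
    best_nll, best_params = np.inf, None
    for alpha in alphas:
        for gamma in gammas:
            nll = sarsa_nll(episodes, alpha, gamma)
            if nll < best_nll:
                best_nll, best_params = nll, (alpha, gamma)
    return best_params

# e.g., grid_search(episodes, np.linspace(0.05, 0.95, 19), np.linspace(0.05, 0.95, 19))
```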

Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4403
Author(s):  
Ji Woong Paik ◽  
Joon-Ho Lee ◽  
Wooyoung Hong

An enhanced smoothed l0-norm (SL0) algorithm for passive phased-array systems, which uses the covariance matrix of the received signal, is proposed in this paper. The SL0 algorithm is a fast compressive-sensing-based direction-of-arrival (DOA) estimation algorithm that uses a single snapshot of the received signal. The conventional SL0 algorithm is limited in resolution and DOA estimation performance because only a single snapshot is used; using multiple snapshots can improve DOA estimation performance. In this paper, a covariance-fitting-based SL0 algorithm is proposed to further reduce the number of optimization variables when multiple snapshots of the received signal are used. A cost function and a new null-space projection term for the sparse recovery in the proposed scheme are presented. To verify the performance of the proposed algorithm, we present simulation results as well as experimental results based on measured data.
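
For orientation, a minimal sketch of the basic single-snapshot SL0 loop is given below. This is the conventional algorithm the paper builds on, not the proposed covariance-fitting variant; the step size, decay factor, and iteration counts are illustrative.

```python
import numpy as np

def sl0(A, b, sigma_min=1e-3, sigma_decay=0.5, mu=2.0, inner_iters=3):
    """Basic SL0: recover a sparse x with A x = b by maximizing
    sum(exp(-|x_i|^2 / (2*sigma^2))) while shrinking sigma geometrically.
    Each gradient step is followed by a projection back onto the
    affine feasible set {x : A x = b}."""
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ b                        # minimum-l2-norm feasible start
    sigma = 2.0 * np.max(np.abs(x))
    while sigma > sigma_min:
        for _ in range(inner_iters):
            grad = x * np.exp(-np.abs(x) ** 2 / (2 * sigma ** 2))
            x = x - mu * grad             # ascent step on the smooth surrogate
            x = x - A_pinv @ (A @ x - b)  # feasibility projection
        sigma *= sigma_decay
    return x
```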


Author(s):  
Tingting Yin ◽  
Zhong Yang ◽  
Youlong Wu ◽  
Fangxiu Jia

A high-precision method for estimating the roll attitude of decoupled canards relative to the projectile body, based on bipolar Hall-effect sensors, is proposed. First, a baseline engineering positioning method based on edge detection is introduced. Second, a simplified dynamic relative-roll model is established, whose feature parameters are identified by fuzzy algorithms, and a high-precision real-time relative roll attitude estimation algorithm is proposed. Finally, trajectory simulations and ground experiments are conducted to evaluate the advantages of the proposed method. The positioning error is compared with that of the engineering solution method, showing that the proposed estimation method achieves high accuracy and good real-time performance.
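
As a rough illustration of the edge-detection positioning step only (the dynamic relative-roll model and the fuzzy parameter identification are not reproduced here), a sketch might look like the following; the pulse count per revolution and the zero-crossing interpolation are assumptions for illustration.

```python
import numpy as np

def hall_rising_edges(signal, times):
    """Zero-crossing instants (negative-to-positive) of a bipolar
    Hall-effect signal, refined by linear interpolation between samples."""
    s = np.sign(signal)
    idx = np.where((s[:-1] < 0) & (s[1:] > 0))[0]
    t0, t1 = times[idx], times[idx + 1]
    y0, y1 = signal[idx], signal[idx + 1]
    return t0 - y0 * (t1 - t0) / (y1 - y0)

def relative_roll_rate(edge_times, pulses_per_rev=1):
    """Relative roll rate (rad/s) from successive edge times, assuming
    pulses_per_rev magnetic pulses per relative revolution."""
    return 2.0 * np.pi / (pulses_per_rev * np.diff(edge_times))
```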


2001 ◽  
Vol 13 (10) ◽  
pp. 2221-2237 ◽  
Author(s):  
Rajesh P. N. Rao ◽  
Terrence J. Sejnowski

A spike-timing-dependent Hebbian mechanism governs the plasticity of recurrent excitatory synapses in the neocortex: synapses that are activated a few milliseconds before a postsynaptic spike are potentiated, while those that are activated a few milliseconds after are depressed. We show that such a mechanism can implement a form of temporal difference learning for prediction of input sequences. Using a biophysical model of a cortical neuron, we show that a temporal difference rule used in conjunction with dendritic backpropagating action potentials reproduces the temporally asymmetric window of Hebbian plasticity observed physiologically. Furthermore, the size and shape of the window vary with the distance of the synapse from the soma. Using a simple example, we show how a spike-timing-based temporal difference learning rule can allow a network of neocortical neurons to predict an input a few milliseconds before the input's expected arrival.
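
A minimal sketch of such a temporally asymmetric window is shown below; the amplitudes and time constants are illustrative placeholders, not values produced by the paper's biophysical model.

```python
import numpy as np

def stdp_window(dt_ms, a_plus=1.0, a_minus=0.9, tau_plus=15.0, tau_minus=20.0):
    """Temporally asymmetric Hebbian window: weight change as a function of
    dt = t_post - t_pre (ms). Pre-before-post (dt >= 0) potentiates;
    post-before-pre (dt < 0) depresses."""
    dt = np.asarray(dt_ms, dtype=float)
    return np.where(dt >= 0,
                    a_plus * np.exp(-dt / tau_plus),
                    -a_minus * np.exp(dt / tau_minus))
```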


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1685 ◽  
Author(s):  
Chayoung Kim

Owing to the complexity of training an agent in a real-time environment, e.g., one using the Internet of Things (IoT), reinforcement learning (RL) with a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted in an online fashion without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents can be competently trained for real-world applications. The proposed model combines basic RL algorithms, used both online and offline, according to an empirical bias-variance balance. Specifically, we exploit the balance between the offline Monte Carlo (MC) technique and online temporal difference (TD) learning, with an on-policy method (state-action-reward-state-action, Sarsa) and an off-policy method (Q-learning), within a DRL framework. The proposed balance of MC (offline) and TD (online) use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrate that, for a simple control task, the balance between online and offline use alone, without distinguishing on- and off-policy, yields satisfactory results. In complex tasks, however, the results clearly indicate the effectiveness of the combined method in improving the convergence speed and performance of a deep Q-network.
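
The abstract does not specify the exact combination rule; one plausible reading is a convex blend of the offline MC return and the online one-step TD target, sketched below with an assumed mixing weight kappa (kappa=1 gives pure MC, kappa=0 pure TD(0)).

```python
import numpy as np

def blended_targets(rewards, values, gamma=0.99, kappa=0.5):
    """Per-step training targets blending the full (offline) Monte Carlo
    return with the one-step (online) TD target. `values` holds the current
    estimates V(s_t) for t = 0..T-1; the terminal value is taken as zero."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    mc = np.zeros_like(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g                     # discounted return from t
        mc[t] = g
    td = rewards + gamma * np.append(values[1:], 0.0)  # one-step TD target
    return kappa * mc + (1 - kappa) * td
```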

