Understanding Human Learning: Estimating Parameters of Temporal Difference Learning
Previous work in psychology has demonstrated how to use the Rescorla-Wagner model to estimate learning parameters from experimental paradigm data (e.g., the Iowa Gambling Task). In naturalistic settings, however, the effects of actions on states often occur with a temporal delay, which the Rescorla-Wagner model does not capture. Explaining how humans learn about the time-delayed consequences of their actions requires a temporal difference (TD) learning model, such as the state-action-reward-state-action (SARSA) model, that incorporates how humans learn the temporal relations between states and actions. This paper proposes a SARSA-based algorithm that estimates the learning rate and discount factor of such TD learning processes, in order to quantify human learning from behavior sequence data collected in naturalistic settings (e.g., experience sampling). Specifically, the algorithm performs a grid search over the parameter space of learning rate and discount factor to find the best-fitting values. To evaluate the algorithm, simulations are conducted that provide evidence that it can accurately recover the TD learning parameters. The method is then applied to an empirical dataset on exercise and stress. This estimation method for TD learning parameters opens opportunities for important health-related empirical applications, including explaining individual-level TD learning, specifically how humans change their behaviors to achieve health-related goals. The estimated learning parameters can also be used to design just-in-time adaptive personalized interventions (control) to induce behavior change.
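As an illustration of the estimation idea, the following is a minimal Python sketch under stated assumptions, not the paper's implementation. It replays an observed (state, action, reward, next state, next action) sequence through the SARSA update Q(s,a) ← Q(s,a) + α[r + γQ(s′,a′) − Q(s,a)], scores every (α, γ) pair on a grid, and keeps the best-fitting pair. The abstract does not state the fit criterion, so the sketch assumes a softmax choice rule with a fixed, known inverse temperature `beta` and scores each pair by the log-likelihood of the observed actions. The two-state delayed-reward environment, the helper names (`step`, `simulate`, `log_likelihood`, `grid_search`), and the grid values are all hypothetical. The sketch also mimics a simulation-based recovery check: data are generated from known parameters and then re-estimated.

```python
import math
import random

N_STATES, N_ACTIONS = 2, 2

def step(state, action):
    """Hypothetical toy environment with a delayed payoff (an assumption, not
    the paper's data): in state 0, action 1 costs -0.5 now but moves to state 1,
    where any action yields +1 and returns to state 0."""
    if state == 0:
        return (-0.5, 1) if action == 1 else (0.0, 0)
    return (1.0, 0)

def softmax_probs(q_row, beta):
    """Softmax choice probabilities; beta is an assumed, fixed inverse temperature."""
    m = max(q_row)
    exps = [math.exp(beta * (q - m)) for q in q_row]
    total = sum(exps)
    return [e / total for e in exps]

def choose(q_row, beta, rng):
    probs = softmax_probs(q_row, beta)
    return rng.choices(range(len(probs)), weights=probs)[0]

def simulate(alpha, gamma, beta, n_trials, seed=0):
    """Generate (s, a, r, s', a') tuples from a SARSA agent with known parameters."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    action = choose(q[state], beta, rng)
    data = []
    for _ in range(n_trials):
        reward, next_state = step(state, action)
        # As in SARSA, the next action is chosen before Q(s, a) is updated.
        next_action = choose(q[next_state], beta, rng)
        data.append((state, action, reward, next_state, next_action))
        td_error = reward + gamma * q[next_state][next_action] - q[state][action]
        q[state][action] += alpha * td_error
        state, action = next_state, next_action
    return data

def log_likelihood(data, alpha, gamma, beta):
    """Replay the observed sequence with SARSA and sum log choice probabilities."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s0, a0 = data[0][0], data[0][1]
    ll = math.log(softmax_probs(q[s0], beta)[a0])
    for s, a, r, s_next, a_next in data:
        # Score a' with the Q-values in place when it was chosen (pre-update).
        ll += math.log(softmax_probs(q[s_next], beta)[a_next])
        td_error = r + gamma * q[s_next][a_next] - q[s][a]
        q[s][a] += alpha * td_error
    return ll

def grid_search(data, beta, alphas, gammas):
    """Return (alpha, gamma, log-likelihood) for the best-fitting grid point."""
    best = None
    for alpha in alphas:
        for gamma in gammas:
            ll = log_likelihood(data, alpha, gamma, beta)
            if best is None or ll > best[2]:
                best = (alpha, gamma, ll)
    return best

# Hypothetical grids: learning rate 0.1..0.9, discount factor 0.0..0.9.
ALPHAS = [round(0.1 * i, 1) for i in range(1, 10)]
GAMMAS = [round(0.1 * i, 1) for i in range(0, 10)]

data = simulate(alpha=0.3, gamma=0.8, beta=3.0, n_trials=500, seed=1)
alpha_hat, gamma_hat, best_ll = grid_search(data, 3.0, ALPHAS, GAMMAS)
print(f"estimated alpha={alpha_hat}, gamma={gamma_hat}")
```

Because the true values (0.3, 0.8) lie on the grid, the best log-likelihood found is at least as high as the likelihood at the true parameters. How closely the estimates match the true values depends on the number of trials and on how strongly the environment separates different discount factors; a finer grid trades computation for resolution.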