Using Reinforcement Learning to Estimate Human Joint Moments via EMG Signals or Joint Kinematics: An Alternative Solution to Musculoskeletal-Based Biomechanics

Author(s):  
Wen Wu ◽  
Kate Saul ◽  
He (Helen) Huang

Reinforcement learning (RL) has the potential to provide innovative solutions to existing challenges in estimating joint moments in motion analysis, such as kinematic or electromyography (EMG) noise and unknown model parameters. Here we explore the feasibility of RL for assisting joint moment estimation in biomechanical applications. Forearm and hand kinematics and forearm EMGs from four muscles during free finger and wrist movement were collected from six healthy subjects. Using the Proximal Policy Optimization (PPO) approach, we trained and tested two types of RL agents that estimated joint moment from measured kinematics or measured EMGs, respectively. To quantify the performance of the RL agents, the estimated joint moment was used to drive a forward dynamic model to estimate kinematics, which were then compared with the measured kinematics. The results demonstrated that both RL agents can accurately reproduce wrist and metacarpophalangeal joint motion. The correlation coefficients between estimated and measured kinematics, derived from the kinematics-driven agent and the subject-specific EMG-driven agents, were 0.98±0.01 and 0.94±0.03 for the wrist, respectively, and 0.95±0.02 and 0.84±0.06 for the metacarpophalangeal joint, respectively. In addition, a biomechanically reasonable joint moment-angle-EMG relationship (i.e., the dependence of joint moment on joint angle and EMG) was predicted using only 15 seconds of collected data. In conclusion, this study serves as a proof of concept that an RL approach can assist in biomechanical analysis and human-machine interface applications by deriving joint moments from kinematic or EMG data.
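The evaluation loop this abstract describes lends itself to a compact sketch: a trained policy maps EMG (or kinematics) to a joint moment, the moment drives a forward-dynamics model, and the reward penalizes deviation from the measured motion. The toy single-joint dynamics and all names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def forward_dynamics(angle, velocity, moment, inertia=0.01, damping=0.05, dt=0.005):
    """Toy single-joint forward model: moment -> acceleration -> next state."""
    accel = (moment - damping * velocity) / inertia
    velocity += accel * dt
    angle += velocity * dt
    return angle, velocity

def rollout_reward(policy, emg_seq, measured_angles):
    """Score one episode by how well policy-driven dynamics track measured motion."""
    angle, velocity, total = measured_angles[0], 0.0, 0.0
    for emg, target in zip(emg_seq, measured_angles):
        # the policy sees EMG plus the current simulated joint state
        moment = policy(np.concatenate([emg, [angle, velocity]]))
        angle, velocity = forward_dynamics(angle, velocity, moment)
        total -= (angle - target) ** 2   # negative tracking error as reward
    return total
```

A PPO trainer, as named in the abstract, would then maximize this return over repeated rollouts.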

2003 ◽  
Vol 03 (02) ◽  
pp. 169-186 ◽  
Author(s):  
Richard Heine ◽  
Kurt Manal ◽  
Thomas S. Buchanan

There has been considerable interest in estimating muscle forces and joint moments from EMG signals, but most approaches have not been very successful, largely because robust models of muscle activation dynamics, Hill-type muscle contraction dynamics, and musculoskeletal geometry are generally not included. Here we present a model that includes these sub-models, and we determine which model parameters are most important. The models' abilities to predict joint moments about the human elbow during time-varying isometric tasks were examined. Inputs to the models were EMGs from eight muscles. The output was joint moment, which was compared with the measured moment. The models varied in complexity, having up to 59 adjustable parameters. It was found that a model with seven adjustable parameters could adequately estimate time-varying joint moments without substantial sacrifice in performance. The key parameters fit for each subject were two global gain factors, a time delay term, a non-linear EMG-force term, two muscle activation terms, and a term for skewing the length-tension curve with muscle activation. This approach offers advantages over optimization-based methods for estimating individual muscle forces. Most importantly, it accounts for the way muscles are activated, which makes it potentially powerful for evaluating patients with pathologies.
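The seven fitted parameters map naturally onto a short processing pipeline: delayed EMG passes through recursive activation dynamics and a nonlinear shaping function, and the resulting activation scales a length-tension curve whose optimum is skewed by activation. The functional forms below are generic stand-ins for illustration, not the authors' exact equations.

```python
import numpy as np

def emg_to_moment(emg, norm_lengths, p):
    """p holds stand-ins for the seven fitted parameters named in the abstract."""
    d = p["delay_samples"]                       # electromechanical time delay
    e = np.concatenate([np.full(d, emg[0]), emg[:len(emg) - d]]) if d else emg.copy()
    # second-order recursive activation dynamics (the two activation terms)
    u = np.zeros_like(e)
    for t in range(2, len(e)):
        u[t] = e[t] - p["c1"] * u[t - 1] - p["c2"] * u[t - 2]
    # nonlinear EMG-to-activation shaping (the non-linear EMG-force term)
    a = (np.exp(p["shape"] * u) - 1.0) / (np.exp(p["shape"]) - 1.0)
    # Gaussian length-tension curve whose optimum shifts with activation (skew term)
    lt = np.exp(-((norm_lengths - (1.0 + p["skew"] * a)) ** 2) / 0.45)
    # one of the two global gain factors (flexion shown; extension is analogous)
    return p["gain"] * a * lt
```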


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1359 ◽  
Author(s):  
Hyun-Kyo Lim ◽  
Ju-Bong Kim ◽  
Joo-Seong Heo ◽  
Youn-Hee Han

Reinforcement learning has recently been studied in various fields and has also been used to optimally control IoT devices, which extend Internet connectivity beyond conventional standard devices. In this paper, we allow multiple reinforcement learning agents to learn optimal control policies, each on its own IoT device of the same type but with slightly different dynamics. For such multiple IoT devices, there is no guarantee that an agent that interacts with only one IoT device and learns the optimal control policy will also control another IoT device well. We might therefore need to apply independent reinforcement learning to each IoT device individually, which is costly and time-consuming. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent, working on its own IoT device, shares its learning experience (i.e., the gradient of the loss function) with the others and transfers mature policy model parameters to the other agents, accelerating their learning. We incorporate the actor-critic proximal policy optimization (Actor-Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for gradient sharing and model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme can effectively facilitate the learning process for multiple IoT devices, and that learning is faster when more agents are involved.
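The two sharing mechanisms named here, gradient sharing and mature-model transfer, reduce to a few lines when policies are treated as plain parameter vectors. This framework-agnostic sketch is an illustrative assumption about how the exchange could be wired, not the paper's exact procedure.

```python
import numpy as np

def federated_gradient_step(params_list, grads_list, lr=1e-3):
    """Every agent applies the average of all agents' loss gradients."""
    mean_grad = np.mean(grads_list, axis=0)          # shared learning experience
    return [p - lr * mean_grad for p in params_list]

def transfer_mature_policy(params_list, episode_returns):
    """Copy the best-performing (mature) agent's parameters to all agents."""
    best = int(np.argmax(episode_returns))
    return [params_list[best].copy() for _ in params_list]
```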


2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Karim El-Laithy ◽  
Martin Bogdan

An integration of Hebbian and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework lets the Hebbian rule update the hidden synaptic model parameters that regulate the synaptic response, rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the network's output spike train and a reference target train. Results show that the network is able to capture the required dynamics and that the proposed framework indeed realizes an integrated version of Hebbian learning and RL. The framework is tractable, computationally inexpensive, applicable to a wide class of synaptic models, and not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to exploit biologically plausible synaptic models in a wide range of signal processing applications.
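A minimal sketch of such a reward-gated Hebbian update, assuming a simple spike-train distance for the reward and generic activity traces; the functional forms are illustrative, not the paper's model.

```python
import numpy as np

def trial_reward(output_spikes, target_spikes):
    """Reward grows as the output spike train approaches the reference train."""
    return -np.linalg.norm(output_spikes - target_spikes)

def update_synapse_params(params, pre_trace, post_trace, reward, prev_reward, eta=0.01):
    """Hebbian update of hidden synapse parameters, gated by the reward change."""
    delta_r = reward - prev_reward              # temporal difference of the reward
    hebbian = np.outer(post_trace, pre_trace)   # coincidence of pre/post activity
    step = eta * abs(delta_r) * hebbian         # value of delta_r scales the step
    return params + np.sign(delta_r) * step     # sign of delta_r sets its direction
```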


2021 ◽  
Author(s):  
Srivatsan Krishnan ◽  
Behzad Boroujerdian ◽  
William Fu ◽  
Aleksandra Faust ◽  
Vijay Janapa Reddi

We introduce Air Learning, an open-source simulator and gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments, along with Deep Q Network (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies' performance under various quality-of-flight (QoF) metrics, such as energy consumed, endurance, and average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Raspberry Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (the onboard compute of the aerial robot). A latency randomly sampled from this distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (the discrepancy in the flight-time metric is reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the choice of onboard compute affects the aerial robot's performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. Taken together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.
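The latency-injection idea reduces to a small change in the training loop: sample a delay from the measured on-device latency distribution and let the environment evolve through it before the action lands. The environment and policy interfaces below are illustrative placeholders, not Air Learning's actual API.

```python
import random
import time

def train_step_with_latency(env, policy, latency_samples):
    """One interaction step with an artificial hardware-in-the-loop delay."""
    obs = env.observe()
    action = policy(obs)
    # delay drawn from the latency distribution measured on the target platform
    time.sleep(random.choice(latency_samples))
    # by the time the action executes, the world has moved on, just as it
    # would on a slow onboard computer
    return env.step(action)
```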


2021 ◽  
Vol 11 (4) ◽  
pp. 1514 ◽  
Author(s):  
Quang-Duy Tran ◽  
Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of autonomous vehicles. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning approach that investigates how leading autonomous vehicles affect an urban network in a mixed-traffic environment, and we suggest a set of hyperparameters for achieving better performance. Firstly, we feed the set of hyperparameters into our deep reinforcement learning agents. Secondly, we run the leading-autonomous-vehicle experiment in the urban network with different autonomous vehicle penetration rates. Thirdly, the advantage of leading autonomous vehicles is evaluated against all-manual-vehicle and leading-manual-vehicle experiments. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback-Leibler penalty to verify the superiority of the proposed hyperparameter set. We demonstrate that fully automated traffic increased the average speed to 1.27 times that of the all-manual-vehicle experiment. Our proposed method becomes significantly more effective at higher autonomous vehicle penetration rates. Furthermore, the leading autonomous vehicles could help to mitigate traffic congestion.
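The two PPO variants compared here differ only in how the policy update is constrained. A minimal sketch of both per-sample surrogate objectives (to be maximized), with plain numpy arrays standing in for tensors:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate; ratio = pi_new(a|s) / pi_old(a|s)."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def ppo_kl_penalty_objective(ratio, advantage, kl, beta=1.0):
    """Adaptive-KL surrogate; beta is re-tuned between updates to hit a KL target."""
    return ratio * advantage - beta * kl
```

Clipping removes the incentive to push the probability ratio outside the [1-eps, 1+eps] band, which is why the clipped variant typically needs less tuning than the adaptive penalty.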


2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have been applied successfully to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only information from past experience but also predictive information about the future state: it adds information about the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. The method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict information about the next state. Evaluated across 49 Atari games in the Arcade Learning Environment (ALE), this method outperforms the state-of-the-art PPO algorithm on most games; the experimental results show that PPOMM performs better than or on par with the original algorithm in 33 of the 49 games.
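The loss composition described here can be sketched as the PPO surrogate loss plus a next-state prediction term from a latent transition model. The weighting and model interface below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ppomm_loss(ppo_loss, latent_model, z_t, action, z_next, lam=0.5):
    """Combine PPO's error with the latent transition model's prediction error."""
    z_pred = latent_model(z_t, action)             # predicted next latent state
    model_error = np.mean((z_pred - z_next) ** 2)  # model-based component
    return ppo_loss + lam * model_error            # joint objective to minimize
```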


Processes ◽  
2018 ◽  
Vol 6 (8) ◽  
pp. 126 ◽  
Author(s):  
Lina Aboulmouna ◽  
Shakti Gupta ◽  
Mano Maurya ◽  
Frank DeVilbiss ◽  
Shankar Subramaniam ◽  
...  

The goal-oriented control policies of cybernetic models have been used to predict metabolic phenomena such as the behavior of gene-knockout strains, complex substrate uptake patterns, and dynamic metabolic flux distributions. Cybernetic theory builds on the principle that metabolic regulation is driven towards attaining goals that correspond to an organism's survival or to displaying a specific phenotype in response to a stimulus. Here, we have modeled prostaglandin (PG) metabolism in mouse bone-marrow-derived macrophage (BMDM) cells stimulated by Kdo2-Lipid A (KLA) and adenosine triphosphate (ATP), using cybernetic control variables. Prostaglandins are a well-characterized set of inflammatory lipids derived from arachidonic acid. The transcriptomic and lipidomic data for prostaglandin biosynthesis and conversion were obtained from the LIPID MAPS database. The model parameters were estimated using a two-step hybrid optimization approach: a genetic algorithm determined a population of near-optimal parameter values, and a generalized constrained non-linear optimization employing a gradient search method further refined the parameters. We validated the model by predicting an independent data set, the prostaglandin response of KLA-primed, ATP-stimulated BMDM cells. We show that the cybernetic model captures the complex regulation of PG metabolism and provides a reliable description of PG formation.
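The two-step hybrid estimation pattern, a global population-based search followed by constrained gradient refinement, has a compact generic form. The SciPy calls below are one reasonable realization under stated assumptions (differential evolution standing in for the genetic algorithm), not necessarily the authors' toolchain.

```python
from scipy.optimize import differential_evolution, minimize

def fit_model_params(cost, bounds):
    """cost: parameters -> scalar fit error; bounds: [(lo, hi), ...] per parameter."""
    # step 1: global, population-based search (GA-like evolutionary method)
    coarse = differential_evolution(cost, bounds, maxiter=200, seed=0)
    # step 2: constrained gradient-based refinement from the best candidate
    refined = minimize(cost, coarse.x, bounds=bounds, method="L-BFGS-B")
    return refined.x
```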

