Analysis of the Impact of Interpolation Methods of Missing RR-intervals Caused by Motion Artifacts on HRV Features Estimations

Davide Morelli; Alessio Rossi; Massimo Cairo; David A. Clifton

doi:10.3390/s19143163

Sensors ◽

10.3390/s19143163 ◽

2019 ◽

Vol 19 (14) ◽

pp. 3163 ◽

Cited By ~ 8

Author(s):

Davide Morelli ◽

Alessio Rossi ◽

Massimo Cairo ◽

David A. Clifton

Keyword(s):

Heart Rate ◽

Missing Data ◽

Missing Values ◽

Time Windows ◽

Motion Artifacts ◽

Linear Quadratic ◽

Optimal Method ◽

Rr Intervals ◽

Interpolation Methods ◽

The Impact

Wearable physiological monitors have become increasingly popular and are often worn throughout daily life, collecting data 24 hours a day, 7 days a week. In the last decade, these devices have attracted the attention of the scientific community because they allow information about user physiology (e.g., heart rate, sleep quality and physical activity) to be extracted automatically, enabling inference about the user's health. However, the biggest issue with data recorded by wearable devices is missing values caused by motion and mechanical artifacts induced by external stimuli during data acquisition. These missing data can negatively affect the assessment of heart rate (HR) response and the estimation of heart rate variability (HRV), which could in turn provide misleading insights into the health status of the individual. In this study, we focus on healthy subjects with normal heart activity and investigate the effects of missing RR-intervals (the timing between successive beats) caused by motion artifacts on the estimation of HRV features. To do so, we randomly introduce missing values, using the Gilbert burst method, within five-minute time windows of RR-intervals obtained from the nsr2db PhysioNet dataset. We then evaluate several strategies for estimating HRV in the presence of missing values by interpolating periods of missing values via linear, quadratic, cubic, and cubic spline functions, covering the range of techniques often deployed in the literature. We thereby compare the HRV features obtained by handling missing data in RR-interval time series against HRV features obtained from the same data without missing values. Finally, we assess the difference between applying interpolation methods on time (i.e., the timestamps at which the heartbeats occur) and on duration (i.e., the duration of the heartbeats), in order to identify the best methodology for handling missing RR-intervals.
The main novel finding of this study is that interpolating missing data on time produces more reliable HRV estimations than interpolating on duration. Interpolation on duration modifies the power spectrum of the RR signal, negatively affecting the estimation of the HRV features as the amount of missing values increases. We therefore conclude that interpolation on time is the optimal method among those considered for handling data with large amounts of missing values, such as data from wearable sensors.
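
The distinction between interpolating on time and interpolating on duration can be sketched as follows. This is a minimal illustration, not the study's implementation: it uses linear interpolation only, assumes the beat index of each lost beat is known, and the function names, toy beat times, and use of NumPy are all assumptions made for the example.

```python
import numpy as np

def rr_by_time_interpolation(beat_times, beat_missing):
    """Fill missing beat timestamps over the beat index, then difference them."""
    beat_times = np.asarray(beat_times, dtype=float)
    known = ~np.asarray(beat_missing, dtype=bool)
    idx = np.arange(beat_times.size)
    # Only timestamps of detected beats are used as interpolation nodes.
    filled = np.interp(idx, idx[known], beat_times[known])
    return np.diff(filled)

def rr_by_duration_interpolation(beat_times, beat_missing):
    """Fill missing RR durations directly over the interval index."""
    beat_times = np.asarray(beat_times, dtype=float)
    known = ~np.asarray(beat_missing, dtype=bool)
    rr = np.diff(beat_times)
    # An RR interval is usable only if both of its bounding beats were detected.
    rr_known = known[:-1] & known[1:]
    idx = np.arange(rr.size)
    return np.interp(idx, idx[rr_known], rr[rr_known])

# Toy example (hypothetical values, in ms): the beat at index 3 was lost.
beat_times = np.array([0.0, 800.0, 1610.0, 2400.0, 3300.0, 4120.0, 4925.0])
beat_missing = np.array([False, False, False, True, False, False, False])

rr_time = rr_by_time_interpolation(beat_times, beat_missing)
rr_duration = rr_by_duration_interpolation(beat_times, beat_missing)
print(rr_time.sum(), rr_duration.sum())
```

In this toy case the time-based reconstruction preserves the total elapsed time of the window (4925 ms), while the duration-based reconstruction sums to about 4865 ms, which is one simple way to see that the two approaches are not equivalent.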

Impact of strong El Niño events on river discharge in South America

10.5194/egusphere-egu21-10383 ◽

2021 ◽

Author(s):

Markus Deppner ◽

Bedartha Goswami

Keyword(s):

Machine Learning ◽

Missing Data ◽

South America ◽

River Discharge ◽

Amazon Basin ◽

Missing Values ◽

Southern Oscillation ◽

Enso Events ◽

Streamflow Data ◽

The Impact

The impact of the El Niño Southern Oscillation (ENSO) on rivers is well known, but most existing studies involving streamflow data are severely limited by data coverage. Time series from gauging stations fade in and out over time, which makes large-scale, long-term hydrological analyses and studies of rarely occurring extreme events challenging. Here, we use a machine learning approach to infer missing streamflow data based on temporal correlations between stations with missing values and stations with data. Using 346 stations from the “Global Streamflow Indices and Metadata archive” (GSIM) that initially cover the 40-year timespan, in conjunction with Gaussian processes, we were able to extend our data by estimating missing data for an additional 646 stations, allowing us to include a total of 992 stations. We then investigate the impact of the 6 strongest El Niño (EN) events on rivers in South America between 1960 and 2000. Our analysis shows a strong correlation between ENSO events and extreme river dynamics in the southeast of Brazil, Caribbean South America and parts of the Amazon basin. Furthermore, we see a peak in the number of stations showing maximum river discharge all over Brazil during the EN of 1982/83, which has been linked to severe floods in the east of Brazil and parts of Uruguay and Paraguay. However, EN events of similar intensity in other years did not cause floods of such magnitude, and therefore the additional drivers of the 1982/83 floods need further investigation. By using machine learning methods to infer data for gauging stations with missing data, we were able to extend our data almost three-fold, revealing a possibly heavier and spatially larger impact of the 1982/83 EN on South America's hydrology than indicated in the literature.

Real-Time Quality Index to Control Data Loss in Real-Life Cardiac Monitoring Applications

Sensors ◽

10.3390/s21165357 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5357

Author(s):

Gaël Vila ◽

Christelle Godin ◽

Sylvie Charbonnier ◽

Aurélie Campagne

Keyword(s):

Heart Rate ◽

Quality Index ◽

Real Life ◽

Motion Artifacts ◽

Cardiac Monitoring ◽

Data Loss ◽

Median Error ◽

Monitoring Applications ◽

The Impact ◽

Frequency Power

Wearable cardiac sensors pave the way for advanced cardiac monitoring applications based on heart rate variability (HRV). In real-life settings, heart rate (HR) measurements are subject to motion artifacts that may lead to frequent data loss (missing samples in the HR signal), especially for commercial devices based on photoplethysmography (PPG). The current study had two main goals: (i) to provide a white-box quality index that estimates the number of missing samples in any piece of HR signal; and (ii) to quantify the impact of data loss on feature extraction in a PPG-based HR signal. This was done by comparing real-life recordings from commercial sensors featuring PPG (Empatica E4) and ECG (Zephyr BioHarness 3). After an outlier rejection process, our quality index was used to isolate portions of ECG-based HR signals that could be used as a benchmark to validate the output of the Empatica E4 at the signal level and at the feature level. Our results showed high accuracy in estimating the mean HR (median error: 3.2%), poor accuracy for short-term HRV features (e.g., median error: 64% for high-frequency power), and mild accuracy for longer-term HRV features (e.g., median error: 25% for low-frequency power). These error levels could be reduced by using our quality index to identify time windows with little or no data loss (median errors: 0.0%, 27%, and 6.4% respectively, when no sample was missing). This quality index should be useful in future work to extract reliable cardiac features from real-life measurements, or to conduct a field validation study on wearable cardiac sensors.

Influence of a 100-mile ultramarathon on the heart rate and the heart rate variability

10.21203/rs.3.rs-19496/v1 ◽

2020 ◽

Author(s):

Simone Schrieber ◽

Christian Paech ◽

Jan Wüstenfeld ◽

Ingo Dähnert ◽

Bernd Wolfarth ◽

...

Keyword(s):

Body Mass Index ◽

Heart Rate ◽

Heart Rate Variability ◽

Sympathetic Activity ◽

Baseline Heart Rate ◽

Significant Drop ◽

Rr Intervals ◽

Suitable Parameter ◽

The Impact

INTRODUCTION: The aim of this study was to investigate the impact of an ultra-marathon (UM) with a distance of 100 miles on heart rate (HR) and heart rate variability (HRV). METHODS: Altogether, 28 runners (25 men and 3 women) underwent 24-hour long-term ECG monitoring one week before the UM (U1), immediately after it (U2) and after a week of recovery (U3). The influence of age, body mass index (BMI), HR and HRV on the run time and on recovery was investigated. RESULTS: A rise in the baseline heart rate accompanied by a significant drop in SDNN values (the standard deviation of all normal RR intervals) was found. Except for the age of the runners, BMI, HR and HRV did not predict the competition time. Full return of HRV to the athlete’s individual baseline did not occur within one week. There were no significant differences between finishers and non-finishers in the analyzed parameters. CONCLUSION: The present results show that a 100-mile run leads to an increase in sympathetic activity and thus to an increase in heart rate and a decrease in HRV. In addition, HRV seems to be a suitable parameter to evaluate full recovery after a 100-mile run.

Influence of a 100-mile ultramarathon on heart rate and heart rate variability

BMJ Open Sport & Exercise Medicine ◽

10.1136/bmjsem-2020-001005 ◽

2021 ◽

Vol 7 (2) ◽

pp. e001005

Author(s):

Christian Paech ◽

Simone Schrieber ◽

Ingo Daehnert ◽

Paul Jürgen Schmidt-Hellinger ◽

Bernd Wolfarth ◽

...

Keyword(s):

Body Mass Index ◽

Heart Rate ◽

Heart Rate Variability ◽

Sympathetic Activity ◽

Ecg Monitoring ◽

Significant Drop ◽

Rr Intervals ◽

Suitable Parameter ◽

The Status ◽

The Impact

Aims: This study aimed to investigate the impact of an ultramarathon (UM) with a distance of 100 miles on heart rate (HR) and heart rate variability (HRV). Methods: 28 runners (25 men and 3 women) underwent 24-hour Holter ECG monitoring 1 week before the UM, immediately after the UM and after a week of recovery. The influence of age, body mass index (BMI), HR and HRV on the run time and recovery was investigated. Results: A rise in the baseline HR (18.98%) immediately after the run, accompanied by a significant drop in the SD of all normal RR intervals (7.12%) 1 week after, was observed. Except for the runners’ age, BMI, HR and HRV showed no influence on the competition time. Full return of HRV to the athletes’ baseline did not occur within 1 week. There were no significant differences between finishers and non-finishers in the analysed parameters. Conclusion: The present results show that a 100-mile run leads to an increase in sympathetic activity and thus to an increase in HR and a decrease in HRV. Also, HRV might be a suitable parameter to evaluate the state of recovery after a 100-mile run but does not help to quantify the status of recovery, as the damage to the tendomuscular system primarily characterises this after completing a UM.

Comparison of pediatric scoring systems for mortality in septic patients and the impact of missing information on their predictive power: a retrospective analysis

PeerJ ◽

10.7717/peerj.9993 ◽

2020 ◽

Vol 8 ◽

pp. e9993

Author(s):

Christian Niederwanger ◽

Thomas Varga ◽

Tobias Hell ◽

Daniel Stuerzel ◽

Jennifer Prem ◽

...

Keyword(s):

Missing Data ◽

Missing Values ◽

Scoring Systems ◽

Prognostic Scores ◽

Specific Patient ◽

Icu Admission ◽

Disease Survey ◽

Patient Prognosis ◽

Pediatric Sepsis ◽

The Impact

Background: Scores can assess the severity and course of disease and predict outcome in an objective manner. This information is needed for proper risk assessment and stratification. Furthermore, scoring systems support optimal patient care and resource management, and are gaining in importance in terms of artificial intelligence. Objective: This study evaluated and compared the prognostic ability of various common pediatric scoring systems (PRISM, PRISM III, PRISM IV, PIM, PIM2, PIM3, PELOD, PELOD 2) in order to determine which is the most applicable score for pediatric sepsis patients in terms of timing of disease survey and insensitivity to missing data. Methods: We retrospectively examined data from 398 patients under 18 years of age who were diagnosed with sepsis. Scores were assessed at ICU admission and re-evaluated on the day of peak C-reactive protein. The scores were compared for their ability to predict mortality in this specific patient population and for their impairment due to missing data. Results: PIM (AUC 0.76 (0.68–0.76)), PIM2 (AUC 0.78 (0.72–0.78)) and PIM3 (AUC 0.76 (0.68–0.76)) scores together with PRISM III (AUC 0.75 (0.68–0.75)) and PELOD 2 (AUC 0.75 (0.66–0.75)) are the most suitable scores for determining patient prognosis at ICU admission. Once sepsis is pronounced, PELOD 2 (AUC 0.84 (0.77–0.91)) and PRISM IV (AUC 0.8 (0.72–0.88)) become significantly better in their performance and count among the best prognostic scores for use at this time together with PRISM III (AUC 0.81 (0.73–0.89)). PELOD 2 is good for monitoring and, like the PIM scores, is also largely insensitive to missing values. Conclusion: Overall, PIM scores show comparatively good performance, are stable as far as timing of the disease survey is concerned, and are also relatively stable in terms of missing parameters. PELOD 2 is best suited for monitoring the clinical course.

Missing data: the impact of what is not there

Acta Endocrinologica ◽

10.1530/eje-20-0732 ◽

2020 ◽

Vol 183 (4) ◽

pp. E7-E9

Author(s):

Rolf H H Groenwold ◽

Olaf M Dekkers

Keyword(s):

Missing Data ◽

Clinical Research ◽

Missing Values ◽

The Impact

The validity of clinical research is potentially threatened by missing data. Any variable measured in a study can have missing values, including the exposure, the outcome, and confounders. When missing values are ignored in the analysis, only those subjects with complete records will be included in the analysis. This may lead to biased results and loss of power. We explain why missing data may lead to bias and discuss a commonly used classification of missing data.
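
The effect of ignoring missing values, so that only subjects with complete records enter the analysis, can be illustrated with a small sketch. The toy records, column meanings, and the helper name `complete_cases` are hypothetical and chosen only for illustration; NaN is assumed to mark a missing value.

```python
import numpy as np

# Hypothetical toy records: columns are exposure, outcome, confounder.
# NaN marks a missing value.
records = np.array([
    [1.0, 0.0, 34.0],
    [0.0, np.nan, 51.0],
    [1.0, 1.0, np.nan],
    [0.0, 0.0, 47.0],
    [np.nan, 1.0, 29.0],
])

def complete_cases(data):
    """Return only the rows with no missing value in any column."""
    return data[~np.isnan(data).any(axis=1)]

cc = complete_cases(records)
print(f"{cc.shape[0]} of {records.shape[0]} subjects retained")
```

Here a single missing value in any variable removes the whole subject, which shows how quickly the usable sample can shrink under a complete-case analysis.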

Using the CES-D scale in a large cohort study and dealing with missing data: Application to the French E3N cohort

European Psychiatry ◽

10.1016/s0924-9338(11)72279-9 ◽

2011 ◽

Vol 26 (S2) ◽

pp. 572-572

Author(s):

N. Resseguier ◽

H. Verdoux ◽

F. Clavel-Chapelon ◽

X. Paoletti

Keyword(s):

Sensitivity Analysis ◽

Missing Data ◽

Multiple Imputation ◽

Missing Values ◽

Large Population ◽

Missing At Random ◽

Population Based ◽

Missing Value ◽

Perform Sensitivity Analysis ◽

The Impact

Introduction: The CES-D scale is commonly used to assess depressive symptoms (DS) in large population-based studies. Missing values in items of the scale may create biases. Objectives: To explore reasons for not completing items of the CES-D scale and to perform a sensitivity analysis of the prevalence of DS to assess the impact of different missing data hypotheses. Methods: 71412 women included in the French E3N cohort returned in 2005 a questionnaire containing the CES-D scale. 45% presented at least one missing value in the scale. An interview study was carried out on a random sample of 204 participants to examine the different hypotheses for the missing value mechanism. The prevalence of DS was estimated according to different methods for handling missing values: complete cases analysis, single imputation, and multiple imputation under MAR (missing at random) and MNAR (missing not at random) assumptions. Results: The interviews showed that participants were not embarrassed to fill in questions about DS. Potential reasons for nonresponse were identified. MAR and MNAR hypotheses remained plausible and were explored. Among complete responders, the prevalence of DS was 26.1%. After multiple imputation under the MAR assumption, it was 28.6%, 29.8% and 31.7% among women presenting up to 4, up to 10 and up to 20 missing values, respectively. The estimates were robust after applying various scenarios of MNAR data for the sensitivity analysis. Conclusions: The CES-D scale can easily be used to assess DS in large cohorts. Multiple imputation under the MAR assumption allows missing values to be handled reliably.

Error Estimation of Ultra-Short Heart Rate Variability Parameters: Effect of Missing Data Caused by Motion Artifacts

Sensors ◽

10.3390/s20247122 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7122

Author(s):

Alessio Rossi ◽

Dino Pedreschi ◽

David A. Clifton ◽

Davide Morelli

Keyword(s):

Heart Rate ◽

Heart Rate Variability ◽

Missing Values ◽

Mean Squared Error ◽

Time Window ◽

Small Error ◽

Motion Artifacts ◽

Window Length ◽

Short Time Window ◽

Short Time

Application of ultra-short Heart Rate Variability (HRV) is desirable in order to increase the applicability of HRV features to wrist-worn wearable devices equipped with heart rate sensors, which are becoming more and more popular in people’s daily life. This study focuses in particular on the two most used HRV parameters, i.e., the standard deviation of inter-beat intervals (SDNN) and the root mean square of successive inter-beat interval differences (rMSSD). The main problem in extracting these HRV parameters from wrist-worn devices is that their data are affected by motion artifacts. For this reason, estimating the error caused by the resulting large quantity of missing values is fundamental to obtaining reliable HRV parameters from these devices. To this aim, we simulate missing values induced by motion artifacts (from 0 to 70%) in ultra-short time windows (i.e., from 4 min to 30 s) by the random walk Gilbert burst model in 22 young healthy subjects. In addition, 30 s and 2 min ultra-short time windows are required to estimate rMSSD and SDNN, respectively. Moreover, because an ultra-short time window does not permit assessing very low frequencies, and the SDNN is highly affected by these frequencies, the bias in estimating SDNN continues to increase as the time window length decreases. On the contrary, a small error is detected in rMSSD down to 30 s because it is highly affected by high frequencies, which can still be evaluated as the time window length decreases. Finally, the missing values have a small effect on rMSSD and SDNN estimation: the HRV parameter errors increase slightly as the percentage of missing values increases.
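
A rough sketch of the ingredients named above: bursty missingness from a Gilbert-type model, and the SDNN and rMSSD parameters. It assumes a simple two-state (good/bad) Markov chain in which every sample in the bad state is lost, which may differ from the "random walk" variant used in the study; the transition probabilities, the synthetic RR series, and all function names are hypothetical.

```python
import numpy as np

def gilbert_missing_mask(n, p_good_to_bad, p_bad_to_good, rng):
    """Two-state Markov chain; samples drawn in the 'bad' state are marked missing."""
    mask = np.zeros(n, dtype=bool)
    bad = False  # start in the 'good' state
    for i in range(n):
        if bad:
            bad = rng.random() >= p_bad_to_good  # stay bad unless we recover
        else:
            bad = rng.random() < p_good_to_bad   # enter a burst of loss
        mask[i] = bad
    return mask

def sdnn(rr):
    """Standard deviation of inter-beat intervals (sample estimate, ddof=1)."""
    return float(np.std(np.asarray(rr, dtype=float), ddof=1))

def rmssd(rr):
    """Root mean square of successive inter-beat interval differences."""
    diffs = np.diff(np.asarray(rr, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

rng = np.random.default_rng(0)
rr = 800.0 + 50.0 * rng.standard_normal(300)  # synthetic RR series in ms
missing = gilbert_missing_mask(rr.size, 0.05, 0.20, rng)

# Simplification: missing samples are dropped, so differences that span a gap
# are treated as if the beats were successive.
rr_kept = rr[~missing]
print(f"missing: {missing.mean():.0%}")
print(f"SDNN  full={sdnn(rr):.1f}  kept={sdnn(rr_kept):.1f}")
print(f"rMSSD full={rmssd(rr):.1f}  kept={rmssd(rr_kept):.1f}")
```

For this two-state chain the long-run fraction of missing samples is `p_good_to_bad / (p_good_to_bad + p_bad_to_good)`, so the values chosen here target roughly 20% loss; raising `p_good_to_bad` or lowering `p_bad_to_good` gives more frequent or longer bursts.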

Influence of a 100-mile Ultramarathon on Heart Rate and Heart Rate Variability

10.21203/rs.3.rs-19496/v2 ◽

2020 ◽

Author(s):

Christian Paech ◽

Simone Schrieber ◽

Ingo Dähnert ◽

Paul Schmidt-Hellinger ◽

Bernd Wolfarth ◽

...

Keyword(s):

Body Mass Index ◽

Heart Rate ◽

Heart Rate Variability ◽

Sympathetic Activity ◽

Baseline Heart Rate ◽

Ecg Monitoring ◽

Significant Drop ◽

Rr Intervals ◽

Suitable Parameter ◽

The Impact

BACKGROUND: The aim of this study was to investigate the impact of an ultra-marathon (UM) with a distance of 100 miles on heart rate (HR) and heart rate variability (HRV). METHODS: 28 runners (25 males and 3 females) underwent 24-hour Holter ECG monitoring one week before the UM (U1), immediately after the UM (U2) and after a week of recovery (U3). The influence of age, body mass index (BMI), HR and HRV on the run time and on recovery was investigated. RESULTS: A rise in the baseline heart rate (18.98%) immediately after the run, accompanied by a significant drop in the standard deviation of all normal RR intervals (SDNN) (7.12%) one week after, was observed. Except for the age of the runners, BMI, HR and HRV showed no influence on the competition time. Full return of HRV to the athletes' individual baseline did not occur within one week. There were no significant differences between finishers and non-finishers in the analysed parameters. CONCLUSION: The present results show that a 100-mile run leads to an increase in sympathetic activity and thus to an increase in heart rate and a decrease in HRV. In addition, HRV might be a suitable parameter to evaluate full recovery after a 100-mile run.

Imputing Biomarker Status from RWE Datasets—A Comparative Study

Journal of Personalized Medicine ◽

10.3390/jpm11121356 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1356

Author(s):

Carlos Traynor ◽

Tarjinder Sahota ◽

Helen Tomkinson ◽

Ignacio Gonzalez-Garcia ◽

Neil Evans ◽

...

Keyword(s):

Missing Data ◽

Missing Values ◽

Synthetic Data ◽

Generative Adversarial Networks ◽

Adversarial Networks ◽

Predictive Mean Matching ◽

Real World Evidence ◽

Expectation Maximisation ◽

Incomplete Datasets ◽

The Impact

Missing data is a universal problem in analysing Real-World Evidence (RWE) datasets. In RWE datasets, there is a need to understand which features best correlate with clinical outcomes. In this context, the missing status of several biomarkers may appear as gaps in the dataset that hide meaningful values for analysis. Imputation methods are general strategies that replace missing values with plausible values. Using the Flatiron NSCLC dataset, including more than 35,000 subjects, we compare the imputation performance of six such methods on missing data: predictive mean matching, expectation-maximisation, factorial analysis, random forest, generative adversarial networks and multivariate imputations with tabular networks. We also conduct extensive synthetic data experiments with structural causal models. Statistical learning from incomplete datasets should select an appropriate imputation algorithm accounting for the nature of missingness, the impact of missing data, and the distribution shift induced by the imputation algorithm. For our synthetic data experiments, tabular networks had the best overall performance. Methods using neural networks are promising for complex datasets with non-linearities. However, conventional methods such as predictive mean matching work well for the Flatiron NSCLC biomarker dataset.
