Deep Learning Approach for Imputation of Missing Values in Actigraphy Data: Algorithm Development Study (Preprint)

2019 ◽  
Author(s):  
Jong-Hwan Jang ◽  
Junggu Choi ◽  
Hyun Woong Roh ◽  
Sang Joon Son ◽  
Chang Hyung Hong ◽  
...  

BACKGROUND Data collected by an actigraphy device worn on the wrist or waist can provide objective measurements for studies related to physical activity; however, some data may contain intervals where values are missing. In previous studies, statistical methods have been applied to impute missing values on the basis of statistical assumptions. Deep learning algorithms, however, can learn features from the data without any such assumptions and may outperform previous approaches in imputation tasks. OBJECTIVE The aim of this study was to impute missing values in actigraphy data using a deep learning approach. METHODS To develop an imputation model for missing values in accelerometer-based actigraphy data, a denoising convolutional autoencoder was adopted. We trained and tested our deep learning–based imputation model with the National Health and Nutrition Examination Survey data set and validated it with the external Korea National Health and Nutrition Examination Survey and the Korean Chronic Cerebrovascular Disease Oriented Biobank data sets, which consist of daily records of activity counts. The partial root mean square error and partial mean absolute error of the imputed intervals (partial RMSE and partial MAE, respectively) were calculated using our deep learning–based imputation model (zero-inflated denoising convolutional autoencoder) as well as using other approaches (mean imputation, zero-inflated Poisson regression, and Bayesian regression). RESULTS The zero-inflated denoising convolutional autoencoder exhibited a partial RMSE of 839.3 counts and partial MAE of 431.1 counts, whereas mean imputation achieved a partial RMSE of 1053.2 counts and partial MAE of 545.4 counts, the zero-inflated Poisson regression model achieved a partial RMSE of 1255.6 counts and partial MAE of 508.6 counts, and Bayesian regression achieved a partial RMSE of 924.5 counts and partial MAE of 605.8 counts.
CONCLUSIONS Our deep learning–based imputation model performed better than the other methods when imputing missing values in actigraphy data.
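The core idea above can be sketched in a few lines: a denoising convolutional autoencoder sees a daily activity-count sequence with the missing interval zero-filled (the "zero-inflated" encoding) and is trained to reconstruct the original signal, with the error evaluated only on the imputed positions (the "partial" RMSE/MAE). This is a minimal illustrative sketch, assuming minute-level daily sequences of 1440 counts; the layer sizes and kernel widths are placeholders, not the authors' reported architecture.

```python
# Minimal sketch of a zero-inflated denoising convolutional autoencoder for
# actigraphy imputation. Missing intervals are zero-filled on input; the loss
# is computed on the masked positions only, so the network learns to impute.
import torch
import torch.nn as nn

class DenoisingConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=9, stride=2,
                               padding=4, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=9, stride=2,
                               padding=4, output_padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingConvAE()
x = torch.rand(8, 1, 1440)            # batch of minute-level daily count sequences
mask = torch.zeros_like(x)
mask[:, :, 600:720] = 1.0             # simulate a 2-hour missing interval
corrupted = x * (1 - mask)            # zero-fill the missing interval on input
recon = model(corrupted)
# "Partial" loss: only the imputed (masked) positions contribute.
partial_mse = ((recon - x) ** 2 * mask).sum() / mask.sum()
```

In practice the model would be trained with this partial loss over many corrupted copies of complete days, then applied to days with genuinely missing intervals.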

10.2196/16113 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e16113


SLEEP ◽  
2020 ◽  
Vol 43 (Supplement_1) ◽  
pp. A402-A402
Author(s):  
S Williams ◽  
A Seixas ◽  
G Avirappattu ◽  
R Robbins ◽  
L Lough ◽  
...  

Abstract Introduction Epidemiologic data show strong associations between self-reported sleep duration and hypertension (HTN). Modeling these associations is suboptimal when using traditional logistic regression. In this study, we modeled the associations of sleep duration and HTN using a deep neural network. Methods Data were extracted from participants (n=38,540) in the National Health and Nutrition Examination Survey (2006-2016), a nationally representative study of the US civilian non-institutionalized population. Self-reported demographics, medical history, and sleep duration were determined from household interview questions. HTN was defined as SBP ≥ 130 mmHg and DBP ≥ 80 mmHg. We used a deep neural network architecture with three hidden layers, two input features, and one binary output to model associations of sleep duration with HTN. The input features were the hours of sleep (limited to between 4 and 10 hours) and its square; the output variable was HTN. Probability predictions were generated 100 times from resampled (with replacement) data and averaged. Results Participants ranged from 18 to 85 years old; 51% were female, 41% white, 22% black, 26% Hispanic, 46% married, and 25% had less than a high school education. The model showed that habitually sleeping 7 hours was associated with the lowest observed HTN probability (P=0.023%). HTN probabilities increased as sleep duration decreased (6 hrs=0.05%; 5 hrs=0.110%; 4 hrs=0.16%); HTN probabilities for long sleepers were 8 hrs=0.027, 9 hrs=0.024, and 10 hrs=0.022. Among whites, sleeping 7 or 9 hrs was associated with the lowest HTN probabilities (0.008 vs. 0.005); among blacks, the lowest HTN probability was associated with sleeping 8 hrs (0.07); and among Hispanics, with sleeping 7 hrs (0.04). Conclusion We found that habitually sleeping 7 hours confers the least risk for HTN. The probability of HTN varies as a function of an individual's sex and race/ethnicity.
Likewise, the finding that blacks experience the lowest HTN probability when they sleep habitually 8 hours is of great public health importance. Support This study was supported by funding from the NIH: R01MD007716, R01HL142066, R01AG056531, T32HL129953, K01HL135452, and K07AG052685.
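The modeling setup described in the Methods can be sketched as follows: a small feed-forward network with three hidden layers takes [sleep hours, sleep hours squared] as inputs and a binary hypertension label as output, and predicted probabilities are averaged over bootstrap resamples. The data here are synthetic and the layer widths are assumptions for illustration, not the study's configuration; the resample count is reduced from 100 for speed.

```python
# Illustrative sketch: 3-hidden-layer network on [hours, hours**2] with
# bootstrap-averaged probability predictions (synthetic data, assumed widths).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 2000
hours = rng.uniform(4, 10, n)                 # sleep duration, 4-10 h
# Synthetic U-shaped risk with its minimum near 7 h, mirroring the abstract.
p_htn = 0.05 + 0.02 * (hours - 7) ** 2
y = (rng.uniform(size=n) < p_htn).astype(int)
X = np.column_stack([hours, hours ** 2])      # the two input features

def bootstrap_probs(X, y, grid, n_boot=10):
    """Average predicted HTN probability over bootstrap resamples."""
    acc = np.zeros(len(grid))
    G = np.column_stack([grid, grid ** 2])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # resample with replacement
        clf = MLPClassifier(hidden_layer_sizes=(8, 8, 8),
                            max_iter=500, random_state=0)
        clf.fit(X[idx], y[idx])
        acc += clf.predict_proba(G)[:, 1]
    return acc / n_boot

grid = np.arange(4, 11)                       # 4..10 hours of sleep
probs = bootstrap_probs(X, y, grid)
```

Averaging over resamples smooths the probability curve and gives a crude sense of its stability, which is presumably why the authors repeated the fit on resampled data.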


2020 ◽  
Vol 12 (2) ◽  
pp. 264 ◽  
Author(s):  
Lianfa Li

Accurate estimation of fine particulate matter with diameter ≤2.5 μm (PM2.5) at a high spatiotemporal resolution is crucial for the evaluation of its health effects. Previous studies face multiple challenges including limited ground measurements and availability of spatiotemporal covariates. Although the multiangle implementation of atmospheric correction (MAIAC) retrieves satellite aerosol optical depth (AOD) at a high spatiotemporal resolution, massive non-random missingness considerably limits its application in PM2.5 estimation. Here, a deep learning approach, i.e., bootstrap aggregating (bagging) of autoencoder-based residual deep networks, was developed to make robust imputation of MAIAC AOD and further estimate PM2.5 at a high spatial (1 km) and temporal (daily) resolution. The base model consisted of autoencoder-based residual networks where residual connections were introduced to improve learning performance. Bagging of residual networks was used to generate ensemble predictions for better accuracy and uncertainty estimates. As a case study, the proposed approach was applied to impute daily satellite AOD and subsequently estimate daily PM2.5 in the Jing-Jin-Ji metropolitan region of China in 2015. The presented approach achieved competitive performance in AOD imputation (mean test R2: 0.96; mean test RMSE: 0.06) and PM2.5 estimation (test R2: 0.90; test RMSE: 22.3 μg/m3). In additional independent tests using ground AERONET AOD and PM2.5 measurements at the monitoring station of the U.S. Embassy in Beijing, this approach achieved high R2 (0.82–0.97). Compared with the state-of-the-art machine learning method, XGBoost, the proposed approach generated more reasonable spatial variation for predicted PM2.5 surfaces. Publicly available covariates used included meteorology, MERRA2 PBLH and AOD, coordinates, and elevation. Other covariates such as cloud fraction or land use were not used due to unavailability.
The results of validation and independent testing demonstrate the usefulness of the proposed approach for exposure assessment of PM2.5 using satellite AOD with massive missing values.
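The base learner and ensemble described above can be sketched as an autoencoder-style network with a residual (skip) connection from input to output, trained independently on bootstrap resamples and averaged at prediction time, with the spread across ensemble members serving as an uncertainty estimate. The feature count and layer widths below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: autoencoder-based residual network as the bagging base model.
import torch
import torch.nn as nn

class ResidualAE(nn.Module):
    def __init__(self, n_features=10, hidden=32, code=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, code), nn.ReLU())
        self.dec = nn.Sequential(nn.Linear(code, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_features))

    def forward(self, x):
        # Residual connection from input to decoder output, introduced
        # to ease learning as in the abstract's description.
        return self.dec(self.enc(x)) + x

def bagged_predict(models, x):
    """Average predictions of independently trained base models (bagging);
    the per-position std across members is a simple uncertainty estimate."""
    preds = torch.stack([m(x) for m in models])
    return preds.mean(dim=0), preds.std(dim=0)

# In practice each member is trained on its own bootstrap resample of the data.
models = [ResidualAE() for _ in range(5)]
x = torch.rand(4, 10)                 # 4 samples, 10 illustrative covariates
mean_pred, uncertainty = bagged_predict(models, x)
```

The ensemble mean gives the imputed AOD value, while the member disagreement provides the uncertainty estimate mentioned in the abstract.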

