A Nuisance-Free Inference Procedure Accounting for the Unknown Missingness with Application to Electronic Health Records

Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1154
Author(s):  
Jiwei Zhao ◽  
Chi Chen

We study how to conduct statistical inference in a regression model where the outcome variable is prone to missing values and the missingness mechanism is unknown. The model we consider may be a traditional setting or a modern high-dimensional setting, where a sparsity assumption is usually imposed and regularization is commonly used. Motivated by the fact that the missingness mechanism, albeit usually treated as a nuisance, is difficult to specify correctly, we adopt the conditional likelihood approach so that the nuisance can be completely ignored throughout our procedure. We establish the asymptotic theory of the proposed estimator and develop an easy-to-implement algorithm via a data manipulation strategy. In particular, under the high-dimensional setting where regularization is needed, we propose a data perturbation method for post-selection inference. The proposed methodology is especially appealing when the true missingness mechanism tends to be missing not at random, e.g., in patient-reported outcomes or real-world data such as electronic health records. The performance of the proposed method is evaluated by comprehensive simulation experiments as well as a study of the albumin level in the MIMIC-III database.

2021 ◽  
Vol 1 (3) ◽  
pp. 166-181
Author(s):  
Muhammad Adib Uz Zaman ◽  
Dongping Du

Electronic health records (EHRs) can be very difficult to analyze since they usually contain many missing values. To build an efficient predictive model, a complete dataset is necessary. An EHR usually contains high-dimensional longitudinal time series data. Most commonly used imputation methods do not consider the temporal information embedded in EHR data. Moreover, most time-dependent neural networks, such as recurrent neural networks (RNNs), inherently treat the time steps as equally spaced, which in many cases is not appropriate. This study presents a method using the gated recurrent unit (GRU), neural ordinary differential equations (ODEs), and Bayesian estimation to incorporate the temporal information and impute sporadically observed time series measurements in high-dimensional EHR data.
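To make the point about unequal time steps concrete, the following is a minimal sketch and not the GRU, neural ODE, and Bayesian method the abstract describes. It assumes hypothetical measurement times in hours and uses a simple exponential decay with an arbitrary rate as a stand-in for a continuous-time model: the longer the gap since the previous observation, the less weight the previous state would carry.

```python
import math

def time_gaps(times):
    """Return the elapsed time before each observation (0.0 for the first)."""
    return [0.0] + [t1 - t0 for t0, t1 in zip(times, times[1:])]

def decay_weights(gaps, rate=0.5):
    """Exponential decay factor per gap: longer gaps retain less of the previous state.

    The rate is an illustrative assumption, not a value taken from the study.
    """
    return [math.exp(-rate * g) for g in gaps]

# Hypothetical, irregularly spaced measurement times (hours since admission).
times = [0.0, 1.0, 4.0, 4.5]
gaps = time_gaps(times)          # time elapsed before each measurement
weights = decay_weights(gaps)    # one decay factor per measurement

for t, g, w in zip(times, gaps, weights):
    print(f"t={t:4.1f}h  gap={g:3.1f}h  weight on previous state={w:.3f}")
```

A model that treats every step as equally spaced would, in effect, use the same weight at each row, which ignores the difference between a 0.5-hour gap and a 3-hour gap.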


2018 ◽  
Vol 24 (3) ◽  
pp. 95-98
Author(s):  
Daphne Guinn ◽  
Erin E Wilhelm ◽  
Grazyna Lieberman ◽  
Sean Khozin

2018 ◽  
Vol 4 ◽  
pp. 205520761880465
Author(s):  
Tim Robbins ◽  
Sarah N Lim Choi Keung ◽  
Sailesh Sankar ◽  
Harpal Randeva ◽  
Theodoros N Arvanitis

Introduction: Electronic health records provide an unparalleled opportunity for the use of patient data that is routinely collected and stored, in order to drive research and develop an epidemiological understanding of disease. Diabetes, in particular, stands to benefit, being a data-rich, chronic-disease state. This article aims to provide an understanding of the extent to which the healthcare sector is using routinely collected and stored data to inform research and epidemiological understanding of diabetes mellitus. Methods: Narrative literature review of articles published in both the medical- and engineering-based informatics literature. Results: There has been a significant increase in the number of published papers that utilise electronic health records as a direct data source for diabetes research. These articles consider a diverse range of research questions. Internationally, the secondary use of electronic health records as a research tool is most prominent in the USA. The barriers most commonly described in research studies include missing values and misclassification, alongside challenges of establishing the generalisability of results. Discussion: Electronic health record research is an important and expanding area of healthcare research. Much of the research output remains in the form of conference abstracts and proceedings, rather than journal articles. There is enormous opportunity within the United Kingdom to develop these research methodologies, due to national patient identifiers. Such a healthcare context may enable UK researchers to overcome many of the barriers encountered elsewhere and thus to truly unlock the potential of electronic health records.


2020 ◽  
Vol 29 (11) ◽  
pp. 1373-1381
Author(s):  
John Tazare ◽  
Liam Smeeth ◽  
Stephen J. W. Evans ◽  
Elizabeth Williamson ◽  
Ian J. Douglas

2021 ◽  
Author(s):  
David Chushig-Muzo ◽  
Cristina Soguero-Ruiz ◽  
Pablo de Miguel Bohoyo ◽  
Inmaculada Mora-Jiménez

Abstract Background: The number of patients with chronic diseases such as diabetes and hypertension has reached alarming levels worldwide. These diseases increase the risk of developing acute complications and involve a substantial economic burden and demand for health resources. The widespread adoption of Electronic Health Records (EHRs) is opening great opportunities for supporting decision-making. Nevertheless, data extracted from EHRs are complex (heterogeneous, high-dimensional and usually noisy), hampering knowledge extraction with conventional approaches. Methods: We propose the use of the Denoising Autoencoder (DAE), a Machine Learning (ML) technique that transforms high-dimensional data into latent representations (LRs), thus addressing the main challenges with clinical data. We explore in this work how the combination of LRs with a visualization method can be used to map the patient data in a two-dimensional space, gaining knowledge about the distribution of patients with different chronic conditions. Furthermore, this representation can also be used to characterize the patient's health status evolution, which is of paramount importance in the clinical setting. Results: To obtain clinical LRs, we considered real-world data extracted from EHRs linked to the University Hospital of Fuenlabrada in Spain. Experimental results showed the great potential of DAEs to identify patients with clinical patterns linked to hypertension, diabetes and multimorbidity. The procedure allowed us to find patients with the same main chronic disease but different clinical characteristics. Thus, we identified two kinds of diabetic patients with differences in their drug therapy (insulin-dependent and non-insulin-dependent), and also a group of women affected by hypertension and gestational diabetes. We also present a proof of concept for mapping the health status evolution of synthetic patients when considering the most significant diagnoses and drugs associated with chronic patients. Conclusions: Our results highlighted the value of ML techniques to extract clinical knowledge, supporting the identification of patients with certain chronic conditions. Furthermore, the patient's health status progression in the two-dimensional space might be used as a tool for clinicians aiming to characterize health conditions and identify their more relevant clinical codes.
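The general idea of a denoising autoencoder can be sketched as follows. This is a minimal illustration under stated assumptions, not the architecture or training setup used in the study: it assumes binary indicator features (for example, presence of a diagnosis or drug code), masking noise, a single tanh hidden layer, and a two-unit bottleneck that stands in for the separate visualization step mentioned in the abstract. All function names, the toy data, and the hyperparameters are hypothetical.

```python
import numpy as np

def train_dae(X, n_hidden=2, noise_rate=0.3, lr=0.1, epochs=200, seed=0):
    """Train a one-hidden-layer denoising autoencoder with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))  # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))  # decoder weights
    b2 = np.zeros(d)
    for _ in range(epochs):
        # Masking noise: zero out a random subset of the input entries.
        mask = rng.random(X.shape) >= noise_rate
        X_noisy = X * mask
        H = np.tanh(X_noisy @ W1 + b1)   # latent representation of the noisy input
        X_hat = H @ W2 + b2              # linear reconstruction
        # Gradient of 0.5 * (squared error averaged over rows), measured
        # against the CLEAN input: this is what makes it "denoising".
        err = (X_hat - X) / n
        grad_W2 = H.T @ err
        grad_b2 = err.sum(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        grad_W1 = X_noisy.T @ dH
        grad_b1 = dH.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return W1, b1, W2, b2

def encode(X, W1, b1):
    """Map clean inputs to their latent representations (no noise at inference)."""
    return np.tanh(X @ W1 + b1)

# Toy data: rows are hypothetical patients, columns are hypothetical clinical codes.
X_demo = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
W1, b1, W2, b2 = train_dae(X_demo)
Z_demo = encode(X_demo, W1, b1)  # one 2-D point per patient
```

Each row of `Z_demo` is a two-dimensional point that could be plotted; patients with identical code profiles map to the same point, since encoding is deterministic once training is done.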


2012 ◽  
Vol 31 (3) ◽  
pp. 497-504
Author(s):  
Russell E. Glasgow ◽  
Robert M. Kaplan ◽  
Judith K. Ockene ◽  
Edwin B. Fisher ◽  
Karen M. Emmons
