Cox regression increases power to detect genotype-phenotype associations in genomic studies using the electronic health record

bioRxiv ◽

10.1101/599910 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jacob J. Hughey ◽

Seth D. Rhoades ◽

Darwin Y. Fu ◽

Lisa Bastarache ◽

Joshua C. Denny ◽

...

Keyword(s):

Logistic Regression ◽

Electronic Health Records ◽

Healthcare System ◽

Cox Regression ◽

Cox Proportional Hazards ◽

Right Censoring ◽

Type I ◽

Health Records ◽

Wide Range ◽

Electronic Health

Abstract Background The growth of DNA biobanks linked to data from electronic health records (EHRs) has enabled the discovery of numerous associations between genomic variants and clinical phenotypes. Nonetheless, although clinical data are generally longitudinal, standard approaches for detecting genotype-phenotype associations in such linked data, notably logistic regression, do not naturally account for the times at which events occur. Here we explored the advantages of quantifying associations using Cox proportional hazards regression, which can account for the age at which a patient first visited the healthcare system (left truncation) and the age at which a patient either last visited the healthcare system or acquired a particular phenotype (right censoring). Results Using simulated data, we found that, compared to logistic regression, Cox regression had greater power at equivalent Type I error. We then scanned for genotype-phenotype associations using logistic regression and Cox regression on 50 phenotypes derived from the electronic health records of 49 792 genotyped individuals. In terms of effect sizes, the hazard ratios estimated by Cox regression were nearly identical to the odds ratios estimated by logistic regression. Consistent with the findings from our simulations, Cox regression had approximately 10% greater relative sensitivity for detecting known associations from the NHGRI-EBI GWAS Catalog. Conclusions As longitudinal health-related data continue to grow, Cox regression may improve our ability to identify the genetic basis for a wide range of human phenotypes.

Cox regression increases power to detect genotype-phenotype associations in genomic studies using the electronic health record

BMC Genomics ◽

10.1186/s12864-019-6192-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 4

Author(s):

Jacob J. Hughey ◽

Seth D. Rhoades ◽

Darwin Y. Fu ◽

Lisa Bastarache ◽

Joshua C. Denny ◽

...

Keyword(s):

Logistic Regression ◽

Healthcare System ◽

Proportional Hazards ◽

Cox Regression ◽

Relative Sensitivity ◽

Cox Proportional Hazards ◽

Right Censoring ◽

Type I ◽

Wide Range ◽

Electronic Health

Abstract Background The growth of DNA biobanks linked to data from electronic health records (EHRs) has enabled the discovery of numerous associations between genomic variants and clinical phenotypes. Nonetheless, although clinical data are generally longitudinal, standard approaches for detecting genotype-phenotype associations in such linked data, notably logistic regression, do not naturally account for variation in the period of follow-up or the time at which an event occurs. Here we explored the advantages of quantifying associations using Cox proportional hazards regression, which can account for the age at which a patient first visited the healthcare system (left truncation) and the age at which a patient either last visited the healthcare system or acquired a particular phenotype (right censoring). Results In comprehensive simulations, we found that, compared to logistic regression, Cox regression had greater power at equivalent Type I error. We then scanned for genotype-phenotype associations using logistic regression and Cox regression on 50 phenotypes derived from the EHRs of 49,792 genotyped individuals. Consistent with the findings from our simulations, Cox regression had approximately 10% greater relative sensitivity for detecting known associations from the NHGRI-EBI GWAS Catalog. In terms of effect sizes, the hazard ratios estimated by Cox regression were strongly correlated with the odds ratios estimated by logistic regression. Conclusions As longitudinal health-related data continue to grow, Cox regression may improve our ability to identify the genetic basis for a wide range of human phenotypes.
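The left truncation and right censoring described in this abstract can be made concrete with a small sketch. It is illustrative only and is not the authors' code: the record layout, the function names, and the three toy patients are assumptions. Each patient contributes an interval from age at first visit to age at diagnosis or last visit, and a patient counts toward the risk set at an event age only if already under observation at that age.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Interval(NamedTuple):
    entry: float  # age at first visit (left truncation)
    exit: float   # age at diagnosis, or age at last visit if censored
    event: bool   # True if the phenotype was acquired during follow-up


def to_interval(first_visit_age: float, last_visit_age: float,
                diagnosis_age: Optional[float] = None) -> Interval:
    """Turn raw ages into a (entry, exit, event) record (hypothetical layout)."""
    if diagnosis_age is not None and diagnosis_age <= first_visit_age:
        # Prevalent cases need separate handling; this sketch rejects them.
        raise ValueError("diagnosis precedes entry into the health system")
    if diagnosis_age is not None and diagnosis_age <= last_visit_age:
        return Interval(first_visit_age, diagnosis_age, True)
    return Interval(first_visit_age, last_visit_age, False)  # right-censored


def log_partial_likelihood(beta: float, intervals: Sequence[Interval],
                           genotypes: Sequence[float]) -> float:
    """Cox log partial likelihood with delayed entry, one covariate, no ties."""
    total = 0.0
    for iv, g in zip(intervals, genotypes):
        if not iv.event:
            continue
        t = iv.exit
        # Risk set: patients who entered before t and are still followed at t.
        risk = sum(math.exp(beta * gj)
                   for ivj, gj in zip(intervals, genotypes)
                   if ivj.entry < t <= ivj.exit)
        total += beta * g - math.log(risk)
    return total


# Toy patients (made up): the third enters at 55, after the event at age 50,
# so left truncation keeps that patient out of the risk set for the event.
patients = [to_interval(20, 60, diagnosis_age=50),
            to_interval(30, 60),
            to_interval(55, 70)]
allele_counts = [1, 0, 2]
print(log_partial_likelihood(0.0, patients, allele_counts))
```

Logistic regression would reduce each of these patients to a single case or control label, whereas the interval representation keeps the ages at entry and exit. In practice one would fit the model with an established survival library rather than hand-written code.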

Sensitivity of Clinical Pediatric Obesity Diagnosis Documented in Electronic Health Records

Clinical Pediatrics ◽

10.1177/0009922820941640 ◽

2020 ◽

Vol 59 (14) ◽

pp. 1274-1281

Author(s):

Christine B. San Giovanni ◽

Myla Ebeling ◽

Robert A. Davis ◽

C. Shaun Wagner ◽

William T. Basco

Keyword(s):

Body Mass Index ◽

Logistic Regression ◽

Chronic Disease ◽

Electronic Health Records ◽

Pediatric Obesity ◽

Disease Diagnosis ◽

Health Records ◽

Academic Affiliation ◽

Specific Factors ◽

Electronic Health

Objective. This study tested the sensitivity of obesity diagnosis in electronic health records (EHRs) using body mass index (BMI) classification and identified variables associated with obesity diagnosis. Methods. Eligible children aged 2 to 18 years had a calculable BMI in 2017 and had at least 1 visit in 2016 and 2017. Sensitivity of clinical obesity diagnosis compared with children’s BMI percentile was calculated. Logistic regression was performed to determine variables associated with obesity diagnosis. Results. Analyses included 31 059 children with BMI at or above 95th percentile. Sensitivity of clinical obesity diagnosis was 35.81%. Clinical obesity diagnosis was more likely if the child had a well visit, had Medicaid insurance, was female, Hispanic or Black, had a chronic disease diagnosis, and saw a provider in a practice in an urban area or with academic affiliation. Conclusion. Sensitivity of clinical obesity diagnosis in EHR is low. Clinical obesity diagnosis is associated with nonmodifiable child-specific factors but also modifiable practice-specific factors.
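As a brief illustration of the sensitivity measure used in this abstract, the sketch below computes the share of children meeting the BMI criterion who also carry a clinical obesity diagnosis. The function name and the counts are hypothetical and are not taken from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly positive cases that were flagged: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_positives / total


# Hypothetical counts: children with BMI at or above the 95th percentile,
# split by whether a clinical obesity diagnosis was documented.
diagnosed = 40
undiagnosed = 60
print(f"{sensitivity(diagnosed, undiagnosed):.2%}")
```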

Determinants and extent of weight recording in UK primary care: an analysis of 5 million adults’ electronic health records from 2000 to 2017

BMC Medicine ◽

10.1186/s12916-019-1446-y ◽

2019 ◽

Vol 17 (1) ◽

Cited By ~ 5

Author(s):

B. D. Nicholson ◽

P. Aveyard ◽

C. R. Bankhead ◽

W. Hamilton ◽

F. D. R. Hobbs ◽

...

Keyword(s):

Primary Care ◽

Electronic Health Records ◽

Negative Binomial ◽

Cox Regression ◽

Clinical Care ◽

Negative Binomial Regression ◽

Routine Activity ◽

Health Records ◽

Clinical Events ◽

Electronic Health

Abstract Background Excess weight and unexpected weight loss are associated with multiple disease states and increased morbidity and mortality, but weight measurement is not routine in many primary care settings. The aim of this study was to characterise who has had their weight recorded in UK primary care, how frequently, by whom and in relation to which clinical events, symptoms and diagnoses. Methods A longitudinal analysis of UK primary care electronic health records (EHR) data from 2000 to 2017. Descriptive statistics were used to summarise weight recording in terms of patient sociodemographic characteristics, health professional encounters, clinical events, symptoms and diagnoses. Negative binomial regression was used to model the likelihood of having a weight record each year, and Cox regression to model the likelihood of repeated weight recording. Results A total of 14,049,871 weight records were identified in the EHR of 4,918,746 patients during the study period, representing 26,998,591 person-years of observation. Around a third of patients had a weight record each year. Forty-nine percent of weight records were repeated within a year with an average time to a repeat weight record of 1.92 years. Weight records were most often taken by nursing staff (38–42%) and GPs (37–39%) as part of routine clinical care, such as chronic disease reviews (16%), medication reviews (6–8%) and health checks (6–7%), or were associated with consultations for contraception (5–8%), respiratory disease (5%) and obesity (1%). Patient characteristics independently associated with an increased likelihood of weight recording were as follows: female sex, younger and older adults, non-drinkers, ex-smokers, low or high BMI, being more deprived, diagnosed with a greater number of comorbidities and consulting more frequently. The effect of policy-level incentives to record weight did not appear to be sustained after they were removed. Conclusion Weight recording is not a routine activity in UK primary care. It is recorded for around a third of patients each year and is repeated on average every 2 years for these patients. It is more common in females with higher BMI and in those with comorbidity. Incentive payments and their removal appear to be associated with increases and decreases in weight recording.

Drug allergies documented in electronic health records of a large healthcare system

Allergy ◽

10.1111/all.12881 ◽

2016 ◽

Vol 71 (9) ◽

pp. 1305-1313 ◽

Cited By ~ 72

Author(s):

L. Zhou ◽

N. Dhopeshwarkar ◽

K. G. Blumenthal ◽

F. Goss ◽

M. Topaz ◽

...

Keyword(s):

Electronic Health Records ◽

Healthcare System ◽

Health Records ◽

Electronic Health

AutoScore: A Machine Learning–Based Automatic Clinical Score Generator and Its Application to Mortality Prediction Using Electronic Health Records

JMIR Medical Informatics ◽

10.2196/21798 ◽

2020 ◽

Vol 8 (10) ◽

pp. e21798 ◽

Cited By ~ 1

Author(s):

Feng Xie ◽

Bibhas Chakraborty ◽

Marcus Eng Hock Ong ◽

Benjamin Alan Goldstein ◽

Nan Liu

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Electronic Health Records ◽

Clinical Score ◽

Mortality Prediction ◽

Fine Tuning ◽

Health Records ◽

Scoring Model ◽

Benchmark Database ◽

Electronic Health

Background Risk scores can be useful in clinical risk stratification and accurate allocations of medical resources, helping health providers improve patient care. Point-based scores are more understandable and explainable than other complex models and are now widely used in clinical decision making. However, the development of the risk scoring model is nontrivial and has not yet been systematically presented, with few studies investigating methods of clinical score generation using electronic health records. Objective This study aims to propose AutoScore, a machine learning–based automatic clinical score generator consisting of 6 modules for developing interpretable point-based scores. Future users can employ the AutoScore framework to create clinical scores effortlessly in various clinical applications. Methods We proposed the AutoScore framework comprising 6 modules that included variable ranking, variable transformation, score derivation, model selection, score fine-tuning, and model evaluation. To demonstrate the performance of AutoScore, we used data from the Beth Israel Deaconess Medical Center to build a scoring model for mortality prediction and then compared it with other baseline models using the receiver operating characteristic analysis. A software package in R 3.5.3 (R Foundation) was also developed to demonstrate the implementation of AutoScore. Results Implemented on the data set with 44,918 individual admission episodes of intensive care, the AutoScore-created scoring models performed comparably to other standard methods (ie, logistic regression, stepwise regression, least absolute shrinkage and selection operator, and random forest) in terms of predictive accuracy and model calibration but required fewer predictors and presented high interpretability and accessibility. The nine-variable, AutoScore-created, point-based scoring model achieved an area under the curve (AUC) of 0.780 (95% CI 0.764-0.798), whereas the model of logistic regression with 24 variables had an AUC of 0.778 (95% CI 0.760-0.795). Moreover, the AutoScore framework also drives the clinical research continuum and automation with its integration of all necessary modules. Conclusions We developed an easy-to-use, machine learning–based automatic clinical score generator, AutoScore; systematically presented its structure; and demonstrated its superiority (predictive performance and interpretability) over other conventional methods using a benchmark database. AutoScore will emerge as a potential scoring tool in various medical applications.
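The AUC values quoted in this abstract summarise how well a score ranks patients who died above those who survived. A minimal sketch of that quantity follows, using the pairwise (Mann-Whitney) definition. It is not part of the AutoScore package; the function name and the toy labels and scores are assumptions made for illustration.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a random positive
    case scores higher than a random negative case (ties count one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (made up): 1 = died, 0 = survived, with a point-based risk score.
outcomes = [0, 0, 1, 1]
risk_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(outcomes, risk_scores))
```

The double loop is quadratic in the number of cases, which is fine for a sketch; rank-based implementations are preferable for data sets of the size described here.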

Estimate of disease heritability using 7.4 million familial relationships inferred from electronic health records

10.1101/066068 ◽

2016 ◽

Cited By ~ 3

Author(s):

Fernanda Polubriaginof ◽

Rami Vanguri ◽

Kayla Quinnies ◽

Gillian M. Belbin ◽

Alexandre Yahi ◽

...

Keyword(s):

Electronic Health Records ◽

Patient Privacy ◽

Next Of Kin ◽

Health Records ◽

Familial Relationships ◽

Wide Range ◽

Causes Of Disease ◽

Contact Data ◽

Electronic Health ◽

Patient Emergency

Abstract Heritability is essential for understanding the biological causes of disease, but requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHR) passively capture a wide range of clinically relevant data and provide a novel resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now, these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified millions of familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically-derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a novel validation of the use of EHRs for genetics and disease research. One Sentence Summary We demonstrate that next-of-kin information can be used to identify familial relationships in the EHR, providing unique opportunities for precision medicine studies.

Visual Analytics for Dimension Reduction and Cluster Analysis of High Dimensional Electronic Health Records

Informatics ◽

10.3390/informatics7020017 ◽

2020 ◽

Vol 7 (2) ◽

pp. 17 ◽

Cited By ~ 1

Author(s):

Sheikh S. Abdullah ◽

Neda Rostamzadeh ◽

Kamran Sedig ◽

Amit X. Garg ◽

Eric McArthur

Keyword(s):

Cluster Analysis ◽

Electronic Health Records ◽

Dimension Reduction ◽

Visual Analytics ◽

Machine Learning Techniques ◽

High Dimensional ◽

Health Records ◽

Wide Range ◽

Electronic Health ◽

And Cluster Analysis

Recent advances in electronic health record (EHR) systems have produced data at an unprecedented rate. The complex, growing, and high-dimensional data available in EHRs create great opportunities for machine learning techniques such as clustering. Cluster analysis often requires dimension reduction to achieve efficient processing time and mitigate the curse of dimensionality. Given a wide range of techniques for dimension reduction and cluster analysis, it is not straightforward to identify which combination of techniques from both families leads to the desired result. The ability to derive useful and precise insights from EHRs requires a deeper understanding of the data, intermediary results, configuration parameters, and analysis processes. Although these tasks are often tackled separately in existing studies, we present a visual analytics (VA) system, called Visual Analytics for Cluster Analysis and Dimension Reduction of High Dimensional Electronic Health Records (VALENCIA), to address the challenges of high-dimensional EHRs in a single system. VALENCIA brings together a wide range of cluster analysis and dimension reduction techniques, integrates them seamlessly, and makes them accessible to users through interactive visualizations. It offers a balanced distribution of processing load between users and the system to facilitate the performance of high-level cognitive tasks in such a way that would be difficult without the aid of a VA system. Through a real case study, we have demonstrated how VALENCIA can be used to analyze the healthcare administrative dataset stored at ICES. This research also highlights what needs to be considered in the future when developing VA systems that are designed to derive deep and novel insights into EHRs.

Challenges in defining Long COVID: Striking differences across literature, Electronic Health Records, and patient-reported information

10.1101/2021.03.20.21253896 ◽

2021 ◽

Author(s):

Halie M. Rando ◽

Tellen D. Bennett ◽

James Brian Byrd ◽

Carolyn Bramante ◽

Tiffany J. Callahan ◽

...

Keyword(s):

Electronic Health Records ◽

Multiple Organ ◽

Health Records ◽

Health Crisis ◽

Organ Systems ◽

Wide Range ◽

Patient Reported ◽

Electronic Health ◽

Novel Coronavirus

Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. As the worldwide scientific community forges ahead with efforts to characterize a wide range of outcomes associated with SARS-CoV-2 infection, the proliferation of available data has made it clear that formal definitions are needed to design robust studies of Long COVID that consistently capture variation in long-term outcomes. In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.

EHRtemporalVariability: delineating temporal dataset shifts in electronic health records

10.1101/2020.04.07.20056564 ◽

2020 ◽

Author(s):

Carlos Sáez ◽

Alba Gutiérrez-Sacristán ◽

Isaac Kohane ◽

Juan M García-Gómez ◽

Paul Avillach

Keyword(s):

Electronic Health Records ◽

R Package ◽

Reliable Data ◽

Data Reuse ◽

Statistical Distributions ◽

Health Records ◽

Link Type ◽

Wide Range ◽

Electronic Health ◽

Over Time

Abstract Background Temporal variability in healthcare processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal dataset shifts can present as trends or as abrupt or seasonal changes in the statistical distributions of data over time, and are particularly complex to address in multi-modal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large historical data from EHRs, there is a need for specific software methods to help delineate temporal dataset shifts to ensure reliable data reuse. Findings EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal dataset shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time, projects their temporal evolution through non-parametric Information Geometric Temporal plots, and enables the exploration of changes in variables through Data Temporal Heatmaps. We demonstrate the capability of EHRtemporalVariability to delineate dataset shifts in three impact case studies, one of them available for reproducibility. Conclusions EHRtemporalVariability enables the exploration and identification of dataset shifts, contributing to the broad examination and repurposing of large, longitudinal datasets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is suited to technical users programmatically using the R package and to those users not familiar with programming using the Shiny user interface. Availability https://github.com/hms-dbmi/EHRtemporalVariability/ Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html On-line demo: http://ehrtemporalvariability.upv.es/

Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification

10.1101/2019.12.26.19015859 ◽

2019 ◽

Author(s):

Lauren J. Beesley ◽

Bhramar Mukherjee

Keyword(s):

Electronic Health Records ◽

Selection Bias ◽

Type I Error ◽

Association Studies ◽

Disease Status ◽

Patient Specific ◽

Type I ◽

Health Records ◽

Electronic Health ◽

New Strategies

Abstract Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood-based bias correction strategies. A distinguishing feature of the EHR setting is that misclassification may be related to patient-specific factors, and the proposed methods leverage data in the EHR to estimate misclassification rates without gold standard labels. For addressing selection bias, we describe how calibration and inverse probability weighting methods from the survey sampling literature can be extended and applied to the EHR setting. Addressing misclassification and selection biases simultaneously is a more challenging problem than dealing with each on its own, and we propose several new strategies to address this situation. For all methods proposed, we derive valid standard errors and provide software for implementation. We provide a new suite of statistical estimation and inference strategies for addressing misclassification and selection bias simultaneously that is tailored to problems arising in EHR data analysis. We apply these methods to data from The Michigan Genomics Initiative (MGI), a longitudinal EHR-linked biorepository.
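The inverse probability weighting idea mentioned in this abstract can be sketched in its most basic survey-sampling form. This is a generic normalised weighted mean, not the extended estimators the authors propose. The function name is hypothetical, and the selection probabilities are assumed to be known here, whereas in an EHR setting they would have to be estimated.

```python
from typing import Sequence


def ipw_mean(values: Sequence[float], selection_probs: Sequence[float]) -> float:
    """Weighted mean in which each sampled patient is weighted by 1 / P(selected)."""
    if len(values) != len(selection_probs):
        raise ValueError("values and selection_probs must have equal length")
    weights = []
    for p in selection_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("selection probabilities must lie in (0, 1]")
        weights.append(1.0 / p)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Toy data (made up): disease indicators for three sampled patients. The
# second patient comes from a group that is under-represented in the sample,
# so that observation receives the largest weight.
disease = [1, 0, 1]
prob_selected = [0.5, 0.25, 0.5]
print(ipw_mean(disease, prob_selected))
```

The unweighted prevalence in this toy sample is 2/3; up-weighting the under-sampled patient pulls the estimate toward what would be seen in the full population under the assumed selection probabilities.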
