Estimate of disease heritability using 7.4 million familial relationships inferred from electronic health records

2016 ◽  
Author(s):  
Fernanda Polubriaginof ◽  
Rami Vanguri ◽  
Kayla Quinnies ◽  
Gillian M. Belbin ◽  
Alexandre Yahi ◽  
...  

Abstract: Heritability is essential for understanding the biological causes of disease, but requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHR) passively capture a wide range of clinically relevant data and provide a novel resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now, these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified millions of familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically-derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a novel validation of the use of EHRs for genetics and disease research.
One Sentence Summary: We demonstrate that next-of-kin information can be used to identify familial relationships in the EHR, providing unique opportunities for precision medicine studies.

2019 ◽  
Author(s):  
Zhouzerui Liu ◽  
Nicholas Tatonetti

Abstract: Heritability is an important statistic for evaluating the genetic contribution to phenotypes. Estimating heritability, however, requires the laborious recruitment of a large number of relatives. Electronic health records (EHR) contain extensive information about relatives in emergency contact forms. Recently, we presented RIFTEHR, an algorithm for extracting relationships from EHR. Here, we present an updated version and reconstruct 4.2 million familial relationships from the latest New York-Presbyterian/Columbia University Irving Medical Center (CUIMC) EHR system. The updated set contains 30 percent more relationships than the previous version. We present a new implementation of RIFTEHR that runs in linear time and thus substantially improves the speed of the algorithm. We also present a data encryption method to protect patient privacy while the algorithm runs. These resources enable broader use of familial relationships from EHR in genetic studies.
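As a rough illustration of what extracting relationships from emergency contact data can involve, the sketch below takes (patient, relationship, contact) records, adds the reciprocal relationship for each, and infers siblings from a shared parent. It is not the RIFTEHR implementation: the `RECIPROCAL` table, the function name `infer_relationships`, and the record layout are all assumptions made for this example, and the abstract does not describe the actual rule set.

```python
from collections import defaultdict

# Hypothetical reciprocal table (assumption): the real RIFTEHR rules
# are not described in the abstract.
RECIPROCAL = {
    "parent": "child",
    "child": "parent",
    "sibling": "sibling",
    "spouse": "spouse",
}


def infer_relationships(contacts):
    """Infer relationships from emergency contact records.

    `contacts` is an iterable of (patient_id, relationship, contact_id),
    read as "contact_id is the <relationship> of patient_id".
    Returns a set of (a, relationship, b) triples with the same reading.
    """
    inferred = set()
    children_of = defaultdict(set)

    # Single pass over the records: keep each stated relationship,
    # add its reciprocal, and index children by parent.
    for patient, rel, contact in contacts:
        if rel not in RECIPROCAL:
            continue  # skip relationship labels this sketch does not model
        inferred.add((patient, rel, contact))
        inferred.add((contact, RECIPROCAL[rel], patient))
        if rel == "parent":
            children_of[contact].add(patient)
        elif rel == "child":
            children_of[patient].add(contact)

    # Two people who share a recorded parent are inferred to be siblings.
    # This step is quadratic only in the number of children per parent.
    for kids in children_of.values():
        for a in kids:
            for b in kids:
                if a != b:
                    inferred.add((a, "sibling", b))
    return inferred


if __name__ == "__main__":
    records = [("p1", "parent", "m1"), ("p2", "parent", "m1")]
    for triple in sorted(infer_relationships(records)):
        print(triple)
```

With the two example records, both patients list the same person as a parent, so the sketch adds sibling relationships in both directions even though neither patient listed the other.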


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jitendra Jonnagaddala ◽  
Aipeng Chen ◽  
Sean Batongbacal ◽  
Chandini Nekkantti

Abstract: For research purposes, protected health information is often redacted from unstructured electronic health records to preserve patient privacy and confidentiality. The OpenDeID corpus is designed to assist development of automatic methods to redact sensitive information from unstructured electronic health records. We retrieved 4548 unstructured surgical pathology reports from four urban Australian hospitals. The corpus was developed by two annotators under three different experimental settings. The quality of the annotations was evaluated for each setting. Specifically, we employed serial annotations, parallel annotations, and pre-annotations. Our results suggest that the pre-annotations approach is not reliable in terms of quality when compared to the serial annotations but can drastically reduce annotation time. The OpenDeID corpus comprises 2,100 pathology reports from 1,833 cancer patients with an average of 737.49 tokens and 7.35 protected health information entities annotated per report. The overall inter-annotator agreement and deviation scores are 0.9464 and 0.9726, respectively. Realistic surrogates are also generated to make the corpus suitable for distribution to other researchers.


Author(s):  
Akhil Vaid ◽  
Suraj K Jaladanki ◽  
Jie Xu ◽  
Shelly Teng ◽  
Arvind Kumar ◽  
...  

Machine learning (ML) models require large datasets which may be siloed across different healthcare institutions. Using federated learning, a ML technique that avoids locally aggregating raw clinical data across multiple institutions, we predict mortality within seven days in hospitalized COVID-19 patients. Patient data was collected from Electronic Health Records (EHRs) from five hospitals within the Mount Sinai Health System (MSHS). Logistic Regression with L1 regularization (LASSO) and Multilayer Perceptron (MLP) models were trained using local data at each site, a pooled model with combined data from all five sites, and a federated model that only shared parameters with a central aggregator. Both the federated LASSO and federated MLP models performed better than their local model counterparts at four hospitals. The federated MLP model also outperformed the federated LASSO model at all hospitals. Federated learning shows promise in COVID-19 EHR data to develop robust predictive models without compromising patient privacy.
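The abstract states that the federated model shared only parameters with a central aggregator, but it does not say how the aggregator combined them. A minimal sketch of one common choice, averaging each parameter across sites weighted by local sample count, is shown below. The function name `federated_average` and the example numbers are assumptions for illustration, not the study's method or data.

```python
def federated_average(site_params, site_sizes):
    """Combine per-site parameter vectors into one global vector.

    `site_params` is a list of equal-length lists of floats (one per site);
    `site_sizes` gives the number of local training samples at each site.
    Weighting by sample count is an assumption, not taken from the abstract.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(site_params[0])
    if any(len(p) != n_params for p in site_params):
        raise ValueError("all sites must share the same parameter layout")
    # Only parameters and counts reach the aggregator; raw records stay local.
    return [
        sum(p[i] * n for p, n in zip(site_params, site_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # Two hypothetical sites with 1 and 3 local samples.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a full training loop this step would be repeated each round: sites train locally from the current global parameters, send back their updated parameters, and receive the new average.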


2017 ◽  
pp. 960-973
Author(s):  
Karen Ervin

This chapter examines the literature of healthcare in the United States during the transition to electronic records. Key government legislation, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health Act (HITECH), which were part of the American Recovery and Reinvestment Act (ARRA) and the Affordable Health Care Act, are reviewed. The review concentrates on patient privacy issues, how they have been addressed in these acts, and what recommendations for improvement have been found in the literature. A comparison of the adoption of electronic health records on a nationwide scale in three countries is included. England, Australia, and the United States are all embarking on, and are at different stages of, implementing nationwide electronic health database systems. The resources used in locating relevant literature were PubMed, Medline, Highwire Press, State Library of Pennsylvania, and Google Scholar databases.


Informatics ◽  
2020 ◽  
Vol 7 (2) ◽  
pp. 17 ◽  
Author(s):  
Sheikh S. Abdullah ◽  
Neda Rostamzadeh ◽  
Kamran Sedig ◽  
Amit X. Garg ◽  
Eric McArthur

Recent advancement in EHR-based (Electronic Health Record) systems has resulted in producing data at an unprecedented rate. The complex, growing, and high-dimensional data available in EHRs create great opportunities for machine learning techniques such as clustering. Cluster analysis often requires dimension reduction to achieve efficient processing time and mitigate the curse of dimensionality. Given a wide range of techniques for dimension reduction and cluster analysis, it is not straightforward to identify which combination of techniques from both families leads to the desired result. The ability to derive useful and precise insights from EHRs requires a deeper understanding of the data, intermediary results, configuration parameters, and analysis processes. Although these tasks are often tackled separately in existing studies, we present a visual analytics (VA) system, called Visual Analytics for Cluster Analysis and Dimension Reduction of High Dimensional Electronic Health Records (VALENCIA), to address the challenges of high-dimensional EHRs in a single system. VALENCIA brings together a wide range of cluster analysis and dimension reduction techniques, integrates them seamlessly, and makes them accessible to users through interactive visualizations. It offers a balanced distribution of processing load between users and the system to facilitate the performance of high-level cognitive tasks in such a way that would be difficult without the aid of a VA system. Through a real case study, we have demonstrated how VALENCIA can be used to analyze the healthcare administrative dataset stored at ICES. This research also highlights what needs to be considered in the future when developing VA systems that are designed to derive deep and novel insights into EHRs.


2021 ◽  
Author(s):  
Halie M. Rando ◽  
Tellen D. Bennett ◽  
James Brian Byrd ◽  
Carolyn Bramante ◽  
Tiffany J. Callahan ◽  
...  

Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. As the worldwide scientific community forges ahead with efforts to characterize a wide range of outcomes associated with SARS-CoV-2 infection, the proliferation of available data has made it clear that formal definitions are needed in order to design robust and consistent studies of Long COVID that consistently capture variation in long-term outcomes. In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.


2020 ◽  
Author(s):  
Carlos Sáez ◽  
Alba Gutiérrez-Sacristán ◽  
Isaac Kohane ◽  
Juan M García-Gómez ◽  
Paul Avillach

Abstract
Background: Temporal variability in healthcare processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal dataset shifts can present as trends, abrupt or seasonal changes in the statistical distributions of data over time, being particularly complex to address in multi-modal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large historical data from EHRs, there is a need for specific software methods to help delineate temporal dataset shifts to ensure reliable data reuse.
Findings: EHRtemporalVariability is an Open Source R-package and Shiny-app designed to explore and identify temporal dataset shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time, projects their temporal-evolution through non-parametric Information Geometric Temporal plots, and enables the exploration of changes in variables through Data Temporal Heatmaps. We demonstrate the capability of EHRtemporalVariability to delineate dataset shifts in three impact case studies, one of them available for reproducibility.
Conclusions: EHRtemporalVariability enables the exploration and identification of dataset shifts, helping users to broadly examine and repurpose large, longitudinal datasets. Our goal is to help ensure reliable data reuse to a wide range of biomedical data users. EHRtemporalVariability is suited to technical users programmatically using the R-package and to those users not familiar with programming using the Shiny user interface.
Availability: https://github.com/hms-dbmi/EHRtemporalVariability/ Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html On-line demo: http://ehrtemporalvariability.upv.es/
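To make the idea of a temporal dataset shift concrete, the sketch below estimates the distribution of a coded variable in each time batch and measures how far apart two batches are. It is written in Python for illustration and does not use or reproduce the EHRtemporalVariability R package. The choice of the Jensen-Shannon distance, the function names `distribution` and `js_distance`, and the example codes are assumptions made here; the abstract does not name the dissimilarity measure.

```python
import math
from collections import Counter


def distribution(codes, vocabulary):
    """Relative frequency of each code in `vocabulary` within one time batch."""
    if not codes:
        raise ValueError("batch is empty")
    counts = Counter(codes)
    return [counts[c] / len(codes) for c in vocabulary]


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))  # guard against rounding below zero


if __name__ == "__main__":
    vocabulary = ["A", "B"]
    # Hypothetical monthly batches of a coded variable (e.g. a coding change).
    january = distribution(["A", "A", "A", "B"], vocabulary)
    february = distribution(["A", "B", "B", "B"], vocabulary)
    print(js_distance(january, february))
```

Computing this distance between every pair of batches yields a dissimilarity matrix, and a large jump between consecutive batches is one simple signal of an abrupt change worth inspecting.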


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243043
Author(s):  
Shekha Chenthara ◽  
Khandakar Ahmed ◽  
Hua Wang ◽  
Frank Whittaker ◽  
Zhenxiang Chen

The privacy of Electronic Health Records (EHRs) faces a major hurdle when private health data are outsourced to the cloud, as there is a danger of leaking health information to unauthorized parties. In fact, EHRs are stored on centralized databases, which increases the security risk footprint and requires trust in a single authority that cannot effectively protect data from internal attacks. This research focuses on ensuring patient privacy and data security while sharing sensitive data within and across organisations, as well as healthcare providers, in a distributed environment. It develops Healthchain, a privacy-preserving framework based on Blockchain technology that maintains the security, privacy, scalability, and integrity of e-health data. The Blockchain is built on Hyperledger Fabric, a permissioned distributed ledger solution, using Hyperledger Composer, and stores EHRs using the InterPlanetary File System (IPFS) to build the Healthchain framework. Moreover, the data stored in IPFS are encrypted using a unique cryptographic public key encryption algorithm to create a robust blockchain solution for electronic health data. The objective of the research is to provide a foundation for developing security solutions against cyber-attacks by exploiting the inherent features of the blockchain, and thus contribute to the robustness of healthcare information sharing environments. The results show that healthcare records are not traceable by unauthorized parties, because the model stores only the encrypted hash of the records. This demonstrates the model's effectiveness in terms of data security, enhanced data privacy, improved data scalability, interoperability, and data integrity while sharing and accessing medical records among stakeholders across the Healthchain network.

