Estimate of disease heritability using 7.4 million familial relationships inferred from electronic health records

Familial relationships in electronic health records (EHR) v2

10.1101/731976 ◽

2019 ◽

Author(s):

Zhouzerui Liu ◽

Nicholas Tatonetti

Keyword(s):

New York ◽

Electronic Health Records ◽

Linear Time ◽

Medical Center ◽

Data Encryption ◽

Columbia University ◽

Patient Privacy ◽

Health Records ◽

Familial Relationships ◽

Electronic Health

Heritability is an important statistic for evaluating the genetic contribution to phenotypes. Estimating heritability, however, requires laborious recruitment of a large number of relatives. Electronic health records (EHR) contain extensive information about relatives in emergency contact forms. Recently, we presented RIFTEHR, an algorithm for extracting relationships from EHR. Here, we present an updated version and reconstruct 4.2 million familial relationships from the latest New York-Presbyterian/Columbia University Irving Medical Center (CUIMC) EHR system. The updated set contains 30 percent more relationships than the previous version. We present a new implementation of RIFTEHR that runs in linear time, which substantially improves the speed of the algorithm. We also present a data encryption method to protect patient privacy while running the algorithm. These resources can support broader use of familial relationships from EHR in genetic studies.

The OpenDeID corpus for patient de-identification

Scientific Reports ◽

10.1038/s41598-021-99554-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jitendra Jonnagaddala ◽

Aipeng Chen ◽

Sean Batongbacal ◽

Chandini Nekkantti

Keyword(s):

Electronic Health Records ◽

Health Information ◽

Sensitive Information ◽

Protected Health Information ◽

Patient Privacy ◽

Health Records ◽

Automatic Methods ◽

Electronic Health ◽

Pathology Reports ◽

Privacy And Confidentiality

For research purposes, protected health information is often redacted from unstructured electronic health records to preserve patient privacy and confidentiality. The OpenDeID corpus is designed to assist the development of automatic methods to redact sensitive information from unstructured electronic health records. We retrieved 4548 unstructured surgical pathology reports from four urban Australian hospitals. The corpus was developed by two annotators under three different experimental settings: serial annotations, parallel annotations, and pre-annotations. The quality of the annotations was evaluated for each setting. Our results suggest that the pre-annotations approach is not reliable in terms of quality when compared to the serial annotations but can drastically reduce annotation time. The OpenDeID corpus comprises 2,100 pathology reports from 1,833 cancer patients with an average of 737.49 tokens and 7.35 protected health information entities annotated per report. The overall inter-annotator agreement and deviation scores are 0.9464 and 0.9726, respectively. Realistic surrogates are also generated to make the corpus suitable for distribution to other researchers.
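The abstract reports an inter-annotator agreement score but does not say how it was computed. As a rough illustration only, the sketch below assumes agreement is measured as an exact-match F1 over the entity spans two annotators marked in the same report; the function name, the `(start, end, label)` span format, and the example spans are all hypothetical.

```python
def span_agreement_f1(spans_a, spans_b):
    """Exact-match F1 between two annotators' (start, end, label) spans.

    Assumption: a span counts as agreed only when offsets and label are identical.
    """
    set_a, set_b = set(spans_a), set(spans_b)
    if not set_a and not set_b:
        return 1.0  # nothing to annotate, so the annotators trivially agree
    matched = len(set_a & set_b)
    if matched == 0:
        return 0.0
    precision = matched / len(set_b)  # treating annotator A as the reference
    recall = matched / len(set_a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one report: two spans agree, one differs.
annotator_a = [(0, 10, "NAME"), (25, 35, "DATE"), (50, 58, "ID")]
annotator_b = [(0, 10, "NAME"), (25, 35, "DATE"), (60, 70, "LOCATION")]
score = span_agreement_f1(annotator_a, annotator_b)
```

Because F1 is symmetric in precision and recall, swapping the two annotators gives the same score, which is one reason it is a common choice for span-level agreement.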

Federated Learning of Electronic Health Records Improves Mortality Prediction in Patients Hospitalized with COVID-19

10.1101/2020.08.11.20172809 ◽

2020 ◽

Cited By ~ 1

Author(s):

Akhil Vaid ◽

Suraj K Jaladanki ◽

Jie Xu ◽

Shelly Teng ◽

Arvind Kumar ◽

...

Keyword(s):

Electronic Health Records ◽

Large Datasets ◽

Mortality Prediction ◽

Patient Privacy ◽

Local Data ◽

Health Records ◽

Electronic Health ◽

Mlp Model ◽

Combined Data ◽

Better Than

Machine learning (ML) models require large datasets, which may be siloed across different healthcare institutions. Using federated learning, an ML technique that avoids locally aggregating raw clinical data across multiple institutions, we predict mortality within seven days in hospitalized COVID-19 patients. Patient data was collected from Electronic Health Records (EHRs) from five hospitals within the Mount Sinai Health System (MSHS). Logistic Regression with L1 regularization (LASSO) and Multilayer Perceptron (MLP) models were trained using local data at each site, a pooled model with combined data from all five sites, and a federated model that only shared parameters with a central aggregator. Both the federated LASSO and federated MLP models performed better than their local model counterparts at four hospitals. The federated MLP model also outperformed the federated LASSO model at all hospitals. Federated learning shows promise for developing robust predictive models from COVID-19 EHR data without compromising patient privacy.
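To make the idea of a federated model that only shares parameters with a central aggregator concrete, here is a minimal sketch. It assumes a federated-averaging style scheme (each site trains locally, the aggregator takes a sample-size-weighted mean of the weights), which the abstract does not specify. It uses plain unregularized logistic regression rather than LASSO or an MLP, and the data, function names, and hyperparameters are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Gradient descent on one site's data; raw rows never leave this function."""
    w = list(weights)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    """Aggregator step: sample-size-weighted mean of the sites' weight vectors."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def train_federated(sites, dim, rounds=20):
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Each site starts from the current global weights and returns only parameters.
        updates = [local_update(global_w, X, y) for X, y in sites]
        global_w = federated_average(updates, [len(X) for X, _ in sites])
    return global_w


def make_site(n, seed):
    """Synthetic site: a bias term plus one feature; label is 1 when the feature is positive."""
    rng = random.Random(seed)
    X = [[1.0, rng.uniform(-1, 1)] for _ in range(n)]
    y = [1 if x[1] > 0 else 0 for x in X]
    return X, y


sites = [make_site(50, 1), make_site(80, 2), make_site(30, 3)]
global_w = train_federated(sites, dim=2)
```

Only weight vectors cross the site boundary here. In a real deployment the parameter exchange itself would also need protection, since parameters can leak information about the training data.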

The Relationship between the Nurses’ Perception of Electronic Health Records and Patient Privacy

Hospital Topics ◽

10.1080/00185868.2020.1799729 ◽

2020 ◽

Vol 98 (4) ◽

pp. 155-162

Author(s):

Özlem Özer ◽

Okan Özkan ◽

Fatih Budak

Keyword(s):

Electronic Health Records ◽

Patient Privacy ◽

Health Records ◽

Electronic Health ◽

The Relationship

Legal and Ethical Considerations in the Implementation of Electronic Health Records

Healthcare Ethics and Training ◽

10.4018/978-1-5225-2237-9.ch045 ◽

2017 ◽

pp. 960-973

Author(s):

Karen Ervin

Keyword(s):

United States ◽

Electronic Health Records ◽

Relevant Literature ◽

Database Systems ◽

The United States ◽

Patient Privacy ◽

Health Records ◽

Electronic Health ◽

Health Database ◽

State Library

This chapter examines the literature of healthcare in the United States during the transition to electronic records. Key government legislation, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health Act (HITECH), which were part of the American Recovery and Reinvestment Act (ARRA) and the Affordable Health Care Act, are reviewed. The review concentrates on patient privacy issues, how they have been addressed in these acts, and what recommendations for improvement have been found in the literature. A comparison of the adoption of electronic health records on a nationwide scale in three countries is included. England, Australia, and the United States are all embarking on, and are at different stages of, implementing nationwide electronic health database systems. The resources used in locating relevant literature were PubMed, Medline, Highwire Press, State Library of Pennsylvania, and Google Scholar databases.

Visual Analytics for Dimension Reduction and Cluster Analysis of High Dimensional Electronic Health Records

Informatics ◽

10.3390/informatics7020017 ◽

2020 ◽

Vol 7 (2) ◽

pp. 17 ◽

Cited By ~ 1

Author(s):

Sheikh S. Abdullah ◽

Neda Rostamzadeh ◽

Kamran Sedig ◽

Amit X. Garg ◽

Eric McArthur

Keyword(s):

Cluster Analysis ◽

Electronic Health Records ◽

Dimension Reduction ◽

Visual Analytics ◽

Machine Learning Techniques ◽

High Dimensional ◽

Health Records ◽

Wide Range ◽

Electronic Health ◽

And Cluster Analysis

Recent advances in electronic health record (EHR) systems have produced data at an unprecedented rate. The complex, growing, and high-dimensional data available in EHRs creates great opportunities for machine learning techniques such as clustering. Cluster analysis often requires dimension reduction to achieve efficient processing time and mitigate the curse of dimensionality. Given the wide range of techniques for dimension reduction and cluster analysis, it is not straightforward to identify which combination of techniques from both families leads to the desired result. The ability to derive useful and precise insights from EHRs requires a deeper understanding of the data, intermediary results, configuration parameters, and analysis processes. Although these tasks are often tackled separately in existing studies, we present a visual analytics (VA) system, called Visual Analytics for Cluster Analysis and Dimension Reduction of High Dimensional Electronic Health Records (VALENCIA), to address the challenges of high-dimensional EHRs in a single system. VALENCIA brings together a wide range of cluster analysis and dimension reduction techniques, integrates them seamlessly, and makes them accessible to users through interactive visualizations. It offers a balanced distribution of processing load between users and the system to facilitate the performance of high-level cognitive tasks that would be difficult without the aid of a VA system. Through a real case study, we have demonstrated how VALENCIA can be used to analyze the healthcare administrative dataset stored at ICES. This research also highlights what needs to be considered in the future when developing VA systems that are designed to derive deep and novel insights into EHRs.

Challenges in defining Long COVID: Striking differences across literature, Electronic Health Records, and patient-reported information

10.1101/2021.03.20.21253896 ◽

2021 ◽

Author(s):

Halie M. Rando ◽

Tellen D. Bennett ◽

James Brian Byrd ◽

Carolyn Bramante ◽

Tiffany J. Callahan ◽

...

Keyword(s):

Electronic Health Records ◽

Multiple Organ ◽

Health Records ◽

Health Crisis ◽

Organ Systems ◽

Wide Range ◽

Patient Reported ◽

Electronic Health ◽

Novel Coronavirus

Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. As the worldwide scientific community forges ahead with efforts to characterize a wide range of outcomes associated with SARS-CoV-2 infection, the proliferation of available data has made it clear that formal definitions are needed to design robust studies of Long COVID that consistently capture variation in long-term outcomes. In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.

EHRtemporalVariability: delineating temporal dataset shifts in electronic health records

10.1101/2020.04.07.20056564 ◽

2020 ◽

Author(s):

Carlos Sáez ◽

Alba Gutiérrez-Sacristán ◽

Isaac Kohane ◽

Juan M García-Gómez ◽

Paul Avillach

Keyword(s):

Electronic Health Records ◽

R Package ◽

Reliable Data ◽

Data Reuse ◽

Statistical Distributions ◽

Health Records ◽

Link Type ◽

Wide Range ◽

Electronic Health ◽

Over Time

Background: Temporal variability in healthcare processes or protocols is intrinsic to medicine. Such variability can introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal dataset shifts can present as trends or as abrupt or seasonal changes in the statistical distributions of data over time, and are particularly complex to address in multi-modal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large historical data from EHRs, there is a need for specific software methods to help delineate temporal dataset shifts to ensure reliable data reuse.

Findings: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal dataset shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time, projects their temporal evolution through non-parametric Information Geometric Temporal plots, and enables the exploration of changes in variables through Data Temporal Heatmaps. We demonstrate the capability of EHRtemporalVariability to delineate dataset shifts in three impact case studies, one of them available for reproducibility.

Conclusions: EHRtemporalVariability enables the exploration and identification of dataset shifts, helping users broadly examine and repurpose large, longitudinal datasets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is suited both to technical users working programmatically with the R package and to users unfamiliar with programming, who can use the Shiny user interface.

Availability: https://github.com/hms-dbmi/EHRtemporalVariability/ Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html On-line demo: http://ehrtemporalvariability.upv.es/
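The core idea of estimating a variable's distribution in each time batch and comparing batches can be sketched in a few lines. The example below is written in Python rather than R and does not use the package's API. It assumes monthly batches and uses the Jensen-Shannon distance between consecutive months as the dissimilarity measure, a choice the abstract does not state. The records and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def monthly_distributions(records):
    """Map each month to the relative frequency of every code observed in it."""
    counts = defaultdict(Counter)
    for month, code in records:
        counts[month][code] += 1
    return {
        month: {code: n / sum(counter.values()) for code, n in counter.items()}
        for month, counter in counts.items()
    }


def js_distance(p, q):
    """Jensen-Shannon distance (base 2): 0 for identical, 1 for disjoint distributions."""
    codes = set(p) | set(q)
    mix = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in codes}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture distribution.
        return sum(a[c] * math.log2(a[c] / mix[c]) for c in a if a[c] > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p) + 0.5 * kl(q)))


def consecutive_shifts(records):
    """Distance between each pair of consecutive months (ISO 'YYYY-MM' keys sort in time order)."""
    dists = monthly_distributions(records)
    months = sorted(dists)
    return [(a, b, js_distance(dists[a], dists[b])) for a, b in zip(months, months[1:])]


# Hypothetical coded data: the coding system changes abruptly in 2015-10.
records = [
    ("2015-08", "ICD9:250.00"), ("2015-08", "ICD9:401.9"),
    ("2015-09", "ICD9:250.00"), ("2015-09", "ICD9:401.9"),
    ("2015-10", "ICD10:E11.9"), ("2015-10", "ICD10:I10"),
]
shifts = consecutive_shifts(records)
```

In this toy data the August-to-September distance is zero and the September-to-October distance reaches the maximum, which is the kind of abrupt change a temporal heatmap would make visible.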

Healthchain: A novel framework on privacy preservation of electronic health records using blockchain technology

PLoS ONE ◽

10.1371/journal.pone.0243043 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243043

Author(s):

Shekha Chenthara ◽

Khandakar Ahmed ◽

Hua Wang ◽

Frank Whittaker ◽

Zhenxiang Chen

Keyword(s):

Electronic Health Records ◽

Data Security ◽

Data Privacy ◽

Health Data ◽

Cyber Attacks ◽

Security Risk ◽

Patient Privacy ◽

Health Records ◽

Blockchain Technology ◽

Electronic Health

The privacy of Electronic Health Records (EHRs) faces a major hurdle when private health data is outsourced to the cloud, as there is a danger of leaking health information to unauthorized parties. In fact, EHRs are stored on centralized databases, which increase the security risk footprint and require trust in a single authority that cannot effectively protect data from internal attacks. This research focuses on ensuring patient privacy and data security while sharing sensitive data across the same or different organisations, as well as healthcare providers, in a distributed environment. It develops a privacy-preserving framework called Healthchain, based on Blockchain technology, that maintains the security, privacy, scalability and integrity of e-health data. The Blockchain is built on Hyperledger Fabric, a permissioned distributed ledger solution, using Hyperledger Composer, and stores EHRs using the InterPlanetary File System (IPFS). Moreover, the data stored in IPFS is encrypted using a unique cryptographic public key encryption algorithm to create a robust blockchain solution for electronic health data. The objective of the research is to provide a foundation for developing security solutions against cyber-attacks by exploiting the inherent features of the blockchain, and thus contribute to the robustness of healthcare information sharing environments. The results show that healthcare records are not traceable through unauthorized access, because the model stores only the encrypted hash of the records. This demonstrates effectiveness in terms of data security, enhanced data privacy, improved data scalability, interoperability and data integrity while sharing and accessing medical records among stakeholders across the Healthchain network.
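The pattern of keeping encrypted records off-chain and placing only a hash on the ledger can be illustrated with a small sketch. It uses no Hyperledger or IPFS API: the `OffChainStore` class, the plain Python list standing in for the ledger, and the placeholder ciphertext are all hypothetical, and SHA-256 is an assumed hash function (real IPFS content identifiers use a different encoding). Encryption itself is out of scope here, so the ciphertext is treated as opaque bytes.

```python
import hashlib


class OffChainStore:
    """Toy content-addressed store standing in for off-chain storage."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob: bytes) -> str:
        digest = hashlib.sha256(blob).hexdigest()
        self._blobs[digest] = blob
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


def verify(store: OffChainStore, digest: str) -> bool:
    """Recompute the hash of the stored blob and compare it with the ledger entry."""
    return hashlib.sha256(store.get(digest)).hexdigest() == digest


# Placeholder for a record that was already encrypted elsewhere.
encrypted_record = b"<ciphertext bytes of one patient record>"

store = OffChainStore()
ledger = []  # stands in for the blockchain: it holds digests only, never record data

digest = store.put(encrypted_record)
ledger.append(digest)
intact = verify(store, ledger[0])

# Simulate tampering with the off-chain copy: the ledger digest no longer matches.
store._blobs[digest] = b"<tampered bytes>"
tampered_detected = not verify(store, ledger[0])
```

Because the ledger holds only a fixed-length digest, a reader of the ledger learns nothing about the record's contents, while anyone holding the digest can still detect modification of the off-chain copy.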

Legal and Ethical Considerations in the Implementation of Electronic Health Records

Cases on Electronic Records and Resource Management Implementation in Diverse Environments - Advances in Information Quality and Management ◽

10.4018/978-1-4666-4466-3.ch012 ◽

2014 ◽

pp. 193-210

Author(s):

Karen Ervin

Keyword(s):

United States ◽

Electronic Health Records ◽

Relevant Literature ◽

Database Systems ◽

The United States ◽

Patient Privacy ◽

Health Records ◽

Electronic Health ◽

Health Database ◽

State Library

This chapter examines the literature of healthcare in the United States during the transition to electronic records. Key government legislation, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health Act (HITECH), which were part of the American Recovery and Reinvestment Act (ARRA) and the Affordable Health Care Act, are reviewed. The review concentrates on patient privacy issues, how they have been addressed in these acts, and what recommendations for improvement have been found in the literature. A comparison of the adoption of electronic health records on a nationwide scale in three countries is included. England, Australia, and the United States are all embarking on, and are at different stages of, implementing nationwide electronic health database systems. The resources used in locating relevant literature were PubMed, Medline, Highwire Press, State Library of Pennsylvania, and Google Scholar databases.
