Federated queries of clinical data repositories: balancing accuracy and privacy

Mapping Intimacies ◽

10.1101/841072 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yun William Yu ◽

Griffin M Weber

Keyword(s):

Clinical Data ◽

Data Networks ◽

Healthcare Organizations ◽

Data Repositories ◽

Streaming Algorithm ◽

Federated Queries ◽

Access Data

Abstract Researchers use large federated clinical data networks that connect dozens of healthcare organizations to access data on millions of patients. However, because patients often receive care from multiple sites in the network, queries frequently double-count patients. Using the probabilistic streaming algorithm HyperLogLog and adding obfuscation, we developed a scalable method for estimating the number of distinct lives that match a query, which balances accuracy and privacy in a “tunable” way.

Federated queries of clinical data repositories: the sum of the parts does not equal the whole

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2012-001299 ◽

2013 ◽

Vol 20 (e1) ◽

pp. e155-e161 ◽

Cited By ~ 17

Author(s):

G. M. Weber

Keyword(s):

Clinical Data ◽

Data Repositories ◽

Federated Queries

Federated queries of clinical data repositories: Scaling to a national network

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2015.04.012 ◽

2015 ◽

Vol 55 ◽

pp. 231-236 ◽

Cited By ~ 10

Author(s):

Griffin M. Weber

Keyword(s):

Clinical Data ◽

Data Repositories ◽

National Network ◽

Federated Queries

Balancing Accuracy and Privacy in Federated Queries of Clinical Data Repositories: Algorithm Development and Validation

Journal of Medical Internet Research ◽

10.2196/18735 ◽

2020 ◽

Vol 22 (11) ◽

pp. e18735

Author(s):

Yun William Yu ◽

Griffin M Weber

Keyword(s):

Clinical Data ◽

Medical Information ◽

Homomorphic Encryption ◽

Probabilistic Approach ◽

Large Networks ◽

Large Hospital ◽

Data Repositories ◽

Trade Offs ◽

Number Of Patients ◽

Federated Queries

Background Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double-counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks. Objective This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed. Methods We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques. Results In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. This was orders of magnitude better than other approaches, which guaranteed the exact answer. Conclusions Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks.
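
The core idea in the Methods above, estimating distinct patients by combining per-site HyperLogLog sketches rather than adding per-site counts, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the register count (`P = 12`), the SHA-256 based hash, the synthetic patient identifiers, and all function names are assumptions made for illustration, and the masking and homomorphic encryption steps described in the abstract are omitted entirely.

```python
import hashlib
import math

P = 12          # assumed number of index bits (hypothetical choice)
M = 1 << P      # number of registers per sketch


def hash64(patient_id: str) -> int:
    """Map an identifier to a 64-bit integer (assumed hash, for illustration)."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def site_sketch(patient_ids):
    """Build one site's HyperLogLog registers from its matching patients."""
    registers = [0] * M
    for pid in patient_ids:
        h = hash64(pid)
        idx = h >> (64 - P)                       # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def merge(sketches):
    """Hub-side union: element-wise maximum over the sites' registers."""
    return [max(column) for column in zip(*sketches)]


def estimate(registers):
    """Standard HyperLogLog estimate with the small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)            # linear counting
    return raw


# Three synthetic sites whose patient populations overlap.
sites = [
    [f"patient-{i}" for i in range(0, 60_000)],
    [f"patient-{i}" for i in range(40_000, 100_000)],
    [f"patient-{i}" for i in range(80_000, 140_000)],
]
naive_sum = sum(len(s) for s in sites)            # counts shared patients twice
hll_estimate = estimate(merge([site_sketch(s) for s in sites]))
print(naive_sum, round(hll_estimate))
```

Because a register only ever keeps the maximum rank it has seen, a patient who appears at several sites affects the merged sketch exactly once, which is why the merged estimate tracks the number of distinct patients while the naive sum overshoots it.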

Balancing Accuracy and Privacy in Federated Queries of Clinical Data Repositories: Algorithm Development and Validation (Preprint)

10.2196/preprints.18735 ◽

2020 ◽

Author(s):

Yun William Yu ◽

Griffin M Weber

Keyword(s):

Clinical Data ◽

Medical Information ◽

Homomorphic Encryption ◽

Probabilistic Approach ◽

Large Networks ◽

Large Hospital ◽

Data Repositories ◽

Trade Offs ◽

Number Of Patients ◽

Federated Queries

Imbalanced target prediction with pattern discovery on clinical data repositories

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-017-0443-3 ◽

2017 ◽

Vol 17 (1) ◽

Cited By ~ 6

Author(s):

Tak-Ming Chan ◽

Yuxi Li ◽

Choo-Chiap Chiau ◽

Jane Zhu ◽

Jie Jiang ◽

...

Keyword(s):

Clinical Data ◽

Pattern Discovery ◽

Target Prediction ◽

Data Repositories

Admission Control and Profitability Analysis in Dynamic Spectrum Access Data Networks

Proceedings of the 7th International Conference on Performance Evaluation Methodologies and Tools ◽

10.4108/icst.valuetools.2013.254390 ◽

2014 ◽

Author(s):

Sinem Kockan ◽

David Starobinski

Keyword(s):

Admission Control ◽

Dynamic Spectrum Access ◽

Dynamic Spectrum ◽

Data Networks ◽

Spectrum Access ◽

Profitability Analysis ◽

Access Data

An Ontology-based Mediator of Clinical Information for Decision Support Systems

Methods of Information in Medicine ◽

10.3414/me9126 ◽

2008 ◽

Vol 47 (06) ◽

pp. 549-559 ◽

Cited By ~ 8

Author(s):

K. Ohe ◽

Y. Kawazoe

Keyword(s):

Decision Support ◽

Decision Support System ◽

Clinical Data ◽

Support System ◽

Adverse Drug Events ◽

Clinical Information ◽

Temporal Information ◽

Temporal Abstraction ◽

Alert System ◽

Data Repositories

Summary Objective: We have been developing a decision support system that uses electronic clinical data and provides alerts to clinicians. However, the inference rules for such a system are difficult to write in terms of representing domain concepts and temporal reasoning. To address this problem, we have developed an ontology-based mediator of clinical information for the decision support system. Methods: Our approach consists of three steps: 1) development of an ontology-based mediator that represents domain concepts and temporal information; 2) mapping of clinical data to corresponding concepts in the mediator; 3) temporal abstraction that creates high-level, interval-based concepts from time-stamped clinical data. As a result, we can write concept-based rule expressions that use domain concepts and interval-based temporal information. The proposed approach was applied to a prototype clinical alert system, and the rules for adverse drug events were executed on data gathered over a 3-month period. Results: The system generated 615 alerts, of which 346 (56%) were considered appropriate and 269 (44%) inappropriate. Of the inappropriate alerts, 192 were due to data inaccuracy and 77 were due to insufficiency of the temporal abstraction. Conclusion: Our approach made it possible to write concept-based rule expressions that could be used in the prototype clinical alert system. We believe our approach will help narrow the gap in information models between domain concepts and clinical data repositories.
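
Step 3 of the Methods above, temporal abstraction from time-stamped data to interval-based concepts, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the mediator described in the paper: the `Interval` type, the `abstract_intervals` function, the concept names, the 1.5 threshold, and the two-day gap limit are all hypothetical and carry no clinical meaning.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Interval:
    concept: str      # high-level concept label (hypothetical names)
    start: datetime
    end: datetime


def abstract_intervals(observations, classify, max_gap):
    """Merge consecutive observations with the same concept into intervals.

    observations: iterable of (timestamp, value) pairs
    classify:     function mapping a value to a concept label
    max_gap:      largest allowed gap between observations in one interval
    """
    intervals = []
    for ts, value in sorted(observations, key=lambda obs: obs[0]):
        concept = classify(value)
        last = intervals[-1] if intervals else None
        if last and last.concept == concept and ts - last.end <= max_gap:
            # Same concept and close enough in time: extend the open interval.
            intervals[-1] = Interval(concept, last.start, ts)
        else:
            # Concept changed or the gap is too large: start a new interval.
            intervals.append(Interval(concept, ts, ts))
    return intervals


def classify_lab(value):
    # Illustrative threshold only; not a clinical rule.
    return "HIGH" if value > 1.5 else "NORMAL"


observations = [
    (datetime(2008, 1, 1), 0.9),
    (datetime(2008, 1, 2), 1.8),
    (datetime(2008, 1, 3), 2.1),
    (datetime(2008, 1, 5), 1.0),
]
intervals = abstract_intervals(observations, classify_lab, timedelta(days=2))
for interval in intervals:
    print(interval.concept, interval.start.date(), interval.end.date())
```

A rule can then be written against intervals such as "HIGH lasting at least one day" instead of against individual time-stamped values, which is the kind of concept-based expression the abstract describes.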

A national action plan for sharable and comparable nursing data to support practice and translational research for transforming health care

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocu011 ◽

2015 ◽

Vol 22 (3) ◽

pp. 600-607 ◽

Cited By ~ 10

Author(s):

Bonnie L Westra ◽

Gail E Latimer ◽

Susan A Matney ◽

Jung In Park ◽

Joyce Sensmeier ◽

...

Keyword(s):

Health Care ◽

Big Data ◽

Translational Research ◽

Clinical Data ◽

Action Plan ◽

Information Structures ◽

Data Repositories ◽

National Action Plan ◽

Quality Reporting ◽

Nursing Data

Abstract Background There is wide recognition that, with the rapid implementation of electronic health records (EHRs), large data sets are available for research. However, essential standardized nursing data are seldom integrated into EHRs and clinical data repositories. There are many diverse activities that exist to implement standardized nursing languages in EHRs; however, these activities are not coordinated, resulting in duplicate efforts rather than building a shared learning environment and resources. Objective The purpose of this paper is to describe the historical context of nursing terminologies, challenges to the use of nursing data for purposes other than documentation of care, and a national action plan for implementing and using sharable and comparable nursing data for quality reporting and translational research. Methods In 2013 and 2014, the University of Minnesota School of Nursing hosted a diverse group of nurses to participate in the Nursing Knowledge: Big Data and Science to Transform Health Care consensus conferences. These conferences were held to develop a national action plan and harmonize existing and new efforts of multiple individuals and organizations to expedite integration of standardized nursing data within EHRs and ensure their availability in clinical data repositories for secondary use. This harmonization will address the implementation of standardized nursing terminologies and subsequent access to and use of clinical nursing data. Conclusion Foundational to integrating nursing data into clinical data repositories for big data and science is the implementation of standardized nursing terminologies, common data models, and information structures within EHRs. The 2014 National Action Plan for Sharable and Comparable Nursing Data for Transforming Health and Healthcare builds on and leverages existing but separate long-standing efforts of many individuals and organizations. The plan is action focused, with accountability for coordinating and tracking progress designated.
