Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task

JAMIA Open ◽  
2020 ◽  
Vol 3 (3) ◽  
pp. 395-404 ◽  
Author(s):  
Steven R Chamberlin ◽  
Steven D Bedrick ◽  
Aaron M Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
...  

Abstract Objective Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval methods using electronic health records for different types of cohort definition retrieval. Materials and Methods We developed a test collection consisting of about 100 000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated information retrieval tasks using word-based approaches were performed, varying 4 different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. Results The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision but were still not able to recall all relevant patients found by the automated queries. Conclusion While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Future work will focus on using the test collection to develop and evaluate new approaches to query structure, weighting algorithms, and application of semantic methods.
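B-Pref, the evaluation measure used in this study, scores a ranking by how often judged relevant patients appear ahead of judged nonrelevant ones, ignoring unjudged records. A minimal sketch of the standard B-Pref computation (Buckley and Voorhees's definition; the function and variable names are illustrative, not taken from the study's code):

```python
def bpref(ranking, relevant, nonrelevant):
    """B-Pref for a single topic.

    ranking: retrieved patient ids, best first.
    relevant / nonrelevant: sets of judged patient ids.
    Unjudged patients in the ranking are simply skipped.
    """
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    if N == 0:
        # no judged nonrelevant patients: every retrieved relevant counts fully
        return sum(1 for d in ranking if d in relevant) / R
    cap = min(R, N)
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            # penalty grows with judged nonrelevant patients ranked above this one
            total += 1.0 - min(nonrel_seen, cap) / cap
    return total / R
```

Relevant patients the system never retrieves still count in the denominator, which is why recall failures depress the score.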

2019 ◽  
Author(s):  
Steven D. Bedrick ◽  
Aaron M. Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
Sijia Liu ◽  
...  

Abstract Objective Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval (IR) methods using electronic health records (EHR) for different types of cohort definition retrieval. Materials and Methods We developed a test collection consisting of about 100,000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated IR tasks using word-based approaches were performed, varying four different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. Results The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision, but were still not able to recall all relevant patients found by the automated queries. Conclusion While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Insights gained in this preliminary analysis will help guide future work to develop new methods for patient-level cohort discovery with EHR data.


2019 ◽  
Author(s):  
Steven R. Chamberlin ◽  
Steven D. Bedrick ◽  
Aaron M. Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
...  

Abstract Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 test patient cohort definitions and 100,000 patient records originating from an academic medical institution's EHR data warehouse, and ran automated word-based query tasks, varying four parameters. Performance was measured using B-Pref. We then designed 59 taxonomy characteristics to classify the structure of the 56 topics. In addition, six topic complexity measures were derived from these characteristics for further evaluation using a beta regression simulation. We did not find a strong association between the 59 taxonomy characteristics and patient retrieval performance, but we did find strong performance associations with the six topic complexity measures created from these characteristics, and interactions between these measures and the automated query parameter settings. Some of the characteristics derived from a query taxonomy could lead to improved selection of approaches based on the structure of the topic of interest. Insights gained here will help guide future work to develop new methods for patient-level cohort discovery with EHR data.
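The beta regression simulation mentioned above lends itself to a small sketch: B-Pref-style scores lie in (0, 1), so a beta likelihood with a logit mean link is a natural fit. The sketch below simulates scores whose mean falls with a single topic complexity measure and recovers the slope by maximum likelihood; the coefficients, sample size, and precision value are invented for illustration and are not taken from the study:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def beta_nll(params, x, y):
    """Negative log-likelihood of a beta regression with a logit mean link."""
    b0, b1, log_phi = params
    mu = expit(b0 + b1 * x)          # mean in (0, 1)
    phi = np.exp(log_phi)            # precision, kept positive
    a, b = mu * phi, (1 - mu) * phi
    return -np.sum(gammaln(phi) - gammaln(a) - gammaln(b)
                   + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))

# Simulate B-Pref-like scores whose mean declines with topic complexity
rng = np.random.default_rng(1)
complexity = rng.uniform(0, 1, 500)
mu_true = expit(0.5 - 2.0 * complexity)
phi_true = 20.0
scores = rng.beta(mu_true * phi_true, (1 - mu_true) * phi_true)
scores = np.clip(scores, 1e-6, 1 - 1e-6)  # keep strictly inside (0, 1)

fit = minimize(beta_nll, x0=[0.0, 0.0, np.log(10.0)], args=(complexity, scores))
b0_hat, b1_hat, _ = fit.x  # b1_hat should be close to the true slope of -2.0
```

A negative fitted slope here corresponds to the study's finding that higher topic complexity is associated with lower retrieval performance.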


2019 ◽  
Vol 69 (686) ◽  
pp. e605-e611 ◽  
Author(s):  
Helen P Booth ◽  
Arlene M Gallagher ◽  
David Mullett ◽  
Lucy Carty ◽  
Shivani Padmanabhan ◽  
...  

Background Quality improvement (QI) is a priority for general practice, and GPs are expected to participate in and provide evidence of QI activity. There is growing interest in harnessing the potential of electronic health records (EHR) to improve patient care by supporting practices to find cases that could benefit from a medicines review. Aim To develop scalable and reproducible prescribing safety reports using patient-level EHR data. Design and setting UK general practices that contribute de-identified patient data to the Clinical Practice Research Datalink (CPRD). Method A scoping phase used stakeholder consultations to identify primary care QI needs and potential indicators. QI reports containing real data were sent to 12 pilot practices that used Vision GP software and had expressed interest. The scale-up phase involved automating production and distribution of reports to all contributing practices that used both Vision and EMIS software systems. Benchmarking reports with patient-level case review lists for two prescribing safety indicators were sent to 457 practices in December 2017 following the initial scale-up (Figure 2). Results Two indicators were selected from the Royal College of General Practitioners Patient Safety Toolkit following stakeholder consultations for the pilot phase involving 12 GP practices. Pilot phase interviews showed that reports were used to review individual patient care, implement wider QI actions in the practice, and for appraisal and revalidation. Conclusion Electronic health record data can be used to provide standardised, reproducible reports that can be delivered at scale with minimal resource requirements. These can be used in a national QI initiative that impacts directly on patient care.
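Case-finding for a prescribing safety indicator can be sketched as a simple filter over coded patient records. The indicator below (an NSAID prescribed to a patient with a prior peptic ulcer and no gastroprotective co-prescription) is a classic example of this kind of rule, used purely for illustration; it is not necessarily one of the two indicators in the study, and the record structure and names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class PatientRecord:
    patient_id: str
    prescriptions: set = field(default_factory=set)  # current coded drug classes
    history: set = field(default_factory=set)        # coded diagnoses

def flag_for_review(patients, indicator):
    """Return ids of patients matching a prescribing safety indicator."""
    flagged = []
    for p in patients:
        if (indicator["required_rx"] <= p.prescriptions       # has the risky drug
                and indicator["risk_history"] <= p.history    # has the risk factor
                and not (indicator["protective_rx"] & p.prescriptions)):
            flagged.append(p.patient_id)
    return flagged

# Illustrative indicator: NSAID in a patient with prior peptic ulcer
# and no proton pump inhibitor (PPI) co-prescription.
NSAID_ULCER = {
    "required_rx": {"nsaid"},
    "risk_history": {"peptic_ulcer"},
    "protective_rx": {"ppi"},
}
```

A benchmarking report would then aggregate flagged counts per practice, with the patient-level list supplied for case review.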


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255467
Author(s):  
Xia Ning ◽  
Ziwei Fan ◽  
Evan Burgun ◽  
Zhiyun Ren ◽  
Titus Schleyer

Due to the rapid growth of information available about individual patients, most physicians suffer from information overload and inefficiencies when they review patient information in health information technology systems. In this paper, we present a novel hybrid dynamic and multi-collaborative filtering method to improve information retrieval from electronic health records. This method recommends relevant information from electronic health records to physicians during patient visits. It models information search dynamics using a Markov model. It also leverages the key idea of collaborative filtering, originating from Recommender Systems, for prioritizing information based on various similarities among physicians, patients, and information items. We tested this new method using electronic health record data from the Indiana Network for Patient Care, a large, inter-organizational clinical data repository maintained by the Indiana Health Information Exchange. Our experimental results demonstrated that, for top-5 recommendations, our method correctly predicted the information physicians were interested in for 46.7% of all test cases. For top-1 recommendations, the corresponding figure was 24.7%. In addition, the new method was 22.3% better than the conventional Markov model for top-1 recommendations.
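The hybrid described above combines two signals: a first-order Markov model of which information item tends to be viewed next, and collaborative-filtering-style similarity scores over past usage. A minimal sketch of that combination (the blending weight, data layout, and function names are assumptions, not the paper's actual formulation):

```python
import numpy as np

def markov_transitions(sequences, n_items):
    """Row-normalized first-order transition matrix from item-access sequences."""
    T = np.zeros((n_items, n_items))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            T[a, b] += 1
    rows = T.sum(axis=1, keepdims=True)
    return np.divide(T, rows, out=np.zeros_like(T), where=rows > 0)

def item_similarity(usage):
    """Cosine similarity between item columns of a user x item usage matrix."""
    norms = np.linalg.norm(usage, axis=0, keepdims=True)
    unit = np.divide(usage, norms, out=np.zeros_like(usage), where=norms > 0)
    return unit.T @ unit

def recommend(last_item, user_vector, T, S, alpha=0.5, k=5):
    """Blend Markov next-step probabilities with similarity-weighted CF scores."""
    cf_scores = S @ user_vector        # items similar to what this user has viewed
    norm = cf_scores.max()
    cf = cf_scores / norm if norm > 0 else cf_scores
    scores = alpha * T[last_item] + (1 - alpha) * cf
    scores[last_item] = -np.inf        # do not re-recommend the current item
    return np.argsort(scores)[::-1][:k]
```

With `alpha = 1.0` this reduces to the conventional Markov model the paper uses as its baseline; intermediate values mix in the collaborative signal.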


2017 ◽  
Vol 68 (11) ◽  
pp. 2636-2648 ◽  
Author(s):  
Stephen Wu ◽  
Sijia Liu ◽  
Yanshan Wang ◽  
Tamara Timmons ◽  
Harsha Uppili ◽  
...  

2021 ◽  
pp. 450-458
Author(s):  
Carsten Schröder ◽  
Marcus Lawrance ◽  
Chen Li ◽  
Christelle Lenain ◽  
Shivani K. Mhatre ◽  
...  

PURPOSE External control (EC) arms derived from electronic health records (EHRs) can provide appropriate comparison groups when randomized control arms are not feasible, but have not been explored for metastatic colorectal cancer (mCRC) trials. We constructed EC arms from two patient-level EHR-derived databases and evaluated them against the control arm from a phase III, randomized controlled mCRC trial. METHODS IMblaze370 evaluated atezolizumab with or without cobimetinib versus regorafenib in patients with mCRC. EC arms were constructed from the Flatiron Health (FH) EHR-derived de-identified database and the combined FH/Foundation Medicine Clinico-Genomic Database (CGDB). IMblaze370 eligibility criteria were applied to the EC cohorts. Propensity scores and standardized mortality ratio weighting were used to balance baseline characteristics between the IMblaze370 and EC arms; balance was assessed using standardized mean differences. The Kaplan-Meier method was used to estimate median overall survival (OS), and Cox proportional hazards models were used to estimate hazard ratios with bootstrapped 95% CIs to compare differences in OS between study arms. RESULTS The FH EC included 184 patients; the CGDB EC included 108 patients. Most characteristics were well balanced (standardized mean difference < 0.1) between each EC arm and the IMblaze370 population. Median OS was similar between the IMblaze370 control arm (8.5 months [95% CI, 6.41 to 10.71]) and both EC arms: FH (8.5 months [6.93 to 9.92]) and CGDB (8.8 months [7.85 to 9.92]). OS comparisons between the IMblaze370 experimental arm and the FH EC (hazard ratio, 0.85 [0.64 to 1.14]) and CGDB EC (0.86 [0.65 to 1.18]) yielded similar results as the comparison with the IMblaze370 control arm (1.01 [0.75 to 1.37]). CONCLUSION EC arms constructed from the FH database and the CGDB closely replicated the control arm from IMblaze370. EHR-derived EC arms can provide meaningful comparators in mCRC trials when recruiting a randomized control arm is not feasible.
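Standardized mortality ratio (SMR) weighting, as used above, keeps trial-arm patients at weight 1 and reweights external controls by the propensity odds so the EC arm resembles the trial population; balance is then checked with standardized mean differences (SMD < 0.1 being the threshold the study uses). A minimal sketch, assuming propensity scores have already been estimated; names are illustrative:

```python
import numpy as np

def smr_weights(propensity, treated):
    """SMR weights: trial-arm patients keep weight 1; external controls get
    the propensity odds p / (1 - p), pulling them toward the trial population."""
    return np.where(treated, 1.0, propensity / (1.0 - propensity))

def standardized_mean_diff(x, treated, weights):
    """Weighted SMD for one baseline covariate (|SMD| < 0.1 read as balanced)."""
    t = treated.astype(bool)
    c = ~t
    mt = np.average(x[t], weights=weights[t])
    mc = np.average(x[c], weights=weights[c])
    vt = np.average((x[t] - mt) ** 2, weights=weights[t])
    vc = np.average((x[c] - mc) ** 2, weights=weights[c])
    return (mt - mc) / np.sqrt((vt + vc) / 2.0)
```

In practice the propensity scores would come from a model of trial membership fitted on the pooled baseline covariates, and the SMD check would be repeated for every covariate before and after weighting.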


2011 ◽  
Vol 4 (0) ◽  
Author(s):  
Michael Klompas ◽  
Chaim Kirby ◽  
Jason McVetta ◽  
Paul Oppedisano ◽  
John Brownstein ◽  
...  

2019 ◽  
Vol 16 (3) ◽  
pp. 273-282 ◽  
Author(s):  
Susan M Shortreed ◽  
Carolyn M Rutter ◽  
Andrea J Cook ◽  
Gregory E Simon

Background Pragmatic clinical trials often use automated data sources such as electronic health records, claims, or registries to identify eligible individuals and collect outcome information. A specific advantage of this automated data collection is that data on potential participants are often available when design decisions are being made. We outline how these data can be used to inform trial design. Methods Our work is motivated by a pragmatic clinical trial evaluating the impact of suicide-prevention outreach interventions on fatal and non-fatal suicide attempts in the 18 months after randomization. We illustrate our recommended approaches for designing pragmatic clinical trials using historical data from the health systems participating in this study. Specifically, we illustrate how electronic health record data can be used to inform the selection of trial eligibility requirements, to estimate the distribution of participant characteristics over the course of the trial, and to conduct power and sample size calculations. Results Data from 122,873 people with patient health questionnaire (PHQ) responses, recorded in their electronic health records between 1 July 2010 and 31 March 2012, were used to show that the suicide attempt rate in the 18 months following completion of the questionnaire varies by response to item nine of the PHQ. We estimated that the proportion of individuals with a prior recorded elevated PHQ (i.e. history of suicidal ideation) would decrease from approximately 50% at the beginning of a trial to about 5%, 50 weeks later. Using electronic health record data, we conducted simulations to estimate the power to detect a 25% reduction in suicide attempts. Simulation-based power calculations estimated that randomizing 8000 participants per randomization arm would allow 90% power to detect a 25% reduction in the suicide attempt rate in the intervention arm compared to usual care at an alpha level of 0.05. Conclusions Historical data can be used to inform the design of pragmatic clinical trials, a strength of trials that use automated data collection for randomizing participants and assessing outcomes. In particular, realistic sample size calculations can be conducted using real-world data from the health systems in which the trial will be conducted. Data-informed trial design should yield more realistic estimates of statistical power and maximize efficiency of trial recruitment.
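The simulation-based power calculation described above can be sketched as repeated two-arm trials, each analyzed with a two-proportion z-test. The event rate, arm size, and simulation count below are illustrative assumptions (the abstract does not report the underlying attempt rate), but the structure mirrors the approach: simulate, test, and count rejections:

```python
import math
import numpy as np

def simulated_power(n_per_arm, control_rate, reduction,
                    n_sims=2000, alpha=0.05, seed=0):
    """Monte Carlo power: simulate many two-arm trials, test each with a
    two-proportion z-test, and return the fraction that reject the null."""
    rng = np.random.default_rng(seed)
    p_control = control_rate
    p_intervention = control_rate * (1.0 - reduction)
    rejections = 0
    for _ in range(n_sims):
        events_c = rng.binomial(n_per_arm, p_control)
        events_i = rng.binomial(n_per_arm, p_intervention)
        pooled = (events_c + events_i) / (2 * n_per_arm)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
        if se == 0:
            continue  # degenerate draw with no events: count as non-rejection
        z = (events_c - events_i) / (n_per_arm * se)
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims

# e.g. simulated_power(8000, 0.03, 0.25) for 8000 per arm, an assumed 3%
# 18-month attempt rate, and a 25% relative reduction
```

In the study's version, event counts would be drawn from the historical health-system data rather than a fixed binomial rate, which is what makes the resulting power estimates realistic for the participating systems.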

