Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task

JAMIA Open ◽  
2020 ◽  
Vol 3 (3) ◽  
pp. 395-404 ◽  
Author(s):  
Steven R Chamberlin ◽  
Steven D Bedrick ◽  
Aaron M Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
...  

Abstract Objective Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval methods using electronic health records for different types of cohort definition retrieval. Materials and Methods We developed a test collection consisting of about 100 000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated information retrieval tasks using word-based approaches were performed, varying 4 different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. Results The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision but were still not able to recall all relevant patients found by the automated queries. Conclusion While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Future work will focus on using the test collection to develop and evaluate new approaches to query structure, weighting algorithms, and application of semantic methods.
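B-Pref, the evaluation measure used in this study, scores a ranking by how often judged relevant patients appear ahead of judged nonrelevant ones, ignoring unjudged records. A minimal sketch of the standard B-Pref computation (Buckley and Voorhees's definition; the function and variable names are illustrative, not taken from the study's code):

```python
def bpref(ranking, relevant, nonrelevant):
    """B-Pref for a single topic.

    ranking: retrieved patient ids, best first.
    relevant / nonrelevant: sets of judged patient ids.
    Unjudged patients in the ranking are simply skipped.
    """
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    if N == 0:
        # no judged nonrelevant patients: every retrieved relevant counts fully
        return sum(1 for d in ranking if d in relevant) / R
    cap = min(R, N)
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            # penalty grows with judged nonrelevant patients ranked above this one
            total += 1.0 - min(nonrel_seen, cap) / cap
    return total / R
```

Relevant patients the system never retrieves still count in the denominator, which is why recall failures depress the score.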

2019 ◽  
Author(s):  
Steven D. Bedrick ◽  
Aaron M. Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
Sijia Liu ◽  
...  

Abstract Objective Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval (IR) methods using electronic health records (EHR) for different types of cohort definition retrieval. Materials and Methods We developed a test collection consisting of about 100,000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated IR tasks using word-based approaches were performed, varying four different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. Results The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision, but were still not able to recall all relevant patients found by the automated queries. Conclusion While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Insights gained in this preliminary analysis will help guide future work to develop new methods for patient-level cohort discovery with EHR data.


2019 ◽  
Author(s):  
Steven R. Chamberlin ◽  
Steven D. Bedrick ◽  
Aaron M. Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
...  

Abstract Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 test patient cohort definitions and 100,000 patient records originating from an academic medical institution's EHR data warehouse, and ran automated word-based query tasks, varying four parameters. Performance was measured using B-Pref. We then designed 59 taxonomy characteristics to classify the structure of the 56 topics. In addition, six topic complexity measures were derived from these characteristics for further evaluation using a beta regression simulation. We did not find a strong association between the 59 taxonomy characteristics and patient retrieval performance, but we did find strong performance associations with the six topic complexity measures created from these characteristics, and interactions between these measures and the automated query parameter settings. Some of the characteristics derived from a query taxonomy could lead to improved selection of approaches based on the structure of the topic of interest. Insights gained here will help guide future work to develop new methods for patient-level cohort discovery with EHR data.
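The beta regression simulation mentioned above lends itself to a small sketch: B-Pref-style scores lie in (0, 1), so a beta likelihood with a logit mean link is a natural fit. The sketch below simulates scores whose mean falls with a single topic complexity measure and recovers the slope by maximum likelihood; the coefficients, sample size, and precision value are invented for illustration and are not taken from the study:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def beta_nll(params, x, y):
    """Negative log-likelihood of a beta regression with a logit mean link."""
    b0, b1, log_phi = params
    mu = expit(b0 + b1 * x)          # mean in (0, 1)
    phi = np.exp(log_phi)            # precision, kept positive
    a, b = mu * phi, (1 - mu) * phi
    return -np.sum(gammaln(phi) - gammaln(a) - gammaln(b)
                   + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))

# Simulate B-Pref-like scores whose mean declines with topic complexity
rng = np.random.default_rng(1)
complexity = rng.uniform(0, 1, 500)
mu_true = expit(0.5 - 2.0 * complexity)
phi_true = 20.0
scores = rng.beta(mu_true * phi_true, (1 - mu_true) * phi_true)
scores = np.clip(scores, 1e-6, 1 - 1e-6)  # keep strictly inside (0, 1)

fit = minimize(beta_nll, x0=[0.0, 0.0, np.log(10.0)], args=(complexity, scores))
b0_hat, b1_hat, _ = fit.x  # b1_hat should be close to the true slope of -2.0
```

A negative fitted slope here corresponds to the study's finding that higher topic complexity is associated with lower retrieval performance.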


2019 ◽  
Vol 69 (686) ◽  
pp. e605-e611 ◽  
Author(s):  
Helen P Booth ◽  
Arlene M Gallagher ◽  
David Mullett ◽  
Lucy Carty ◽  
Shivani Padmanabhan ◽  
...  

Background Quality improvement (QI) is a priority for general practice, and GPs are expected to participate in and provide evidence of QI activity. There is growing interest in harnessing the potential of electronic health records (EHR) to improve patient care by supporting practices to find cases that could benefit from a medicines review. Aim To develop scalable and reproducible prescribing safety reports using patient-level EHR data. Design and setting UK general practices that contribute de-identified patient data to the Clinical Practice Research Datalink (CPRD). Method A scoping phase used stakeholder consultations to identify primary care QI needs and potential indicators. QI reports containing real data were sent to 12 pilot practices that used Vision GP software and had expressed interest. The scale-up phase involved automating production and distribution of reports to all contributing practices that used both Vision and EMIS software systems. Benchmarking reports with patient-level case review lists for two prescribing safety indicators were sent to 457 practices in December 2017 following the initial scale-up (Figure 2). Results Two indicators were selected from the Royal College of General Practitioners Patient Safety Toolkit following stakeholder consultations for the pilot phase involving 12 GP practices. Pilot phase interviews showed that reports were used to review individual patient care, implement wider QI actions in the practice, and for appraisal and revalidation. Conclusion Electronic health record data can be used to provide standardised, reproducible reports that can be delivered at scale with minimal resource requirements. These can be used in a national QI initiative that impacts directly on patient care.
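Case-finding for a prescribing safety indicator can be sketched as a simple filter over coded patient records. The indicator below (an NSAID prescribed to a patient with a prior peptic ulcer and no gastroprotective co-prescription) is a classic example of this kind of rule, used purely for illustration; it is not necessarily one of the two indicators in the study, and the record structure and names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class PatientRecord:
    patient_id: str
    prescriptions: set = field(default_factory=set)  # current coded drug classes
    history: set = field(default_factory=set)        # coded diagnoses

def flag_for_review(patients, indicator):
    """Return ids of patients matching a prescribing safety indicator."""
    flagged = []
    for p in patients:
        if (indicator["required_rx"] <= p.prescriptions       # has the risky drug
                and indicator["risk_history"] <= p.history    # has the risk factor
                and not (indicator["protective_rx"] & p.prescriptions)):
            flagged.append(p.patient_id)
    return flagged

# Illustrative indicator: NSAID in a patient with prior peptic ulcer
# and no proton pump inhibitor (PPI) co-prescription.
NSAID_ULCER = {
    "required_rx": {"nsaid"},
    "risk_history": {"peptic_ulcer"},
    "protective_rx": {"ppi"},
}
```

A benchmarking report would then aggregate flagged counts per practice, with the patient-level list supplied for case review.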


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255467
Author(s):  
Xia Ning ◽  
Ziwei Fan ◽  
Evan Burgun ◽  
Zhiyun Ren ◽  
Titus Schleyer

Due to the rapid growth of information available about individual patients, most physicians suffer from information overload and inefficiencies when they review patient information in health information technology systems. In this paper, we present a novel hybrid dynamic and multi-collaborative filtering method to improve information retrieval from electronic health records. This method recommends relevant information from electronic health records to physicians during patient visits. It models information search dynamics using a Markov model. It also leverages the key idea of collaborative filtering, originating from Recommender Systems, for prioritizing information based on various similarities among physicians, patients, and information items. We tested this new method using electronic health record data from the Indiana Network for Patient Care, a large, inter-organizational clinical data repository maintained by the Indiana Health Information Exchange. Our experimental results demonstrated that, for top-5 recommendations, our method correctly predicted the information physicians were interested in for 46.7% of all test cases. For top-1 recommendations, the corresponding figure was 24.7%. In addition, the new method was 22.3% better than the conventional Markov model for top-1 recommendations.
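The hybrid described above combines two signals: a first-order Markov model of which information item tends to be viewed next, and collaborative-filtering-style similarity scores over past usage. A minimal sketch of that combination (the blending weight, data layout, and function names are assumptions, not the paper's actual formulation):

```python
import numpy as np

def markov_transitions(sequences, n_items):
    """Row-normalized first-order transition matrix from item-access sequences."""
    T = np.zeros((n_items, n_items))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            T[a, b] += 1
    rows = T.sum(axis=1, keepdims=True)
    return np.divide(T, rows, out=np.zeros_like(T), where=rows > 0)

def item_similarity(usage):
    """Cosine similarity between item columns of a user x item usage matrix."""
    norms = np.linalg.norm(usage, axis=0, keepdims=True)
    unit = np.divide(usage, norms, out=np.zeros_like(usage), where=norms > 0)
    return unit.T @ unit

def recommend(last_item, user_vector, T, S, alpha=0.5, k=5):
    """Blend Markov next-step probabilities with similarity-weighted CF scores."""
    cf_scores = S @ user_vector        # items similar to what this user has viewed
    norm = cf_scores.max()
    cf = cf_scores / norm if norm > 0 else cf_scores
    scores = alpha * T[last_item] + (1 - alpha) * cf
    scores[last_item] = -np.inf        # do not re-recommend the current item
    return np.argsort(scores)[::-1][:k]
```

With `alpha = 1.0` this reduces to the conventional Markov model the paper uses as its baseline; intermediate values mix in the collaborative signal.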


2017 ◽  
Vol 68 (11) ◽  
pp. 2636-2648 ◽  
Author(s):  
Stephen Wu ◽  
Sijia Liu ◽  
Yanshan Wang ◽  
Tamara Timmons ◽  
Harsha Uppili ◽  
...  

2021 ◽  
pp. 450-458
Author(s):  
Carsten Schröder ◽  
Marcus Lawrance ◽  
Chen Li ◽  
Christelle Lenain ◽  
Shivani K. Mhatre ◽  
...  

PURPOSE External control (EC) arms derived from electronic health records (EHRs) can provide appropriate comparison groups when randomized control arms are not feasible, but have not been explored for metastatic colorectal cancer (mCRC) trials. We constructed EC arms from two patient-level EHR-derived databases and evaluated them against the control arm from a phase III, randomized controlled mCRC trial. METHODS IMblaze370 evaluated atezolizumab with or without cobimetinib versus regorafenib in patients with mCRC. EC arms were constructed from the Flatiron Health (FH) EHR-derived de-identified database and the combined FH/Foundation Medicine Clinico-Genomic Database (CGDB). IMblaze370 eligibility criteria were applied to the EC cohorts. Propensity scores and standardized mortality ratio weighting were used to balance baseline characteristics between the IMblaze370 and EC arms; balance was assessed using standardized mean differences. The Kaplan-Meier method was used to estimate median overall survival (OS), and Cox proportional hazards models were used to estimate hazard ratios with bootstrapped 95% CIs to compare differences in OS between study arms. RESULTS The FH EC included 184 patients; the CGDB EC included 108 patients. Most characteristics were well balanced (standardized mean difference < 0.1) between each EC arm and the IMblaze370 population. Median OS was similar between the IMblaze370 control arm (8.5 months [95% CI, 6.41 to 10.71]) and both EC arms: FH (8.5 months [6.93 to 9.92]) and CGDB (8.8 months [7.85 to 9.92]). OS comparisons between the IMblaze370 experimental arm and the FH EC (hazard ratio, 0.85 [0.64 to 1.14]) and CGDB EC (0.86 [0.65 to 1.18]) yielded similar results as the comparison with the IMblaze370 control arm (1.01 [0.75 to 1.37]). CONCLUSION EC arms constructed from the FH database and the CGDB closely replicated the control arm from IMblaze370. EHR-derived EC arms can provide meaningful comparators in mCRC trials when recruiting a randomized control arm is not feasible.
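Standardized mortality ratio (SMR) weighting, as used above, keeps trial-arm patients at weight 1 and reweights external controls by the propensity odds so the EC arm resembles the trial population; balance is then checked with standardized mean differences (SMD < 0.1 being the threshold the study uses). A minimal sketch, assuming propensity scores have already been estimated; names are illustrative:

```python
import numpy as np

def smr_weights(propensity, treated):
    """SMR weights: trial-arm patients keep weight 1; external controls get
    the propensity odds p / (1 - p), pulling them toward the trial population."""
    return np.where(treated, 1.0, propensity / (1.0 - propensity))

def standardized_mean_diff(x, treated, weights):
    """Weighted SMD for one baseline covariate (|SMD| < 0.1 read as balanced)."""
    t = treated.astype(bool)
    c = ~t
    mt = np.average(x[t], weights=weights[t])
    mc = np.average(x[c], weights=weights[c])
    vt = np.average((x[t] - mt) ** 2, weights=weights[t])
    vc = np.average((x[c] - mc) ** 2, weights=weights[c])
    return (mt - mc) / np.sqrt((vt + vc) / 2.0)
```

In practice the propensity scores would come from a model of trial membership fitted on the pooled baseline covariates, and the SMD check would be repeated for every covariate before and after weighting.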


2011 ◽  
Vol 4 (0) ◽  
Author(s):  
Michael Klompas ◽  
Chaim Kirby ◽  
Jason McVetta ◽  
Paul Oppedisano ◽  
John Brownstein ◽  
...  

2019 ◽  
Vol 16 (3) ◽  
pp. 273-282 ◽  
Author(s):  
Susan M Shortreed ◽  
Carolyn M Rutter ◽  
Andrea J Cook ◽  
Gregory E Simon

Background Pragmatic clinical trials often use automated data sources such as electronic health records, claims, or registries to identify eligible individuals and collect outcome information. A specific advantage of this automated data collection is that data on potential participants are often available when design decisions are being made. We outline how these data can be used to inform trial design. Methods Our work is motivated by a pragmatic clinical trial evaluating the impact of suicide-prevention outreach interventions on fatal and non-fatal suicide attempts in the 18 months after randomization. We illustrate our recommended approaches for designing pragmatic clinical trials using historical data from the health systems participating in this study. Specifically, we illustrate how electronic health record data can be used to inform the selection of trial eligibility requirements, to estimate the distribution of participant characteristics over the course of the trial, and to conduct power and sample size calculations. Results Data from 122,873 people with patient health questionnaire (PHQ) responses, recorded in their electronic health records between 1 July 2010 and 31 March 2012, were used to show that the suicide attempt rate in the 18 months following completion of the questionnaire varies by response to item nine of the PHQ. We estimated that the proportion of individuals with a prior recorded elevated PHQ (i.e. history of suicidal ideation) would decrease from approximately 50% at the beginning of a trial to about 5%, 50 weeks later. Using electronic health record data, we conducted simulations to estimate the power to detect a 25% reduction in suicide attempts. Simulation-based power calculations estimated that randomizing 8000 participants per randomization arm would allow 90% power to detect a 25% reduction in the suicide attempt rate in the intervention arm compared to usual care at an alpha level of 0.05. Conclusions Historical data can be used to inform the design of pragmatic clinical trials, a strength of trials that use automated data collection for randomizing participants and assessing outcomes. In particular, realistic sample size calculations can be conducted using real-world data from the health systems in which the trial will be conducted. Data-informed trial design should yield more realistic estimates of statistical power and maximize efficiency of trial recruitment.
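The simulation-based power calculation described above can be sketched as repeated two-arm trials, each analyzed with a two-proportion z-test. The event rate, arm size, and simulation count below are illustrative assumptions (the abstract does not report the underlying attempt rate), but the structure mirrors the approach: simulate, test, and count rejections:

```python
import math
import numpy as np

def simulated_power(n_per_arm, control_rate, reduction,
                    n_sims=2000, alpha=0.05, seed=0):
    """Monte Carlo power: simulate many two-arm trials, test each with a
    two-proportion z-test, and return the fraction that reject the null."""
    rng = np.random.default_rng(seed)
    p_control = control_rate
    p_intervention = control_rate * (1.0 - reduction)
    rejections = 0
    for _ in range(n_sims):
        events_c = rng.binomial(n_per_arm, p_control)
        events_i = rng.binomial(n_per_arm, p_intervention)
        pooled = (events_c + events_i) / (2 * n_per_arm)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
        if se == 0:
            continue  # degenerate draw with no events: count as non-rejection
        z = (events_c - events_i) / (n_per_arm * se)
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims

# e.g. simulated_power(8000, 0.03, 0.25) for 8000 per arm, an assumed 3%
# 18-month attempt rate, and a 25% relative reduction
```

In the study's version, event counts would be drawn from the historical health-system data rather than a fixed binomial rate, which is what makes the resulting power estimates realistic for the participating systems.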

