Evaluation of Patient-Level Retrieval from Electronic Health Record Data for a Cohort Discovery Task

Author(s):  
Steven D. Bedrick ◽  
Aaron M. Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
Sijia Liu ◽  
...  

ABSTRACT Objective: Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval (IR) methods using electronic health records (EHR) for different types of cohort definition retrieval. Materials and Methods: We developed a test collection consisting of about 100,000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated IR tasks using word-based approaches were performed, varying four different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. Results: The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision, but were still not able to recall all relevant patients found by the automated queries. Conclusion: While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Insights gained in this preliminary analysis will help guide future work to develop new methods for patient-level cohort discovery with EHR data.
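The abstract reports B-Pref without defining it. As a rough illustration, the sketch below computes one common formulation of B-Pref for a single topic, in which unjudged patients are ignored and each retrieved relevant patient is penalized by the number of judged non-relevant patients ranked above it. The function name `bpref`, the patient identifiers, and the relevance judgments are all hypothetical; this is not the authors' evaluation code, and the exact variant they used is not stated here.

```python
def bpref(ranking, relevant, nonrelevant):
    """B-Pref for one topic (a sketch of a widely used formulation).

    ranking     -- patient IDs in ranked order, best first
    relevant    -- set of IDs judged relevant
    nonrelevant -- set of IDs judged non-relevant
    IDs in neither set are unjudged and do not affect the score.
    """
    num_rel = len(relevant)
    if num_rel == 0:
        return 0.0
    # The penalty is normalized by the smaller of the two judged pools.
    denom = min(num_rel, len(nonrelevant))
    nonrel_seen = 0
    total = 0.0
    for pid in ranking:
        if pid in relevant:
            if denom == 0:
                total += 1.0
            else:
                total += 1.0 - min(nonrel_seen, denom) / denom
        elif pid in nonrelevant:
            nonrel_seen += 1
    # Relevant patients that were never retrieved contribute zero.
    return total / num_rel


# Hypothetical judgments for one topic; "p9" is unjudged.
ranking = ["p1", "p9", "p7", "p2", "p5"]
relevant = {"p1", "p2", "p4"}
nonrelevant = {"p5", "p7", "p8"}
print(round(bpref(ranking, relevant, nonrelevant), 4))
```

A mean B-Pref such as the 0.167 reported above would be the average of this per-topic score over all 56 topics.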

JAMIA Open ◽  
2020 ◽  
Vol 3 (3) ◽  
pp. 395-404 ◽  
Author(s):  
Steven R Chamberlin ◽  
Steven D Bedrick ◽  
Aaron M Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
...  

Abstract Objective: Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval methods using electronic health records for different types of cohort definition retrieval. Materials and Methods: We developed a test collection consisting of about 100 000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated information retrieval tasks using word-based approaches were performed, varying 4 different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. Results: The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision but were still not able to recall all relevant patients found by the automated queries. Conclusion: While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Future work will focus on using the test collection to develop and evaluate new approaches to query structure, weighting algorithms, and application of semantic methods.
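The finding that structured queries can score higher on recall and precision while still missing relevant patients that the automated queries found can be made concrete with set arithmetic. The sketch below uses made-up patient identifiers and a hypothetical helper `precision_recall`; none of the numbers come from the study.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one topic."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments and result sets for a single topic.
relevant = {"p1", "p2", "p3", "p4"}
boolean_hits = {"p1", "p2", "p3"}                 # structured Boolean query
automated_hits = {"p1", "p4", "p6", "p7", "p8"}   # word-based automated query

print(precision_recall(boolean_hits, relevant))
print(precision_recall(automated_hits, relevant))

# Relevant patients found only by the automated query.
missed_by_boolean = (automated_hits & relevant) - boolean_hits
print(sorted(missed_by_boolean))
```

In this toy case the Boolean query wins on both measures, yet one relevant patient appears only in the automated result set, which is the pattern the abstract describes.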


2019 ◽  
Author(s):  
Steven R. Chamberlin ◽  
Steven D. Bedrick ◽  
Aaron M. Cohen ◽  
Yanshan Wang ◽  
Andrew Wen ◽  
...  

Abstract: Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 test patient cohort definitions, 100,000 patient records originating from an academic medical institution EHR data warehouse, and automated word-based query tasks, varying four parameters. Performance was measured using B-Pref. We then designed 59 taxonomy characteristics to classify the structure of the 56 topics. In addition, six topic complexity measures were derived from these characteristics for further evaluation using a beta regression simulation. We did not find a strong association between the 59 taxonomy characteristics and patient retrieval performance, but we did find strong performance associations with the six topic complexity measures created from these characteristics, and interactions between these measures and the automated query parameter settings. Some of the characteristics derived from a query taxonomy could lead to improved selection of approaches based on the structure of the topic of interest. Insights gained here will help guide future work to develop new methods for patient-level cohort discovery with EHR data.


2021 ◽  
Author(s):  
Rebecca T. Levinson ◽  
Jennifer R. Malinowski ◽  
Suzette J. Bielinski ◽  
Luke V. Rasmussen ◽  
Quinn S. Wells ◽  
...  

ABSTRACT Background: Heart failure (HF) is a complex syndrome associated with significant morbidity and healthcare costs. Electronic health records (EHRs) are widely used to identify patients with HF and other phenotypes. Despite widespread use of EHRs for phenotype algorithm development, it is unclear if the characteristics of identified populations mirror those of clinically observed patients and reflect the known spectrum of HF phenotypes. Methods: We performed a subanalysis within a larger systematic evidence review to assess the different methods used for HF algorithm development and their application to research and clinical care. We queried PubMed for articles published up to November 2020. Out of 318 studies screened, 25 articles were included for primary analysis and 15 studies using only International Classification of Diseases (ICD) codes were evaluated for secondary analysis. Results are reported descriptively. Results: HF algorithms were most often developed at academic medical centers and the V.A. One health system was responsible for 8 of 10 HF algorithm studies. HF and congestive HF were the most frequent phenotypes observed and less frequently, specific HF subtypes and acute HF. Diagnoses were the most common data type used to identify HF patients and echocardiography was the second most frequent. The majority of studies used rule-based methods to develop their algorithm. Few studies used regression or machine learning methods to identify HF patients. Validation of algorithms varied considerably: only 52.9% of HF and 44.4% of HF subtype algorithms were validated, but 75% of acute HF algorithms were. Demographics of any study population were reported in 68% of algorithm studies and 53% of ICD-only studies. Fewer than half reported demographics of their HF algorithm-identified population. Of those reporting, most identified majority male (>50%) populations, including both algorithms for HF with preserved ejection fraction. Conclusion: There is significant heterogeneity in phenotyping methodologies used to develop HF algorithms using EHRs. Validation of algorithms is inconsistent but largely relies on manual review of patient records. The concentration of algorithm development at one or two sites may reduce potential generalizability of these algorithms to identify HF patients at non-academic medical centers and in populations from underrepresented regions. Differences between the reported demographics of algorithm-identified HF populations and those expected based on HF epidemiology suggest that current algorithms do not reflect the full spectrum of HF patient populations.


2019 ◽  
Vol 69 (686) ◽  
pp. e605-e611 ◽  
Author(s):  
Helen P Booth ◽  
Arlene M Gallagher ◽  
David Mullett ◽  
Lucy Carty ◽  
Shivani Padmanabhan ◽  
...  

Background: Quality improvement (QI) is a priority for general practice, and GPs are expected to participate in and provide evidence of QI activity. There is growing interest in harnessing the potential of electronic health records (EHR) to improve patient care by supporting practices to find cases that could benefit from a medicines review. Aim: To develop scalable and reproducible prescribing safety reports using patient-level EHR data. Design and setting: UK general practices that contribute de-identified patient data to the Clinical Practice Research Datalink (CPRD). Method: A scoping phase used stakeholder consultations to identify primary care QI needs and potential indicators. QI reports containing real data were sent to 12 pilot practices that used Vision GP software and had expressed interest. The scale-up phase involved automating production and distribution of reports to all contributing practices that used both Vision and EMIS software systems. Benchmarking reports with patient-level case review lists for two prescribing safety indicators were sent to 457 practices in December 2017 following the initial scale-up (Figure 2). Results: Two indicators were selected from the Royal College of General Practitioners Patient Safety Toolkit following stakeholder consultations for the pilot phase involving 12 GP practices. Pilot phase interviews showed that reports were used to review individual patient care, implement wider QI actions in the practice, and for appraisal and revalidation. Conclusion: Electronic health record data can be used to provide standardised, reproducible reports that can be delivered at scale with minimal resource requirements. These can be used in a national QI initiative that impacts directly on patient care.


Author(s):  
Victor M. Castro ◽  
Rachel A. Ross ◽  
Sean M. McBride ◽  
Roy H. Perlis

Abstract Importance: Absent a vaccine or any established treatments for the novel and highly infectious coronavirus-19 (COVID-19), rapid efforts to identify potential therapeutics are required. Objective: To identify commonly prescribed medications that may be associated with lesser risk of morbidity with COVID-19 across 5 Eastern Massachusetts hospitals. Design: In silico cohort using electronic health records between 7/1/2019 and 4/07/2020. Setting: Outpatient, emergency department and inpatient settings from 2 academic medical centers and 3 community hospitals. Participants: All individuals presenting to a clinical site and undergoing COVID-19 testing. Main Outcome or Measure: Inpatient hospitalization; documented requirement for mechanical ventilation. Results: Among 12,818 individuals with COVID-19 testing results available, 2271 (17.7%) were test-positive, and 707/2271 (31.1%) were hospitalized in one of 5 hospitals. Based on a comparison of ranked electronic prescribing frequencies, medications enriched among test-positive individuals not requiring hospitalization included ibuprofen, valacyclovir, and naproxen. Among individuals who were hospitalized, mechanical ventilation was documented in 213 (30.1%); ibuprofen and naproxen were also more commonly prescribed among individuals not requiring ventilation. Conclusions and Relevance: These preliminary findings suggest that electronic health records may be applied to identify medications associated with lower risk of morbidity with COVID-19, but larger cohorts will be required to address confounding by indication. Larger scale efforts at repositioning may help to identify FDA-approved medications meriting study for prevention of COVID-19 morbidity and mortality. Funding: none. Key Points. Question: Can electronic health records identify medications that may be associated with diminished risk of COVID-19 morbidity? Findings: This cohort study across 5 hospitals identified medications enriched among individuals who did not require hospitalization for COVID-19 despite a positive test. Meaning: While preliminary and subject to confounding, our results suggest that electronic health records may complement efforts to identify novel therapeutics for COVID-19 by identifying FDA-approved compounds with potential benefit in reducing COVID-19-associated morbidity.
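The percentages in the Results each use a different denominator (all tested, test-positive, hospitalized), which is easy to misread. The short check below recomputes them from the counts given in the abstract; the helper name `pct` is ours, and only the counts come from the text.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Counts as reported in the abstract.
tested = 12818        # individuals with COVID-19 test results
positive = 2271       # test-positive
hospitalized = 707    # test-positive and hospitalized
ventilated = 213      # hospitalized with documented mechanical ventilation

print(pct(positive, tested))        # share of tested who were positive
print(pct(hospitalized, positive))  # share of positives who were hospitalized
print(pct(ventilated, hospitalized))  # share of hospitalized who were ventilated
```

The three values agree with the 17.7%, 31.1%, and 30.1% stated in the abstract.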


2021 ◽  
pp. 450-458
Author(s):  
Carsten Schröder ◽  
Marcus Lawrance ◽  
Chen Li ◽  
Christelle Lenain ◽  
Shivani K. Mhatre ◽  
...  

PURPOSE External control (EC) arms derived from electronic health records (EHRs) can provide appropriate comparison groups when randomized control arms are not feasible, but have not been explored for metastatic colorectal cancer (mCRC) trials. We constructed EC arms from two patient-level EHR-derived databases and evaluated them against the control arm from a phase III, randomized controlled mCRC trial. METHODS IMblaze370 evaluated atezolizumab with or without cobimetinib versus regorafenib in patients with mCRC. EC arms were constructed from the Flatiron Health (FH) EHR-derived de-identified database and the combined FH/Foundation Medicine Clinico-Genomic Database (CGDB). IMblaze370 eligibility criteria were applied to the EC cohorts. Propensity scores and standardized mortality ratio weighting were used to balance baseline characteristics between the IMblaze370 and EC arms; balance was assessed using standardized mean differences. Kaplan-Meier method estimated median overall survival (OS). Cox proportional hazards models estimated hazard ratios with bootstrapped 95% CIs to compare differences in OS between study arms. RESULTS The FH EC included 184 patients; the CGDB EC included 108 patients. Most characteristics were well-balanced (standardized mean difference < 0.1) between each EC arm and the IMblaze370 population. Median OS was similar between the IMblaze370 control arm (8.5 months [95% CI, 6.41 to 10.71]) and both EC arms: FH (8.5 months [6.93 to 9.92]) and CGDB (8.8 months [7.85 to 9.92]). OS comparisons between the IMblaze370 experimental arm and the FH EC (hazard ratio, 0.85 [0.64 to 1.14]) and CGDB EC (0.86 [0.65 to 1.18]) yielded similar results as the comparison with the IMblaze370 control arm (1.01 [0.75 to 1.37]). CONCLUSION EC arms constructed from the FH database and the CGDB closely replicated the control arm from IMblaze370. 
EHR-derived EC arms can provide meaningful comparators in mCRC trials when recruiting a randomized control arm is not feasible.
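The balancing step described in the Methods can be sketched as follows. This is a minimal illustration of standardized mortality ratio (SMR) weighting and a weighted standardized mean difference, assuming the trial population is the target (trial patients get weight 1, external-control patients get ps / (1 - ps), where ps is an already-estimated propensity of being in the trial). The function names, the toy covariate, and the propensity values are hypothetical and are not taken from the study.

```python
from math import sqrt


def smr_weights(in_trial, propensity):
    """SMR weights: 1 for trial patients, ps / (1 - ps) for external controls."""
    return [1.0 if t else p / (1.0 - p) for t, p in zip(in_trial, propensity)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_var(values, weights):
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(values, in_trial, weights):
    """Weighted SMD of one covariate between the trial arm and the external arm."""
    trial = [(v, w) for v, t, w in zip(values, in_trial, weights) if t]
    ext = [(v, w) for v, t, w in zip(values, in_trial, weights) if not t]
    tv, tw = zip(*trial)
    ev, ew = zip(*ext)
    pooled_sd = sqrt((weighted_var(tv, tw) + weighted_var(ev, ew)) / 2.0)
    if pooled_sd == 0:
        return 0.0
    return (weighted_mean(tv, tw) - weighted_mean(ev, ew)) / pooled_sd


# Hypothetical data: age for three trial and three external-control patients.
age = [60, 65, 70, 50, 65, 80]
in_trial = [True, True, True, False, False, False]
propensity = [0.6, 0.5, 0.7, 0.2, 0.5, 0.8]  # assumed to be pre-estimated

weights = smr_weights(in_trial, propensity)
smd = standardized_mean_difference(age, in_trial, weights)
print(abs(smd) < 0.1)  # the abstract treats < 0.1 as well balanced
```

In practice the propensity scores would come from a fitted model over the baseline characteristics, and the same balance check would be repeated for every covariate.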


2016 ◽  
Vol 24 (2) ◽  
pp. 380-387 ◽  
Author(s):  
Hyeoneui Kim ◽  
Elizabeth Bell ◽  
Jihoon Kim ◽  
Amy Sitapati ◽  
Joe Ramsdell ◽  
...  

Background: Implementation of patient preferences for use of electronic health records for research has been traditionally limited to identifiable data. Tiered e-consent for use of de-identified data has traditionally been deemed unnecessary or impractical for implementation in clinical settings. Methods: We developed a web-based tiered informed consent tool called informed consent for clinical data and bio-sample use for research (iCONCUR) that honors granular patient preferences for use of electronic health record data in research. We piloted this tool in 4 outpatient clinics of an academic medical center. Results: Of patients offered access to iCONCUR, 394 agreed to participate in this study, among whom 126 patients accessed the website to modify their records according to data category and data recipient. The majority consented to share most of their data and specimens with researchers. Willingness to share was greater among participants from a Human Immunodeficiency Virus (HIV) clinic than those from internal medicine clinics. The number of items declined was higher for for-profit institution recipients. Overall, participants were most willing to share demographics and body measurements and least willing to share family history and financial data. Participants indicated that having granular choices for data sharing was appropriate, and that they liked being informed about who was using their data for what purposes, as well as about outcomes of the research. Conclusion: This study suggests that a tiered electronic informed consent system is a workable solution that respects patient preferences, increases satisfaction, and does not significantly affect participation in research.


2011 ◽  
Vol 4 (0) ◽  
Author(s):  
Michael Klompas ◽  
Chaim Kirby ◽  
Jason McVetta ◽  
Paul Oppedisano ◽  
John Brownstein ◽  
...  
