External Validation of an Algorithm to Identify Patients with High Data-Completeness in Electronic Health Records for Comparative Effectiveness Research

2020 ◽  
Vol 12 ◽  
pp. 133-141
Author(s):  
Kueiyu Joshua Lin ◽  
Gary Rosenthal ◽  
Shawn Murphy ◽  
Kenneth Mandl ◽  
Yinzhu Jin ◽  
...  
2012 ◽  
Vol 30 (34) ◽  
pp. 4243-4248 ◽  
Author(s):  
Benjamin J. Miriovsky ◽  
Lawrence N. Shulman ◽  
Amy P. Abernethy

Rapidly accumulating clinical information can support cancer care and discovery. Future success depends on information management, access, use, and reuse. Electronic health records (EHRs) are highlighted as a critical component of evidence development and implementation, but to fully harness the potential of EHRs, they need to be more than electronic renderings of the traditional paper medical chart. Clinical informatics, together with structured, accessible, and secure data captured through EHR systems, provides mechanisms through which EHRs can facilitate comparative effectiveness research (CER). Use of large linked administrative databases to answer comparative questions is an early version of informatics-enabled CER familiar to oncologists. An updated version of informatics-enabled CER relies on EHR-derived structured data linked with supplemental information to provide patient-level information that can be aggregated and analyzed to support hypothesis generation, comparative assessment, and personalized care. As implementation of EHRs continues to expand, electronic databases of information collected via EHRs will continue to grow; these aggregated data, enhanced with real-time analytics, can provide point-of-care evidence to oncologists, tailored to patient-level characteristics. The system learns when clinical care informs research and insights derived from research are reinvested in care. Challenges must be overcome, including interoperability, standardization, access, and development of real-time analytics.


Medical Care ◽  
2013 ◽  
Vol 51 ◽  
pp. S30-S37 ◽  
Author(s):  
William R. Hersh ◽  
Mark G. Weiner ◽  
Peter J. Embi ◽  
Judith R. Logan ◽  
Philip R.O. Payne ◽  
...  

2020 ◽  
Vol 38 (4_suppl) ◽  
pp. 679-679
Author(s):  
Limor Appelbaum ◽  
Jose Pablo Cambronero ◽  
Karla Pollick ◽  
George Silva ◽  
Jennifer P. Stevens ◽  
...  

679 Background: Pancreatic adenocarcinoma (PDAC) is often diagnosed at an advanced stage. We sought to develop a model for early PDAC prediction in the general population, using electronic health records (EHRs) and machine learning. Methods: We used three EHR datasets from Beth Israel Deaconess Medical Center (BIDMC) and Partners Healthcare (PHC): 1. “BIDMC-Development-Data” (BIDMC-DD) for model development, using a feed-forward neural network (NN) and L2-regularized logistic regression, randomly split (80:20) into training and test groups. We tuned hyperparameters using cross-validation on the training split and report performance on the test split. 2. “BIDMC-Large-Data” (BIDMC-LD) to re-fit and calibrate models. 3. “PHC-Data” for external validation. We evaluated using the area under the receiver operating characteristic curve (AUC) and computed 95% CIs using an empirical bootstrap over the test data. PDAC patients were selected using ICD-9/ICD-10 codes and validated against tumor registries. In contrast to prior work, we did not predefine feature sets based on known clinical correlates; instead, we employed data-driven feature selection, specifically importance-based feature pruning, regularization, and manual validation, to identify diagnosis-based features. Results: BIDMC-DD included demographics, diagnoses, labs, and medications for 1,018 patients (cases = 509; age-sex paired controls). BIDMC-LD included diagnoses for 547,917 patients (cases = 509), and PHC-Data included diagnoses for 160,593 patients (cases = 408). We compared our approach to adapted and re-fitted published baselines. With a 365-day lead time, the NN obtained a BIDMC-DD test AUC of 0.84 (CI 0.79 - 0.90) versus the previous best baseline AUC of 0.70 (CI 0.62 - 0.78). We also validated using BIDMC-DD’s test cancer patients and BIDMC-LD controls; the AUC was 0.71 (CI 0.67 - 0.76) at the 365-day cutoff. The NN’s external validation AUC on PHC-Data was 0.71 (CI 0.63 - 0.79), outperforming an existing model’s AUC of 0.61 (CI 0.52 - 0.70) (Baecker et al, 2019). Conclusions: Models based on data-driven feature selection outperform models that use predefined sets of known clinical correlates and can help in early prediction of PDAC development.
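The evaluation protocol described in the abstract — an L2-regularized logistic regression baseline, an 80:20 train/test split, test-set AUC, and a 95% CI from an empirical bootstrap over the test data — can be sketched as follows. This is an illustrative sketch only: the synthetic features stand in for EHR-derived diagnoses, labs, and medications, and none of the data, parameters, or variable names below come from the authors' actual pipeline.

```python
# Hedged sketch: L2-regularized logistic regression with an 80:20 split,
# test AUC, and a bootstrap 95% CI, mirroring the protocol in the abstract.
# All data here is synthetic; this is not the authors' implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for EHR-derived features (diagnoses, labs, meds).
n, p = 1000, 20
X = rng.normal(size=(n, p))
coef = rng.normal(size=p)
y = (X @ coef + rng.normal(scale=2.0, size=n) > 0).astype(int)

# 80:20 random split into training and test groups, as in the abstract.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# L2 penalty is the default for scikit-learn's LogisticRegression;
# C controls the regularization strength (smaller C = stronger penalty).
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
model.fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)

# Empirical bootstrap over the test data for a 95% CI on the AUC:
# resample test indices with replacement and recompute the AUC.
boot_aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y_te), size=len(y_te))
    if len(np.unique(y_te[idx])) < 2:  # AUC needs both classes present
        continue
    boot_aucs.append(roc_auc_score(y_te[idx], scores[idx]))
lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"test AUC = {auc:.2f}, 95% CI ({lo:.2f} - {hi:.2f})")
```

The same resampling loop applies unchanged to a neural network's test-set scores, since the bootstrap operates only on the held-out predictions, not on the fitted model.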

