Correcting hazard ratio estimates for outcome misclassification using multiple imputation with internal validation data

Jiayi Ni; Aaron Leong; Kaberi Dasgupta; Elham Rahme

doi:10.1002/pds.4223

Accounting for Misclassified Outcomes in Binary Regression Models Using Multiple Imputation With Internal Validation Data

American Journal of Epidemiology ◽

10.1093/aje/kws340 ◽

2013 ◽

Vol 177 (9) ◽

pp. 904-912 ◽

Cited By ~ 40

Author(s):

Jessie K. Edwards ◽

Stephen R. Cole ◽

Melissa A. Troester ◽

David B. Richardson

Keyword(s):

Multiple Imputation ◽

Regression Models ◽

Binary Regression ◽

Validation Data ◽

Internal Validation

A combined strategy of feature selection and machine learning to identify predictors of prediabetes

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz204 ◽

2019 ◽

Vol 27 (3) ◽

pp. 396-406 ◽

Cited By ~ 1

Author(s):

Kushan De Silva ◽

Daniel Jönsson ◽

Ryan T Demmer

Keyword(s):

Machine Learning ◽

Feature Selection ◽

National Health ◽

Screening Tool ◽

Model Performance ◽

Nutrition Examination Survey ◽

Validation Data ◽

Internal Validation ◽

Health And Nutrition ◽

Wide Range

Abstract. Objective: To identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population. Materials and Methods: We analyzed n = 6346 men and women enrolled in the National Health and Nutrition Examination Survey 2013–2014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned to training (n = 3174) and internal validation (n = 3172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied on 46 exposure variables in original and resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n = 3172) and external validation data (n = 3000) prepared from National Health and Nutrition Examination Survey 2011–2012. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC). Predictors were assessed by odds ratios in logistic models and variable importance in others. The Centers for Disease Control (CDC) prediabetes screening tool was the benchmark to compare model performance. Results: Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool produced 64.40% AUROC. Seven optimal (≥ 70% AUROC) models identified 25 predictors including 4 potentially novel associations; 20 by both logistic and other nonlinear/ensemble models and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P < 0.05). Discussion: Combined use of feature selection and machine learning increased predictive performance, outperforming the recommended screening tool. A range of predictors of prediabetes was identified. Conclusion: This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.
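The two generic steps named in this abstract, a random partition into training and internal validation sets and evaluation by AUROC, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the seed, and the toy labels and scores are assumptions introduced here, and AUROC is computed with the pairwise (Mann-Whitney) formulation rather than any particular library.

```python
import random

def random_partition(n, n_train, seed=0):
    """Shuffle indices 0..n-1 and split them into training and validation lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Sizes taken from the abstract: 6346 participants, 3174 for training.
train_idx, valid_idx = random_partition(6346, 3174, seed=42)

# Toy labels and scores, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

With these sizes the remaining 3172 indices form the internal validation set. The pairwise loop is quadratic in the sample size, which is acceptable for a sketch but would normally be replaced by a rank-based computation on larger data.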

Correction for misclassification of caries experience in the absence of internal validation data

Clinical Oral Investigations ◽

10.1007/s00784-013-0993-4 ◽

2013 ◽

Vol 17 (8) ◽

pp. 1799-1805 ◽

Cited By ~ 2

Author(s):

T. Mutsvari ◽

D. Declerck ◽

E. Lesaffre

Keyword(s):

Caries Experience ◽

Validation Data ◽

Internal Validation

Dealing with Treatment-confounder Feedback and Sparse Follow-up in Longitudinal studies - Application of a Marginal Structural Model in a Multiple Sclerosis Cohort

American Journal of Epidemiology ◽

10.1093/aje/kwaa243 ◽

2020 ◽

Author(s):

Mohammad Ehsanul Karim ◽

Helen Tremlett ◽

Feng Zhu ◽

John Petkau ◽

Elaine Kingwell

Keyword(s):

Multiple Sclerosis ◽

Disease Progression ◽

Multiple Imputation ◽

Structural Model ◽

Potential Effect ◽

Hazard Ratio ◽

Survival Advantage ◽

Marginal Structural Model ◽

Beta Interferon

Abstract. The beta-interferons are widely prescribed platform therapies for patients with multiple sclerosis (MS). We accessed a cohort of patients with relapsing onset MS from British Columbia, Canada (1995-2013) to examine the potential survival advantage associated with beta-interferon exposure using a marginal structural model. Accounting for potential treatment-confounder feedback between comorbidity, MS disease progression and beta-interferon exposure, we found an association between beta-interferon exposure of at least 6 contiguous months and improved survival (hazard ratio (HR) = 0.63, 95% confidence interval 0.47-0.86). We also assessed potential effect modifications by sex, baseline age or baseline disease duration, and found these factors to be important effect modifiers. Sparse follow-up due to variability in patient contact with the health system is one of the biggest challenges in longitudinal analyses. We considered several single-level and multi-level multiple imputation approaches to deal with sparse follow-up and disease progression information; both types of approach produced similar estimates. Compared to ad hoc imputation approaches, such as linear interpolation (HR: 0.63), and last observation carried forward (HR: 0.65), all multiple imputation approaches produced a smaller hazard ratio (HR: 0.53), although the direction of effect and conclusions drawn concerning the survival advantage remained the same.
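For readers unfamiliar with the two ad hoc approaches named in this abstract, the sketch below shows what last observation carried forward and linear interpolation do to a sparsely observed series. It is an illustration only: the function names, visit times, and score values are invented here, and it does not reproduce the multiple imputation approaches or the marginal structural model used in the study.

```python
def locf(values):
    """Last observation carried forward: fill each None with the latest observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def linear_interpolate(times, values):
    """Fill None entries lying between two observed visits by interpolating in time.

    Gaps before the first or after the last observed visit are left as None.
    """
    filled = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        t0, t1 = times[left], times[right]
        v0, v1 = values[left], values[right]
        for i in range(left + 1, right):
            fraction = (times[i] - t0) / (t1 - t0)
            filled[i] = v0 + fraction * (v1 - v0)
    return filled

# Hypothetical follow-up: years since baseline and a progression score
# that is missing at three of the five visits.
visit_years = [0, 1, 2, 4, 5]
score = [2.0, None, None, 4.0, None]

score_locf = locf(score)
score_interp = linear_interpolate(visit_years, score)
```

Both methods fill each gap with a single deterministic value, so they ignore the uncertainty about the missing measurement. Multiple imputation instead draws several plausible values per gap and combines the resulting estimates.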

Validating vascular access data in the Swedish Renal Registry SRR

The Journal of Vascular Access ◽

10.1177/1129729820954737 ◽

2020 ◽

pp. 112972982095473

Author(s):

Gunilla Welander ◽

Birgitta Sigvant

Keyword(s):

Vascular Access ◽

Medical Records ◽

External Validation ◽

Internal Validity ◽

Validation Data ◽

Internal Validation ◽

Clinical Utilization ◽

Surgical Units ◽

National Patient ◽

Access Data

Background: All Swedish dialysis units register data on vascular access in the Swedish Renal Registry (SRR). This study assessed external and internal validity of vascular access data in the SRR and its use as a tool in clinical practice. Methods: For external validation, all procedures for placed fistulas, open and endovascular reinterventions registered in the SRR in 2011 to 2017 were cross-matched with data from the Swedish National Patient Registry. A two-stage sampling selected 12/60 dialysis units for internal validation. Data on current vascular access for 10 randomly selected patients at each unit were compared with medical record data. SRR data on placed fistulas from 2017 were cross-checked with data from local surgical units. Registrations of central venous catheters (CVCs) as temporary or permanent were used as a proxy for clinical utilization of the registry and analyzed separately. Results: External validity increased from 74% to 83% during the observation period. In all, 1037 datapoints were used in internal validation, with a 95% match between SRR registrations and medical records. Registrations of CVCs, fistulas, and interventions were reliable, with few missing data or mismatches. Vascular access type initiating hemodialysis was missing or incorrect in either the SRR or medical records for 14/120 patients. Registrations of placed fistulas in 2017 matched in all but four (pre-dialysis stage) of 135 cases. Some 35% of the CVCs validated (n = 49) at 7/12 units were not categorized as temporary or permanent. Conclusion: The SRR provides a reliable resource on current vascular access care.

Evaluating the Impact of Unmeasured Confounding with Internal Validation Data: An Example Cost Evaluation in Type 2 Diabetes

Value in Health ◽

10.1016/j.jval.2012.10.012 ◽

2013 ◽

Vol 16 (2) ◽

pp. 259-266 ◽

Cited By ~ 15

Author(s):

Douglas Faries ◽

Xiaomei Peng ◽

Manjiri Pawaskar ◽

Karen Price ◽

James D. Stamey ◽

...

Keyword(s):

Type 2 Diabetes ◽

Cost Evaluation ◽

Validation Data ◽

Unmeasured Confounding ◽

Internal Validation ◽

The Impact

A Competing Risk Nomogram for Predicting Cancer-Specific Death of Patients With Maxillary Sinus Carcinoma

Frontiers in Oncology ◽

10.3389/fonc.2021.698955 ◽

2021 ◽

Vol 11 ◽

Author(s):

Mingbin Hu ◽

Xiancai Li ◽

Weiguo Gu ◽

Jinhong Mei ◽

Dewu Liu ◽

...

Keyword(s):

Maxillary Sinus ◽

Cumulative Incidence ◽

External Validation ◽

Competing Risk ◽

Validation Data ◽

Data Set ◽

Internal Validation ◽

Data Resource ◽

Seer Data ◽

Risk Of Cancer

Objectives: We aimed to establish and verify a competing risk nomogram for estimating the risk of cancer-specific death (CSD) in patients with maxillary sinus carcinoma (MSC). Methods: Data on individuals with MSC were abstracted from the Surveillance, Epidemiology, and End Results (SEER) data resource and from the First Affiliated Hospital of Nanchang University (China). The risk predictors linked to CSD were identified using the cumulative incidence function (CIF) along with the Fine-Gray proportional hazards model, on the basis of univariate and multivariate analyses implemented in R. A nomogram was then created and verified to estimate the three- and five-year CSD probability. Results: Overall, 478 individuals with MSC were enrolled from the SEER data resource, with a 3- and 5-year cumulative incidence of CSD after diagnosis of 42.1% and 44.3%, respectively. The Fine-Gray analysis illustrated that age, histological type, N stage, grade, surgery, and T stage were independent predictors linked to CSD in the SEER training data set (n = 343). These variables were incorporated in the prediction nomogram. The nomogram was well calibrated and demonstrated a remarkable estimation accuracy in the internal validation data set (n = 135) abstracted from the SEER data resource and in the external validation data set (n = 200). The nomograms were well calibrated and had a good discriminative ability, with concordance indexes (c-indexes) of 0.810, 0.761, and 0.755 for the 3- and 5-year prognosis prediction of MSC-specific mortality in the training cohort, internal validation cohort, and external validation cohort, respectively. Conclusions: The competing risk nomogram constructed herein proved to be an optimal assistant tool for estimating CSD in individuals with MSC.

Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation

BMC Medical Research Methodology ◽

10.1186/s12874-016-0239-7 ◽

2016 ◽

Vol 16 (1) ◽

Cited By ~ 16

Author(s):

Simone Wahl ◽

Anne-Laure Boulesteix ◽

Astrid Zierer ◽

Barbara Thorand ◽

Mark A. van de Wiel

Keyword(s):

Multiple Imputation ◽

Incomplete Data ◽

Predictive Performance ◽

Internal Validation

A new model for estimating glomerular filtration rate in patients with cancer.

Journal of Clinical Oncology ◽

10.1200/jco.2017.35.15_suppl.e14074 ◽

2017 ◽

Vol 35 (15_suppl) ◽

pp. e14074-e14074

Author(s):

Tobias Janowitz ◽

Edward Hywel Williams ◽

Andrea Marshall ◽

Nicola Ainsworth ◽

Peter B Thomas ◽

...

Keyword(s):

Glomerular Filtration Rate ◽

Glomerular Filtration ◽

Filtration Rate ◽

Statistical Regression ◽

Validation Data ◽

Data Set ◽

Internal Validation ◽

New Model ◽

Patients With Cancer ◽

Carboplatin Dose

Background: The glomerular filtration rate (GFR) is essential for carboplatin chemotherapy dosing; however, the best method to estimate GFR in patients with cancer is unknown. We identify the most accurate and least biased method. Methods: Data on age, sex, height, weight, serum creatinine, and results for GFR from 51Cr-EDTA excretion measurements (51Cr-EDTA GFR) were obtained from Caucasian patients aged 18 years or older with histologically confirmed cancer diagnoses at the University of Cambridge Hospital NHS Trust, UK. We developed a new multivariable linear model for GFR using statistical regression analysis. 51Cr-EDTA GFR was compared to the estimated GFR (eGFR) from seven published models and our new model using an internal validation data set, root-mean-squared error (RMSE), and median residuals. A comparison of carboplatin dosing accuracy based on an absolute percentage error of more than 20% (APE > 20%) was undertaken. Results: Between August 2006 and January 2013, data from 2,471 patients were obtained. The new model improved the eGFR accuracy (RMSE 15.00 ml/min (95% CI 14.12-16.00)) compared to all published models. Body surface area (BSA) adjusted CKD-EPI was the most accurate published model for eGFR (RMSE 16.30 ml/min (95% CI 15.34-17.38)) for the internal validation set. Importantly, the new model reduced the fraction of patients with a carboplatin dose APE > 20% to 14.17%, in contrast to 18.62% for BSA adjusted CKD-EPI and 25.51% for the Cockcroft-Gault model. The results were externally validated. Conclusions: In a large data set from patients with cancer, a new model improves eGFR and carboplatin dose calculations when compared to BSA adjusted CKD-EPI, the model we identified as the best published model for determination of eGFR in patients with cancer.
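The three accuracy measures named in this abstract (RMSE, median residual, and the fraction of cases with APE > 20%) can be computed as in the sketch below. The function names and the toy values are assumptions introduced here, and the residual is taken as estimate minus measurement, a sign convention the abstract does not specify. The abstract applies the APE criterion to carboplatin doses; the helper below accepts any paired reference and predicted values.

```python
import math
import statistics

def rmse(measured, estimated):
    """Root-mean-squared error between paired measured and estimated values."""
    squared = [(e - m) ** 2 for m, e in zip(measured, estimated)]
    return math.sqrt(sum(squared) / len(squared))

def median_residual(measured, estimated):
    """Median of (estimated - measured); the sign convention is an assumption."""
    return statistics.median(e - m for m, e in zip(measured, estimated))

def fraction_ape_above(reference, predicted, threshold=0.20):
    """Share of pairs whose absolute percentage error exceeds the threshold."""
    count = sum(
        1 for r, p in zip(reference, predicted)
        if abs(p - r) / abs(r) > threshold
    )
    return count / len(reference)

# Toy values in ml/min, invented for illustration only.
measured_gfr = [100.0, 80.0, 60.0, 50.0]
estimated_gfr = [110.0, 80.0, 45.0, 56.0]

toy_rmse = rmse(measured_gfr, estimated_gfr)
toy_median = median_residual(measured_gfr, estimated_gfr)
toy_fraction = fraction_ape_above(measured_gfr, estimated_gfr)
```

RMSE summarizes overall accuracy, the median residual indicates bias, and the APE fraction counts clinically relevant misses, which is why the abstract reports all three rather than a single figure.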

A comparison of regression calibration approaches for designs with internal validation data

Journal of Statistical Planning and Inference ◽

10.1016/j.jspi.2003.12.015 ◽

2005 ◽

Vol 131 (1) ◽

pp. 175-190 ◽

Cited By ~ 9

Author(s):

Sally W. Thurston ◽

Paige L. Williams ◽

Russ Hauser ◽

Howard Hu ◽

Mauricio Hernandez-Avila ◽

...

Keyword(s):

Regression Calibration ◽

Validation Data ◽

Internal Validation
