Assessing the adequacy of the logistic regression model for matched case-control studies

1985, Vol 4 (4), pp. 425-435
Author(s): Suresh H. Moolgavkar, Edward D. Lustbader, David J. Venzon

Biostatistics, 2020
Author(s): Nadim Ballout, Cedric Garcia, Vivian Viallon

Summary: The analysis of case–control studies with several disease subtypes is increasingly common, e.g. in cancer epidemiology. For matched designs, a natural strategy is based on a stratified conditional logistic regression model. Then, to account for the potential homogeneity among disease subtypes, we adapt the ideas of data shared lasso, which has been recently proposed for the estimation of stratified regression models. For unmatched designs, we compare two standard methods based on $L_1$-norm penalized multinomial logistic regression. We describe formal connections between these two approaches, from which practical guidance can be derived. We show that one of these approaches, which is based on a symmetric formulation of the multinomial logistic regression model, actually reduces to a data shared lasso version of the other. Consequently, the relative performance of the two approaches critically depends on the level of homogeneity that exists among disease subtypes: more precisely, when homogeneity is moderate to high, the non-symmetric formulation with controls as the reference is not recommended. Empirical results obtained from synthetic data are presented, which confirm the benefit of properly accounting for potential homogeneity under both matched and unmatched designs, in terms of estimation and prediction accuracy, variable selection and identification of heterogeneities. We also present preliminary results from the analysis of a case–control study nested within the EPIC (European Prospective Investigation into Cancer and nutrition) cohort, where the objective is to identify metabolites associated with the occurrence of subtypes of breast cancer.


2004, Vol 24 (1), pp. 121-130
Author(s): Nico Nagelkerke, Jeroen Smits, Saskia le Cessie, Hans van Houwelingen

Author(s): Fei Wan, Graham A Colditz, Siobhan Sutcliffe

Abstract: Although the need for addressing matching in the analysis of matched case-control studies is well established, debate remains as to the most appropriate analytic method when matching on at least one continuous factor. We compare the bias and efficiency of unadjusted and adjusted conditional logistic regression (CLR) and unconditional logistic regression (ULR) in the setting of both exact and non-exact matching. To demonstrate that case-control matching distorts the association between the matching variables and the outcome in the matched sample relative to the target population, we derive the logit model for the matched case-control sample under exact matching. We conduct simulations to validate our theoretical conclusions and to explore different ways of adjusting for the matching variables in CLR and ULR to reduce biases. When matching is exact, CLR is unbiased in all settings. When matching is not exact, unadjusted CLR tends to be biased and this bias increases with increasing matching caliper size. Spline smoothing of the matching variables in CLR can alleviate biases. Regardless of exact or non-exact matching, adjusted ULR is generally biased unless the functional form of the matched factors is modelled correctly. The validity of adjusted ULR is vulnerable to model specification error. CLR should remain the primary analytic approach.
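
As a rough illustration of the conditional logistic regression (CLR) mentioned in the abstract above, the sketch below fits the conditional likelihood for the simplest design, 1:1 matched pairs. It is not the authors' code or simulation setup. It relies on the standard result that, for 1:1 pairs, each pair contributes a logistic term in the case-minus-control covariate difference, with no intercept. The function name `fit_clr_pairs`, its parameters, and the toy data are hypothetical.

```python
import numpy as np


def fit_clr_pairs(x_case, x_ctrl, n_iter=25, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs (illustrative sketch).

    Each pair contributes sigmoid(d @ beta) to the conditional likelihood,
    where d = x_case - x_ctrl. Pairs with d == 0 carry no information.
    The fit uses plain Newton-Raphson and assumes the Hessian is invertible.
    """
    d = np.asarray(x_case, dtype=float) - np.asarray(x_ctrl, dtype=float)
    if d.ndim == 1:
        d = d[:, None]  # treat a single covariate as one column
    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(d @ beta)))     # conditional probability per pair
        grad = d.T @ (1.0 - p)                    # score vector
        hess = (d * (p * (1.0 - p))[:, None]).T @ d  # negative Hessian of the log-likelihood
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data (hypothetical): one binary exposure, 30 pairs with only the case
# exposed, 10 pairs with only the control exposed, 5 concordant pairs.
x_case = np.array([1.0] * 30 + [0.0] * 10 + [1.0] * 5)
x_ctrl = np.array([0.0] * 30 + [1.0] * 10 + [1.0] * 5)
beta_hat = fit_clr_pairs(x_case, x_ctrl)
print(np.exp(beta_hat[0]))  # estimated odds ratio
```

For a single binary exposure, the conditional estimate of the odds ratio equals the ratio of the two discordant-pair counts (here 30/10), which gives a quick sanity check on the fit. Adjusting for a non-exactly matched continuous factor, as the abstract discusses, would amount to adding further columns (for example spline terms) to the covariates before differencing.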

