Dimension reduction and variable selection in case control studies via regularized likelihood optimization

Florentina Bunea; Adrian Barbu

doi:10.1214/09-ejs537

Dimension reduction and estimation in the secondary analysis of case-control studies

Electronic Journal of Statistics ◽

10.1214/18-ejs1446 ◽

2018 ◽

Vol 12 (1) ◽

pp. 1782-1821

Author(s):

Liang Liang ◽

Raymond Carroll ◽

Yanyuan Ma

Keyword(s):

Dimension Reduction ◽

Secondary Analysis ◽

Case Control ◽

Case Control Studies


Bayesian Variable Selection Methods for Matched Case-Control Studies

The International Journal of Biostatistics ◽

10.1515/ijb-2016-0043 ◽

2017 ◽

Vol 13 (1) ◽

Cited By ~ 5

Author(s):

Josephine Asafu-Adjei ◽

Mahlet G. Tadesse ◽

Brent Coull ◽

Raji Balasubramanian ◽

Michael Lev ◽

...

Keyword(s):

Variable Selection ◽

Statistical Power ◽

High Efficiency ◽

Massachusetts General Hospital ◽

Imaging Study ◽

Bayesian Variable Selection ◽

Case Control ◽

Selection Methods ◽

Case Control Studies ◽

Matched Case

Abstract: Matched case-control designs are currently used in many biomedical applications. To ensure high efficiency and statistical power in identifying features that best discriminate cases from controls, it is important to account for the use of matched designs. However, in the setting of high dimensional data, few variable selection methods account for matching. Bayesian approaches to variable selection have several advantages, including the fact that such approaches visit a wider range of model subsets. In this paper, we propose a variable selection method to account for case-control matching in a Bayesian context and apply it using simulation studies, a matched brain imaging study conducted at Massachusetts General Hospital, and a matched cardiovascular biomarker study conducted by the High Risk Plaque Initiative.


Variable selection and Bayesian model averaging in case-control studies

Statistics in Medicine ◽

10.1002/sim.976 ◽

2001 ◽

Vol 20 (21) ◽

pp. 3215-3230 ◽

Cited By ~ 158

Author(s):

Valerie Viallefont ◽

Adrian E. Raftery ◽

Sylvia Richardson

Keyword(s):

Variable Selection ◽

Bayesian Model ◽

Bayesian Model Averaging ◽

Model Averaging ◽

Case Control ◽

Case Control Studies


Using Kendall's τ-b Correlations to Improve Variable Selection Methods in Case-Control Studies

Biometrics ◽

10.2307/2533275 ◽

1995 ◽

Vol 51 (4) ◽

pp. 1451 ◽

Cited By ~ 1

Author(s):

Thomas W. O'Gorman ◽

Robert F. Woolson

Keyword(s):

Variable Selection ◽

Case Control ◽

Selection Methods ◽

Case Control Studies ◽

Kendall’s τ


Matched Forest: supervised learning for high-dimensional matched case–control studies

Bioinformatics ◽

10.1093/bioinformatics/btz785 ◽

2019 ◽

Author(s):

Nooshin Shomal Zadeh ◽

Sangdi Lin ◽

George C Runger

Keyword(s):

Variable Selection ◽

Case Control ◽

Supplementary Information ◽

High Dimensional ◽

Control Analysis ◽

Case Control Studies ◽

Control Data ◽

Matched Case ◽

Case Control Analysis ◽

And Control

Abstract. Motivation: Matched case–control analysis is widely used in biomedical studies to identify exposure variables associated with health conditions. The matching is used to improve the efficiency. Existing variable selection methods for matched case–control studies are challenged in high-dimensional settings where interactions among variables are also important. We describe a quite different method for high-dimensional matched case–control data, based on the potential outcome model, which is not only flexible regarding the number of matching and exposure variables but also able to detect interaction effects. Results: We present Matched Forest (MF), an algorithm for variable selection in matched case–control data. The method preserves the case and control values in each instance but transforms the matched case–control data with added counterfactuals. A modified variable importance score from a supervised learner is used to detect important variables. The method is conceptually simple and can be applied with widely available software tools. Simulation studies show the effectiveness of MF in identifying important variables. MF is also applied to data from the biomedical domain and its performance is compared with alternative approaches. Availability and implementation: R code for implementing MF is available at https://github.com/NooshinSh/Matched_Forest. Supplementary information: Supplementary data are available at Bioinformatics online.


Tree Variable Selection for Paired Case–Control Studies with Application to Microbiome Data

10.1007/978-3-030-73351-3_12 ◽

2021 ◽

pp. 295-310

Author(s):

Min Lu ◽

Hemant Ishwaran

Keyword(s):

Variable Selection ◽

Case Control ◽

Case Control Studies ◽

Selection For ◽

Microbiome Data


Extending Classification Algorithms to Case-Control Studies

Biomedical Engineering and Computational Biology ◽

10.1177/1179597219858954 ◽

2019 ◽

Vol 10 ◽

pp. 117959721985895 ◽

Cited By ~ 1

Author(s):

Bryan Stanfill ◽

Sarah Reehl ◽

Lisa Bramer ◽

Ernesto S Nakayasu ◽

Stephen S Rich ◽

...

Keyword(s):

Variable Selection ◽

Conditional Logistic Regression ◽

Case Control ◽

Islet Autoimmunity ◽

Classification Algorithms ◽

Case Control Studies ◽

Control Data ◽

Design Data ◽

Matched Design ◽

The Impact

Classification is a common technique applied to ’omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but the associated modeling assumptions limit its ability to identify a large class of sufficiently complicated ’omic signatures. We propose a data preprocessing step which generalizes and makes any linear or nonlinear classification algorithm, even those typically not appropriate for matched design data, available to be used to model case-control data and identify relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification accuracy and the variable selection accuracy of each method are improved after applying this processing step and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.


Performance of variable selection methods for assessing the health effects of correlated exposures in case–control studies

Occupational and Environmental Medicine ◽

10.1136/oemed-2016-104231 ◽

2017 ◽

Vol 75 (7) ◽

pp. 522-529 ◽

Cited By ~ 12

Author(s):

Virissa Lenters ◽

Roel Vermeulen ◽

Lützen Portengen

Keyword(s):

Logistic Regression ◽

Variable Selection ◽

High Sensitivity ◽

Case Control ◽

Stepwise Logistic Regression ◽

Data Sets ◽

Selection Methods ◽

Case Control Studies ◽

Multiple Exposures ◽

Analytical Approaches

Objectives: There is growing recognition that simultaneously assessing multiple exposures may reduce false positive discoveries and improve epidemiological effect estimates. We evaluated the performance of statistical methods for identifying exposure–outcome associations across various data structures typical of environmental and occupational epidemiology analyses. Methods: We simulated a case–control study, generating 100 data sets for each of 270 different simulation scenarios, varying the number of exposure variables, the correlation between exposures, sample size, the number of effective exposures and the magnitude of effect estimates. We compared conventional analytical approaches, that is, univariable (with and without multiplicity adjustment), multivariable and stepwise logistic regression, with variable selection methods: sparse partial least squares discriminant analysis, boosting, and frequentist and Bayesian penalised regression approaches. Results: The variable selection methods consistently yielded more precise effect estimates and generally improved selection accuracy compared with conventional logistic regression methods, especially for scenarios with higher correlation levels. Penalised lasso and elastic net regression both seemed to perform particularly well, specifically when statistical inference based on a balanced weighting of high sensitivity and a low proportion of false discoveries is sought. Conclusions: In this extensive simulation study with multicollinear data, we found that most variable selection methods consistently outperformed conventional approaches, and demonstrated how performance is influenced by the structure of the data and underlying model.
