Dimension reduction and variable selection in case control studies via regularized likelihood optimization

2009 ◽  
Vol 3 (0) ◽  
pp. 1257-1287 ◽  
Author(s):  
Florentina Bunea ◽  
Adrian Barbu
Author(s):  
Josephine Asafu-Adjei ◽  
Mahlet G. Tadesse ◽  
Brent Coull ◽  
Raji Balasubramanian ◽  
Michael Lev ◽  
...  

Abstract: Matched case-control designs are currently used in many biomedical applications. To ensure high efficiency and statistical power in identifying features that best discriminate cases from controls, it is important to account for the use of matched designs. However, in the setting of high dimensional data, few variable selection methods account for matching. Bayesian approaches to variable selection have several advantages, including the fact that such approaches visit a wider range of model subsets. In this paper, we propose a variable selection method to account for case-control matching in a Bayesian context and apply it using simulation studies, a matched brain imaging study conducted at Massachusetts General Hospital, and a matched cardiovascular biomarker study conducted by the High Risk Plaque Initiative.


2001 ◽  
Vol 20 (21) ◽  
pp. 3215-3230 ◽  
Author(s):  
Valerie Viallefont ◽  
Adrian E. Raftery ◽  
Sylvia Richardson

2019 ◽  
Author(s):  
Nooshin Shomal Zadeh ◽  
Sangdi Lin ◽  
George C Runger

Abstract. Motivation: Matched case–control analysis is widely used in biomedical studies to identify exposure variables associated with health conditions. Matching is used to improve efficiency. Existing variable selection methods for matched case–control studies are challenged in high-dimensional settings where interactions among variables are also important. We describe a quite different method for high-dimensional matched case–control data, based on the potential outcome model, which is not only flexible regarding the number of matching and exposure variables but also able to detect interaction effects. Results: We present Matched Forest (MF), an algorithm for variable selection in matched case–control data. The method preserves the case and control values in each instance but transforms the matched case–control data with added counterfactuals. A modified variable importance score from a supervised learner is used to detect important variables. The method is conceptually simple and can be applied with widely available software tools. Simulation studies show the effectiveness of MF in identifying important variables. MF is also applied to data from the biomedical domain and its performance is compared with alternative approaches. Availability and implementation: R code for implementing MF is available at https://github.com/NooshinSh/Matched_Forest. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 10 ◽  
pp. 117959721985895 ◽  
Author(s):  
Bryan Stanfill ◽  
Sarah Reehl ◽  
Lisa Bramer ◽  
Ernesto S Nakayasu ◽  
Stephen S Rich ◽  
...  

Classification is a common technique applied to ’omics data to build predictive models and identify potential markers of biomedical outcomes. Despite the prevalence of case-control studies, the number of classification methods available to analyze data generated by such studies is extremely limited. Conditional logistic regression is the most commonly used technique, but the associated modeling assumptions limit its ability to identify a large class of sufficiently complicated ’omic signatures. We propose a data preprocessing step that makes any linear or nonlinear classification algorithm, even those not typically appropriate for matched-design data, available for modeling case-control data and identifying relevant biomarkers in these study designs. We demonstrate on simulated case-control data that both the classification and variable selection accuracy of each method are improved after applying this processing step and that the proposed methods are comparable to or outperform existing variable selection methods. Finally, we demonstrate the impact of conditional classification algorithms on a large cohort study of children with islet autoimmunity.
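For readers unfamiliar with the conditional logistic regression baseline named in this abstract, the following is a minimal sketch for the special case of 1:1 matched pairs. It is not the preprocessing step the authors propose, whose details the abstract does not give. It relies on the standard fact that, for 1:1 matching, the conditional likelihood reduces to an intercept-free logistic model on within-pair differences (case minus control). The function name `fit_paired_clogit`, the Newton solver, and the simulated data are all illustrative assumptions, and only NumPy is used.

```python
import numpy as np

def fit_paired_clogit(x_case, x_ctrl, n_iter=25, ridge=1e-8):
    """Conditional logistic regression for 1:1 matched pairs (illustrative).

    Each pair contributes sigmoid((x_case - x_ctrl) @ beta) to the
    conditional likelihood, so anything shared within a pair cancels.
    """
    d = np.asarray(x_case, float) - np.asarray(x_ctrl, float)
    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(d @ beta)))
        grad = d.T @ (1.0 - p)  # gradient of the log-likelihood
        # Negative Hessian, with a tiny ridge term for numerical stability.
        hess = (d * (p * (1.0 - p))[:, None]).T @ d + ridge * np.eye(d.shape[1])
        beta = beta + np.linalg.solve(hess, grad)  # Newton step
    return beta

# Hypothetical simulated data: 2000 matched pairs, 3 features.
rng = np.random.default_rng(0)
n_pairs, n_feat = 2000, 3
beta_true = np.array([1.0, -1.0, 0.0])

# A pair-level effect shared by both members stands in for the matching factors.
shared = rng.normal(size=(n_pairs, 1))
a = rng.normal(size=(n_pairs, n_feat)) + shared
b = rng.normal(size=(n_pairs, n_feat)) + shared

# Given exactly one case per pair, member `a` is the case with this probability.
prob_a_case = 1.0 / (1.0 + np.exp(-((a - b) @ beta_true)))
a_is_case = rng.random(n_pairs) < prob_a_case

x_case = np.where(a_is_case[:, None], a, b)
x_ctrl = np.where(a_is_case[:, None], b, a)

beta_hat = fit_paired_clogit(x_case, x_ctrl)
print(np.round(beta_hat, 2))
```

Because the shared pair-level effect cancels in the difference `x_case - x_ctrl`, the estimate is unaffected by the matching factors. That same cancellation is why the model cannot represent signatures that do not act through within-pair differences in a linear predictor.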


2017 ◽  
Vol 75 (7) ◽  
pp. 522-529 ◽  
Author(s):  
Virissa Lenters ◽  
Roel Vermeulen ◽  
Lützen Portengen

Objectives: There is growing recognition that simultaneously assessing multiple exposures may reduce false positive discoveries and improve epidemiological effect estimates. We evaluated the performance of statistical methods for identifying exposure–outcome associations across various data structures typical of environmental and occupational epidemiology analyses. Methods: We simulated a case–control study, generating 100 data sets for each of 270 different simulation scenarios; varying the number of exposure variables, the correlation between exposures, sample size, the number of effective exposures and the magnitude of effect estimates. We compared conventional analytical approaches, that is, univariable (with and without multiplicity adjustment), multivariable and stepwise logistic regression, with variable selection methods: sparse partial least squares discriminant analysis, boosting, and frequentist and Bayesian penalised regression approaches. Results: The variable selection methods consistently yielded more precise effect estimates and generally improved selection accuracy compared with conventional logistic regression methods, especially for scenarios with higher correlation levels. Penalised lasso and elastic net regression both seemed to perform particularly well, specifically when statistical inference based on a balanced weighting of high sensitivity and a low proportion of false discoveries is sought. Conclusions: In this extensive simulation study with multicollinear data, we found that most variable selection methods consistently outperformed conventional approaches, and demonstrated how performance is influenced by the structure of the data and underlying model.
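To make concrete what lasso-penalised logistic regression does as a variable selector, here is a minimal sketch on one simulated data set. It is not the study's implementation and does not reproduce any of its 270 scenarios: the sample size, effect sizes, penalty `lam`, step size, and the function name `lasso_logistic` are all illustrative assumptions. The solver is plain proximal gradient descent (soft-thresholding) written with NumPy only, with no intercept and independent, unit-variance exposures.

```python
import numpy as np

def lasso_logistic(X, y, lam, step=1.0, n_iter=500):
    """L1-penalised logistic regression (no intercept) via proximal gradient.

    Minimises mean logistic loss + lam * ||beta||_1. The fixed step assumes
    roughly unit-variance columns, as in the simulated data below.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (prob - y) / n  # gradient of the mean logistic loss
        z = beta - step * grad
        # Soft-thresholding: the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

# Hypothetical scenario: 10 exposures, only the first two affect the outcome.
rng = np.random.default_rng(1)
n, p = 2000, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -2.0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

beta_hat = lasso_logistic(X, y, lam=0.05)
selected = np.flatnonzero(beta_hat)  # indices with non-zero coefficients
print(selected, np.round(beta_hat, 2))
```

The penalty shrinks every coefficient toward zero and sets weak ones exactly to zero, which is what turns the fit into a selection procedure. The retained coefficients are biased toward zero as a result, so the choice of `lam` trades sensitivity against false discoveries.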


2011 ◽  
Vol 71 (4) ◽  
pp. 234-245 ◽  
Author(s):  
Saonli Basu ◽  
Wei Pan ◽  
William S. Oetting

Author(s):  
Ruth H. Keogh ◽  
D. R. Cox
