Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix

Variable selection of high-dimensional non-parametric nonlinear systems by derivative averaging to avoid the curse of dimensionality

Automatica ◽

10.1016/j.automatica.2018.11.019 ◽

2019 ◽

Vol 101 ◽

pp. 138-149 ◽

Cited By ~ 1

Author(s):

Er-Wei Bai ◽

Changming Cheng ◽

Wen-Xiao Zhao

Keyword(s):

Nonlinear Systems ◽

Variable Selection ◽

Curse Of Dimensionality ◽

High Dimensional ◽

Selection Of ◽

Non Parametric

Stable Variable Selection for High-dimensional Genomic Data with Strong Correlations

10.21203/rs.3.rs-923319/v1 ◽

2021 ◽

Author(s):

Reetika Sarkar ◽

Sithija Manage ◽

Xiaoli Gao

Keyword(s):

Variable Selection ◽

Genomic Data ◽

High Dimensional ◽

Strong Correlations ◽

Computationally Efficient ◽

Two Stage ◽

Selection Of Variables ◽

Level Variable ◽

Stable Variable ◽

Selection Of

Abstract Background: High-dimensional genomic data often exhibit strong correlations, which lead to unstable and inconsistent estimates from commonly used regularization approaches such as the Lasso, MCP, and related methods. Result: In this paper, we perform a comparative study of regularization approaches for variable selection under different correlation structures, and propose a two-stage procedure named rPGBS to address the issue of stable variable selection in various strong correlation settings. This approach involves repeatedly running a two-stage hierarchical procedure consisting of random pseudo-group clustering and bi-level variable selection. Conclusion: Both the simulation studies and the high-dimensional genomic data analysis demonstrate the advantage of the proposed rPGBS method over most commonly used regularization methods. In particular, rPGBS results in more stable selection of variables across a variety of correlation settings, as compared to recent work addressing variable selection with strong correlations. Moreover, rPGBS is computationally efficient across various settings.

Robust check loss-based variable selection of high-dimensional single-index varying-coefficient model

Communications in Nonlinear Science and Numerical Simulation ◽

10.1016/j.cnsns.2015.11.013 ◽

2016 ◽

Vol 36 ◽

pp. 109-128 ◽

Cited By ~ 4

Author(s):

Yunquan Song ◽

Lu Lin ◽

Ling Jian

Keyword(s):

Variable Selection ◽

High Dimensional ◽

Varying Coefficient Model ◽

Varying Coefficient ◽

Single Index ◽

Selection Of

Independent Variable Selection of High-Dimensional Data in Cox Regression Model

Statistics and Applications ◽

10.12677/sa.2021.102018 ◽

2021 ◽

Vol 10 (02) ◽

pp. 183-192

Author(s):

锋刘

Keyword(s):

Variable Selection ◽

Regression Model ◽

Cox Regression ◽

High Dimensional Data ◽

High Dimensional ◽

Cox Regression Model ◽

Independent Variable ◽

Selection Of

Variable selection of high-dimensional non-parametric nonlinear systems: A way to avoid the curse of dimensionality

2017 IEEE 56th Annual Conference on Decision and Control (CDC) ◽

10.1109/cdc.2017.8264634 ◽

2017 ◽

Cited By ~ 1

Author(s):

Er-wei Bai ◽

Changmin Cheng ◽

Wenxiao Zhao ◽

Han-Fu Chen

Keyword(s):

Nonlinear Systems ◽

Variable Selection ◽

Curse Of Dimensionality ◽

High Dimensional ◽

Selection Of ◽

Non Parametric

Randomized boosting with multivariable base-learners for high-dimensional variable selection and prediction

BMC Bioinformatics ◽

10.1186/s12859-021-04340-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Christian Staerk ◽

Andreas Mayr

Keyword(s):

Variable Selection ◽

Prediction Models ◽

Predictor Variable ◽

Predictive Performance ◽

Penalized Regression ◽

Information Criteria ◽

High Dimensional ◽

Gradient Boosting ◽

Biomedical Data ◽

Selection Of

Abstract Background: Statistical boosting is a computational approach to select and estimate interpretable prediction models for high-dimensional biomedical data, leading to implicit regularization and variable selection when combined with early stopping. Traditionally, the set of base-learners is fixed for all iterations and consists of simple regression learners including only one predictor variable at a time. Furthermore, the number of iterations is typically tuned by optimizing the predictive performance, leading to models which often include unnecessarily large numbers of noise variables. Results: We propose three consecutive extensions of classical component-wise gradient boosting. In the first extension, called Subspace Boosting (SubBoost), base-learners can consist of several variables, allowing for multivariable updates in a single iteration. To compensate for the larger flexibility, the ultimate selection of base-learners is based on information criteria leading to an automatic stopping of the algorithm. As the second extension, Random Subspace Boosting (RSubBoost) additionally includes a random preselection of base-learners in each iteration, enabling the scalability to high-dimensional data. In a third extension, called Adaptive Subspace Boosting (AdaSubBoost), an adaptive random preselection of base-learners is considered, focusing on base-learners which have proven to be predictive in previous iterations. Simulation results show that the multivariable updates in the three subspace algorithms are particularly beneficial in cases of high correlations among signal covariates. In several biomedical applications the proposed algorithms tend to yield sparser models than classical statistical boosting, while showing a very competitive predictive performance also compared to penalized regression approaches like the (relaxed) lasso and the elastic net.
Conclusions: The proposed randomized boosting approaches with multivariable base-learners are promising extensions of statistical boosting, particularly suited for highly-correlated and sparse high-dimensional settings. The incorporated selection of base-learners via information criteria induces automatic stopping of the algorithms, promoting sparser and more interpretable prediction models.
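
To make the baseline in this abstract concrete, the following is a minimal sketch of classical component-wise L2 boosting, in which each iteration updates only the single predictor that best fits the current residuals. It is not the authors' SubBoost, RSubBoost, or AdaSubBoost code. The function name `componentwise_l2_boost`, the step size of 0.1, the fixed iteration count, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=40, step=0.1):
    """Illustrative component-wise L2 boosting (hypothetical helper).

    Each iteration fits every single-predictor least-squares base-learner
    to the current residuals and takes a damped step on the best one.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centered predictors
    col_ss = (Xc ** 2).sum(axis=0)
    col_ss[col_ss == 0] = 1.0            # constant columns get zero gain
    intercept = y.mean()
    coef = np.zeros(X.shape[1])
    resid = y - intercept
    for _ in range(n_iter):
        slopes = Xc.T @ resid / col_ss   # least-squares slope per predictor
        gain = slopes ** 2 * col_ss      # reduction in residual sum of squares
        j = int(np.argmax(gain))         # best single-variable base-learner
        coef[j] += step * slopes[j]
        resid -= step * slopes[j] * Xc[:, j]
    intercept -= x_mean @ coef           # undo the centering
    return intercept, coef

# Simulated example (assumed data): 2 signal variables among 50 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=100)
intercept, coef = componentwise_l2_boost(X, y)
selected = np.flatnonzero(coef)          # predictors updated at least once
```

Stopping after a fixed, small number of iterations is what produces the implicit variable selection: predictors that are never chosen keep a coefficient of exactly zero. In practice the iteration count would be tuned rather than fixed as it is here.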

Faculty Opinions recommendation of Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.732703320.793559523 ◽

2019 ◽

Author(s):

Karsten Borgwardt

Keyword(s):

Variable Selection ◽

High Dimensional

Hierarchical selection of variables in sparse high-dimensional regression

Institute of Mathematical Statistics Collections - Borrowing Strength: Theory Powering Applications – A Festschrift for Lawrence D. Brown ◽

10.1214/10-imscoll605 ◽

2010 ◽

pp. 56-69 ◽

Cited By ~ 4

Author(s):

Peter J. Bickel ◽

Ya’acov Ritov ◽

Alexandre B. Tsybakov

Keyword(s):

High Dimensional ◽

Selection Of Variables ◽

High Dimensional Regression ◽

Hierarchical Selection ◽

Selection Of

Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer

Mathematics ◽

10.3390/math9030222 ◽

2021 ◽

Vol 9 (3) ◽

pp. 222

Author(s):

Juan C. Laria ◽

M. Carmen Aguilera-Morillo ◽

Enrique Álvarez ◽

Rosa E. Lillo ◽

Sara López-Taruella ◽

...

Keyword(s):

Breast Cancer ◽

Variable Selection ◽

Triple Negative Breast Cancer ◽

Triple Negative ◽

A Priori ◽

Simulated Data ◽

Point Of View ◽

High Dimensional ◽

Whole Genome ◽

Genome Context

Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.

A new adaptive L1-norm for optimal descriptor selection of high-dimensional QSAR classification model for anti-hepatitis C virus activity of thiourea derivatives

SAR and QSAR in Environmental Research ◽

10.1080/1062936x.2017.1278618 ◽

2017 ◽

Vol 28 (1) ◽

pp. 75-90 ◽

Cited By ~ 16

Author(s):

Z. Y. Algamal ◽

M. H. Lee

Keyword(s):

Hepatitis C Virus ◽

Hepatitis C ◽

Classification Model ◽

High Dimensional ◽

L1 Norm ◽

Thiourea Derivatives ◽

Virus Activity ◽

Descriptor Selection ◽

Optimal Descriptor ◽

Selection Of
