A mixture of local and quadratic approximation variable selection algorithm in nonconcave penalized regression

Author(s):  
Assi N'GUESSAN ◽  
Ibrahim Sidi Zakari ◽  
Assi Mkhadri

We consider the problem of variable selection via penalized likelihood using nonconvex penalty functions. To maximize the non-differentiable and nonconcave objective function, an algorithm based on local linear approximation, which adopts a naturally sparse representation, was recently proposed. However, although it has promising theoretical properties, it inherits some drawbacks of the Lasso in the high-dimensional setting. To overcome these drawbacks, we propose an algorithm (MLLQA) for maximizing the penalized likelihood for a large class of nonconvex penalty functions. The convergence property of MLLQA and the oracle property of the one-step MLLQA estimator are established. Simulations and an application to a real data set are also presented.
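
A minimal sketch, not the authors' MLLQA code, of the local linear approximation (LLA) idea that such algorithms build on: the derivative of the SCAD penalty at an initial estimate supplies weights for a single weighted-lasso step, solved here by rescaling the design columns. The data, penalty constants, and the 1e-3 weight floor are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def scad_derivative(beta, lam, a=3.7):
    """Derivative p'_lambda(|beta|) of the SCAD penalty (Fan & Li, 2001)."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

def one_step_lla(X, y, lam, a=3.7):
    """One LLA step: a weighted lasso with weights taken from an initial OLS fit."""
    beta0 = LinearRegression(fit_intercept=False).fit(X, y).coef_
    w = np.clip(scad_derivative(beta0, lam, a), 1e-3, None)  # floor keeps the rescaling stable
    Xw = X / w                           # column rescaling turns the weighted lasso into a plain lasso
    fit = Lasso(alpha=1.0, fit_intercept=False, max_iter=10_000).fit(Xw, y)
    return fit.coef_ / w                 # undo the reparametrisation

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.r_[3.0, -2.0, 1.5, np.zeros(17)]
y = X @ beta_true + rng.standard_normal(100)
print(one_step_lla(X, y, lam=0.2).round(2))   # nonzero essentially only for the first three coefficients
```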

2018 ◽  
Vol 34 (3) ◽  
pp. 364-380
Author(s):  
Daoyuan Shi ◽  
Lynn Kuo

Variable selection has been an important topic in regression and Bayesian survival analysis. In the era of rapid development of genomics and precision medicine, the topic is becoming more important and challenging. In addition to the challenges of handling censored data in survival analysis, we face an increasing demand to handle big data with many predictors, most of which may not be relevant to predicting the survival outcome. With the aim of improving prediction accuracy, we explore the Bregman divergence criterion for selecting predictive models. We develop sparse Bayesian formulations for parametric and semiparametric regression models and demonstrate how variable selection is done using the predictive approach. Model selection for a simulated data set and two real data sets (one from a kidney transplant study and the other from a breast cancer microarray study at the Memorial Sloan-Kettering Cancer Center) is carried out to illustrate our methods.
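
As a small, self-contained illustration (not the paper's formulation), the Bregman divergence used as a loss for comparing predictions is D_phi(x, y) = phi(x) - phi(y) - phi'(y)(x - y) for a convex generator phi; the squared-error loss and a generalized Kullback-Leibler divergence are the two familiar special cases recovered below.

```python
import numpy as np

def bregman(x, y, phi, grad_phi):
    """Elementwise Bregman divergence D_phi(x, y) for a convex generator phi."""
    return phi(x) - phi(y) - grad_phi(y) * (x - y)

obs, pred = np.array([1.0, 2.0]), np.array([0.5, 3.0])

# phi(t) = t^2       recovers the squared-error loss (x - y)^2
sq = bregman(obs, pred, lambda t: t**2, lambda t: 2 * t)

# phi(t) = t*log(t)  recovers the generalised KL divergence x*log(x/y) - x + y
kl = bregman(obs, pred, lambda t: t * np.log(t), lambda t: np.log(t) + 1)

print(sq, kl)
```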


2011 ◽  
Vol 93 (6) ◽  
pp. 409-417 ◽  
Author(s):  
PASCAL CROISEAU ◽  
ANDRÉS LEGARRA ◽  
FRANÇOIS GUILLAUME ◽  
SÉBASTIEN FRITZ ◽  
AURÉLIA BAUR ◽  
...  

Summary: For genomic selection methods, the statistical challenge is to estimate the effect of each of the available single-nucleotide polymorphisms (SNPs). In a context where the number of SNPs (p) is much higher than the number of bulls (n), this task may lead to a poor estimation of these SNP effects if, as for genomic BLUP (gBLUP), all SNPs have a non-null effect. An alternative is to use approaches that have been developed specifically to solve the ‘p>>n’ problem. This is the case of variable selection methods, among which we focus on the Elastic-Net (EN) algorithm, a penalized regression approach. The performances of EN, gBLUP and pedigree-based BLUP were compared with data from three French dairy cattle breeds, giving very encouraging results for EN. We tried to push the idea of improving SNP effect estimates further by considering fewer of them. This variable selection strategy was considered for both gBLUP and EN by adding an SNP pre-selection step based on quantitative trait locus (QTL) detection. Similar results were observed with or without the pre-selection step, in terms of correlations between direct genomic value (DGV) and observed daughter yield deviation in a validation data set. However, when applied to the EN algorithm, this strategy led to a substantial reduction in the number of SNPs included in the prediction equation. In a context where the number of genotyped animals and the number of SNPs get larger and larger, SNP pre-selection strongly alleviates computing requirements and ensures that national evaluations can be completed within a reasonable time frame.
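
A hedged sketch of the elastic-net step described above, run on simulated 0/1/2 genotype codes; the dimensions, the causal-SNP pattern and the cross-validation grid are illustrative assumptions, not the settings of the dairy cattle study.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
n_bulls, n_snps = 200, 2000                                    # a p >> n situation
X = rng.integers(0, 3, size=(n_bulls, n_snps)).astype(float)   # SNP genotypes coded 0/1/2
effects = np.zeros(n_snps)
effects[:20] = rng.normal(0.0, 0.5, 20)                        # only a few SNPs carry an effect
y = X @ effects + rng.normal(0.0, 1.0, n_bulls)                # stand-in for daughter yield deviations

# l1_ratio sets the lasso/ridge mix; cross-validation picks the penalty strength.
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, max_iter=50_000).fit(X, y)
selected = np.flatnonzero(enet.coef_)                          # SNPs kept in the prediction equation
print(len(selected), "SNPs retained out of", n_snps)
```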


2018 ◽  
Vol 2018 ◽  
pp. 1-9
Author(s):  
Yuping Hu ◽  
Sanying Feng ◽  
Liugen Xue

We introduce a new partially linear functional additive model and consider the problem of variable selection for this model. Based on the functional principal components method and the centered spline basis function approximation, a new variable selection procedure is proposed using the smooth-threshold estimating equation (SEE). The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to zero and simultaneously estimates the nonzero regression coefficients by solving the SEE. The approach avoids solving a convex optimization problem and is flexible and easy to implement. We establish the asymptotic properties of the resulting estimators under some regularity conditions. We apply the proposed procedure to analyze a real data set: the Tecator data set.
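
A hedged sketch of the smooth-threshold idea (in the spirit of Ueki's smooth-threshold estimating equations) specialised to an ordinary linear model rather than the partially linear functional additive model of the paper: a data-driven weight delta_j that reaches 1 zeroes a coefficient, and the remaining coefficients solve a ridge-like linear system, so no convex optimization routine is needed. The tuning constants and the delta formula below are illustrative assumptions.

```python
import numpy as np

def see_linear(X, y, lam=0.1, tau=1.0):
    """Smooth-threshold estimating equation for a linear model (illustrative only)."""
    n, p = X.shape
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]                 # initial consistent estimate
    delta = np.minimum(1.0, lam / (np.abs(beta_init) ** (1.0 + tau) + 1e-12))
    keep = delta < 1.0                                               # delta_j = 1  =>  beta_j set to 0
    Xk, dk = X[:, keep], delta[keep]
    # Solve (1 - delta_j) * [X'(y - X beta)]_j - delta_j * beta_j = 0 on the kept coordinates,
    # i.e. ((1 - delta) * X'X + diag(delta)) beta = (1 - delta) * X'y.
    A = (1.0 - dk)[:, None] * (Xk.T @ Xk) + np.diag(dk)
    b = (1.0 - dk) * (Xk.T @ y)
    beta = np.zeros(p)
    beta[keep] = np.linalg.solve(A, b)
    return beta

rng = np.random.default_rng(7)
X = rng.standard_normal((150, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(150)
print(see_linear(X, y).round(2))      # only positions 0 and 3 should remain nonzero
```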


2019 ◽  
Vol 22 (1) ◽  
pp. 34-56 ◽  
Author(s):  
Yoshimasa Uematsu ◽  
Shinya Tanaka

Summary: This study examines high-dimensional forecasting and variable selection via folded-concave penalized regressions. The penalized regression approach leads to sparse estimates of the regression coefficients and allows the dimensionality of the model to be much larger than the sample size. First, we discuss the theoretical aspects of a penalized regression in a time series setting. Specifically, we show the oracle inequality with ultra-high-dimensional time-dependent regressors. Then we show the validity of the penalized regression using two empirical applications. First, we forecast quarterly US gross domestic product data using a high-dimensional monthly data set and the mixed data sampling (MIDAS) framework with penalization. Second, we examine how well the penalized regression screens a hidden portfolio based on a large New York Stock Exchange stock price data set. Both applications show that a penalized regression provides remarkable results in terms of forecasting performance and variable selection.
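
An illustrative sketch (not the authors' code) of the mixed-frequency setup described above: the three monthly observations inside each quarter are stacked as separate columns in an unrestricted-MIDAS fashion, and a penalized regression handles the resulting high-dimensional design. sklearn's lasso stands in for the folded-concave penalty of the paper; all data and dimensions are synthetic.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n_quarters, n_monthly_series = 80, 50
monthly = rng.standard_normal((3 * n_quarters, n_monthly_series))    # 3 months per quarter

# Stack the month-1/2/3 values of every series side by side: 150 regressors per quarter.
X = monthly.reshape(n_quarters, 3 * n_monthly_series)
beta = np.zeros(X.shape[1])
beta[[0, 7, 40]] = [0.8, -0.5, 0.6]                                  # only a few relevant columns
gdp_growth = X @ beta + 0.3 * rng.standard_normal(n_quarters)        # synthetic quarterly target

fit = LassoCV(cv=5, max_iter=20_000).fit(X[:-8], gdp_growth[:-8])    # hold out the last 8 quarters
print("selected columns:", np.flatnonzero(fit.coef_))
print("out-of-sample forecasts:", fit.predict(X[-8:]).round(2))
```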


2020 ◽  
Vol 17 (2) ◽  
pp. 0550
Author(s):  
Ali Hameed Yousef ◽  
Omar Abdulmohsin Ali

The issue of the penalized regression model has received considerable attention for variable selection. It plays an essential role in dealing with high-dimensional data. The arctangent (Atan) penalty has recently been used as an efficient method for both estimation and variable selection. However, the Atan penalty is very sensitive to outliers in the response variable or to heavy-tailed error distributions, whereas the least absolute deviation is a good way to obtain robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator by combining these two ideas. Simulation experiments and real data applications show that the proposed LAD-Atan estimator has superior performance compared with other estimators.
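
A naive sketch of the LAD-Atan criterion for a small problem, assuming one common parametrisation of the arctangent penalty, p(t) = lam*(gamma + 2/pi)*arctan(|t|/gamma); the objective is minimised directly with a derivative-free optimiser, which illustrates the criterion but is not the paper's algorithm and does not scale to high dimensions.

```python
import numpy as np
from scipy.optimize import minimize

def lad_atan_objective(beta, X, y, lam=0.5, gamma=0.05):
    """Least absolute deviation loss plus an arctangent (Atan) penalty."""
    lad = np.sum(np.abs(y - X @ beta))
    penalty = lam * (gamma + 2.0 / np.pi) * np.sum(np.arctan(np.abs(beta) / gamma))
    return lad + penalty

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 5))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta_true + rng.standard_t(df=2, size=60)            # heavy-tailed errors

res = minimize(lad_atan_objective, x0=np.zeros(5), args=(X, y), method="Powell")
print(np.round(res.x, 3))                                     # near-zero entries mimic variable selection
```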


Author(s):  
Manpreet Kaur ◽  
Chamkaur Singh

Educational Data Mining (EDM) is an emerging research area that helps educational institutions improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from educational datasets and hence increase the performance of the classifiers used in EDM techniques. This paper presents an analysis of the performance of feature selection algorithms on a student data set and identifies several problems in the problem formulation, which are to be resolved in future work. Furthermore, the paper is an attempt to play a positive role in improving education quality, as well as to guide new researchers in making academic interventions.
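
A minimal illustration of the claim above that removing irrelevant features can raise classifier accuracy; the data are a synthetic stand-in for a student data set, and the chosen scorer and classifier are assumptions made for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 40 features, only 5 informative -- a stand-in for a student dataset with a pass/fail label.
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
print("all features  :", cross_val_score(clf, X, y, cv=5).mean().round(3))

X_sel = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)
print("top-5 features:", cross_val_score(clf, X_sel, y, cv=5).mean().round(3))
```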


2019 ◽  
Vol XVI (2) ◽  
pp. 1-11
Author(s):  
Farrukh Jamal ◽  
Hesham Mohammed Reyad ◽  
Soha Othman Ahmed ◽  
Muhammad Akbar Ali Shah ◽  
Emrah Altun

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties of the proposed model were investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters were estimated using the maximum likelihood criterion, and the behaviour of these estimates was examined by conducting a simulation study. The applicability of the new model is illustrated by applying it to a real data set.
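
A generic skeleton of the maximum-likelihood-plus-simulation workflow described above. The exponentiated half-logistic Lomax density itself is not reproduced here; scipy's ordinary Lomax distribution is used purely as a stand-in to show the mechanics (simulate, fit by maximum likelihood, summarise the estimates over replications).

```python
import numpy as np
from scipy import stats

true_shape = 2.5
n, n_reps = 200, 500
rng = np.random.default_rng(4)

estimates = []
for _ in range(n_reps):
    sample = stats.lomax.rvs(true_shape, size=n, random_state=rng)
    # Maximum likelihood fit of the shape parameter, with location and scale held fixed.
    c_hat, _, _ = stats.lomax.fit(sample, floc=0, fscale=1)
    estimates.append(c_hat)

estimates = np.asarray(estimates)
print("bias:", round(estimates.mean() - true_shape, 4),
      " MSE:", round(((estimates - true_shape) ** 2).mean(), 4))
```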


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, estimation of the probability density function and the cumulative distribution function of this distribution is considered using five different estimation methods: the uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulations based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS and PC estimators. Finally, a real data set is analyzed.
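
A hedged sketch of the maximum-likelihood piece of the comparison above, assuming the usual two-parameter form of the generalized inverted exponential density f(x) = (a*l/x^2) exp(-l/x) (1 - exp(-l/x))^(a-1); the UMVU, LS, WLS and PC estimators discussed in the paper are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x):
    """Negative log-likelihood of the generalized inverted exponential distribution."""
    a, l = params
    if a <= 0 or l <= 0:
        return np.inf
    z = np.exp(-l / x)
    return -np.sum(np.log(a * l) - 2 * np.log(x) - l / x + (a - 1) * np.log1p(-z))

# Inverse-transform sampling from F(x) = 1 - (1 - exp(-l/x))**a.
rng = np.random.default_rng(5)
a_true, l_true = 2.0, 1.5
u = rng.uniform(size=300)
x = -l_true / np.log(1.0 - (1.0 - u) ** (1.0 / a_true))

fit = minimize(neg_loglik, x0=np.array([1.0, 1.0]), args=(x,), method="Nelder-Mead")
print("ML estimates (alpha, lambda):", np.round(fit.x, 3))
```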


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets including 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors were modelled by CoMFA. First of all, for all three data sets, CoMFA models with all CoMFA descriptors were created then by applying each variable selection method a new CoMFA model was developed so for each data set, 9 CoMFA models were built. Obtained results show noisy and uninformative variables affect CoMFA results. Based on created models, applying 5 variable selection approaches including FFD, SRD-FFD, IVE-PLS, SRD-UVEPLS and SPA-jackknife increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time consuming process while SRD-FFD and SRD-UVE-PLS run need to few seconds. Also applying FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS protect CoMFA countor maps information for both fields.


2019 ◽  
Vol 14 (2) ◽  
pp. 148-156
Author(s):  
Nighat Noureen ◽  
Sahar Fazal ◽  
Muhammad Abdul Qadir ◽  
Muhammad Tanvir Afzal

Background: Specific combinations of Histone Modifications (HMs), contributing to the histone code hypothesis, lead to various biological functions. HM combinations have been utilized by various studies to divide the genome into different regions, which have been classified as chromatin states. Mostly, Hidden Markov Model (HMM) based techniques have been utilized for this purpose. In chromatin studies, data from Next Generation Sequencing (NGS) platforms are used. Chromatin states based on histone modification combinatorics are annotated by mapping them to functional regions of the genome. The number of states predicted so far by the HMM tools has been justified biologically. Objective: The present study aimed at providing a computational scheme to identify the underlying hidden states in the data under consideration. Methods: We proposed a computational scheme, HCVS, based on a hierarchical clustering and visualization strategy in order to achieve the objective of the study. Results: We tested our proposed scheme on a real data set of nine cell types comprising nine chromatin marks. The approach successfully identified the state numbers for various possibilities. The results were also compared with one of the existing models, showing good correlation. Conclusion: The HCVS model not only helps in deciding the optimal state number for a particular data set but also justifies the results biologically, thereby correlating the computational and biological aspects.
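
A schematic sketch, not the authors' HCVS implementation, of the hierarchical-clustering-and-inspection idea: genomic bins described by binary histone-mark vectors are clustered, and a cluster-quality score is tracked across candidate state numbers to suggest a plausible choice. All data below are synthetic, and the silhouette score is an assumption standing in for the paper's visualization strategy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(6)
n_bins, n_marks, hidden_states = 600, 9, 4
centers = rng.integers(0, 2, size=(hidden_states, n_marks)).astype(float)   # mark pattern per state
labels = rng.integers(0, hidden_states, size=n_bins)
data = np.clip(centers[labels] + 0.2 * rng.standard_normal((n_bins, n_marks)), 0.0, 1.0)

Z = linkage(data, method="ward")
for k in range(2, 9):
    assignment = fcluster(Z, t=k, criterion="maxclust")
    print(k, "states -> silhouette", round(silhouette_score(data, assignment), 3))
```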

