A mixture of local and quadratic approximation variable selection algorithm in nonconcave penalized regression

Author(s):  
Assi N'GUESSAN ◽  
Ibrahim Sidi Zakari ◽  
Assi Mkhadri

We consider the problem of variable selection via penalized likelihood using nonconvex penalty functions. To maximize the non-differentiable and nonconcave objective function, an algorithm based on local linear approximation, which adopts a naturally sparse representation, was recently proposed. However, although it has promising theoretical properties, it inherits some drawbacks of the Lasso in the high-dimensional setting. To overcome these drawbacks, we propose an algorithm (MLLQA) for maximizing the penalized likelihood for a large class of nonconvex penalty functions. The convergence property of MLLQA and the oracle property of the one-step MLLQA estimator are established. Simulations and an application to a real data set are also presented.
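
A minimal sketch, not the authors' MLLQA code, of the local linear approximation (LLA) idea that such algorithms build on: the derivative of the SCAD penalty at an initial estimate supplies weights for a single weighted-lasso step, solved here by rescaling the design columns. The data, penalty constants, and the 1e-3 weight floor are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def scad_derivative(beta, lam, a=3.7):
    """Derivative p'_lambda(|beta|) of the SCAD penalty (Fan & Li, 2001)."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

def one_step_lla(X, y, lam, a=3.7):
    """One LLA step: a weighted lasso with weights taken from an initial OLS fit."""
    beta0 = LinearRegression(fit_intercept=False).fit(X, y).coef_
    w = np.clip(scad_derivative(beta0, lam, a), 1e-3, None)  # floor keeps the rescaling stable
    Xw = X / w                           # column rescaling turns the weighted lasso into a plain lasso
    fit = Lasso(alpha=1.0, fit_intercept=False, max_iter=10_000).fit(Xw, y)
    return fit.coef_ / w                 # undo the reparametrisation

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.r_[3.0, -2.0, 1.5, np.zeros(17)]
y = X @ beta_true + rng.standard_normal(100)
print(one_step_lla(X, y, lam=0.2).round(2))   # nonzero essentially only for the first three coefficients
```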

2018 ◽  
Vol 34 (3) ◽  
pp. 364-380
Author(s):  
Daoyuan Shi ◽  
Lynn Kuo

Variable selection has been an important topic in regression and Bayesian survival analysis. In the era of rapid development of genomics and precision medicine, the topic is becoming more important and challenging. In addition to the challenges of handling censored data in survival analysis, we face an increasing demand to handle big data with many predictors, most of which may not be relevant to predicting the survival outcome. With the aim of improving prediction accuracy, we explore the Bregman divergence criterion for selecting predictive models. We develop sparse Bayesian formulations for parametric and semiparametric regression models and demonstrate how variable selection is done using the predictive approach. Model selection for a simulated data set and two real data sets (one from a kidney transplant study and the other from a breast cancer microarray study at the Memorial Sloan-Kettering Cancer Center) is carried out to illustrate our methods.
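
As a small, self-contained illustration (not the paper's formulation), the Bregman divergence used as a loss for comparing predictions is D_phi(x, y) = phi(x) - phi(y) - phi'(y)(x - y) for a convex generator phi; the squared-error loss and a generalized Kullback-Leibler divergence are the two familiar special cases recovered below.

```python
import numpy as np

def bregman(x, y, phi, grad_phi):
    """Elementwise Bregman divergence D_phi(x, y) for a convex generator phi."""
    return phi(x) - phi(y) - grad_phi(y) * (x - y)

obs, pred = np.array([1.0, 2.0]), np.array([0.5, 3.0])

# phi(t) = t^2       recovers the squared-error loss (x - y)^2
sq = bregman(obs, pred, lambda t: t**2, lambda t: 2 * t)

# phi(t) = t*log(t)  recovers the generalised KL divergence x*log(x/y) - x + y
kl = bregman(obs, pred, lambda t: t * np.log(t), lambda t: np.log(t) + 1)

print(sq, kl)
```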


2011 ◽  
Vol 93 (6) ◽  
pp. 409-417 ◽  
Author(s):  
PASCAL CROISEAU ◽  
ANDRÉS LEGARRA ◽  
FRANÇOIS GUILLAUME ◽  
SÉBASTIEN FRITZ ◽  
AURÉLIA BAUR ◽  
...  

Summary: For genomic selection methods, the statistical challenge is to estimate the effect of each of the available single-nucleotide polymorphisms (SNPs). In a context where the number of SNPs (p) is much higher than the number of bulls (n), this task may lead to a poor estimation of these SNP effects if, as for genomic BLUP (gBLUP), all SNPs have a non-null effect. An alternative is to use approaches that have been developed specifically to solve the ‘p>>n’ problem. This is the case of variable selection methods, among which we focus on the Elastic-Net (EN) algorithm, a penalized regression approach. The performances of EN, gBLUP and pedigree-based BLUP were compared with data from three French dairy cattle breeds, giving very encouraging results for EN. We tried to push the idea of improving SNP effect estimates further by considering fewer of them. This variable selection strategy was considered for both gBLUP and EN by adding an SNP pre-selection step based on quantitative trait locus (QTL) detection. Similar results were observed with or without the pre-selection step, in terms of correlations between direct genomic value (DGV) and observed daughter yield deviation in a validation data set. However, when applied to the EN algorithm, this strategy led to a substantial reduction in the number of SNPs included in the prediction equation. In a context where the number of genotyped animals and the number of SNPs get larger and larger, SNP pre-selection strongly alleviates computing requirements and ensures that national evaluations can be completed within a reasonable time frame.
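
A hedged sketch of the elastic-net step described above, run on simulated 0/1/2 genotype codes; the dimensions, the causal-SNP pattern and the cross-validation grid are illustrative assumptions, not the settings of the dairy cattle study.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
n_bulls, n_snps = 200, 2000                                    # a p >> n situation
X = rng.integers(0, 3, size=(n_bulls, n_snps)).astype(float)   # SNP genotypes coded 0/1/2
effects = np.zeros(n_snps)
effects[:20] = rng.normal(0.0, 0.5, 20)                        # only a few SNPs carry an effect
y = X @ effects + rng.normal(0.0, 1.0, n_bulls)                # stand-in for daughter yield deviations

# l1_ratio sets the lasso/ridge mix; cross-validation picks the penalty strength.
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, max_iter=50_000).fit(X, y)
selected = np.flatnonzero(enet.coef_)                          # SNPs kept in the prediction equation
print(len(selected), "SNPs retained out of", n_snps)
```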


2018 ◽  
Vol 2018 ◽  
pp. 1-9
Author(s):  
Yuping Hu ◽  
Sanying Feng ◽  
Liugen Xue

We introduce a new partially linear functional additive model and consider the problem of variable selection for this model. Based on the functional principal components method and the centered spline basis function approximation, a new variable selection procedure is proposed using the smooth-threshold estimating equation (SEE). The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to zero and simultaneously estimates the nonzero regression coefficients by solving the SEE. The approach avoids solving a convex optimization problem and is flexible and easy to implement. We establish the asymptotic properties of the resulting estimators under some regularity conditions. We apply the proposed procedure to analyze a real data set: the Tecator data set.
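
A hedged sketch of the smooth-threshold idea (in the spirit of Ueki's smooth-threshold estimating equations) specialised to an ordinary linear model rather than the partially linear functional additive model of the paper: a data-driven weight delta_j that reaches 1 zeroes a coefficient, and the remaining coefficients solve a ridge-like linear system, so no convex optimization routine is needed. The tuning constants and the delta formula below are illustrative assumptions.

```python
import numpy as np

def see_linear(X, y, lam=0.1, tau=1.0):
    """Smooth-threshold estimating equation for a linear model (illustrative only)."""
    n, p = X.shape
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]                 # initial consistent estimate
    delta = np.minimum(1.0, lam / (np.abs(beta_init) ** (1.0 + tau) + 1e-12))
    keep = delta < 1.0                                               # delta_j = 1  =>  beta_j set to 0
    Xk, dk = X[:, keep], delta[keep]
    # Solve (1 - delta_j) * [X'(y - X beta)]_j - delta_j * beta_j = 0 on the kept coordinates,
    # i.e. ((1 - delta) * X'X + diag(delta)) beta = (1 - delta) * X'y.
    A = (1.0 - dk)[:, None] * (Xk.T @ Xk) + np.diag(dk)
    b = (1.0 - dk) * (Xk.T @ y)
    beta = np.zeros(p)
    beta[keep] = np.linalg.solve(A, b)
    return beta

rng = np.random.default_rng(7)
X = rng.standard_normal((150, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(150)
print(see_linear(X, y).round(2))      # only positions 0 and 3 should remain nonzero
```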


2019 ◽  
Vol 22 (1) ◽  
pp. 34-56 ◽  
Author(s):  
Yoshimasa Uematsu ◽  
Shinya Tanaka

Summary: This study examines high-dimensional forecasting and variable selection via folded-concave penalized regressions. The penalized regression approach leads to sparse estimates of the regression coefficients and allows the dimensionality of the model to be much larger than the sample size. First, we discuss the theoretical aspects of a penalized regression in a time series setting. Specifically, we show the oracle inequality with ultra-high-dimensional time-dependent regressors. Then we show the validity of the penalized regression using two empirical applications. First, we forecast quarterly US gross domestic product data using a high-dimensional monthly data set and the mixed data sampling (MIDAS) framework with penalization. Second, we examine how well the penalized regression screens a hidden portfolio based on a large New York Stock Exchange stock price data set. Both applications show that a penalized regression provides remarkable results in terms of forecasting performance and variable selection.
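
An illustrative sketch (not the authors' code) of the mixed-frequency setup described above: the three monthly observations inside each quarter are stacked as separate columns in an unrestricted-MIDAS fashion, and a penalized regression handles the resulting high-dimensional design. sklearn's lasso stands in for the folded-concave penalty of the paper; all data and dimensions are synthetic.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n_quarters, n_monthly_series = 80, 50
monthly = rng.standard_normal((3 * n_quarters, n_monthly_series))    # 3 months per quarter

# Stack the month-1/2/3 values of every series side by side: 150 regressors per quarter.
X = monthly.reshape(n_quarters, 3 * n_monthly_series)
beta = np.zeros(X.shape[1])
beta[[0, 7, 40]] = [0.8, -0.5, 0.6]                                  # only a few relevant columns
gdp_growth = X @ beta + 0.3 * rng.standard_normal(n_quarters)        # synthetic quarterly target

fit = LassoCV(cv=5, max_iter=20_000).fit(X[:-8], gdp_growth[:-8])    # hold out the last 8 quarters
print("selected columns:", np.flatnonzero(fit.coef_))
print("out-of-sample forecasts:", fit.predict(X[-8:]).round(2))
```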


2020 ◽  
Vol 17 (2) ◽  
pp. 0550
Author(s):  
Ali Hameed Yousef ◽  
Omar Abdulmohsin Ali

The issue of the penalized regression model has received considerable attention for variable selection. It plays an essential role in dealing with high-dimensional data. The arctangent (Atan) penalty has recently been used as an efficient method for both estimation and variable selection. However, the Atan penalty is very sensitive to outliers in the response variable or to heavy-tailed error distributions, whereas the least absolute deviation is a good way to obtain robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator by combining these two ideas. Simulation experiments and real data applications show that the proposed LAD-Atan estimator has superior performance compared with other estimators.
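
A naive sketch of the LAD-Atan criterion for a small problem, assuming one common parametrisation of the arctangent penalty, p(t) = lam*(gamma + 2/pi)*arctan(|t|/gamma); the objective is minimised directly with a derivative-free optimiser, which illustrates the criterion but is not the paper's algorithm and does not scale to high dimensions.

```python
import numpy as np
from scipy.optimize import minimize

def lad_atan_objective(beta, X, y, lam=0.5, gamma=0.05):
    """Least absolute deviation loss plus an arctangent (Atan) penalty."""
    lad = np.sum(np.abs(y - X @ beta))
    penalty = lam * (gamma + 2.0 / np.pi) * np.sum(np.arctan(np.abs(beta) / gamma))
    return lad + penalty

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 5))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta_true + rng.standard_t(df=2, size=60)            # heavy-tailed errors

res = minimize(lad_atan_objective, x0=np.zeros(5), args=(X, y), method="Powell")
print(np.round(res.x, 3))                                     # near-zero entries mimic variable selection
```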


Author(s):  
Manpreet Kaur ◽  
Chamkaur Singh

Educational Data Mining (EDM) is an emerging research area that helps educational institutions improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from educational datasets and hence increase the performance of the classifiers used in EDM techniques. This paper presents an analysis of the performance of feature selection algorithms on a student data set and identifies several problems in the problem formulation, which are to be resolved in future work. Furthermore, the paper is an attempt to play a positive role in improving education quality, as well as to guide new researchers in making academic interventions.
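
A minimal illustration of the claim above that removing irrelevant features can raise classifier accuracy; the data are a synthetic stand-in for a student data set, and the chosen scorer and classifier are assumptions made for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 40 features, only 5 informative -- a stand-in for a student dataset with a pass/fail label.
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
print("all features  :", cross_val_score(clf, X, y, cv=5).mean().round(3))

X_sel = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)
print("top-5 features:", cross_val_score(clf, X_sel, y, cv=5).mean().round(3))
```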


2019 ◽  
Vol XVI (2) ◽  
pp. 1-11
Author(s):  
Farrukh Jamal ◽  
Hesham Mohammed Reyad ◽  
Soha Othman Ahmed ◽  
Muhammad Akbar Ali Shah ◽  
Emrah Altun

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties of the proposed model were investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters were estimated using the maximum likelihood criterion, and the behaviour of these estimates was examined by conducting a simulation study. The applicability of the new model is illustrated by applying it to a real data set.
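
A generic skeleton of the maximum-likelihood-plus-simulation workflow described above. The exponentiated half-logistic Lomax density itself is not reproduced here; scipy's ordinary Lomax distribution is used purely as a stand-in to show the mechanics (simulate, fit by maximum likelihood, summarise the estimates over replications).

```python
import numpy as np
from scipy import stats

true_shape = 2.5
n, n_reps = 200, 500
rng = np.random.default_rng(4)

estimates = []
for _ in range(n_reps):
    sample = stats.lomax.rvs(true_shape, size=n, random_state=rng)
    # Maximum likelihood fit of the shape parameter, with location and scale held fixed.
    c_hat, _, _ = stats.lomax.fit(sample, floc=0, fscale=1)
    estimates.append(c_hat)

estimates = np.asarray(estimates)
print("bias:", round(estimates.mean() - true_shape, 4),
      " MSE:", round(((estimates - true_shape) ** 2).mean(), 4))
```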


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, estimation of the probability density function and the cumulative distribution function of this distribution is considered using five different estimation methods: the uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulations based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS and PC estimators. Finally, a real data set is analyzed.
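
A hedged sketch of the maximum-likelihood piece of the comparison above, assuming the usual two-parameter form of the generalized inverted exponential density f(x) = (a*l/x^2) exp(-l/x) (1 - exp(-l/x))^(a-1); the UMVU, LS, WLS and PC estimators discussed in the paper are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x):
    """Negative log-likelihood of the generalized inverted exponential distribution."""
    a, l = params
    if a <= 0 or l <= 0:
        return np.inf
    z = np.exp(-l / x)
    return -np.sum(np.log(a * l) - 2 * np.log(x) - l / x + (a - 1) * np.log1p(-z))

# Inverse-transform sampling from F(x) = 1 - (1 - exp(-l/x))**a.
rng = np.random.default_rng(5)
a_true, l_true = 2.0, 1.5
u = rng.uniform(size=300)
x = -l_true / np.log(1.0 - (1.0 - u) ** (1.0 / a_true))

fit = minimize(neg_loglik, x0=np.array([1.0, 1.0]), args=(x,), method="Nelder-Mead")
print("ML estimates (alpha, lambda):", np.round(fit.x, 3))
```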


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets including 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors were modelled by CoMFA. First of all, for all three data sets, CoMFA models with all CoMFA descriptors were created then by applying each variable selection method a new CoMFA model was developed so for each data set, 9 CoMFA models were built. Obtained results show noisy and uninformative variables affect CoMFA results. Based on created models, applying 5 variable selection approaches including FFD, SRD-FFD, IVE-PLS, SRD-UVEPLS and SPA-jackknife increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time consuming process while SRD-FFD and SRD-UVE-PLS run need to few seconds. Also applying FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS protect CoMFA countor maps information for both fields.


2019 ◽  
Vol 14 (2) ◽  
pp. 148-156
Author(s):  
Nighat Noureen ◽  
Sahar Fazal ◽  
Muhammad Abdul Qadir ◽  
Muhammad Tanvir Afzal

Background: Specific combinations of Histone Modifications (HMs), contributing to the histone code hypothesis, lead to various biological functions. HM combinations have been utilized by various studies to divide the genome into different regions, which have been classified as chromatin states. Mostly, Hidden Markov Model (HMM) based techniques have been utilized for this purpose. In chromatin studies, data from Next Generation Sequencing (NGS) platforms are used. Chromatin states based on histone modification combinatorics are annotated by mapping them to functional regions of the genome. The number of states predicted so far by the HMM tools has been justified biologically. Objective: The present study aimed at providing a computational scheme to identify the underlying hidden states in the data under consideration. Methods: We proposed a computational scheme, HCVS, based on a hierarchical clustering and visualization strategy in order to achieve the objective of the study. Results: We tested our proposed scheme on a real data set of nine cell types comprising nine chromatin marks. The approach successfully identified the state numbers for various possibilities. The results were also compared with one of the existing models, showing good correlation. Conclusion: The HCVS model not only helps in deciding the optimal state number for a particular data set but also justifies the results biologically, thereby correlating the computational and biological aspects.
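
A schematic sketch, not the authors' HCVS implementation, of the hierarchical-clustering-and-inspection idea: genomic bins described by binary histone-mark vectors are clustered, and a cluster-quality score is tracked across candidate state numbers to suggest a plausible choice. All data below are synthetic, and the silhouette score is an assumption standing in for the paper's visualization strategy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(6)
n_bins, n_marks, hidden_states = 600, 9, 4
centers = rng.integers(0, 2, size=(hidden_states, n_marks)).astype(float)   # mark pattern per state
labels = rng.integers(0, hidden_states, size=n_bins)
data = np.clip(centers[labels] + 0.2 * rng.standard_normal((n_bins, n_marks)), 0.0, 1.0)

Z = linkage(data, method="ward")
for k in range(2, 9):
    assignment = fcluster(Z, t=k, criterion="maxclust")
    print(k, "states -> silhouette", round(silhouette_score(data, assignment), 3))
```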

