Variable Selection and Regularization in Quantile Regression via Minimum Covariance Determinant Based Weights

Edmore Ranganai; Innocent Mudhombo

doi:10.3390/e23010033

Entropy ◽

10.3390/e23010033 ◽

2020 ◽

Vol 23 (1) ◽

pp. 33

Author(s):

Edmore Ranganai ◽

Innocent Mudhombo

Keyword(s):

Variable Selection ◽

Quantile Regression ◽

Data Sets ◽

Least Absolute Deviation ◽

Minimum Covariance Determinant ◽

Absolute Deviation ◽

Leverage Points ◽

Influential Points ◽

Computationally Intensive ◽

Space Data

The importance of variable selection and regularization procedures in multiple regression analysis cannot be overemphasized. These procedures are adversely affected by predictor space data aberrations as well as outliers in the response space. To counter the latter, robust statistical procedures such as quantile regression, which generalizes the well-known least absolute deviation procedure to all quantile levels, have been proposed in the literature. Quantile regression is robust to response variable outliers but very susceptible to outliers in the predictor space (high leverage points), which may alter the eigen-structure of the predictor matrix. High leverage points that alter the eigen-structure of the predictor matrix by creating or hiding collinearity are referred to as collinearity influential points. In this paper, we suggest generalizing the penalized weighted least absolute deviation to all quantile levels, i.e., to penalized weighted quantile regression using the RIDGE, LASSO, and elastic net penalties, as a remedy against collinearity influential points and high leverage points in general. To maintain robustness, we make use of very robust weights based on the computationally intensive high breakdown minimum covariance determinant. Simulations and applications to well-known data sets from the literature show an improvement in variable selection and regularization due to the robust weighting formulation.
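To make the idea of a penalized weighted quantile regression objective concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the elastic net parameterization (`lam`, `alpha`), and the omission of an intercept are all hypothetical choices, and the observation weights are taken as given rather than computed from the minimum covariance determinant.

```python
def check_loss(r, tau):
    """Quantile (pinball) loss: rho_tau(r) = r * (tau - 1[r < 0])."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def penalized_weighted_qr_objective(X, y, beta, weights, tau, lam, alpha):
    """Evaluate a weighted check loss plus an elastic net penalty.

    Assumptions (not taken from the paper):
      - weights are non-negative and supplied by the caller
        (e.g. derived from robust distances);
      - no intercept term is included;
      - alpha = 1 gives a LASSO penalty, alpha = 0 a RIDGE penalty.
    """
    loss = 0.0
    for x_i, y_i, w_i in zip(X, y, weights):
        fitted = sum(b * x for b, x in zip(beta, x_i))
        loss += w_i * check_loss(y_i - fitted, tau)
    l1 = sum(abs(b) for b in beta)   # LASSO part
    l2 = sum(b * b for b in beta)    # RIDGE part
    return loss + lam * (alpha * l1 + (1.0 - alpha) * l2)
```

At `tau = 0.5` the check loss is half the absolute residual, so the unpenalized objective reduces to a weighted least absolute deviation criterion (up to the factor 1/2). Down-weighting an observation shrinks its contribution to the loss, which is how small weights on high leverage points limit their influence.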

Least absolute deviation estimator‐bridge variable selection and estimation for quantitative structure–activity relationship model

Journal of Chemometrics ◽

10.1002/cem.3139 ◽

2019 ◽

Vol 33 (7) ◽

Author(s):

Zainab Tawfeeq Al‐Dabbagh ◽

Zakariya Yahya Algamal

Keyword(s):

Variable Selection ◽

Quantitative Structure Activity Relationship ◽

Structure Activity Relationship ◽

Activity Relationship ◽

Quantitative Structure ◽

Least Absolute Deviation ◽

Absolute Deviation ◽

Structure Activity ◽

Relationship Model ◽

Variable Selection And Estimation

Analysis of quantile regression as alternative to ordinary least squares

International Journal of Advanced Statistics and Probability ◽

10.14419/ijasp.v3i2.4686 ◽

2015 ◽

Vol 3 (2) ◽

pp. 138

Author(s):

Ibrahim Abdullahi ◽

Abubakar Yahaya

Keyword(s):

Analytical Solution ◽

Quantile Regression ◽

Least Squares ◽

Fuel Consumption ◽

Goodness Of Fit ◽

Ordinary Least Squares ◽

Coefficient Of Determination ◽

Test Statistics ◽

Least Absolute Deviation ◽

Absolute Deviation

In this article, an alternative to ordinary least squares (OLS) regression based on an analytical solution in the Statgraphics software is considered, namely the quantile regression (QR) model. We also present a goodness-of-fit statistic as well as approximate distributions of the associated test statistics for the parameters. Furthermore, we suggest a goodness-of-fit statistic called the least absolute deviation (LAD) coefficient of determination. The procedure is presented, illustrated and validated by a numerical example based on a publicly available dataset on fuel consumption in miles per gallon in highway driving.

Predictive and Descriptive CoMFA Models: The Effect of Variable Selection

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180212162028 ◽

2018 ◽

Vol 21 (2) ◽

pp. 117-124 ◽

Cited By ~ 4

Author(s):

Bakhtyar Sepehri ◽

Nematollah Omidikia ◽

Mohsen Kompany-Zareh ◽

Raouf Ghavami

Keyword(s):

Variable Selection ◽

Predictive Power ◽

Selection Method ◽

Data Sets ◽

Data Set ◽

Comfa Model ◽

Variable Selection Method

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets including 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors were modelled by CoMFA. First, for all three data sets, CoMFA models with all CoMFA descriptors were created; then, by applying each variable selection method, a new CoMFA model was developed, so that 9 CoMFA models were built for each data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying 5 variable selection approaches, namely FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife, increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time-consuming processes, while SRD-FFD and SRD-UVE-PLS need only a few seconds to run. Also, applying FFD, SRD-FFD, IVE-PLS or SRD-UVE-PLS preserves the information in the CoMFA contour maps for both fields.

Performance of smoothly clipped absolute deviation as a variable selection method in the artificial neural network‐based QSAR studies

Journal of Chemometrics ◽

10.1002/cem.3338 ◽

2021 ◽

Author(s):

Zeinab Mozafari ◽

Mansour Arab Chamjangali ◽

Mohammad Arashi ◽

Nasser Goudarzi

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Variable Selection ◽

Selection Method ◽

Absolute Deviation ◽

Qsar Studies ◽

Variable Selection Method ◽

Smoothly Clipped Absolute Deviation ◽

Artificial Neural

Robust communication-efficient distributed composite quantile regression and variable selection for massive data

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2021.107262 ◽

2021 ◽

pp. 107262

Author(s):

Kangning Wang ◽

Shaomin Li ◽

Benle Zhang

Keyword(s):

Variable Selection ◽

Quantile Regression ◽

Massive Data ◽

Composite Quantile Regression ◽

Selection For

Cluster-based least absolute deviation regression for dimension reduction

Journal of Statistical Theory and Practice ◽

10.1080/15598608.2015.1095136 ◽

2015 ◽

Vol 10 (1) ◽

pp. 121-132 ◽

Cited By ~ 1

Author(s):

Yuexiao Dong ◽

Chaozheng Yang

Keyword(s):

Dimension Reduction ◽

Least Absolute Deviation ◽

Absolute Deviation ◽

Least Absolute Deviation Regression

ON GENERATING DIGITAL ELEVATION MODELS FROM LIDAR DATA – RESOLUTION VERSUS ACCURACY AND TOPOGRAPHIC WETNESS INDEX INDICES IN NORTHERN PEATLANDS

Geodesy and Cartography ◽

10.3846/20296991.2012.702983 ◽

2012 ◽

Vol 38 (2) ◽

pp. 57-69 ◽

Cited By ~ 12

Author(s):

Abdulghani Hasan ◽

Petter Pilesjö ◽

Andreas Persson

Keyword(s):

Large Scale ◽

Drainage Area ◽

Data Sets ◽

Topographic Wetness Index ◽

Absolute Deviation ◽

Digital Elevation ◽

Elevation Data ◽

Scale Modelling ◽

Data Points ◽

Emission Modelling

Global change and GHG emission modelling are dependent on accurate wetness estimations for predictions of e.g. methane emissions. This study aims to quantify how the slope, drainage area and the TWI vary with the resolution of DEMs for a flat peatland area. Six DEMs with spatial resolutions from 0.5 to 90 m were interpolated with four different search radii. The relationship between accuracy of the DEM and the slope was tested. The LiDAR elevation data was divided into two data sets. The number of data points facilitated an evaluation dataset with data points not more than 10 mm away from the cell centre points in the interpolation dataset. The DEM was evaluated using a quantile-quantile test and the normalized median absolute deviation. It showed independence of the resolution when using the same search radius. The accuracy of the estimated elevation for different slopes was tested using the 0.5 m DEM and it showed a higher deviation from evaluation data for steep areas. The slope estimations between resolutions showed differences with values that exceeded 50%. Drainage areas were tested for three resolutions, with coinciding evaluation points. The model's ability to generate drainage area at each resolution was tested by pairwise comparison of three data subsets and showed differences of more than 50% in 25% of the evaluated points. The results show that consideration of DEM resolution is a necessity for the use of slope, drainage area and TWI data in large scale modelling.
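The normalized median absolute deviation mentioned above can be sketched as follows. This is a minimal illustration, assuming the common definition NMAD = 1.4826 × median(|e − median(e)|) applied to elevation errors; the function name and the sample values are hypothetical and are not taken from the study.

```python
from statistics import median


def nmad(errors):
    """Normalized median absolute deviation of a sequence of errors.

    The factor 1.4826 makes the estimate comparable to the standard
    deviation when the errors are normally distributed.
    """
    errors = list(errors)
    centre = median(errors)
    abs_dev = [abs(e - centre) for e in errors]
    return 1.4826 * median(abs_dev)


# Hypothetical elevation errors in metres, with one gross outlier.
sample_errors = [-0.02, -0.01, 0.0, 0.01, 0.02, 5.0]
spread = nmad(sample_errors)
```

Because it is built on medians, the estimate stays small for the sample above even though one error is several metres, whereas a standard deviation would be dominated by that single point.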

Modelling disparities in health services utilisation for older Blacks: a quantile regression framework

Ageing and Society ◽

10.1017/s0144686x14000440 ◽

2014 ◽

Vol 35 (8) ◽

pp. 1657-1683 ◽

Cited By ~ 4

Author(s):

ANDY SHARMA

Keyword(s):

Health Services ◽

Quantile Regression ◽

Scale Effects ◽

Ethnic Disparities ◽

The United States ◽

Models Of Care ◽

Medical Intervention ◽

Medical Provider ◽

Absolute Deviation ◽

The Best Approximation

With the on-going ageing of the United States population, resolving health disparities continues to be a prominent and worthwhile goal, particularly in the areas of promoting minority health and reducing racial/ethnic disparities. This analysis employs the 2004 and 2005 Household Component records from the Medical Expenditures Panel Survey, which correspond to data files H89 and H97, to examine utilisation by race across the entire distribution function; more specifically, applying the behavioural model of health services utilisation and employing a Quantile Regression (QR) framework. This is a noteworthy contribution because the conditional mean may not be the best approximation for a skewed-location distribution. In contrast, QR is robust to outliers and scale effects since the estimation minimises least absolute deviation. The sample consists of 2,525 older adults at least 65 years of age with 303 corresponding to Black and 2,222 corresponding to White. Results suggest older Blacks continue to utilise health services (i.e. office or clinic visits with a physician or medical provider) at lower levels and this is more pronounced at and below the median quantile (i.e. below the 50th cut-off). Usual source of care (USC) continues to play an important role. Beliefs surrounding the need for insurance and medical intervention are also significant and explain some of the racial disparities. Although utilisation disparities persist for older Blacks, collaborative and flexible models of care can reach this group.

Seasonal prediction of summertime tropical cyclone activity over the East China Sea using the least absolute deviation regression and the Poisson regression

International Journal of Climatology ◽

10.1002/joc.1878 ◽

2009 ◽

Cited By ~ 4

Author(s):

Hyeong-Seog Kim ◽

Chang-Hoi Ho ◽

Pao-Shin Chu ◽

Joo-Hong Kim

Keyword(s):

Tropical Cyclone ◽

East China Sea ◽

Poisson Regression ◽

Seasonal Prediction ◽

Tropical Cyclone Activity ◽

Least Absolute Deviation ◽

The East China Sea ◽

Cyclone Activity ◽

Absolute Deviation ◽

Least Absolute Deviation Regression

VISUAL APPROACH TO SUPERVISED VARIABLE SELECTION BY SELF-ORGANIZING MAP

International Journal of Neural Systems ◽

10.1142/s0129065705000098 ◽

2005 ◽

Vol 15 (01n02) ◽

pp. 101-110 ◽

Cited By ~ 1

Author(s):

TIMO SIMILÄ ◽

SAMPSA LAINE

Keyword(s):

Variable Selection ◽

The Self ◽

Data Sets ◽

Self Organizing Map ◽

Robust Method ◽

Relevant Variables ◽

Visual Approach ◽

Predefined Criterion ◽

Target Data ◽

Self Organizing

Practical data analysis often encounters data sets with both relevant and useless variables. Supervised variable selection is the task of selecting the relevant variables based on some predefined criterion. We propose a robust method for this task. The user manually selects a set of target variables and trains a Self-Organizing Map with these data. This sets a criterion for variable selection and is an illustrative description of the user's problem, even for multivariate target data. The user also defines another set of variables that are potentially related to the problem. Our method returns a subset of these variables, which best corresponds to the description provided by the Self-Organizing Map and, thus, agrees with the user's understanding of the problem. The method is conceptually simple and, based on experiments, allows an accessible approach to supervised variable selection.
