Comparison of robust logistic regression estimators for variables with generalized extreme value distributions

Model Assisted Statistics and Applications ◽

10.3233/mas-210531 ◽

2021 ◽

Vol 16 (3) ◽

pp. 177-187

Author(s):

Şaban Kızılarslan ◽

Ceren Camkıran

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Extreme Values ◽

Real Data ◽

Extreme Value ◽

Small Samples ◽

Robust Estimators ◽

Generalized Extreme Value ◽

Explanatory Variables

The aim of this study is to compare the performance of robust estimators in the logistic regression model when the explanatory variables follow Generalized Extreme Value (GEV) distributions. The presence of extreme values in the logistic regression model adversely affects the bias and efficiency of the classical Maximum Likelihood (ML) estimator. For this reason, robust estimators that are less sensitive to extreme values have been developed. Random variables with extreme values may follow one of several specific distributions. In this study, the GEV distribution family was examined and five robust estimators were compared for the Fréchet, Gumbel and Weibull distributions. According to the simulation results, the CUBIF estimator stands out on both the bias and efficiency criteria for small samples. In medium and large samples, the MALLOWS estimator has the minimum bias, while the CUBIF estimator has the best efficiency. The same results hold for different contamination ratios and different scale parameter values of the distributions. The simulation findings were supported by an application to real meteorological data.
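As a rough illustration of the kind of simulation the abstract describes, the sketch below draws a GEV-distributed explanatory variable by inverting the GEV distribution function and measures the bias of the classical ML estimator over repeated samples. It covers only the ML baseline; the robust estimators named in the abstract (CUBIF, MALLOWS and the others) are not implemented. The function names, the single-covariate design, the absence of contamination and all parameter values are assumptions made for illustration, not the authors' setup.

```python
import numpy as np

def sample_gev(rng, size, shape, loc=0.0, scale=1.0):
    """Draw GEV variates by inverting the CDF exp(-(1 + shape*z)**(-1/shape)).

    shape > 0 gives the Frechet type, shape == 0 the Gumbel type,
    shape < 0 the (reversed) Weibull type.
    """
    u = rng.uniform(size=size)
    if shape == 0.0:  # Gumbel limit of the GEV family
        return loc - scale * np.log(-np.log(u))
    return loc + scale * ((-np.log(u)) ** (-shape) - 1.0) / shape

def fit_logistic_ml(X, y, n_iter=50, tol=1e-8):
    """Classical ML fit by Newton-Raphson; X must already hold an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)   # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def simulate_bias(shape, n, beta_true, n_rep=200, seed=0):
    """Average (estimate - truth) of the ML estimator over n_rep simulated samples."""
    rng = np.random.default_rng(seed)
    beta_true = np.asarray(beta_true, dtype=float)
    estimates = np.empty((n_rep, beta_true.size))
    for r in range(n_rep):
        x = sample_gev(rng, n, shape)
        X = np.column_stack([np.ones(n), x])
        p = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
        y = rng.binomial(1, p)
        estimates[r] = fit_logistic_ml(X, y)
    return estimates.mean(axis=0) - beta_true
```

A call such as `simulate_bias(shape=0.0, n=2000, beta_true=[-0.5, 1.0], n_rep=20)` returns the estimated bias of the intercept and slope for a Gumbel covariate. With heavy-tailed (Fréchet) covariates and very small samples the plain Newton step can fail under near-separation, which is one practical reason a study of this kind looks at alternatives to ML.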

A case study of Stroke patients in Senegal: application of Generalized extreme value regression model

African Journal of Applied Statistics ◽

10.16929/ajas/2021.1497.259 ◽

2021 ◽

Vol 8 (1) ◽

pp. 1497-1506

Author(s):

Aba Dio ◽

El Hadji Dème ◽

Idrissa Sy ◽

Aliou Diop

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Binary Data ◽

Logistic Regression Model ◽

Rare Event ◽

Quantile Function ◽

Extreme Value ◽

Binary Response ◽

Generalized Extreme Value ◽

Binary Response Variable

The logistic regression model is widely used to investigate the relationship between a binary response variable Y and a set of potential predictors X. The binary response may represent, for example, the occurrence of some outcome of interest (Y=1 if the outcome occurred and Y=0 otherwise). When the dependent variable Y represents a rare event, the logistic regression model has significant drawbacks. To overcome these drawbacks, we propose the Generalized Extreme Value (GEV) regression model. In particular, we suggest the quantile function of the GEV distribution as the link function. Strokes are a serious pathology and a neurological emergency affecting both the vital prognosis and the functional prognosis. In Senegal, strokes account for more than 30% of hospitalizations and are responsible for nearly two thirds of mortality. In this work, we use the GEV regression model for binary data to determine the risk factors leading to stroke and to develop a predictive model of life-threatening outcomes in central Senegal.
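To make the link-function idea concrete, the sketch below implements the quantile function of the standard GEV distribution as a link and the GEV distribution function as its inverse. The parameterization (location 0, scale 1, shape `tau`) and the function names are assumptions; the abstract does not state which parameterization the authors use.

```python
import numpy as np

def gev_quantile_link(p, tau):
    """Link: map a probability p in (0, 1) to the linear-predictor scale."""
    p = np.asarray(p, dtype=float)
    if tau == 0.0:  # Gumbel limit
        return -np.log(-np.log(p))
    return ((-np.log(p)) ** (-tau) - 1.0) / tau

def gev_inverse_link(eta, tau):
    """Inverse link: the standard GEV CDF evaluated at the linear predictor eta."""
    eta = np.asarray(eta, dtype=float)
    if tau == 0.0:
        return np.exp(-np.exp(-eta))
    base = 1.0 + tau * eta
    # Outside the support the CDF is 0 (tau > 0) or 1 (tau < 0).
    outside = 0.0 if tau > 0 else 1.0
    safe = np.where(base > 0, base, 1.0)  # avoid invalid powers off the support
    return np.where(base > 0, np.exp(-safe ** (-1.0 / tau)), outside)
```

Unlike the logit link, this response curve is asymmetric: a linear predictor of 0 maps to exp(-1), about 0.368, rather than 0.5, for every value of `tau`. That asymmetry is what makes links of this kind attractive when one outcome is rare.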

Selected Robust Logistic Regression Specification for Classification of Multi‑dimensional Functional Data in Presence of Outlier

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.334.04 ◽

2018 ◽

Vol 2 (334) ◽

Author(s):

Mirosław Krzyśko ◽

Łukasz Smaga

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Functional Data ◽

Logistic Regression Model ◽

Binary Classification ◽

Classification Problem ◽

Classification Rule ◽

Unknown Parameters ◽

Explanatory Variables

In this paper, the binary classification problem for multi‑dimensional functional data is considered. To solve this problem, a regression technique based on the functional logistic regression model is used. This model is re‑expressed as a particular logistic regression model by using basis expansions of the functional coefficients and explanatory variables. Based on the re‑expressed model, a classification rule is proposed. To handle outlying observations, robust methods for estimating the unknown parameters are also considered. Numerical experiments suggest that the proposed methods may behave satisfactorily in practice.
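The re-expression step can be sketched as follows. If a curve is written as x(t) = Σ c_k φ_k(t) and the functional coefficient as β(t) = Σ b_k φ_k(t), then the integral of x(t)β(t) equals cᵀJb, where J holds the integrals of φ_k φ_l, so the functional model becomes an ordinary logistic regression on the rows of CJ. The example covers a single functional dimension with a three-term Fourier basis on [0, 1]; the basis choice, grid and function names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def fourier_basis(t):
    """Three orthonormal Fourier basis functions on [0, 1], evaluated at t."""
    return np.column_stack([
        np.ones_like(t),
        np.sqrt(2.0) * np.sin(2.0 * np.pi * t),
        np.sqrt(2.0) * np.cos(2.0 * np.pi * t),
    ])

def integrate(values, t):
    """Trapezoidal rule along the first axis of values."""
    dt = np.diff(t).reshape((-1,) + (1,) * (values.ndim - 1))
    return np.sum(0.5 * (values[1:] + values[:-1]) * dt, axis=0)

t = np.linspace(0.0, 1.0, 2001)
Phi = fourier_basis(t)                                # shape (2001, 3)
# J[k, l] = integral over [0, 1] of phi_k(t) * phi_l(t)
J = integrate(Phi[:, :, None] * Phi[:, None, :], t)

def design_matrix(C, J):
    """Rows are the finite-dimensional predictors replacing each curve.

    C has one row of basis coefficients per observed curve; a standard
    logistic regression on design_matrix(C, J) then estimates b.
    """
    return C @ J
```

For multi-dimensional functional data, one would presumably build such a block for each dimension and place the blocks side by side, although the abstract does not spell out this detail.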

Methodology for the validation of the credit scoring model of the retail portfolio

10.18411/lj-05-2021-265 ◽

2021 ◽

Vol 73 (7) ◽

pp. 41-44

Author(s):

Y.S. Zhieru

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Final Stage ◽

Logistic Regression Model ◽

Credit Scoring ◽

Real Data ◽

Scoring Model ◽

Credit Scoring Model

The final stage of constructing a logistic regression model is checking its validity and testing it on real data. The validity of a logistic regression model is evidenced by its ability to classify borrowers correctly, that is, to distinguish "good" borrowers from "bad" borrowers.
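The abstract does not say which validation measures the paper uses. As one possible illustration of "ability to distinguish", the sketch below computes the area under the ROC curve (AUC), the Gini coefficient derived from it, and the classification accuracy at a cutoff. The choice of these measures, the coding of "bad" borrowers as 1, and the 0.5 cutoff are all assumptions.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random bad borrower scores higher than a random good one.

    scores: predicted probability of being "bad"; labels: 1 = bad, 0 = good.
    Ties between a bad and a good borrower count as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    bad = scores[labels == 1]
    good = scores[labels == 0]
    diff = bad[:, None] - good[None, :]   # every bad/good pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (bad.size * good.size)

def gini(scores, labels):
    """Gini coefficient, a rescaling of AUC to the range [-1, 1]."""
    return 2.0 * auc(scores, labels) - 1.0

def accuracy(scores, labels, threshold=0.5):
    """Share of borrowers classified correctly at the given cutoff."""
    predicted = (np.asarray(scores, dtype=float) >= threshold).astype(int)
    return float(np.mean(predicted == np.asarray(labels)))
```

An AUC of 0.5 corresponds to a model that ranks borrowers no better than chance, and 1.0 to perfect separation of the two groups.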

Directly and Simultaneously Expressing Absolute and Relative Treatment Effects in Medical Data Models and Applications

Entropy ◽

10.3390/e23111517 ◽

2021 ◽

Vol 23 (11) ◽

pp. 1517

Author(s):

Hao Yang Teng ◽

Zhengjun Zhang

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Regression Models ◽

Logistic Regression Model ◽

Treatment Effects ◽

Real Data ◽

Medical Data ◽

Severe Problem ◽

New Model ◽

Logistic Regression Models

Logistic regression is widely used in the analysis of medical data with binary outcomes to study treatment effects through (absolute) treatment effect parameters in the models. However, parameters indicating relative treatment effects are not included in logistic regression models, which can be a severe problem for modeling treatment effects efficiently and can lead to wrong conclusions about them. This paper introduces a new enhanced logistic regression model that offers a new way of studying treatment effects by measuring the relative changes in the treatment effects, while also incorporating the way in which logistic regression models the treatment effects. The new model, called the Absolute and Relative Treatment Effects (AbRelaTEs) model, is viewed as a generalization of logistic regression and an enhanced model with greater flexibility, interpretability, and applicability in real data applications than logistic regression. The AbRelaTEs model is capable of modeling significant treatment effects in an absolute way, a relative way, or both. The new model can be easily implemented using statistical software, with the logistic regression model being treated as a special case. As a result, classical logistic regression models can be replaced by the AbRelaTEs model to gain greater applicability and to provide a new benchmark model for studying treatment effects more efficiently in clinical trials, economic developments, and many applied areas. Moreover, the estimators of the coefficients are consistent and asymptotically normal under regularity conditions. In both simulation and real data applications, the model provides both significant and more meaningful results.

Bayesian Variable Selection for Semiparametric Logistic Regression

Al-Qadisiyah Journal Of Pure Science ◽

10.29350/qjps.2021.26.5.1460 ◽

2021 ◽

Vol 26 (5) ◽

pp. 44-57

Author(s):

Zainab Sami ◽

Taha Alshaybawee

Keyword(s):

Logistic Regression ◽

Variable Selection ◽

Regression Model ◽

Logistic Regression Model ◽

Numerical Data ◽

Real Data ◽

Bayesian Variable Selection ◽

Bayesian Lasso ◽

Posterior Inference ◽

Semiparametric Logistic Regression

Lasso variable selection is an attractive approach for improving prediction accuracy. A Bayesian lasso approach is suggested to estimate and select the important variables in the single index logistic regression model. A Laplace distribution is set as the prior for the coefficient vector, and a Gaussian process as the prior for the unknown link function. A hierarchical Bayesian lasso semiparametric logistic regression model is constructed, and an MCMC algorithm is adopted for posterior inference. The performance of the proposed method, BSLLR, is evaluated by comparing it with three existing methods: BLR, BPR and BBQR. Simulation examples and numerical data are considered. The results indicate that the proposed method attains the smallest bias, SD, MSE and MAE in both simulation and real data, and so performs better than the other methods.

Diagnosing Multicollinearity of Logistic Regression Model

Asian Journal of Probability and Statistics ◽

10.9734/ajpas/2019/v5i230132 ◽

2019 ◽

pp. 1-9 ◽

Cited By ~ 6

Author(s):

N. A. M. R. Senaviratna ◽

T. M. J. A. Cooray

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Sample Size ◽

Logistic Regression Model ◽

Secondary Data ◽

Binary Logistic Regression ◽

Condition Index ◽

Binary Logistic Regression Model ◽

Correlated Variables ◽

Explanatory Variables

One of the key problems that arises in the binary logistic regression model is that the explanatory variables being considered are highly correlated among themselves. Multicollinearity causes unstable estimates and inaccurate variances, which affect confidence intervals and hypothesis tests. The aim of this study was to discuss some diagnostic measures for detecting multicollinearity, namely tolerance, Variance Inflation Factor (VIF), condition index and variance proportions. The adapted diagnostics are illustrated with data from a study of road accidents. The secondary data used in this study, covering 2014 to 2016, were acquired from the Traffic Police headquarters, Colombo, Sri Lanka. The response variable is accident severity, which has two levels: grievous and non-grievous. Multicollinearity is identified by the correlation matrix, tolerance and VIF values, and confirmed by the condition index and variance proportions. The range of solutions available for logistic regression includes increasing the sample size, dropping one of the correlated variables and combining variables into an index. It is concluded that, without increasing the sample size, omitting one of the correlated variables can reduce multicollinearity considerably.
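Three of the diagnostics named above can be computed directly from the matrix of explanatory variables, as in the sketch below. The VIFs are read off the diagonal of the inverse correlation matrix, tolerance is the reciprocal of VIF, and the condition indices are ratios of singular values of the design matrix after scaling each column to unit length. Including the intercept column in the condition-index calculation is a common convention and an assumption here; the abstract does not state which variant the authors applied, and variance proportions are not covered.

```python
import numpy as np

def vif_and_tolerance(X):
    """VIF_j = 1 / (1 - R_j^2), read off the inverse correlation matrix.

    X holds one explanatory variable per column, without an intercept.
    Tolerance is the reciprocal, 1 - R_j^2.
    """
    corr = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return vif, 1.0 / vif

def condition_indices(X):
    """Largest singular value divided by each singular value of the scaled design."""
    Z = np.column_stack([np.ones(X.shape[0]), X])   # add intercept column
    Z = Z / np.linalg.norm(Z, axis=0)               # unit-length columns
    s = np.linalg.svd(Z, compute_uv=False)          # sorted in descending order
    return s[0] / s
```

Large VIFs (values above 10 are a frequently quoted rule of thumb) and large condition indices both point to near-linear dependence among the explanatory variables. Dropping one of a highly correlated pair, as the abstract recommends, brings both measures down.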

Effect of using bias-corrected estimators in logistic regression model in small samples: prostate-specific antigen (PSA) data

Data Science Journal ◽

10.2481/dsj.5.100 ◽

2006 ◽

Vol 5 ◽

pp. 100-107 ◽

Cited By ~ 3

Author(s):

M.A. Matin

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Prostate Specific Antigen ◽

Logistic Regression Model ◽

Specific Antigen ◽

Small Samples

Transformations of the explanatory variables in the logistic regression model for binary data

Biometrika ◽

10.1093/biomet/74.3.495 ◽

1987 ◽

Vol 74 (3) ◽

pp. 495-501 ◽

Cited By ~ 102

Author(s):

Richard Kay ◽

Sarah Little

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Binary Data ◽

Logistic Regression Model ◽

Explanatory Variables

Using the Analysis of Logistic Regression Model in Auditing and Detection of Frauds

Khazar Journal of Humanities and Social Sciences ◽

10.5782/2223-2621.2019.22.3.5 ◽

2019 ◽

Vol 22 (3) ◽

pp. 5-23

Author(s):

Engin Boztepe ◽

Hayrettin Usul

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Public Hospital ◽

Logistic Regression Model ◽

Customer Relations ◽

Health Sector ◽

Real Data ◽

Third Parties ◽

The Third ◽

Intentional Actions

Fraud is defined as intentional actions in which one or more people, including management, employees, or third parties, attempt to obtain an unjust or illegal benefit. According to research, the average cost of fraud is 5% of total income. Fraud, whose consequences resemble a financial iceberg, causes not only direct losses but also damage such as loss of reputation and adverse effects on customer relations. Auditing and detecting fraud, which has such far-reaching effects, is therefore of great importance. In this study, we developed a logistic regression model designed to detect mistreatment and abuses in the performance-based salary system in the health sector. To this end, some fictitious surgery data were added to the actual data of laparoscopic cholecystectomy operations performed in a public hospital in 2015, and the ability of the fitted logistic regression model to distinguish the fictitious data was tested. The model achieved a success rate of 83.30% in detecting the false data added to the real data.

Investigating the Role of Opening Partners While Chasing on the Outcome of Twenty20 Cricket Matches

Management and Labour Studies ◽

10.1177/0258042x20912580 ◽

2020 ◽

Vol 45 (2) ◽

pp. 222-232

Author(s):

Priyanka Talukdar

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

The Other ◽

Explanatory Variables ◽

Pressure Index ◽

Other Hand

In cricket, irrespective of the format of the game, batting always happens in pairs. The two batsmen who bat together are called batting partners. The pair of batsmen who come to bat at the beginning of an innings are called opening batsmen or opening partners. In Twenty20 cricket, the opening partners must start their innings with a definite strategy. On the one hand, they have the advantage of only two fielders outside the 30-yard circle for the first six overs (technically called the powerplay overs), so both openers are expected to play high-scoring shots and attempt to score runs quickly. On the other hand, the odds are against them because the ball is new, as is the pitch, and the bowlers are fresh and energetic. When either of the opening batsmen loses his wicket, the partnership comes to an end. This study examines the influence of the opening partnership of the second innings on the outcome of Twenty20 matches. The Pressure Index (developed by earlier researchers), the effects of venue or ground, and the target score are used as explanatory variables in the logistic regression model to check whether the performance of the opening partnership influences the outcome of Twenty20 matches along with other variables. The data used are from Twenty20 international cricket matches played between January 2012 and June 2018. The study finds that the opening partnership while chasing is a significant factor in deciding the match outcome during the run chase for this dataset. The best opening batting partners are also identified.
