Comparison of robust logistic regression estimators for variables with generalized extreme value distributions

2021 ◽  
Vol 16 (3) ◽  
pp. 177-187
Author(s):  
Şaban Kızılarslan ◽  
Ceren Camkıran

The aim of this study is to compare the performance of robust estimators in the logistic regression model when the explanatory variables follow Generalized Extreme Value (GEV) distributions. The presence of extreme values in the logistic regression model adversely affects the bias and efficiency of the classical Maximum Likelihood (ML) estimator. For this reason, robust estimators that are less sensitive to extreme values have been developed. Random variables with extreme values may follow one of several specific distributions. In this study, the GEV distribution family was examined and five robust estimators were compared for the Fréchet, Gumbel and Weibull distributions. According to the simulation results, the CUBIF estimator stands out on both the bias and efficiency criteria for small samples. In medium and large samples, the MALLOWS estimator has the minimum bias, while the CUBIF estimator has the best efficiency. The same results hold for different contamination ratios and different scale parameter values of the distributions. The simulation findings were supported by an application to real meteorological data.
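As a rough illustration of the three GEV families named above, the sketch below draws explanatory variables by inverse-transform sampling from the standard GEV quantile function. The function name `gev_quantile`, the shape values (0.5, 0, -0.5), the sample size and the seed are illustrative assumptions; the abstract does not state the simulation settings, and this is not the authors' code.

```python
import numpy as np

def gev_quantile(p, loc=0.0, scale=1.0, shape=0.0):
    """Quantile function of the GEV distribution (hypothetical helper).

    shape > 0 gives the Frechet type, shape == 0 the Gumbel type,
    and shape < 0 the (reversed) Weibull type.
    """
    p = np.asarray(p, dtype=float)
    if shape == 0.0:
        # Gumbel limit of the general formula
        return loc - scale * np.log(-np.log(p))
    return loc + scale * ((-np.log(p)) ** (-shape) - 1.0) / shape

# Inverse-transform sampling: push uniform draws through the quantile function.
rng = np.random.default_rng(0)
u = rng.uniform(size=1000)

# Shape values are assumed for illustration only.
samples = {
    "Frechet": gev_quantile(u, shape=0.5),
    "Gumbel": gev_quantile(u, shape=0.0),
    "Weibull": gev_quantile(u, shape=-0.5),
}

for name, x in samples.items():
    print(name, "min:", x.min(), "max:", x.max())
```

With a positive shape the draws are bounded below and heavy-tailed above, while a negative shape gives draws bounded above, which is why the three families stress an estimator in different ways.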

2021 ◽  
Vol 8 (1) ◽  
pp. 1497-1506
Author(s):  
Aba Dio ◽  
El Hadji Dème ◽  
Idrissa Sy ◽  
Aliou Diop

The logistic regression model is widely used to investigate the relationship between a binary response variable Y and a set of potential predictors X. The binary response may represent, for example, the occurrence of some outcome of interest (Y=1 if the outcome occurred and Y=0 otherwise). When the dependent variable Y represents a rare event, the logistic regression model shows relevant drawbacks. To overcome these drawbacks, we propose the Generalized Extreme Value (GEV) regression model. In particular, we suggest the quantile function of the GEV distribution as the link function. Strokes are a serious pathology and a neurological emergency involving the vital prognosis and the functional prognosis. In Senegal, strokes account for more than 30% of hospitalizations and are responsible for nearly two thirds of mortality. In this work, we use the GEV regression model for binary data to determine the risk factors leading to stroke and to develop a predictive model of life-threatening outcomes in central Senegal.
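To make the idea of a GEV link concrete, here is a minimal sketch of the corresponding response function, which maps a linear predictor to a probability through the GEV distribution function (the inverse of the quantile link). The abstract does not give the formula, so the parameterization below, with location 0 and scale 1, and the names `gev_response`, `eta` and `tau` are assumptions based on a form commonly used in the GEV regression literature.

```python
import numpy as np

def gev_response(eta, tau):
    """P(Y=1) for linear predictor eta under a GEV link (assumed form).

    pi = exp(-(1 + tau * eta)^(-1/tau)) where 1 + tau * eta > 0;
    tau == 0 gives the Gumbel limit exp(-exp(-eta)).
    """
    eta = np.asarray(eta, dtype=float)
    if tau == 0.0:
        return np.exp(-np.exp(-eta))
    # Outside the support, clip to 0 so the probability saturates at 0 or 1.
    t = np.maximum(1.0 + tau * eta, 0.0)
    with np.errstate(divide="ignore"):
        return np.exp(-np.power(t, -1.0 / tau))

eta = np.linspace(-3.0, 3.0, 7)
print(gev_response(eta, tau=0.25))
```

Unlike the logistic curve, this response function is asymmetric: at `eta = 0` it equals exp(-1), roughly 0.37, rather than 0.5, and it approaches 0 and 1 at different rates, which is the property usually cited as helpful when Y=1 is rare.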


2018 ◽  
Vol 2 (334) ◽  
Author(s):  
Mirosław Krzyśko ◽  
Łukasz Smaga

In this paper, the binary classification problem for multi‑dimensional functional data is considered. To solve this problem, a regression technique based on the functional logistic regression model is used. This model is re‑expressed as a particular logistic regression model by using basis expansions of the functional coefficients and explanatory variables. Based on the re‑expressed model, a classification rule is proposed. To handle outlying observations, robust methods for estimating the unknown parameters are also considered. Numerical experiments suggest that the proposed methods may behave satisfactorily in practice.


2021 ◽  
Vol 73 (7) ◽  
pp. 41-44
Author(s):  
Y.S. Zhieru

The final stage of constructing a logistic regression model is checking its validity and testing it on real data. The degree of validity of a logistic regression model is evidenced by its ability to correctly classify borrowers, that is, to distinguish "good" borrowers from "bad" borrowers.


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1517
Author(s):  
Hao Yang Teng ◽  
Zhengjun Zhang

Logistic regression is widely used in the analysis of medical data with binary outcomes to study treatment effects through (absolute) treatment effect parameters in the models. However, logistic regression models do not include parameters that indicate relative treatment effects, which can be a severe problem for modeling treatment effects efficiently and can lead to wrong conclusions about them. This paper introduces a new enhanced logistic regression model that offers a new way of studying treatment effects by measuring the relative changes in the treatment effects, while also incorporating the way in which logistic regression models the treatment effects. The new model, called the Absolute and Relative Treatment Effects (AbRelaTEs) model, is viewed as a generalization of logistic regression and an enhanced model with greater flexibility, interpretability, and applicability in real data applications than logistic regression. The AbRelaTEs model is capable of modeling significant treatment effects in absolute terms, in relative terms, or in both. The new model can be easily implemented using statistical software, with the logistic regression model being treated as a special case. As a result, classical logistic regression models can be replaced by the AbRelaTEs model to gain greater applicability and to provide a new benchmark model for studying treatment effects more efficiently in clinical trials, economic development, and many applied areas. Moreover, the estimators of the coefficients are consistent and asymptotically normal under regularity conditions. In both simulation and real data applications, the model provides results that are both significant and more meaningful.


2021 ◽  
Vol 26 (5) ◽  
pp. 44-57
Author(s):  
Zainab Sami ◽  
Taha Alshaybawee

Lasso variable selection is an attractive approach to improving prediction accuracy. A Bayesian lasso approach is suggested to estimate and select the important variables for the single index logistic regression model. A Laplace distribution is set as the prior for the coefficient vector, and a Gaussian process as the prior for the unknown link function. A hierarchical Bayesian lasso semiparametric logistic regression model is constructed and an MCMC algorithm is adopted for posterior inference. The performance of the proposed method, BSLLR, is evaluated by comparing it with three existing methods: BLR, BPR and BBQR. Simulation examples and real data are considered. The results indicate that the proposed method attains the smallest bias, SD, MSE and MAE in both simulation and real data, and thus performs better than the other methods.


Author(s):  
N. A. M. R. Senaviratna ◽  
T. M. J. A. Cooray

One of the key problems that arises in the binary logistic regression model is that the explanatory variables being considered are highly correlated among themselves. Multicollinearity causes unstable estimates and inaccurate variances, which affect confidence intervals and hypothesis tests. The aim of this study was to discuss some diagnostic measures for detecting multicollinearity, namely tolerance, Variance Inflation Factor (VIF), condition index and variance proportions. The adapted diagnostics are illustrated with data from a study of road accidents. The secondary data used in this study, covering 2014 to 2016, were acquired from the Traffic Police headquarters, Colombo, Sri Lanka. The response variable is accident severity, which has two levels: grievous and non-grievous. Multicollinearity is identified by the correlation matrix, tolerance and VIF values and confirmed by the condition index and variance proportions. A range of solutions is available for logistic regression, such as increasing the sample size, dropping one of the correlated variables, and combining variables into an index. It is concluded that, without increasing the sample size, omitting one of the correlated variables can reduce multicollinearity considerably.
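The tolerance and VIF diagnostics mentioned above can be sketched in a few lines: each explanatory variable is regressed on the others, and VIF_j = 1 / (1 - R_j²), with tolerance as its reciprocal. The helper name `vif` and the synthetic data are illustrative assumptions; they are not the road accident data from the study.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic example: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)

vifs = vif(np.column_stack([x1, x2, x3]))
tolerance = 1.0 / vifs
print("VIF:", vifs)
print("Tolerance:", tolerance)
```

Because the diagnostics depend only on the explanatory variables, they are computed the same way for logistic regression as for linear regression. A VIF above roughly 10 (tolerance below 0.1) is a common rule of thumb for problematic collinearity, though the threshold is a convention rather than a strict cutoff.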


2019 ◽  
Vol 22 (3) ◽  
pp. 5-23
Author(s):  
Engin Boztepe ◽  
Hayrettin Usul

Fraud is defined as intentional actions in which one or more people, including management, employees, or third parties, attempt to obtain an unjust or illegal benefit. According to research, the average cost of fraud has been estimated at 5% of total income. Beyond its direct losses, fraud resembles a financial iceberg: it also causes damage such as loss of reputation and adverse effects on customer relations. Auditing for and detecting fraud, which has such vast effects, is therefore of great importance. In this study, we developed a logistic regression model designed to detect irregularities and abuses in the performance-based salary system in the health sector. To this end, some fictitious surgery data were added to the actual data of laparoscopic cholecystectomy operations performed in a public hospital in 2015, and the ability of the fitted logistic regression model to distinguish the fictitious data was tested. The model achieved a success rate of 83.30% in detecting the false data added to the real data.


2020 ◽  
Vol 45 (2) ◽  
pp. 222-232
Author(s):  
Priyanka Talukdar

In cricket, irrespective of the format of the game, batting always happens in pairs. The two batsmen who bat together are called batting partners. The pair of batsmen who come to bat at the beginning of any innings are called opening batsmen or opening partners. In Twenty20 cricket, the opening partners must start their innings with a definite strategy. On the one hand, they have the advantage of only two fielders outside the 30-yard circle for the first six overs (technically called the powerplay overs), and so both openers are expected to play high-scoring shots and attempt to score runs quickly. On the other hand, the odds are against them because the ball and the pitch are new and the bowlers are fresh and energetic. When either of the opening batsmen loses his wicket, the partnership comes to an end. This study examines the influence of the opening partnership of the second innings on the outcome of Twenty20 matches. The Pressure Index (developed by earlier researchers), the effects of venue or ground, and the target score are used as explanatory variables in the logistic regression model to check whether the performance of the opening partnership, along with the other variables, influences the outcome of Twenty20 matches. The data used are from Twenty20 international cricket matches played between January 2012 and June 2018. The study finds that, for this dataset, the opening partnership is a significant factor in deciding the match outcome during the run chase. The best opening batting partners are also identified.

