Weighted Multicollinearity in Logistic Regression: Diagnostics and Biased Estimation Techniques with an Example from Lake Acidification

1990 ◽  
Vol 47 (6) ◽  
pp. 1128-1135 ◽  
Author(s):  
Brian D. Marx ◽  
Eric P. Smith

An historical data set from the Adirondack region of New York is revisited to study the relationship between water chemistry variables associated with acid precipitation and the presence/absence of brook trout (Salvelinus fontinalis) and lake trout (Salvelinus namaycush). For the trout species data sets, water chemistry variables associated with acid precipitation, for example pH and alkalinity, are highly correlated. Regression models to assess their effects on the probability of the presence of fish species are therefore affected by multicollinearity. Because the appropriate regressions are logistic, correction techniques based on least squares do not work. Maximum likelihood parameter estimation is highly unstable for the trout presence/absence data. Developments in weighted multicollinearity diagnostics are used to evaluate maximum likelihood logistic regression parameter estimates. Further, an application of biased parameter estimation is presented as an alternative to traditional maximum likelihood logistic regression. Biased estimation methods, such as ridge, principal component, or Stein estimation, can substantially reduce the variance of the parameter estimates and the prediction variance for certain future observations. In many cases, only a slight modification to the converged maximum likelihood estimator is necessary.
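The contrast between maximum likelihood and ridge-type estimation under collinearity can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are simulated stand-ins for two nearly collinear predictors (not the Adirondack data), the function names `fit_logistic` and `weighted_condition_number` are hypothetical, the ridge penalty value is arbitrary, and the diagnostic shown (the condition number of the column-scaled weighted design matrix) is one common form of a weighted collinearity measure rather than necessarily the exact diagnostic used by the authors.

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic(X, y, ridge=0.0, n_iter=100, tol=1e-8):
    """Newton-Raphson fit; ridge > 0 adds an L2 penalty on the slopes only."""
    p = X.shape[1]
    penalty = ridge * np.eye(p)
    penalty[0, 0] = 0.0  # column 0 is the intercept, left unpenalized
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        grad = X.T @ (y - mu) - penalty @ beta
        info = X.T @ (X * (mu * (1.0 - mu))[:, None]) + penalty
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def weighted_condition_number(X, beta):
    """Condition number of W^(1/2) X after scaling its columns to unit length."""
    mu = sigmoid(X @ beta)
    Z = X * np.sqrt(mu * (1.0 - mu))[:, None]
    Z = Z / np.linalg.norm(Z, axis=0)
    s = np.linalg.svd(Z, compute_uv=False)
    return s[0] / s[-1]

# Simulated data: x2 is almost a copy of x1 (assumption, not the paper's data)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = (rng.uniform(size=n) < sigmoid(0.5 + x1 + x2)).astype(float)

beta_ml = fit_logistic(X, y)                 # ordinary maximum likelihood
beta_ridge = fit_logistic(X, y, ridge=1.0)   # arbitrary penalty for illustration

print("weighted condition number:", weighted_condition_number(X, beta_ml))
print("ML estimate:   ", beta_ml)
print("ridge estimate:", beta_ridge)
```

In this sketch the ridge fit shrinks the slope vector toward zero relative to the maximum likelihood fit, which is the mechanism by which biased estimators trade a small bias for reduced variance.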

2020 ◽  
Vol 70 (5) ◽  
pp. 1211-1230
Author(s):  
Abdus Saboor ◽  
Hassan S. Bakouch ◽  
Fernando A. Moala ◽  
Sheraz Hussain

Abstract. In this paper, a bivariate extension of the exponentiated Fréchet distribution is introduced, namely a bivariate exponentiated Fréchet (BvEF) distribution whose marginals are univariate exponentiated Fréchet distributions. Several properties of the proposed distribution are discussed, such as the joint survival function, joint probability density function, marginal probability density function, conditional probability density function, moments, and marginal and bivariate moment generating functions. Moreover, the proposed distribution is obtained by the Marshall-Olkin survival copula. Estimation of the parameters is investigated by maximum likelihood with the observed information matrix. In addition to the maximum likelihood estimation method, we consider Bayesian inference and least squares estimation and compare these three methodologies for the BvEF. A simulation study is carried out to compare the performance of the estimators obtained by the presented estimation methods. The proposed bivariate distribution and other related bivariate distributions are fitted to a real-life paired data set. It is shown that the BvEF distribution has superior performance among the compared distributions using several tests of goodness-of-fit.


2018 ◽  
Vol 48 (3) ◽  
pp. 199-204 ◽  
Author(s):  
R. LI ◽  
J. ZHOU ◽  
L. WANG

In this paper, the non-parametric bootstrap and non-parametric Bayesian bootstrap methods are applied for parameter estimation in the binary logistic regression model. A real data study and a simulation study are conducted to compare the non-parametric bootstrap, non-parametric Bayesian bootstrap and maximum likelihood methods. The results show that all three methods are effective for parameter estimation in the binary logistic regression model. In the small-sample case, the non-parametric Bayesian bootstrap method performs relatively better than the non-parametric bootstrap and maximum likelihood methods.
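The two resampling schemes compared above can be sketched as weighted refits of the same logistic model. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are simulated, the helper names `fit_weighted_logistic` and `bootstrap_estimates` are hypothetical, the non-parametric bootstrap is expressed through multinomial counts (equivalent to resampling rows with replacement), and the Bayesian bootstrap uses Dirichlet(1, ..., 1) weights in the style of Rubin's formulation.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=100, tol=1e-8):
    """Newton-Raphson for logistic regression with observation weights w."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - mu))
        info = X.T @ (X * (w * mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def bootstrap_estimates(X, y, n_boot=200, bayesian=False, seed=0):
    """Return the mean and standard deviation of the bootstrap replicates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        if bayesian:
            # Bayesian bootstrap: continuous Dirichlet weights, rescaled to sum to n
            w = n * rng.dirichlet(np.ones(n))
        else:
            # Non-parametric bootstrap: integer counts from resampling rows
            w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        draws[b] = fit_weighted_logistic(X, y, w)
    return draws.mean(axis=0), draws.std(axis=0, ddof=1)

# Simulated data (assumption): intercept -0.5, slope 1.0
rng = np.random.default_rng(1)
n = 150
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(-0.5 + x)))).astype(float)

mle = fit_weighted_logistic(X, y, np.ones(n))
npb_mean, npb_sd = bootstrap_estimates(X, y, bayesian=False)
bb_mean, bb_sd = bootstrap_estimates(X, y, bayesian=True)
print("MLE:                ", mle)
print("bootstrap mean:     ", npb_mean)
print("Bayesian boot. mean:", bb_mean)
```

Writing both schemes as weight vectors keeps a single fitting routine; the only difference between them is whether the weights are integer counts or continuous Dirichlet draws.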


In this paper, we define a new two-parameter distribution, the new Lindley half Cauchy (NLHC) distribution, using the Lindley-G family of distributions, which accommodates increasing, decreasing and a variety of monotone failure rates. The statistical properties of the proposed distribution, such as the probability density function, cumulative distribution function, quantile function, and measures of skewness and kurtosis, are presented. We briefly describe three well-known estimation methods, namely maximum likelihood estimation (MLE), least-squares estimation (LSE) and the Cramér-von Mises (CVM) method. All the computations are performed in R. Using the maximum likelihood method, we construct asymptotic confidence intervals for the model parameters. We verify empirically the potential of the new distribution in modeling a real data set.


2016 ◽  
Author(s):  
Rui J. Costa ◽  
Hilde Wilkinson-Herbots

Abstract. The isolation-with-migration (IM) model is commonly used to make inferences about gene flow during speciation, using polymorphism data. However, Becquet and Przeworski (2009) report that the parameter estimates obtained by fitting the IM model are very sensitive to the model's assumptions (including the assumption of constant gene flow until the present). This paper is concerned with the isolation-with-initial-migration (IIM) model of Wilkinson-Herbots (2012), which drops precisely this assumption. In the IIM model, one ancestral population divides into two descendant subpopulations, between which there is an initial period of gene flow and a subsequent period of isolation. We derive a very fast method of fitting an extended version of the IIM model, which also allows for asymmetric gene flow and unequal population sizes. This is a maximum-likelihood method, applicable to data on the number of segregating sites between pairs of DNA sequences from a large number of independent loci. In addition to obtaining parameter estimates, our method can also be used to distinguish between alternative models representing different evolutionary scenarios, by means of likelihood ratio tests. We illustrate the procedure on pairs of Drosophila sequences from approximately 30,000 loci. The computing time needed to fit the most complex version of the model to this data set is only a couple of minutes. The R code to fit the IIM model can be found in the supplementary files of this paper.
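The general shape of a likelihood ratio test between two nested models can be sketched as follows. This is a generic illustration, not the authors' R code: the function name `likelihood_ratio_test_1df` is hypothetical, and the sketch assumes the models differ by exactly one free parameter and that the usual chi-square reference distribution with one degree of freedom applies. That assumption may not hold when the null hypothesis places a parameter on the boundary of its range (for example, a migration rate of zero), in which case a different reference distribution is needed.

```python
import math

def likelihood_ratio_test_1df(loglik_null, loglik_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the test statistic and a p-value from the chi-square(1)
    distribution, whose survival function is erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Illustrative log-likelihoods only (made-up numbers, not results from the paper)
stat, p_value = likelihood_ratio_test_1df(loglik_null=-1250.0, loglik_alt=-1246.5)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.4f}")
```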


Methodology ◽  
2007 ◽  
Vol 3 (3) ◽  
pp. 100-114 ◽  
Author(s):  
Polina Dimitruk ◽  
Karin Schermelleh-Engel ◽  
Augustin Kelava ◽  
Helfried Moosbrugger

Abstract. Challenges in evaluating nonlinear effects in multiple regression analyses include reliability, validity, multicollinearity, and dichotomization of continuous variables. While reliability and validity issues are solved by employing nonlinear structural equation modeling, multicollinearity remains a problem which may even be aggravated when using latent variable approaches. Further challenges of nonlinear latent analyses comprise the distribution of latent product terms, a problem especially relevant for approaches using maximum likelihood estimation methods based on multivariate normally distributed variables, and unbiased estimates of nonlinear effects under multicollinearity. The only methods that explicitly take the nonnormality of nonlinear latent models into account are latent moderated structural equations (LMS) and quasi-maximum likelihood (QML). In a small simulation study both methods yielded unbiased parameter estimates and correct estimates of standard errors for inferential statistics. The advantages and limitations of nonlinear structural equation modeling are discussed.


1979 ◽  
Vol 16 (3) ◽  
pp. 313-322 ◽  
Author(s):  
Arun K. Jain ◽  
Franklin Acito ◽  
Naresh K. Malhotra ◽  
Vijay Mahajan

Since 1971, interest in the use of decompositional multiattribute preference models in marketing has been increasing. The applications have varied in terms of the type of data used, behavior predicted, and methods used for estimating parameters. The authors examine the effect of different data collection and estimation procedures on parameter estimates and their stability and validity. An actual data base is used. A detailed comparison is made of the alternative approaches of parameter estimation and suggestions are given for the potential users of decompositional multiattribute preference models.


2021 ◽  
Vol 2106 (1) ◽  
pp. 012001
Author(s):  
P R Sihombing ◽  
S R Rohimah ◽  
A Kurnia

Abstract. This study aims to compare the efficacy of logistic regression models for identifying the risk factors of low-birth-weight babies in Indonesia using the maximum likelihood estimation (MLE) and Bayesian estimation methods. The data used in this study are secondary data derived from the 2017 Indonesian Demographic Health Survey, with a total sample of 16,344 newborn babies. Selection of the best logistic regression model was based on the smaller Bayesian (Schwarz) Information Criterion (BIC) value. The logistic regression model with the Bayesian estimation method has a smaller BIC value than the MLE method. Twin births, female sex of the baby, maternal age at risk, birth spacing that is too close, iron deficiency, low education, low economic status, and inadequate drinking water sources are associated with a higher risk of low birth weight.
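The model-selection rule described above rests on the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the sample size, and L the maximized likelihood. A minimal sketch follows; the function name `bic` is hypothetical, the log-likelihood values and parameter count are made-up placeholders rather than results from the study, and how the criterion was evaluated for the Bayesian fit is not stated in the abstract. Only the sample size of 16,344 comes from the text.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion; smaller values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * loglik

n_obs = 16344  # sample size reported in the abstract

# Placeholder log-likelihoods and parameter count for illustration only
bic_a = bic(loglik=-4100.0, n_params=9, n_obs=n_obs)
bic_b = bic(loglik=-4095.0, n_params=9, n_obs=n_obs)

preferred = "A" if bic_a < bic_b else "B"
print(f"BIC A = {bic_a:.1f}, BIC B = {bic_b:.1f}, preferred: {preferred}")
```

With equal parameter counts and sample sizes, the comparison reduces to the log-likelihoods; the k ln(n) term matters when the candidate models differ in complexity.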


2021 ◽  
Author(s):  
Jan Steinfeld ◽  
Alexander Robitzsch

This article describes the conditional maximum likelihood-based item parameter estimation in probabilistic multistage designs. In probabilistic multistage designs, the routing is not solely based on a raw score j and a cut score c as well as a rule for routing into a module such as j < c or j ≤ c but is based on a probability p(j) for each raw score j. It can be shown that the use of a conventional conditional maximum likelihood parameter estimate in multistage designs leads to severely biased item parameter estimates. Zwitser and Maris (2013) were able to show that with deterministic routing, the integration of the design into the item parameter estimation leads to unbiased estimates. This article extends this approach to probabilistic routing and, at the same time, represents a generalization. In a simulation study, it is shown that the item parameter estimation in probabilistic designs leads to unbiased item parameter estimates.

