Small sample estimation of regression parameters in the three-variable linear model, with incomplete observations

1974 ◽  
Vol 2 (1-2) ◽  
pp. 181-195
Author(s):  
Marcel G. Dagenais
Stats ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 88-107
Author(s):  
Alfio Marazzi

The distance constrained maximum likelihood procedure (DCML) optimally combines a robust estimator with the maximum likelihood estimator in order to improve the small sample efficiency of the robust estimator while preserving a good level of robustness. It has been published for the linear model and is now extended to the GLM. Monte Carlo experiments are used to explore the performance of this extension in the Poisson regression case. Several published robust candidates for the DCML are compared; the modified conditional maximum likelihood estimator starting with a very robust minimum density power divergence estimator is selected as the best candidate. It is shown empirically that the DCML remarkably improves the small sample efficiency of this candidate without loss of robustness. An example using real hospital length of stay data fitted by the negative binomial regression model is discussed.


2014 ◽  
Vol 33 (22) ◽  
pp. 3869-3881 ◽  
Author(s):  
Sudhir Paul ◽  
Xuemao Zhang

2017 ◽  
Vol 8 (3) ◽  
pp. 16-36 ◽  
Author(s):  
Brandon Flessner ◽  
Mary C. Henry ◽  
Jerry Green

The ability to predict American beech distribution (Fagus grandifolia Ehrh.) from environmental data was tested by using a geographic information system (GIS) in tandem with species distribution models (SDMs). The study was conducted in Butler and Preble counties in Ohio, USA. Topography, soils, and disturbance were approximated through 15 predictor variables with presence/absence and basal area serving as the response variables. Using a generalized linear model (GLM) and a boosted regression tree (BRT) model, curvature, elevation, and tasseled cap greenness were shown to be significant predictors of beech presence. Each of these variables was positively related to beech presence. A linear model using presence only data was not effective in predicting basal area due to a small sample size. This study demonstrates that SDMs can be used successfully to advance one's understanding of the relationship between tree species presence and environmental factors. Large sample sizes are needed to successfully model continuous variables.


2010 ◽  
Vol 143-144 ◽  
pp. 1328-1331
Author(s):  
Hai Jun Chen ◽  
Xiao Ling Liu ◽  
Ling Hui Liu

The least squares method is very sensitive to outliers. A simple alternative is least absolute deviation (LAD), i.e. L1 regression, which is less sensitive to outliers and is therefore better suited to situations with small samples and heavy noise. In this paper, the L1 problem for the linear model is discussed, previous work is reviewed systematically, different algorithms are compared, and it is proved that the dual forms of the different algorithms are the same.
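
The contrast between least squares and L1 regression can be illustrated with a small sketch. It is not one of the algorithms compared in the paper: it relies on the known property that, for a straight-line fit, some optimal L1 line passes through at least two data points, and simply tries every pair. The function names `ols_line` and `lad_line` and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def ols_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lad_line(xs, ys):
    """Least absolute deviation (L1) fit by brute force over point pairs.

    Some optimal L1 line interpolates at least two observations, so
    checking every pair finds a minimizer. Cost grows as O(n^3), which is
    acceptable only for small samples.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid regression line
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(abs(y - intercept - slope * x) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Toy data (hypothetical): y = 1 + 2x, with the last point replaced by an outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 31.0]

lad_intercept, lad_slope = lad_line(xs, ys)
ols_intercept, ols_slope = ols_line(xs, ys)
print("LAD:", lad_intercept, lad_slope)
print("OLS:", ols_intercept, ols_slope)
```

On this toy data the L1 fit follows the five clean points, while the least squares slope is pulled toward the outlier. For larger problems, a linear programming formulation would normally be used instead of enumeration.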


2014 ◽  
Vol 27 (9) ◽  
pp. 3393-3404 ◽  
Author(s):  
Michael K. Tippett ◽  
Timothy DelSole ◽  
Anthony G. Barnston

Regression is often used to calibrate climate model forecasts with observations. Reliability is an aspect of forecast quality that refers to the degree of correspondence between forecast probabilities and observed frequencies of occurrence. While regression-corrected climate forecasts are reliable in principle, the estimated regression parameters used in practice are affected by sampling error. The low skill and small sample sizes typically encountered in climate prediction imply substantial sampling error in the estimated regression parameters. Here the reliability of regression-corrected climate forecasts is analyzed for the case of joint-Gaussian distributed ensemble forecasts and observations with regression parameters estimated by least squares. Hypothesis testing of the regression parameters provides direct information about the skill and reliability of the uncorrected ensemble-based probability forecasts. However, the regression-corrected probability forecasts with estimated parameters are systematically “overconfident” because sampling error causes a positive bias in the regression forecast signal variance, despite the fact that the estimates of the regression parameters are themselves unbiased. An analytical description of the reliability diagram of a generic regression-corrected climate forecast is derived and is shown to depend on sample size and population correlation skill, with small sample size and low skill being factors that increase overconfidence. The analytical reliability estimate is shown to capture the effect of sampling error in synthetic data experiments and in a 29-yr dataset of NOAA Climate Forecast System version 2 predictions of seasonal precipitation totals over the Americas. The impact of sampling error on the reliability of regression-corrected forecasts has been previously unrecognized and affects all regression-based forecasts. The use of regression parameters estimated by shrinkage methods such as ridge regression substantially reduces overconfidence.
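
The mechanism described above, an unbiased slope estimate whose square is nevertheless biased upward, can be reproduced in a small simulation. This is only a sketch under simplifying assumptions that are not taken from the paper: a single standard-normal predictor, no intercept, unit noise variance, and an arbitrary ridge penalty `lam`. The names `slope` and `simulate` are hypothetical.

```python
import random


def slope(xs, ys, lam=0.0):
    """Slope of a no-intercept fit; lam > 0 gives the ridge estimate."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def simulate(beta=0.3, n=10, trials=2000, lam=5.0, seed=0):
    """Average the estimated slope and its square over many small samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sum_ols = sum_ols_sq = sum_ridge_sq = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
        b_ols = slope(xs, ys)
        b_ridge = slope(xs, ys, lam)
        sum_ols += b_ols
        sum_ols_sq += b_ols ** 2
        sum_ridge_sq += b_ridge ** 2
    return {
        "true_beta_sq": beta ** 2,
        "mean_ols": sum_ols / trials,
        "mean_ols_sq": sum_ols_sq / trials,
        "mean_ridge_sq": sum_ridge_sq / trials,
    }


result = simulate()
for name, value in result.items():
    print(name, round(value, 3))
```

Because the expected squared slope equals the squared true slope plus the sampling variance of the estimate, the least squares signal variance is inflated in small samples. Any positive `lam` pulls each estimate toward zero and so lowers the average squared slope; a penalty that is too large would shrink too far, so in practice it has to be chosen with care.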

