Natural variability and the estimation of empirical relationships: a reassessment of regression methods

1995 ◽  
Vol 52 (4) ◽  
pp. 788-798 ◽  
Author(s):  
Yves T. Prairie ◽  
David F. Bird ◽  
Robert H. Peters

Ecologists often rely on empirically defined statistical relationships to infer how variables might be related. However, the usual method of estimating such relationships (ordinary least-squares (OLS)) is generally inappropriate because of the substantial natural variability of most ecological variables. Natural error variability in the regressor variable can artificially create a significant empirical trend where no underlying or structural relationship exists, or fail to reveal a true structural relationship. In multivariate relationships, natural variability in one variable can induce statistical significance in collinear variables even if they bear no structural relationship. We propose a simple new method, based on instrumental variables, to detect and quantify natural error variability in the regressor variables and to estimate the parameters of the structural relationship. We apply this method to two examples: (1) we show that the structural relationship between adenosine triphosphate concentration (total planktonic biomass) and chlorophyll concentration (autotrophic biomass) does not vary latitudinally in the Southern Ocean despite a significant increase in the OLS slope relating the two at more southerly stations and (2) we demonstrate that the significance of nitrogen in nutrient–chlorophyll relationships in lakes probably reflects natural variability in phosphorus concentration, and not the fertilizing effect of nitrogen.
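The attenuation described in this abstract, and the way an instrument can remove it, can be illustrated with a small simulation. This is a minimal sketch of a generic instrumental-variable slope estimator, not the authors' procedure: the simulated data, the use of a second independent noisy measurement as the instrument `z`, and all function and variable names are assumptions made for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=20000, beta=2.0, seed=0):
    """Hypothetical data: y depends on x_true, but only noisy copies are observed."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Observed regressor: true value plus natural variability (error variance 1).
    x_obs = [x + rng.gauss(0.0, 1.0) for x in x_true]
    # Instrument: a second measurement whose error is independent of x_obs's error.
    z = [x + rng.gauss(0.0, 1.0) for x in x_true]
    y = [beta * x + rng.gauss(0.0, 0.5) for x in x_true]
    return x_obs, z, y


x_obs, z, y = simulate()

# OLS slope is biased toward zero by var(x_true) / (var(x_true) + var(error)) = 0.5.
ols_slope = cov(x_obs, y) / cov(x_obs, x_obs)

# IV slope: the instrument is correlated with x_true but not with the error in x_obs.
iv_slope = cov(z, y) / cov(z, x_obs)

print(f"OLS slope: {ols_slope:.3f}, IV slope: {iv_slope:.3f}")
```

With equal signal and error variances, the OLS slope should settle near half the structural slope of 2, while the IV estimate should stay close to 2.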

2007 ◽  
Vol 4 (4) ◽  
pp. 2147-2178 ◽  
Author(s):  
A. Morel ◽  
H. Claustre ◽  
D. Antoine ◽  
B. Gentili

Abstract. The optical properties of Case 1 waters have been empirically related to the chlorophyll concentration, [Chl], historically used as an index of the trophic state and of the abundance of the biological materials. The natural variability around the mean statistical relationships is here examined by comparing the apparent optical properties (spectral downward irradiance attenuation and reflectance as a function of [Chl]) which were determined in two environments, the Pacific and Mediterranean waters. These oceanic zones apparently form two extremes of the bio-optical variability range. The systematic deviations, in both directions with respect to the average laws, mainly result from the differing contents in non-algal detrital materials and dissolved colored substance for a given [Chl] level. These contents are higher and lower than the average, in the Mediterranean Sea and Pacific Ocean, respectively. The divergences between the two water bodies, detected in the visible spectral domain, are considerably accentuated in the UV domain. The bio-optical properties in this spectral domain (310–400 nm) are systematically explored. Their prediction based on the sole [Chl] index is problematic; although it is probably possible on a regional scale, a ubiquitous relationship does not seem to exist for the global scale.


2007 ◽  
Vol 4 (5) ◽  
pp. 913-925 ◽  
Author(s):  
A. Morel ◽  
H. Claustre ◽  
D. Antoine ◽  
B. Gentili

Abstract. The optical properties of Case 1 waters have been empirically related to the chlorophyll concentration, [Chl], historically used as an index of the trophic state and of the abundance of the biological materials. The well-known natural variability around the mean statistical relationships is here examined by comparing the apparent optical properties (spectral downward irradiance attenuation and reflectance) as a function of [Chl] in two Case 1 environments, the Pacific and Mediterranean waters. These oceanic zones apparently represent two extremes of the possible bio-optical variability range around the mean. The systematic deviations, in both directions with respect to the average laws, mainly result from the differing contents in non-algal detrital materials and dissolved colored substance for a given [Chl] level. These contents are higher than the average in the Mediterranean Sea, and lower in the Pacific Ocean, respectively. These divergences between the two water bodies, detectable in the visible spectral domain, are considerably accentuated in the UV domain. The bio-optical properties in this spectral domain (310–400 nm) are systematically explored. They are more varying for a given [Chl] than those in the visible domain. Their prediction based on the sole [Chl] index is thus problematic, although it is probably possible on a regional scale if reliable field data are available. It does not seem, however, that ubiquitous relationships exist for this spectral domain for all Case 1 waters at global scale.


Agriculture ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 553
Author(s):  
Inga-Mareike Bach ◽  
Lisa Essich ◽  
Torsten Müller

Although phosphorus resources on Earth are limited, overfertilization in many agricultural situations causes significant resource consumption. Phosphorus recycling within agricultural production can reduce global dilution into the environment and is thus essential to secure sustainable future supply. This study investigated the fertilization efficacy of phosphorus fertilizers recycled from biogas digestates in maize shoots grown under controlled greenhouse conditions, in two soils, in a pot experiment. Variables investigated were plant-available phosphorus in soil, plant biomass production, and concentration of phosphorus, calcium, and magnesium in shoots. Soils were treated with three different fertilizer fractions, separated from biogas digestates, at equivalent phosphorus concentrations, using different combinations and application techniques, isolated or in combination, and compared to triple superphosphate (TSP) as a reference. One of the fractions (P-Salt) had effects on biomass production and plant phosphorus concentration equivalent to TSP in agricultural surface soil. In the second soil (with less active soil life and nutrient content), equivalence to TSP was achieved with combinations of two recycled fractions (P-Salt and dried solids). The enhancement of the phosphorus fertilizing effect by the solids was synergistic, indicating that the solids had a soil conditioning effect. The results show that biogas digestates are a valuable source for phosphorus recycling of fractions that have equivalent or even superior fertilizing properties compared to TSP.


2021 ◽  
pp. 1-11
Author(s):  
Valentina Escott-Price ◽  
Karl Michael Schmidt

Background: Genome-wide association studies (GWAS) were successful in identifying SNPs showing association with disease, but their individual effect sizes are small and require large sample sizes to achieve statistical significance. Methods of post-GWAS analysis, including gene-based, gene-set and polygenic risk scores, combine the SNP effect sizes in an attempt to boost the power of the analyses. To avoid giving undue weight to SNPs in linkage disequilibrium (LD), the LD needs to be taken into account in these analyses. Objectives: We review methods that attempt to adjust the effect sizes (β-coefficients) of summary statistics, instead of simple LD pruning. Methods: We subject LD adjustment approaches to a mathematical analysis, recognising Tikhonov regularisation as a framework for comparison. Results: Observing the similarity of the processes involved with the more straightforward Tikhonov-regularised ordinary least squares estimate for multivariate regression coefficients, we note that current methods based on a Bayesian model for the effect sizes effectively provide an implicit choice of the regularisation parameter, which is convenient, but at the price of reduced transparency and, especially in smaller LD blocks, a risk of incomplete LD correction. Conclusions: There is no simple answer to the question which method is best, but where interpretability of the LD adjustment is essential, as in research aiming at identifying the genomic aetiology of disorders, our study suggests that a more direct choice of mild regularisation in the correction of effect sizes may be preferable.
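The idea of correcting marginal effect sizes for LD with an explicit regularisation parameter can be sketched for a block of two SNPs. This is an illustrative toy, not any of the reviewed methods: the function name `tikhonov_adjust`, the LD value of 0.9, and the effect sizes are hypothetical. It solves (R + λI)β = b, where R is the LD (correlation) matrix and b holds the marginal effects.

```python
def tikhonov_adjust(ld, marginal, lam):
    """Solve (R + lam * I) beta = b for a 2x2 LD matrix R and marginal effects b."""
    (a, b), (c, d) = ld
    a += lam  # add the regularisation parameter to the diagonal
    d += lam
    det = a * d - b * c
    if det == 0:
        raise ValueError("regularised LD matrix is singular")
    m0, m1 = marginal
    # Closed-form inverse of a 2x2 matrix applied to the marginal effects.
    return ((d * m0 - b * m1) / det, (a * m1 - c * m0) / det)


# Hypothetical block: SNP 1 has a joint effect of 1.0, SNP 2 has none,
# but LD of 0.9 makes SNP 2's marginal effect appear as 0.9.
ld_matrix = ((1.0, 0.9), (0.9, 1.0))
marginal_effects = (1.0, 0.9)

unregularised = tikhonov_adjust(ld_matrix, marginal_effects, lam=0.0)
mildly_regularised = tikhonov_adjust(ld_matrix, marginal_effects, lam=0.1)

print(unregularised)        # close to the joint effects (1.0, 0.0)
print(mildly_regularised)   # shrunk toward the marginal pattern
```

With λ = 0 the joint effects are recovered, and a larger λ leaves part of the LD-induced signal on the second SNP. This is the transparency trade-off the abstract refers to when the parameter is chosen directly rather than implicitly.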


2020 ◽  
Vol 11 (4) ◽  
pp. 1233-1258
Author(s):  
Manuel Schlund ◽  
Axel Lauer ◽  
Pierre Gentine ◽  
Steven C. Sherwood ◽  
Veronika Eyring

Abstract. An important metric for temperature projections is the equilibrium climate sensitivity (ECS), which is defined as the global mean surface air temperature change caused by a doubling of the atmospheric CO2 concentration. The range for ECS assessed by the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report is between 1.5 and 4.5 K and has not decreased over the last decades. Among other methods, emergent constraints are potentially promising approaches to reduce the range of ECS by combining observations and output from Earth System Models (ESMs). In this study, we systematically analyze 11 published emergent constraints on ECS that have mostly been derived from models participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5) project. These emergent constraints are – except for one that is based on temperature variability – all directly or indirectly based on cloud processes, which are the major source of spread in ECS among current models. The focus of the study is on testing if these emergent constraints hold for ESMs participating in the new Phase 6 (CMIP6). Since none of the emergent constraints considered here have been derived using the CMIP6 ensemble, CMIP6 can be used for cross-checking of the emergent constraints on a new model ensemble. The application of the emergent constraints to CMIP6 data shows a decrease in skill and statistical significance of the emergent relationship for nearly all constraints, with this decrease being large in many cases. Consequently, the size of the constrained ECS ranges (66 % confidence intervals) widens by 51 % on average in CMIP6 compared to CMIP5. This is likely because of changes in the representation of cloud processes from CMIP5 to CMIP6, but may in some cases also be due to spurious statistical relationships or a too small number of models in the ensemble that the emergent constraint was originally derived from. 
The emergently constrained best estimates of ECS also increased from CMIP5 to CMIP6 by 12 % on average. This can be at least partly explained by the increased number of high-ECS (above 4.5 K) models in CMIP6 without a corresponding change in the constraint predictors, suggesting the emergence of new feedback processes rather than changes in strength of those previously dominant. Our results support previous studies concluding that emergent constraints should be based on an independently verifiable physical mechanism, and that process-based emergent constraints on ECS should rather be thought of as constraints for the process or feedback they are actually targeting.


Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 95
Author(s):  
Pontus Söderbäck ◽  
Jörgen Blomvall ◽  
Martin Singull

Liquid financial markets, such as the options market of the S&P 500 index, create vast amounts of data every day, i.e., so-called intraday data. However, this highly granular data is often reduced to a single point in time when used to estimate financial quantities. This under-utilization of the data may reduce the quality of the estimates. In this paper, we study the impacts on estimation quality when using intraday data to estimate dividends. The methodology is based on earlier linear regression (ordinary least squares) estimates, which have been adapted to intraday data. Further, the method is also generalized in two aspects. First, the dividends are expressed as present values of future dividends rather than dividend yields. Second, to account for heteroscedasticity, the estimation methodology is formulated as a weighted least squares problem, where the weights are determined from the market data. This method is compared with a traditional method on out-of-sample S&P 500 European options market data. The results show that estimations based on intraday data have, with statistical significance, a higher quality than the corresponding single-time estimates. Additionally, the two generalizations of the methodology are shown to improve the estimation quality further.
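The role of weights under heteroscedasticity can be shown with a generic weighted least squares line fit. This is a minimal sketch, not the paper's dividend estimator: the data, the assumed error variances, and the function name `weighted_least_squares` are hypothetical, and the weights are simply taken as inverse variances.

```python
def weighted_least_squares(x, y, w):
    """Fit y = intercept + slope * x by minimising sum(w_i * residual_i ** 2)."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w  # weighted mean of y
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical observations: the first four lie on y = 3 + 2x,
# the last is a very noisy observation with a large assumed variance.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0, 30.0]
assumed_variances = [1.0, 1.0, 1.0, 1.0, 1e6]
weights = [1.0 / v for v in assumed_variances]

wls_fit = weighted_least_squares(x, y, weights)
ols_fit = weighted_least_squares(x, y, [1.0] * len(x))  # equal weights give OLS

print("WLS (intercept, slope):", wls_fit)
print("OLS (intercept, slope):", ols_fit)
```

Down-weighting the noisy observation keeps the fit close to the underlying line, whereas the equally weighted fit is pulled strongly toward the outlying point.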


1987 ◽  
Vol 44 (5) ◽  
pp. 990-1001 ◽  
Author(s):  
Joseph B. Rasmussen ◽  
Jacob Kalff

Estimates of macrozoobenthos from the literature were regressed against a series of limnological variables to yield empirical models for zoobenthic biomass in the profundal, sublittoral, and littoral zones of lakes. Variables indicative of phytoplankton biomass (chlorophyll concentration, total phosphorus concentration, and Secchi disk transparency) explained between 14 and 57% of the variance of zoobenthic biomass ($(\mathrm{g/m^2})^{0.1}$). Other factors such as humic colour, morphometry (slope, mean depth, ratio of mean to maximum depth, and lake area), and mean annual air temperature substantially increased the amount of explained variance. In the profundal and sublittoral zones, the best models explain 70% of the variance in zoobenthic biomass. Littoral zone models explained less than 50%, and this deficiency was attributed to sampling difficulties and to high local variability of slope and wave exposure in the littoral zone.


2019 ◽  
Vol 31 (2) ◽  
pp. 309-331 ◽  
Author(s):  
Eric G. Lambert ◽  
Linda D. Keena ◽  
Stacy H. Haynes ◽  
David May ◽  
Matthew C. Leone

Job stress is a problem in corrections. Although the very nature of correctional work is stressful, workplace variables also contribute to correctional staff job stress. The job demands-resource model holds that job demands increase negative outcomes (e.g., job stress) and decrease positive outcomes (e.g., job satisfaction), whereas job resources help increase positive outcomes and decrease negative outcomes. An ordinary least squares regression analysis of self-reported survey data from 322 staff at a Southern prison indicated that input into decision-making and quality supervision had statistically significant negative effects on job stress, whereas role overload and fear of victimization had significant positive effects. Instrumental communication, views of training, and role clarity all had nonsignificant associations with stress from the job in the multivariate analysis. The results partially supported the job demands-resources model; however, the specific work environment variables varied in terms of their statistical significance. Correctional administrators need to be aware of the contribution that workplace variables have on job stress and make changes to reduce staff job stress.


Author(s):  
Pascalis Kadaro Matthew ◽  
Abubakar Yahaya

Over the past few decades, penalized regression techniques for linear regression have been developed specifically to reduce the flaws inherent in the prediction accuracy of the classical ordinary least squares (OLS) regression technique. In this paper, we used a diabetes data set obtained from previous literature to compare three of these well-known techniques, namely: Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net and Correlation Adjusted Elastic Net (CAEN). After thorough analysis, it was observed that CAEN generated a less complex model.


2020 ◽  
Vol 6 ◽  
pp. 237802312090606
Author(s):  
Simone Rambotti ◽  
Ronald L. Breiger

A methodological paradox characterizes macro-comparative research: it routinely violates the assumptions underlying its dominant method, multiple regression analysis. Comparative researchers have substantive interest in cases, but cases are largely rendered invisible in regression analysis. Researchers seldom recognize the mismatch between the goals of macro-comparative research and the demands of regression methods, and sometimes they end up engaging in strenuous disputes over particular variable effects. A good example is the controversial relationship between income inequality and health. Here, the authors offer an innovative method that combines variable-oriented and case-oriented approaches by turning ordinary least squares regression models “inside out.” The authors estimate case-specific contributions to regression coefficient estimates. They reanalyze data on income inequality, poverty, and life expectancy across 20 affluent countries. Multiple model specifications are dependent primarily on two countries with values on the outcome that are extreme in magnitude and inconsistent with conventional theoretical expectations.
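For a single predictor, the notion of case-specific contributions to a coefficient has a simple form: the OLS slope is a sum of one term per case. The sketch below shows only this bivariate special case, not the authors' multivariate procedure. The data and the function name `case_contributions` are hypothetical.

```python
def case_contributions(x, y):
    """Split the bivariate OLS slope into one additive term per case.

    slope = sum_i (x_i - x_bar) * (y_i - y_bar) / sum_j (x_j - x_bar) ** 2
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) * (yi - y_bar) / s_xx for xi, yi in zip(x, y)]


# Hypothetical data: the last case is extreme on both variables.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.0, 1.0, 3.0, 2.0, 9.0]

contributions = case_contributions(x, y)
slope = sum(contributions)  # the per-case terms add up to the OLS slope

for case_index, term in enumerate(contributions):
    print(f"case {case_index}: {term:+.3f} ({term / slope:.0%} of slope)")
print(f"slope: {slope:.3f}")
```

Listing the terms makes it visible when one or two cases account for most of an estimated coefficient, which is the kind of dependence the abstract reports for two countries.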

