Quantifying the effects of temporal autocorrelation on climatological regression models using geostatistical techniques

Ian B. Strachan; L. Edward Harvey

doi:10.1139/x26-094

Canadian Journal of Forest Research

10.1139/x26-094

1996

Vol 26 (5)

pp. 864-871

Cited By ~ 1

Author(s):

Ian B. Strachan

L. Edward Harvey

Keyword(s):

Time Series

Stomatal Conductance

Regression Models

Temporal Structure

Time Lag

Ordinary Least Squares

Data Sets

Linear Regression Models

Temporal Autocorrelation

Geostatistical Techniques

When time-dependent data are used in regression models, temporal autocorrelation violates ordinary least squares assumptions and impedes their proper testing and interpretation. The problem of temporal autocorrelation is exacerbated by the uneven temporal spacing inherent in many data sets. Using simple linear regression models of stomatal conductance as examples, we compare the effectiveness of two methods for removing temporal autocorrelation from regression models (first-differencing and Cochrane–Orcutt) and we introduce the geostatistical technique of semivariograms as a method for quantifying temporal autocorrelation in uneven time series. The Cochrane–Orcutt method proved more effective than first-differencing at removing autocorrelation and produced regression models without changing the significance of the independent variables. Semivariograms were used to quantify the time dependence of the unevenly spaced stomatal conductance time series. This technique revealed the dominant autocorrelation at the minimum time lag (0.5 h) and the 24-h periodicity caused by the climatological variables used in the model. We conclude that geostatistical techniques provide a robust method for quantifying temporal structure and periodicity in unevenly spaced time series.
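
The three techniques named above can be illustrated with a minimal sketch in plain Python (standard library only). It assumes a one-predictor model with AR(1) errors and uses synthetic data, not the authors' stomatal-conductance series; the function names, the lag-bin width and the maximum lag are illustrative choices. First-differencing implicitly assumes an autocorrelation of 1, whereas Cochrane–Orcutt estimates it and quasi-differences with that estimate. The semivariogram bins every pair of observations by its time separation, which is why it does not need evenly spaced samples.

```python
import math
import random

def ols(x, y):
    """Simple linear regression y = a + b*x; returns intercept, slope, residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]

def durbin_watson(e):
    """Durbin-Watson statistic; values near 2 suggest no lag-1 autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)

def rho_hat(e):
    """Lag-1 autocorrelation estimated by regressing e_t on e_{t-1}."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    return num / sum(e[t - 1] ** 2 for t in range(1, len(e)))

def first_difference(x, y):
    """Regress dy_t on dx_t; equivalent to assuming rho = 1."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    return ols(dx, dy)

def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt for AR(1) errors; returns (a, b, rho)."""
    a, b, e = ols(x, y)
    rho = rho_hat(e)
    for _ in range(max_iter):
        # Quasi-difference both series with the current rho estimate.
        xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        a_star, b, _ = ols(xs, ys)
        a = a_star / (1.0 - rho)  # recover the untransformed intercept
        e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        rho_new = rho_hat(e)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return a, b, rho

def semivariogram(times, values, lag_width, max_lag):
    """Empirical semivariogram for unevenly spaced times.

    gamma(h) = sum((z_i - z_j)**2) / (2 * N(h)) over all pairs whose
    separation |t_i - t_j| falls in the bin [k*w, (k+1)*w).
    Returns (bin centre, gamma, number of pairs) for non-empty bins.
    """
    nbins = int(round(max_lag / lag_width))
    sums, counts = [0.0] * nbins, [0] * nbins
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            k = int(abs(times[j] - times[i]) // lag_width)
            if k < nbins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(nbins) if counts[k] > 0]

# Synthetic demo (not the paper's data): AR(1) errors with rho = 0.8.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(500)]
u = [0.0]
for _ in range(499):
    u.append(0.8 * u[-1] + rng.gauss(0, 1))
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]
_, _, e_ols = ols(x, y)
a_co, b_co, rho_co = cochrane_orcutt(x, y)

# Unevenly spaced sampling times (hours) of a 24-h cycle.
t_obs = sorted(rng.uniform(0, 240) for _ in range(300))
z_obs = [math.sin(2 * math.pi * t / 24) for t in t_obs]
sv = semivariogram(t_obs, z_obs, lag_width=1.0, max_lag=36.0)
```

On this toy series the OLS residuals give a Durbin–Watson statistic well below 2, and the semivariogram rises to a maximum near a 12-h lag and falls back near 24 h, which is how a 24-h periodicity shows up in a semivariogram.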

Segmented Linear Regression Models for Assessing Change in Retrospective Studies in Healthcare

Computational and Mathematical Methods in Medicine

10.1155/2019/9810675

2019

Vol 2019

pp. 1-9

Cited By ~ 6

Author(s):

Epaminondas Markos Valsamis

David Ricketts

Henry Husband

Benedict Aristotle Rogers

Keyword(s):

Time Series

Hip Fracture

Linear Regression

Length Of Stay

Regression Models

Retrospective Studies

Linear Regression Models

Time To Surgery

Before And After

Group Comparisons

Introduction. In retrospective studies, the effect of a given intervention is usually evaluated by using statistical tests to compare data from before and after the intervention. A problem with this approach is that underlying trends can lead to incorrect conclusions. This study aimed to develop a rigorous mathematical method to analyse temporal variation and overcome these limitations. Methods. We evaluated hip fracture outcomes (time to surgery, length of stay, and mortality) for a total of 2777 patients between April 2011 and September 2016, before and after the introduction of a dedicated hip fracture unit (HFU). We developed a novel modelling method that fits progressively more complex linear sections to the time series using least squares regression. The method was used to model the period before implementation, the period after implementation, and the whole study period, with goodness of fit compared using F-tests. Results. The proposed method offered reliable descriptions of the temporal evolution of the time series and augmented the conclusions reached by simple group comparisons. Reductions in time to surgery, length of stay, and mortality rates that group comparisons would have credited to the hip fracture unit appeared to be due to unrelated underlying trends. Conclusion. Temporal analysis using segmented linear regression models can reveal secular trends and is a valuable tool for evaluating interventions in retrospective studies.
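
The core comparison, a simpler least squares model against a more complex nested one judged by an F-test, can be sketched in plain Python. This is an assumption-laden illustration, not the authors' method: it covers only the first step of their ladder (one line over the whole period versus two separate lines split at a known intervention date, essentially a Chow-type test), and the series are synthetic monthly values, not the hip fracture data.

```python
import random

def fit_line(t, y):
    """Least squares line y = a + b*t; returns (a, b, residual sum of squares)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / stt
    a = my - b * mt
    rss = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))
    return a, b, rss

def nested_f(rss_simple, p_simple, rss_complex, p_complex, n):
    """F statistic comparing nested least squares models (p = number of parameters)."""
    num = (rss_simple - rss_complex) / (p_complex - p_simple)
    return num / (rss_complex / (n - p_complex))

def single_vs_two_segments(t, y, breakpoint):
    """One line over the whole period vs separate lines before/after breakpoint."""
    before = [(ti, yi) for ti, yi in zip(t, y) if ti < breakpoint]
    after = [(ti, yi) for ti, yi in zip(t, y) if ti >= breakpoint]
    _, _, rss_one = fit_line(t, y)
    _, _, rss_before = fit_line(*zip(*before))
    _, _, rss_after = fit_line(*zip(*after))
    # 2 parameters for one line, 4 for two independent lines.
    return nested_f(rss_one, 2, rss_before + rss_after, 4, len(t))

# Synthetic monthly series (not the study's data); intervention at month 33.
rng = random.Random(7)
months = list(range(66))
trend_only = [10 - 0.05 * m + rng.gauss(0, 0.5) for m in months]
with_step = [10 - 0.05 * m - (3 if m >= 33 else 0) + rng.gauss(0, 0.5)
             for m in months]
f_trend = single_vs_two_segments(months, trend_only, 33)
f_step = single_vs_two_segments(months, with_step, 33)
```

A continuing trend alone gives a small F, while a genuine change at the breakpoint gives a large one. The p-value comes from the F distribution with (2, n - 4) degrees of freedom, for example via `scipy.stats.f.sf(f_step, 2, len(months) - 4)` if SciPy is available.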

Statistical Models for the Twinning Rate

Acta geneticae medicae et gemellologiae twin research

10.1017/s000156600000605x

1987

Vol 36 (3)

pp. 297-312

Cited By ~ 13

Author(s):

J.O. Fellman

A.W. Eriksson

Keyword(s):

Linear Regression

Statistical Models

Maternal Age

Regression Models

Model Building

Data Sets

Linear Regression Models

Linear Regression Technique

Secular Decline

Disaggregated Data

Abstract. Linear regression models are used to explain variation in twinning rates. Data sets from different countries are analysed, with maternal age, parity and marital status as the main regressors. Model building is also used to study the secular decline in the twinning rate. The linear regression technique makes it possible to compare the effects of different factors, but it requires sufficiently disaggregated data.

The SPARC Data Initiative: comparisons of CFC-11, CFC-12, HF and SF6 climatologies from international satellite limb sounders

Earth System Science Data

10.5194/essd-8-61-2016

2016

Vol 8 (1)

pp. 61-78

Cited By ~ 5

Author(s):

S. Tegtmeier

M. I. Hegglin

J. Anderson

B. Funke

J. Gille

...

Keyword(s):

Time Series

Satellite Data

Lower Stratosphere

Atmospheric Transport

Temporal Structure

Data Sets

Upper Troposphere

Mean Fields

Mean State

Abstract. A quality assessment of the CFC-11 (CCl3F), CFC-12 (CCl2F2), HF, and SF6 products from limb-viewing satellite instruments is provided by means of a detailed intercomparison. The climatologies in the form of monthly zonal mean time series are obtained from HALOE, MIPAS, ACE-FTS, and HIRDLS within the time period 1991–2010. The intercomparisons focus on the mean biases of the monthly and annual zonal mean fields and aim to identify their vertical, latitudinal and temporal structure. The CFC evaluations (based on MIPAS, ACE-FTS and HIRDLS) reveal that the uncertainty in our knowledge of the atmospheric CFC-11 and CFC-12 mean state, as given by satellite data sets, is smallest in the tropics and mid-latitudes at altitudes below 50 and 20 hPa, respectively, with a 1σ multi-instrument spread of up to ±5 %. For HF, the situation is reversed. The two available data sets (HALOE and ACE-FTS) agree well above 100 hPa, with a spread in this region of ±5 to ±10 %, while at altitudes below 100 hPa the HF annual mean state is less well known, with a spread of ±30 % and larger. The atmospheric SF6 annual mean states derived from two satellite data sets (MIPAS and ACE-FTS) show only very small differences, with a spread of less than ±5 % and often below ±2.5 %. While the overall agreement among the climatological data sets is very good for large parts of the upper troposphere and lower stratosphere (CFCs, SF6) or middle stratosphere (HF), individual discrepancies have been identified. Pronounced deviations between the instrument climatologies exist for particular atmospheric regions, which differ from gas to gas. Notable features are differently shaped isopleths in the subtropics, deviations in the vertical gradients in the lower stratosphere and in the meridional gradients in the upper troposphere, and inconsistencies in the seasonal cycle. Additionally, long-term drifts between the instruments have been identified for the CFC-11 and CFC-12 time series. 
The evaluations as a whole provide guidance on what data sets are the most reliable for applications such as studies of atmospheric transport and variability, model–measurement comparisons and detection of long-term trends. The data sets will be publicly available from the SPARC Data Centre and through PANGAEA (doi:10.1594/PANGAEA.849223).
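
The "1σ multi-instrument spread" quoted above can be made concrete with a small sketch. It assumes the spread is the standard deviation of the instrument climatologies at one grid point expressed relative to their multi-instrument mean (MIM); whether the sample or population standard deviation is used is an assumption here, and the instrument values are hypothetical, not SPARC data.

```python
import statistics

def multi_instrument_stats(climatologies):
    """At one latitude/pressure grid point, return the multi-instrument mean (MIM),
    the relative 1-sigma spread in percent, and each instrument's relative
    deviation from the MIM in percent."""
    values = list(climatologies.values())
    mim = statistics.mean(values)
    spread_pct = 100.0 * statistics.stdev(values) / mim  # sample SD (assumption)
    deviations = {name: 100.0 * (v - mim) / mim
                  for name, v in climatologies.items()}
    return mim, spread_pct, deviations

# Hypothetical zonal-mean mixing ratios at one grid point (not SPARC data).
example = {"MIPAS": 240.0, "ACE-FTS": 250.0, "HIRDLS": 245.0}
mim, spread_pct, deviations = multi_instrument_stats(example)
```

Here the three hypothetical values differ by ±5 around a mean of 245, giving a relative spread of about ±2 %. Repeating the calculation over every grid point and month produces the latitude–pressure spread fields that such an intercomparison summarises.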

Selecting statistical models to study the relationship between soybean yield and soil physical properties

Revista Brasileira de Ciência do Solo

10.1590/s0100-06832011000100009

2011

Vol 35 (1)

pp. 97-104

Cited By ~ 3

Author(s):

Marcio Paulo de Oliveira

Maria Hermínia Ferreira Tavares

Miguel Angel Uribe-Opazo

Luis Carlos Timm

Keyword(s):

Statistical Models

Regression Models

Information Criterion

Penetration Resistance

Soil Bulk Density

Soil Physical Properties

Data Sets

Linear Regression Models

Soybean Yield

Soil Penetration Resistance

Statistical models allow data sets to be represented and the behavior of a given variable to be estimated and/or predicted through its interactions with the other variables involved in a phenomenon. Among these are autoregressive state-space (ARSS) models and linear regression (LR) models, which allow the relationships among variables of the soil-plant-atmosphere system to be quantified. To compare the quality of ARSS and LR models for describing the relationships between soybean yield and soil physical properties, this study used Akaike's Information Criterion (AIC), which provides a coefficient for selecting the best model. The data sets were sampled in a Rhodic Acrudox soil along a spatial transect of 84 points spaced 3 m apart. At each sampling point, soybean samples were collected to quantify yield. At the same site, soil penetration resistance was measured and soil samples were collected to measure soil bulk density in the 0-0.10 m and 0.10-0.20 m layers. The soybean yield and soil penetration resistance data showed autocorrelation and were cross-correlated. Soil bulk density data, however, were autocorrelated only in the 0-0.10 m layer and were not cross-correlated with soybean yield. According to AIC, the autoregressive state-space models were more efficient than the equivalent simple and multiple linear regression models: their AIC values were lower than those of the regression models for all combinations of explanatory variables.
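
The diagnostics and the selection criterion mentioned above can be sketched in plain Python: the sample autocorrelation along an equally spaced transect, the cross-correlation between two variables at increasing lags, and AIC for a Gaussian least squares model. This sketch does not fit a state-space model; the transect values are synthetic sinusoids chosen only so the expected pattern is easy to see, and the function names are hypothetical.

```python
import math

def acf(z, max_lag):
    """Sample autocorrelation r(h) for an equally spaced series, h = 0..max_lag."""
    n = len(z)
    m = sum(z) / n
    c0 = sum((zi - m) ** 2 for zi in z) / n
    return [sum((z[i] - m) * (z[i + h] - m) for i in range(n - h)) / n / c0
            for h in range(max_lag + 1)]

def ccf(x, y, max_lag):
    """Sample cross-correlation between x_i and y_{i+h}, h = 0..max_lag."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    return [sum((x[i] - mx) * (y[i + h] - my) for i in range(n - h)) / n / (sx * sy)
            for h in range(max_lag + 1)]

def gaussian_aic(rss, n, k):
    """AIC = n*ln(RSS/n) + 2k for a Gaussian least squares model with k parameters.
    Additive constants are dropped, so only differences between models fitted
    to the same n observations are meaningful; lower is better."""
    return n * math.log(rss / n) + 2 * k

# Synthetic 84-point transect (not the study's data). With 3 m spacing,
# lag h corresponds to 3*h metres along the transect.
resistance = [math.sin(2 * math.pi * i / 28) for i in range(84)]
yield_ = [math.sin(2 * math.pi * (i - 2) / 28) for i in range(84)]  # lags by 2 points
r_yield = acf(yield_, 5)
r_cross = ccf(resistance, yield_, 5)
```

In this toy transect the yield series is strongly autocorrelated at short lags, and the cross-correlation peaks at a two-point (6 m) offset rather than at zero, the kind of structure that an autoregressive model can exploit and a plain linear regression ignores.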

Artificial Neural Network for analyzing the chaotic time series motion: The case of the Lebanese GDP

10.21203/rs.3.rs-1024808/v1

2021

Author(s):

Jean-François Verne

Keyword(s):

Neural Network

Time Series

Artificial Neural Network

Phase Space

Regression Models

Chaotic Dynamic

Linear Regression Models

Large Fluctuations

Artificial Neural

Artificial Neural Network (ANN)

Abstract In this paper, we analyze the motion of the Lebanese GDP over the period 1950-2019. This macroeconomic aggregate shows large fluctuations, notably during the civil war period (1975-1990). By estimating the Lyapunov exponents with an Artificial Neural Network (ANN) procedure, we show that the series exhibits a strange attractor generated by chaotic dynamics, and we use the embedding procedure to shed light on the strange structure of such a series. The ANN method also gives better predictions than linear regression models and accurately fits the chaotic motion followed by the Lebanese GDP in phase space.
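
Two ideas in this abstract, delay embedding and the Lyapunov exponent, can be illustrated with a toy system. In ANN-based approaches a network is typically fitted to the embedded series as an approximation of the unknown map, and the exponent is computed from that fitted map's derivatives; that description is an assumption about the general technique, not a statement of this paper's procedure. Here the logistic map stands in for the fitted network, so its derivative is known exactly. A positive exponent indicates chaos and a negative one a stable cycle.

```python
import math

def delay_embed(series, dim, tau):
    """Delay vectors (x_t, x_{t-tau}, ..., x_{t-(dim-1)*tau}) reconstructing phase space."""
    start = (dim - 1) * tau
    return [tuple(series[t - j * tau] for j in range(dim))
            for t in range(start, len(series))]

def lyapunov_1d(f, df, x0, n_iter=20000, burn_in=1000):
    """Largest Lyapunov exponent of x_{t+1} = f(x_t): orbit average of ln|f'(x_t)|."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = f(x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(df(x)))
        x = f(x)
    return total / n_iter

def logistic(r):
    """Logistic map x -> r*x*(1 - x) and its derivative (toy stand-in for a fitted ANN)."""
    return (lambda x: r * x * (1 - x)), (lambda x: r * (1 - 2 * x))

f_chaos, df_chaos = logistic(3.9)   # chaotic regime
f_cycle, df_cycle = logistic(3.2)   # stable period-2 cycle
lam_chaos = lyapunov_1d(f_chaos, df_chaos, 0.3)
lam_cycle = lyapunov_1d(f_cycle, df_cycle, 0.3)

series = [0.3]
for _ in range(199):
    series.append(f_chaos(series[-1]))
points = delay_embed(series, dim=2, tau=1)
```

In two dimensions the embedded points of the chaotic series fall exactly on the parabola x_t = 3.9·x_{t-1}·(1 - x_{t-1}), so the reconstruction exposes a deterministic structure that looks like noise in the raw time series.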

Can ODE gene regulatory models neglect time lag or measurement scaling?

Bioinformatics

10.1093/bioinformatics/btaa268

2020

Vol 36 (13)

pp. 4058-4064

Author(s):

Jie Hu

Huihui Qin

Xiaodan Fan

Keyword(s):

Linear Regression

Regression Models

Time Course

Time Lag

Microarray Dataset

Gene Products

Linear Regression Models

ODE Models

Gene Regulatory

Regulatory Models

Abstract. Motivation: Many ordinary differential equation (ODE) models have been introduced to replace linear regression models for inferring gene regulatory relationships from time-course gene expression data. However, because the observed data are usually not direct measurements of the gene products, or because there is an unknown time lag in gene regulation, it is problematic to apply traditional ODE models or linear regression models directly. Results: We introduce a lagged ODE model to infer lagged gene regulatory relationships from time-course measurements, which are modeled as linear transformations of the gene products. A time-course microarray dataset from a yeast cell-cycle study is used for simulation assessment of the methods and for real data analysis. The results show that our method, by accounting for both time lag and measurement scaling, performs much better than other linear and ODE models. This indicates that time lag and measurement scaling need to be modeled explicitly in ODE gene regulatory models. Availability and implementation: R code is available at https://www.sta.cuhk.edu.hk/xfan/share/lagODE.zip.
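
The two features the abstract argues for, a regulatory time lag and a linear measurement scaling, can be seen in a small forward simulation. The model below is a hypothetical linear delay ODE, dx_i/dt = Σ_j W_ij·x_j(t - τ) - d_i·x_i(t), observed as y_i = α_i·x_i + β_i, integrated with a simple Euler scheme. It illustrates the idea only; it is not the authors' model or their R implementation, and all names and parameter values are illustrative.

```python
import math

def simulate_lagged_ode(W, decay, tau, x_init, history, dt, t_end):
    """Euler integration of dx_i/dt = sum_j W[i][j] * x_j(t - tau) - decay[i] * x_i(t).

    `history` is the constant state assumed for t < 0; returns the state at
    every step, traj[k] being the state at time k*dt.
    """
    n_steps = int(round(t_end / dt))
    lag = int(round(tau / dt))
    traj = [list(x_init)]
    for k in range(n_steps):
        x = traj[-1]
        x_del = traj[k - lag] if k - lag >= 0 else history  # delayed state
        traj.append([x[i] + dt * (sum(W[i][j] * x_del[j] for j in range(len(x)))
                                  - decay[i] * x[i])
                     for i in range(len(x))])
    return traj

def observe(traj, alpha, beta):
    """Measurements as linear transformations of gene products: y = alpha*x + beta."""
    return [[alpha[i] * x[i] + beta[i] for i in range(len(x))] for x in traj]

# Hypothetical two-gene system: gene 0 activates gene 1 with a 2-unit lag.
W = [[0.0, 0.0], [1.0, 0.0]]
traj = simulate_lagged_ode(W, decay=[0.5, 0.5], tau=2.0, x_init=[1.0, 0.0],
                           history=[0.0, 0.0], dt=0.01, t_end=10.0)
obs = observe(traj, alpha=[2.0, 3.0], beta=[0.1, 0.2])
expected_x0_end = math.exp(-0.5 * 10.0)  # continuous-time value for gene 0 at t = 10
```

Gene 1 stays at zero until t = τ and only then responds, and its measured value sits at the offset β rather than at zero. A model that ignores either effect would attribute that delay or offset to something else.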

Dying From COVID-19 or With COVID-19: A Definitive Answer Through a Retrospective Analysis of Mortality in Italy (Preprint)

10.2196/preprints.36022

2021

Author(s):

Alessandro Rovetta

Akshaya Srikanth Bhagavathula

Keyword(s):

Regression Models

Current Knowledge

Statistical Significance

National Level

Ordinary Least Squares

Linear Regression Models

Mortality Trends

Male Mortality

Definitive Answer

Significant Difference

BACKGROUND Several factors, including conspiracy theories and infodemic phenomena, have been associated with COVID-19 mortality. However, little is known about potential endogenous causes of the increase in COVID-19-associated mortality in Italy. OBJECTIVE This study aimed to search for potential endogenous causes of the increase in COVID-19 mortality recorded in Italy during 2020 and to evaluate the statistical significance of that increase. METHODS We analyzed all trends over 2011-2019 in deaths by age, sex, region, and cause of death in Italy and compared them with those of 2020. Ordinary least squares (OLS) linear regressions and ARIMA (p, d, q) models were applied to predict deaths in 2020 for comparison with the deaths reported in the same year. Grubbs and Iglewicz-Hoaglin tests were used to identify statistical differences between predicted and observed deaths. The relationship between mortality and predictive variables was assessed using OLS multiple regression models. RESULTS Both the ARIMA and OLS linear regression models predicted between 640,000 and 660,000 deaths in Italy during 2020 (95% confidence intervals range: 620,000-695,000), far below the more than 750,000 deaths observed. A significant difference in deaths was observed at the national level (P = 0.003), as was higher male than female mortality (+18% versus +14%; P < 0.001 versus P = 0.01). Finally, higher mortality was strongly and positively correlated with latitude (R = 0.82, P < 0.001). CONCLUSIONS Our findings support the absence of historical endogenous factors capable of explaining the increase in deaths and mortality observed in Italy in 2020. Together with current knowledge of the novel coronavirus 2019, these findings provide decisive evidence of the devastating impact of COVID-19 in Italy. 
We suggest that this research be leveraged by government, health, and information authorities to furnish proof against conspiracy hypotheses. Moreover, given the marked concordance between the predictions of the ARIMA and OLS regression models, we suggest that these models be exploited to predict mortality trends.
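
The OLS part of the approach described above (fit a linear trend to 2011-2019 and ask whether the 2020 value falls outside the trend's prediction interval) can be sketched as follows. The yearly values are made-up numbers in arbitrary units, not Italian mortality data; the ARIMA models and the Grubbs and Iglewicz-Hoaglin tests are not shown. The 95 % two-sided t critical value for 9 - 2 = 7 degrees of freedom (about 2.365) is passed in explicitly; with SciPy it could be obtained as `scipy.stats.t.ppf(0.975, 7)`.

```python
import math

def ols_prediction_interval(t, y, t_new, t_crit):
    """Linear trend fitted by OLS and a prediction interval for one new observation.

    half-width = t_crit * s * sqrt(1 + 1/n + (t_new - mean_t)**2 / Stt),
    where s is the residual standard error with n - 2 degrees of freedom.
    """
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / stt
    a = my - b * mt
    rss = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))
    s = math.sqrt(rss / (n - 2))
    y_hat = a + b * t_new
    half = t_crit * s * math.sqrt(1 + 1 / n + (t_new - mt) ** 2 / stt)
    return y_hat, y_hat - half, y_hat + half

# Made-up yearly counts in arbitrary units (not real mortality data).
years = list(range(2011, 2020))
values = [100, 103, 101, 106, 104, 109, 107, 111, 110]
y_hat, lo, hi = ols_prediction_interval(years, values, 2020, t_crit=2.365)
observed_2020 = 125  # hypothetical observation to compare against the interval
```

A prediction interval (rather than a confidence interval for the fitted mean) is the relevant one here because a single year's observed total includes its own year-to-year noise; the hypothetical value of 125 lies well above this interval.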

Linear regression models for biomass table construction, using cluster samples

Canadian Journal of Forest Research

10.1139/x89-103

1989

Vol 19 (5)

pp. 664-673

Cited By ~ 1

Author(s):

Andrew J. R. Gillespie

Tiberius Cunia

Keyword(s):

Least Squares

Regression Models

Predictor Variable

Ordinary Least Squares

Cluster Sampling

Parameter Estimates

Least Squares Regression

Linear Regression Models

Estimation Procedures

Biased Estimates

Biomass tables are often constructed from cluster samples by means of ordinary least squares regression estimation procedures. These procedures assume that sample observations are uncorrelated, which ignores the intracluster correlation of cluster samples and results in underestimates of the model error. We tested alternative estimation procedures by simulation under a variety of cluster sampling methods, to determine combinations of sampling and estimation procedures that yield accurate parameter estimates and reliable estimates of error. Modified, generalized, and jack-knife least squares procedures gave accurate parameter and error estimates when sample trees were selected with equal probability. Regression models that did not include height as a predictor variable yielded biased parameter estimates when sample trees were selected with probability proportional to tree size. Models that included height did not yield biased estimates. There was no discernible gain in precision associated with sampling with probability proportional to size. Random coefficient regressions generally gave biased point estimates with poor precision, regardless of sampling method.
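
The claim that ignoring intracluster correlation underestimates model error can be checked with a small simulation. This sketch compares the conventional OLS standard error of a slope with a delete-one-cluster jackknife standard error; the jackknife variant, the simple linear model and the synthetic plot data are assumptions for illustration and may differ from the paper's jack-knife least squares procedure.

```python
import math
import random

def ols_slope_and_se(x, y):
    """OLS slope and its conventional SE, which assumes independent errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(rss / (n - 2) / sxx)

def cluster_jackknife_se(clusters):
    """Delete-one-cluster jackknife SE of the slope; clusters = [(xs, ys), ...]."""
    g = len(clusters)
    estimates = []
    for drop in range(g):
        xs = [xi for k, (cx, _) in enumerate(clusters) if k != drop for xi in cx]
        ys = [yi for k, (_, cy) in enumerate(clusters) if k != drop for yi in cy]
        estimates.append(ols_slope_and_se(xs, ys)[0])
    mean = sum(estimates) / g
    return math.sqrt((g - 1) / g * sum((b - mean) ** 2 for b in estimates))

# Synthetic plots (not the study's data): trees in a plot share a plot-level
# size and a plot-level error, so observations within a plot are correlated.
rng = random.Random(3)
clusters = []
for _ in range(30):
    plot_size, plot_error = rng.gauss(20, 5), rng.gauss(0, 2)
    xs = [plot_size + rng.gauss(0, 1) for _ in range(10)]
    ys = [1 + 2 * xi + plot_error + rng.gauss(0, 0.5) for xi in xs]
    clusters.append((xs, ys))
all_x = [xi for xs, _ in clusters for xi in xs]
all_y = [yi for _, ys in clusters for yi in ys]
slope, naive_se = ols_slope_and_se(all_x, all_y)
jack_se = cluster_jackknife_se(clusters)
```

Because whole plots are dropped, the jackknife treats the plot rather than the tree as the independent unit, and on correlated data like these it reports a noticeably larger standard error than the naive OLS formula.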

Properties of the ordinary least squares and Stein-rule predictions in linear regression models with proxy variables

Statistical Papers

10.1007/bf02925525

1993

Vol 34 (1)

pp. 27-41

Cited By ~ 1

Author(s):

V. K. Srivastava

M. Dube

Keyword(s):

Linear Regression

Least Squares

Regression Models

Ordinary Least Squares

Linear Regression Models

Proxy Variables

Calibration of stormwater quality regression models: a random process?

Water Science & Technology

10.2166/wst.2010.324

2010

Vol 62 (4)

pp. 875-882

Cited By ~ 4

Author(s):

A. Dembélé

J.-L. Bertrand-Krajewski

B. Barillon

Keyword(s):

Experimental Data

Least Squares

Regression Models

Linear Models

Least Squares Method

Weighted Least Squares

Ordinary Least Squares

Data Sets

Data Set

Urban Catchments

Regression models are among the most frequently used models for estimating pollutant event mean concentrations (EMC) in wet weather discharges from urban catchments. Two main questions concerning the calibration of EMC regression models are investigated: (i) the sensitivity of models to the size and content of the data sets used for their calibration, and (ii) how modelling results change when models are re-calibrated as data sets grow and change over time with newly collected experimental data. Based on an experimental data set of 64 rain events monitored in a densely urbanised catchment, four TSS EMC regression models (two log-linear and two linear models) with two or three explanatory variables have been derived and analysed. Model calibration with the iterative re-weighted least squares method is less sensitive and leads to more robust results than the ordinary least squares method. Three calibration options have been investigated: two accounting for the chronological order of the observations and one using random samples of events from the whole available data set. Results obtained with the best-performing non-linear model clearly indicate that the model is highly sensitive to the size and content of the data set used for its calibration.
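
Why iteratively re-weighted least squares is less sensitive than ordinary least squares can be shown with a minimal sketch. The abstract does not state the weighting scheme, so Huber weights with a median-based robust scale are an assumed, common choice here. The data are synthetic (64 points with three gross outliers), not the TSS EMC measurements, and the function names are illustrative.

```python
import random
import statistics

def weighted_line(x, y, w):
    """Weighted least squares line; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def irls_huber(x, y, c=1.345, max_iter=100, tol=1e-10):
    """IRLS with Huber weights (an assumed weighting scheme), starting from OLS."""
    a, b = weighted_line(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale from the median absolute residual.
        scale = max(statistics.median(abs(ri) for ri in r) / 0.6745, 1e-12)
        # Full weight for small residuals, down-weight large ones.
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        a_new, b_new = weighted_line(x, y, w)
        if abs(a_new - a) < tol and abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b

# Synthetic calibration set (not the study's data): true line 5 + 2x.
rng = random.Random(11)
x = [i / 4 for i in range(64)]
y = [5 + 2 * xi + rng.gauss(0, 0.5) for xi in x]
for i in (61, 62, 63):
    y[i] += 40.0  # gross outliers at high x
a_ols, b_ols = weighted_line(x, y, [1.0] * len(x))
a_rob, b_rob = irls_huber(x, y)
```

The three outlying events pull the OLS slope well away from 2, while the re-weighted fit stays close to it. Re-running both fits on different subsets of events would reproduce, in miniature, the kind of calibration-sensitivity comparison the study describes.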
