Linear Models I: Regression; PCA of Predictor Variables

Mitigating the Impact of Field and Image Registration Errors through Spatial Aggregation

Remote Sensing ◽

10.3390/rs11030222 ◽

2019 ◽

Vol 11 (3) ◽

pp. 222 ◽

Cited By ~ 5

Author(s):

John Hogland ◽

David L.R. Affleck

Keyword(s):

Linear Models ◽

Predictor Variable ◽

Spatial Aggregation ◽

Predictor Variables ◽

Remotely Sensed Data ◽

Substantial Impact ◽

Landscape Characteristics ◽

Registration Errors ◽

Spatial Registration ◽

The Impact

Remotely sensed data are commonly used as predictor variables in spatially explicit models depicting landscape characteristics of interest (response) across broad extents, at relatively fine resolution. To create these models, variables are spatially registered to a known coordinate system and used to link responses with predictor variable values. Inherently, this linking process introduces measurement error into the response and predictors, which in the latter case causes attenuation bias. Through simulations, our findings indicate that the spatial correlation of response and predictor variables and their corresponding spatial registration (co-registration) errors can have a substantial impact on the bias and accuracy of linear models. Additionally, in this study we evaluate spatial aggregation as a mechanism to minimize the impact of co-registration errors, assess the impact of subsampling within the extent of sample units, and provide a technique that can be used to both determine the extent of an observational unit needed to minimize the impact of co-registration and quantify the amount of error potentially introduced into predictive models.
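The attenuation and aggregation effects described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it rests on assumptions made here for illustration: a 1-D AR(1) series stands in for a spatially correlated raster, co-registration error is an integer pixel shift drawn once per observational unit, and all names and parameter values (`simulate`, `max_shift`, `rho`, and so on) are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def block_means(values, size):
    """Mean of each consecutive, non-overlapping block of `size` values."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]


def simulate(n_blocks=3000, block=20, max_shift=3, rho=0.9, true_slope=2.0, seed=1):
    """Return (pixel-level slope, block-aggregated slope) under registration shifts."""
    rng = random.Random(seed)
    n = n_blocks * block
    pad = max_shift  # extra cells so shifted windows never leave the series

    # Assumed stand-in for a spatially correlated predictor surface: AR(1) in 1-D.
    field = [rng.gauss(0, 1)]
    for _ in range(n + 2 * pad - 1):
        field.append(rho * field[-1] + rng.gauss(0, 1))

    # The response depends on the predictor at the TRUE location.
    x_true = field[pad:pad + n]
    y = [true_slope * x + rng.gauss(0, 1) for x in x_true]

    # The predictor is read at a shifted location: one random shift per block,
    # mimicking a co-registration error shared by a whole observational unit.
    x_obs = []
    for b in range(n_blocks):
        shift = rng.randint(-max_shift, max_shift)
        start = pad + b * block + shift
        x_obs.extend(field[start:start + block])

    pixel = ols_slope(x_obs, y)
    aggregated = ols_slope(block_means(x_obs, block), block_means(y, block))
    return pixel, aggregated


pixel_slope, aggregated_slope = simulate()
print(f"true slope: 2.0  pixel-level: {pixel_slope:.3f}  aggregated: {aggregated_slope:.3f}")
```

Under these assumptions the pixel-level slope is pulled toward zero (attenuation bias), while averaging over blocks that are large relative to the shift recovers a slope much closer to the true value, because shifted and unshifted block means overlap almost entirely.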


Detecting Confounding in Multivariate Linear Models via Spectral Analysis

Journal of Causal Inference ◽

10.1515/jci-2017-0013 ◽

2018 ◽

Vol 6 (1) ◽

Cited By ~ 4

Author(s):

Dominik Janzing ◽

Bernhard Schölkopf

Keyword(s):

Spectral Analysis ◽

Linear Models ◽

Regression Coefficients ◽

Concentration Of Measure ◽

Predictor Variables ◽

Target Variable ◽

Independence Assumption ◽

Common Cause ◽

Multivariate Linear Models ◽

Special Case

We study a model where one target variable $Y$ is correlated with a vector $\textbf{X}:=(X_1,\dots,X_d)$ of predictor variables being potential causes of $Y$. We describe a method that infers to what extent the statistical dependences between $\textbf{X}$ and $Y$ are due to the influence of $\textbf{X}$ on $Y$ and to what extent due to a hidden common cause (confounder) of $\textbf{X}$ and $Y$. The method relies on concentration of measure results for large dimensions $d$ and an independence assumption stating that, in the absence of confounding, the vector of regression coefficients describing the influence of each $\textbf{X}$ on $Y$ typically has ‘generic orientation’ relative to the eigenspaces of the covariance matrix of $\textbf{X}$. For the special case of a scalar confounder we show that confounding typically spoils this generic orientation in a characteristic way that can be used to quantitatively estimate the amount of confounding (subject to our idealized model assumptions).


Interpreting interaction effects in generalized linear models of nonlinear probabilities and counts

10.31234/osf.io/th94c ◽

2020 ◽

Author(s):

Connor McCabe ◽

Max Andrew Halvorson ◽

Kevin Michael King ◽

Xiaolin Cao ◽

Dale Sim Kim

Keyword(s):

Generalized Linear Models ◽

Logistic Model ◽

Linear Models ◽

Interaction Effects ◽

Predictor Variables ◽

Model Parameters ◽

Marginal Effect ◽

Psychology Research ◽

Product Term ◽

Traditional Approaches

Psychology research frequently involves the study of probabilities and counts. These are typically analyzed using generalized linear models (GLMs), which can produce these quantities via nonlinear transformation of model parameters. Interactions are central within many research applications of these models. To date, typical practice in evaluating interactions for probabilities or counts extends directly from linear approaches, in which evidence of an interaction effect is supported by using the product term coefficient between variables of interest. However, unlike linear models, interaction effects in GLMs describing probabilities and counts are not equal to product terms between predictor variables. Instead, interactions may be functions of the predictors of a model, requiring non-traditional approaches for interpreting these effects accurately. Here, we define interactions as change in a marginal effect of one variable as a function of change in another variable, and describe the use of partial derivatives and discrete differences for quantifying these effects. Using guidelines and simulated examples, we then use these approaches to describe how interaction effects should be estimated and interpreted for GLMs on probability and count scales. We conclude with an example using the Adolescent Brain Cognitive Development Study demonstrating how to correctly evaluate interaction effects in a logistic model.
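The definition of an interaction as a change in one variable's marginal effect as another variable changes can be made concrete for a logistic model. The following is a minimal sketch, not the authors' code: the coefficient values are hypothetical, and the function names are introduced here for illustration. It computes the cross-partial derivative of the predicted probability for continuous predictors, checks it against a finite-difference approximation, and computes the double discrete difference for two binary predictors.

```python
import math


def prob(x1, x2, b0, b1, b2, b12):
    """Predicted probability from a logistic model with a product term."""
    eta = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    return 1.0 / (1.0 + math.exp(-eta))


def interaction_derivative(x1, x2, b0, b1, b2, b12):
    """Cross-partial derivative d^2 p / (dx1 dx2) on the probability scale."""
    p = prob(x1, x2, b0, b1, b2, b12)
    v = p * (1.0 - p)
    return b12 * v + (b1 + b12 * x2) * (b2 + b12 * x1) * v * (1.0 - 2.0 * p)


def interaction_difference(b0, b1, b2, b12):
    """Double discrete difference for two binary (0/1) predictors."""
    effect_when_x2_is_1 = prob(1, 1, b0, b1, b2, b12) - prob(0, 1, b0, b1, b2, b12)
    effect_when_x2_is_0 = prob(1, 0, b0, b1, b2, b12) - prob(0, 0, b0, b1, b2, b12)
    return effect_when_x2_is_1 - effect_when_x2_is_0


# Hypothetical coefficients; note that the product-term coefficient is zero.
coefs = dict(b0=-2.0, b1=1.0, b2=1.0, b12=0.0)

analytic = interaction_derivative(0.0, 0.0, **coefs)

# Central finite-difference check of the analytic cross-partial.
h = 1e-3
numeric = (
    prob(h, h, **coefs) - prob(h, -h, **coefs)
    - prob(-h, h, **coefs) + prob(-h, -h, **coefs)
) / (4 * h * h)

double_diff = interaction_difference(**coefs)

print(f"cross-partial (analytic): {analytic:.5f}")
print(f"cross-partial (numeric):  {numeric:.5f}")
print(f"double difference:        {double_diff:.5f}")
```

Even with a product-term coefficient of exactly zero, both quantities are non-zero on the probability scale, which is the abstract's point that the product term alone does not describe the interaction in a GLM.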


Comparison of single- and multi-scale models for the prediction of the Culicoides biting midge distribution in Germany

Geospatial health ◽

10.4081/gh.2016.405 ◽

2016 ◽

Vol 11 (2) ◽

Cited By ~ 1

Author(s):

Renke Lühken ◽

Jörn Martin Gethmann ◽

Petra Kranz ◽

Pia Steffenhagen ◽

Christoph Staubach ◽

...

Keyword(s):

Land Cover ◽

Linear Models ◽

Predictor Variables ◽

Buffer Zones ◽

Biting Midges ◽

Generalised Linear Models ◽

Multi Scale ◽

Land Cover Data ◽

Scale Models ◽

Trapping Sites

This study analysed Culicoides presence-absence data from 46 sampling sites in Germany, where monitoring was carried out from April 2007 until May 2008. Culicoides presence-absence data were analysed in relation to land cover data, in order to study whether the prevalence of biting midges is correlated with land cover data with respect to the trapping sites. We differentiated eight scales, i.e. buffer zones with radii of 0.5, 1, 2, 3, 4, 5, 7.5 and 10 km, around each site, and chose several land cover variables. For each species, we built eight single-scale models (i.e. predictor variables from one of the eight scales for each model) based on averaged, generalised linear models and two multi-scale models (i.e. predictor variables from all of the eight scales) based on averaged, generalised linear models and generalised linear models with random forest variable selection. There were no significant differences between performance indicators of models built with land cover data from different buffer zones around the trapping sites. However, the overall performance of multi-scale models was higher than the alternatives. Furthermore, these models mostly achieved the best performance for the different species using the index area under the receiver operating characteristic curve. However, as also presented in this study, the relevance of the different variables could significantly differ between various scales, including the number of species affected and the positive or negative direction. This is an even more severe problem if multi-scale models are concerned, in which one model can have the same variable at different scales but with different directions, i.e. negative and positive direction of the same variable at different scales. However, multi-scale modelling is a promising approach to model the distribution of Culicoides species, accounting much more for the ecology of biting midges, which use different resources (breeding sites, hosts, etc.) at different scales.


Bias-Reduced Simultaneous Confidence Bands on Generalized Linear Models With Restricted Predictor Variables

Journal of Statistical Theory and Practice ◽

10.1080/15598608.2012.673882 ◽

2012 ◽

Vol 6 (2) ◽

pp. 286-302 ◽

Cited By ~ 1

Author(s):

Amy Wagler ◽

Melinda McCann

Keyword(s):

Generalized Linear Models ◽

Linear Models ◽

Confidence Bands ◽

Predictor Variables ◽

Simultaneous Confidence Bands


Estimating Fresh Weight of Individual Pea Shoots Using Measurable Morphological Characteristics

HortScience ◽

10.21273/hortsci15086-20 ◽

2020 ◽

Vol 55 (7) ◽

pp. 1111-1118

Author(s):

Yun Kong ◽

Xiangyue Kong ◽

Youbin Zheng

Keyword(s):

Prediction Accuracy ◽

Regression Models ◽

Morphological Traits ◽

Stepwise Regression ◽

Linear Models ◽

Morphological Characteristics ◽

Stem Length ◽

Predictor Variables ◽

Fresh Weight ◽

Power Models

Nondestructive estimation of individual shoot fresh weight (FW) from its measurable morphological traits is useful for a wide variety of purposes in pea shoot production. To predict individual shoot FW, nine regression models in total were developed, including two power models using stem diameter (SMD) or stem length (SML) as a variable, and seven linear models using some or all of the following variables: SMD, SML, leaflet length (LL), leaflet width (LW), stipule length (SEL), and stipule width (SEW). Among the nine models, the 6-variable linear equation had the highest coefficient of determination, R² = 0.92, indicating it is most effective at explaining the variation in FW. The linear equations including only one variable, SMD or SML, were, like the nonlinear equations (i.e., power models), the least effective. This finding suggests that there was a linear rather than nonlinear relationship between FW and the morphological variables. During stepwise regression, SEW and LW together were first removed from the 6-variable linear model without reducing the R², and then SEL, SMD, and SML were further removed one-by-one, which reduced the R² from 0.92 to 0.90, 0.85, and 0.71, respectively. The result suggests that SMD, SML, SEL, and LL were the four most important predictor variables for multivariable linear regression models to estimate FW, an idea that was also supported by path analysis. For the four linear models with 1–4 predictor variables from stepwise regression, the prediction accuracy of FW was evaluated based on the agreement between the predicted and measured values using another independent dataset. The 4- and 3-variable linear models (i.e., FW = −1.437 + 0.276 SMD + 0.010 SML + 0.022 LL + 0.013 SEL and FW = −1.383 + 0.308 SMD + 0.011 SML + 0.030 LL, respectively) were selected for their more accurate prediction than 1- and 2-variable linear models and relatively simpler forms than a 6-variable linear model. Although the prediction accuracy can be potentially affected by air temperature, light conditions, and harvesting time, the multilinear regression model is an effective approach for estimating fresh weight of individual pea shoots from their measurable morphological traits.
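The two selected equations can be evaluated directly. The sketch below simply transcribes the coefficients quoted in the abstract; the function names and the input measurements are hypothetical, and the abstract does not state the measurement units, so the inputs must be in whatever units the original study used.

```python
def fresh_weight_4var(smd, sml, ll, sel):
    """4-variable equation quoted in the abstract (units as in the original study)."""
    return -1.437 + 0.276 * smd + 0.010 * sml + 0.022 * ll + 0.013 * sel


def fresh_weight_3var(smd, sml, ll):
    """3-variable equation quoted in the abstract (units as in the original study)."""
    return -1.383 + 0.308 * smd + 0.011 * sml + 0.030 * ll


# Hypothetical measurements for one shoot, chosen only to exercise the formulas.
shoot = {"smd": 3.0, "sml": 100.0, "ll": 40.0, "sel": 30.0}

fw4 = fresh_weight_4var(shoot["smd"], shoot["sml"], shoot["ll"], shoot["sel"])
fw3 = fresh_weight_3var(shoot["smd"], shoot["sml"], shoot["ll"])
print(f"4-variable estimate: {fw4:.3f}")
print(f"3-variable estimate: {fw3:.3f}")
```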


Generalized Linear Models in Vehicle Insurance

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis ◽

10.11118/actaun201462020383 ◽

2014 ◽

Vol 62 (2) ◽

pp. 383-388 ◽

Cited By ~ 4

Author(s):

Silvie Kafková ◽

Lenka Křivánková

Keyword(s):

Risk Factors ◽

Linear Models ◽

Information Criterion ◽

Insurance Premium ◽

Insurance Companies ◽

Predictor Variables ◽

Estimation Of Parameters ◽

Claim Frequency ◽

Insurance Data ◽

Selection Of

Actuaries in insurance companies try to find the best model for estimating the insurance premium. The premium depends on many risk factors, e.g. the car characteristics and the profile of the driver. In this paper, an analysis of a portfolio of vehicle insurance data using a generalized linear model (GLM) is performed. The main advantage of the approach presented in this article is that GLMs are not limited by inflexible preconditions. Our aim is to predict the dependence of annual claim frequency on given risk factors. Based on a large real-world sample of data from 57 410 vehicles, the present study proposes a classification analysis approach that addresses the selection of predictor variables. Models with different predictor variables are compared by analysis of deviance and the Akaike information criterion (AIC). Based on this comparison, the model giving the best estimate of annual claim frequency is chosen. All statistical calculations are carried out in the R environment, whose stats package contains a function for estimating the parameters of a GLM and a function for analysis of deviance.
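The comparison of claim-frequency models by deviance and AIC can be sketched without a statistics package in one special case. The paper itself works in R; the Python sketch below is an illustration under stated assumptions, not the authors' analysis: the portfolio rows are made-up toy data, the only risk factor is a single categorical variable, and claims are assumed Poisson with a log link and exposure offset. In that case the maximum-likelihood fit has a closed form (claims divided by exposure within each level), so no iterative fitting is needed.

```python
import math
from collections import defaultdict

# Hypothetical toy portfolio: (age_group, exposure in policy-years, claim count).
portfolio = [
    ("young", 100.0, 30),
    ("young", 120.0, 33),
    ("middle", 200.0, 20),
    ("middle", 180.0, 20),
    ("senior", 150.0, 12),
    ("senior", 160.0, 18),
]


def poisson_loglik(rows, rate_of):
    """Poisson log-likelihood with mean = rate_of(group) * exposure."""
    total = 0.0
    for group, exposure, claims in rows:
        mu = rate_of(group) * exposure
        total += claims * math.log(mu) - mu - math.lgamma(claims + 1)
    return total


def fit_null(rows):
    """Intercept-only model: one claim rate for everyone (1 parameter)."""
    rate = sum(c for _, _, c in rows) / sum(e for _, e, _ in rows)
    return (lambda group: rate), 1


def fit_factor(rows):
    """One claim rate per level of the risk factor (closed-form MLE).

    Assumes every level has at least one claim, so no rate is zero.
    """
    claims, exposure = defaultdict(float), defaultdict(float)
    for group, e, c in rows:
        claims[group] += c
        exposure[group] += e
    rates = {g: claims[g] / exposure[g] for g in claims}
    return (lambda group: rates[group]), len(rates)


def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * loglik


null_rate, k_null = fit_null(portfolio)
factor_rate, k_factor = fit_factor(portfolio)
ll_null = poisson_loglik(portfolio, null_rate)
ll_factor = poisson_loglik(portfolio, factor_rate)

deviance_drop = 2 * (ll_factor - ll_null)  # analysis-of-deviance statistic
aic_null, aic_factor = aic(ll_null, k_null), aic(ll_factor, k_factor)

print(f"deviance reduction from adding age_group: {deviance_drop:.2f}")
print(f"AIC null: {aic_null:.2f}   AIC with age_group: {aic_factor:.2f}")
```

With several risk factors or continuous predictors the closed form no longer applies and an iterative GLM fit (as in R's stats package, per the abstract) would be needed.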


Association and Prediction: Multiple Regression Analysis and Linear Models with Multiple Predictor Variables

Wiley Series in Probability and Statistics - Biostatistics ◽

10.1002/0471602396.ch11 ◽

2004 ◽

pp. 428-519

Keyword(s):

Regression Analysis ◽

Multiple Regression Analysis ◽

Multiple Regression ◽

Linear Models ◽

Predictor Variables ◽

Multiple Predictor


Predictions of critical habitat for five whale species in the waters of coastal British Columbia

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/f01-078 ◽

2001 ◽

Vol 58 (7) ◽

pp. 1265-1285 ◽

Cited By ~ 60

Author(s):

Edward J Gregr ◽

Andrew W Trites

Keyword(s):

British Columbia ◽

Linear Models ◽

North Pacific Ocean ◽

Sperm Whale ◽

Predictor Variables ◽

Physeter Macrocephalus ◽

Sperm Whales ◽

Large Area ◽

Critical Habitat ◽

Habitat Models

Whaling records from British Columbia coastal whaling stations reliably report the positions of 9592 whales killed between 1948 and 1967. We used this positional information and oceanographic data (bathymetry, temperature, and salinity) to predict critical habitat off the coast of British Columbia for sperm (Physeter macrocephalus), sei (Balaenoptera borealis), fin (Balaenoptera physalus), humpback (Megaptera novaeangliae), and blue (Balaenoptera musculus) whales. We used generalized linear models at annual and monthly time scales to relate whale occurrence to six predictor variables (month, depth, slope, depth class, and sea surface temperature and salinity). The models showed critical habitat for sei, fin, and male sperm whales along the continental slope and over a large area off the northwest coast of Vancouver Island. Habitat models for blue, humpback, and female sperm whales were relatively insensitive to the predictor variables, owing partially to the smaller sample sizes for these groups. The habitat predictions lend support to recent hypotheses about sperm whale breeding off British Columbia and identify humpback whale habitat in sheltered bays and straits throughout the coast. The habitat models also provide insights about the nature of the linkages between the environment and the distribution of whales in the North Pacific Ocean.


Applied Linear Models with SAS

10.1017/cbo9780511778643 ◽

2009 ◽

Cited By ~ 4

Author(s):

Daniel Zelterman

Keyword(s):

Linear Models
