Generalized Linear Models outperform commonly used canonical analysis in estimating spatial structure of presence/absence data

PeerJ ◽

10.7717/peerj.9777 ◽

2020 ◽

Vol 8 ◽

pp. e9777

Author(s):

Lélis A. Carlos-Júnior ◽

Joel C. Creed ◽

Rob Marrs ◽

Rob J. Lewis ◽

Timothy P. Moulton ◽

...

Keyword(s):

Model Selection ◽

Spatial Structure ◽

Generalized Linear Models ◽

Species Interactions ◽

Linear Models ◽

Spatial Scales ◽

Error Rates ◽

Ecological Communities ◽

Forward Selection ◽

Explanatory Variables

Background: Ecological communities tend to be spatially structured due to environmental gradients and/or spatially contagious processes such as growth, dispersion and species interactions. Data transformation followed by the use of algorithms such as Redundancy Analysis (RDA) is a fairly common approach in studies searching for spatial structure in ecological communities, despite recent suggestions advocating the use of Generalized Linear Models (GLMs). Here, we compared the performance of GLMs and RDA in describing spatial structure in ecological community composition data. We simulated realistic presence/absence data typical of many β-diversity studies. For model selection we used the standard methods found in most studies involving RDA and GLMs.

Methods: We simulated communities with known spatial structure, based on three real spatial community presence/absence datasets (one terrestrial, one marine and one freshwater). We used spatial eigenvectors as explanatory variables. We varied the number of non-zero coefficients of the spatial variables and the spatial scales with which these coefficients were associated, and then compared how well the GLM and RDA frameworks retrieved the spatial patterns contained in the simulated communities. We used two different methods for model selection: Forward Selection (FW) for RDA and the Akaike Information Criterion (AIC) for GLMs. The performance of each method was assessed by scoring overall accuracy as the proportion of variables whose inclusion/exclusion status was correct, and by distinguishing which kind of error was observed for each method. We also assessed whether errors in variable selection could affect the interpretation of spatial structure.

Results: Overall, GLM with AIC-based model selection (GLM/AIC) performed better than RDA/FW in selecting spatial explanatory variables, although under some simulations the methods performed similarly. In general, RDA/FW performed unpredictably, often retaining too many explanatory variables and selecting variables associated with incorrect spatial scales. The spatial scale of the pattern had a negligible effect on GLM/AIC performance but consistently affected RDA’s error rates under almost all scenarios.

Conclusion: We encourage the use of GLM/AIC for studies searching for spatial drivers of species presence/absence patterns, since this framework outperformed RDA/FW in situations most likely to be found in natural communities. It is likely that such recommendations might extend to other types of explanatory variables.
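The accuracy score described in the Methods (the proportion of variables whose inclusion/exclusion status was correct, with the two kinds of error kept apart) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `score_selection`, the variable indices and the example values are all hypothetical.

```python
def score_selection(true_nonzero, selected, n_variables):
    """Score one variable-selection result against the known simulated truth.

    true_nonzero: indices of variables simulated with non-zero coefficients
    selected:     indices of variables retained by the selection method
    n_variables:  total number of candidate explanatory variables
    """
    truth = set(true_nonzero)
    chosen = set(selected)
    # Variables retained although their true coefficient is zero.
    false_inclusions = chosen - truth
    # Variables dropped although their true coefficient is non-zero.
    false_exclusions = truth - chosen
    n_correct = n_variables - len(false_inclusions) - len(false_exclusions)
    return {
        "accuracy": n_correct / n_variables,
        "false_inclusions": sorted(false_inclusions),
        "false_exclusions": sorted(false_exclusions),
    }


# Hypothetical example: 10 candidate spatial eigenvectors, 3 truly non-zero.
result = score_selection(true_nonzero=[0, 3, 7], selected=[0, 3, 5, 9], n_variables=10)
print(result)
```

Keeping false inclusions and false exclusions separate matters here because the abstract reports that RDA/FW often erred by retaining too many variables, which a single accuracy number would hide.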


Minimum description length model selection criteria for generalized linear models

Institute of Mathematical Statistics Lecture Notes - Monograph Series - Statistics and science: a Festschrift for Terry Speed ◽

10.1214/lnms/1215091140 ◽

2003 ◽

pp. 145-163 ◽

Cited By ~ 12

Author(s):

Mark H. Hansen ◽

Bin Yu

Keyword(s):

Model Selection ◽

Generalized Linear Models ◽

Selection Criteria ◽

Linear Models ◽

Minimum Description Length ◽

Model Selection Criteria


More generalized linear modelling.

Practical R for biologists: an introduction ◽

10.1079/9781789245349.0171 ◽

2021 ◽

pp. 171-186

Author(s):

Donald Quicke ◽

Buntika A. Butcher ◽

Rachel Kruft Welton

Keyword(s):

Generalized Linear Models ◽

Count Data ◽

Binary Data ◽

Linear Models ◽

Explanatory Variable ◽

Response Variable ◽

Explanatory Variables ◽

Continuous Response ◽

Generalized Linear Modelling ◽

Linear Modelling

Abstract This chapter employs generalized linear modelling using the function glm when we know that variances are not constant with one or more explanatory variables and/or we know that the errors cannot be normally distributed, for example, they may be binary data, or count data where negative values are impossible, or proportions which are constrained between 0 and 1. A glm seeks to determine how much of the variation in the response variable can be explained by each explanatory variable, and whether such relationships are statistically significant. The data for generalized linear models take the form of a continuous response variable and a combination of continuous and discrete explanatory variables.
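For orientation, the general form of a generalized linear model can be written as follows. This is the standard textbook formulation rather than notation taken from the chapter; the symbols $y_i$, $x_{ij}$, $\beta_j$, $\mu_i$ and $g$ are introduced here as assumptions.

```latex
% Linear predictor connected to the mean response through a link function g
g(\mu_i) = \eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad \mu_i = \mathrm{E}[y_i]

% Binary data (binomial errors, logit link)
\log\frac{\mu_i}{1 - \mu_i} = \eta_i

% Count data (Poisson errors, log link)
\log \mu_i = \eta_i
```

The link function keeps fitted means within the valid range of the response (between 0 and 1 for proportions, non-negative for counts), which is the situation the abstract describes.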


Climate Influences on Meningitis Incidence in Northwest Nigeria

Weather Climate and Society ◽

10.1175/wcas-d-13-00004.1 ◽

2014 ◽

Vol 6 (1) ◽

pp. 62-76 ◽

Cited By ~ 9

Author(s):

Auwal F. Abdussalam ◽

Andrew J. Monaghan ◽

Vanja M. Dukić ◽

Mary H. Hayden ◽

Thomas M. Hopson ◽

...

Keyword(s):

Generalized Linear Models ◽

Linear Models ◽

Generalized Additive Models ◽

Skill Score ◽

Additive Models ◽

Maximum Temperature ◽

Daily Maximum Temperature ◽

Daily Maximum ◽

Explanatory Variables ◽

Score Statistics

Abstract Northwest Nigeria is a region with a high risk of meningitis. In this study, the influence of climate on monthly meningitis incidence was examined. Monthly counts of clinically diagnosed hospital-reported cases of meningitis were collected from three hospitals in northwest Nigeria for the 22-yr period spanning 1990–2011. Generalized additive models and generalized linear models were fitted to aggregated monthly meningitis counts. Explanatory variables included monthly time series of maximum and minimum temperature, humidity, rainfall, wind speed, sunshine, and dustiness from weather stations nearest to the hospitals, and the number of cases in the previous month. The effects of other unobserved seasonally varying climatic and nonclimatic risk factors that may be related to the disease were collectively accounted for as a flexible monthly varying smooth function of time in the generalized additive models, s(t). Results reveal that the most important explanatory climatic variables are the monthly means of daily maximum temperature, relative humidity, and sunshine with no lag; and dustiness with a 1-month lag. Accounting for s(t) in the generalized additive models explains more of the monthly variability of meningitis compared to those generalized linear models that do not account for the unobserved factors that s(t) represents. The skill score statistics of a model version with all explanatory variables lagged by 1 month suggest the potential to predict meningitis cases in northwest Nigeria up to a month in advance to aid decision makers.
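Using the previous month's case count, or dustiness with a 1-month lag, as an explanatory variable requires aligning each month with the preceding month's value. The sketch below shows one way to build such lagged predictors; the helper `lag_series` and all numbers are made up for illustration and are not data or code from the study.

```python
def lag_series(values, lag):
    """Shift a monthly series forward by `lag` steps, padding the start with None."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    padded = [None] * lag + list(values)
    return padded[:len(values)]


# Made-up monthly values, for illustration only.
cases = [12, 30, 55, 41, 9, 3]
dustiness = [0.8, 1.4, 1.9, 1.1, 0.4, 0.2]

cases_prev = lag_series(cases, 1)      # number of cases in the previous month
dust_prev = lag_series(dustiness, 1)   # dustiness with a 1-month lag

# Keep only months for which every lagged predictor is available.
rows = [
    (cases[t], cases_prev[t], dust_prev[t])
    for t in range(len(cases))
    if cases_prev[t] is not None and dust_prev[t] is not None
]
print(rows)
```

The first month is dropped because it has no predecessor; the remaining rows could then be passed to whichever GLM or GAM fitting routine is in use.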


Closed-form maximum likelihood estimator for generalized linear models in the case of categorical explanatory variables: application to insurance loss modeling

Computational Statistics ◽

10.1007/s00180-019-00918-7 ◽

2019 ◽

Vol 35 (2) ◽

pp. 689-724 ◽

Cited By ~ 1

Author(s):

Alexandre Brouste ◽

Christophe Dutang ◽

Tom Rohmer

Keyword(s):

Maximum Likelihood ◽

Closed Form ◽

Maximum Likelihood Estimator ◽

Generalized Linear Models ◽

Linear Models ◽

Likelihood Estimator ◽

Explanatory Variables ◽

Loss Modeling


Graphical tools for model selection in generalized linear models

Statistics in Medicine ◽

10.1002/sim.5855 ◽

2013 ◽

Vol 32 (25) ◽

pp. 4438-4451 ◽

Cited By ~ 6

Author(s):

K. Murray ◽

S. Heritier ◽

S. Müller

Keyword(s):

Model Selection ◽

Generalized Linear Models ◽

Linear Models


The macroecology of rapid evolutionary radiation

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2010.1950 ◽

2011 ◽

Vol 278 (1717) ◽

pp. 2486-2494 ◽

Cited By ~ 12

Author(s):

Nicholas F. Parnell ◽

J. Todd Streelman

Keyword(s):

Adaptive Radiation ◽

Species Interactions ◽

Spatial Scales ◽

Ecological Communities ◽

Core Set ◽

Evolutionary Radiation ◽

Local Species ◽

Entire Lake ◽

Insight Into ◽

Occurrence Of Species

A long-standing debate in ecology addresses whether community composition is the result of stochastic factors or assembly rules. Non-random, over-dispersed patterns of species co-occurrence have commonly been attributed to competition—a particularly important force in adaptive radiation. We thus examined the macroecology of the recently radiated cichlid rock-fish assemblage in Lake Malawi, Africa at a spectrum of increasingly fine spatial scales (entire lake to depth within rock-reef sites). Along this range of spatial scales, we observed a signal of community structure (decreased co-occurrence of species) at the largest and smallest scales, but not in between. Evidence suggests that the lakewide signature of structure is driven by extreme endemism and micro-allopatric speciation, while patterns of reduced co-occurrence with depth are indicative of species interactions. We identified a ‘core’ set of rock-reef species, found in combination throughout the lake, whose depth profiles exhibited replicated positive and negative correlation. Our results provide insight into how ecological communities may be structured differently at distinct spatial scales, re-emphasize the importance of local species interactions in community assembly, and further elucidate the processes shaping speciation in this model adaptive radiation.


Performances of Bayesian model selection criteria for generalized linear models with non-ignorably missing covariates

Journal of Statistical Computation and Simulation ◽

10.1080/00949655.2012.760089 ◽

2013 ◽

Vol 84 (8) ◽

pp. 1670-1691 ◽

Cited By ~ 2

Author(s):

Zeynep Kalaylioglu

Keyword(s):

Model Selection ◽

Generalized Linear Models ◽

Bayesian Model ◽

Selection Criteria ◽

Linear Models ◽

Bayesian Model Selection ◽

Missing Covariates ◽

Model Selection Criteria


Peer Review #3 of "Generalized Linear Models outperform commonly used canonical analysis in estimating spatial structure of presence/absence data (v0.1)"

10.7287/peerj.9777v0.1/reviews/3 ◽

2020 ◽

Keyword(s):

Spatial Structure ◽

Peer Review ◽

Generalized Linear Models ◽

Linear Models ◽

Canonical Analysis


A dynamic occupancy model for interacting species with two spatial scales

10.1101/2020.12.16.423067 ◽

2020 ◽

Author(s):

Eivind Flittie Kleiven ◽

Frederic Barraquand ◽

Olivier Gimenez ◽

John-André Henden ◽

Rolf Anker Ims ◽

...

Keyword(s):

Spatial Structure ◽

Species Interactions ◽

Spatial Scales ◽

Species Interaction ◽

Camera Traps ◽

Occupancy Models ◽

Occupancy Model ◽

Interacting Species ◽

Multi Scale ◽

Extinction Probabilities

Abstract Occupancy models have been developed independently to account for multiple spatial scales and species interactions in a dynamic setting. However, as interacting species (e.g., predators and prey) often operate at different spatial scales, including nested spatial structure might be especially relevant in models of interacting species. Here we bridge these two model frameworks by developing a multi-scale two-species occupancy model. The model is dynamic, i.e. it estimates initial occupancy, colonization and extinction probabilities, including probabilities conditional on the other species’ presence. With a simulation study, we demonstrate that the model is able to estimate parameters without bias under low, medium and high average occupancy probabilities, as well as low, medium and high detection probabilities. We further show the model’s ability to deal with sparse field data by applying it to a multi-scale camera trapping dataset on a mustelid-rodent predator-prey system. The field study illustrates that the model allows estimation of species interaction effects on colonization and extinction probabilities at two spatial scales. This creates opportunities to explicitly account for the spatial structure found in many spatially nested study designs, and to use camera traps to study interacting species that have contrasting movement ranges.
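To make the role of colonization and extinction probabilities concrete, the sketch below projects occupancy through time for a single species at a single spatial scale, using the standard dynamic occupancy recursion. It is a deliberately simplified illustration and not the authors' multi-scale two-species model: it ignores detection and species interactions, and the function name and parameter values are hypothetical.

```python
def project_occupancy(psi0, colonization, extinction, n_steps):
    """Project occupancy probability forward in time.

    Each step applies
        psi[t+1] = psi[t] * (1 - extinction) + (1 - psi[t]) * colonization
    i.e. occupied sites persist unless they go extinct, and empty sites
    become occupied when colonized.
    """
    for p in (psi0, colonization, extinction):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    trajectory = [psi0]
    for _ in range(n_steps):
        psi = trajectory[-1]
        trajectory.append(psi * (1.0 - extinction) + (1.0 - psi) * colonization)
    return trajectory


# Hypothetical values: initial occupancy 0.5, colonization 0.2, extinction 0.1.
trajectory = project_occupancy(psi0=0.5, colonization=0.2, extinction=0.1, n_steps=50)
# Long-run occupancy implied by constant rates: colonization / (colonization + extinction).
equilibrium = 0.2 / (0.2 + 0.1)
print(trajectory[-1], equilibrium)
```

In a two-species version, the colonization and extinction probabilities of one species would additionally depend on whether the other species is present, which is what the conditional probabilities in the abstract refer to.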


Large-scale model selection in misspecified generalized linear models

Biometrika ◽

10.1093/biomet/asab005 ◽

2021 ◽

Author(s):

Emre Demirkaya ◽

Yang Feng ◽

Pallavi Basu ◽

Jinchi Lv

Keyword(s):

Model Selection ◽

Generalized Linear Models ◽

Large Scale ◽

Linear Models ◽

Information Criterion ◽

Scale Model ◽

High Dimensional ◽

Model Selection Consistency ◽

New Information ◽

Large Scale Model

Summary Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications in pinpointing the best set of covariates among a sequence of candidate interpretable models. Most existing work assumes implicitly that the models are correctly specified or have fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles under the misspecified generalized linear models presented in Lv and Liu (2014) and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback–Leibler divergence, we suggest the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultra-high dimensions under some mild regularity conditions. The numerical studies demonstrate that our new method enjoys improved model selection consistency compared to its main competitors.
