Performance evaluation of cetacean species distribution models developed using generalized additive models and boosted regression trees

Elizabeth A. Becker; James V. Carretta; Karin A. Forney; Jay Barlow; Stephanie Brodie; Ryan Hoopes; Michael G. Jacox; Sara M. Maxwell; Jessica V. Redfern; Nicholas B. Sisson; Heather Welch; Elliott L. Hazen

doi:10.1002/ece3.6316

Testing the Influence of Seascape Connectivity on Marine-Based Species Distribution Models

Frontiers in Marine Science ◽

10.3389/fmars.2021.766915 ◽

2021 ◽

Vol 8 ◽

Author(s):

Giorgia Cecino ◽

Roozbeh Valavi ◽

Eric A. Treml

Keyword(s):

Species Distribution ◽

Larval Dispersal ◽

Species Distribution Models ◽

Temporal Dynamics ◽

Explanatory Power ◽

Model Performance ◽

Boosted Regression Trees ◽

Distribution Models ◽

Flexible Tool ◽

Species Specific

Species distribution models (SDMs) are commonly used in ecology to predict species occurrence probability and how species are geographically distributed. Here, we propose innovative predictive factors to efficiently integrate information on connectivity into SDMs, a key element of population dynamics strongly influencing how species are distributed across seascapes. We also quantify the influence of species-specific connectivity estimates (i.e., larval dispersal vs. adult movement) on the outcomes of marine-based SDMs. For illustration, seascape connectivity was modeled for two common, yet contrasting, marine species occurring in southeast Australian waters, the purple sea urchin, Heliocidaris erythrogramma, and the Australasian snapper, Chrysophrys auratus. Our models illustrate how different species-specific larval dispersal and adult movement can be efficiently accommodated. We used network-based centrality metrics to compute patch-level importance values and included these metrics among the predictors of correlative SDMs. We employed boosted regression trees (BRT) to fit our models, calculating the predictive performance, comparing spatial predictions and evaluating the relative influence of connectivity-based metrics among other predictors. Network-based metrics provide a flexible tool to quantify seascape connectivity that can be efficiently incorporated into SDMs. Connectivity across larval and adult stages was found to contribute to SDM predictions, and model performance was not negatively influenced by including these connectivity measures. Degree centrality, quantifying incoming and outgoing connections with habitat patches, was the most influential centrality metric. Pairwise interactions between predictors revealed that the species were predominantly found around hubs of connectivity and in warm, highly oxygenated, shallow waters. Additional research is needed to quantify the complex role that habitat network structure and temporal dynamics may have on SDM spatial predictions and explanatory power.
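The abstract describes degree centrality as counting incoming and outgoing connections for each habitat patch. A minimal sketch of that calculation follows, assuming the connectivity network is available as a directed edge list; the patch names and links are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {patch: in-degree + out-degree} for a directed edge list."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1  # outgoing connection from the source patch
        degree[target] += 1  # incoming connection to the target patch
    return dict(degree)


# Hypothetical habitat patches and dispersal links (illustrative only).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]
centrality = degree_centrality(edges)
print(centrality)
```

In a workflow like the one summarized above, such per-patch values could then be joined to the occurrence records as one more predictor column before model fitting.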

Comparative performance of generalized additive models and boosted regression trees for statistical modeling of incidental catch of wahoo (Acanthocybium solandri) in the Mexican tuna purse-seine fishery

Ecological Modelling ◽

10.1016/j.ecolmodel.2012.03.006 ◽

2012 ◽

Vol 233 ◽

pp. 20-25 ◽

Cited By ~ 31

Author(s):

Raul O. Martínez-Rincón ◽

Sofía Ortega-García ◽

Juan G. Vaca-Rodríguez

Keyword(s):

Statistical Modeling ◽

Generalized Additive Models ◽

Regression Trees ◽

Additive Models ◽

Boosted Regression Trees ◽

Purse Seine ◽

Comparative Performance ◽

Incidental Catch ◽

Purse Seine Fishery

Testing the ability of species distribution models to infer variable importance

10.1101/715904 ◽

2019 ◽

Cited By ~ 2

Author(s):

Adam B. Smith ◽

Maria J. Santos

Keyword(s):

Predictive Accuracy ◽

Permutation Test ◽

Generalized Additive Models ◽

Simulated Data ◽

Variable Importance ◽

Additive Models ◽

Environmental Data ◽

Boosted Regression Trees ◽

Distribution Models ◽

Model Algorithm

Abstract: Models of species’ distributions and niches are frequently used to infer the importance of range- and niche-defining variables. However, the degree to which these models can reliably identify important variables and quantify their influence remains unknown. Here we use a series of simulations to explore how well models can 1) discriminate between variables with different influence and 2) calibrate the magnitude of influence relative to an “omniscient” model. To quantify variable importance, we trained generalized additive models (GAMs), Maxent, and boosted regression trees (BRTs) on simulated data and tested their sensitivity to permutations in each predictor. Importance was inferred by calculating the correlation between permuted and unpermuted predictions, and by comparing predictive accuracy of permuted and unpermuted predictions using AUC and the Continuous Boyce Index. In scenarios with one influential and one uninfluential variable, models were unable to discriminate reliably between variables in conditions that are normally challenging for generating accurate predictions: training occurrences <8-64; prevalence >0.5; small spatial extent; environmental data with coarse resolution when spatial autocorrelation is low; and correlation between environmental variables where |r| >0.7. When two variables influenced the distribution equally, importance was underestimated when species had narrow or intermediate niche breadth. Interactions between variables in how they shaped the niche did not affect inferences about their importance. When variables acted unequally, the effect of the stronger variable was overestimated. GAMs and Maxent discriminated between variables more reliably than BRTs, but no algorithm was consistently well-calibrated vis-à-vis the omniscient model. Algorithm-specific measures of importance like Maxent’s change-in-gain metric were less robust than the permutation test. Overall, high predictive accuracy did not connote robust inferential capacity. As a result, requirements for reliably measuring variable importance are likely more stringent than for creating models with high predictive accuracy.
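The permutation test described in this abstract can be sketched as follows. This is an illustrative outline only: the stand-in `toy_model`, the simulated rows, and the choice to report importance as one minus the Pearson correlation are assumptions made here, not details taken from the preprint.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def permutation_importance(predict, rows, column, rng):
    """Shuffle one predictor and compare predictions with the baseline."""
    baseline = [predict(row) for row in rows]
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted_rows = [
        row[:column] + (value,) + row[column + 1:]
        for row, value in zip(rows, shuffled)
    ]
    permuted = [predict(row) for row in permuted_rows]
    # A low correlation means the predictions depend strongly on this column.
    return 1.0 - pearson(baseline, permuted)


def toy_model(row):
    """Hypothetical fitted model: uses predictor 0 and ignores predictor 1."""
    return 2.0 * row[0]


rng = random.Random(0)
rows = [(rng.random(), rng.random()) for _ in range(200)]
imp_influential = permutation_importance(toy_model, rows, 0, rng)
imp_uninfluential = permutation_importance(toy_model, rows, 1, rng)
print(imp_influential, imp_uninfluential)
```

Because `toy_model` ignores the second predictor, permuting it leaves the predictions unchanged and its importance is effectively zero, while permuting the first predictor decorrelates the predictions from the baseline.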

Putting prey into the picture: improvements to species distribution models for bottlenose dolphins in Doubtful Sound, New Zealand

Marine Ecology Progress Series ◽

10.3354/meps13492 ◽

2020 ◽

Vol 653 ◽

pp. 191-204

Author(s):

S Bennington ◽

W Rayment ◽

S Dawson

Keyword(s):

New Zealand ◽

Habitat Use ◽

Species Distribution ◽

Species Distribution Models ◽

Bottlenose Dolphins ◽

Model Performance ◽

Additive Models ◽

Distribution Models ◽

Marine Predators ◽

Abiotic Variables

Species distribution models (SDMs) often rely on abiotic variables as proxies for biotic relationships. This means that important biotic relationships may be missed, creating ambiguity in our understanding of the drivers of habitat use. These problems are especially relevant for populations of predators, as their habitat use is likely to be strongly influenced by the distribution of their prey. We investigated habitat use of a population of a top predator, bottlenose dolphins Tursiops truncatus, in Doubtful Sound, New Zealand, using generalised additive models, and compared the results of models with and without biotic predictor variables. We found that although habitat use by bottlenose dolphins was significantly correlated with abiotic variables that likely describe foraging areas, introduction of biotic variables describing potential prey almost doubled the deviance explained, from 19.8 to 39.1%. Biotic variables were the most important of the predictors used, and indicated that the dolphins showed a preference for areas with a high abundance of a reef fish, girdled wrasse Notolabrus cinctus. For the dolphins of Doubtful Sound, these results show the importance of prey distribution in driving habitat use. On a broader scale, our results indicate that making an effort to include true biotic descriptors in SDMs can improve model performance, resulting in better understanding of the drivers of distribution of marine predators.

Predicting suitable coastal habitat for sei whales, southern right whales and dolphins around the Falkland Islands

PLoS ONE ◽

10.1371/journal.pone.0244068 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0244068

Author(s):

Mick Baines ◽

Caroline R. Weir

Keyword(s):

Anthropogenic Activities ◽

Generalized Additive Models ◽

Habitat Preferences ◽

Additive Models ◽

Falkland Islands ◽

Coastal Habitat ◽

Distribution Models ◽

Cetacean Species ◽

Right Whales ◽

Southern Right Whales

Species distribution models (SDMs) are valuable tools for describing the occurrence of species and predicting suitable habitats. This study used generalized additive models (GAMs) and MaxEnt models to predict the relative densities of four cetacean species (sei whale Balaenoptera borealis, southern right whale Eubalaena australis, Peale’s dolphin Lagenorhynchus australis, and Commerson’s dolphin Cephalorhynchus commersonii) in neritic waters (≤100 m depth) around the Falkland Islands, using boat survey data collected over three seasons (2017–2019). The model predictor variables (PVs) included remotely sensed environmental variables (sea surface temperature, SST, and chlorophyll-a concentration) and static geographical variables (e.g. water depth, distance to shore, slope). The GAM results explained 35 to 41% of the total deviance for sei whale, combined sei whales and unidentified large baleen whales, and Commerson’s dolphins, but only 17% of the deviance for Peale’s dolphins. The MaxEnt models for all species had low to moderate discriminatory power. The relative density of sei whales increased with SST in both models, and their predicted distribution was widespread across the inner shelf, which is consistent with the use of Falklands’ waters as a coastal summer feeding ground. Peale’s dolphins and Commerson’s dolphins were largely sympatric across the study area. However, the relative densities of Commerson’s dolphins were generally predicted to be higher in nearshore, semi-enclosed waters compared with Peale’s dolphins, suggesting some habitat partitioning. The models for southern right whales performed poorly and the results were not considered meaningful, perhaps due to this species exhibiting fewer strong habitat preferences around the Falklands. The modelling results are applicable to marine spatial planning to identify where the occurrence of cetacean species and anthropogenic activities may most overlap. Additionally, the results can inform the process of delineating a potential Key Biodiversity Area for sei whales in the Falkland Islands.

[Final version available] Explainable Artificial Intelligence enhances the ecological interpretability of black-box species distribution models

10.32942/osf.io/w96pk ◽

2020 ◽

Cited By ~ 1

Author(s):

Masahiro Ryo ◽

Boyan Angelov ◽

Stefano Mammola ◽

Jamie M. Kass ◽

Blas M. Benito ◽

...

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Species Distribution ◽

Species Distribution Models ◽

Complex Model ◽

Boosted Regression Trees ◽

Learning Approaches ◽

Distribution Models ◽

Interpretable Model ◽

Scale Behavior

Species distribution models (SDMs) are widely used in ecology, biogeography and conservation biology to estimate relationships between environmental variables and species occurrence data and make predictions of how their distributions vary in space and time. During the past two decades, the field has increasingly made use of machine learning approaches for constructing and validating SDMs. Model accuracy has steadily increased as a result, but the interpretability of the fitted models, for example the relative importance of predictor variables or their causal effects on focal species, has not always kept pace. Here we draw attention to an emerging subdiscipline of artificial intelligence, explainable AI (xAI), as a toolbox for better interpreting SDMs. xAI aims at deciphering the behavior of complex statistical or machine learning models (e.g. neural networks, random forests, boosted regression trees), and can produce more transparent and understandable SDM predictions. We describe the rationale behind xAI and provide a list of tools that can be used to help ecological modelers better understand complex model behavior at different scales. As an example, we perform a reproducible SDM analysis in R on the African elephant and showcase some xAI tools such as local interpretable model-agnostic explanation (LIME) to help interpret local-scale behavior of the model. We conclude with what we see as the benefits and caveats of these techniques and advocate for their use to improve the interpretability of machine learning SDMs.

Drivers of Solidago species invasion in Central Europe: Case study in the landscape of the Carpathian Mountains and their foreground

10.22541/au.161871535.58482919/v1 ◽

2021 ◽

Author(s):

Chathura Perera ◽

Tomasz Szymura ◽

Adam Zając ◽

Dominika Chmolowska ◽

Magdalena Szymura

Keyword(s):

Central Europe ◽

Species Distribution ◽

Species Distribution Models ◽

Boosted Regression Trees ◽

Sampling Effort ◽

Similar Species ◽

Distribution Models ◽

Carpathian Mountains ◽

Explanatory Variables ◽

Historical Distribution

Abstract. Aim: The invasion process is a complex, context-dependent phenomenon; nevertheless, it can be described using the PAB framework. This framework encompasses the joint effect of propagule pressure (P), abiotic characteristics of the environment (A), and biotic characteristics of both the invader and recipient vegetation (B). We analyzed the effectiveness of proxies of PAB factors to explain the spatial pattern of Solidago canadensis and S. gigantea invasion using invasive species distribution models. Location: Carpathian Mountains and their foreground, Central Europe. Methods: The data on species presence or absence were from an atlas of neophyte distribution based on a 2 × 2 km grid, covering approximately 31,200 km² (7752 grid cells). Proxies of PAB factors, along with data on historical distribution of invaders, were used as explanatory variables in Boosted Regression Trees models to explain the distribution of invasive Solidago. The areas with potentially lower sampling effort were excluded from analysis based on a target species approach. Results: Proxies of the PAB factors helped to explain the distribution of both S. canadensis and S. gigantea. Distributions of both species were limited climatically because a mountain climate is not conducive to their growth; however, the S. canadensis distribution pattern was correlated with proxies of human pressure, whereas S. gigantea distribution was connected with environmental characteristics. The varied responses of species with regard to distance from their historical distribution sites indicated differences in their invasion drivers. Main conclusions: Proxies of PAB are helpful in the choice of explanatory variables as well as the ecological interpretation of species distribution models. The results underline that human activity can cause variation in the invasion of ecologically similar species.

Using species distribution models with climate change scenarios to aid ecological restoration decisionmaking for southern California shrublands

10.2737/psw-rp-270 ◽

2018 ◽

Author(s):

Erin C. Riordan ◽

Arlee M. Montalvo ◽

Jan L. Beyers

Keyword(s):

Climate Change ◽

Ecological Restoration ◽

Species Distribution ◽

Southern California ◽

Species Distribution Models ◽

Climate Change Scenarios ◽

Distribution Models

A new record of Stylophorum diphyllum (Michx.) Nutt. in Canada: A case study of the value and limitations of building species distribution models for very rare plants

The Journal of the Torrey Botanical Society ◽

10.3159/torrey-d-18-00026.1 ◽

2019 ◽

Vol 146 (2) ◽

pp. 119 ◽

Cited By ~ 2

Author(s):

Jenny L. McCune

Keyword(s):

Species Distribution ◽

New Record ◽

Species Distribution Models ◽

Rare Plants ◽

Distribution Models

A Robust Prediction Model for Species Distribution Using Bagging Ensembles with Deep Neural Networks

Remote Sensing ◽

10.3390/rs13081495 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1495

Author(s):

Jehyeok Rew ◽

Yongjang Cho ◽

Eenjun Hwang

Keyword(s):

Neural Networks ◽

Species Distribution ◽

Best Practice ◽

Deep Neural Networks ◽

Species Distribution Models ◽

Species Distribution Model ◽

Environmental Data ◽

Distribution Model ◽

Distribution Models ◽

Robust Prediction

Species distribution models have been used for various purposes, such as conserving species, discovering potential habitats, and obtaining evolutionary insights by predicting species occurrence. Many statistical and machine-learning-based approaches have been proposed to construct effective species distribution models, but with limited success due to spatial biases in presences and imbalanced presence-absences. We propose a novel species distribution model to address these problems based on bootstrap aggregating (bagging) ensembles of deep neural networks (DNNs). We first generate bootstraps considering presence-absence data on spatial balance to alleviate the bias problem. Then we construct DNNs using environmental data from presence and absence locations, and finally combine these into an ensemble model using three voting methods to improve prediction accuracy. Extensive experiments verified the proposed model’s effectiveness for species in South Korea using crowdsourced observations that have spatial biases. The proposed model achieved more accurate and robust prediction results than the current best practice models.
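The general idea of bagging with balanced bootstraps and voting can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' method: a one-variable threshold rule stands in for the deep neural networks, the bootstraps balance presence and absence counts only (spatial balance is not modeled), a single majority vote is shown rather than the three voting methods mentioned, and all data are simulated.

```python
import random


def balanced_bootstrap(presences, absences, rng):
    """Resample with replacement so both classes have equal size."""
    n = min(len(presences), len(absences))
    return (
        [rng.choice(presences) for _ in range(n)],
        [rng.choice(absences) for _ in range(n)],
    )


def fit_threshold(pres, abs_):
    """Stand-in base learner: cut at the midpoint between class means."""
    mean_pres = sum(pres) / len(pres)
    mean_abs = sum(abs_) / len(abs_)
    cut = (mean_pres + mean_abs) / 2
    presence_above = mean_pres >= mean_abs
    return lambda x: (x >= cut) == presence_above


def bagged_predict(models, x):
    """Majority vote across the ensemble members."""
    votes = sum(1 for model in models if model(x))
    return votes * 2 > len(models)


# Simulated values of one environmental variable (hypothetical data),
# with far more absences than presences to mimic class imbalance.
rng = random.Random(1)
presences = [rng.gauss(20.0, 1.0) for _ in range(30)]
absences = [rng.gauss(10.0, 1.0) for _ in range(300)]

models = [
    fit_threshold(*balanced_bootstrap(presences, absences, rng))
    for _ in range(25)
]
print(bagged_predict(models, 21.0), bagged_predict(models, 9.0))
```

Each ensemble member sees a different balanced resample, so no single model is dominated by the much larger absence class, and the vote averages out the variation between resamples.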
