Combining multiple data sources in species distribution models while accounting for spatial dependence and overfitting with combined penalised likelihood maximisation

Mapping Intimacies ◽

10.1101/615583 ◽

2019 ◽

Author(s):

Ian W. Renner ◽

Julie Louvrier ◽

Olivier Gimenez

Keyword(s):

Species Distribution ◽

Spatial Dependence ◽

Process Model ◽

Predictive Performance ◽

Species Distribution Modelling ◽

Data Sets ◽

Multiple Data ◽

Log Likelihood ◽

Likelihood Approach ◽

Penalised Likelihood

Summary: The increase in availability of species data sets means that approaches to species distribution modelling that incorporate multiple data sets are in greater demand. Recent methodological developments in this area have led to combined likelihood approaches, in which a log-likelihood comprised of the sum of the log-likelihood components of each data source is maximised. Often, these approaches make use of at least one presence-only data set and use the log-likelihood of an inhomogeneous Poisson point process model in the combined likelihood construction. While these advancements have been shown to improve predictive performance, they do not currently address challenges in presence-only modelling such as checking and correcting for violations of the independence assumption of a Poisson point process model or more general challenges in species distribution modelling such as overfitting. In this paper, we present an extension of the combined likelihood framework which accommodates alternative presence-only likelihoods in the presence of spatial dependence as well as lasso-type penalties to account for potential overfitting. We compare the proposed combined penalised likelihood approach to the standard combined likelihood approach via simulation and apply the method to modelling the distribution of the Eurasian lynx in the Jura Mountains in eastern France. The simulations show that the proposed combined penalised likelihood approach has better predictive performance than the standard approach when spatial dependence is present in the data. The lynx analysis shows that the predicted maps vary significantly between the model fitted with the proposed combined penalised approach accounting for spatial dependence and the model fitted with the standard combined likelihood. This work highlights the benefits of careful consideration of the presence-only components of the combined likelihood formulation, and allows greater flexibility and ability to accommodate real datasets.
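The combined penalised likelihood described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single covariate, a log-linear intensity, a quadrature approximation to the point process integral, a complementary log-log link for the presence-absence component, and a lasso penalty on the slope only. All function and argument names are hypothetical.

```python
import math

def intensity(beta, x):
    """Assumed log-linear intensity: lambda = exp(beta0 + beta1 * x)."""
    return math.exp(beta[0] + beta[1] * x)

def loglik_presence_only(beta, pres_x, quad_x, quad_w):
    """Poisson point process log-likelihood; the integral of the
    intensity is approximated by a weighted sum over quadrature points."""
    point_term = sum(math.log(intensity(beta, x)) for x in pres_x)
    integral_term = sum(w * intensity(beta, x) for x, w in zip(quad_x, quad_w))
    return point_term - integral_term

def loglik_presence_absence(beta, site_x, site_y, site_area):
    """Bernoulli log-likelihood with P(presence) = 1 - exp(-area * lambda)."""
    total = 0.0
    for x, y in zip(site_x, site_y):
        p = 1.0 - math.exp(-site_area * intensity(beta, x))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def combined_penalised_loglik(beta, po, pa, gamma):
    """Sum of the two log-likelihood components minus a lasso penalty.
    The intercept beta[0] is left unpenalised (an assumption of this sketch)."""
    lasso = gamma * sum(abs(b) for b in beta[1:])
    return loglik_presence_only(beta, *po) + loglik_presence_absence(beta, *pa) - lasso
```

Here `po` is a tuple of presence covariate values, quadrature covariate values and quadrature weights, and `pa` is a tuple of site covariate values, 0/1 detections and a common site area. Maximising this objective over `beta` for a chosen `gamma` would give the penalised fit; the sketch only evaluates the objective.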

Combining multiple data sources in species distribution models while accounting for spatial dependence and overfitting with combined penalized likelihood maximization

Methods in Ecology and Evolution ◽

10.1111/2041-210x.13297 ◽

2019 ◽

Vol 10 (12) ◽

pp. 2118-2128

Author(s):

Ian W. Renner ◽

Julie Louvrier ◽

Olivier Gimenez

Keyword(s):

Species Distribution ◽

Spatial Dependence ◽

Species Distribution Models ◽

Penalized Likelihood ◽

Data Sources ◽

Distribution Models ◽

Multiple Data Sources ◽

Multiple Data

Models for demography of plant populations

10.1093/oxfordhb/9780198703174.013.17 ◽

2018 ◽

Author(s):

Alan Gelfand ◽

Sujit K. Sahu

Keyword(s):

Process Model ◽

Data Sets ◽

Plant Populations ◽

Potential Growth ◽

Individual Tree ◽

Complex Information ◽

Demographic Rates ◽

Multiple Data ◽

Multiple Data Sets ◽

Indirect Information

This article discusses the use of Bayesian analysis and methods to analyse the demography of plant populations, and more specifically to estimate the demographic rates of trees and how they respond to environmental variation. It examines data from individual (tree) measurements over an eighteen-year period, including diameter, crown area, maturation status, and survival, and from seed traps, which provide indirect information on fecundity. The multiple data sets are synthesized with a process model where each individual is represented by a multivariate state-space submodel for both continuous (fecundity potential, growth rate, mortality risk, maturation probability) and discrete states (maturation status). The results from plant population demography analysis demonstrate the utility of hierarchical modelling as a mechanism for the synthesis of complex information and interactions.

blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

10.1101/357798 ◽

2018 ◽

Cited By ~ 3

Author(s):

Roozbeh Valavi ◽

Jane Elith ◽

José J. Lahoz-Monfort ◽

Gurutzeta Guillera-Arroita

Keyword(s):

Species Distribution ◽

Cross Validation ◽

Species Distribution Models ◽

Predictive Performance ◽

R Package ◽

Species Distribution Modelling ◽

List Type ◽

Distribution Models ◽

Distribution Modelling ◽

Evaluation Approaches

Summary: When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
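The idea of spatially separated folds can be sketched in a few lines of Python. This is not the blockCV API (blockCV is an R package). It is a hypothetical illustration that assumes square blocks on planar coordinates and a simple systematic assignment of blocks to folds.

```python
import math

def spatial_block_folds(coords, block_size, k):
    """Assign each (x, y) point to one of k folds so that all points
    falling in the same square block share the same fold."""
    # Index of the square block containing each point.
    block_of = [(math.floor(x / block_size), math.floor(y / block_size))
                for x, y in coords]
    # Systematic assignment: occupied blocks, in sorted order, cycle through folds.
    blocks = sorted(set(block_of))
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}
    return [fold_of_block[b] for b in block_of]

coords = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (2.2, 2.7)]
folds = spatial_block_folds(coords, block_size=1.0, k=2)
print(folds)
```

Because nearby points land in the same block, they are held out together, which is what keeps spatially autocorrelated observations from appearing in both the training and the test set.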

A MATLAB PROGRAM FOR THE ANALYSIS AND INTERPRETATION OF TRANSIENT ELECTROMAGNETIC SOUNDING DATA

Bulletin of the Geological Society of Greece ◽

10.12681/bgsg.11385 ◽

2017 ◽

Vol 43 (4) ◽

pp. 1941

Author(s):

Andreas Tzanis

Keyword(s):

Data Base ◽

Process Model ◽

Data Sets ◽

Transient Electromagnetic ◽

Multiple Data ◽

Multiple Data Sets ◽

The University ◽

And Inversion ◽

2D Resistivity ◽

D Model

Herein we present a software system, written in MATLAB, to interpret TEM sounding data. The program, dubbed maTEM, is designed to process, model and invert multiple soundings, either individually or simultaneously along profiles. The latter capability allows for laterally constrained inversion, so as to generate pseudo-2D or 2D resistivity sections based on the program EM1DINV v2.13 by the Hydro-Geophysics Group of the University of Aarhus, Denmark. Using maTEM, the analyst may import and display multiple data sets, denoise and smooth the data, perform approximate inversions, design 1-D model(s) graphically, perform forward modelling and inversion, and generate or update a data base in which to store the results. Finally, the analyst may use the data base to create 2-D and 3-D displays of the geoelectric structure with built-in graphical functions. maTEM is highly modular so that additional functions can be added at any time, at minimal programming cost. Although the software presented herein is focused on the analysis of TEM data, the maTEM concept has been designed ready to incorporate additional electrical and EM geophysical sounding methods and to support mutually constrained analysis of different geophysical data sets.

Incorporating uncertainty in predictive species distribution modelling

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2011.0178 ◽

2012 ◽

Vol 367 (1586) ◽

pp. 247-258 ◽

Cited By ~ 143

Author(s):

Colin M. Beale ◽

Jack J. Lennon

Keyword(s):

Species Distribution ◽

Goodness Of Fit ◽

Spatial Scales ◽

Predictive Performance ◽

Species Distribution Modelling ◽

Control Measures ◽

Distribution Model ◽

Distribution Models ◽

Distribution Modelling ◽

Demographic Models

Motivated by the need to solve ecological problems (climate change, habitat fragmentation and biological invasions), there has been increasing interest in species distribution models (SDMs). Predictions from these models inform conservation policy, invasive species management and disease-control measures. However, predictions are subject to uncertainty, the degree and source of which is often unrecognized. Here, we review the SDM literature in the context of uncertainty, focusing on three main classes of SDM: niche-based models, demographic models and process-based models. We identify sources of uncertainty for each class and discuss how uncertainty can be minimized or included in the modelling process to give realistic measures of confidence around predictions. Because this has typically not been performed, we conclude that uncertainty in SDMs has often been underestimated and a false precision assigned to predictions of geographical distribution. We identify areas where development of new statistical tools will improve predictions from distribution models, notably the development of hierarchical models that link different types of distribution model and their attendant uncertainties across spatial scales. Finally, we discuss the need to develop more defensible methods for assessing predictive performance, quantifying model goodness-of-fit and for assessing the significance of model covariates.

Heterogeneous Effects of the De Jure and De Facto Business Environment: Findings from Multiple Data Sets on the Business Environment

10.1596/1813-9450-9115 ◽

2020 ◽

Author(s):

Christine Zhenwei Qiang ◽

He Wang ◽

L. Colin Xu

Keyword(s):

Business Environment ◽

Data Sets ◽

Multiple Data ◽

Heterogeneous Effects ◽

Multiple Data Sets

Faculty Opinions recommendation of Marine species distribution modelling and the effects of genetic isolation under climate change.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.733823160.793549530 ◽

2018 ◽

Author(s):

Andrew Baird ◽

Joana Figueiredo

Keyword(s):

Climate Change ◽

Species Distribution ◽

Marine Species ◽

Species Distribution Modelling ◽

Genetic Isolation ◽

Distribution Modelling

Multiple data parameter identification for nonlinear conceptual models

Water Science & Technology ◽

10.2166/wst.1997.0165 ◽

1997 ◽

Vol 36 (5) ◽

pp. 61-68 ◽

Cited By ~ 1

Author(s):

Hermann Eberl ◽

Amar Khelil ◽

Peter Wilderer

Keyword(s):

Optimization Problem ◽

Conceptual Models ◽

Reference Data ◽

Transport Model ◽

Calibration Method ◽

Data Sets ◽

Multiple Data ◽

Marquardt Algorithm ◽

Multicriteria Optimization Problem ◽

Higher Order Differential Equations

A numerical method for the identification of parameters of nonlinear higher order differential equations is presented, which is based on the Levenberg-Marquardt algorithm. The estimation of the parameters can be performed by using several reference data sets simultaneously. This leads to a multicriteria optimization problem, which will be treated by using the Pareto optimality concept. In this paper, the emphasis is put on the presentation of the calibration method. As an example, the identification of the parameters of a nonlinear hydrological transport model for urban runoff is included, but the method can be applied to other problems as well.
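The Pareto optimality concept mentioned in this abstract can be illustrated with a short Python sketch. It does not reproduce the paper's calibration method or the Levenberg-Marquardt step. It only assumes that each candidate parameter set has already been scored with one error value per reference data set (lower is better) and shows how the non-dominated candidates would be selected. The function names and the example error values are hypothetical.

```python
def dominates(a, b):
    """True if candidate a is no worse than b on every data set
    and strictly better on at least one (errors are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(errors):
    """Keep the error tuples that no other candidate dominates."""
    return [e for e in errors
            if not any(dominates(other, e) for other in errors)]

# One tuple per candidate parameter set: (error on data set 1, error on data set 2).
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
print(pareto_front(candidates))
```

In this made-up example the candidate with errors (3.0, 3.0) is dropped because (2.0, 2.0) fits both data sets better, while the other three remain as trade-offs between the two data sets.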
