Combining multiple data sources in species distribution models while accounting for spatial dependence and overfitting with combined penalised likelihood maximisation

2019 ◽  
Author(s):  
Ian W. Renner ◽  
Julie Louvrier ◽  
Olivier Gimenez

Summary: The increase in availability of species data sets means that approaches to species distribution modelling that incorporate multiple data sets are in greater demand. Recent methodological developments in this area have led to combined likelihood approaches, in which a log-likelihood composed of the sum of the log-likelihood components of each data source is maximised. Often, these approaches make use of at least one presence-only data set and use the log-likelihood of an inhomogeneous Poisson point process model in the combined likelihood construction. While these advancements have been shown to improve predictive performance, they do not currently address challenges in presence-only modelling such as checking and correcting for violations of the independence assumption of a Poisson point process model or more general challenges in species distribution modelling such as overfitting. In this paper, we present an extension of the combined likelihood framework which accommodates alternative presence-only likelihoods in the presence of spatial dependence as well as lasso-type penalties to account for potential overfitting. We compare the proposed combined penalised likelihood approach to the standard combined likelihood approach via simulation and apply the method to modelling the distribution of the Eurasian lynx in the Jura Mountains in eastern France. The simulations show that the proposed combined penalised likelihood approach has better predictive performance than the standard approach when spatial dependence is present in the data. The lynx analysis shows that the predicted maps vary significantly between the model fitted with the proposed combined penalised approach accounting for spatial dependence and the model fitted with the standard combined likelihood. This work highlights the benefits of careful consideration of the presence-only components of the combined likelihood formulation, and allows greater flexibility and ability to accommodate real datasets.
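The combined penalised likelihood described in the abstract above can be sketched as a sum of per-source log-likelihoods minus a lasso penalty. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a quadrature approximation of the inhomogeneous Poisson point process log-likelihood for the presence-only data, a Bernoulli log-likelihood with a complementary log-log form for presence-absence data, coefficients shared between the two sources, and an unpenalised intercept. It covers only the Poisson case, not the alternative presence-only likelihoods for spatial dependence. All function names and toy numbers are hypothetical.

```python
import math

def linear_predictor(beta, x):
    """Intercept plus the dot product of slopes and covariates (log-intensity)."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def po_loglik(beta, presence_x, quad_x, quad_w):
    """Presence-only part: Poisson point process log-likelihood, with the
    integral of the intensity approximated by weighted quadrature points."""
    at_points = sum(linear_predictor(beta, x) for x in presence_x)
    integral = sum(w * math.exp(linear_predictor(beta, x))
                   for x, w in zip(quad_x, quad_w))
    return at_points - integral

def pa_loglik(beta, site_x, site_y, site_area):
    """Presence-absence part (assumed form): P(presence) = 1 - exp(-area * intensity)."""
    total = 0.0
    for x, y, area in zip(site_x, site_y, site_area):
        p = 1.0 - math.exp(-area * math.exp(linear_predictor(beta, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if y else math.log(1.0 - p)
    return total

def combined_penalised_loglik(beta, po, pa, lam):
    """Sum of both log-likelihoods minus a lasso penalty on the slopes only."""
    penalty = lam * sum(abs(b) for b in beta[1:])
    return po_loglik(beta, *po) + pa_loglik(beta, *pa) - penalty

# Hypothetical toy data with a single covariate.
po = ([[0.2], [0.8]],             # covariates at presence-only records
      [[0.0], [0.5], [1.0]],      # covariates at quadrature points
      [0.5, 1.0, 0.5])            # quadrature weights
pa = ([[0.1], [0.9]], [0, 1], [1.0, 1.0])  # site covariates, outcomes, areas
beta = [-1.0, 2.0]

unpenalised = combined_penalised_loglik(beta, po, pa, lam=0.0)
penalised = combined_penalised_loglik(beta, po, pa, lam=0.5)
```

In practice this objective would be maximised over `beta` for a sequence of `lam` values; the optimiser and the choice of `lam` are left out of the sketch.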

Author(s):  
Alan Gelfand ◽  
Sujit K. Sahu

This article discusses the use of Bayesian analysis and methods to analyse the demography of plant populations, and more specifically to estimate the demographic rates of trees and how they respond to environmental variation. It examines data from individual (tree) measurements over an eighteen-year period, including diameter, crown area, maturation status, and survival, and from seed traps, which provide indirect information on fecundity. The multiple data sets are synthesized with a process model where each individual is represented by a multivariate state-space submodel for both continuous (fecundity potential, growth rate, mortality risk, maturation probability) and discrete states (maturation status). The results from plant population demography analysis demonstrate the utility of hierarchical modelling as a mechanism for the synthesis of complex information and interactions.


2018 ◽  
Author(s):  
Roozbeh Valavi ◽  
Jane Elith ◽  
José J. Lahoz-Monfort ◽  
Gurutzeta Guillera-Arroita

Summary: When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
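The idea of spatially separated folds mentioned in the abstract above can be illustrated with a short sketch. This is not the blockCV API (blockCV is an R package); it is a plain Python stand-in that assumes square blocks of a fixed size and deals blocks to folds in turn. The function name, block size and coordinates are hypothetical.

```python
import math

def spatial_block_folds(coords, block_size, k):
    """Assign each (x, y) point to one of k folds via the square block it falls in,
    so that points in the same block always share a fold."""
    block_ids = [(math.floor(x / block_size), math.floor(y / block_size))
                 for x, y in coords]
    unique_blocks = sorted(set(block_ids))
    # Systematic assignment: blocks are dealt to folds in sorted order.
    block_to_fold = {block: i % k for i, block in enumerate(unique_blocks)}
    return [block_to_fold[block] for block in block_ids]

# Hypothetical sample locations in arbitrary map units.
coords = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.6, 0.9), (0.2, 1.4), (1.1, 1.9)]
folds = spatial_block_folds(coords, block_size=1.0, k=2)
```

Because nearby points land in the same block, a model is never evaluated on points that sit right next to its training data, which is the source of the optimistic error estimates under random cross-validation.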


2017 ◽  
Vol 43 (4) ◽  
pp. 1941
Author(s):  
Andreas Tzanis

Herein we present a software system, written in MATLAB, to interpret TEM sounding data. The program, dubbed maTEM, is designed to process, model and invert multiple soundings, either individually, or simultaneously along profiles. The latter capability allows for laterally constrained inversion, so as to generate pseudo-2D or 2D resistivity sections based on the program EM1DINV v2.13 by the Hydro-Geophysics Group of the University of Aarhus, Denmark. Using maTEM, the analyst may import and display multiple data sets, denoise and smooth the data, perform approximate inversions, design 1-D model(s) graphically, perform forward modelling and inversion, and generate or update a database in which to store the results. Finally, the analyst may use the database to create 2-D and 3-D displays of the geoelectric structure with built-in graphical functions. maTEM is highly modular so that additional functions can be added at any time, at minimal programming cost. Although the software presented herein is focused on the analysis of TEM data, the maTEM concept has been designed to be ready to incorporate additional electrical and EM geophysical sounding methods and to support mutually constrained analysis of different geophysical data sets.


2012 ◽  
Vol 367 (1586) ◽  
pp. 247-258 ◽  
Author(s):  
Colin M. Beale ◽  
Jack J. Lennon

Motivated by the need to solve ecological problems (climate change, habitat fragmentation and biological invasions), there has been increasing interest in species distribution models (SDMs). Predictions from these models inform conservation policy, invasive species management and disease-control measures. However, predictions are subject to uncertainty, the degree and source of which is often unrecognized. Here, we review the SDM literature in the context of uncertainty, focusing on three main classes of SDM: niche-based models, demographic models and process-based models. We identify sources of uncertainty for each class and discuss how uncertainty can be minimized or included in the modelling process to give realistic measures of confidence around predictions. Because this has typically not been performed, we conclude that uncertainty in SDMs has often been underestimated and a false precision assigned to predictions of geographical distribution. We identify areas where development of new statistical tools will improve predictions from distribution models, notably the development of hierarchical models that link different types of distribution model and their attendant uncertainties across spatial scales. Finally, we discuss the need to develop more defensible methods for assessing predictive performance, quantifying model goodness-of-fit and for assessing the significance of model covariates.


1997 ◽  
Vol 36 (5) ◽  
pp. 61-68 ◽  
Author(s):  
Hermann Eberl ◽  
Amar Khelil ◽  
Peter Wilderer

A numerical method for the identification of parameters of nonlinear higher order differential equations is presented, which is based on the Levenberg-Marquardt algorithm. The estimation of the parameters can be performed by using several reference data sets simultaneously. This leads to a multicriteria optimization problem, which is treated using the Pareto optimality concept. In this paper, the emphasis is put on the presentation of the calibration method. As an example, the identification of the parameters of a nonlinear hydrological transport model for urban runoff is included, but the method can be applied to other problems as well.
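The Pareto optimality concept referred to in the abstract above can be sketched as a filter over candidate parameter sets, each scored by one error value per reference data set. This is an illustration only, not the paper's calibration method: how the candidates are produced (for instance by Levenberg-Marquardt fits) is left out, and the parameter tuples and error values are made up.

```python
def dominates(a, b):
    """True if error vector a is no worse than b on every criterion and
    strictly better on at least one (all criteria are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Keep the candidates whose error vectors no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o["errors"], c["errors"]) for o in candidates)]

# Hypothetical candidates: one error per reference data set.
candidates = [
    {"params": (0.1, 2.0), "errors": (1.0, 4.0)},
    {"params": (0.2, 1.5), "errors": (2.0, 2.0)},
    {"params": (0.3, 1.0), "errors": (3.0, 3.0)},  # dominated by the second candidate
    {"params": (0.4, 0.5), "errors": (4.0, 1.0)},
]
front = pareto_front(candidates)
```

The surviving candidates represent trade-offs: improving the fit to one data set would worsen the fit to another, so the final choice among them needs an additional preference.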

