Comparative performance of generalized additive models and boosted regression trees for statistical modeling of incidental catch of wahoo (Acanthocybium solandri) in the Mexican tuna purse-seine fishery

2012 · Vol 233 · pp. 20-25
Author(s): Raul O. Martínez-Rincón, Sofía Ortega-García, Juan G. Vaca-Rodríguez
2020 · Vol 79 (Suppl 1) · pp. 1252.2-1253
Author(s): R. Garofoli, M. Resche-Rigon, M. Dougados, D. Van der Heijde, C. Roux, ...

Background: Axial spondyloarthritis (axSpA) is a chronic rheumatic disease that encompasses various clinical presentations: inflammatory chronic back pain, peripheral manifestations and extra-articular manifestations. The current nomenclature divides axSpA into radiographic (in the presence of radiographic sacroiliitis) and non-radiographic (in the absence of radiographic sacroiliitis, with or without MRI sacroiliitis). Given that the functional burden of the disease appears to be greater in patients with radiographic forms, it seems crucial to be able to predict which patients will be more likely to develop structural damage over time. Predictive factors for radiographic progression in axSpA have been identified through the use of traditional statistical models such as logistic regression. However, these models have some limitations. To overcome these limitations and improve predictive performance, machine learning (ML) methods have been developed.

Objectives: To compare ML models with traditional models for predicting radiographic progression in patients with early axSpA.

Methods: Study design: prospective French multicentric cohort study (DESIR cohort) with 5 years of follow-up. Patients: all patients included in the cohort, i.e. 708 patients with inflammatory back pain for >3 months but <3 years, highly suggestive of axSpA. Data from the first 5 years of follow-up were used. Statistical analyses: radiographic progression was defined as progression either at the spine (increase of at least 1 point per 2 years in mSASSS score) or at the sacroiliac joint (worsening of at least one grade in the mNY score between 2 visits). Traditional modelling: we first performed a bivariate analysis between our outcome (radiographic progression) and explanatory variables at baseline to select the variables to include in our models, and then built a logistic regression model (M1).
Variable selection for the traditional models was performed with 2 different methods: stepwise selection based on the Akaike Information Criterion (stepAIC) (M2), and the Least Absolute Shrinkage and Selection Operator (LASSO) (M3). We also performed a sensitivity analysis on all patients with a manual backward method (M4) after multiple imputation of missing data. Machine learning modelling: using the “SuperLearner” package in R, we modelled radiographic progression with stepAIC, LASSO, random forest, Discrete Bayesian Additive Regression Trees Samplers (DBARTS), Generalized Additive Models (GAM), multivariate adaptive polynomial spline regression (polymars), Recursive Partitioning And Regression Trees (RPART) and the Super Learner. Finally, the accuracy of the traditional and ML models was compared based on their 10-fold cross-validated AUC (cv-AUC).

Results: The 10-fold cv-AUCs for the traditional models were 0.79 and 0.78 for M2 and M3, respectively. The 3 best models in the ML approach were the GAM, DBARTS and Super Learner models, with 10-fold cv-AUCs of 0.77, 0.76 and 0.74, respectively (Table 1).

Table 1. Comparison of 10-fold cross-validated AUC between the best traditional and machine learning models.

Best models                                                   Cross-validated AUC
Traditional models
  M2 (stepAIC method)                                         0.79
  M3 (LASSO method)                                           0.78
Machine learning approach
  SL Discrete Bayesian Additive Regression Trees Samplers
  (DBARTS)                                                    0.76
  SL Generalized Additive Models (GAM)                        0.77
  Super Learner                                               0.74

AUC: Area Under the Curve; AIC: Akaike Information Criterion; LASSO: Least Absolute Shrinkage and Selection Operator; SL: SuperLearner. N = 295.

Conclusion: The traditional models predicted radiographic progression better than the ML models in this early axSpA population. Further ML algorithms, image-based or using other artificial intelligence methods (e.g. deep learning), might perform better than traditional models in this setting.

Acknowledgments: Thanks to the French National Society of Rheumatology and the DESIR cohort.

Disclosure of Interests: Romain Garofoli: None declared. Matthieu Resche-Rigon: None declared. Maxime Dougados: Grant/research support from AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma; Consultant of AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma; Speakers bureau: AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma. Désirée van der Heijde: Consultant of AbbVie, Amgen, Astellas, AstraZeneca, BMS, Boehringer Ingelheim, Celgene, Cyxone, Daiichi, Eisai, Eli-Lilly, Galapagos, Gilead Sciences, Inc., Glaxo-Smith-Kline, Janssen, Merck, Novartis, Pfizer, Regeneron, Roche, Sanofi, Takeda, UCB Pharma; Director of Imaging Rheumatology BV. Christian Roux: None declared. Anna Moltó: Grant/research support from Pfizer, UCB; Consultant of AbbVie, BMS, MSD, Novartis, Pfizer, UCB.
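The study's model comparison hinges on 10-fold cross-validated AUC. A rough, hypothetical sketch of that evaluation loop in Python/scikit-learn, on synthetic data standing in for the baseline variables (the study itself used R's SuperLearner package; model names and settings here are illustrative only):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a baseline-features matrix (NOT the DESIR data).
X, y = make_classification(n_samples=295, n_features=20, n_informative=5,
                           random_state=0)

models = {
    "LASSO logistic": LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# 10-fold cross-validated AUC, the comparison metric used in the abstract.
cv_auc = {name: cross_val_score(m, X, y, cv=10, scoring="roc_auc").mean()
          for name, m in models.items()}
for name, auc in cv_auc.items():
    print(f"{name}: cv-AUC = {auc:.2f}")
```

Ranking models by the same cross-validated metric, rather than training fit, is what makes the traditional-vs-ML comparison fair.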


2021 · Vol 10 (5) · pp. 343
Author(s): Diana Sousa-Guedes, Marc Franch, Neftalí Sillero

Road networks are the main source of mortality for many species. Amphibians, which are in global decline, are the most road-killed fauna group, due to their activity patterns and preferred habitats. Many different methodologies, such as logistic regression, have been applied to model the relationship between the environment and road-kill events. Here, we compared the performance of five regression techniques in relating amphibian road-kill frequency to environmental variables. For this, we surveyed three country roads in northern Portugal in search of road-killed amphibians. To explain the presence of road-kills, we selected a set of environmental variables important for the presence of amphibians and the occurrence of road-kills. We compared the performances of five modeling techniques: (i) generalized linear models, (ii) generalized additive models, (iii) random forest, (iv) boosted regression trees, and (v) geographically weighted regression. The boosted regression trees and geographically weighted regression techniques performed best, with a percentage of deviance explained between 61.8% and 76.6% and between 55.3% and 66.7%, respectively. Moreover, geographically weighted regression showed a great advantage over the other techniques, as it allows mapping local parameter coefficients as well as local model performance (pseudo-R2). The results suggest that geographically weighted regression is a useful tool for road-kill modeling, as well as for better visualizing and mapping the spatial variability of the models.
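The techniques above are ranked by percentage of deviance explained. A minimal sketch of how that metric is computed for a logistic GLM, on made-up covariates (not the Portuguese survey data; variable names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Toy presence/absence of road-kills vs. two invented environmental variables.
rng = np.random.default_rng(0)
n = 300
dist_water = rng.uniform(0, 500, n)   # distance to water body (m)
traffic = rng.uniform(0, 1, n)        # scaled traffic intensity
logit = 2.0 - 0.01 * dist_water + 1.5 * traffic
presence = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([dist_water, traffic])
glm = LogisticRegression(C=1e6, max_iter=1000).fit(X, presence)  # ~unpenalized GLM

# Deviance D = 2 * n * mean negative log-likelihood; the null model
# predicts the overall prevalence for every site.
resid_dev = 2 * n * log_loss(presence, glm.predict_proba(X)[:, 1])
null_dev = 2 * n * log_loss(presence, np.full(n, presence.mean()))
pct_dev = 100 * (1 - resid_dev / null_dev)
print(f"% deviance explained: {pct_dev:.1f}")
```

The same ratio generalizes to GAMs and boosted trees, which is what lets the study compare such different model families on one scale.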


2019
Author(s): Adam B. Smith, Maria J. Santos

Abstract: Models of species’ distributions and niches are frequently used to infer the importance of range- and niche-defining variables. However, the degree to which these models can reliably identify important variables and quantify their influence remains unknown. Here we use a series of simulations to explore how well models can (1) discriminate between variables with different influence and (2) calibrate the magnitude of influence relative to an “omniscient” model. To quantify variable importance, we trained generalized additive models (GAMs), Maxent, and boosted regression trees (BRTs) on simulated data and tested their sensitivity to permutations in each predictor. Importance was inferred by calculating the correlation between permuted and unpermuted predictions, and by comparing the predictive accuracy of permuted and unpermuted predictions using AUC and the Continuous Boyce Index. In scenarios with one influential and one uninfluential variable, models were unable to discriminate reliably between variables under conditions that are normally challenging for generating accurate predictions: training occurrences <8-64; prevalence >0.5; small spatial extent; environmental data with coarse resolution when spatial autocorrelation is low; and correlation between environmental variables where |r| >0.7. When two variables influenced the distribution equally, importance was underestimated when species had narrow or intermediate niche breadth. Interactions between variables in how they shaped the niche did not affect inferences about their importance. When variables acted unequally, the effect of the stronger variable was overestimated. GAMs and Maxent discriminated between variables more reliably than BRTs, but no algorithm was consistently well-calibrated vis-à-vis the omniscient model. Algorithm-specific measures of importance like Maxent’s change-in-gain metric were less robust than the permutation test.
Overall, high predictive accuracy did not connote robust inferential capacity. As a result, requirements for reliably measuring variable importance are likely more stringent than for creating models with high predictive accuracy.
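The permutation test described above can be sketched in a few lines: nothing is refit; one predictor is shuffled and the drop in predictive accuracy is measured. A toy analogue of the one-influential/one-uninfluential scenario, using scikit-learn's GradientBoostingClassifier as a stand-in for the BRTs in the study:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Simulated occurrences: x0 drives the distribution, x1 is pure noise.
rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 2))
y = (rng.random(n) < 1 / (1 + np.exp(-2 * X[:, 0]))).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
base_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Permutation importance: shuffle one column at a time, record the AUC drop.
importance = {}
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    perm_auc = roc_auc_score(y, model.predict_proba(Xp)[:, 1])
    importance[j] = base_auc - perm_auc

print(importance)  # the influential variable should show the larger drop
```

In the easy case the drop for x0 dwarfs that for x1; the study's point is that under sparse data, high prevalence, or collinearity, this gap shrinks until the two variables become indistinguishable.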


2019
Author(s): Duarte S. Viana, Petr Keil, Alienor Jeliazkov

Abstract: Community ecologists and macroecologists have long sought to evaluate the importance of environmental conditions and other drivers in determining species composition across sites. Different methods have been used to estimate species-environment relationships while accounting for or partitioning the variation attributed to environment and spatial autocorrelation, but their differences and respective reliability remain poorly known. We compared the performance of four families of statistical methods in estimating the contribution of the environment and space to explaining variation in multi-species occurrence and abundance. These methods included distance-based regression (MRM), constrained ordination (RDA and CCA), generalised linear and additive models (GLM, GAM), and tree-based machine learning (regression trees, boosted regression trees, and random forests). Depending on the method, the spatial model consisted of either Moran’s Eigenvector Maps (MEM; in constrained ordination and GLM), smooth spatial splines (in GAM), or tree-based non-linear modelling of spatial coordinates (in machine learning). We simulated typical ecological data to assess the methods’ performance in (1) fitting environmental and spatial effects, and (2) partitioning the variation explained by the environmental and spatial effects. Differences in fitting performance among the major model types – (G)LM, GAM, machine learning – were reflected in the variation partitioning performance of the different methods. Machine learning methods, namely boosted regression trees, performed best overall. GAM performed similarly well, though likelihood optimisation did not converge for some empirical test data. The remaining methods performed worse under most simulated data variations (depending on the type of species data, sample size and coverage, autocorrelation range, and response shape).
Our results suggest that tree-based machine learning is a robust approach that can be widely used for variation partitioning. Our recommendations apply to single-species niche models, community ecology, and macroecology studies aiming at disentangling the relative contributions of space vs. environment and other drivers of variation in site-by-species matrices.
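The variation-partitioning scheme evaluated above can be sketched as three fits (environment-only, space-only, environment+space) whose explained-variance fractions are differenced. A simplified analogue using random forests with out-of-bag R² on simulated data (all variable names and effect sizes are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Simulated abundance: an environmental gradient plus a spatial trend.
rng = np.random.default_rng(2)
n = 400
coords = rng.uniform(0, 100, size=(n, 2))          # site coordinates
env = rng.normal(size=(n, 1))                      # environmental variable
abundance = (3 * env[:, 0] + 0.05 * coords[:, 0]   # env effect + spatial trend
             + rng.normal(scale=0.5, size=n))

def fit_r2(features):
    """Out-of-bag R² of a random forest fit to the given predictors."""
    m = RandomForestRegressor(n_estimators=300, oob_score=True,
                              random_state=0).fit(features, abundance)
    return m.oob_score_

r2_env = fit_r2(env)                               # environment only
r2_space = fit_r2(coords)                          # space only
r2_full = fit_r2(np.hstack([env, coords]))         # environment + space

# Classic partition: pure fractions by differencing, overlap by inclusion-exclusion.
print(f"pure env:   {r2_full - r2_space:.2f}")
print(f"pure space: {r2_full - r2_env:.2f}")
print(f"shared:     {r2_env + r2_space - r2_full:.2f}")
```

Using out-of-bag (rather than training) R² matters here: a forest's training fit is near-perfect for any predictor set, which would make the partition meaningless.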


2020 · Vol 653 · pp. 105-119
Author(s): J Hilliard, D Karlen, T Dix, S Markham, A Schulze

Capitellid polychaetes are ubiquitous throughout the world’s oceans and are often encountered in high abundance. We used an extensive dataset of species abundance and distribution records of the Capitella capitata complex, C. aciculata, C. jonesi, Heteromastus filiformis, Mediomastus ambiseta, and M. californiensis from Tampa Bay, Florida, USA, as a model system of closely related species filling a similar ecological niche. We sought to (1) characterize the spatial distribution of each species, (2) determine if a single species abundance modeling strategy could be applied to them all, and (3) assess environmental drivers of species distribution and abundance. We found that all species had a zero-inflated abundance distribution and there was spatial autocorrelation by bay regions. Lorenz curves were an effective tool to assess spatial patterns of species abundance across large areas. Bay segment, depth, and dissolved oxygen were the most important environmental drivers. Modeling was accomplished by comparing 6 different approaches: 4 generalized additive models (GAMs: Poisson, negative binomial, Tweedie, and zero-inflated Poisson distributions), hurdle models, and boosted regression trees. There was no single model with top performance for every species. However, GAM-Tweedie and hurdle models performed well overall and may be useful for studies of other benthic marine invertebrates.
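A hurdle model of the kind compared above couples a binary occurrence model with a count model fit only to the positive observations. A minimal sketch on made-up zero-inflated counts (hypothetical depth and dissolved-oxygen covariates; a truncated count likelihood, as in a full hurdle model, is approximated here by an ordinary Poisson fit on the positives):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, PoissonRegressor

# Zero-inflated toy counts: occurrence depends on depth, abundance
# (given presence) on dissolved oxygen. Entirely simulated.
rng = np.random.default_rng(3)
n = 500
depth = rng.uniform(0, 30, n)              # depth (m)
do = rng.uniform(2, 8, n)                  # dissolved oxygen (mg/L)
present = rng.binomial(1, 1 / (1 + np.exp(-(2 - 0.15 * depth))))
counts = present * rng.poisson(np.exp(0.3 * do - 0.5))

X = np.column_stack([depth, do])

# Hurdle part 1: does the species occur at all?
occ = LogisticRegression().fit(X, (counts > 0).astype(int))
# Hurdle part 2: expected count where it occurs (Poisson on positives).
pos = counts > 0
cnt = PoissonRegressor(alpha=0, max_iter=300).fit(X[pos], counts[pos])

# Expected abundance = P(presence) * E[count | presence].
expected = occ.predict_proba(X)[:, 1] * cnt.predict(X)
print(f"mean observed: {counts.mean():.2f}, mean predicted: {expected.mean():.2f}")
```

Splitting the zeros out this way is what lets the two parts respond to different drivers, matching the finding that occurrence and abundance need not share a single best model.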

