Integrating citizen science data with expert surveys increases accuracy and spatial extent of species distribution models

Mapping Intimacies ◽

10.1101/806547 ◽

2019 ◽

Cited By ~ 2

Author(s):

O.J. Robinson ◽

V. Ruiz-Gutierrez ◽

M.D. Reynolds ◽

G.H. Golet ◽

M. Strimas-Mackey ◽

...

Keyword(s):

Survey Data ◽

Citizen Science ◽

Species Distribution ◽

Central Valley ◽

Habitat Associations ◽

Biological Information ◽

Ecological Knowledge ◽

Science Data ◽

Trade Offs ◽

Wide Range

Abstract:

Information on species’ habitat associations and distributions, across a wide range of spatial and temporal scales, is a fundamental source of ecological knowledge. However, collecting biological information at relevant scales is often cost prohibitive, although it is essential for framing the broader context of more focused research and conservation efforts. Citizen-science data have been identified as an increasingly important source of the biological information needed to fill data gaps and make more comprehensive and robust inferences on species distributions. However, there are perceived trade-offs in combining highly structured, scientific survey data with largely unstructured, citizen-science data. As a result, most methodological advances for combining these sources of information have focused on treating them as independent. The degree to which each source of information is allowed to directly inform a common underlying process (e.g. species distribution) depends on the perceived quality of the data. In this paper, we explore these trade-offs by applying a simplified approach of filtering citizen-science data to resemble structured survey data, and analyze both sources of data under a common framework. To accomplish this, we explored ways of integrating high-resolution survey data on shorebirds in the northern Central Valley of California with observations in eBird for the entire region that were filtered to improve their quality. The integration of survey data with the filtered citizen-science data in eBird resulted in improved inference and predictive ability, and increased the extent and accuracy of inferences on shorebirds for the Central Valley. The structured surveys were found to improve the overall accuracy of ecological inference based only on citizen-science data, by increasing the representation of data collected from high quality habitats for shorebirds (e.g. rice fields). The practical approach we have shown for data integration can also be used to improve the efficiency of designing biological surveys in the context of larger, citizen-science monitoring efforts, ultimately reducing the financial and time expenditures typically required of monitoring programs and focused research. The simple processing and filtering method we present can be used to integrate other types of data (e.g. camera traps) with more localized efforts (e.g. research projects), ultimately improving our ecological knowledge on the distribution and habitat associations of species of conservation concern worldwide.


Analytical guidelines to increase the value of citizen science data: using eBird data to estimate species occurrence

10.1101/574392 ◽

2019 ◽

Cited By ~ 8

Author(s):

A Johnston ◽

WM Hochachka ◽

ME Strimas-Mackey ◽

V Ruiz Gutierrez ◽

OJ Robinson ◽

...

Keyword(s):

Data Processing ◽

Citizen Science ◽

Model Performance ◽

Species Distributions ◽

Ecological Knowledge ◽

Sample Sizes ◽

Science Data ◽

Science Projects ◽

Wide Range ◽

The Impact

Abstract:

Citizen science data are valuable for addressing a wide range of ecological research questions, and there has been a rapid increase in the scope and volume of data available. However, data from large-scale citizen science projects typically present a number of challenges that can inhibit robust ecological inferences. These challenges include: species bias, spatial bias, and variation in effort. To demonstrate addressing key challenges in analysing citizen science data, we use the example of estimating species distributions with data from eBird, a large semi-structured citizen science project. We estimate two widely applied metrics of species distributions: encounter rate and occupancy probability. For each metric, we assess the impact of data processing steps that either degrade or refine the data used in the analyses. We also test whether differences in model performance are maintained at different sample sizes. Model performance improved when data processing and analytical methods addressed the challenges arising from citizen science data. The largest gains in model performance were achieved with: 1) the use of complete checklists (where observers report all the species they detect and identify); and 2) the use of covariates describing variation in effort and detectability for each checklist. Occupancy models were more robust to a lack of complete checklists and effort variables. Improvements in model performance with data refinement were more evident with larger sample sizes. Here, we describe processes to refine semi-structured citizen science data to estimate species distributions. We demonstrate the value of complete checklists, which can inform the design and adaptation of citizen science projects. We also demonstrate the value of information on effort. The methods we have outlined are also likely to improve other forms of inference, and will enable researchers to conduct robust analyses and harness the vast ecological knowledge that exists within citizen science data.
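
The two refinements named in this abstract (keeping only complete checklists, and requiring effort information for each checklist) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Checklist` fields, the `refine` and `encounter_rate` helpers, and the effort cut-offs are hypothetical and are not taken from the paper or from the eBird data format. The raw proportion of checklists with a detection is only a naive summary, not the encounter-rate or occupancy models the authors fit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checklist:
    """One citizen science checklist (hypothetical schema for illustration)."""
    checklist_id: str
    complete: bool                 # observer reported every species detected and identified
    duration_min: Optional[float]  # effort: minutes spent observing
    distance_km: Optional[float]   # effort: distance travelled
    n_observers: Optional[int]     # effort: party size
    detected: bool                 # focal species reported on this checklist


def refine(
    checklists: List[Checklist],
    max_duration_min: float = 300.0,  # illustrative cut-off, an assumption
    max_distance_km: float = 5.0,     # illustrative cut-off, an assumption
    max_observers: int = 10,          # illustrative cut-off, an assumption
) -> List[Checklist]:
    """Keep complete checklists that carry usable effort information."""
    kept = []
    for c in checklists:
        if not c.complete:
            continue  # incomplete lists cannot be read as non-detections
        if c.duration_min is None or c.distance_km is None or c.n_observers is None:
            continue  # effort covariates are missing
        if (
            c.duration_min > max_duration_min
            or c.distance_km > max_distance_km
            or c.n_observers > max_observers
        ):
            continue  # effort outside the range treated as comparable
        kept.append(c)
    return kept


def encounter_rate(checklists: List[Checklist]) -> float:
    """Naive proportion of checklists on which the focal species was reported."""
    if not checklists:
        raise ValueError("no checklists left after filtering")
    return sum(c.detected for c in checklists) / len(checklists)
```

In a real analysis the retained effort fields would enter the model as covariates rather than serve only as filters; the sketch stops at the data-processing step.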


Using citizen science data to monitor the Sustainable Development Goals: a bottom-up analysis

Sustainability Science ◽

10.1007/s11625-021-01001-1 ◽

2021 ◽

Author(s):

Laura Ballerini ◽

Sylvia I. Bergh

Keyword(s):

Sustainable Development ◽

Citizen Science ◽

Sustainable Development Goals ◽

Comparative Case Study ◽

Science Data ◽

The Sustainable Development ◽

Case Study Analysis ◽

Wide Range ◽

Development Goals

Abstract:

Official data are not sufficient for monitoring the United Nations Sustainable Development Goals (SDGs): they do not reach remote locations or marginalized populations and can be manipulated by governments. Citizen science data (CSD), defined as data that citizens voluntarily gather by employing a wide range of technologies and methodologies, could help to tackle these problems and ultimately improve SDG monitoring. However, the link between CSD and the SDGs is still understudied. This article aims to develop an empirical understanding of the CSD-SDG link by focusing on the perspective of projects which employ CSD. Specifically, the article presents primary and secondary qualitative data collected on 30 of these projects and an explorative comparative case study analysis. It finds that projects which use CSD recognize that the SDGs can provide a valuable framework and legitimacy, as well as attract funding, visibility, and partnerships. But, at the same time, the article reveals that these projects also encounter several barriers with respect to the SDGs: a widespread lack of knowledge of the goals, combined with frustration and political resistance towards the UN, may deter these projects from contributing their data to the SDG monitoring apparatus.


The impact of data quality filtering of opportunistic citizen science data on species distribution model performance

Ecological Modelling ◽

10.1016/j.ecolmodel.2021.109453 ◽

2021 ◽

Vol 444 ◽

pp. 109453

Author(s):

Camille Van Eupen ◽

Dirk Maes ◽

Marc Herremans ◽

Kristijn R.R. Swinnen ◽

Ben Somers ◽

...

Keyword(s):

Data Quality ◽

Citizen Science ◽

Species Distribution ◽

Species Distribution Model ◽

Model Performance ◽

Distribution Model ◽

Science Data ◽

Quality Filtering ◽

The Impact


Can citizen science analysis of camera trap data be used to study reproduction? Lessons from Snapshot Serengeti program

10.1101/2020.11.30.400804 ◽

2020 ◽

Author(s):

Thel Lucie ◽

Chamaillé-Jammes Simon ◽

Keurinck Léa ◽

Catala Maxime ◽

Packer Craig ◽

...

Keyword(s):

Citizen Science ◽

Camera Trap ◽

Camera Traps ◽

Life History Trait ◽

List Type ◽

Breeding Phenology ◽

Science Data ◽

Morphological Criteria ◽

Wide Range ◽

Trained Observers

Abstract:

Ecologists increasingly rely on camera trap data to estimate a wide range of biological parameters such as occupancy, population abundance or activity patterns. Because of the huge amount of data collected, the assistance of non-scientists is often sought after, but an assessment of the data quality is a prerequisite to their use. We tested whether citizen science data from one of the largest citizen science projects - Snapshot Serengeti - could be used to study breeding phenology, an important life-history trait. In particular, we tested whether the presence of juveniles (less than one or 12 months old) of three ungulate species in the Serengeti: topi Damaliscus jimela, kongoni Alcelaphus buselaphus and Grant’s gazelle Nanger granti could be reliably detected by the “naive” volunteers vs. trained observers. We expected a positive correlation between the proportion of volunteers identifying juveniles and their effective presence within photographs, assessed by the trained observers. We first checked the agreement between the trained observers for age classes and species and found a good agreement between them (Fleiss’ κ > 0.61 for juveniles of less than one and 12 month(s) old), suggesting that morphological criteria can be used successfully to determine age. The relationship between the proportion of volunteers detecting juveniles less than a month old and their actual presence plateaued at 0.45 for Grant’s gazelle and reached 0.70 for topi and 0.56 for kongoni. The same relationships were however much stronger for juveniles younger than 12 months, to the point that their presence was perfectly detected by volunteers for topi and kongoni. Volunteers’ classification allows a rough, moderately accurate, but quick, sorting of photograph sequences with/without juveniles. Obtaining accurate data however appears more difficult. We discuss the limitations of using citizen science camera trap data to study breeding phenology, and the options to improve the detection of juveniles, such as the addition of aging criteria on the online citizen science platforms, or the use of machine learning.
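
The agreement statistic quoted in this abstract, Fleiss’ κ, can be computed from a table of rating counts. The sketch below uses the standard textbook formula; the function name `fleiss_kappa` and the input layout (one row per photograph sequence, one column per category, each cell holding the number of observers who chose that category) are assumptions for illustration and say nothing about how the authors organised their data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    Every row must sum to the same number of raters (at least two).
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one rated subject")
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per subject")
    for row in counts:
        if len(row) != n_categories or sum(row) != n_raters:
            raise ValueError("rows must share categories and rater totals")

    # Observed agreement: share of rater pairs that agree, averaged over subjects.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    total = n_subjects * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_chance = sum(s * s for s in shares)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_observed - p_chance) / (1.0 - p_chance)
```

For example, `fleiss_kappa([[3, 0], [0, 3]])` describes three observers who agree fully on two sequences (one with a juvenile, one without) and gives a κ of 1.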


Decision letter for "Spatial thinning and class balancing: key choices lead to variation in the performance of species distribution models with citizen science data"

10.1111/2041-210x.13525/v2/decision1 ◽

2020 ◽

Keyword(s):

Citizen Science ◽

Species Distribution ◽

Species Distribution Models ◽

Distribution Models ◽

Science Data


Review for "Spatial thinning and class balancing: key choices lead to variation in the performance of species distribution models with citizen science data"

10.1111/2041-210x.13525/v1/review1 ◽

2020 ◽

Keyword(s):

Citizen Science ◽

Species Distribution ◽

Species Distribution Models ◽

Distribution Models ◽

Science Data


Rapid assessment of the suitability of multi-species citizen science datasets for occupancy trend analysis

10.1101/813626 ◽

2019 ◽

Cited By ~ 1

Author(s):

Michael J.O. Pocock ◽

Mark W. Logie ◽

Nick J.B. Isaac ◽

Charlotte L. Outhwaite ◽

Tom August

Keyword(s):

Great Britain ◽

Citizen Science ◽

Expert Elicitation ◽

List Type ◽

Occupancy Models ◽

Science Data ◽

Focal Species ◽

Wide Range ◽

Simple Rules ◽

Taxonomic Groups

Abstract:

Species records from volunteers are a vast and valuable source of information on biodiversity for a wide range of taxonomic groups. Although these citizen science data are opportunistic and unstructured, occupancy analysis can be used to quantify trends in distribution. However, occupancy analysis of unstructured data can be resource-intensive and requires substantial expertise. It is valuable to have simple ‘rules of thumb’ to efficiently assess the suitability of a dataset for occupancy analysis prior to analysis. Our analysis was possible due to the production of trends, from our Bayesian occupancy analysis, for 10 967 species from 34 multi-species recording schemes in Great Britain. These schemes had an average of 500 visits to sites per year, and an average of 20% of visited sites received a revisit in a year. Occupancy trend outputs varied in their precision and we used expert elicitation on a subset of outputs to determine a precision threshold above which trends were suitable for further consideration. We then used classification trees with seven metrics to define simple rules explaining when the data would result in outputs that met the precision threshold. We found that the suitability of a species’ data was best described by (i) the number of records of the focal species in the 10% best-recorded years, and (ii) the proportion of recording visits for that taxonomic group with non-detections of the focal species. Surprisingly few data were required to be predicted to meet the precision threshold. Specifically, for 98% confidence that our Bayesian occupancy models would produce outputs meeting the precision threshold, there needed to be ≥29 records of the focal species in the 10% best-recorded years (equivalent to an average of 12.5 records per year in our dataset), although only ≥10 records (equivalent to 4.5 records per year) were required for species recorded in less than 1 in 25 visits. We applied these rules to regional species data for Great Britain. Data from 32% of the species:region combinations met the precision threshold with 80% confidence, and 14% with 98% confidence. There was great variation between taxonomic groups (e.g. butterflies, moths and dragonflies were well recorded) and region (e.g. south-east England was best recorded). These simple criteria provide no indication of the accuracy or representativeness of the trend outputs: this is vital, but needs to be assessed individually. However, our criteria do provide a rapid, quantitative assessment of the predicted suitability of existing data for occupancy analysis and could be used to inform the design and implementation of multi-species citizen science recording projects elsewhere in the world.
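
The 98%-confidence rule of thumb stated in this abstract can be written as a small check. This is a sketch of the two thresholds as the abstract words them (≥29 records in the 10% best-recorded years, or ≥10 for species recorded in fewer than 1 in 25 visits); the function and parameter names are hypothetical, and how the "10% best-recorded years" count is derived from raw records is left to the caller because the abstract does not spell it out.

```python
RECORDS_NEEDED_DEFAULT = 29    # threshold quoted in the abstract
RECORDS_NEEDED_RARE = 10       # threshold quoted for rarely recorded species
RARE_VISIT_FRACTION = 1 / 25   # "recorded in less than 1 in 25 visits"


def likely_meets_precision_threshold(
    records_in_best_years: int,
    fraction_of_visits_with_species: float,
) -> bool:
    """Apply the abstract's 98%-confidence rule of thumb.

    records_in_best_years: records of the focal species in the 10%
        best-recorded years (computed by the caller; an assumed input).
    fraction_of_visits_with_species: share of recording visits for the
        taxonomic group on which the focal species was recorded.
    """
    if records_in_best_years < 0:
        raise ValueError("record count cannot be negative")
    if not 0.0 <= fraction_of_visits_with_species <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")

    if fraction_of_visits_with_species < RARE_VISIT_FRACTION:
        needed = RECORDS_NEEDED_RARE
    else:
        needed = RECORDS_NEEDED_DEFAULT
    return records_in_best_years >= needed
```

As the abstract notes, passing such a check says nothing about the accuracy or representativeness of the resulting trend; it only screens datasets before the resource-intensive occupancy analysis.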
