Standardizing Biologging Data for LifeWatch: Camera Traps, Acoustic Telemetry and GPS Tracking

Author(s):  
Peter Desmet ◽  
Stijn Van Hoey ◽  
Lien Reyserhove ◽  
Dimitri Brosens ◽  
Damiano Oldoni ◽  
...  

The Research Institute for Nature and Forest (INBO) is co-managing three biologging networks as part of a terrestrial and freshwater observatory for LifeWatch Belgium. The networks are a GPS tracking network for large birds, an acoustic receiver network for fish, and a camera trap network for mammals. As part of our mission at the Open science lab for biodiversity, we are publishing the machine observations these networks generate as standardized, open data. One of the challenges, however, is finding the appropriate standards and platforms to do so. In this talk, we will present the three networks, the type of biologging data they collect and how we (plan to) standardize these to specific community standards and to Darwin Core (Wieczorek et al. 2012). Data from the bird tracking network were published in 2014 as one of the first biologging datasets on the Global Biodiversity Information Facility (GBIF) (Stienen et al. 2014). We are now planning to upload the data to Movebank instead and contribute to a generic mapping between the Movebank format and Darwin Core. Data from the acoustic receiver network are being mapped using the Darwin Core guidelines proposed by the Machine Observations Interest Group of Biodiversity Information Standards (TDWG). Images generated by the camera trap network are managed in the annotation system Agouti, for which we plan to export the data in the Camera Trap Metadata Language (Forrester et al. 2016). We also aim to write a software package to deposit camera trap images and data on Zenodo and map the observation data to Darwin Core. We hope that our work will contribute to discussions and guidelines on how to best map biologging data to Darwin Core, which is one of the aims of the TDWG Machine Observations Interest Group.

2018 ◽  
Vol 2 ◽  
pp. e26369
Author(s):  
Michael Trizna

As rapid advances in sequencing technology result in more branches of the tree of life being illuminated, there has actually been a decrease in the percentage of sequence records that are backed by voucher specimens (Trizna 2018b). The good news is that there are tools (Trizna 2017, NCBI 2005, Biocode LLC 2014) that enable well-databased museum vouchers to automatically validate and format specimen and collection metadata for high-quality sequence records. Another problem is that there are millions of existing sequence records that are known to contain either incorrect or incomplete specimen data. I will show an end-to-end example of sequencing specimens from a museum, depositing their sequence records in NCBI's (National Center for Biotechnology Information) GenBank database, and then providing updates to GenBank as the museum database revises identifications. I will also talk about linking records from specimen databases. Over one million records in the Global Biodiversity Information Facility (GBIF; Trizna 2018a) contain a value in the Darwin Core term "associatedSequences", and I will examine what is currently contained in these entries, and how best to format them to ensure that a tight connection is made to sequence records.
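As a sketch of the kind of formatting that makes such connections machine-resolvable, the snippet below normalizes a raw associatedSequences value into stable NCBI nucleotide URLs. The accession pattern is intentionally simplified and the example value is invented; this is an illustration, not the tooling discussed in the talk.

```python
import re

# Simplified pattern for INSDC nucleotide accessions,
# e.g. "AF123456" or "AF123456.1" (1-2 letters, 5-6 digits, optional version).
ACCESSION_RE = re.compile(r"^[A-Z]{1,2}\d{5,6}(\.\d+)?$")

def normalize_associated_sequences(value):
    """Split a raw associatedSequences value on common delimiters and keep
    only tokens that look like plain accessions, returning stable NCBI URLs."""
    links = []
    for token in re.split(r"[|;,\s]+", value.strip()):
        token = token.removeprefix("GenBank:")  # strip one common prefix style
        if ACCESSION_RE.match(token):
            links.append(f"https://www.ncbi.nlm.nih.gov/nuccore/{token}")
    return links

print(normalize_associated_sequences("GenBank:AF123456.1; KX123456"))
```

A real normalizer would also need to handle URLs, DDBJ/ENA identifiers and free-text entries, which is exactly the heterogeneity the abstract describes.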


ZooKeys ◽  
2018 ◽  
Vol 751 ◽  
pp. 129-146 ◽  
Author(s):  
Robert Mesibov

A total of ca. 800,000 occurrence records from the Australian Museum (AM), Museums Victoria (MV) and the New Zealand Arthropod Collection (NZAC) were audited for changes in selected Darwin Core fields after processing by the Atlas of Living Australia (ALA; for AM and MV records) and the Global Biodiversity Information Facility (GBIF; for AM, MV and NZAC records). Formal taxon names in the genus- and species-groups were changed in 13–21% of AM and MV records, depending on dataset and aggregator. There was little agreement between the two aggregators on processed names: two to three times as many records had names changed by one aggregator alone as had names changed by both aggregators. The type status of specimen records did not change with name changes, resulting in confusion as to the name with which a type was associated. Data losses of up to 100% were found in some fields after processing, apparently due to programming errors. The taxonomic usefulness of occurrence records could be improved if aggregators included both the original and the processed taxonomic data items for each record. It is recommended that end-users check original and processed records for data loss and name replacements after processing by aggregators.
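The recommended end-user check amounts to a field-by-field comparison of original and processed versions of each record. The function and example records below are hypothetical illustrations of that idea, not the audit code used in the study.

```python
def audit_record(original, processed):
    """Compare an original Darwin Core record with its aggregator-processed
    version, reporting changed scientific names and fields emptied by
    processing."""
    issues = []
    for field, orig_value in original.items():
        proc_value = processed.get(field, "")
        if orig_value and not proc_value:
            issues.append(f"{field}: value lost in processing")
        elif field == "scientificName" and proc_value != orig_value:
            issues.append(f"scientificName: '{orig_value}' -> '{proc_value}'")
    return issues

# Invented example: the aggregator replaced the name and dropped the locality
original = {"scientificName": "Aus bus", "locality": "Somewhere"}
processed = {"scientificName": "Aus", "locality": ""}
print(audit_record(original, processed))
```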


2018 ◽  
Vol 2 ◽  
pp. e25738 ◽  
Author(s):  
Arturo Ariño ◽  
Daniel Noesgaard ◽  
Angel Hjarding ◽  
Dmitry Schigel

Standards set up by Biodiversity Information Standards-Taxonomic Databases Working Group (TDWG), initially developed as a way to share taxonomical data, greatly facilitated the establishment of the Global Biodiversity Information Facility (GBIF) as the largest index to digitally-accessible primary biodiversity information records (PBR) held by many institutions around the world. The level of detail and coverage of the body of standards that later became the Darwin Core terms enabled increasingly precise retrieval of relevant records useful for increased digitally-accessible knowledge (DAK) which, in turn, may have helped to address ecologically-relevant questions. After more than a decade of data accrual and release, an increasing number of papers and reports are citing GBIF either as a source of data or as a pointer to the original datasets. GBIF has curated a list of over 5,000 such citations, which were examined for content and tagged with additional keywords describing that content. The list now provides a window on what users want to accomplish using such DAK. We performed a preliminary word-frequency analysis of the titles of this literature, which cites GBIF as a resource. Through a standardization and mapping of terms, we examined how the facility-enabled data seem to have been used by scientists and other practitioners through time: what concepts/issues are pervasive, which taxon groups are mostly addressed, and whether data concentrate around specific geographical or biogeographical regions. We hoped to cast light on which types of ecological problems the community believes are amenable to study through the judicious use of this data commons and found that, indeed, a few themes were distinctly more frequently mentioned than others. Among those, generally-perceived issues such as climate change and its effect on biodiversity at global and regional scales seemed prevalent.
The taxonomic groups were also unevenly mentioned, with birds and plants being the most frequently named. However, the entire list of potential subjects that might have used GBIF-enabled data is now quite wide, showing that the availability of well-structured data has spawned a widening spectrum of possible use cases. Among them, some enjoy early and continuous presence (e.g. species, biodiversity, climate) while others have started to show up only later, once a critical mass of data seemed to have been attained (e.g. ecosystems, suitability, endemism). Biodiversity information in the form of standards-compliant DAK may thus already have become a commodity enabling insight into an increasingly more complex and diverse body of science. Paraphrasing Tennyson, more things were wrought by data than TDWG dreamt of.
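A word-frequency analysis over titles, with a term-standardization step of the kind described above, can be sketched in a few lines. The synonym map, stopword list and example titles below are invented for illustration; the actual study's mapping was more extensive.

```python
import re
from collections import Counter

# Hypothetical synonym map: spelling and plural variants collapse to one term
SYNONYMS = {"distributions": "distribution", "modelling": "modeling"}
STOPWORDS = {"the", "of", "a", "in", "and", "for", "on", "using"}

def term_frequencies(titles):
    """Count standardized, non-stopword terms across a list of titles."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            word = SYNONYMS.get(word, word)
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

titles = [
    "Species distribution modelling of birds in a changing climate",
    "Climate change and plant distributions",
]
print(term_frequencies(titles).most_common(3))
```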


Author(s):  
John Waller

I will cover how the Global Biodiversity Information Facility (GBIF) handles data quality issues, with specific focus on coordinate location issues, such as gridded datasets (Fig. 1) and country centroids. I will highlight the challenges GBIF faces identifying potential data quality problems and what we and others (Zizka et al. 2019) are doing to discover and address them. GBIF, the largest open-data portal for biodiversity data, is a network of more than 40,000 individual datasets from various sources and publishers. Since these datasets are variable both within themselves and dataset-to-dataset, this creates a challenge for users wanting to use data collected from museums, smartphones, atlases, satellite tracking, DNA sequencing, and various other sources for research or analysis. Data quality at GBIF will always be a moving target (Chapman 2005), and GBIF already handles many obvious errors such as zero/impossible coordinates, empty or invalid data fields, and fuzzy taxon matching. Since GBIF primarily (but not exclusively) serves lat-lon location information, there is an expectation that occurrences fall somewhat close to where the species actually occurs. This is not always the case. Occurrence records can be hundreds of kilometers away from where the species naturally occurs, and there are multiple reasons why this can happen, which might not be entirely obvious to users. One reason is that many GBIF datasets are gridded. Gridded datasets are datasets that have low resolution due to equally-spaced sampling. This can be a data quality issue because a user might assume an occurrence record was recorded exactly at its coordinates. Country centroids are another reason why a species occurrence record might be far from where it occurs naturally. GBIF does not yet flag country centroids, which are records where the dataset publisher has entered the lat-lon center of a country instead of leaving the field blank.
I will discuss the challenges surrounding locating these issues and the current solutions (such as the CoordinateCleaner R package). I will touch on how existing Darwin Core terms like coordinateUncertaintyInMeters and footprintWKT are being utilized to highlight low coordinate resolution. Finally, I will highlight some other emerging data quality issues and how GBIF is beginning to experiment with dataset-level flagging. Currently we have flagged around 500 datasets as gridded and around 400 datasets as citizen science, but there are many more potential dataset flags.
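One simple heuristic for spotting gridded datasets, in the spirit of (but not identical to) the tests in CoordinateCleaner, is to check whether the gaps between a dataset's unique coordinate values are dominated by a single spacing. The sketch below uses invented example coordinates and a deliberately naive threshold-free score.

```python
from collections import Counter

def dominant_spacing_share(coords, decimals=4):
    """Fraction of nearest-neighbour gaps between sorted unique coordinate
    values that share the single most common spacing. Values near 1.0
    suggest an equally-spaced (gridded) sampling design."""
    values = sorted(set(round(c, decimals) for c in coords))
    gaps = [round(b - a, decimals) for a, b in zip(values, values[1:])]
    if not gaps:
        return 0.0
    _, count = Counter(gaps).most_common(1)[0]
    return count / len(gaps)

# Hypothetical longitudes: a 0.5-degree survey grid vs. ad-hoc sampling
gridded = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
scattered = [4.03, 4.41, 5.17, 5.52, 6.28]
print(dominant_spacing_share(gridded))    # high share -> likely gridded
print(dominant_spacing_share(scattered))  # lower share -> ad-hoc sampling
```

A production check would run this per dataset on both latitudes and longitudes and combine it with a minimum number of records, since small samples produce unreliable shares.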


ZooKeys ◽  
2018 ◽  
Vol 779 ◽  
pp. 51-88 ◽  
Author(s):  
Peter J. Taylor ◽  
Götz Neef ◽  
Mark Keith ◽  
Sina Weier ◽  
Ara Monadjem ◽  
...  

Using various sources, including the Global Biodiversity Information Facility (GBIF), published literature, recent (2015–2017) collections, as well as bat detector and camera trap surveys with opportunistic sightings and live capture in the upper Okavango catchment in central Angola, we present an updated mammal checklist of 275 species from 15 different orders for Angola (including the Cabinda region). Recent surveys (captures and bat detectors) of small mammals from the upper Okavango catchment yielded 46 species (33 species of bats, ten species of rodents and three species of shrews). One bat (Pipistrellus rusticus, rusty pipistrelle), two rodents (Mus setzeri, Setzer's mouse, and Zelotomys woosnami, Woosnam's broad-faced mouse) and one shrew (Suncus varilla, lesser dwarf shrew) were captured for the first time in Angola. While our species lists of bats conformed to predicted totals, terrestrial small mammals were undersampled, with only 13 species recorded by our trapping survey compared to a total of 42 shrew and rodent species expected based on GBIF records for the central Angolan highlands. Seven terrestrial small mammal species (one shrew and six rodents) are endemic to the central and western Angolan highlands but none of these were captured in our survey. The bat detector surveys added three further bat species to the country list: Pipistrellus hesperidus, Kerivoula argentata, and Mops midas. Camera trap surveys and opportunistic sightings in the upper Okavango catchment in 2016 yielded a total of 35 species of medium-large mammals, from 17 families, although all of these had been reported previously in Angola. GBIF proved to be an excellent source of biodiversity data for Angolan mammals, most importantly for documenting dramatic historical range changes of larger mammals such as the sable (Hippotragus niger niger), Kirk's sable (H. niger kirkii) and the giant sable (H. niger variani).


Author(s):  
Peter Desmet ◽  
Jakub Bubnicki ◽  
Ben Norton

Camera trapping is one of the most important technologies in conservation and ecological research and a well-established, non-invasive method of collecting field data on animal abundance, distribution, behaviour, temporal activity, and space use (Wearn and Glover-Kapfer 2019). Collectively, camera trapping projects are generating a massive and continuous flow of data, consisting of images and videos (with and without animal observations) and associated identifications (Scotson et al. 2017, Kays et al. 2020). In recent years, significant progress has been made by the global camera trapping community to resolve the challenges this brings, from the development of specialized data management tools and analytical packages, to the application of cloud computing and artificial intelligence to automate species recognition (Tabak et al. 2018). However, to effectively exchange camera trap data between infrastructures and to (automatically) harmonize data into large-scale wildlife datasets, there is a need for a common data exchange format—one that captures the essential information about a camera trap study, allows expression of different study and identification approaches, and aligns well with existing biodiversity standards such as Darwin Core (Wieczorek et al. 2012). Here we present Camera Trap Data Package (Camtrap DP), a data exchange format for camera trap data. It is managed by the Machine Observations Interest Group of Biodiversity Information Standards (TDWG) and developed publicly, soliciting community feedback for every change. Camtrap DP is built on Frictionless Standards, a set of generic specifications to describe and package (tabular) data and metadata. Camtrap DP extends these with specific requirements and constraints for camera trap data. By building on an existing framework, users can employ existing open source software to read and validate Camtrap DP formatted data. 
Validation in particular is useful to automatically check whether provided data meet the requirements set forth by Camtrap DP, before analysis or integration. Supported by the major camera trap data management systems (e.g. Agouti, TRAPPER, eMammal, and Wildlife Insights), Camtrap DP is reaching its first stable version. The first Camtrap DP dataset was published on Zenodo (Cartuyvels et al. 2021b). This dataset was also published to the Global Biodiversity Information Facility (GBIF) (Cartuyvels et al. 2021a), demonstrating the possibilities and limitations of transforming the data to the Darwin Core standard.
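The structural part of such validation can be illustrated with a minimal sketch. Real validation uses Frictionless software against the published Camtrap DP profile and checks far more (table schemas, constraints, relations between resources); only the three required resource names below are taken from Camtrap DP, the rest is an invented simplification.

```python
# Camtrap DP requires these three tabular resources in a Data Package
REQUIRED_RESOURCES = {"deployments", "media", "observations"}

def check_descriptor(descriptor):
    """Return a list of problems with a Data Package descriptor dict;
    an empty list means the basic shape looks OK."""
    if "resources" not in descriptor:
        return ["descriptor has no 'resources' property"]
    names = {resource.get("name") for resource in descriptor["resources"]}
    return [
        f"missing required resource: {name}"
        for name in sorted(REQUIRED_RESOURCES - names)
    ]

# Invented descriptor that forgot its observations table
descriptor = {"resources": [{"name": "deployments"}, {"name": "media"}]}
print(check_descriptor(descriptor))
```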


2020 ◽  
Vol 15 (4) ◽  
pp. 411-437 ◽  
Author(s):  
Marcos Zárate ◽  
Germán Braun ◽  
Pablo Fillottrani ◽  
Claudio Delrieux ◽  
Mirtha Lewis

Great progress has recently been made in digitizing the world's available Biodiversity and Biogeography data, but managing data from many different providers and research domains still remains a challenge. A review of the current landscape of metadata standards and ontologies in Biodiversity sciences suggests that existing standards, such as the Darwin Core terminology, are inadequate for describing Biodiversity data in a semantically meaningful and computationally useful way. As a contribution to fill this gap, we present an ontology-based system, called BiGe-Onto, designed to manage data from Biodiversity and Biogeography together. As data sources, we use two internationally recognized repositories: the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS). The BiGe-Onto system is composed of (i) the BiGe-Onto Architecture, (ii) a conceptual model called BiGe-Onto specified in OntoUML, (iii) an operational version of BiGe-Onto encoded in OWL 2, and (iv) an integrated dataset for its exploitation through a SPARQL endpoint. We will show use cases that allow researchers to answer questions that draw on information from both domains.


Author(s):  
Jim Casaer ◽  
Tanja Milotic ◽  
Yorick Liefting ◽  
Peter Desmet ◽  
Patrick Jansen

Camera traps placed in the field photograph warm-bodied animals that pass in front of an infrared sensor. The imagery represents a rich source of data on mammals larger than ~200 grams, providing information at the level of species and communities. Camera-trap surveys generate observations of specific mammals at a certain location and time, including photo evidence that can be evaluated by experts to map species distribution patterns. The imagery also provides information on the species composition of local communities, identifying which species co-occur and in what proportion. Moreover, the images contain information on activity patterns and other interesting aspects of animal behaviour. Because surveys can be standardized relatively easily, camera traps are well suited for documenting shifts in behaviour, distribution and community composition, for example in response to climate and land-use change. Imagery from camera traps can thus serve as a baseline for subsequent surveys. In less than two decades, camera traps have become the standard tool for surveying mammals. They are simple to use and non-invasive, requiring no special permits. As a consequence they are widely used by professionals and hobbyists alike. Together, tens of thousands of users have the potential to form a huge sensor network. Unfortunately, however, the imagery and data collected are currently rarely integrated. Rather, they are lost at a massive scale. Users tend to retain only a subset of the photos and discard the rest, or the material ends up on an external hard disk that will at some point fail or be erased, as these scientific data tend to be used only within the scope of specific projects. Very little of this wealth of material becomes available for scientific research and monitoring. Moreover, joint projects are rare and there is little coordination between camera-trap users.
A solution to this problem is provided by Agouti, a platform for the organization, processing and storage of camera-trap imagery (www.agouti.eu). The aim of Agouti is, on the one hand, to standardize and facilitate collaborative camera-trap surveys, and on the other hand to compile and secure imagery and data for scientific research and monitoring, by encouraging users to share their material. Agouti provides an interface that allows users to collaborate on projects, organize and manage their surveys, upload and store imagery, and annotate images with species identifications and characteristics. Images can also be annotated through basic image recognition and crowdsourcing via a connection with the citizen science platform Zooniverse, which creates the potential to reach new audiences. Exporting data and imagery in the Camera Trap Metadata Standard (Forrester et al. 2016) will be supported in the near future. This will allow data to be archived outside of Agouti in research repositories such as Zenodo and, by further mapping to Darwin Core, to be made discoverable on the Global Biodiversity Information Facility (GBIF). Agouti provides both professionals and the public with a practical solution for retaining camera-trap surveys and simultaneously engages people in contributing data to science in a standardized and organized manner, to the benefit of science and conservation.


2018 ◽  
Vol 2 ◽  
pp. e25642
Author(s):  
Annie Simpson

Biodiversity Information Serving our Nation - BISON (bison.usgs.gov) is the U.S. node to the Global Biodiversity Information Facility (gbif.org), containing more than 375 million documented locations for all species in the U.S. It is hosted by the United States Geological Survey (USGS) and includes a web site and an application programming interface that apps and other websites can use for free. With this massive database, one can see not only the 15 million records for nearly 10 thousand non-native species in the U.S. and its territories, but also their relationship to all of the other species in the country as well as their full national range. Leveraging this huge resource and its enterprise-level cyberinfrastructure, USGS BISON staff have created a value-added feature by labeling non-native species records, even where contributing datasets have not provided such labels. Based on our ongoing four-year compilation of non-native species scientific names from the literature, specific examples will be shared about the ambiguity and evolution of terms that have been discovered, as they relate to invasiveness, impact, dispersal, and management. The idea of incorporating these terms into an invasive species extension to Darwin Core has been discussed by Biodiversity Information Standards (TDWG) working group participants since at least 2005. One roadblock to the implementation of this standard's extension has been the diverse terminology used to describe the characteristics of biological invasions, terminology which has evolved significantly over the past decade.


2018 ◽  
Vol 2 ◽  
pp. e25839
Author(s):  
Lise Stork ◽  
Andreas Weber ◽  
Eulàlia Miracle ◽  
Katherine Wolstencroft

Geographical and taxonomical referencing of specimens and documented species observations from within and across natural history collections is vital for ongoing species research. However, much of the historical data, such as field books, diaries and specimens, are challenging to work with. They are computationally inaccessible, refer to historical place names and taxonomies, and are written in a variety of languages. In order to address these challenges and elucidate historical species observation data, we developed a workflow to (i) crowd-source semantic annotations from handwritten species observations, (ii) transform them into RDF (Resource Description Framework) and (iii) store and link them in a knowledge base. Instead of full transcription, we directly annotate digital field book scans with key concepts that are based on Darwin Core standards. Our workflow stresses the importance of verbatim annotation. The interpretation of the historical content, such as resolving a historical taxon to a current one, can be done by individual researchers after the content is published as linked open data. Through the storage of annotation provenance (who created the annotation and when), we allow multiple interpretations of the content to exist in parallel, stimulating scientific discourse. The semantic annotation process is supported by a web application, the Semantic Field Book (SFB)-Annotator, driven by an application ontology. The ontology formally describes the content and metadata required to semantically annotate species observations. It is based on the Darwin Core standard (DwC), Uberon and the Geonames ontology. The provenance of annotations is stored using the Web Annotation Data Model. Adhering to the principles of FAIR (Findable, Accessible, Interoperable & Reusable) data and Linked Open Data, the content of the specimen collections can be interpreted homogeneously and aggregated across datasets. This work is part of the Making Sense project: makingsenseproject.org.
The project aims to disclose the content of a natural history collection: a 17,000-page account of the exploration of the Indonesian Archipelago between 1820 and 1850 (Natuurkundige Commissie voor Nederlands-Indie). With a knowledge base, researchers are given easy access to the primary sources of natural history collections. For their research, they can aggregate species observations, construct rich queries to browse through the data and add their own interpretations regarding the meaning of the historical content.
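The verbatim-annotation idea can be illustrated by serializing one annotated field-book concept as RDF. The sketch below emits N-Triples by hand (a real pipeline would use an RDF library); the dwc: namespace is the real Darwin Core one, while the record URI and values are invented.

```python
# Real Darwin Core term namespace; record URIs and values below are invented
DWC = "http://rs.tdwg.org/dwc/terms/"

def annotation_to_ntriples(record_uri, fields):
    """Serialize verbatim annotations of one scanned observation as
    N-Triples, one triple per annotated Darwin Core term."""
    lines = []
    for term, value in fields.items():
        escaped = value.replace('"', '\\"')  # literals kept verbatim
        lines.append(f'<{record_uri}> <{DWC}{term}> "{escaped}" .')
    return "\n".join(lines)

triples = annotation_to_ntriples(
    "http://example.org/annotation/page17-obs1",
    {"verbatimIdentification": "Sciurus sp.", "verbatimLocality": "Buitenzorg"},
)
print(triples)
```

Keeping the annotated values verbatim, as the workflow stresses, leaves later interpretation (e.g. resolving the historical locality or taxon) to separate, provenance-tracked triples.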

