NaturalHeritage: Bridging Belgian natural history collections

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37854 ◽

2019 ◽

Vol 3 ◽

Author(s):

Franck Theeten ◽

Marielle Adam ◽

Thomas Vandenberghe ◽

Mathias Dillen ◽

Patrick Semal ◽

...

Keyword(s):

Natural History ◽

Marine Species ◽

Biodiversity Data ◽

Data Centre ◽

The World ◽

Collection Data ◽

Oceanographic Data ◽

Human Validation ◽

Data Requirements ◽

Biodiversity Information

The Royal Belgian Institute of Natural Sciences (RBINS), the Royal Museum for Central Africa (RMCA) and Meise Botanic Garden house more than 50 million specimens covering all fields of natural history. While different research topics have their own specificities, it has become apparent over the years that collection-holding institutions face similar challenges with regard to collection data management, data publication and exchange via community standards (James et al. 2018, Rocha et al. 2014). In the past, Belgian natural history institutions have tackled these challenges in different ways. In addition to local and national collaborations, there is a great need for a joint structure to share data between scientific institutions in Europe and beyond. Large networks and infrastructures such as the Global Biodiversity Information Facility (GBIF), Biodiversity Information Standards (TDWG), the Distributed System of Scientific Collections (DiSSCo) and the Consortium of European Taxonomic Facilities (CETAF) aim to further implement and improve these efforts, thereby gaining ever-increasing efficiencies. In this context, the three institutions mentioned above submitted the NaturalHeritage project (http://www.belspo.be/belspo/brain-be/themes_3_HebrHistoScien_en.stm), which was granted in 2017 by the Belgian Science Policy Service and runs from 2017 to 2020. The project provides links among databases and services. The unique qualities of each database are maintained, while the information can be concentrated and exposed in a structured way via one access point. This approach also aims to link data that are unconnected at present (e.g. the relationship between soil/substrate, vegetation and associated fauna) and to improve the cross-validation of data. (1) The NaturalHeritage prototype (http://www.naturalheritage.be) is a shared research portal with an open access infrastructure, which is still in the development phase. 
Its backbone is an ElasticSearch catalogue, with Kibana, and a Python aggregator gathering several types of (re)sources: relational databases, REpresentational State Transfer (REST) services of object databases and bibliographical data, collections metadata and the GBIF Integrated Publishing Toolkit (IPT) for observational and taxonomical data. Semi-structured data in English are semantically analysed and linked to a rich autocomplete mechanism. Keywords and identifiers are indexed and grouped in four categories (“what”, “who”, “where”, “when”). The portal can also act as an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) service and ease the indexing of the original webpage on the internet through microdata enrichment. (2) The DaRWIN (Data Research Warehouse Information Network) collection data management system of RBINS and RMCA has been improved as well. External (meta)data requirements, i.e. foremost publication into or according to the practices and standards of GBIF and OBIS (Ocean Biogeographic Information System: https://obis.org) for biodiversity data, and INSPIRE (https://inspire.ec.europa.eu) for geological data, have been identified and evaluated. New and extended data structures have been created to comply with these standards, and the procedures necessary to expose the data have been developed. Quality control tools for taxonomic and geographic names have been developed. Geographic names can be hard to confirm, as their lack of context often requires human validation. To address this, a similarity measure is used to help map the result. Species, locations, sampling devices and other properties have been mapped to the World Register of Marine Species and DarwinCore (http://www.marinespecies.org), Marine Regions and GeoNames, the AGRO Agronomy and Vertebrate trait ontologies and the British Oceanographic Data Centre (BODC) vocabularies (http://www.obofoundry.org/ontology/agro.html). 
Extensive mapping is necessary to make use of the ExtendedMeasurementOrFact Extension of DarwinCore (https://tools.gbif.org/dwca-validator/extensions.do).
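
The abstract above mentions a similarity measure that helps map geographic names before a human validates them, but it does not say which measure is used. The following is a minimal sketch under that caveat: it assumes a character-based ratio from Python's standard `difflib` module, and the gazetteer entries, the threshold and the function names are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 score for two names, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def rank_candidates(verbatim, gazetteer, threshold=0.8):
    """Rank gazetteer names by similarity to a verbatim locality string.

    Only names scoring at or above the (assumed) threshold are kept, so that
    a curator reviews a short list instead of the whole gazetteer.
    """
    scored = [(name, similarity(verbatim, name)) for name in gazetteer]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical gazetteer extract; a real one might come from GeoNames or Marine Regions.
GAZETTEER = ["Oostende", "Ostend", "Oostduinkerke", "Antwerpen"]

candidates = rank_candidates("Oostend", GAZETTEER)
for name, score in candidates:
    print(f"{name}: {score:.2f}")
```

The score only orders the candidates. As the abstract notes, names often lack the context needed for an automatic decision, so the final choice would still be made by a person.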

Is Your Collection Ambiguous?

Biodiversity Information Science and Standards ◽

10.3897/biss.5.73702 ◽

2021 ◽

Vol 5 ◽

Author(s):

Mathias Dillen ◽

Elspeth Haston ◽

Nicole Kearney ◽

Deborah L Paul ◽

Joaquim Santos ◽

...

Keyword(s):

Natural History ◽

Gap Analysis ◽

Data Entry ◽

Task Group ◽

Ethical Guidelines ◽

Biodiversity Data ◽

The People ◽

New Information ◽

Collection Data ◽

Data Gap

The natural history specimens of the world have been documented on paper labels, often physically attached to the specimen itself. As we transcribe these data to make them digital and more useful for analysis, we make interpretations. Sometimes these interpretations are trivial, because the label is unambiguous, but often the meaning is not so clear, even if it is easily read. One key element that suffers from considerable ambiguity is people’s names. Though a person is indivisible, their name can change, is rarely unique and can be written in many ways. Yet knowing the people associated with data is incredibly useful. Data on people can be used to validate other data, simplify data capture, link together data across domains, reduce duplication-of-effort and facilitate data-gap-analysis. In addition, people data enable the discovery of individuals unique to our collections, the collective charting of the history of scientific researchers and the provision of credit to the people who deserve it (Groom et al. 2020). We foresee a future where the people associated with collections are not ambiguous, are shared globally, and data of all kinds are linked through the people who generate them. The TDWG People in Biodiversity Data Task Group is therefore working on a guide to the disambiguation of people in natural history collections. The ultimate goal is to connect the various strings of characters on specimen labels and other documentation to persistent identifiers (PIDs) that unambiguously link a name “string” to the identity of a person. In working towards this goal, 150 volunteers in the Bionomia project have linked 21 million specimens to persistent identifiers for their collectors and determiners. An additional 2 million specimens with links to identifiers for people have already emerged directly from collections that make use of the recently ratified Darwin Core terms recordedByID and identifiedByID. 
Furthermore, the CETAF Botany Pilot conducted among a group of European herbaria and museums has connected over 1.4 million specimens to disambiguated collectors (Güntsch et al. 2021). Still, given the estimated 2 billion (Ariño 2010) natural history specimens globally, there is much more disambiguation to be done. The process of disambiguation starts with a trigger, which is often the transcription of a specimen’s label data. Unambiguous identification of the collector may facilitate this transcription, as it offers knowledge of their biographical details and collecting habits, allowing us to infer missing information such as collecting date or locality. Another trigger might be the flagging of inconsistent data during data entry or resulting from data quality processes, revealing for instance that multiple collectors have been conflated. A disambiguation trigger is followed by the gathering of data, then the evaluation of the results and finally by the documentation of the new information. Disambiguation is not always straightforward and there are many pitfalls. It requires access to biographical data, and identifiers to be minted. In the case of living people, they have to cooperate with being disambiguated and we have to follow legal and ethical guidelines. In the case of dead people, particularly those long dead, disambiguation may require considerable research. We will present the progress made by the People in Biodiversity Data Task Group and their recommendations for disambiguation in collections. We want to encourage other institutions to engage with a global effort of linking people to persistent identifiers to collaboratively improve all collection data.
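
The goal described above, connecting name strings on labels to persistent identifiers, can be illustrated with a small sketch. `recordedBy` and `recordedByID` are the Darwin Core terms named in the abstract; everything else here (the normalisation rule, the lookup table, the `example.org` identifier and the collector name) is a hypothetical assumption and not the Task Group's recommended procedure.

```python
import re
import unicodedata


def normalise(name: str) -> str:
    """Reduce a verbatim name string to a comparison key.

    Accents, punctuation, case and token order are ignored, so that
    "Müller, A." and "A. MULLER" produce the same key.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z]+", without_accents.lower())
    return " ".join(sorted(tokens))


# Hypothetical curated lookup for ONE collection: key -> persistent identifier.
KNOWN_PEOPLE = {
    normalise("Müller, A."): "https://example.org/person/0001",
}


def add_recorded_by_id(record: dict) -> dict:
    """Return a copy of the record with recordedByID filled in when the key is known."""
    pid = KNOWN_PEOPLE.get(normalise(record.get("recordedBy", "")))
    linked = dict(record)
    if pid is not None:
        linked["recordedByID"] = pid
    return linked


print(add_recorded_by_id({"recordedBy": "A. MULLER"}))
print(add_recorded_by_id({"recordedBy": "B. Muller"}))
```

A matching key is only a candidate link. Because names are rarely unique, a match of this kind would still need the evaluation and documentation steps that the abstract describes before it is recorded.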

Translating TDWG Controlled Vocabularies

Biodiversity Information Science and Standards ◽

10.3897/biss.5.79050 ◽

2021 ◽

Vol 5 ◽

Author(s):

Steven J Baskauf ◽

Paula Zermoglio

Keyword(s):

Small Groups ◽

Development Process ◽

Biodiversity Data ◽

Controlled Vocabularies ◽

The World ◽

English Speaking ◽

Standards Development ◽

Biodiversity Information

Users may be more likely to understand and utilize standards if they are able to read labels and definitions of terms in their own languages. Increasing standards usage in non-English speaking parts of the world will be important for making biodiversity data from across the globe more uniformly available. For these reasons, it is important for Biodiversity Information Standards (TDWG) to make its standards widely available in as many languages as possible. Currently, TDWG has six ratified controlled vocabularies that were originally available only in English. As an outcome of this workshop, we have made term labels and definitions in those vocabularies available in the languages of translators who participated in its sessions. In the introduction, we reviewed the concept of vocabularies, explained the distinction between term labels and controlled value strings, and described how multilingual labels and definitions fit into the standards development process. The introduction was followed by working sessions in which individual translators or small groups working in a single language filled out Google Sheets with their translations. The resulting translations were compiled along with attribution information for the translators and made freely available in JavaScript Object Notation (JSON) and comma-separated values (CSV) formats.
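
The distinction between a controlled value string and its term labels can be sketched as follows. The controlled value stays identical in every dataset, while the label shown to a reader depends on language. The JSON layout, the field names and the Spanish label below are illustrative assumptions and are not copied from the published translation files.

```python
import json

# Hypothetical layout: one entry per term, with labels keyed by language code.
TRANSLATIONS_JSON = """
[
  {"controlled_value": "native",
   "labels": {"en": "native", "es": "nativo"}}
]
"""


def label_for(terms, controlled_value, lang, fallback="en"):
    """Return the label for a controlled value, falling back to English.

    The controlled value string itself is never translated; only the
    human-readable label changes with the requested language.
    """
    for term in terms:
        if term["controlled_value"] == controlled_value:
            labels = term["labels"]
            return labels.get(lang, labels[fallback])
    raise KeyError(controlled_value)


terms = json.loads(TRANSLATIONS_JSON)
print(label_for(terms, "native", "es"))  # label in Spanish
print(label_for(terms, "native", "fr"))  # no French label here, so English is used
```

Keeping the data value fixed and translating only the label means that records from different language communities remain directly comparable.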

Unlocking the Entomological Collection of the Natural History Museum of Maputo, Mozambique

Biodiversity Data Journal ◽

10.3897/bdj.9.e64461 ◽

2021 ◽

Vol 9 ◽

Author(s):

Domingos Sandramo ◽

Enrico Nicosia ◽

Silvio Cianciullo ◽

Bernardo Muatinte ◽

Almeida Guissamulo

Keyword(s):

Natural History ◽

Crucial Role ◽

Development Programme ◽

Natural History Museum ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

History Museum ◽

Data Portal ◽

Global Biodiversity ◽

Biodiversity Information

The collections of the Natural History Museum of Maputo have a crucial role in the safeguarding of Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. This paper describes a dataset based on the Museum’s Entomological Collection, which records 409 species belonging to seven orders and 48 families. Each specimen’s available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were obtained between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (once known as “Museu Álvaro de Castro”) in all the country’s provinces, with the exception of Cabo Delgado Province. This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The insect dataset is available on the GBIF Engine data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on BioNoMo (https://bionomo.openscidata.org), the Mozambican national portal of biodiversity data, developed by the SECOSUD II Project.

Using the Taxonomic Backbone(s): The challenge of selecting a taxonomic resource and integrating it with a collection management solution

Biodiversity Information Science and Standards ◽

10.3897/biss.5.74115 ◽

2021 ◽

Vol 5 ◽

Author(s):

Teresa Mayfield-Meyer ◽

Phyllis Sharp ◽

Dusty McDonald

Keyword(s):

Marine Invertebrate ◽

Marine Species ◽

Collection Management ◽

Global Biodiversity Information Facility ◽

The World ◽

Name Matching ◽

Research Grade ◽

Global Biodiversity ◽

Biodiversity Information ◽

Insight Into

The reality is that there is no single “taxonomic backbone”, there are many: the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy, the World Register of Marine Species (WoRMS) and MolluscaBase, to name a few. We could view each one of these as a vertebra on the taxonomic backbone, but even that isn’t quite correct as some of these are nested within others (MolluscaBase contributes to WoRMS, which contributes to Catalogue of Life, which contributes to the GBIF Backbone Taxonomy). How is a collection manager without expertise in a given set of taxa and a limited amount of time devoted to finding the “most current” taxonomy supposed to maintain research grade identifications when there are so many seemingly authoritative taxonomic resources? And once a resource is chosen, how can they seamlessly use the information in that resource? This presentation will document how the Arctos community’s use of the taxon name matching service Global Names Architecture (GNA) led one volunteer team leader in a marine invertebrate collection to attempt to make use of WoRMS taxonomy and how her persistence brought better identifications and classifications to a community of collections. It will also provide insight into some of the technical and curatorial challenges involved in using an outside resource as well as the ongoing struggle to keep up with changes as they occur in the curated resource.

FinBIF: An all-embracing, integrated, cross-sectoral biodiversity data infrastructure

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37253 ◽

2019 ◽

Vol 3 ◽

Author(s):

Leif Schulman ◽

Aino Juslén ◽

Kari Lahti

Keyword(s):

Natural History ◽

Data Management ◽

Species Identification ◽

Large Scale ◽

Dna Barcode ◽

National Research Council ◽

Observation Data ◽

Biodiversity Data ◽

Research Infrastructures ◽

Biodiversity Information

The service model of the Global Biodiversity Information Facility (GBIF) is being implemented in an increasing number of national biodiversity (BD) data services. While GBIF already shares >10⁹ data points, national initiatives are an essential component: increase in GBIF-mediated data relies on national data mobilisation and GBIF is not optimised to support local use. The Finnish Biodiversity Information Facility (FinBIF), initiated in 2012 and operational since late 2016, is one of the more recent examples of national BD research infrastructures (RIs) – and arguably among the most comprehensive. Here, we describe FinBIF’s development and service integration, and provide a model approach for the construction of all-inclusive national BD RIs. FinBIF integrates a wide array of BD RI approaches under the same umbrella. These include large-scale and multi-technology digitisation of natural history collections; building a national DNA barcode reference library and linking it to species occurrence data; citizen science platforms enabling recording, managing and sharing of observation data; management and sharing of restricted data among authorities; community-driven species identification support; an e-learning environment for species identification; and IUCN Red Listing (Fig. 1). FinBIF’s aims are to accelerate digitisation, mobilisation, and distribution of biodiversity data and to boost their use in research and education, environmental administration, and the private sector. The core functionalities of FinBIF were built in a 3.5-year project (01/2015–06/2018) by a consortium of four university-based natural history collection facilities led by the Finnish Museum of Natural History Luomus. Close to 30% of the total funding was granted through the Finnish Research Infrastructures programme (FIRI) governed by the national research council and based on scientific excellence. 
Government funds for productivity enhancement in state administration covered c. 40% of the development and the rest was self-financed by the implementing consortium of organisations that have both a research and an education mission. The cross-sectoral scope of FinBIF has led to rapid uptake and a broad user base of its functionalities and services. Not only researchers but also administrative authorities, various enterprises and a large number of private citizens show a significant interest in the RI (Table 1). FinBIF is now in its second construction cycle (2019–2022), funded through the FIRI programme and, thus, focused on researcher services. The work programme includes integration of tools for data management in ecological restoration and e-Lab tools for spatial analyses, morphometric analysis of 3D images, species identification from sound recordings, and metagenomics analyses.

Harvestmen occurrence database (Arachnida, Opiliones) of the Museu Paraense Emílio Goeldi, Brazil

Biodiversity Data Journal ◽

10.3897/bdj.7.e47456 ◽

2019 ◽

Vol 7 ◽

Author(s):

Valéria da Silva ◽

Manoel Aguiar-Neto ◽

Dan Teixeira ◽

Cleverson Santos ◽

Marcos de Sousa ◽

...

Keyword(s):

Public Consultation ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

The Third ◽

The World ◽

Northern Brazil ◽

Global Biodiversity ◽

The Government ◽

Biodiversity Information ◽

Brazilian Biodiversity

We present a dataset with information from the Opiliones collection of the Museu Paraense Emílio Goeldi, Northern Brazil. This collection currently has 6,400 specimens distributed in 13 families, 30 genera and 32 species, and includes holotypes of four species: Imeri ajuba Coronato-Ribeiro, Pinto-da-Rocha & Rheims, 2013, Phareicranaus patauateua Pinto-da-Rocha & Bonaldo, 2011, Protimesius trocaraincola Pinto-da-Rocha, 1997 and Sickesia tremembe Pinto-da-Rocha & Carvalho, 2009. The material of the collection is exclusively from Brazil, mostly from the Amazon Region. The dataset is now available for public consultation on the Sistema de Informação sobre a Biodiversidade Brasileira (SiBBr) (https://ipt.sibbr.gov.br/goeldi/resource?r=museuparaenseemiliogoeldi-collection-aracnologiaopiliones). SiBBr is the Brazilian Biodiversity Information System, an initiative of the government and the Brazilian node of the Global Biodiversity Information Facility (GBIF), which aims to consolidate and make primary biodiversity data available on a platform (Dias et al. 2017). Harvestmen or Opiliones constitute the third largest arachnid order, with approximately 6,500 described species. Brazil holds the greatest diversity in the world, with more than 1,000 described species, 95% (960 species) of which are endemic to the country. Of these, 32 species were identified and deposited in the collection of the Museu Paraense Emílio Goeldi.

People: From a Collection Manager's Viewpoint

Biodiversity Information Science and Standards ◽

10.3897/biss.3.35709 ◽

2019 ◽

Vol 3 ◽

Author(s):

Elspeth Haston ◽

Lorna Mitchell

Keyword(s):

Natural History ◽

Direct Result ◽

Global Biodiversity Information Facility ◽

Natural History Collections ◽

Plant Names ◽

The World ◽

Global Biodiversity ◽

Biodiversity Information ◽

Do So ◽

Residual Number

The specimens held in natural history collections around the world are the direct result of the effort of thousands of people over hundreds of years. However, the way that the names of these people have been recorded within the collections has never been fully standardised, and this makes the process of correctly assigning the event relating to the specimen to an individual difficult at best, and impossible at worst. The events in which people are related to specimens include collecting, identifying, naming, loaning and owning. Whilst there are resources in the botanical community that hold information on many collectors and authors of plant names, the residual number of unknown people and the effort required to disambiguate them is daunting. Moreover, the work carried out within the collection to disambiguate the names relating to the specimens is often not recorded and made available, generally due to the lack of a system to do so. This situation makes it extremely difficult to search for collections within the main aggregators, such as the Global Biodiversity Information Facility (GBIF), and severely hampers our ability to link collections both within and between institutes and disciplines. When we look at the benefits of linking collections and people, the need to agree on and implement a system for managing people’s names becomes increasingly urgent.

Creating a National Biodiversity Database in Gabon and the Challenges of Mobilizing Natural History Data for Francophone Countries

Biodiversity Information Science and Standards ◽

10.3897/biss.5.75643 ◽

2021 ◽

Vol 5 ◽

Author(s):

Elie Tobi ◽

Geovanne Aymar Nziengui Djiembi ◽

Anna Feistner ◽

Donald Midoko Iponga ◽

Jean Felicien Liwouwou ◽

...

Keyword(s):

Natural History ◽

Spoken Language ◽

Decision Makers ◽

Global Biodiversity Information Facility ◽

Major Barrier ◽

French Speaking ◽

Collection Data ◽

Biodiversity Database ◽

Biodiversity Information ◽

Different Parts

Language is a major barrier for researchers wanting to digitize and publish collection data in Africa. Despite being the fifth most spoken language on Earth and the second most common in Africa, resources in French about digitization, data management, and publishing are lacking. Furthermore, French-speaking regions of Africa (primarily Central/West Africa and Madagascar) host some of the highest biodiversity on the continent and therefore are of great importance to scientists and decision-makers. Without having representation in online portals like the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio), these important collections are effectively invisible. Producing relevant/applicable resources about digitization in French will help shine a light on these valuable natural history records and allow the data-holders in Africa to retain the autonomy of their collections. Awarded a GBIF-BID (Biodiversity Information for Development) grant in 2021, an international, multilingual network of partners has undertaken the important task of digitizing and mobilizing Gabon’s vertebrate collections. There are an estimated 13,500 vertebrate specimens housed in five institutions in different parts of Gabon. To date, the group has mobilized >4,600 vertebrate records to our recently launched Gabon Biodiversity Portal (https://gabonbiota.org/). The portal also hosts French guides for using Symbiota-based portals to manage, georeference, and publish natural history databases. These resources can provide much-needed guidance for other Francophone countries, in Africa and beyond, working to maximize the accessibility and value of their biodiversity collections.
