FinBIF: An all-embracing, integrated, cross-sectoral biodiversity data infrastructure

Author(s):  
Leif Schulman ◽  
Aino Juslén ◽  
Kari Lahti

The service model of the Global Biodiversity Information Facility (GBIF) is being implemented in an increasing number of national biodiversity (BD) data services. While GBIF already shares >10⁹ data points, national initiatives are an essential component: growth in GBIF-mediated data relies on national data mobilisation, and GBIF is not optimised to support local use. The Finnish Biodiversity Information Facility (FinBIF), initiated in 2012 and operational since late 2016, is one of the more recent examples of national BD research infrastructures (RIs) – and arguably among the most comprehensive. Here, we describe FinBIF’s development and service integration, and provide a model approach for the construction of all-inclusive national BD RIs. FinBIF integrates a wide array of BD RI approaches under the same umbrella. These include large-scale and multi-technology digitisation of natural history collections; building a national DNA barcode reference library and linking it to species occurrence data; citizen science platforms enabling recording, managing and sharing of observation data; management and sharing of restricted data among authorities; community-driven species identification support; an e-learning environment for species identification; and IUCN Red Listing (Fig. 1). FinBIF’s aims are to accelerate digitisation, mobilisation, and distribution of biodiversity data and to boost their use in research and education, environmental administration, and the private sector. The core functionalities of FinBIF were built in a 3.5-year project (01/2015–06/2018) by a consortium of four university-based natural history collection facilities led by the Finnish Museum of Natural History Luomus. Close to 30% of the total funding was granted through the Finnish Research Infrastructures programme (FIRI), governed by the national research council and based on scientific excellence.
Government funds for productivity enhancement in state administration covered c. 40% of the development, and the rest was self-financed by the implementing consortium of organisations, all of which have both a research and an education mission. The cross-sectoral scope of FinBIF has led to rapid uptake and a broad user base for its functionalities and services. Not only researchers but also administrative authorities, various enterprises and a large number of private citizens show significant interest in the RI (Table 1). FinBIF is now in its second construction cycle (2019–2022), funded through the FIRI programme and thus focused on researcher services. The work programme includes integration of tools for data management in ecological restoration and e-Lab tools for spatial analyses, morphometric analysis of 3D images, species identification from sound recordings, and metagenomic analyses.
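Researcher services of this kind are typically consumed over a web API. The sketch below shows how an occurrence query might be composed; the base URL and parameter names are placeholder assumptions for illustration, not FinBIF's documented interface.

```python
from urllib.parse import urlencode

# Hypothetical base URL and parameter names, for illustration only;
# consult the FinBIF (laji.fi) API documentation for the real interface.
BASE_URL = "https://api.example.org/v1/occurrences"

def build_occurrence_query(taxon: str, year_from: int, page_size: int = 100) -> str:
    """Compose a paginated occurrence-search URL for a given taxon."""
    params = {"taxon": taxon, "yearFrom": year_from, "pageSize": page_size}
    return f"{BASE_URL}?{urlencode(params)}"

url = build_occurrence_query("Parus major", 2016)
```

Building the query string separately from the transport layer keeps such sketches testable without network access.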

Author(s):  
Kari Lahti ◽  
Liselott Skarp

The Finnish Biodiversity Information Facility FinBIF (LINK: species.fi), operational since late 2016, is one of the more recent examples of comprehensive, all-inclusive national biodiversity research infrastructures. FinBIF integrates a wide array of biodiversity information approaches under the same umbrella. These include species information (Fig. 1), e.g. descriptions, photos and administrative attributes; citizen science platforms enabling recording, managing and sharing of observation data; an e-learning environment for species identification; management and sharing of restricted data among authorities; building a national DNA barcode reference library and linking it to species occurrence data; community-driven species identification support; large-scale and multi-technology digitisation of natural history collections; and IUCN Red Listing to conduct a periodic national assessment of the status of threatened species. To improve the taxonomic coverage and content of species information, FinBIF is starting a process of collaboration with the species information community at large, in order to collate information that already exists but is not yet openly distributed. This also means digitising information from analogue sources. In addition, the aim is to join forces with Scandinavian counterparts, namely Artdatabanken (LINK: https://www.artdatabanken.se/) and Artsdatabanken (LINK: https://www.artsdatabanken.no/), for more efficient knowledge exchange between countries sharing the same biogeographical region and thus a similar species composition. The aim is also to reach a high-level political agreement for deeper and wider commitment to collaborating in compiling, digitising and sharing relevant biodiversity information across national borders.


Author(s):  
Katharine Barker ◽  
Jonas Astrin ◽  
Gabriele Droege ◽  
Jonathan Coddington ◽  
Ole Seberg

Most successful research programs depend on easily accessible and standardized research infrastructures. Until recently, access to tissue or DNA samples with standardized metadata and of sufficiently high quality has been a major bottleneck for genomic research. The Global Genome Biodiversity Network (GGBN) fills this critical gap by offering standardized, legal access to samples. Presently, GGBN’s core activity is enabling access to searchable DNA and tissue collections across natural history museums and botanic gardens. Activities are gradually being expanded to encompass all kinds of biodiversity biobanks, such as culture collections, zoological gardens, aquaria, arboreta, and environmental biobanks. Broadly speaking, these collections all provide long-term storage and standardized public access to samples useful for molecular research. GGBN facilitates sample search and discovery for its distributed member collections through a single entry point. It stores standardized information on mostly geo-referenced, vouchered samples, their physical location, availability, quality, and the necessary legal information, covering over 50,000 species of Earth’s biodiversity, from unicellular to multicellular organisms. The GGBN Data Portal and the GGBN Data Standard are complementary to existing infrastructures such as the Global Biodiversity Information Facility (GBIF) and the International Nucleotide Sequence Database Collaboration (INSDC). Today, many well-known open-source collection management databases, such as Arctos, Specify, and Symbiota, are implementing the GGBN Data Standard. GGBN continues to grow its collections strategically, based on the needs of the research community, adding over 1.3 million online records in 2018 alone; today, two million sample records are available through GGBN.
Together with the Consortium of European Taxonomic Facilities (CETAF), the Society for the Preservation of Natural History Collections (SPNHC), Biodiversity Information Standards (TDWG), and the Synthesis of Systematic Resources (SYNTHESYS+), GGBN provides best practices for biorepositories on meeting the requirements of the Nagoya Protocol on Access and Benefit Sharing (ABS). Through collaboration with the Biodiversity Heritage Library (BHL), GGBN is exploring options for tagging publications that reference GGBN collections and associated specimens, making them searchable through GGBN’s document library. Through its collaborative efforts, standards, and best practices, GGBN aims to facilitate trust and transparency in the use of genetic resources.
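The kind of standardized sample record GGBN aggregates can be pictured as below; the field names are simplified stand-ins inspired by, but not identical to, the GGBN Data Standard terms, and the institution and values are invented.

```python
# Sketch of a standardized tissue-sample record; field names are
# simplified stand-ins, not the exact GGBN Data Standard terms.
sample = {
    "scientificName": "Gorilla gorilla",
    "sampleType": "tissue",
    "preservationMethod": "frozen (-80C)",
    "institution": "Example Natural History Museum",  # hypothetical holder
    "permitStatus": "Nagoya-compliant, ABS documentation on file",
    "availableForLoan": True,
}

def is_loanable(record: dict) -> bool:
    """A sample is discoverable for loan only if its legal status is recorded."""
    return record["availableForLoan"] and bool(record["permitStatus"])
```

Coupling availability to documented legal status mirrors the ABS transparency goal described above.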


2021 ◽  
Vol 118 (6) ◽  
pp. e2018093118
Author(s):  
J. Mason Heberling ◽  
Joseph T. Miller ◽  
Daniel Noesgaard ◽  
Scott B. Weingart ◽  
Dmitry Schigel

The accessibility of global biodiversity information has surged in the past two decades, notably through widespread funding initiatives for museum specimen digitization and emergence of large-scale public participation in community science. Effective use of these data requires the integration of disconnected datasets, but the scientific impacts of consolidated biodiversity data networks have not yet been quantified. To determine whether data integration enables novel research, we carried out a quantitative text analysis and bibliographic synthesis of >4,000 studies published from 2003 to 2019 that use data mediated by the world’s largest biodiversity data network, the Global Biodiversity Information Facility (GBIF). Data available through GBIF increased 12-fold since 2007, a trend matched by global data use with roughly two publications using GBIF-mediated data per day in 2019. Data-use patterns were diverse by authorship, geographic extent, taxonomic group, and dataset type. Despite facilitating global authorship, legacies of colonial science remain. Studies involving species distribution modeling were most prevalent (31% of literature surveyed) but recently shifted in focus from theory to application. Topic prevalence was stable across the 17-y period for some research areas (e.g., macroecology), yet other topics proportionately declined (e.g., taxonomy) or increased (e.g., species interactions, disease). Although centered on biological subfields, GBIF-enabled research extends surprisingly across all major scientific disciplines. Biodiversity data mobilization through global data aggregation has enabled basic and applied research use at temporal, spatial, and taxonomic scales otherwise not possible, launching biodiversity sciences into a new era.
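Studies like this draw on GBIF's public occurrence-search web service. A minimal sketch of composing such a query follows; the endpoint is GBIF's documented v1 occurrence search, while the particular taxon key and year range are illustrative choices.

```python
from urllib.parse import urlencode

# GBIF's public occurrence-search endpoint (see the api.gbif.org docs);
# the parameter values below are illustrative.
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

def gbif_search_url(taxon_key: int, year_range: str, limit: int = 300) -> str:
    """Build a search URL of the kind behind per-year data-use counts."""
    params = {"taxonKey": taxon_key, "year": year_range, "limit": limit}
    return f"{GBIF_SEARCH}?{urlencode(params)}"

# 212 is the key for class Aves in the GBIF backbone taxonomy
url = gbif_search_url(212, "2003,2019")
```

Issuing an HTTP GET on this URL returns paged JSON; here only the query construction is shown so the sketch stays self-contained.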


2021 ◽  
Vol 9 ◽  
Author(s):  
Domingos Sandramo ◽  
Enrico Nicosia ◽  
Silvio Cianciullo ◽  
Bernardo Muatinte ◽  
Almeida Guissamulo

The collections of the Natural History Museum of Maputo play a crucial role in safeguarding Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. In this paper, a dataset based on the Museum’s Entomological Collection is described, recording 409 species belonging to seven orders and 48 families. Each specimen’s available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were obtained between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (once known as the “Museu Alváro de Castro”) in all of the country’s provinces, with the exception of Cabo Delgado Province. This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The aforementioned insect dataset is available on the GBIF data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on BioNoMo (https://bionomo.openscidata.org), the Mozambican national portal of biodiversity data developed by the SECOSUD II Project.
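Datasets of this kind are published as Darwin Core occurrence tables. The sketch below builds one such row using genuine Darwin Core term names; the specimen values themselves are invented for illustration.

```python
import csv
import io

# One Darwin Core occurrence row of the kind aggregated by GBIF;
# the field names are real Darwin Core terms, the values are invented.
fieldnames = ["occurrenceID", "basisOfRecord", "scientificName",
              "order", "family", "eventDate",
              "decimalLatitude", "decimalLongitude"]

record = {
    "occurrenceID": "NHMM:ENT:0001",  # hypothetical catalogue number
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Papilio demodocus",
    "order": "Lepidoptera",
    "family": "Papilionidae",
    "eventDate": "1962-03-14",
    "decimalLatitude": "-25.9692",
    "decimalLongitude": "32.5732",
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(record)
dwc_csv = buf.getvalue()
```

A file of such rows, zipped together with a metadata descriptor, is what a Darwin Core Archive publication to GBIF amounts to.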


Author(s):  
Choki Gyeltshen ◽  
Sangay Dema

Access to reliable and updated data and information on the status of biodiversity, for effective conservation and sustainable use, has been one of the major challenges in Bhutan. The current inaccessibility stems from the fact that biodiversity inventories and documentation are carried out within the context of individual projects and institutions, guided by their specific objectives and collection standards, often in isolation. More critically, these data are rarely shared and are not easily accessible, resulting in either duplication of effort or underutilization of existing data. It has been duly noted that, despite the global recognition of Bhutan’s protected areas system and its conservation achievements, information on the existing biodiversity of these protected areas is not easily accessible. There is also inadequate information on the critical biodiverse areas of the country, making it difficult to make informed decisions when initiating developmental activities or prioritizing areas for conservation. These gaps are acknowledged and discussed in national documents (NBSAP 2014). In order to provide easy access to comprehensive biodiversity data and information for the country, and to ensure the judicious use of scarce resources, there is a compelling need for a coordination mechanism for sharing data on a common platform, not only to overcome the existing gaps but also to enable consolidation and analysis of the data to generate information for broader uses such as conservation planning and education. Thus, in 1994, Bhutan, together with Benin and Costa Rica under the South-South Cooperation (PSC 2009), initiated a basic biodiversity information system in each country, funded by the Kingdom of the Netherlands.
In 2008, the National Biodiversity Centre (NBC) developed a web-based biodiversity portal, which was subsequently upgraded to the status of a national biodiversity information clearing house in 2010. However, because of the vastness and variety of biodiversity data, it was not feasible for a single agency to collect and curate them all. Thus, in early 2013, the Centre proposed the formation of a consortium to manage biodiversity data through a strengthened and improved version of the web-based portal. This initiative to form a consortium amongst different biodiversity stakeholders was also intended to address the issue of duplicated effort in developing and managing isolated information systems and databases. The Bhutan Biodiversity Portal (www.biodiversity.bt) was launched on 17th December 2013. Currently, observation records across all taxa have passed 63,000, owing mostly to the efforts of a mass campaign across the country. However, one of the major challenges is the availability of active taxonomic curators, especially for understudied taxonomic groups such as invertebrates. In addition, some users prefer social media over the portal because they find it more user-friendly.


Author(s):  
Christian Köhler

Automated observation of natural occurrences plays a key role in monitoring biodiversity worldwide. With the development of affordable hardware like the AudioMoth acoustic logger (Hill et al. 2019), large-scale and long-term monitoring has come within reach. However, management and dissemination of monitoring data remain challenging, as the development of software and infrastructure for managing monitoring data lags behind. We want to fill this gap by providing a complete audio monitoring solution built from affordable audio monitoring hardware, custom data management tools and storage infrastructure, based on open-source hardware and software, biodiversity information standards and integrable interfaces. The Scientific Monitoring Data Management and Online Repository (SIMON) consists of a portable data collector and a connected online repository. The data collector, a device for the automated extraction of audio data from the audio loggers in the field, stores the data and metadata in an internal cache. Once connected to the internet via WiFi or a cable connection, the data are automatically uploaded to an online repository for automated analysis, annotation, data management and dissemination. To prevent SIMON from becoming yet another proprietary store, the FAIR principles (Findable, Accessible, Interoperable and Re-usable; Wilkinson et al. 2016) are at the very core of data management in the online repository. We plan to offer an API (application programming interface) to disseminate data to established data infrastructures. A second API will allow the use of external services for data enrichment.
While in the planning phase, we would like to take the opportunity to discuss with domain experts the requirements and implementation of different standards—namely ABCD (Access to Biological Collections Data task group, Biodiversity Information Standards (TDWG) 2007), Darwin Core (Darwin Core Task Group, Biodiversity Information Standards (TDWG) 2009) and Darwin Core Archive (Remsen et al. 2017)—connecting to external services and targeting data infrastructures.
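One way such a dissemination API could serialize an automated acoustic detection is as a Darwin Core-style occurrence with an attached measurement, as sketched below. The `basisOfRecord` value is a genuine Darwin Core controlled term; the confidence handling and record shape are assumptions about a possible design, not a published SIMON interface.

```python
import json
from datetime import datetime, timezone

def detection_to_record(species, utc_time, lat, lon, confidence):
    """Package one automated acoustic detection as an occurrence record.

    The measurementOrFact structure is a simplified stand-in for the
    Darwin Core MeasurementOrFact extension.
    """
    return {
        "basisOfRecord": "MachineObservation",  # Darwin Core controlled term
        "scientificName": species,
        "eventDate": utc_time.isoformat(),
        "decimalLatitude": lat,
        "decimalLongitude": lon,
        "measurementOrFact": [
            {"measurementType": "classifier confidence",
             "measurementValue": confidence}
        ],
    }

rec = detection_to_record("Strix aluco",
                          datetime(2021, 5, 1, 23, 40, tzinfo=timezone.utc),
                          50.73, 7.10, 0.91)
payload = json.dumps(rec)
```

Keeping the classifier confidence alongside the observation lets downstream aggregators filter machine observations by reliability.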


Author(s):  
Franck Theeten ◽  
Marielle Adam ◽  
Thomas Vandenberghe ◽  
Mathias Dillen ◽  
Patrick Semal ◽  
...  

The Royal Belgian Institute of Natural Sciences (RBINS), the Royal Museum for Central Africa (RMCA) and Meise Botanic Garden house more than 50 million specimens covering all fields of natural history. While many research topics have their own specificities, over the years it became apparent that, with regard to collection data management and to data publication and exchange via community standards, collection-holding institutions face similar challenges (James et al. 2018, Rocha et al. 2014). In the past, these have been tackled in different ways by Belgian natural history institutions. In addition to local and national collaborations, there is a great need for a joint structure to share data between scientific institutions in Europe and beyond. It is the aim of large networks and infrastructures such as the Global Biodiversity Information Facility (GBIF), Biodiversity Information Standards (TDWG), the Distributed System of Scientific Collections (DiSSCo) and the Consortium of European Taxonomic Facilities (CETAF) to further implement and improve these efforts, thereby gaining ever-increasing efficiencies. In this context, the three institutions mentioned above submitted the NaturalHeritage project (http://www.belspo.be/belspo/brain-be/themes_3_HebrHistoScien_en.stm), granted in 2017 by the Belgian Science Policy Office and running from 2017 to 2020. The project provides links among databases and services. The unique qualities of each database are maintained, while the information can be concentrated and exposed in a structured way via one access point. This approach also aims to link data that are currently unconnected (e.g. the relationship between soil/substrate, vegetation and associated fauna) and to improve the cross-validation of data. (1) The NaturalHeritage prototype (http://www.naturalheritage.be) is a shared research portal with an open-access infrastructure, still in the development phase.
Its backbone is an ElasticSearch catalogue with Kibana and a Python aggregator gathering several types of (re)sources: relational databases, REpresentational State Transfer (REST) services of object databases and bibliographical data, collection metadata, and the GBIF Integrated Publishing Toolkit (IPT) for observational and taxonomic data. Semi-structured data in English are semantically analysed and linked to a rich autocomplete mechanism. Keywords and identifiers are indexed and grouped into four categories (“what”, “who”, “where”, “when”). The portal can also act as an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) service and ease indexing of the original webpage on the internet with microdata enrichment. (2) The collection data management system DaRWIN (Data Research Warehouse Information Network) of RBINS and RMCA has been improved as well. External (meta)data requirements, foremost publication into or according to the practices and standards of GBIF and OBIS (Ocean Biogeographic Information System: https://obis.org) for biodiversity data and of INSPIRE (https://inspire.ec.europa.eu) for geological data, have been identified and evaluated. New and extended data structures have been created to be compliant with these standards, and the necessary procedures have been developed to expose the data. Quality-control tools for taxonomic and geographic names have been developed. Geographic names can be hard to confirm, as their lack of context often requires human validation; to address this, a similarity measure is used to help map the results. Species, locations, sampling devices and other properties have been mapped to the World Register of Marine Species (http://www.marinespecies.org) and Darwin Core, to Marine Regions and GeoNames, to the AGRO agronomy ontology (http://www.obofoundry.org/ontology/agro.html) and vertebrate trait ontologies, and to the British Oceanographic Data Centre (BODC) vocabularies.
Extensive mapping is necessary to make use of the ExtendedMeasurementOrFact extension of Darwin Core (https://tools.gbif.org/dwca-validator/extensions.do).
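The four-way grouping of indexed keywords ("what", "who", "where", "when") described above can be sketched as a simple routing function; the keyword lists here are toy stand-ins for the portal's real indexes.

```python
# Toy categories; a real index would be built from the aggregated sources.
CATEGORY_TERMS = {
    "who":   {"collector", "determiner", "author"},
    "where": {"locality", "country", "coordinates"},
    "when":  {"date", "year", "period"},
}

def categorise(keyword: str) -> str:
    """Route an indexed keyword to who/where/when, defaulting to 'what'."""
    for category, terms in CATEGORY_TERMS.items():
        if keyword.lower() in terms:
            return category
    return "what"
```

Defaulting unmatched terms to "what" reflects that taxonomic and subject keywords are the open-ended remainder once agents, places and times are peeled off.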


2018 ◽  
Vol 2 ◽  
pp. e26060
Author(s):  
Pamela Soltis

Digitized natural history data are enabling a broad range of innovative studies of biodiversity. Large-scale data aggregators such as the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio) provide easy, global access to millions of specimen records contributed by thousands of collections. A developing community of eager users of specimen data – whether locality, image, trait, etc. – is perhaps unaware of the effort and resources required to curate specimens, digitize information, capture images, mobilize records, serve the data, and maintain the infrastructure (human and cyber) that supports all of these activities. Tracking of specimen information throughout the research process is needed to provide appropriate attribution to the institutions and staff that have supplied and served the records. Such tracking may also allow for annotation of and comment on particular records or collections by the global community. Detailed data tracking is also required for open, reproducible science. Despite growing recognition of the value of and need for thorough data tracking, both technical and sociological challenges continue to impede progress. In this talk, I will present a brief vision of how applying a DOI to each iteration of a dataset in a typical research project could provide attribution to the provider, opportunity for comment and annotation of records, and the foundation for reproducible science based on natural history specimen records. Sociological change – such as journal requirements for deposition of all iterations of a dataset – can be accomplished through community meetings and workshops, along with editorial efforts, as was done for DNA sequence data two decades ago.
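The per-iteration DOI idea sketched in the talk can be pictured as a simple provenance chain: each derived dataset records its own DOI plus that of its parent, so attribution can be walked back to the source records. The DOIs below are invented examples, and the class is a minimal sketch of the concept, not an existing service.

```python
class DatasetVersion:
    """One iteration of a dataset, linked to the iteration it was derived from."""

    def __init__(self, doi: str, parent: "DatasetVersion | None" = None):
        self.doi = doi          # DOIs below are invented examples
        self.parent = parent

    def provenance(self) -> list:
        """All DOIs from this iteration back to the source dataset."""
        chain, node = [], self
        while node is not None:
            chain.append(node.doi)
            node = node.parent
        return chain

raw = DatasetVersion("10.9999/raw.v1")
cleaned = DatasetVersion("10.9999/cleaned.v1", parent=raw)
analysed = DatasetVersion("10.9999/analysed.v1", parent=cleaned)
```

A journal requiring deposition of every iteration would, in effect, require the full `provenance()` chain to be resolvable.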


2018 ◽  
Vol 2 ◽  
pp. e25882
Author(s):  
Maarten Schermer ◽  
Daphne Duin

The value of the data present in natural history collections for research and collection management cannot be overstated. Naturalis Biodiversity Center, home to one of the largest natural history collections in the world, completed a large-scale digitisation project resulting in the registration of more than 38 million objects, many of them annotated with descriptive metadata, such as geographic coordinates, and multimedia content. While digitisation is ongoing, we are now also looking for ways to leverage our digital collection, both for the benefit of collection management and for networking with other natural history collections. To this end, we developed the Netherlands Biodiversity Data Services, providing centralized access to our collection data via state-of-the-art, open-access interfaces. Full, centralized access to the digital collection allows us to combine the data with other sources, such as collection scans focusing on the physical condition and accessibility of the collection, but also with data from external sources, such as the collection information of sister institutions, allowing us to combine and compare data and to explore areas where collections can reinforce each other. Focusing on availability and accessibility, the services were deliberately designed as a versatile, low-level API to allow the use of our data in a broad variety of applications and services. These applications range from scientific research and remote mobile access to collection information, to “mash-ups” with other data sources, apps and applications in our own museum.
We will demonstrate this range of applications through several examples, including the embedding of data in websites (for example, the Dutch Caribbean Species Register: http://www.dutchcaribbeanspecies.org/linnaeus_ng/app/views/species/nsr_taxon.php?id=177968&cat=165), use in the development of deep-learning models, thematic portals (for example, the Naturalis meteorite collection: http://bioportal.naturalis.nl/result?theme=meteorites&language=en) and the development of Java and R clients. This presentation ties in with Max Caspers' presentation “Advancing collections management with the Netherlands Biodiversity Data Services”, in which he will demonstrate the potential of the services described here specifically for the area of collections management.
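The "combining and comparing" of collection data with that of sister institutions mentioned above can be sketched as a simple merge of specimen counts by scientific name; all records and numbers below are invented for illustration.

```python
# Invented specimen counts from two institutions' collection data.
ours = [{"scientificName": "Rhinolophus hipposideros", "specimens": 12}]
theirs = [{"scientificName": "Rhinolophus hipposideros", "specimens": 7},
          {"scientificName": "Myotis myotis", "specimens": 3}]

def combine(a, b):
    """Total specimen counts per taxon across both collections."""
    totals = {}
    for rec in a + b:
        name = rec["scientificName"]
        totals[name] = totals.get(name, 0) + rec["specimens"]
    return totals

combined = combine(ours, theirs)
```

Taxa present in only one collection surface immediately in such a merge, which is exactly the "where collections can reinforce each other" question.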


Author(s):  
Olaf Banki ◽  
Letty Stupers ◽  
Marijn Prins

Within the Netherlands, large-scale digitization efforts of natural science collections have taken place in recent years. This has led to a wealth of digital information on natural science collections. Still, large quantities of collection data remain untapped and undigitized, and the usage of all these digital collection data as a driver for science and society remains underexplored. Especially important is the opportunity for such data to be combined and/or enriched with other data types, with the aim of empowering different user groups. A consortium of Dutch partners has committed to working together to make biological and geological collections into a joint research infrastructure, underpinning other research infrastructures and scientific uses beyond the biodiversity research domain. This consortium combines the Dutch contributions to the Distributed System of Scientific Collections (DiSSCo), LifeWatch, the Catalogue of Life and the Global Biodiversity Information Facility, under the coordination of the Netherlands Biodiversity Information Facility. As part of a preparatory project for DiSSCo, funded by the Dutch science council, we connected the different user groups of collection managers (data providers), scientists (end users), IT specialists and policymakers. With collection managers, we explored how to move towards an overview of all natural science collections in the Netherlands. In addition, we studied to what extent the collection holdings of different museums could be combined, managed, and shared in one research infrastructure. Using a research data management cycle perspective, we surveyed and interviewed the Dutch research community about the barriers to and opportunities in using natural science collections and related data.
The outcomes of the project should lead to the next steps in creating a more comprehensive and inclusive biodiversity research data infrastructure in the Netherlands that interacts seamlessly with existing international research infrastructures, including DiSSCo.

