Towards a Global Collection Description Standard

Author(s):  
Niels Raes ◽  
Emily van Egmond ◽  
Ana Casino ◽  
Matt Woodburn ◽  
Deborah L Paul

With the digitisation of natural history collections over the past decades, their traditional roles — for taxonomic studies and public education — have been greatly expanded into the fields of biodiversity assessments, climate change impact studies, trait analyses, sequencing, 3D object analyses, etc. (Nelson and Ellis 2019; Watanabe 2019). Initial estimates of the global natural history collection range between 1.2 and 2.1 billion specimens (Ariño 2010), of which 169 million (8-14%, as of April 2019) are available at some level of digitisation through the Global Biodiversity Information Facility (GBIF). With iDigBio (Integrated Digitized Biocollections) established in the United States and the European DiSSCo (Distributed System of Scientific Collections) accepted on the ESFRI roadmap, it has become a priority to digitise natural history collections at an industrial scale. Both iDigBio and DiSSCo aim to mobilise, unify and deliver bio- and geo-diversity information at the scale, form and precision required by scientific communities, and thereby transform a fragmented landscape into a coherent and responsive research infrastructure. To prioritise digitisation based on scientific demand and on the efficiency of industrial digitisation pipelines, a uniform and unambiguously accepted collection description standard is needed that allows natural history collections to be compared, grouped and analysed at diverse levels. Several initiatives attempt to unambiguously describe natural history collections using taxonomic and storage classification schemes. These initiatives include One World Collection, the Global Registry of Scientific Collections (GRSciColl), the TDWG (Taxonomic Databases Working Group) Natural Collection Descriptions (NCD) standard and CETAF (Consortium of European Taxonomy Facilities) passports, among others.
In a collaborative effort of DiSSCo, ICEDIG (Innovation and consolidation for large scale digitisation of natural heritage), iDigBio, TDWG and the Task Group Collection Digitisation Dashboards, the various schemes were compared in a cross-walk analysis to propose a preliminary natural collection description standard that is supported by the wider community. In the process, two main user groups of collection description standards were identified: scientists and collection managers. The classification produced is intended to meet the requirements of both, resulting in three classification schemes that exist in parallel (van Egmond et al. 2019). For scientific purposes, 'Taxonomic' and 'Stratigraphic' classifications were defined, and for management purposes a 'Storage' classification. The latter is derived from specimen preservation types (e.g. dried, liquid preserved), which define storage requirements and the physical location of specimens in collection holding facilities. The three parallel collection classifications can be cross-sectioned with a 'Geographic' classification to assign sub-collections to major terrestrial and marine regions, which allows scientists to identify particular taxonomic or stratigraphic (sub-)collections from major geographic or marine regions of interest. Finally, to measure the level of digitisation of institutional collections and the progress of digitisation through time, the number of digitised specimens for each geographically cross-sectioned (sub-)collection can be derived from institutional collection management systems (CMS). As digitisation has different levels of completeness, a 'Digitisation' scheme has been adopted from Saarenmaa et al. 2019 to quantify the level of digitisation of a collection, ranging from 'not digitised' to extensively digitised, recorded on the progressive scale of MIDS (Minimal Information for Digital Specimen).
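As a rough illustration of how the parallel schemes and the MIDS digitisation scale might be combined in a data model, the sketch below cross-sections a (sub-)collection by the three classifications and a geographic region and derives the digitisation-progress figure a dashboard would report. All class names, example scheme values and MIDS level names here are our own illustrative assumptions, not part of any adopted standard.

```python
from dataclasses import dataclass
from enum import IntEnum

class MIDSLevel(IntEnum):
    """Progressive digitisation scale, from 'not digitised' upwards.
    Level names are illustrative placeholders, not the MIDS vocabulary."""
    NOT_DIGITISED = -1
    BASIC = 0
    REGULAR = 1
    EXTENDED = 2

@dataclass
class SubCollection:
    """One cell of the cross-sectioned collection description: a
    (sub-)collection classified in the three parallel schemes and
    cross-cut by a geographic region."""
    taxonomic_class: str      # scientific scheme, e.g. "Insecta"
    stratigraphic_class: str  # scientific scheme, e.g. "Cenozoic"
    storage_class: str        # management scheme, e.g. "dried"
    geographic_region: str    # cross-section, e.g. "Palearctic"
    total_specimens: int      # size estimate for the sub-collection
    digitised_specimens: int  # count derived from the institutional CMS
    mids_level: MIDSLevel

    def digitisation_progress(self) -> float:
        """Fraction of the sub-collection digitised, for dashboard use."""
        if self.total_specimens == 0:
            return 0.0
        return self.digitised_specimens / self.total_specimens
```

A dashboard could then aggregate these cells along any one axis (taxonomy, stratigraphy, storage or geography) to expose digitisation gaps.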
The applicability of this preliminary classification will be discussed and visualised in a Collection Digitisation Dashboard (CDD) to demonstrate how the implementation of a collection description standard allows the identification of existing gaps in taxonomic and geographic coverage and in levels of digitisation of natural history collections. This set of common classification schemes and the dashboard design (van Egmond et al. 2019) will be contributed to the TDWG Collection Description interest group, to ultimately arrive at the common goal of a 'World Collection Catalogue'.

2018 ◽  
Vol 2 ◽  
pp. e26473
Author(s):  
Molly Phillips ◽  
Anne Basham ◽  
Marc Cubeta ◽  
Kari Harris ◽  
Jonathan Hendricks ◽  
...  

Natural history collections around the world are currently being digitized, with the resulting data and associated media now shared online in aggregators such as the Global Biodiversity Information Facility and Integrated Digitized Biocollections (iDigBio). These collections and their resources are accessible and discoverable through online portals not only to researchers and collections professionals, but also to educators, students, and other potential downstream users. Primary and secondary education (K-12) in the United States is going through its own revolution, with many states adopting the Next Generation Science Standards (NGSS, https://www.nextgenscience.org/). The new standards emphasize science practices for analyzing and interpreting data and connect to cross-cutting concepts such as cause and effect and patterns. NGSS and natural history collections data portals seem to complement each other. Nevertheless, many educators and students are unaware of the digital resources available or are overwhelmed when working in aggregated databases created by scientists. To better address this challenge, participants in the National Science Foundation's Advancing Digitization of Biodiversity Collections (ADBC) program have been working to increase awareness of, and scaffold learning with, digitized collections for K-12 educators and learners. They are accomplishing this through individual programs at institutions across the country as part of the Thematic Collections Networks, and collaboratively through the iDigBio Education and Outreach Working Group. ADBC partners have focused on incorporating digital data and resources into K-12 classrooms through training workshops and webinars for both educators and collections professionals, as well as by creating educational resources, websites, and applications that use digital collections data.
This presentation includes lessons learned from engaging K-12 audiences with digital data, summarizes available resources for both educators and collections professionals, shares how to become involved, and provides ways to facilitate transfer of educational resources to the K-12 community.


Author(s):  
Tim Robertson ◽  
Marcos Gonzalez ◽  
Morten Høfft ◽  
Marie Grosjean

The Global Biodiversity Information Facility (GBIF) was established by governments in 2001, largely through the initiative and leadership of the natural history collections community, following the 1999 recommendation by a working group under the Megascience Forum (predecessor of the Global Science Forum) of the Organization for Economic Cooperation and Development (OECD). Over 20 years, GBIF has helped develop standards and convened a global community of data-publishing institutions, aggregating over one billion species occurrence records that are freely and openly available for use in research and policy making. More than 150 million of these records originate from specimens preserved by the collections community. The recent adoption of the Global Registry of Scientific Collections by GBIF (https://www.gbif.org/news/5kyAslpqTVxYqZTwYn1cub) is a first step by GBIF towards building a picture of the world's natural history collections, along with the science that they have enabled and continue to enable. Recognising that other collection metadata initiatives exist, GBIF aims to discuss with the community and progress topics such as:

- Synchronising with existing metadata catalogues to ensure accurate, up-to-date information is available without unnecessary burden for authors
- Defining, testing and formalizing the Collection Descriptions standard (https://github.com/tdwg/cd)
- Providing clear guidelines on citation practice for collections, potentially building on the success of the Digital Object Identifier (DOI) approach used for datasets mediated through GBIF.org
- Tracking citations of use, both through data downloads and through references in literature, such as materials examined in a taxonomic publication
- Improving the linkages and discoverability of specimen records derived from the same collecting event but preserved in multiple institutions
- Improving the linkages between the people involved in collecting, preserving, and identifying specimen records through the use of Open Researcher and Contributor IDs (ORCID)
- Lowering the technical threshold to deploy tools such as “data dashboards” and specimen search/download on collection-related websites

The progress made to date will be summarised and a roadmap for the future will be introduced.


Author(s):  
Erica Krimmel ◽  
Austin Mast ◽  
Deborah Paul ◽  
Robert Bruhn ◽  
Nelson Rios ◽  
...  

Genomic evidence suggests that the causative virus of COVID-19 (SARS-CoV-2) was introduced to humans from horseshoe bats (family Rhinolophidae) (Andersen et al. 2020) and that species in this family as well as in the closely related Hipposideridae and Rhinonycteridae families are reservoirs of several SARS-like coronaviruses (Gouilh et al. 2011). Specimens collected over the past 400 years and curated by natural history collections around the world provide an essential reference as we work to understand the distributions, life histories, and evolutionary relationships of these bats and their viruses. While the importance of biodiversity specimens to emerging infectious disease research is clear, empowering disease researchers with specimen data is a relatively new goal for the collections community (DiEuliis et al. 2016). Recognizing this, a team from Florida State University is collaborating with partners at GEOLocate, Bionomia, University of Florida, the American Museum of Natural History, and Arizona State University to produce a deduplicated, georeferenced, vetted, and versioned data product of the world's specimens of horseshoe bats and relatives for researchers studying COVID-19. The project will serve as a model for future rapid data product deployments about biodiversity specimens. The project underscores the value of biodiversity data aggregators iDigBio and the Global Biodiversity Information Facility (GBIF), which are sources for 58,617 and 79,862 records, respectively, as of July 2020, of horseshoe bat and relative specimens held by over one hundred natural history collections. Although much of the specimen-based biodiversity data served by iDigBio and GBIF is high quality, it can be considered raw data and therefore often requires additional wrangling, standardizing, and enhancement to be fit for specific applications. 
The project will create efficiencies for the coronavirus research community by producing an enhanced, research-ready data product, which will be versioned and published through Zenodo, an open-access repository (see doi.org/10.5281/zenodo.3974999). In this talk, we highlight lessons learned from the initial phases of the project, including deduplicating specimen records, standardizing country information, and enhancing taxonomic information. We also report on our progress to date in enhancing information about agents (e.g., collectors or determiners) associated with these specimens and in georeferencing specimen localities. We also seek to explore how far the added agent information (i.e., ORCID iDs and Wikidata Q identifiers) can inform our georeferencing efforts and support crediting those who collected and identified specimens. The project will georeference approximately one third of our specimen records, i.e. those lacking geospatial coordinates but containing textual locality descriptions. We furthermore provide an overview of our holistic approach to enhancing specimen records, which we hope will maximize the value of the bat specimens at the center of what has recently been termed the "extended specimen network" (Lendemer et al. 2020). The centrality of the physical specimen in the network reinforces the importance of archived materials for reproducible research. Recognizing this, we view the collections providing data to iDigBio and GBIF as essential partners, as we expect that they will be responsible for the long-term management of enhanced data associated with the physical specimens they curate. We hope that this project can provide a model for better facilitating the reintegration of enhanced data back into local specimen data management systems.
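The deduplication step mentioned above can be sketched minimally as follows. This is an illustrative sketch, not the project's actual pipeline: it assumes Darwin Core field names (institutionCode, collectionCode, catalogNumber) as the duplicate key and uses deliberately naive normalisation; a real workflow would merge fields from duplicate records rather than discard them.

```python
def dedupe_key(record: dict) -> tuple:
    """Build a conservative duplicate key from the Darwin Core fields
    that should identify the same physical specimen when it is published
    to more than one aggregator. Missing fields become empty strings."""
    return (
        (record.get("institutionCode") or "").strip().lower(),
        (record.get("collectionCode") or "").strip().lower(),
        (record.get("catalogNumber") or "").strip().lower(),
    )

def deduplicate(records: list) -> list:
    """Keep the first record seen for each key, preserving input order."""
    seen, unique = set(), []
    for rec in records:
        key = dedupe_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Even this simple keying shows why normalisation matters: without case-folding and whitespace stripping, the same specimen published by "NHM" and "nhm " would survive as two records.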


Author(s):  
Leonor Venceslau ◽  
Luis Lopes

Major efforts are being made to digitize natural history collections and make these data available online for retrieval and analysis (Beaman and Cellinese 2012). Georeferencing, an important part of the digitization process, consists of obtaining geographic coordinates from a locality description. For many natural history collection specimens, the coordinates of the sampling location were not recorded; instead, the labels contain a description of the site. Inaccurate georeferencing of sampling locations negatively impacts data quality and the accuracy of any geographic analysis of those data. In addition to latitude and longitude, it is important to define a degree of uncertainty for the coordinates, since in most cases it is impossible to pinpoint the exact location retrospectively. This is usually done by defining an uncertainty value, represented as a radius around the center of the locality where the sampling took place. Georeferencing is a time-consuming process requiring manual validation; as such, a significant part of all natural history collection data available online is not georeferenced. Of the 161 million records of preserved specimens currently available in the Global Biodiversity Information Facility (GBIF), only 86 million (53.4%) include coordinates. It is therefore important to develop and optimize automatic tools that allow fast and accurate georeferencing. The objective of this work was to test existing automatic georeferencing services and evaluate their potential to accelerate georeferencing of large collection datasets. To this end, several georeferencing services are currently available that provide an application programming interface (API) for batch georeferencing. We evaluated five services: Google Maps, MapQuest, GeoNames, OpenStreetMap, and GEOLocate.
A test dataset of 100 records (reference dataset), which had previously been individually georeferenced following Chapman and Wieczorek 2006, was randomly selected from the insect collection catalogue of the Museu Nacional de História Natural e da Ciência, Universidade de Lisboa (Lopes et al. 2016). An R (R Core Team 2018) script was used to georeference these records using the five services. In cases where multiple results were returned, only the first was considered and compared with the manually obtained coordinates of the reference dataset. Two factors were considered in evaluating accuracy: the total number of results obtained, and the distance to the original location in the reference dataset. Of the five services tested, Google Maps yielded the most results (99) and was the most accurate, with 57 results < 1000 m from the reference location and 79 within the uncertainty radius. GEOLocate provided results for 87 locations, of which 47 were within 1000 m of the correct location and 57 were within the uncertainty radius. The other three services each had fewer than 35 results within 1000 m of the reference location and fewer than 50 results within the uncertainty radius. Google Maps and OpenStreetMap had the lowest average distance from the reference location, both around 5500 m. Google Maps has a usage limit of around 40,000 free georeferencing requests per month, beyond which the service is paid, while GEOLocate is free with no usage limit. For large collections, this may be a factor to take into account. In the future, we hope to optimize these methods and test them with larger datasets.
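The per-record accuracy check described above can be sketched as follows. This is an illustrative Python version (the study itself used an R script): it computes the great-circle distance between a service's first candidate result and the manually georeferenced reference point, and tests whether the candidate falls within the reference record's uncertainty radius.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points
    (decimal degrees), via the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def evaluate(candidate, reference, uncertainty_m):
    """Score one georeferencing result against the reference point:
    returns (distance in metres, within-uncertainty-radius flag).
    candidate and reference are (lat, lon) tuples."""
    d = haversine_m(candidate[0], candidate[1], reference[0], reference[1])
    return d, d <= uncertainty_m
```

Summing the within-radius flags over the 100 reference records yields exactly the two evaluation factors used in the study: the count of usable results and their distances from the reference locations.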


2019 ◽  
Vol 374 (1777) ◽  
pp. 20180248 ◽  
Author(s):  
Sangeet Lamichhaney ◽  
Daren C. Card ◽  
Phil Grayson ◽  
João F. R. Tonini ◽  
Gustavo A. Bravo ◽  
...  

Evolutionary convergence has long been considered primary evidence of adaptation driven by natural selection and provides opportunities to explore evolutionary repeatability and predictability. In recent years, there has been increased interest in exploring the genetic mechanisms underlying convergent evolution, in part owing to the advent of genomic techniques. However, the current ‘genomics gold rush’ in studies of convergence has overshadowed the reality that most trait classifications are quite broadly defined, resulting in incomplete or potentially biased interpretations of results. Genomic studies of convergence would be greatly improved by integrating deep ‘vertical’ natural history knowledge with ‘horizontal’ knowledge focusing on the breadth of taxonomic diversity. Natural history collections have been, and continue to be, best positioned for increasing our comprehensive understanding of phenotypic diversity, with modern practices of digitization and databasing of morphological traits providing exciting improvements in our ability to evaluate the degree of morphological convergence. Combining more detailed phenotypic data with the well-established field of genomics will enable scientists to make progress on an important goal in biology: understanding the degree to which genetic or molecular convergence is associated with phenotypic convergence. Although the fields of comparative biology or comparative genomics alone can separately reveal important insights into convergent evolution, here we suggest that the synergistic and complementary roles of natural history collection-derived phenomic data and comparative genomics methods can be particularly powerful in jointly elucidating the genomic basis of convergent evolution among higher taxa. This article is part of the theme issue ‘Convergent evolution in the genomics era: new insights and directions’.



Author(s):  
David Shorthouse ◽  
Roderic Page

Through the Bloodhound proof-of-concept (https://bloodhound-tracker.net), an international audience of collectors and determiners of natural history specimens is engaged in the emotive act of claiming their specimens and attributing other specimens to living and deceased mentors and colleagues. Behind the scenes, these claims build links between Open Researcher and Contributor Identifiers (ORCID, https://orcid.org) or Wikidata identifiers for people and Global Biodiversity Information Facility (GBIF) specimen identifiers, predicated on the Darwin Core terms recordedBy (collected) and identifiedBy (determined). Here we additionally describe the socio-technical challenge of unequivocally resolving people names in legacy specimen data and propose lightweight and reusable solutions. The unique identifiers for the affiliations of active researchers are obtained from ORCID, whereas the unique identifiers for institutions where specimens are actively curated are resolved through Wikidata. By constructing closed loops of links between person, specimen, and institution, an interesting suite of potential metrics emerges, all due to the activities of employees and their network of professional relationships. This approach balances the desire of individuals to receive formal recognition for their efforts in natural history collections with an institutional-level need to adjust budgets in response to easily obtained numeric trends in national and international reach. If handled in a coordinated fashion, this reporting technique may become a significant new driver for specimen digitization efforts, on a par with Altmetric (https://www.altmetric.com), an important new tool that tracks the impact of publications and delights administrators and authors alike.
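The linking step can be sketched minimally as follows. The specimen records, agent names, and identifiers below are invented for illustration (the ORCID iD is a placeholder, not a real one): each specimen is joined to person identifiers through the two Darwin Core agent terms, producing (person, role, specimen) links of the kind from which the closed person-specimen-institution loops are built.

```python
# Hypothetical specimen records keyed by Darwin Core terms; in practice
# these would be GBIF occurrence records with GBIF specimen identifiers.
specimens = [
    {"occurrenceID": "spec-1", "recordedBy": "C. Linnaeus", "identifiedBy": "M. Smith"},
    {"occurrenceID": "spec-2", "recordedBy": "M. Smith", "identifiedBy": "M. Smith"},
]

# Claims map a name string, as written on the label, to a person
# identifier (ORCID iD or Wikidata Q identifier). Values are placeholders.
claims = {
    "M. Smith": "https://orcid.org/0000-0000-0000-0000",
}

def attribute(specimens, claims):
    """Resolve the two Darwin Core agent terms on each specimen against
    the claimed name-to-identifier mapping, yielding
    (person identifier, role, occurrenceID) link triples."""
    links = []
    for s in specimens:
        for role in ("recordedBy", "identifiedBy"):
            person = claims.get(s.get(role, ""))
            if person:  # unresolved legacy names simply produce no link
                links.append((person, role, s["occurrenceID"]))
    return links
```

Names with no claim (here "C. Linnaeus") produce no link, which is exactly the legacy-data resolution gap the abstract describes.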


Author(s):  
Wouter Addink ◽  
Dimitrios Koureas ◽  
Ana Rubio

European Natural Science Collections (NSC) are part of the global natural and cultural capital and represent 80% of the world's bio- and geo-diversity. Data derived from these collections underpin thousands of scholarly publications and official reports (used to support legislative and regulatory processes relating to health, food, security, sustainability and environmental change) and have led to inventions and products that today play an important role in our bio-economy. In the last decades, research practice in the natural sciences has changed dramatically. Advances in digital, genomic and information technologies enable natural science collections to provide new insights, but also call for a change in the current operational and business models of individual collections held at local natural history museums and universities: a new business model that provides unified access to collection objects and all scientific data derived from them. Although aggregating infrastructures like the Global Biodiversity Information Facility, GenBank and the Catalogue of Life now successfully aggregate specific data classes, the landscape remains fragmented, with limited capacity to bring this information together in a systematic and robust manner and with scattered access to the physical objects. The Distributed System of Scientific Collections (DiSSCo) represents a pan-European initiative, and the largest ever agreement of natural science museums, to jointly address the fragmentation of European collections. DiSSCo is unifying European natural science collections into a coherent new research infrastructure, able to provide bio- and geo-diversity data at the scale, form and precision required by a multi-disciplinary user base in science.
DiSSCo is harmonising digitisation, curation and publication processes and workflows across the scientific collections in Europe, and enables the linking of occurrence, genomic, chemical and morphological data classes, as well as publications and experts, to the physical object. In this paper we present the socio-cultural and governance aspects of this research infrastructure. DiSSCo is receiving political support from 11 countries in Europe and will gradually change its funding model from institutional to national funding, with temporary funding from the EC to support preparation and development. Solutions for large-scale digitisation are currently being designed in the EC-funded ICEDIG project to underpin the future large-scale digitisation carried out by the countries. Unified virtual (digitisation on demand) and transnational physical access to the collections is being developed over the next four years in the EC-funded SYNTHESYS+ project. The governance of DiSSCo is designed to change gradually from a steering committee, composed of a few large natural history museums contributing in cash to initiate the development, into a legal entity in which national consortia are represented, with a central coordination office for daily management. Each country individually decides how its entities (scientific collection facilities, research councils, governmental bodies) are organised in its national consortium. A stakeholder and user forum, a Scientific Advisory Board and an International Advisory Board will ensure that DiSSCo is functional in enabling science across disciplines and within the international landscape of infrastructures. Training and short scientific missions are being developed in the MOBILISE COST Action to build capacity in FAIR data production, publication and usage of scientific collection-derived data in Europe, and to initiate the socio-cultural changes needed in the collection-holding institutes.
A Helpdesk is being constructed in the SYNTHESYS+ and DiSSCo Prepare projects to further facilitate use, and scientific use cases have been collected in ICEDIG to develop e-services tailored to scientific needs.


2018 ◽  
Vol 374 (1763) ◽  
pp. 20170391 ◽  
Author(s):  
Gil Nelson ◽  
Shari Ellis

The first two decades of the twenty-first century have seen a rapid rise in the mobilization of digital biodiversity data. This has thrust natural history museums into the forefront of biodiversity research, underscoring their central role in the modern scientific enterprise. The advent of mobilization initiatives such as the United States National Science Foundation's Advancing Digitization of Biodiversity Collections (ADBC), Australia's Atlas of Living Australia (ALA), Mexico's National Commission for the Knowledge and Use of Biodiversity (CONABIO), Brazil's Centro de Referência em Informação Ambiental (CRIA) and China's National Specimen Information Infrastructure (NSII) has led to a rapid rise in data aggregators and an exponential increase in digital data for scientific research, which arguably provide the best evidence of where species live. The international Global Biodiversity Information Facility (GBIF) now serves about 131 million museum specimen records, and Integrated Digitized Biocollections (iDigBio) in the USA has amassed more than 115 million. These resources expose collections to a wider audience of researchers, provide the best biodiversity data in the modern era outside of nature itself, and ensure the primacy of specimen-based research. Here, we provide a brief history of worldwide data mobilization, its impact on biodiversity research, challenges for ensuring data quality, its contribution to scientific publications and evidence of the rising profiles of natural history collections. This article is part of the theme issue ‘Biological collections for understanding biodiversity in the Anthropocene’.

