Visualizing natural history collection data provides insight into collection development and bias

Biodiversity Data Journal ◽

10.3897/bdj.6.e26741 ◽

2018 ◽

Vol 6 ◽

Cited By ~ 1

Author(s):

Vaughn Shirey

Keyword(s):

Natural History ◽

Large Body ◽

Aggregated Data ◽

Natural History Collections ◽

Natural History Collection ◽

Drexel University ◽

Collection Data ◽

Spatial Domains ◽

Life On Earth ◽

Insight Into

Natural history collections contain estimated billions of records representing a large body of knowledge about the diversity and distribution of life on Earth. Assessments of various forms of bias within the aggregated data associated with specimens in these collections have been conducted across temporal, taxonomic, and spatial domains. Considering that these biases are the sum of biases across all contributing collections to aggregate datasets, the assessment of bias at the collection level is warranted. Interactive visualization provides a powerful tool for the assessment of these biases and insight into the historical development of natural history collections, providing context for where sources of bias may originate and developing historical narratives to clarify our understanding of our own knowledge about life on Earth. Here, I present a case study on using Sankey diagrams to illustrate the development of the entomology type collection at the Academy of Natural Sciences of Drexel University in Philadelphia, Pennsylvania with the hope that extensions of these practices among individual natural history collections are modified and adopted.
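As a rough illustration of the data preparation behind a Sankey diagram like the one this abstract describes, the sketch below counts the flows between successive categorical fields of specimen records. The field names (decade, collector, order), the sample records and the function name sankey_links are hypothetical and are not taken from the paper. The resulting source/target/value triples are the usual input shape for Sankey plotting tools.

```python
from collections import Counter

def sankey_links(records, stages):
    """Count flows between consecutive stages for a Sankey diagram.

    records: iterable of dicts, one per specimen (hypothetical fields).
    stages:  ordered list of field names; links are built between
             each adjacent pair of stages.
    """
    links = Counter()
    for rec in records:
        for src_key, dst_key in zip(stages, stages[1:]):
            # Prefix node labels with the stage name so equal values in
            # different stages do not collapse into a single node.
            source = f"{src_key}: {rec[src_key]}"
            target = f"{dst_key}: {rec[dst_key]}"
            links[(source, target)] += 1
    return [
        {"source": s, "target": t, "value": v}
        for (s, t), v in sorted(links.items())
    ]

# Hypothetical type-specimen records, for illustration only.
EXAMPLE_RECORDS = [
    {"decade": "1850s", "collector": "A", "order": "Coleoptera"},
    {"decade": "1850s", "collector": "A", "order": "Diptera"},
    {"decade": "1860s", "collector": "B", "order": "Coleoptera"},
]

EXAMPLE_LINKS = sankey_links(EXAMPLE_RECORDS, ["decade", "collector", "order"])
```

Link width in the rendered diagram would then be proportional to each value, which is what makes over-represented collectors, periods or taxa visible at a glance.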

A Comprehensive Phylogenomic Platform for Exploring the Angiosperm Tree of Life

10.1101/2021.02.22.431589 ◽

2021 ◽

Author(s):

William J. Baker ◽

Paul Bailey ◽

Vanessa Barber ◽

Abigail Barker ◽

Sidonie Bellot ◽

...

Keyword(s):

Natural History ◽

Flowering Plants ◽

Tree Of Life ◽

Flowering Plant ◽

Target Sequence ◽

Sequence Capture ◽

Natural History Collections ◽

Public Data ◽

Data Release ◽

Life On Earth

The tree of life is the fundamental biological roadmap for navigating the evolution and properties of life on Earth, and yet remains largely unknown. Even angiosperms (flowering plants) are fraught with data gaps, despite their critical role in sustaining terrestrial life. Today, high-throughput sequencing promises to significantly deepen our understanding of evolutionary relationships. Here, we describe a comprehensive phylogenomic platform for exploring the angiosperm tree of life, comprising a set of open tools and data based on the 353 nuclear genes targeted by the universal Angiosperms353 sequence capture probes. This paper (i) documents our methods, (ii) describes our first data release and (iii) presents a novel open data portal, the Kew Tree of Life Explorer (https://treeoflife.kew.org). We aim to generate novel target sequence capture data for all genera of flowering plants, exploiting natural history collections such as herbarium specimens, and augment it with mined public data. Our first data release, described here, is the most extensive nuclear phylogenomic dataset for angiosperms to date, comprising 3,099 samples validated by DNA barcode and phylogenetic tests, representing all 64 orders, 404 families (96%) and 2,333 genera (17%). Using the multi-species coalescent, we inferred a “first pass” angiosperm tree of life from the data, which totalled 824,878 sequences, 489,086,049 base pairs, and 532,260 alignment columns. The tree is strongly supported and highly congruent with existing taxonomy, while challenging numerous hypothesized relationships among orders and placing many genera for the first time. The validated dataset, species tree and all intermediates are openly accessible via the Kew Tree of Life Explorer. This major milestone towards a complete tree of life for all flowering plant species opens doors to a highly integrated future for angiosperm phylogenomics through the systematic sequencing of standardised nuclear markers.
Our approach has the potential to serve as a much-needed bridge between the growing movement to sequence the genomes of all life on Earth and the vast phylogenomic potential of the world’s natural history collections.

Comparison of Automated Georeferencing Tools Using Insect Collection Data

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37345 ◽

2019 ◽

Vol 3 ◽

Author(s):

Leonor Venceslau ◽

Luis Lopes

Keyword(s):

Natural History ◽

Average Distance ◽

Sampling Location ◽

Reference Dataset ◽

Global Biodiversity Information Facility ◽

Google Maps ◽

Reference Location ◽

Natural History Collection ◽

Collection Data ◽

Insect Collection

Major efforts are being made to digitize natural history collections to make these data available online for retrieval and analysis (Beaman and Cellinese 2012). Georeferencing, an important part of the digitization process, consists of obtaining geographic coordinates from a locality description. For many natural history collection specimens, the coordinates of the sampling location are not recorded; instead, the record contains a description of the site. Inaccurate georeferencing of sampling locations negatively impacts data quality and the accuracy of any geographic analysis on those data. In addition to latitude and longitude, it is important to define a degree of uncertainty of the coordinates, since in most cases it is impossible to pinpoint the exact location retrospectively. This is usually done by defining an uncertainty value represented as a radius around the center of the locality where the sampling took place. Georeferencing is a time-consuming process requiring manual validation; as such, a significant part of all natural history collection data available online are not georeferenced. Of the 161 million records of preserved specimens currently available in the Global Biodiversity Information Facility (GBIF), only 86 million (53.4%) include coordinates. It is therefore important to develop and optimize automatic tools that allow fast and accurate georeferencing. The objective of this work was to test existing automatic georeferencing services and evaluate their potential to accelerate georeferencing of large collection datasets. To this end, several open-source georeferencing services are currently available, which provide an application programming interface (API) for batch georeferencing. We evaluated five programs: Google Maps, MapQuest, GeoNames, OpenStreetMap, and GEOLocate.
A test dataset of 100 records (reference dataset), which had previously been individually georeferenced following Chapman and Wieczorek 2006, was randomly selected from the Museu Nacional de História Natural e da Ciência, Universidade de Lisboa insect collection catalogue (Lopes et al. 2016). An R (R Core Team 2018) script was used to georeference these records using the five services. In cases where multiple results were returned, only the first one was considered and compared with the manually obtained coordinates of the reference dataset. Two factors were considered in evaluating accuracy: the total number of results obtained and the distance to the original location in the reference dataset. Of the five programs tested, Google Maps yielded the most results (99) and was the most accurate, with 57 results < 1000 m from the reference location and 79 within the uncertainty radius. GEOLocate provided results for 87 locations, of which 47 were within 1000 m of the correct location and 57 were within the uncertainty radius. The other three services tested all had fewer than 35 results within 1000 m of the reference location and fewer than 50 results within the uncertainty radius. Google Maps and OpenStreetMap had the lowest average distance from the reference location, both around 5500 m. Google Maps has a usage limit of around 40000 free georeferencing requests per month, beyond which the service is paid, while GEOLocate is free with no usage limit. For large collections, this may be a factor to take into account. In the future, we hope to optimize these methods and test them with larger datasets.
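The accuracy measures described in this abstract (number of results returned, results within 1000 m of the reference location, results within the uncertainty radius, and average distance) can be sketched as follows. This is a minimal illustration in Python, not the authors' R script. The record layout, the function names and the choice of the haversine great-circle formula are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def summarize(records, threshold_m=1000.0):
    """Summarise one service's results against a reference dataset.

    records: iterable of (reference, radius_m, result) tuples, where
      reference = (lat, lon) from manual georeferencing,
      radius_m  = uncertainty radius of the reference point in metres,
      result    = (lat, lon) returned by the service, or None if the
                  service returned nothing (assumed layout).
    """
    distances = []
    within_threshold = 0
    within_uncertainty = 0
    for reference, radius_m, result in records:
        if result is None:
            continue  # the service produced no result for this locality
        d = haversine_m(*reference, *result)
        distances.append(d)
        if d < threshold_m:
            within_threshold += 1
        if d <= radius_m:
            within_uncertainty += 1
    mean = sum(distances) / len(distances) if distances else None
    return {
        "n_results": len(distances),
        "within_threshold": within_threshold,
        "within_uncertainty": within_uncertainty,
        "mean_distance_m": mean,
    }
```

Running summarize once per service on the same reference records would give counts that are directly comparable across services, in the way the abstract reports them.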

Integrating natural history collections and comparative genomics to study the genetic architecture of convergent evolution

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2018.0248 ◽

2019 ◽

Vol 374 (1777) ◽

pp. 20180248 ◽

Cited By ~ 14

Author(s):

Sangeet Lamichhaney ◽

Daren C. Card ◽

Phil Grayson ◽

João F. R. Tonini ◽

Gustavo A. Bravo ◽

...

Keyword(s):

Comparative Genomics ◽

Natural History ◽

Convergent Evolution ◽

Phenotypic Diversity ◽

Taxonomic Diversity ◽

Gold Rush ◽

Phenotypic Data ◽

Natural History Collections ◽

Natural History Collection ◽

Genomic Studies

Evolutionary convergence has long been considered primary evidence of adaptation driven by natural selection and provides opportunities to explore evolutionary repeatability and predictability. In recent years, there has been increased interest in exploring the genetic mechanisms underlying convergent evolution, in part owing to the advent of genomic techniques. However, the current ‘genomics gold rush’ in studies of convergence has overshadowed the reality that most trait classifications are quite broadly defined, resulting in incomplete or potentially biased interpretations of results. Genomic studies of convergence would be greatly improved by integrating deep ‘vertical’ natural history knowledge with ‘horizontal’ knowledge focusing on the breadth of taxonomic diversity. Natural history collections have been and continue to be best positioned for increasing our comprehensive understanding of phenotypic diversity, with modern practices of digitization and databasing of morphological traits providing exciting improvements in our ability to evaluate the degree of morphological convergence. Combining more detailed phenotypic data with the well-established field of genomics will enable scientists to make progress on an important goal in biology: to understand the degree to which genetic or molecular convergence is associated with phenotypic convergence. Although the fields of comparative biology or comparative genomics alone can separately reveal important insights into convergent evolution, here we suggest that the synergistic and complementary roles of natural history collection-derived phenomic data and comparative genomics methods can be particularly powerful in together elucidating the genomic basis of convergent evolution among higher taxa. This article is part of the theme issue ‘Convergent evolution in the genomics era: new insights and directions’.

Digitisation of private collections

Research Ideas and Outcomes ◽

10.3897/rio.6.e57767 ◽

2020 ◽

Vol 6 ◽

Author(s):

Luc Willemse ◽

Veljo Runnel ◽

Hannu Saarenmaa ◽

Ana Casino ◽

Karsten Gödderz

Keyword(s):

Natural History ◽

Communication Strategy ◽

Data Management System ◽

Specific Knowledge ◽

Management Tools ◽

Data Infrastructure ◽

Natural History Collections ◽

Private Collection ◽

Collection Data ◽

Private Collections

Results are presented of a study investigating solutions and procedures to incorporate private natural history collections into the international collections data infrastructure. Results are based on pilot projects carried out in three European countries aimed at approaches on how to best motivate and equip citizen collectors for digitisation: 1) In Estonia, the approach was to outline tools for registering, digitising and publishing private collection data in the biodiversity data management system PlutoF. 2) In Finland, the functionality of FinBIF, a portal offering a popular Notebook Service for citizens to store observations, has been expanded to include collection specimens related to a field gathering event. 3) In the Netherlands, private collection owners were approached directly and asked to start digitising their collection using dedicated software, either by themselves or with the help of volunteers who were recruited specifically for this task. In addition to management tools, pilots also looked at motivation, persons undertaking the work, scope, planning, specific knowledge or skills required and the platform for online publication. Future ownership, legality of specimens residing in private collections and the use of unique identifiers are underexposed aspects affecting digitisation. Besides streamlining the overall process of digitising private collections and dealing with local, national or international challenges, developing a communication strategy is crucial in order to effectively distribute information and keep private collection owners aware of ongoing developments. Besides collection owners, other stakeholders were identified, and for each of them a roadmap is outlined aimed at further streamlining the data from private collections into the international infrastructure.
In conclusion, recommendations are presented based on challenges encountered during this task that are considered important for making significant progress towards the overall accessibility of data stored in privately held natural history collections.

The Müritzeum in Waren (Müritz): natural history museum and modern nature discovery centre

DEUQUA Special Publications ◽

10.5194/deuquasp-2-77-2019 ◽

2019 ◽

Vol 2 ◽

pp. 77-81 ◽

Cited By ~ 1

Author(s):

Mathias Küster

Keyword(s):

Natural History ◽

Natural History Museum ◽

Freshwater Species ◽

Lake District ◽

History Museum ◽

Natural History Collections ◽

Northeastern Germany ◽

Insight Into

The Müritzeum is a nature discovery centre and a museum in the heart of the Mecklenburg Lake District. It is the first natural history museum in Mecklenburg-Vorpommern, with natural history collections that are over 150 years old, and are still growing today. The collections contain about 290 000 specimens from the fields of botany, zoology and geology. An extensive library and an archive are also part of the museum. Collecting, preserving and researching natural history are our main spheres of activity. The exhibition in the Müritzeum offers the visitor a comprehensive insight into the development of the nature and landscape of northeastern Germany and of Mecklenburg-Vorpommern and the Lake Müritz region in particular. The largest aquarium for indigenous freshwater species in Germany enables visitors to imagine themselves in the underwater world of the Mecklenburg Lake District.

A botanical demonstration of the potential of linking data using unique identifiers for people

PLoS ONE ◽

10.1371/journal.pone.0261130 ◽

2021 ◽

Vol 16 (12) ◽

pp. e0261130

Author(s):

Anton Güntsch ◽

Quentin Groom ◽

Marcus Ernst ◽

Jörg Holetschek ◽

Andreas Plank ◽

...

Keyword(s):

Pilot Study ◽

Natural History ◽

Index System ◽

External Resources ◽

Central Index ◽

Natural History Collection ◽

Collection Data ◽

The Web

Natural history collection data available digitally on the web have so far only made limited use of the potential of semantic links among themselves and with cross-disciplinary resources. In a pilot study, botanical collections of the Consortium of European Taxonomic Facilities (CETAF) have therefore begun to semantically annotate their collection data, starting with data on people, and to link them via a central index system. As a result, it is now possible to query data on collectors across different collections and automatically link them to a variety of external resources. The system is being continuously developed and is already in production use in an international collection portal.

Future Challenges in Digitisation of Private Natural History Collections

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37640 ◽

2019 ◽

Vol 3 ◽

Author(s):

Luc Willemse ◽

Emily van Egmond ◽

Veljo Runnel ◽

Hannu Saarenmaa ◽

Ana Rubio ◽

...

Keyword(s):

Natural History ◽

Pilot Project ◽

Communication Strategy ◽

Sensitive Data ◽

Natural History Collections ◽

Raising Awareness ◽

Private Collection ◽

Future Challenges ◽

Collection Data ◽

Private Collections

Specimens held in private natural history collections form an essential, but often neglected, part of the specimens held worldwide in natural history collections. When engaging in regional, national or international initiatives aimed at increasing the accessibility of biodiversity data, it is paramount to include private collections as much and as often as possible. Compared to larger collections in natural history institutions, private collections present a unique set of challenges: they are numerous, anonymous, small and diverse in all aspects of collection management. In ICEDIG, a design study for DiSSCo, these challenges were tackled in task 2 "Inventory of content and incentives for digitisation of small and private collections" under Workpackage 2 "Inventory of current criteria for prioritization of digitization". First, we need to understand the current state and content of private collections within Europe, to identify and tackle challenges more effectively. While some private collections will duplicate material already held in public collections, many are likely to fill more specialised or unusual niches, relevant to the particular collector(s). At present, there is little evidence about the content of private collections and this needs to be explored. In 2018, a European survey was carried out amongst private collection owners to gain more insight into the volume, scope and degree of digitisation of these collections. Based on this survey, all of the respondents’ collections combined are estimated to contain between 9 and 33 million specimens. This is only the tip of the iceberg for private collections in Europe and underlines the importance of these private collections. Digitisation and sharing collection data are activities that are overall considered important among private collection owners. The survey also showed that for those who have not yet started digitising their collection, the provision of tools and information would be most valuable.
These and other highlights of the survey will be presented. In addition, protocols for inventories of private collections will be discussed, as well as ways to keep these up to date. To enhance the inclusion of private collections in Europe’s digitisation efforts, we recognise that we mainly have to focus on the challenges regarding the ‘how’ (work-process), and the sharing of information residing in private collections (including ownership, legal issues, sensitive data). Where necessary, we will also draw attention to the ‘why’ (motivation) of digitisation. A communication strategy aimed at raising awareness about digitisation, offering insight in the practicalities to implement digitisation as well as providing answers to issues related to sharing information, is an essential tool. Elements of a communication strategy to further engage private collection owners will be presented, as will conclusions and recommendations. Finally, digitisation and communication aspects related to private collection owners will need to be tested within the community. Therefore, a pilot project is currently (2018-2019) being carried out in Estonia, Finland and the Netherlands to digitise private collections in a variety of settings. Preliminary results will be presented, zooming in on different approaches to include data from private collections in the overall (research) infrastructures.

Updates to the checklist of the wild bee fauna of Luxembourg as inferred from revised natural history collection data and fieldwork

Biodiversity Data Journal ◽

10.3897/bdj.9.e64027 ◽

2021 ◽

Vol 9 ◽

Author(s):

Fernanda Herrera Mesías ◽

Alexander Weigand

Keyword(s):

Natural History ◽

Historical Context ◽

Natural World ◽

Conservation Strategies ◽

Species List ◽

Wild Bee ◽

Natural History Collection ◽

Scientific Disciplines ◽

Collection Data ◽

Taxonomic Groups

Museums and other institutions curating natural history collections (NHCs) are fundamental entities to many scientific disciplines, as they house data and reference material for varied research projects. As such, biological specimens preserved in NHCs represent accessible physical records of the living world's history. They provide useful information regarding the presence and distribution of different taxonomic groups through space and time. Despite the importance of biological museum specimens, their potential to answer scientific questions, pertinent to the necessities of our current historical context, is often under-explored. The currently-known wild bee fauna of Luxembourg comprises 341 registered species distributed amongst 38 different genera. However, specimens stored in the archives of local NHCs represent an untapped resource to update taxonomic lists, including potentially overlooked findings relevant to the development of national conservation strategies. We re-investigated the wild bee collection of the Zoology Department of the National Museum of Natural History Luxembourg by using morphotaxonomy and DNA barcoding. The collection revision led to the discovery of four species so far not described for the country: Andrena lagopus (Latreille, 1809), Nomada furva (Panzer, 1798), Hoplitis papaveris (Latreille, 1799) and Sphecodes majalis (Pérez, 1903). Additionally, the presence of Nomada sexfasciata (Panzer, 1799), which inexplicably had been omitted by the most current species list, can be re-confirmed. Altogether, our findings increase the number of recorded wild bee species in Luxembourg to 346. Moreover, the results highlight the crucial role of NHCs as repositories of our knowledge of the natural world.

Historical records of the blotched stingray Urotrygon chilensis (Urotrygonidae: Myliobatiformes) yield insight into species distribution: the importance of natural history collections to questions of zoogeography

Systematics and Biodiversity ◽

10.1080/14772000.2020.1868607 ◽

2021 ◽

pp. 1-9

Author(s):

Nicolás Roberto Ehemann ◽

Francisco Javier García-Rodríguez ◽

Germán Pequeño ◽

Ralf Thiel ◽

José De La Cruz-Agüero

Keyword(s):

Natural History ◽

Species Distribution ◽

Historical Records ◽

Natural History Collections ◽

Insight Into

Towards a Global Collection Description Standard

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37894 ◽

2019 ◽

Vol 3 ◽

Cited By ~ 2

Author(s):

Niels Raes ◽

Emily van Egmond ◽

Ana Casino ◽

Matt Woodburn ◽

Deborah L Paul

Keyword(s):

Natural History ◽

Large Scale ◽

The United States ◽

Fragmented Landscape ◽

Global Biodiversity Information Facility ◽

Classification Schemes ◽

Natural History Collections ◽

Natural History Collection ◽

World Collection ◽

Scientific Collections

With digitisation of natural history collections over the past decades, their traditional roles — for taxonomic studies and public education — have been greatly expanded into the fields of biodiversity assessments, climate change impact studies, trait analyses, sequencing, 3D object analyses etc. (Nelson and Ellis 2019; Watanabe 2019). Initial estimates of the global natural history collection range between 1.2 and 2.1 billion specimens (Ariño 2010), of which 169 million (8-14% - as of April 2019) are available at some level of digitisation through the Global Biodiversity Information Facility (GBIF). With iDigBio (Integrated Digitized Biocollections) established in the United States and with the European DiSSCo (Distributed Systems of Scientific Collections) accepted on the ESFRI roadmap, it has become a priority to digitize natural history collections at an industrialized scale. Both iDigBio and DiSSCo aim at mobilising, unifying and delivering bio- and geo-diversity information at the scale, form and precision required by scientific communities, and thereby transform a fragmented landscape into a coherent and responsive research infrastructure. In order to prioritise digitisation based on scientific demand, and efficiency using industrial digitisation pipelines, it is required to arrive at a uniform and unambiguously accepted collection description standard that would allow comparing, grouping and analysing natural history collections at diverse levels. Several initiatives attempt to unambiguously describe natural history collections using taxonomic and storage classification schemes. These initiatives include One World Collection, Global Registry of Scientific Collections (GRSciColl), TDWG (Taxonomic Databases Working Group) Natural Collection Descriptions (NCD) and CETAF (Consortium of European Taxonomy Facilities) passports, among others. 
In a collaborative effort of DiSSCo, ICEDIG (Innovation and consolidation for large scale digitisation of natural heritage), iDigBio, TDWG and the Task Group Collection Digitisation Dashboards, the various schemes were compared in a cross-walk analysis to propose a preliminary natural collection description standard that is supported by the wider community. In the process, two main user groups of collection descriptions standards were identified; scientists and collection managers. The classification produced intends to meet requirements from them both, resulting in three classification schemes that exist in parallel to each other (van Egmond et al. 2019). For scientific purposes a ‘Taxonomic’ and ‘Stratigraphic’ classification were defined, and for management purposes a ‘Storage’ classification. The latter is derived from specimen preservation types (e.g. dried, liquid preserved) defining storage requirements and the physical location of specimens in collection holding facilities. The three parallel collection classifications can be cross-sectioned with a ‘Geographic’ classification to assign sub-collections to major terrestrial and marine regions, which allow scientists to identify particular taxonomic or stratigraphic (sub-)collections from major geographical or marine regions of interest. Finally, to measure the level of digitisation of institutional collections and progress of digitisation through time, the number of digitised specimens for each geographically cross-sectioned (sub-)collection can be derived from institutional collection management systems (CMS). As digitisation has different levels of completeness a ‘Digitisation’ scheme has been adopted to quantify the level of digitisation of a collection from Saarenmaa et al. 2019, ranging from ‘not digitised’ to extensively digitised, recorded in a progressive scale of MIDS (Minimal Information for Digital Specimen). 
The applicability of this preliminary classification will be discussed and visualized in a Collection Digitisation Dashboard (CDD) to demonstrate how the implementation of a collection description standard allows the identification of existing gaps in taxonomic and geographic coverage and levels of digitisation of natural history collections. This set of common classification schemes and dashboard design (van Egmond et al. 2019) will be contributed to the TDWG Collection Description interest group to ultimately arrive at the common goal of a 'World Collection Catalogue'.
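The 8-14% digitisation share quoted in this abstract follows from dividing the 169 million GBIF specimen records by the upper and lower estimates of the global collection (2.1 and 1.2 billion specimens). A short check of that arithmetic, using only the figures given in the abstract (the function name share_percent is just a label for this example):

```python
def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

DIGITISED = 169e6      # specimens available through GBIF (April 2019)
TOTAL_LOW = 1.2e9      # lower estimate of the global collection
TOTAL_HIGH = 2.1e9     # upper estimate of the global collection

# The smallest share comes from the largest total, and vice versa.
LOWER_SHARE = share_percent(DIGITISED, TOTAL_HIGH)
UPPER_SHARE = share_percent(DIGITISED, TOTAL_LOW)
```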
