Taxonomic Gap Analysis: A method of evaluating the taxa represented in biodiversity databases

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37288 ◽

2019 ◽

Vol 3 ◽

Author(s):

Amanda Devine ◽

Jonathan Coddington

Keyword(s):

Gap Analysis ◽

Dna Barcode ◽

Global Biodiversity Information Facility ◽

High Quality ◽

Taxonomic Rank ◽

Genetic Sequencing ◽

Physical Resources ◽

Data Portal ◽

Taxonomic Novelty ◽

Life On Earth

The Global Genome Initiative (GGI) endeavors to collect the Earth’s genomic biodiversity, preserve this biodiversity as high-quality genetic resources in Global Genome Biodiversity Network (GGBN) affiliated biorepositories, increase knowledge of biodiversity through genetic sequencing, and make resources and knowledge accessible to researchers via the GGBN Data Portal, the Global Catalogue of Microorganisms (GCM), and the National Center for Biotechnology Information (NCBI) GenBank. In its seven-year timespan, GGI is attempting to collect samples from all 9,870 families and half of the 165,683 genera of life on Earth (Roskov et al. 2019). To accomplish this, GGI must consider the following questions together: What life exists? What has already been preserved as physical resources? What is already known from genetic sequencing? How will novel or legacy collections fill the gaps in resources or knowledge? To answer the first question, GGI has explored the use of taxonomic authorities such as the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy and the Catalogue of Life as taxonomic backbones to match taxonomic names and derive complete lists of extant taxa at each taxonomic rank. To answer the second question, GGI uses the GGBN Data Portal API and the GCM website to extract lists of taxonomic names, which are then standardized to a taxonomic backbone. To answer the third question, following the recommendations of Hanner (2009) for identifying high-quality DNA barcode records, GGI employs the NCBI Entrez Programming Utilities to download GenBank records, then standardizes the associated taxa to a taxonomic backbone.
Finally, GGI compares lists of taxa found in specific geographic areas or specific legacy collections to determine the amount of taxonomic novelty a new collection may supply. GGI refers to this comparison of taxonomic databases as a taxonomic gap analysis, an assessment of how well a potential collection fills the taxonomic gaps in physical collections and genetic knowledge. A gap analysis performed by GGI in March 2019 shows that 49% of families and 78% of genera still have no representation as either physical samples or genetic information (Table 1). There are substantial gaps to fill in the endeavor to capture the Earth's biodiversity, and taxonomic gap analysis will continue to be a powerful tool to identify the most promising potential new collections.
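
The comparison described above is, at its core, a set operation on standardized names. The following is a minimal sketch of that idea, not GGI's actual pipeline: the function names, the `normalize` helper (a stand-in for matching names to a taxonomic backbone), and the toy family lists are all assumptions made for illustration.

```python
def normalize(name):
    """Hypothetical stand-in for backbone matching: trim and case-fold a name."""
    return name.strip().casefold()


def taxonomic_gap(backbone, preserved, sequenced):
    """Return backbone taxa with neither a physical sample nor a sequence record."""
    backbone_set = {normalize(n) for n in backbone}
    represented = {normalize(n) for n in preserved} | {normalize(n) for n in sequenced}
    return backbone_set - represented


def novelty(candidate_collection, gap):
    """Return the taxa in a candidate collection that would fill part of the gap."""
    return {normalize(n) for n in candidate_collection} & gap


# Toy data, for illustration only (not taken from any database).
backbone = ["Felidae", "Canidae", "Ursidae", "Moniligastridae"]
preserved = ["felidae "]   # already held as physical samples
sequenced = ["Canidae"]    # already represented by sequence records

gap = taxonomic_gap(backbone, preserved, sequenced)
gap_fraction = len(gap) / len(backbone)
fills = novelty(["Ursidae", "Felidae"], gap)
```

In this toy case half of the backbone families are unrepresented, and the candidate collection would contribute one novel family. A real analysis would replace `normalize` with proper name resolution against the chosen backbone.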


Shaping our Taxonomic Legacy through Openly Sharing Primary Biodiversity Data in Taxonomic Revisions

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37062 ◽

2019 ◽

Vol 3 ◽

Author(s):

Torsten Dikow

Keyword(s):

Data Dissemination ◽

Original Description ◽

Published Data ◽

Data Standards ◽

Global Biodiversity Information Facility ◽

High Quality ◽

Data Repositories ◽

The Past ◽

Text Document ◽

Digital Format

Taxonomy has a long tradition of describing Earth’s biodiversity. For the past 20 years or so, taxonomic revisions have become available in PDF format, which most practicing taxonomists regard as a good means of digital dissemination. However, a PDF document is nothing more than a text document that can be transferred easily for viewing among researchers and computer platforms. In today’s world, traditional taxonomic techniques need to be met with novel tools to make data dissemination a reality, make species hypotheses more robust, and open the field up to rigorous scientific testing. Here, I argue that high-quality taxonomic output is not just the publication of detailed species descriptions and re-descriptions, precise taxon delimitations, easy-to-use identification keys, and comprehensively undertaken and illustrated revisions. Rather, high-quality taxonomic output also embraces digital workflows and data standards to disseminate captured and published data in structured, machine-readable formats to data repositories so as to make all data openly accessible. Imagine that a taxonomist today has every original description and every subsequent re-description of a species at her/his fingertips online, has every specimen photograph produced by a previous reviser digitally available in the original resolution, and can take advantage of existing, openly accessible data and resources produced by peers in digital format in the past. When we as taxonomists provide such findable, accessible, interoperable, and reusable (FAIR) data, biodiversity discovery will accelerate and our own taxonomic legacy will be enhanced. Cybertaxonomic tools provide methods to accomplish this goal, and their use and implementation are summarized here in the context of revisionary taxonomy from the standpoint of a publishing taxonomist.
While many of the tools have been around for some time now, very few taxonomists embrace and utilize these tools in their publications. This presentation will provide information on what kind of data can and should be openly shared (e.g., specimen occurrence data, digital images, names, descriptions, authors) and outline best practices utilizing globally unique identifiers for specimens and data. Data standards and the best-suited data repositories such as the Global Biodiversity Information Facility (GBIF) and Zenodo, with its Biodiversity Literature Repository, and the Plazi TreatmentBank, an emerging species portal, are discussed to illustrate retrospective and prospective data capture of taxonomic revisions.


Conclusion

Just Property ◽

10.1093/oso/9780198787105.003.0012 ◽

2020 ◽

pp. 272-306

Author(s):

Christopher Pierson

Keyword(s):

Wealth Inequality ◽

Global Scale ◽

Sovereign Wealth Funds ◽

Alternative Strategies ◽

Natural Right ◽

Legal Studies ◽

Basic Capital ◽

Physical Resources ◽

Life On Earth ◽

Property Regime

In the Conclusion, I return to the key questions raised at the very opening of the first volume of Just Property: what should we do if present levels of wealth inequality cannot be justified? and what consequences follow for our property order from a recognition that the physical resources upon which life on Earth depends are running out? I first establish what present levels of wealth inequality on a global scale look like. I suggest that both (libertarian) arguments from a natural right to appropriate, and alternative strategies built around a ‘no-property’ regime cannot do the work required to make them persuasive. I argue that we need a legal property order, but that it must be grounded in something other than individual natural right. Working through arguments in American critical legal studies, I argue that we need a property regime that is democratically chosen—but in a democracy which is substantially re-tooled. On the issue of depleting resources, I turn to work on ‘green property’ which suggests ways in which we can incorporate a concern with sustainability and limits into our understanding of what and how we can own. Although we have very limited reasons for optimism, I finish by identifying three policy options that might accord with this revised view of a good property order: land value taxation, basic capital/income, and an amplified system of sovereign wealth funds.


A contribution to the earthworm diversity (Clitellata, Moniligastridae) of Kerala, a component of the Western Ghats biodiversity hotspot, India, using integrated taxonom

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0117 ◽

2021 ◽

pp. 117-137

Author(s):

S. S. Thakur ◽

A. R. Lone ◽

S. K. Tiwari ◽

S. K. Jain ◽

S. W. James ◽

...

Keyword(s):

Western Ghats ◽

National Park ◽

Gap Analysis ◽

Mitochondrial Gene ◽

Dna Barcode ◽

Wildlife Sanctuary ◽

Tiger Reserve ◽

First Time ◽

Barcode Gap

Earthworms (Clitellata, Moniligastridae) of Chaliyar River Malappuram, Eravikulam National Park, Neyyar Wildlife Sanctuary, Parambikulam Tiger Reserve, Peppara Wildlife Sanctuary, Periyar National Park, Shendurney Wildlife Sanctuary and Wayanad Forest, Kerala, a component of the Western Ghats hotspot, India, were studied by standard taxonomic methods, and their DNA barcode signatures using the mitochondrial gene cytochrome c oxidase I (COI) were generated for the first time. This study covers eleven species of earthworms of the family Moniligastridae: Drawida brunnea Stephenson, Drawida circumpapillata Aiyer, Drawida ghatensis Michaelsen, Drawida impertusa Stephenson, Drawida nilamburensis (Bourne), Drawida robusta (Bourne), Drawida scandens Rao, Drawida travancorense Michaelsen, Moniligaster aiyeri Gates, Moniligaster deshayesi Perrier, and Moniligaster gravelyi (Stephenson). In the phylogenetic analysis, all the species were recovered in both neighbour-joining (NJ) and maximum likelihood (ML) trees with high clade support. The average K2P distances within and between species were 1.2 % and 22 %, respectively, and barcode gap analysis (BGA) of the studied species suggested a clear barcode gap of 2–5 %, reflecting the accuracy of characterization. The study presents the first step in the molecular characterization of the native earthworm family Moniligastridae of India. Data published through GBIF (DOI: 10.15470/l2nlhz).
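
For readers unfamiliar with the K2P (Kimura 2-parameter) distance mentioned above, the standard formula is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of sites showing transitions and transversions. The sketch below is only an illustration of that formula on pre-aligned sequences; the function name and the toy sequences are assumptions and do not come from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log / math.sqrt raise ValueError when the sequences are too
    # divergent for the model (non-positive arguments).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy sequences, for illustration only.
d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
d_transversion = k2p_distance("AAAAAAAAAA", "CAAAAAAAAA")
```

With one transition in ten sites the distance is about 0.112, slightly above the raw 10 % difference, because the model corrects for multiple substitutions at the same site.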


Sustainable Educational Robotics. Contingency Plan during Lockdown in Primary School

Sustainability ◽

10.3390/su13158388 ◽

2021 ◽

Vol 13 (15) ◽

pp. 8388

Author(s):

Judit Alamo ◽

Eduardo Quevedo ◽

Alejandro Santana ◽

Samuel Ortega ◽

Himar Fabelo ◽

...

Keyword(s):

Primary Education ◽

New Technologies ◽

Training Needs ◽

Online Resources ◽

Educational Robotics ◽

High Quality ◽

Face To Face ◽

Physical Resources ◽

Digital Skills ◽

Need To Evaluate

New technologies have offered great alternatives for education. In this context, we place robotics and programming as innovative and versatile tools that adapt to active methodologies. With the arrival of COVID-19 and lockdowns, physical resources were kept out of use, and virtual lectures did not incorporate these elements in a meaningful way. This situation motivates the objective of this study: to evaluate whether robotics and programming can be taught virtually in these circumstances, without physical resources and without face-to-face lectures. To do this, a mixed methodology consisting of questionnaires and interviews was used, aimed at primary education teachers, families, and primary education students. The results suggest that the virtualization of robotics and programming is a feasible and beneficial alternative for students, which allows the development of digital skills and is enhanced by the use of audiovisual materials and online resources. Even though face-to-face classes have other benefits not offered by virtualization, and teacher training needs to be up to the task of facing this situation, it is only a matter of time before these situations can be addressed and high-quality distance education guaranteed.


Unlocking the Entomological Collection of the Natural History Museum of Maputo, Mozambique

Biodiversity Data Journal ◽

10.3897/bdj.9.e64461 ◽

2021 ◽

Vol 9 ◽

Author(s):

Domingos Sandramo ◽

Enrico Nicosia ◽

Silvio Cianciullo ◽

Bernardo Muatinte ◽

Almeida Guissamulo

Keyword(s):

Natural History ◽

Crucial Role ◽

Development Programme ◽

Natural History Museum ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

History Museum ◽

Data Portal ◽

Global Biodiversity ◽

Biodiversity Information

The collections of the Natural History Museum of Maputo have a crucial role in the safeguarding of Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. In this paper, a dataset is described, based on the Museum’s Entomological Collection, recording 409 species belonging to seven orders and 48 families. Each specimen’s available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were obtained between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (once known as “Museu Álvaro de Castro”) in all the country’s provinces, with the exception of Cabo Delgado Province. This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The insect dataset is available on the GBIF Engine data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on the Mozambican national portal of biodiversity data, BioNoMo (https://bionomo.openscidata.org), developed by the SECOSUD II Project.


Data Location Quality at GBIF

Biodiversity Information Science and Standards ◽

10.3897/biss.3.35829 ◽

2019 ◽

Vol 3 ◽

Author(s):

John Waller

Keyword(s):

Data Quality ◽

Open Data ◽

R Package ◽

Large Network ◽

Global Biodiversity Information Facility ◽

Quality Issue ◽

Data Portal ◽

Quality Issues ◽

Data Location ◽

Biodiversity Information

I will cover how the Global Biodiversity Information Facility (GBIF) handles data quality issues, with specific focus on coordinate location issues, such as gridded datasets (Fig. 1) and country centroids. I will highlight the challenges GBIF faces in identifying potential data quality problems and what we and others (Zizka et al. 2019) are doing to discover and address them. GBIF is the largest open-data portal of biodiversity data, which is a large network of individual datasets (> 40k) from various sources and publishers. Since these datasets are variable both within themselves and dataset-to-dataset, this creates a challenge for users wanting to use data collected from museums, smartphones, atlases, satellite tracking, DNA sequencing, and various other sources for research or analysis. Data quality at GBIF will always be a moving target (Chapman 2005), and GBIF already handles many obvious errors such as zero/impossible coordinates, empty or invalid data fields, and fuzzy taxon matching. Since GBIF primarily (but not exclusively) serves latitude-longitude location information, there is an expectation that occurrences fall somewhat close to where the species actually occurs. This is not always the case. Occurrence data can be hundreds of kilometers away from where the species naturally occurs, and there can be multiple reasons for this, which might not be entirely obvious to users. One reason is that many GBIF datasets are gridded. Gridded datasets are datasets that have low resolution due to equally-spaced sampling. This can be a data quality issue because a user might assume an occurrence record was recorded exactly at its coordinates. Country centroids are another reason why a species occurrence record might be far from where it occurs naturally. GBIF does not yet flag country centroids, which are records where the dataset publisher has entered the latitude-longitude center of a country instead of leaving the field blank.
I will discuss the challenges surrounding locating these issues and the current solutions (such as the CoordinateCleaner R package). I will touch on how existing DWCA terms like coordinateUncertaintyInMeters and footprintWKT are being utilized to highlight low coordinate resolution. Finally, I will highlight some other emerging data quality issues and how GBIF is beginning to experiment with dataset-level flagging. Currently we have flagged around 500 datasets as gridded and around 400 datasets as citizen science, but there are many more potential dataset flags.
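
The two coordinate issues described above can be illustrated with simple heuristics. The sketch below is an assumption-laden illustration only: it is not how GBIF flags datasets and not the algorithm used by CoordinateCleaner. The function names, thresholds, and the placeholder centroid are all hypothetical.

```python
from collections import Counter


def looks_gridded(values, min_share=0.8, decimals=4):
    """Heuristic: True when most gaps between sorted unique coordinates
    along one axis (latitude or longitude) share a single spacing."""
    unique = sorted({round(v, decimals) for v in values})
    if len(unique) < 4:
        return False  # too few distinct values to judge
    gaps = [round(b - a, decimals) for a, b in zip(unique, unique[1:])]
    _, count = Counter(gaps).most_common(1)[0]
    return count / len(gaps) >= min_share


def near_centroid(lat, lon, centroid, tolerance_deg=0.01):
    """True when a record lies within a small box around a country centroid."""
    c_lat, c_lon = centroid
    return abs(lat - c_lat) <= tolerance_deg and abs(lon - c_lon) <= tolerance_deg


# Toy values, for illustration only.
gridded_lats = [10.0, 10.5, 11.0, 11.5, 12.0, 10.5]
scattered_lats = [10.13, 10.52, 11.97, 12.01, 13.4]
placeholder_centroid = (9.5, 2.25)  # hypothetical, not an authoritative centroid

is_gridded = looks_gridded(gridded_lats)
is_scattered_gridded = looks_gridded(scattered_lats)
at_centroid = near_centroid(9.5, 2.25, placeholder_centroid)
away_from_centroid = near_centroid(6.37, 2.43, placeholder_centroid)
```

In practice both axes would be checked per dataset, and centroid coordinates would come from a curated gazetteer rather than a hard-coded value.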


Mind the gap-analysis! – How complete are DNA barcode reference libraries for monitoring-relevant aquatic species in Europe?

ARPHA Conference Abstracts ◽

10.3897/aca.4.e65473 ◽

2021 ◽

Vol 4 ◽

Author(s):

Hannah Weigand

Keyword(s):

Gap Analysis ◽

General Pattern ◽

Dna Barcode ◽

Marine Species ◽

Biotic Index ◽

Reference Database ◽

Data Systems ◽

Freshwater Macroinvertebrates ◽

Public Data ◽

Dna Metabarcoding

Molecular species identification with DNA metabarcoding can potentially accelerate, streamline and standardise biomonitoring routines. It is currently being tested how this new technique can be implemented for the European Water Framework Directive (WFD) and the European Marine Strategy Framework Directive (MSFD). To connect the results from DNA metabarcoding with the current monitoring routines, an extensive, high-quality DNA barcode reference database is required. Hence, a gap-analysis of the Barcode of Life Data Systems (BOLD) was performed as part of the EU-COST Action DNAqua-Net (Weigand et al. 2019), which was updated in 2021. It aimed to analyse the completeness of BOLD for species on the national WFD monitoring lists and for marine species on the ERMS (European Register of Marine Species) and AMBI (AZTI Marine Biotic Index) lists. The data were supplemented by MitoFish for freshwater fish and Diat.barcode for diatoms. Several thousand species were included in the gap-analysis, although not all countries currently apply species-level data for all WFD biological quality elements. The barcode coverage of the different taxonomic groups varied strongly, with high levels (> 80%) for fish and freshwater vascular plants, and low levels (< 15%) for diatoms and freshwater platyhelminths. As a general pattern, species monitored by several countries had a higher coverage compared to those monitored only by a single country. The gap-analysis additionally focused on the availability of metadata (e.g., geographical origin of the specimen or determiner name) for the barcodes. Hence, we analysed whether the data were stored publicly (with access to metadata) or privately (without access to metadata) in BOLD, or whether the data were mined from GenBank (metadata are potentially available but not easy to access).
Although public data were stored for many species (43% of freshwater macroinvertebrates and 21% of AMBI marine species), the proportion of species without public metadata was not negligible (22% of freshwater macroinvertebrates and 22% of AMBI marine species). Another issue that emerged from the gap-analysis was that several deposited barcodes were identified by reverse taxonomy (RT), i.e., specimens were molecularly identified via their DNA barcodes and the barcodes themselves are stored in BOLD with the associated species name. This can be problematic as originally misidentified samples can lead to false RT-identifications, making the data appear more trustworthy than they actually are. For the analysed freshwater macroinvertebrates, 39% of all barcodes and 65% of all public data originated from RT, impacting 11% of all monitored species. As the information about RT is only available for publicly stored data, the real impact of RT might be even higher.


The Living Atlases community in action: the GBIF Benin data portal

Biodiversity Information Science and Standards ◽

10.3897/biss.2.25488 ◽

2018 ◽

Vol 2 ◽

pp. e25488

Author(s):

Anne-Sophie Archambeau ◽

Fabien Cavière ◽

Kourouma Koura ◽

Marie-Elise Lecoq ◽

Sophie Pamerlon ◽

...

Keyword(s):

African Country ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Capacity Enhancement ◽

Support Programme ◽

Data Portal ◽

Global Biodiversity ◽

The University ◽

Biodiversity Information ◽

Occurrence Records

The Atlas of Living Australia (ALA) (https://www.ala.org.au/) is the Global Biodiversity Information Facility (GBIF) node of Australia. It has developed an open and free platform for sharing and exploring biodiversity data. All the modules are publicly available for reuse and customization on its GitHub account (https://github.com/AtlasOfLivingAustralia). GBIF Benin, hosted at the University of Abomey-Calavi, has published more than 338 000 occurrence records from 87 datasets and 2 checklists. Through the GBIF Capacity Enhancement Support Programme (https://www.gbif.org/programme/82219/capacity-enhancement-support-programme), GBIF Benin, with the help of GBIF France, is in the process of deploying the Beninese data portal using the GBIF France back-end architecture. Benin is the first African country to implement this module of the ALA infrastructure. In this presentation, we will give an overview of the registry and the occurrence search engine using the Beninese data portal. We will begin with the administration interface and how to manage metadata, then continue with the user interface of the registry and how to find Beninese occurrences through the hub.


Identification of Neoceratitis asiatica (Becker) (Diptera: Tephritidae) based on morphological characteristics and DNA barcode

Zootaxa ◽

10.11646/zootaxa.4363.4.7 ◽

2017 ◽

Vol 4363 (4) ◽

pp. 553

Author(s):

SHAOKUN GUO ◽

JIA HE ◽

ZIHUA ZHAO ◽

LIJUN LIU ◽

LIYUAN GAO ◽

...

Keyword(s):

Dna Sequences ◽

Phylogenetic Trees ◽

Gap Analysis ◽

Dna Barcode ◽

Morphological Characteristics ◽

Coi Gene ◽

Economic Losses ◽

Morphological Identification ◽

Lycium Barbarum ◽

Accurate Identification

Neoceratitis asiatica (Becker), which especially infests wolfberry (Lycium barbarum L.), can cause serious economic losses every year in China, especially to organic wolfberry production. In some important wolfberry plantings, it is difficult and time-consuming to rear the larvae or pupae to adults for morphological identification. Molecular identification based on DNA barcoding is a solution to the problem. In this study, 15 samples were collected from Ningxia, China. Among them, five adults were identified according to their morphological characteristics. The utility of the mitochondrial DNA (mtDNA) cytochrome c oxidase I (COI) gene sequence as a DNA barcode for distinguishing N. asiatica was evaluated by analysing Kimura 2-parameter distances and phylogenetic trees. There were significant differences between intra-specific and inter-specific genetic distances according to the barcoding gap analysis. The uncertain larval and pupal samples were within the same cluster as N. asiatica adults and formed a sister cluster to N. cyanescens. A combination of morphological and molecular methods enabled accurate identification of N. asiatica. This is the first study using DNA barcodes to identify N. asiatica, and the obtained DNA sequences will be added to the DNA barcode database.
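
A barcoding gap analysis of the kind mentioned above compares the largest within-species distance with the smallest between-species distance. The sketch below shows one simple way to compute the two values from a pairwise distance matrix; the function name, species labels, and distances are toy assumptions and are not data from the study.

```python
from itertools import combinations


def barcode_gap(labels, distance):
    """Return (max intraspecific, min interspecific) distance.

    labels[i] is the species of sample i; distance[i][j] is the pairwise
    distance (for example K2P) between samples i and j.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        target = intra if labels[i] == labels[j] else inter
        target.append(distance[i][j])
    if not intra or not inter:
        raise ValueError("need both within- and between-species pairs")
    return max(intra), min(inter)


# Toy labels and distances, for illustration only.
labels = ["sp1", "sp1", "sp2", "sp2"]
distance = [
    [0.00, 0.01, 0.20, 0.22],
    [0.01, 0.00, 0.21, 0.19],
    [0.20, 0.21, 0.00, 0.02],
    [0.22, 0.19, 0.02, 0.00],
]

max_intra, min_inter = barcode_gap(labels, distance)
has_gap = max_intra < min_inter
```

A gap exists when every within-species distance is smaller than every between-species distance, which is the case in this toy matrix.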


A reference library for the identification of Canadian invertebrates: 1.5 million DNA barcodes, voucher specimens, and genomic samples

10.1101/701805 ◽

2019 ◽

Author(s):

Jeremy R. deWaard ◽

Sujeevan Ratnasingham ◽

Evgeny V. Zakharov ◽

Alex V. Borisenko ◽

Dirk Steinke ◽

...

Keyword(s):

Land Surface ◽

Animal Species ◽

Sequence Data ◽

Dna Barcodes ◽

Reference Library ◽

Global Biodiversity Information Facility ◽

Dna Sequence Data ◽

Voucher Specimens ◽

Data Portal ◽

Biodiversity Information

The reliable taxonomic identification of organisms through DNA sequence data requires a well-parameterized library of curated reference sequences. However, it is estimated that just 15% of described animal species are represented in public sequence repositories. To begin to address this deficiency, we provide DNA barcodes for 1,500,003 animal specimens collected from 23 terrestrial and aquatic ecozones at sites across Canada, a nation that comprises 7% of the planet’s land surface. In total, 14 phyla, 43 classes, 163 orders, 1123 families, 6186 genera, and 64,264 Barcode Index Numbers (BINs; a proxy for species) are represented. Species-level taxonomy was available for 38% of the specimens, but higher proportions were assigned to a genus (69.5%) and a family (99.9%). Voucher specimens and DNA extracts are archived at the Centre for Biodiversity Genomics where they are available for further research. The corresponding sequence and taxonomic data can be accessed through the Barcode of Life Data System, GenBank, the Global Biodiversity Information Facility, and the Global Genome Biodiversity Network Data Portal.
