The Catalogue of Life: Assembling data into a global taxonomic checklist

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37221 ◽

2019 ◽

Vol 3 ◽

Author(s):

Geoffrey Ower ◽

Yuri Roskov

Keyword(s):

Quality Assurance ◽

Data Quality ◽

Biological Diversity ◽

Basic Knowledge ◽

Biodiversity Data ◽

Future Directions ◽

Data Submission ◽

Standard Data ◽

Data Formats ◽

Taxonomic Groups

Producing a global taxonomic checklist of all species is essential for indexing biodiversity data and for providing the basic knowledge needed to study, manage, and conserve biological diversity. The Catalogue of Life (CoL) aims to provide such a checklist; its 2019 annual edition includes 1.9 million species names. Assembling data into CoL is complex: it requires reformatting data, running quality assurance tests, and collaborating with data providers to resolve detected taxonomic conflicts. Global Species Databases (GSDs) are submitted to CoL in a wide variety of data formats by hundreds of taxonomic experts and institutions. Submitted data are reformatted to a standard data submission format: the CoL Standard Dataset (ACEF), Darwin Core, or CoLDP. A series of standardized data integrity checks is then run to detect and resolve frequently occurring data quality problems, including character encoding corruption, non-Latin characters in scientific names, missing parents, duplicated and homonymic names within a GSD and among GSDs, split taxonomic groups that have been assigned to multiple parent taxa, and other issues. The process and challenges of assembling data into the Catalogue of Life, and future directions of the project in migrating to the CoL+ infrastructure, will be discussed.
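
The integrity checks listed above can be illustrated with a minimal sketch. It is not CoL's actual tooling: the flat record layout (taxon ID, parent ID, scientific name, GSD code) and all function names are hypothetical, and the encoding test is a crude heuristic that looks for the replacement character U+FFFD or the "Ã" that typically appears when UTF-8 text is mis-read as Latin-1. Names shared among GSDs are only reported as candidates; deciding whether they are true duplicates or homonyms still requires taxonomic review.

```python
import unicodedata
from collections import defaultdict
from typing import Optional

# Hypothetical flat layout: (taxon_id, parent_id, scientific_name, gsd_code)
Record = tuple[str, Optional[str], str, str]


def has_encoding_damage(name: str) -> bool:
    # U+FFFD marks bytes that could not be decoded; "Ã" is a common trace of
    # UTF-8 text mis-read as Latin-1 (e.g. "ü" becomes "Ã¼").
    return "\ufffd" in name or "Ã" in name


def has_non_latin(name: str) -> bool:
    # Flag any letter whose Unicode name is not a LATIN letter, such as a
    # Cyrillic "А" typed in place of a Latin "A".
    return any(
        ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
        for ch in name
    )


def integrity_report(records: list[Record]) -> dict[str, list[str]]:
    known_ids = {taxon_id for taxon_id, _, _, _ in records}
    report: dict[str, list[str]] = {
        "missing_parent": [],
        "encoding_damage": [],
        "non_latin": [],
        "duplicate_within_gsd": [],
        "shared_among_gsds": [],
    }
    gsds_by_name: dict[str, list[str]] = defaultdict(list)

    for taxon_id, parent_id, name, gsd in records:
        # A non-root taxon must point at a parent present in the data.
        if parent_id is not None and parent_id not in known_ids:
            report["missing_parent"].append(taxon_id)
        if has_encoding_damage(name):
            report["encoding_damage"].append(taxon_id)
        elif has_non_latin(name):
            report["non_latin"].append(taxon_id)
        gsds_by_name[name].append(gsd)

    for name, gsds in gsds_by_name.items():
        if len(gsds) != len(set(gsds)):
            report["duplicate_within_gsd"].append(name)  # same name twice in one GSD
        if len(set(gsds)) > 1:
            report["shared_among_gsds"].append(name)  # candidate duplicate or homonym
    return report
```

Each check is independent, so further checks (for example, a taxon assigned to several parents) could be added as extra passes over the same records.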

Regional Approach to Biodiversity Information Management and Reporting in Southeast Europe

Biodiversity Information Science and Standards ◽

10.3897/biss.3.38190 ◽

2019 ◽

Vol 3 ◽

Author(s):

Azra Velagić-Hajrudinović

Keyword(s):

Information Management ◽

Bosnia And Herzegovina ◽

Biological Diversity ◽

Natura 2000 ◽

Convention On Biological Diversity ◽

Southeast Europe ◽

Global Biodiversity Information Facility ◽

Standard Data ◽

Taxonomic Groups ◽

Biodiversity Information

Featuring a large variety of ecosystems, abundant freshwater and forest resources, unique and extensive karst systems, and high levels of biodiversity and endemism, Southeast Europe (SEE) plays a crucial role in the conservation of biodiversity in Europe and beyond. To conserve and sustainably use these biodiversity assets and valuable natural resources, a concerted regional approach to biodiversity information management and reporting (BIMR) has been strengthened. This has improved access to, transparency of, and exchange of biodiversity data and reporting processes among the participating economies. Significant, visible progress among SEE economies and stakeholders is due to three factors: the knowledge gained about regional and national BIMR baselines; the minimum Convention on Biological Diversity (CBD) and European Union (EU) requirements on BIMR agreed and elaborated among stakeholders; and the BIMR tools that have been implemented. These tools include a regionally unified fundamental database for the Information System for Nature Conservation (ISNC), for instance in Montenegro (http://zasticenapodrucja-cg.tk//en) and in Bosnia and Herzegovina, entity of Republika Srpska (http://e-priroda.rs.ba/en/); the Standard Data Form (SDF) application for NATURA 2000 in the entity of Federation of Bosnia and Herzegovina and in North Macedonia; and a compiled dataset on five taxonomic groups of endemic taxa using the Darwin Core standard. As a result, BIMR activities and priorities from the region have become more visible and better supported, the partner institutions have taken ownership of the BIMR tools, and this work has been recognized at the global level through the Global Biodiversity Information Facility (GBIF).

The Online Pollen Catalogs Network (RCPol)

Biodiversity Information Science and Standards ◽

10.3897/biss.2.25658 ◽

2018 ◽

Vol 2 ◽

pp. e25658

Author(s):

Allan Veiga ◽

Antonio Saraiva ◽

Cláudia da Silva

Keyword(s):

Quality Assurance ◽

Data Quality ◽

Trophic Interactions ◽

Floral Biology ◽

Plant Taxonomy ◽

Biodiversity Data ◽

Original Language ◽

Training Courses ◽

Quality Tool ◽

Scientific Name

RCPol was created in 2013 to promote interaction among researchers and the integration of data from their pollen collections, herbaria, and bee collections. To structure RCPol's work, researchers and collaborators organized information on palynology and on trophic interactions between bees and plants. During the project, several computing tools were developed and made available on the RCPol website (http://rcpol.org.br), including: interactive keys with multiple inputs for species identification (http://chaves.rcpol.org.br); a glossary of palynology-related terms (http://chaves.rcpol.org.br/profile/glossary/eco); a plant-bee interactions database (http://chaves.rcpol.org.br/interactions); and a data quality tool (http://chaves.rcpol.org.br/admin/data-quality). These tools were developed in partnership with researchers and collaborators from Escola Politécnica (USP) and other Brazilian and foreign institutions working on palynology, floral biology, pollination, plant taxonomy, ecology, and trophic interactions. The interactive keys are organized into four branches: palynoecology, paleopalynology, palynotaxonomy, and spores. This information is collaboratively digitized and managed using standardized Google Spreadsheets. All of it is assessed by a data quality assurance tool (based on the conceptual framework of the TDWG Biodiversity Data Quality Interest Group, Veiga et al. 2017) and curated by palynology experts. In total, RCPol has published 1,774 specimen records, 1,488 species records (generated automatically by merging specimen records with the same scientific name), 656 interaction records, 370 glossary term records, and 15 institution records, all translated from the original language (usually Portuguese or English) into Portuguese, English, and Spanish. During the project's first three years, 106 partners, including researchers and collaborators from 28 institutions in Brazil and abroad, actively participated in the project.
An important part of the project's activities involved training researchers and students in palynology, data digitization, and use of the system. So far, six training courses have reached 192 people.
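
As a sketch of how species records might be generated by merging specimen records with the same scientific name, the following Python assumes hypothetical field names (`scientificName`, `institution`), names without authorship strings, and a deliberately simple quality test (a genus plus a specific epithet). It is not the RCPol implementation.

```python
def normalize_name(name: str) -> str:
    # Collapse repeated whitespace and apply "Genus epithet" capitalization.
    # Assumes the name carries no authorship string.
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def build_species_records(specimens: list[dict]) -> tuple[dict, list]:
    """Merge specimen records sharing a scientific name into species records."""
    species: dict[str, dict] = {}
    flagged: list[dict] = []
    for spec in specimens:
        name = normalize_name(spec.get("scientificName", ""))
        # Minimal quality test: require at least a genus and a specific epithet.
        if len(name.split()) < 2:
            flagged.append(spec)
            continue
        entry = species.setdefault(name, {"specimens": 0, "institutions": set()})
        entry["specimens"] += 1
        entry["institutions"].add(spec.get("institution", "unknown"))
    return species, flagged
```

Normalizing before merging matters: without it, "mimosa pudica" and "Mimosa  pudica" would produce two species records for one taxon.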

Automatic Well Completions and Reservoir Grid Data Quality Assurance for Reservoir Simulation Models

10.2118/175623-ms ◽

2015 ◽

Author(s):

Tariq Al-Zahrani ◽

Muath Al-Mulla ◽

Mohammed Al-Nuaim

Keyword(s):

Quality Assurance ◽

Data Quality ◽

Reservoir Simulation ◽

Simulation Models ◽

Grid Data ◽

Well Completions

14C AMS at SUERC: Improving QA Data with the 5MV Tandem and 250kV SSAMS

Radiocarbon ◽

10.1017/s003382220004529x ◽

2010 ◽

Vol 52 (2) ◽

pp. 263-271 ◽

Cited By ~ 27

Author(s):

P Naysmith ◽

G T Cook ◽

S P H T Freeman ◽

E M Scott ◽

R Anderson ◽

...

Keyword(s):

Quality Assurance ◽

Sample Preparation ◽

Mass Spectrometer ◽

Random Access ◽

Capacity Measurement ◽

Tandem Accelerator ◽

Standard Data ◽

Accelerator Mass Spectrometer ◽

On Line ◽

Automated Data Acquisition

In 2003, a National Electrostatics Corporation (NEC) 5MV tandem accelerator mass spectrometer was installed at SUERC, providing the radiocarbon laboratory with 14C measurements to 4–5‰ repeatability. In 2007, a 250kV single-stage accelerator mass spectrometer (SSAMS) was added to provide additional 14C capability, and it is now the preferred system for 14C analysis. Changes to the technology and to our operations are evident in our extensive quality assurance data; for example, we now typically use the 134-position MC-SNICS source, filled to capacity. Measurements of standards show that running the spectrometer without the complication of on-line δ13C evaluation is a good operational compromise. Currently, 3‰ 14C/13C measurements are routinely achieved for samples up to nearly 3 half-lives old, through consistent sample preparation and an automated data acquisition algorithm with random sample access for measurement repeats. Background and known-age standard data are presented for 2003–2008 for the 5MV system and for 2007–2008 for the SSAMS to demonstrate the improvements in data quality.
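
Repeatability figures such as 3‰ can be read as the scatter of repeat measurements of a standard relative to their mean. The sketch below assumes that definition (sample standard deviation divided by the mean, times 1000); SUERC's own statistic and acquisition algorithm may differ, and the function names are hypothetical.

```python
import statistics


def repeatability_permil(ratios: list[float]) -> float:
    """Relative sample standard deviation of repeat measurements, in per mil."""
    if len(ratios) < 2:
        raise ValueError("need at least two repeat measurements")
    return statistics.stdev(ratios) / statistics.mean(ratios) * 1000.0


def meets_target(ratios: list[float], target_permil: float = 3.0) -> bool:
    # Compare the observed scatter of a standard against a repeatability target.
    return repeatability_permil(ratios) <= target_permil
```

For repeat ratios of 1.000, 1.003 and 0.997 (normalized to the standard's expected value), the sample standard deviation is 0.003 about a mean of 1.0, i.e. 3‰.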

Big Data Quality Assurance Through Data Traceability: A Case Study of the National Standard Reference Data Program of Korea

IEEE Access ◽

10.1109/access.2019.2904286 ◽

2019 ◽

Vol 7 ◽

pp. 36294-36299 ◽

Cited By ~ 1

Author(s):

Doyoung Lee

Keyword(s):

Quality Assurance ◽

Big Data ◽

Data Quality ◽

Standard Reference Data ◽

Reference Data ◽

National Standard ◽

Data Program

Strengthening Quality Assurance in Health Education: Recent Milestones and Future Directions

Health Promotion Practice ◽

10.1177/1524839908329854 ◽

2009 ◽

Vol 10 (2) ◽

pp. 192-200 ◽

Cited By ~ 21

Author(s):

Alyson Taub ◽

David A. Birch ◽

M. Elaine Auld ◽

Linda Lysoby ◽

Laura Rasar King

Keyword(s):

Quality Assurance ◽

Health Education ◽

Future Directions

Electronic data collection for multi-country, hospital-based, clinical observation of maternal and newborn care: EN-BIRTH study experiences

BMC Pregnancy and Childbirth ◽

10.1186/s12884-020-03426-5 ◽

2021 ◽

Vol 21 (S1) ◽

Cited By ~ 4

Author(s):

Harriet Ruysen ◽

Ahmed Ehsanur Rahman ◽

Vladimir Sergeevich Gordeev ◽

Tanvir Hossain ◽

...

Keyword(s):

Quality Assurance ◽

Data Collection ◽

Data Quality ◽

Clinical Observation ◽

Data Extraction ◽

Observation Data ◽

Electronic Data ◽

Concurrent Events ◽

Maternal And Newborn ◽

Selection Of

Background: Observation of care at birth is challenging, with multiple, rapid, and potentially concurrent events occurring for the mother, newborn, and placenta. The design of electronic data (E-data) collection needs to account for these challenges. The Every Newborn Birth Indicators Research Tracking in Hospitals (EN-BIRTH) study was an observational study to assess measurement of indicators for priority maternal and newborn interventions; it took place in five hospitals in Bangladesh, Nepal, and Tanzania (July 2017–July 2018). E-data tools were required to capture individually linked, timed observation of care, data extraction from hospital register-records or case-notes, and exit-survey data from women.
Methods: To evaluate this process for EN-BIRTH, we employed a framework organised around five steps for E-data design, data collection, and implementation. Using this framework, a mixed-methods evaluation synthesised evidence from study documentation, standard operating procedures, stakeholder meetings, and design workshops. We undertook focus group discussions with EN-BIRTH researchers to explore experiences from the three country teams (November–December 2019). Results were organised according to the five a priori steps.
Results: In accordance with the five-step framework, we found:
1) Selection of data collection approach and software: user-centred design principles were applied to meet the challenges of observing rapid, concurrent events around the time of birth, with time-stamping.
2) Design of data collection tools and programming: this required extensive pilot testing to make the tools user-focused and to include built-in error messages and data quality alerts.
3) Recruitment and training of data collectors: this was standardised with an interactive training package, including pre- and post-course assessments.
4) Data collection, quality assurance, and management: real-time quality assessments with a tracking dashboard, and double observation/data extraction for a 5% subset of cases, were incorporated as part of quality assurance. Internet-based synchronisation during data collection posed intermittent challenges.
5) Data management, cleaning, and analysis: E-data collection was perceived to improve data quality and reduce cleaning time.
Conclusions: The E-data system, custom-built for EN-BIRTH, was valued by the site teams, particularly for time-stamped clinical observation of multiple complex, simultaneous events at birth, without which the study objectives could not have been met. However, before selecting a custom-built E-data tool, the development time, greater training and IT support needs, and connectivity challenges should be weighed against the purpose of the proposed study or programme and the E-data tool options currently available.
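
The double observation of a 5% case subset can be sketched as below. The abstract does not state how cases were chosen or how agreement was scored, so random selection and simple percent agreement over shared fields are assumptions here, as are the function names.

```python
import random
from typing import Iterable, Mapping, Optional


def select_double_observation_subset(case_ids: Iterable[int], fraction: float = 0.05,
                                     seed: Optional[int] = None) -> list[int]:
    """Randomly choose a fraction of cases for a second, independent observer."""
    ids = list(case_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * fraction))
    return sorted(random.Random(seed).sample(ids, k))


def percent_agreement(primary: Mapping[str, object],
                      secondary: Mapping[str, object]) -> float:
    """Percentage of shared fields on which both observers recorded the same value."""
    shared = primary.keys() & secondary.keys()
    if not shared:
        raise ValueError("the two records have no fields in common")
    matches = sum(primary[f] == secondary[f] for f in shared)
    return 100.0 * matches / len(shared)
```

Seeding the generator makes the subset reproducible, which helps when the selection has to be audited later.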

Contribution of Citizen Science to Biodiversity Data Mobilization in Russia

Biodiversity Information Science and Standards ◽

10.3897/biss.4.59197 ◽

2020 ◽

Vol 4 ◽

Author(s):

Natalya Ivanova ◽

Maxim Shashkov

Keyword(s):

New Species ◽

Protected Areas ◽

Citizen Science ◽

Biodiversity Data ◽

Breeding Bird ◽

Breeding Bird Atlas ◽

Research Grade ◽

Taxonomic Groups ◽

Species Occurrences ◽

Biodiversity Information

Russia currently does not have a national biodiversity information system and is still not a GBIF (Global Biodiversity Information Facility) member. Nevertheless, GBIF is the largest source of biodiversity data for Russia. As of August 2020, >5M species occurrences for Russia were available through the GBIF portal, 54% of which were published by Russian organisations. A total of 107 institutions from Russia have become GBIF publishers, and 357 datasets have been published. An important trend in data mobilization in Russia is the considerable contribution of citizen science. The most popular platform is iNaturalist. In 2020, the related GBIF dataset (Ueda 2020) became the largest one for Russia (793,049 species occurrences as of 2020-08-11). The first observation for Russia was posted in 2011, but iNaturalist only started becoming popular in 2017. That year, 88 observers added >4,500 observations representing 1,390 new species for Russia, respectively 7-fold and 2-fold more than in the previous 6 years. There are now nearly 12,000 observers, about 15,000 observed species, and >1M research-grade observations. The ratio of observations among Tracheophyta, Chordata, and Arthropoda in Russia differs from the global pattern. In the global iNaturalist GBIF dataset, these groups have almost equal numbers of observations, whereas in Russia vascular plants make up two-thirds of the observations. This is due to the "Flora of Russia" project, which attracted many professional botanists both as observers and as experts. Thanks to their activity, Russia has a high proportion of research-grade observations in iNaturalist: 78% versus 60% globally. Another consequence of wide participation by professional researchers is the high rate of species accumulation. For some taxonomic groups, the conspicuous species have already been recorded. For example, of about 850 bird species in Russia, 398 were observed in 2018, and only 83 new species were added in 2019.
Currently, the number of new species recorded per year is decreasing despite the growth in observers and overall user activity. Russian iNaturalist observers have shared many archive photos (taken in past years). In 2018, these made up nearly a quarter of all observations and about three-quarters of the new species for the year, with similar trends in 2019. Archive photos are usually posted from December to April, but the 2020 pandemic lockdown spurred a new wave of archive photo mobilisation in April and May. There are many iNaturalist projects for protected areas in Russia: 27 for strict nature reserves and national parks, and about 300 for other protected areas. About 100,000 observations (7.5% of all Russian observations) from the umbrella project "Protected areas of Russia" represent >34% of the species diversity observed in Russia. For some regions, e.g., Novosibirsk, Nizhniy Novgorod, and Vladimir Oblasts, almost all protected areas are covered by iNaturalist projects, which are often their only source of available biodiversity data. Russian researchers have also developed other popular citizen science platforms. The first is the Russian birdwatching network RU-BIRDS.RU; the related GBIF dataset (Ukolov et al. 2019) is the third-largest dataset for Russia (>370,000 species occurrences). Another is wildlifemonitoring.ru, a crowd-sourced web-GIS maintained by the Siberian Environmental Center NGO in Novosibirsk, which includes thematic resources for different groups of vertebrates. Notably, iNaturalist activities in Russia have developed more as a social network than as a way to engage volunteers in scientific research: of 746 citations of the iNaturalist dataset, only 18 articles include co-authors from Russia.
iNaturalist data are used for managing regional red lists (in the Republic of Bashkortostan, Novosibirsk Oblast, and elsewhere) and as an additional information source for regional inventories. RU-BIRDS data were used in the European Russia Breeding Bird Atlas and in the new edition of the European Breeding Bird Atlas. In Russia, citizen science activities contribute significantly to filling gaps in the global biodiversity map. However, Russian iNaturalist observations available through GBIF are attributed to the USA as their publishing country. This is not ideal, because the iNaturalist GBIF dataset is growing rapidly and in future will represent more observations than all other datasets for Russia combined. In our opinion, iNaturalist data should be repatriated during publication through GBIF, as is done for the eBird dataset (Levatich and Ligocki 2020).
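
The species-accumulation pattern described above (fewer new species per year despite more observers) can be computed by crediting each species to the first year in which it appears. This sketch assumes hypothetical (year, species) pairs; the year may be either the posting year or the year the photo was taken, a choice that matters for archive photos. The quality-grade labels are iNaturalist's `research`, `needs_id` and `casual` grades; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def new_species_per_year(observations: Iterable[tuple[int, str]]) -> dict[int, int]:
    """Count species by the year in which each was first recorded."""
    first_seen: dict[str, int] = {}
    for year, species in observations:
        if species not in first_seen or year < first_seen[species]:
            first_seen[species] = year
    return dict(sorted(Counter(first_seen.values()).items()))


def research_grade_share(quality_grades: list[str]) -> float:
    """Percentage of observations whose quality grade is 'research'."""
    if not quality_grades:
        return 0.0
    return 100.0 * quality_grades.count("research") / len(quality_grades)
```

A falling series from `new_species_per_year` alongside a rising observation count is the saturation signal the abstract reports for conspicuous groups such as birds.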

Multi-taxonomic survey in the Sierra del Abra Tanchipa Biosphere Reserve

Biota Neotropica ◽

10.1590/1676-0611-bn-2020-1050 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Francisco Javier Sahagún-Sánchez ◽

José Arturo De-Nova

Keyword(s):

Protected Area ◽

Biological Diversity ◽

Biosphere Reserve ◽

Natural Protected Areas ◽

Ecological Corridor ◽

Sierra Madre ◽

Baseline Information ◽

Taxonomic Groups ◽

Management And Conservation ◽

Conservation Of Biodiversity

Abstract: Studies of biological diversity are essential to generate baseline information for natural protected areas. In the present study, we developed a multi-taxonomic inventory of the Sierra del Abra Tanchipa Biosphere Reserve, located northeast of San Luis Potosí, Mexico. Systematic sampling was performed from January 2017 to May 2018 for flora, amphibians, reptiles, birds, and mammals. A total of 3,730 records of 683 species were obtained: 427 species of flora, 10 of amphibians, 20 of reptiles, 192 of birds, and 34 of mammals, of which 47 species are threatened. The results represent the critical biodiversity that can be found in this natural protected area. The information will be useful for decision-making on the management and conservation of biodiversity in the ecological corridor of the Sierra Madre Oriental.
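
A minimal tally of records and distinct species per taxonomic group shows how inventory figures like those above are derived. The (group, species) record tuples and the function name are assumptions for illustration; the closing assertion checks that the per-group species counts reported in the abstract add up to 683.

```python
from collections import defaultdict
from typing import Iterable


def summarize_inventory(records: Iterable[tuple[str, str]]) -> tuple[int, dict[str, int], int]:
    """Return (number of records, distinct species per group, total species)."""
    species_by_group: dict[str, set[str]] = defaultdict(set)
    n_records = 0
    for group, species in records:
        species_by_group[group].add(species)  # repeated records of a species count once
        n_records += 1
    per_group = {group: len(names) for group, names in species_by_group.items()}
    # Summing per-group counts assumes no species is assigned to two groups.
    return n_records, per_group, sum(per_group.values())


# Consistency check on the per-group species counts reported in the abstract.
reported = {"flora": 427, "amphibians": 10, "reptiles": 20, "birds": 192, "mammals": 34}
assert sum(reported.values()) == 683
```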

Kurator: Tools for Improving Fitness for Use of Biodiversity Data

Biodiversity Information Science and Standards ◽

10.3897/biss.2.26539 ◽

2018 ◽

Vol 2 ◽

pp. e26539 ◽

Cited By ~ 1

Author(s):

Paul J. Morris ◽

James Hanken ◽

David Lowery ◽

Bertram Ludäscher ◽

James Macklin ◽

...

Keyword(s):

Data Quality ◽

Early Stage ◽

Data Capture ◽

Data Sets ◽

Biodiversity Data ◽

Skill Levels ◽

Use Of Data ◽

Data Life Cycle ◽

Wide Range ◽

Original Observation

As curators of biodiversity data in natural science collections, we are deeply concerned with data quality, but quality is an elusive concept. An effective way to think about data quality is in terms of fitness for use (Veiga 2016). To use data to manage physical collections, the data must be able to accurately answer questions such as what objects are in the collections, where they are, and where they came from. Some research uses aggregate data across collections, which involves exchanging data using standard vocabularies. Some research uses require accurate georeferences, collecting dates, and current identifications. It is well understood that the costs of data capture and data quality improvement increase with increasing time since the original observation. These factors point towards two engineering principles for software intended to maintain or enhance data quality: build small, modular data quality tests that can easily be assembled into suites to assess the fitness for use of data for a particular need; and produce tools that can be applied by users with a wide range of technical skill levels at different points in the data life cycle. In the Kurator project, we have produced code (e.g. Wieczorek et al. 2017, Morris 2016) consisting of small modules that can be incorporated into data management processes as small libraries addressing particular data quality tests. These modules can be combined into customizable data quality scripts, which can be run on single computers or on scalable architectures, and can be incorporated into other software, run as command line programs, or run as suites of canned workflows through a web interface. Kurator modules can be integrated into early-stage data capture applications, run to help prepare data for aggregation by matching it to standard vocabularies, run for quality control or quality assurance on data sets, and used to report on data quality in terms of a fitness-for-use framework (Veiga et al. 2017).
One of our goals is simple tests that anyone, anywhere, can use.
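
The principle of small, modular tests assembled into need-specific suites can be sketched in Python. This is not Kurator's API: the test functions, the result type and the status labels are hypothetical, and only two Darwin Core terms (`eventDate`, `decimalLatitude`) are checked.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

Record = dict[str, str]


@dataclass
class DQResult:
    test: str
    status: str  # hypothetical labels: COMPLIANT, NOT_COMPLIANT, PREREQUISITES_NOT_MET
    comment: str


DQTest = Callable[[Record], DQResult]


def validate_event_date(record: Record) -> DQResult:
    value = record.get("eventDate", "")
    if not value:
        return DQResult("eventDate", "PREREQUISITES_NOT_MET", "eventDate is empty")
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2019-02-30
    except ValueError:
        return DQResult("eventDate", "NOT_COMPLIANT", f"not a valid ISO 8601 date: {value!r}")
    return DQResult("eventDate", "COMPLIANT", "valid calendar date")


def validate_latitude(record: Record) -> DQResult:
    value = record.get("decimalLatitude", "")
    if not value:
        return DQResult("decimalLatitude", "PREREQUISITES_NOT_MET", "decimalLatitude is empty")
    try:
        latitude = float(value)
    except ValueError:
        return DQResult("decimalLatitude", "NOT_COMPLIANT", f"not a number: {value!r}")
    if -90.0 <= latitude <= 90.0:
        return DQResult("decimalLatitude", "COMPLIANT", "within [-90, 90]")
    return DQResult("decimalLatitude", "NOT_COMPLIANT", f"out of range: {latitude}")


def run_suite(record: Record, suite: list[DQTest]) -> list[DQResult]:
    # A suite is just an ordered list of independent, reusable tests.
    return [test(record) for test in suite]


# Different needs assemble different suites from the same small modules.
GEOREFERENCE_SUITE: list[DQTest] = [validate_latitude]
RESEARCH_SUITE: list[DQTest] = [validate_latitude, validate_event_date]
```

Because each test takes a single record and returns a result without side effects, the same functions could be called from a capture application, a batch script, or a web service.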
