The Online Pollen Catalogs Network (RCPol)

2018 ◽  
Vol 2 ◽  
pp. e25658
Author(s):  
Allan Veiga ◽  
Antonio Saraiva ◽  
Cláudia da Silva

Aiming to promote interaction among researchers and to integrate data from their pollen collections, herbaria and bee collections, RCPol was created in 2013. To structure RCPol's work, researchers and collaborators have organized information on palynology and on trophic interactions between bees and plants. During the project's development, several computing tools were developed and made available on the RCPol website (http://rcpol.org.br), including: interactive keys with multiple inputs for species identification (http://chaves.rcpol.org.br); a glossary of palynology-related terms (http://chaves.rcpol.org.br/profile/glossary/eco); a plant-bee interactions database (http://chaves.rcpol.org.br/interactions); and a data quality tool (http://chaves.rcpol.org.br/admin/data-quality). These tools were developed in partnership with researchers and collaborators from the Escola Politécnica (USP) and other Brazilian and foreign institutions working on palynology, floral biology, pollination, plant taxonomy, ecology, and trophic interactions. The interactive keys are organized in four branches: palynoecology, paleopalynology, palynotaxonomy and spores. This information is collaboratively digitized and managed using standardized Google Spreadsheets. All the information is assessed by a data quality assurance tool (based on the conceptual framework of the TDWG Biodiversity Data Quality Interest Group, Veiga et al. 2017) and curated by palynology experts. In total, the project has published 1,774 specimen records, 1,488 species records (automatically generated by merging specimen records with the same scientific name), 656 interaction records, 370 glossary term records and 15 institution records, all of them translated from the original language (usually Portuguese or English) into Portuguese, English and Spanish. During the project's first three years, 106 partners, among researchers and collaborators from 28 institutions in Brazil and abroad, actively participated in the project.
An important part of the project's activities involved training researchers and students in palynology, data digitization and the use of the system. To date, six training courses have reached 192 people.
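The record-level quality assurance step described above can be illustrated with a minimal sketch. The field names, required-field list and controlled vocabulary below are illustrative assumptions, not RCPol's actual spreadsheet schema or rules:

```python
# Hypothetical sketch of a record-level quality check for spreadsheet-digitized
# specimen data; field names and the controlled vocabulary are assumptions.
REQUIRED_FIELDS = ["scientificName", "institutionCode", "catalogNumber"]
POLLEN_UNIT_VOCAB = {"monad", "dyad", "tetrad", "polyad"}  # assumed term list

def assess_record(record):
    """Return a list of quality issues found in one specimen record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not record.get(field, "").strip():
            issues.append(f"missing required field: {field}")
    unit = record.get("pollenUnit", "").strip().lower()
    if unit and unit not in POLLEN_UNIT_VOCAB:
        issues.append(f"pollenUnit not in controlled vocabulary: {unit!r}")
    return issues

record = {"scientificName": "Mimosa pudica", "institutionCode": "USP",
          "catalogNumber": "", "pollenUnit": "Tetrad"}
print(assess_record(record))  # -> ['missing required field: catalogNumber']
```

A report like this can be handed back to the digitizers for correction before the record is published, which is the general shape of a curation loop between an automated checker and human experts.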

Author(s):  
Geoffrey Ower ◽  
Yuri Roskov

Producing a global taxonomic checklist of all species is essential for indexing biodiversity data, and for providing the basic knowledge needed to study, manage, and conserve biological diversity. The Catalogue of Life (CoL) aims to provide a global taxonomic checklist of all species, and includes 1.9 million species names in the 2019 annual edition. The task of assembling data into CoL is complex and requires reformatting data, quality assurance testing, and collaborating with data providers to resolve detected taxonomic conflicts. Global Species Databases (GSDs) are submitted in a wide variety of data formats to CoL by hundreds of taxonomic experts and institutions. Submitted data are reformatted to a standard data submission format: CoL Standard Dataset (ACEF), Darwin Core, or CoLDP. A series of standardized data integrity checks is run to detect and resolve frequently occurring data quality problems, including character encoding corruption, non-Latin characters in scientific names, missing parents, duplicated and homonymic names within the GSD and among other GSDs, split taxonomic groups that have been assigned to multiple parent taxa, and other issues. The process and challenges of assembling data into the Catalogue of Life, and future directions of the project in migrating to the CoL+ infrastructure, will be discussed.
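Three of the integrity checks named above (non-Latin characters, missing parents, duplicated names) can be sketched in a few lines. The character pattern and the flat taxonID/parentID record shape are assumptions for illustration, not CoL's actual pipeline:

```python
import re
from collections import Counter

# Assumed rule for illustration: scientific names restricted to Latin letters
# plus a few conventional marks (space, period, hyphen, hybrid sign, parentheses).
LATIN_NAME = re.compile(r"[A-Za-z][A-Za-z .\-×()]*")

def has_non_latin_chars(name):
    """Flag names containing characters outside the assumed Latin-name set."""
    return LATIN_NAME.fullmatch(name) is None

def missing_parents(taxa):
    """Return IDs of taxa whose parentID points at no record in the dataset."""
    ids = {t["taxonID"] for t in taxa}
    return [t["taxonID"] for t in taxa
            if t["parentID"] is not None and t["parentID"] not in ids]

def duplicate_names(taxa):
    """Return scientific names that occur more than once in the dataset."""
    counts = Counter(t["scientificName"] for t in taxa)
    return sorted(n for n, c in counts.items() if c > 1)

taxa = [
    {"taxonID": 1, "parentID": None, "scientificName": "Apidae"},
    {"taxonID": 2, "parentID": 1,    "scientificName": "Apis mellifera"},
    {"taxonID": 3, "parentID": 99,   "scientificName": "Apis mellifera"},
]
print(has_non_latin_chars("Apis mellifera"))  # False
print(missing_parents(taxa))                  # [3]
print(duplicate_names(taxa))                  # ['Apis mellifera']
```

In practice, checks like these only detect problems; resolving them (as the abstract notes) still requires collaboration with the data providers.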


2021 ◽  
Vol 21 (S1) ◽  
Author(s):  
Harriet Ruysen ◽  
Ahmed Ehsanur Rahman ◽  
Vladimir Sergeevich Gordeev ◽  
Tanvir Hossain ◽  
...  

Abstract
Background: Observation of care at birth is challenging, with multiple, rapid and potentially concurrent events occurring for mother, newborn and placenta. The design of electronic data (E-data) collection needs to account for these challenges. The Every Newborn Birth Indicators Research Tracking in Hospitals (EN-BIRTH) study was an observational study to assess measurement of indicators for priority maternal and newborn interventions, and took place in five hospitals in Bangladesh, Nepal and Tanzania (July 2017–July 2018). E-data tools were required to capture individually-linked, timed observation of care; data extraction from hospital register-records or case-notes; and exit-survey data from women.
Methods: To evaluate this process for EN-BIRTH, we employed a framework organised around five steps for E-data design, data collection and implementation. Using this framework, a mixed-methods evaluation synthesised evidence from study documentation, standard operating procedures, stakeholder meetings and design workshops. We undertook focus group discussions with EN-BIRTH researchers to explore experiences from the three different country teams (November–December 2019). Results were organised according to the five a priori steps.
Results: In accordance with the five-step framework, we found: 1) Selection of data collection approach and software: user-centred design principles were applied to meet the challenges of observing rapid, concurrent events around the time of birth with time-stamping. 2) Design of data collection tools and programming: required extensive pilot testing of tools to be user-focused and to include in-built error messages and data quality alerts. 3) Recruitment and training of data collectors: standardised with an interactive training package including pre-/post-course assessment. 4) Data collection, quality assurance and management: real-time quality assessments with a tracking dashboard, and double observation/data extraction for a 5% case subset, were incorporated as part of quality assurance. Internet-based synchronisation during data collection posed intermittent challenges. 5) Data management, cleaning and analysis: E-data collection was perceived to improve data quality and reduce time spent cleaning.
Conclusions: The E-data system, custom-built for EN-BIRTH, was valued by the site teams, particularly for time-stamped clinical observation of complex, multiple simultaneous events at birth, without which the study objectives could not have been met. However, before selecting a custom-built E-data tool, the development time, higher training and IT support needs, and connectivity challenges need to be weighed against the proposed study or programme's purpose and the currently available E-data tool options.
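The core idea of time-stamped observation with in-built data quality alerts can be sketched as follows. The event names, their expected order, and the alert rules are assumptions for illustration, not the EN-BIRTH tool's actual variables or logic:

```python
from datetime import datetime, timedelta

# Assumed ordering rule for illustration: these observed events should occur
# in this sequence for a single birth; names are not EN-BIRTH's variables.
EXPECTED_ORDER = ["birth", "drying_started", "skin_to_skin", "cord_clamped"]

class ObservationForm:
    def __init__(self):
        self.events = []  # (event_name, timestamp) in observation order

    def record(self, name, timestamp):
        """Time-stamp an event and return any real-time quality alerts."""
        alerts = []
        if self.events:
            last_name, last_ts = self.events[-1]
            if timestamp < last_ts:
                alerts.append(f"{name}: timestamp earlier than previous event")
            if EXPECTED_ORDER.index(name) < EXPECTED_ORDER.index(last_name):
                alerts.append(f"{name}: observed out of expected sequence")
        self.events.append((name, timestamp))
        return alerts

form = ObservationForm()
t0 = datetime(2018, 7, 1, 10, 0, 0)
form.record("birth", t0)
print(form.record("skin_to_skin", t0 + timedelta(minutes=2)))   # []
print(form.record("drying_started", t0 + timedelta(minutes=1)))
# two alerts: earlier timestamp, and out of expected sequence
```

Surfacing alerts at the moment of entry, rather than during later cleaning, is the design choice the abstract credits with improving data quality and reducing cleaning time.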


2018 ◽  
Vol 2 ◽  
pp. e26539 ◽  
Author(s):  
Paul J. Morris ◽  
James Hanken ◽  
David Lowery ◽  
Bertram Ludäscher ◽  
James Macklin ◽  
...  

As curators of biodiversity data in natural science collections, we are deeply concerned with data quality, but quality is an elusive concept. An effective way to think about data quality is in terms of fitness for use (Veiga 2016). To use data to manage physical collections, the data must be able to accurately answer questions such as: what objects are in the collections, where are they, and where are they from? Some research aggregates data across collections, which involves exchange of data using standard vocabularies. Some research uses require accurate georeferences, collecting dates, and current identifications. It is well understood that the costs of data capture and data quality improvement increase with time from the original observation. These factors point towards two engineering principles for software intended to maintain or enhance data quality: build small, modular data quality tests that can easily be assembled into suites to assess the fitness for use of data for some particular need; and produce tools that can be applied by users with a wide range of technical skill levels at different points in the data life cycle. In the Kurator project, we have produced code (e.g. Wieczorek et al. 2017, Morris 2016) that consists of small modules which can be incorporated into data management processes as small libraries addressing particular data quality tests. These modules can be combined into customizable data quality scripts, which can be run on single computers or on scalable architecture, and can be incorporated into other software, run as command line programs, or run as suites of canned workflows through a web interface. Kurator modules can be integrated into early-stage data capture applications, run to help prepare data for aggregation by matching it to standard vocabularies, run for quality control or quality assurance on data sets, and can report on data quality in terms of a fitness-for-use framework (Veiga et al. 2017).
One of our goals is to provide simple tests usable by anyone, anywhere.
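The two engineering principles above, small modular tests assembled into suites, can be sketched as follows. The test names, record fields and the COMPLIANT/NOT_COMPLIANT result vocabulary loosely follow the fitness-for-use idea but are illustrative assumptions, not Kurator's actual API:

```python
# Illustrative sketch of small, composable data quality tests assembled into
# a suite; this is a minimal pattern demonstration, not Kurator's actual API.
def validation_coordinates_in_range(record):
    """One small test: are the coordinates numeric and within valid bounds?"""
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, ValueError):
        return ("NOT_COMPLIANT", "coordinates missing or not numeric")
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return ("COMPLIANT", "coordinates within valid range")
    return ("NOT_COMPLIANT", "coordinates out of range")

def validation_year_present(record):
    """Another small test: is a collecting year present?"""
    if str(record.get("year", "")).isdigit():
        return ("COMPLIANT", "collecting year present")
    return ("NOT_COMPLIANT", "collecting year missing")

def run_suite(record, tests):
    """Assemble modular tests into a suite and report per-test results."""
    return {t.__name__: t(record) for t in tests}

# A suite tailored to one fitness-for-use need (e.g. mapping occurrences):
georeference_suite = [validation_coordinates_in_range, validation_year_present]
report = run_suite({"decimalLatitude": "95.0", "decimalLongitude": "12.3",
                    "year": "1998"}, georeference_suite)
print(report["validation_coordinates_in_range"][0])  # NOT_COMPLIANT
```

Because each test is a self-contained function, different suites can be assembled for different fitness-for-use questions, and the same modules can sit behind a command line tool, a library call, or a web workflow.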


2018 ◽  
Vol 44 (6) ◽  
pp. 785-801
Author(s):  
Hong Huang

This article aims to understand the views of genomic scientists on the data quality assurances associated with semiotics and data–information–knowledge (DIK). The communication of signs generated in genomic curation work was found to occur at different semantic levels of DIK, correlating specific data quality dimensions with their respective skills. Syntactic data quality dimensions were ranked highest among all semiotic data quality dimensions, indicating that scientists devote great effort to data wrangling activities in genome curation work. Semantic- and pragmatic-related sign communications concerned meaningful interpretation, and thus required additional adaptive and interpretative skills to deal with data quality issues. This expanded concept of ‘curation’ as sign/semiotic had not previously been explored from the practical to the theoretical perspective. The findings can inform policy makers and practitioners in developing frameworks and cyberinfrastructure that facilitate the ‘Big Data to Knowledge’ initiatives and advocacy of funding agencies. The findings from this study can also help in planning data quality assurance policies and thus maximise the efficiency of genomic data management. Our results strongly support the relevance of communicating data quality skills to data quality assurance in genome curation activities.


PEDIATRICS ◽  
1992 ◽  
Vol 90 (6) ◽  
pp. 959-965
Author(s):  
Terri A. Slagle ◽  
Jeffrey B. Gould

The purpose of this national survey was to define the extent and features of database use by 445 tertiary-level neonatal intensive care nurseries in the United States. Of the 305 centers responding to our survey, 78% had a database in use in 1989 and 15% planned to develop one in the future. Nurseries varied remarkably in the volume of data collected, the amount of time devoted to completing data collection forms, and the personnel involved in data collection. Although data were used primarily for statistical reports (93% of nurseries), quality assurance (73%) and research activities (61%) were also enhanced by database information. Neonatal databases were used to generate reports for the permanent medical record in 38% of centers. Satisfaction with the database depended on how useful the information was to the center; centers that collected and actually used a large volume of information were the most satisfied. Overall, nurseries expressed a high degree of confidence in the data they collected, and 65% felt their neonatal database information could be used directly in publication of research. It was disturbing that accuracy of data was not formally monitored by the majority of nurseries: only 27% of centers followed a routine schedule of data quality assurance, and only 53% had built-in error messages for data entry. We caution all who receive database information in the form of morbidity and mortality statistics, clinical reports on patients cared for in neonatal units, and published manuscripts to be attentive to the quality of the data they consume. Our findings stress the importance of, and the need for immediate efforts toward, better quality control in future database design.

