Practical Application of a Data Stewardship Maturity Matrix for the NOAA OneStop Project

2018 ◽  
Author(s):  
Ge Peng ◽  
Anna Milan ◽  
Nancy A. Ritchey ◽  
Robert P. Partee ◽  
Sonny Zinn ◽  
...  

Assessing the stewardship maturity of individual datasets is an essential part of ensuring and improving the way datasets are documented, preserved, and disseminated to users. It is a critical step towards meeting U.S. federal regulations, organizational requirements, and user needs. However, it is challenging to do so consistently and quantifiably. The Data Stewardship Maturity Matrix (DSMM), developed jointly by NOAA’s National Centers for Environmental Information (NCEI) and the Cooperative Institute for Climate and Satellites–North Carolina (CICS-NC), provides a uniform framework for consistently rating the stewardship maturity of individual datasets on nine key components: preservability, accessibility, usability, production sustainability, data quality assurance, data quality control/monitoring, data quality assessment, transparency/traceability, and data integrity. So far, the DSMM has been applied to over 900 individual datasets that are archived and/or managed by NCEI, in support of NOAA’s OneStop Data Discovery and Access Framework Project. As part of the OneStop-ready process, tools, implementation guidance, workflows, and best practices have been developed to assist the application of the DSMM; they are described in this paper. The DSMM ratings are also consistently captured in ISO standard-based dataset-level quality metadata and in citable quality descriptive information documents, which serve as interoperable quality information for both machine and human end-users. These DSMM implementation and integration workflows and best practices could be adopted by other data management and stewardship projects or adapted for applications of other maturity assessment models.
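The nine-component rating scheme lends itself to a simple machine-readable record. The sketch below (Python) validates and serializes such a rating; the class, field names, JSON layout, and the assumed five-level scale are illustrative stand-ins for this listing, not NCEI's actual implementation or the ISO metadata encoding.

from dataclasses import dataclass
import json

# The nine DSMM key components named in the abstract above.
DSMM_COMPONENTS = (
    "preservability", "accessibility", "usability",
    "production sustainability", "data quality assurance",
    "data quality control/monitoring", "data quality assessment",
    "transparency/traceability", "data integrity",
)

@dataclass
class DsmmRating:
    dataset_id: str
    levels: dict  # component name -> maturity level (assumed 1-5 scale)

    def validate(self) -> None:
        # Every component must be rated, and every level must be on the scale.
        for component in DSMM_COMPONENTS:
            level = self.levels.get(component)
            if not isinstance(level, int) or not 1 <= level <= 5:
                raise ValueError(f"invalid rating for {component!r}: {level!r}")

    def to_quality_record(self) -> str:
        # Stand-in for the ISO standard-based dataset-level quality metadata.
        self.validate()
        return json.dumps({"dataset": self.dataset_id, "dsmm": self.levels})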

2018 ◽  
Vol 2 ◽  
pp. e25838 ◽  
Author(s):  
Robert Stevenson

The quality of data produced by citizen science (CS) programs has been called into question by academic scientists, governments, and corporations. Their doubts arise because they perceive CS groups as intruding on the rightful opportunities of standard science and industry organizations, because of a normal skepticism of novel approaches, and because of a lack of understanding of how CS produces data. I propose a three-pronged strategy to overcome these objections and improve trust in CS data:

1. Develop methods for CS programs to advertise their efforts in quality control and quality assurance (QC/QA). As a first step, the PPSR core could incorporate a field that would allow programs to point to webpages documenting the QC/QA practices of each program. In my experience, many programs think carefully about data quality, but the CS community currently lacks an established protocol for sharing this information.

2. Define and implement best practices for generating biodiversity data using different methods. Wiggins et al. (2011) published a list of approaches that can be used for QC/QA in CS projects, but how these approaches should be implemented has not been systematically investigated.

3. Measure and report data quality. If one takes the view that citizen science is akin to a new category of scientific instruments, then the ideas of instrument measurement and calibration can be applied to CS. Scientists are well aware that any instrument needs to be calibrated before its efficacy can be established. However, because CS is a new approach, the specific procedures needed for different kinds of programs are just now being worked out for the first time.

This strategy faces some specific challenges. Citizen science biodiversity programs must address two important problems that standard scientific entities also encounter when sampling and monitoring biodiversity. The first is correctly identifying species. For citizens this can be a problem because they often do not have the training and background of scientist teams. Likewise, it may be difficult for CS projects to manage updating and maintaining the taxonomies of the species being investigated. The second challenge is the diverse kinds of biodiversity data collected by CS programs.
For instance, Notes from Nature decodes the labels of museum specimens; Snapshot Serengeti identifies species of large mammals from camera-trap photographs; iNaturalist collects images of species and then runs a crowdsourced identification process; and eBird collects observations of birds that are immediately filtered with computer algorithms for review by the observer and, if subsequently flagged, reviewed by a local expert. Each of these programs likely requires a different set of best practices and methods to measure data quality.
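The eBird-style pipeline just described (automated filters first, expert review only for flagged records) can be made concrete with a short sketch. Everything here, from the field names to the regional filter table and its thresholds, is a hypothetical illustration of the workflow, not any program's actual code.

from dataclasses import dataclass, field

@dataclass
class Observation:
    species: str
    count: int
    region: str
    flags: list = field(default_factory=list)

# Hypothetical regional filter: species expected in a region and a
# plausible maximum count for each.
REGIONAL_FILTER = {
    "Northeast": {"Blue Jay": 50, "Snowy Owl": 2},
}

def screen(obs: Observation) -> str:
    """Run automated checks; route flagged records to a local expert."""
    expected = REGIONAL_FILTER.get(obs.region, {})
    if obs.species not in expected:
        obs.flags.append("species unexpected in region")
    elif obs.count > expected[obs.species]:
        obs.flags.append("count exceeds regional maximum")
    # Flagged records are sent for review rather than rejected outright.
    return "needs_expert_review" if obs.flags else "accepted"

print(screen(Observation("Snowy Owl", 7, "Northeast")))  # needs_expert_review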


2017 ◽  
Vol 25 (3) ◽  
pp. 224-229 ◽  
Author(s):  
Mark Smith ◽  
Lisa M Lix ◽  
Mahmoud Azimaee ◽  
Jennifer E Enns ◽  
Justine Orr ◽  
...  

Abstract The growth of administrative data repositories worldwide has spurred the development and application of data quality frameworks to ensure that research analyses based on these data can be used to draw meaningful conclusions. However, the research literature on administrative data quality is sparse, and there is little consensus regarding which dimensions of data quality should be measured. Here we present the core dimensions of the data quality framework developed at the Manitoba Centre for Health Policy, a world leader in the use of administrative data for research purposes, and provide examples and context for the application of these dimensions to conducting data quality evaluations. In sharing this framework, our ultimate aim is to promote best practices in rigorous data quality assessment among users of administrative data for research.
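To make "data quality evaluation" concrete, here is a small sketch of scoring a table against two common dimensions, completeness and validity. These dimensions and checks are generic stand-ins chosen for illustration; they are not the specific dimensions of the MCHP framework, which the paper itself defines.

import pandas as pd

def completeness(df: pd.DataFrame) -> float:
    # Share of non-missing cells across the whole table.
    return float(1.0 - df.isna().to_numpy().mean())

def validity(df: pd.DataFrame, rules: dict) -> float:
    # Share of values passing per-column rules (e.g., allowed code sets).
    checks = [df[col].isin(allowed).mean() for col, allowed in rules.items()]
    return float(sum(checks) / len(checks)) if checks else 1.0

records = pd.DataFrame({"sex": ["F", "M", "X", None], "age": [34, 51, 29, 60]})
print(completeness(records))                   # 0.875
print(validity(records, {"sex": {"F", "M"}}))  # 0.5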


GigaScience ◽  
2021 ◽  
Vol 10 (5) ◽  
Author(s):  
Neil Davies ◽  
John Deck ◽  
Eric C Kansa ◽  
Sarah Whitcher Kansa ◽  
John Kunze ◽  
...  

Abstract Sampling the natural world and built environment underpins much of science, yet systems for managing material samples and associated (meta)data are fragmented across institutional catalogs, practices for identification, and discipline-specific (meta)data standards. The Internet of Samples (iSamples) is a standards-based collaboration to uniquely, consistently, and conveniently identify material samples, record core metadata about them, and link them to other samples, data, and research products. iSamples extends existing resources and best practices in data stewardship to render a cross-domain cyberinfrastructure that enables transdisciplinary research, discovery, and reuse of material samples in 21st century natural science.
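The core idea, a persistent identifier plus core metadata and links from a sample to related samples and data products, can be sketched in a few lines. The field names and identifier form below are assumptions for illustration, not the iSamples schema.

from dataclasses import dataclass, field

@dataclass
class SampleRecord:
    identifier: str                 # persistent ID (form assumed here)
    label: str
    material: str                   # what the sample physically is
    collection_context: str         # where/when/how it was collected
    curating_institution: str
    related: list = field(default_factory=list)  # IDs of linked samples/data

core = SampleRecord("sample:0001", "lake sediment core", "sediment",
                    "lake bed, 2019 field season", "campus repository")
subsample = SampleRecord("sample:0002", "core slice at 5 cm", "sediment",
                         "split from parent core", "campus repository",
                         related=[core.identifier])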


2021 ◽  
Vol 22 (S6) ◽  
Author(s):  
Yasmine Mansour ◽  
Annie Chateau ◽  
Anna-Sophie Fiston-Lavier

Abstract Background: Meiotic recombination is a vital biological process that plays an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, eu-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. Hence we lack the accurate local recombination rates necessary to address evolutionary questions.

Results: Here we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is non-genome-specific and runs even on non-model genomes, as long as genetic and physical maps are available. BREC is based on pure statistics and is data-driven, so good input data quality remains a strong requirement; a data pre-processing module (data quality control and cleaning) is therefore provided. Experiments show that BREC handles different marker density and distribution issues.

Conclusions: BREC's heterochromatin boundaries have been validated against cytological equivalents experimentally generated for the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We release BREC as an R package with a Shiny web-based, user-friendly application, yielding a fast, easy-to-use, and broadly accessible resource. The BREC R package is available at the GitHub repository https://github.com/GenomeStructureOrganization.
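The Marey map idea underlying BREC is easy to sketch: plot each marker's genetic position (cM) against its physical position (Mb), fit a curve, and read the local recombination rate (cM/Mb) off the curve's derivative; regions where the rate approaches zero are candidate heterochromatin. The polynomial fit below is a simplified Python illustration of that idea, not BREC's actual algorithm (BREC itself is an R package).

import numpy as np

def local_recombination_rate(physical_mb, genetic_cm, degree=3):
    """Fit a polynomial Marey map and return its derivative at each marker."""
    coeffs = np.polyfit(physical_mb, genetic_cm, degree)
    rates = np.polyval(np.polyder(coeffs), physical_mb)
    # Recombination rates cannot be negative; clamp fitting artifacts to zero.
    return np.clip(rates, 0.0, None)

# Toy markers along one chromosome arm: physical (Mb) and genetic (cM) maps.
phys = np.array([1.0, 5.0, 10.0, 15.0, 20.0])
gen = np.array([0.5, 4.0, 12.0, 21.0, 24.0])
print(local_recombination_rate(phys, gen))  # cM/Mb at each marker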

