Practical Application of a Data Stewardship Maturity Matrix for the NOAA OneStop Project

2018 ◽  
Author(s):  
Ge Peng ◽  
Anna Milan ◽  
Nancy A. Ritchey ◽  
Robert P. Partee ◽  
Sonny Zinn ◽  
...  

Assessing the stewardship maturity of individual datasets is an essential part of ensuring and improving the way datasets are documented, preserved, and disseminated to users. It is a critical step towards meeting U.S. federal regulations, organizational requirements, and user needs. However, it is challenging to do so consistently and quantifiably. The Data Stewardship Maturity Matrix (DSMM), developed jointly by NOAA’s National Centers for Environmental Information (NCEI) and the Cooperative Institute for Climate and Satellites–North Carolina (CICS-NC), provides a uniform framework for consistently rating the stewardship maturity of individual datasets on nine key components: preservability, accessibility, usability, production sustainability, data quality assurance, data quality control/monitoring, data quality assessment, transparency/traceability, and data integrity. So far, the DSMM has been applied to over 900 individual datasets that are archived and/or managed by NCEI, in support of NOAA’s OneStop Data Discovery and Access Framework Project. As part of the OneStop-ready process, tools, implementation guidance, workflows, and best practices have been developed to assist the application of the DSMM; they are described in this paper. The DSMM ratings are also consistently captured in ISO standard-based dataset-level quality metadata and in citable quality descriptive information documents, which serve as interoperable quality information for both machine and human end-users. These DSMM implementation and integration workflows and best practices could be adopted by other data management and stewardship projects or adapted for applications of other maturity assessment models.
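The nine-component rating scheme lends itself to a simple machine-readable record. The sketch below (Python) validates and serializes such a rating; the class, field names, JSON layout, and the assumed five-level scale are illustrative stand-ins for this listing, not NCEI's actual implementation or the ISO metadata encoding.

from dataclasses import dataclass
import json

# The nine DSMM key components named in the abstract above.
DSMM_COMPONENTS = (
    "preservability", "accessibility", "usability",
    "production sustainability", "data quality assurance",
    "data quality control/monitoring", "data quality assessment",
    "transparency/traceability", "data integrity",
)

@dataclass
class DsmmRating:
    dataset_id: str
    levels: dict  # component name -> maturity level (assumed 1-5 scale)

    def validate(self) -> None:
        # Every component must be rated, and every level must be on the scale.
        for component in DSMM_COMPONENTS:
            level = self.levels.get(component)
            if not isinstance(level, int) or not 1 <= level <= 5:
                raise ValueError(f"invalid rating for {component!r}: {level!r}")

    def to_quality_record(self) -> str:
        # Stand-in for the ISO standard-based dataset-level quality metadata.
        self.validate()
        return json.dumps({"dataset": self.dataset_id, "dsmm": self.levels})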

2018 ◽  
Vol 2 ◽  
pp. e25838 ◽  
Author(s):  
Robert Stevenson

The quality of data produced by citizen science (CS) programs has been called into question by academic scientists, governments, and corporations. Their doubts arise because they perceive CS groups as intruding on the rightful opportunities of standard science and industry organizations, because of a normal skepticism of novel approaches, and because of a lack of understanding of how CS produces data. I propose a three-pronged strategy to overcome these objections and improve trust in CS data:

1. Develop methods for CS programs to advertise their efforts in quality control and quality assurance (QC/QA). As a first step, the PPSR core could incorporate a field that would allow programs to point to webpages documenting the QC/QA practices of each program. In my experience, many programs think carefully about data quality, but the CS community currently lacks an established protocol for sharing this information.

2. Define and implement best practices for generating biodiversity data using different methods. Wiggins et al. (2011) published a list of approaches that can be used for QC/QA in CS projects, but how these approaches should be implemented has not been systematically investigated.

3. Measure and report data quality. If one takes the view that citizen science is akin to a new category of scientific instruments, then the ideas of instrument measurement and calibration can be applied to CS. Scientists are well aware that any instrument needs to be calibrated before its efficacy can be established. However, because CS is a new approach, the specific procedures needed for different kinds of programs are just now being worked out for the first time.

This strategy faces some specific challenges. Citizen science biodiversity programs must address two important problems that standard scientific entities also encounter when sampling and monitoring biodiversity. The first is correctly identifying species. For citizens this can be a problem because they often do not have the training and background of scientist teams. Likewise, it may be difficult for CS projects to manage updating and maintaining the taxonomies of the species being investigated. The second challenge is the diverse kinds of biodiversity data collected by CS programs.
For instance, Notes from Nature decodes the labels of museum specimens; Snapshot Serengeti identifies species of large mammals from camera-trap photographs; iNaturalist collects images of species and then runs a crowdsourced identification process; and eBird collects observations of birds that are immediately filtered with computer algorithms for review by the observer and, if subsequently flagged, reviewed by a local expert. Each of these programs likely requires a different set of best practices and methods to measure data quality.
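The eBird-style pipeline just described (automated filters first, expert review only for flagged records) can be made concrete with a short sketch. Everything here, from the field names to the regional filter table and its thresholds, is a hypothetical illustration of the workflow, not any program's actual code.

from dataclasses import dataclass, field

@dataclass
class Observation:
    species: str
    count: int
    region: str
    flags: list = field(default_factory=list)

# Hypothetical regional filter: species expected in a region and a
# plausible maximum count for each.
REGIONAL_FILTER = {
    "Northeast": {"Blue Jay": 50, "Snowy Owl": 2},
}

def screen(obs: Observation) -> str:
    """Run automated checks; route flagged records to a local expert."""
    expected = REGIONAL_FILTER.get(obs.region, {})
    if obs.species not in expected:
        obs.flags.append("species unexpected in region")
    elif obs.count > expected[obs.species]:
        obs.flags.append("count exceeds regional maximum")
    # Flagged records are sent for review rather than rejected outright.
    return "needs_expert_review" if obs.flags else "accepted"

print(screen(Observation("Snowy Owl", 7, "Northeast")))  # needs_expert_review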


2017 ◽  
Vol 25 (3) ◽  
pp. 224-229 ◽  
Author(s):  
Mark Smith ◽  
Lisa M Lix ◽  
Mahmoud Azimaee ◽  
Jennifer E Enns ◽  
Justine Orr ◽  
...  

Abstract The growth of administrative data repositories worldwide has spurred the development and application of data quality frameworks to ensure that research analyses based on these data can be used to draw meaningful conclusions. However, the research literature on administrative data quality is sparse, and there is little consensus regarding which dimensions of data quality should be measured. Here we present the core dimensions of the data quality framework developed at the Manitoba Centre for Health Policy, a world leader in the use of administrative data for research purposes, and provide examples and context for the application of these dimensions to conducting data quality evaluations. In sharing this framework, our ultimate aim is to promote best practices in rigorous data quality assessment among users of administrative data for research.
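To make "data quality evaluation" concrete, here is a small sketch of scoring a table against two common dimensions, completeness and validity. These dimensions and checks are generic stand-ins chosen for illustration; they are not the specific dimensions of the MCHP framework, which the paper itself defines.

import pandas as pd

def completeness(df: pd.DataFrame) -> float:
    # Share of non-missing cells across the whole table.
    return float(1.0 - df.isna().to_numpy().mean())

def validity(df: pd.DataFrame, rules: dict) -> float:
    # Share of values passing per-column rules (e.g., allowed code sets).
    checks = [df[col].isin(allowed).mean() for col, allowed in rules.items()]
    return float(sum(checks) / len(checks)) if checks else 1.0

records = pd.DataFrame({"sex": ["F", "M", "X", None], "age": [34, 51, 29, 60]})
print(completeness(records))                   # 0.875
print(validity(records, {"sex": {"F", "M"}}))  # 0.5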


GigaScience ◽  
2021 ◽  
Vol 10 (5) ◽  
Author(s):  
Neil Davies ◽  
John Deck ◽  
Eric C Kansa ◽  
Sarah Whitcher Kansa ◽  
John Kunze ◽  
...  

Abstract Sampling the natural world and built environment underpins much of science, yet systems for managing material samples and associated (meta)data are fragmented across institutional catalogs, practices for identification, and discipline-specific (meta)data standards. The Internet of Samples (iSamples) is a standards-based collaboration to uniquely, consistently, and conveniently identify material samples, record core metadata about them, and link them to other samples, data, and research products. iSamples extends existing resources and best practices in data stewardship to render a cross-domain cyberinfrastructure that enables transdisciplinary research, discovery, and reuse of material samples in 21st century natural science.
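The core idea, a persistent identifier plus core metadata and links from a sample to related samples and data products, can be sketched in a few lines. The field names and identifier form below are assumptions for illustration, not the iSamples schema.

from dataclasses import dataclass, field

@dataclass
class SampleRecord:
    identifier: str                 # persistent ID (form assumed here)
    label: str
    material: str                   # what the sample physically is
    collection_context: str         # where/when/how it was collected
    curating_institution: str
    related: list = field(default_factory=list)  # IDs of linked samples/data

core = SampleRecord("sample:0001", "lake sediment core", "sediment",
                    "lake bed, 2019 field season", "campus repository")
subsample = SampleRecord("sample:0002", "core slice at 5 cm", "sediment",
                         "split from parent core", "campus repository",
                         related=[core.identifier])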


2021 ◽  
Vol 22 (S6) ◽  
Author(s):  
Yasmine Mansour ◽  
Annie Chateau ◽  
Anna-Sophie Fiston-Lavier

Abstract Background: Meiotic recombination is a vital biological process that plays an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, eu-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. Hence we lack the accurate local recombination rates necessary to address evolutionary questions.

Results: Here we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is non-genome-specific and runs even on non-model genomes, as long as genetic and physical maps are available. BREC is based on pure statistics and is data-driven, so good input data quality remains a strong requirement; a data pre-processing module (data quality control and cleaning) is therefore provided. Experiments show that BREC handles different marker density and distribution issues.

Conclusions: BREC's heterochromatin boundaries have been validated against cytological equivalents experimentally generated for the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We release BREC as an R package with a Shiny web-based, user-friendly application, yielding a fast, easy-to-use, and broadly accessible resource. The BREC R package is available at the GitHub repository https://github.com/GenomeStructureOrganization.
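The Marey map idea underlying BREC is easy to sketch: plot each marker's genetic position (cM) against its physical position (Mb), fit a curve, and read the local recombination rate (cM/Mb) off the curve's derivative; regions where the rate approaches zero are candidate heterochromatin. The polynomial fit below is a simplified Python illustration of that idea, not BREC's actual algorithm (BREC itself is an R package).

import numpy as np

def local_recombination_rate(physical_mb, genetic_cm, degree=3):
    """Fit a polynomial Marey map and return its derivative at each marker."""
    coeffs = np.polyfit(physical_mb, genetic_cm, degree)
    rates = np.polyval(np.polyder(coeffs), physical_mb)
    # Recombination rates cannot be negative; clamp fitting artifacts to zero.
    return np.clip(rates, 0.0, None)

# Toy markers along one chromosome arm: physical (Mb) and genetic (cM) maps.
phys = np.array([1.0, 5.0, 10.0, 15.0, 20.0])
gen = np.array([0.5, 4.0, 12.0, 21.0, 24.0])
print(local_recombination_rate(phys, gen))  # cM/Mb at each marker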

