The Online Pollen Catalogs Network (RCPol) data quality assurance system

2018 ◽  
Vol 2 ◽  
pp. e25657
Author(s):  
Allan Veiga ◽  
Antonio Saraiva ◽  
Cláudia da Silva

The Online Pollen Catalogs Network (RCPol) (http://rcpol.org.br) was conceived to promote interaction among researchers and the integration of data from pollen collections, herbaria and bee collections. In order to structure RCPol work, researchers and collaborators have organized information on Palynology in four branches: palynoecology, paleopalynology, palynotaxonomy and spores. This information is collaboratively digitized and managed using standardized Google Spreadsheets. These datasets are assessed by the RCPol palynology experts and, when a dataset is compliant with the RCPol data quality policy, it is published to http://chaves.rcpol.org.br. Data quality assessment used to be performed manually by the experts and was time-consuming and inconsistent in detecting data quality problems such as incomplete and inconsistent information. In order to support data quality assessment in a more automated and effective way, we are developing a data quality tool which implements a series of mechanisms to measure, validate and improve completeness, consistency, conformity, accessibility and uniqueness of data, prior to a manual expert assessment. The system was designed according to the conceptual framework proposed by Task Group 1 of the Biodiversity Data Quality Interest Group (Veiga et al. 2017). For each sheet in the Google Spreadsheet, the system generates a set of assertions of measures, validations and amendments for the records (rows) and datasets (sheets), according to a profile defined for RCPol. The profile follows the policies of data quality measurement, validation and enhancement. The data quality measurement policy encompasses the dimensions of completeness, consistency, conformity, accessibility and uniqueness. RCPol uses a quality assurance approach: only data that are compliant with all the quality requirements are published in the system. 
Therefore, its data quality validation policy only considers datasets with 100% completeness, consistency, conformity, accessibility and uniqueness. In order to improve the quality in each relevant dimension, a set of enhancements was defined in the data quality enhancement policy. Based on this RCPol profile, the system is able to generate reports that contain measure, validation and amendment assertions, along with the method and tool used to generate each assertion. This web-based system can be tested at http://chaves.rcpol.org.br/admin/data-quality with the dataset https://docs.google.com/spreadsheets/u/1/d/1gH0aa2qqnAgfAixGom3Gnx6Qp91ZvWhUHPb_QeoIreQ. This system is able to assure that only data compliant with the data quality profile defined by RCPol are fit for use and can be published, which contributes significantly to decreasing the workload of the experts. Some data may still contain values that cannot easily be assessed automatically, e.g. validating whether the content of an image matches the respective scientific name, so expert manual assessment remains necessary. After the system reports that data are compliant with the profile, a manual assessment must be performed by the experts, using the data quality report as support, and only after that will the data be published. The next steps include archiving the data quality reports in a database, improving the web interface to enable searching and sorting of assertions, and providing a machine-readable interface for the data quality reports.
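The profile-driven assessment described above (measure, validation and amendment assertions per record and per dataset, with publication gated on full compliance) can be sketched roughly as follows. This is a minimal illustration, not RCPol's actual implementation; the field names and functions (`REQUIRED_FIELDS`, `assess_record`, `assess_dataset`) are hypothetical.

```python
# Hypothetical sketch of a profile-driven data quality assessment in the
# spirit of the RCPol system: per-record measures, validations and
# amendment proposals, and a dataset-level pass/fail gate.

REQUIRED_FIELDS = ["scientificName", "family", "pollenUnit"]  # example profile

def assess_record(row):
    """Generate measure, validation and amendment assertions for one row."""
    assertions = []
    filled = [f for f in REQUIRED_FIELDS if row.get(f, "").strip()]
    completeness = len(filled) / len(REQUIRED_FIELDS)
    assertions.append({"type": "Measure", "dimension": "completeness",
                       "value": completeness})
    # Quality assurance approach: only 100% complete records are compliant.
    assertions.append({"type": "Validation", "dimension": "completeness",
                       "compliant": completeness == 1.0})
    # Example amendment: propose trimming stray whitespace in values.
    for field, value in row.items():
        if isinstance(value, str) and value != value.strip():
            assertions.append({"type": "Amendment", "field": field,
                               "proposed": value.strip()})
    return assertions

def assess_dataset(rows):
    """Dataset-level gate: compliant only if every record is compliant."""
    reports = [assess_record(r) for r in rows]
    compliant = all(a["compliant"] for report in reports for a in report
                    if a["type"] == "Validation")
    return {"records": reports, "datasetCompliant": compliant}
```

Under this sketch, a row missing `family` scores 2/3 on completeness, fails validation, and therefore withholds the whole dataset from publication until the expert-reviewed report is clean.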

Author(s):  
Ben Norton

Web APIs (Application Programming Interfaces) facilitate the exchange of resources (data) between two functionally independent entities across a common programmatic interface. In more general terms, Web APIs can connect almost anything to the World Wide Web. Unlike traditional software, APIs are not compiled, installed, or run. Instead, data are read (or consumed, in API speak) through a web-based transaction, where a client makes a request and a server responds. Within the scope of biodiversity informatics, Web APIs can be loosely grouped into two categories based on purpose. First, Product APIs deliver data products to end-users. Examples include the Global Biodiversity Information Facility (GBIF) and iNaturalist APIs. The second type, web-based Service APIs, are designed and built to solve specific problems and are the focus of this presentation. Their primary function is to provide on-demand support to existing programmatic processes. Examples of this type include the Elasticsearch Suggester API and geolocation, a service that delivers geographic locations from spatial input (latitude and longitude coordinates) (Pejic et al. 2010). Many challenges lie ahead for biodiversity informatics and the sharing of global biodiversity data (e.g., Blair et al. 2020). Service-driven, standardized web-based Service APIs that adhere to best practices within the scope of biodiversity informatics can provide the transformational change needed to address many of these issues. This presentation will highlight several critical areas of interest in the biodiversity data community, describing how Service APIs can address each individually. The main topics include: standardized vocabularies, interoperability of heterogeneous data sources, and data quality assessment and remediation. 
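The request/response transaction described above can be illustrated with a small client sketch. The GBIF species match endpoint used here is a real, public API, but the response fields parsed below are indicative rather than guaranteed, and the canned response stands in for a live server reply to keep the sketch self-contained.

```python
import json
from urllib.parse import urlencode

# Sketch of one web API transaction: the client composes a request URL,
# the server answers with JSON, and the client reads ("consumes") it.
GBIF_MATCH = "https://api.gbif.org/v1/species/match"

def build_request(name):
    """Client side of the transaction: compose the request URL."""
    return f"{GBIF_MATCH}?{urlencode({'name': name})}"

def consume_response(body):
    """Parse the server's JSON response into the fields we care about."""
    data = json.loads(body)
    return {"scientificName": data.get("scientificName"),
            "matchType": data.get("matchType")}

# In a live workflow one would fetch build_request("Puma concolor") with
# urllib.request.urlopen or a similar HTTP client; here we parse a canned,
# illustrative response instead of calling the network.
sample = '{"scientificName": "Puma concolor", "matchType": "EXACT"}'
```

The same pattern applies to a Service API such as a geolocation lookup: only the endpoint URL and the fields of interest change, which is precisely why standardized, well-documented interfaces lower the cost of integration.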
Fundamentally, the value of any innovative technical solution can be measured by the extent of community adoption. In the context of Service APIs, adoption takes two primary forms: financial and temporal investment in the construction of clients that utilize Service APIs, and willingness of the community to integrate Service APIs into their own systems and workflows. To achieve this, Service APIs must be simple, easy to use, pragmatic, and designed with all major stakeholder groups in mind, including users, providers, aggregators, and architects (Anderson et al. 2020; this study). Unfortunately, many innovative and promising technical solutions have fallen short not because of an inability to solve problems (Verner et al. 2008), but rather because they were difficult to use, built in isolation, and/or designed without effective communication with stakeholders. Fortunately, projects such as Darwin Core (Wieczorek et al. 2012), the Integrated Publishing Toolkit (Robertson et al. 2014), and Megadetector (Microsoft 2021) provide the blueprint for successful community adoption of a technological solution within the biodiversity community. The final section of this presentation will examine the often overlooked non-technical aspects of this technical endeavor, specifically how following these models can broaden community engagement and bridge the knowledge gap between the major stakeholders, resulting in the successful implementation of Service APIs.


2018 ◽  
Vol 7 (4) ◽  
pp. e000353 ◽  
Author(s):  
Luke A Turcotte ◽  
Jake Tran ◽  
Joshua Moralejo ◽  
Nancy Curtin-Telegdi ◽  
Leslie Eckel ◽  
...  

Background: Health information systems with applications in patient care planning and decision support depend on high-quality data. A postacute care hospital in Ontario, Canada, conducted a data quality assessment and focus group interviews to guide the development of a cross-disciplinary training programme to reimplement the Resident Assessment Instrument–Minimum Data Set (RAI-MDS) 2.0 comprehensive health assessment into the hospital’s clinical workflows.
Methods: A hospital-level data quality assessment framework based on time series comparisons against an aggregate of Ontario postacute care hospitals was used to identify areas of concern. Focus groups were used to evaluate assessment practices and the use of health information in care planning and clinical decision support. The data quality assessment and focus groups were repeated to evaluate the effectiveness of the training programme.
Results: The initial data quality assessment and focus groups indicated that knowledge, practice and cultural barriers prevented both the collection and use of high-quality clinical data. Following the implementation of the training, there was an improvement in both data quality and the culture surrounding the RAI-MDS 2.0 assessment.
Conclusions: It is important for facilities to evaluate the quality of their health information to ensure that it is suitable for decision-making purposes. This study demonstrates the use of a data quality assessment framework that can be applied for quality improvement planning.
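The core of the framework in the Methods section, comparing a facility's indicator time series against a peer aggregate to identify areas of concern, can be sketched as below. This is a hypothetical illustration, not the study's actual method; the indicator values and the divergence threshold are invented for the example.

```python
# Hypothetical sketch of a hospital-level data quality check: compare a
# facility's time series for some assessment indicator (e.g. a rate of
# incomplete RAI-MDS 2.0 items per quarter) against an aggregate of peer
# hospitals, and flag the periods where the facility diverges.

def flag_divergence(facility, aggregate, threshold=0.10):
    """Return the indices of periods where the facility's rate differs
    from the peer aggregate by more than `threshold` (absolute)."""
    return [i for i, (f, a) in enumerate(zip(facility, aggregate))
            if abs(f - a) > threshold]
```

Flagged periods would then be taken to focus groups and training planning, as in the study, rather than treated as an automatic verdict on the data.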

