The Online Pollen Catalogs Network (RCPol) data quality assurance system

2018 ◽  
Vol 2 ◽  
pp. e25657
Author(s):  
Allan Veiga ◽  
Antonio Saraiva ◽  
Cláudia da Silva

The Online Pollen Catalogs Network (RCPol) (http://rcpol.org.br) was conceived to promote interaction among researchers and the integration of data from pollen collections, herbaria and bee collections. In order to structure RCPol work, researchers and collaborators have organized information on Palynology in four branches: palynoecology, paleopalynology, palynotaxonomy and spores. This information is collaboratively digitized and managed using standardized Google Spreadsheets. These datasets are assessed by the RCPol palynology experts and, when a dataset is compliant with the RCPol data quality policy, it is published to http://chaves.rcpol.org.br. Data quality assessment used to be performed manually by the experts and was time-consuming and inconsistent in detecting data quality problems such as incomplete and inconsistent information. In order to support data quality assessment in a more automated and effective way, we are developing a data quality tool which implements a series of mechanisms to measure, validate and improve completeness, consistency, conformity, accessibility and uniqueness of data, prior to a manual expert assessment. The system was designed according to the conceptual framework proposed by Task Group 1 of the Biodiversity Data Quality Interest Group (Veiga et al. 2017). For each sheet in the Google Spreadsheet, the system generates a set of assertions of measures, validations and amendments for the records (rows) and datasets (sheets), according to a profile defined for RCPol. The profile follows the policies of data quality measurement, validation and enhancement. The data quality measurement policy encompasses the dimensions of completeness, consistency, conformity, accessibility and uniqueness. RCPol uses a quality assurance approach: only data that are compliant with all the quality requirements are published in the system. 
Therefore, its data quality validation policy only considers datasets with 100% completeness, consistency, conformity, accessibility and uniqueness. In order to improve the quality in each relevant dimension, a set of enhancements was defined in the data quality enhancement policy. Based on this RCPol profile, the system is able to generate reports that contain measure, validation and amendment assertions, along with the method and tool used to generate each assertion. This web-based system can be tested at http://chaves.rcpol.org.br/admin/data-quality with the dataset https://docs.google.com/spreadsheets/u/1/d/1gH0aa2qqnAgfAixGom3Gnx6Qp91ZvWhUHPb_QeoIreQ. This system is able to assure that only data compliant with the data quality profile defined by RCPol are fit for use and can be published, which contributes significantly to decreasing the workload of the experts. Some data may still contain values that cannot easily be assessed automatically, e.g. validating whether the content of an image matches the respective scientific name, so expert manual assessment remains necessary. After the system reports that data are compliant with the profile, a manual assessment must be performed by the experts, using the data quality report as support, and only after that will the data be published. The next steps include archiving the data quality reports in a database, improving the web interface to enable searching and sorting of assertions, and providing a machine-readable interface for the data quality reports.
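The profile-driven assessment described above (measure, validation and amendment assertions per record and per dataset, with publication gated on full compliance) can be sketched roughly as follows. This is a minimal illustration, not RCPol's actual implementation; the field names and functions (`REQUIRED_FIELDS`, `assess_record`, `assess_dataset`) are hypothetical.

```python
# Hypothetical sketch of a profile-driven data quality assessment in the
# spirit of the RCPol system: per-record measures, validations and
# amendment proposals, and a dataset-level pass/fail gate.

REQUIRED_FIELDS = ["scientificName", "family", "pollenUnit"]  # example profile

def assess_record(row):
    """Generate measure, validation and amendment assertions for one row."""
    assertions = []
    filled = [f for f in REQUIRED_FIELDS if row.get(f, "").strip()]
    completeness = len(filled) / len(REQUIRED_FIELDS)
    assertions.append({"type": "Measure", "dimension": "completeness",
                       "value": completeness})
    # Quality assurance approach: only 100% complete records are compliant.
    assertions.append({"type": "Validation", "dimension": "completeness",
                       "compliant": completeness == 1.0})
    # Example amendment: propose trimming stray whitespace in values.
    for field, value in row.items():
        if isinstance(value, str) and value != value.strip():
            assertions.append({"type": "Amendment", "field": field,
                               "proposed": value.strip()})
    return assertions

def assess_dataset(rows):
    """Dataset-level gate: compliant only if every record is compliant."""
    reports = [assess_record(r) for r in rows]
    compliant = all(a["compliant"] for report in reports for a in report
                    if a["type"] == "Validation")
    return {"records": reports, "datasetCompliant": compliant}
```

Under this sketch, a row missing `family` scores 2/3 on completeness, fails validation, and therefore withholds the whole dataset from publication until the expert-reviewed report is clean.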

Author(s):  
Ben Norton

Web APIs (Application Programming Interfaces) facilitate the exchange of resources (data) between two functionally independent entities across a common programmatic interface. In more general terms, Web APIs can connect almost anything to the World Wide Web. Unlike traditional software, APIs are not compiled, installed, or run. Instead, data are read (or consumed, in API speak) through a web-based transaction, where a client makes a request and a server responds. Within the scope of biodiversity informatics, Web APIs can be loosely grouped into two categories based on purpose. First, Product APIs deliver data products to end-users. Examples include the Global Biodiversity Information Facility (GBIF) and iNaturalist APIs. The second type, web-based Service APIs, are designed and built to solve specific problems and are the focus of this presentation. Their primary function is to provide on-demand support to existing programmatic processes. Examples of this type include the Elasticsearch Suggester API and geolocation, a service that delivers geographic locations from spatial input (latitude and longitude coordinates) (Pejic et al. 2010). Many challenges lie ahead for biodiversity informatics and the sharing of global biodiversity data (e.g., Blair et al. 2020). Service-driven, standardized web-based Service APIs that adhere to best practices within the scope of biodiversity informatics can provide the transformational change needed to address many of these issues. This presentation will highlight several critical areas of interest in the biodiversity data community, describing how Service APIs can address each individually. The main topics include: standardized vocabularies, interoperability of heterogeneous data sources, and data quality assessment and remediation. 
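The request/response transaction described above can be illustrated with a small client sketch. The GBIF species match endpoint used here is a real, public API, but the response fields parsed below are indicative rather than guaranteed, and the canned response stands in for a live server reply to keep the sketch self-contained.

```python
import json
from urllib.parse import urlencode

# Sketch of one web API transaction: the client composes a request URL,
# the server answers with JSON, and the client reads ("consumes") it.
GBIF_MATCH = "https://api.gbif.org/v1/species/match"

def build_request(name):
    """Client side of the transaction: compose the request URL."""
    return f"{GBIF_MATCH}?{urlencode({'name': name})}"

def consume_response(body):
    """Parse the server's JSON response into the fields we care about."""
    data = json.loads(body)
    return {"scientificName": data.get("scientificName"),
            "matchType": data.get("matchType")}

# In a live workflow one would fetch build_request("Puma concolor") with
# urllib.request.urlopen or a similar HTTP client; here we parse a canned,
# illustrative response instead of calling the network.
sample = '{"scientificName": "Puma concolor", "matchType": "EXACT"}'
```

The same pattern applies to a Service API such as a geolocation lookup: only the endpoint URL and the fields of interest change, which is precisely why standardized, well-documented interfaces lower the cost of integration.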
Fundamentally, the value of any innovative technical solution can be measured by the extent of community adoption. In the context of Service APIs, adoption takes two primary forms: financial and temporal investment in the construction of clients that utilize Service APIs, and willingness of the community to integrate Service APIs into their own systems and workflows. To achieve this, Service APIs must be simple, easy to use, pragmatic, and designed with all major stakeholder groups in mind, including users, providers, aggregators, and architects (Anderson et al. 2020; this study). Unfortunately, many innovative and promising technical solutions have fallen short not because of an inability to solve problems (Verner et al. 2008), but rather because they were difficult to use, built in isolation, and/or designed without effective communication with stakeholders. Fortunately, projects such as Darwin Core (Wieczorek et al. 2012), the Integrated Publishing Toolkit (Robertson et al. 2014), and Megadetector (Microsoft 2021) provide the blueprint for successful community adoption of a technological solution within the biodiversity community. The final section of this presentation will examine the often overlooked non-technical aspects of this technical endeavor, specifically how following these models can broaden community engagement and bridge the knowledge gap between the major stakeholders, resulting in the successful implementation of Service APIs.


2018 ◽  
Vol 7 (4) ◽  
pp. e000353 ◽  
Author(s):  
Luke A Turcotte ◽  
Jake Tran ◽  
Joshua Moralejo ◽  
Nancy Curtin-Telegdi ◽  
Leslie Eckel ◽  
...  

Background: Health information systems with applications in patient care planning and decision support depend on high-quality data. A postacute care hospital in Ontario, Canada, conducted a data quality assessment and focus group interviews to guide the development of a cross-disciplinary training programme to reimplement the Resident Assessment Instrument–Minimum Data Set (RAI-MDS) 2.0 comprehensive health assessment into the hospital’s clinical workflows.
Methods: A hospital-level data quality assessment framework based on time series comparisons against an aggregate of Ontario postacute care hospitals was used to identify areas of concern. Focus groups were used to evaluate assessment practices and the use of health information in care planning and clinical decision support. The data quality assessment and focus groups were repeated to evaluate the effectiveness of the training programme.
Results: The initial data quality assessment and focus groups indicated that knowledge, practice and cultural barriers prevented both the collection and use of high-quality clinical data. Following the implementation of the training, there was an improvement in both data quality and the culture surrounding the RAI-MDS 2.0 assessment.
Conclusions: It is important for facilities to evaluate the quality of their health information to ensure that it is suitable for decision-making purposes. This study demonstrates the use of a data quality assessment framework that can be applied for quality improvement planning.
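The core of the framework in the Methods section, comparing a facility's indicator time series against a peer aggregate to identify areas of concern, can be sketched as below. This is a hypothetical illustration, not the study's actual method; the indicator values and the divergence threshold are invented for the example.

```python
# Hypothetical sketch of a hospital-level data quality check: compare a
# facility's time series for some assessment indicator (e.g. a rate of
# incomplete RAI-MDS 2.0 items per quarter) against an aggregate of peer
# hospitals, and flag the periods where the facility diverges.

def flag_divergence(facility, aggregate, threshold=0.10):
    """Return the indices of periods where the facility's rate differs
    from the peer aggregate by more than `threshold` (absolute)."""
    return [i for i, (f, a) in enumerate(zip(facility, aggregate))
            if abs(f - a) > threshold]
```

Flagged periods would then be taken to focus groups and training planning, as in the study, rather than treated as an automatic verdict on the data.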

