A Data Element-Function Conceptual Model for Data Quality Checks

Author(s):  
James R. Rogers ◽  
Tiffany J. Callahan ◽  
Tian Kang ◽  
Alan Bauck ◽  
Ritu Khare ◽  
...  
Trials ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Sophie Relph ◽  
Maria Elstad ◽  
Bolaji Coker ◽  
Matias C. Vieira ◽  
...  

Abstract Background The use of electronic patient records for assessing outcomes in clinical trials is a methodological strategy intended to drive faster and more cost-efficient acquisition of results. The aim of this manuscript was to outline the data collection and management considerations of a maternity and perinatal clinical trial using data from electronic patient records, exemplifying the DESiGN Trial as a case study. Methods The DESiGN Trial is a cluster randomised controlled trial assessing the effect of a complex intervention versus standard care for identifying small for gestational age foetuses. Data on maternal/perinatal characteristics and outcomes, including infants admitted to neonatal care, parameters from foetal ultrasound and details of hospital activity for health-economic evaluation, were collected at two time points from four types of electronic patient records held in 22 different electronic record systems at the 13 research clusters. Data were pseudonymised on site using a bespoke Microsoft Excel macro and securely transferred to the central data store. Data quality checks were undertaken. Rules for harmonisation of the raw data were developed and a data dictionary was produced, along with rules and assumptions for linkage of the datasets. The dictionary included descriptions of the rationale and assumptions for data harmonisation and quality checks. Results Data were collected on 182,052 babies from 178,350 pregnancies in 165,397 unique women. Data availability and completeness varied across research sites; each of the eight variables key to calculating the primary outcome was completely missing in a median of 3 (range 1–4) clusters at the time of the first data download. This improved by the second data download following clarification of instructions to the research sites (each of the eight key variables was completely missing in a median of 1 (range 0–1) cluster at the second time point).
Common data management challenges included harmonising a single variable from multiple sources and categorising free-text data; solutions to both were developed for this trial. Conclusions Conduct of clinical trials which use electronic patient records for the assessment of outcomes can be time- and cost-effective but still requires appropriate time and resources to maximise data quality. A difficulty for pregnancy and perinatal research in the UK is the wide variety of systems used to collect patient data across maternity units. In this manuscript, we describe how we managed this and provide a detailed data dictionary covering the harmonisation of variable names and values that will be helpful for other researchers working with these data. Trial registration Primary registry and trial identifying number: ISRCTN 67698474. Registered on 02/11/16.
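The on-site pseudonymisation step described above was done with a bespoke Excel macro; as an illustrative sketch only (not the trial's actual method), a comparable approach replaces direct identifiers with a keyed hash before transfer, so the same patient links across datasets without exposing the identifier. The key and field names below are hypothetical.

```python
import hashlib
import hmac

SITE_KEY = b"per-site-secret-key"  # hypothetical key, held only at the research site

def pseudonymise(record, id_fields=("nhs_number", "hospital_number")):
    """Replace identifier fields with a keyed hash; leave clinical fields untouched."""
    out = dict(record)
    for field in id_fields:
        if field in out:
            digest = hmac.new(SITE_KEY, str(out[field]).encode(), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]  # deterministic pseudonym for linkage
    return out

rec = {"nhs_number": "9434765919", "gestation_weeks": 36}
pseudo = pseudonymise(rec)
print(pseudo["gestation_weeks"])  # clinical fields pass through unchanged
```

Because the hash is keyed and deterministic, the same identifier always maps to the same pseudonym within a site, which supports linkage of the four record types without reversibility.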


Hydrology ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 33
Author(s):  
Yiannis Panagopoulos ◽  
Anna Konstantinidou ◽  
Konstantinos Lazogiannis ◽  
Anastasios Papadopoulos ◽  
Elias Dimitriou

The monitoring of surface waters is of fundamental importance for their preservation in good quantitative and qualitative condition, as it can facilitate understanding of the actual status of the water and indicate suitable management actions. Taking advantage of the experience gained from coordinating the national water monitoring program in Greece and of the funding available from two ongoing infrastructure projects, the Institute of Inland Waters of the Hellenic Centre for Marine Research has developed the first homogeneous real-time network of automatic water monitoring across many Greek rivers. In this paper, its installation and maintenance procedures are presented, with emphasis on the data quality checks, based on value-range and variability tests, performed before online publication and dissemination to end-users. Preliminary analyses revealed that the water pH and dissolved oxygen (DO) sensors require more maintenance, and their data more quality checks, than the more reliably recorded water stage, temperature (T) and electrical conductivity (EC). Moreover, the data dissemination platform and selected data visualization options are demonstrated, and the need for both the platform and the monitoring network to be maintained and potentially expanded after the funding projects end is highlighted.
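The value-range and variability tests mentioned above can be sketched as follows. This is an illustrative example, not the network's actual implementation; the ranges and step thresholds are assumptions chosen only to show the two check types.

```python
# Hypothetical plausible ranges and maximum allowed step changes per parameter.
RANGES = {"pH": (0.0, 14.0), "DO": (0.0, 20.0), "T": (-5.0, 40.0), "EC": (0.0, 5000.0)}
MAX_STEP = {"pH": 1.0, "DO": 3.0, "T": 5.0, "EC": 500.0}

def qc_flags(previous, current):
    """Flag each parameter: 'out_of_range' (range test), 'high_variability'
    (change since previous reading exceeds threshold), or 'ok'."""
    flags = {}
    for param, value in current.items():
        lo, hi = RANGES[param]
        if not lo <= value <= hi:
            flags[param] = "out_of_range"
        elif previous is not None and abs(value - previous[param]) > MAX_STEP[param]:
            flags[param] = "high_variability"
        else:
            flags[param] = "ok"
    return flags

prev = {"pH": 7.2, "DO": 8.5, "T": 18.0, "EC": 450.0}
curr = {"pH": 9.9, "DO": 8.4, "T": 24.5, "EC": 460.0}
print(qc_flags(prev, curr))  # pH and T jump too fast between readings
```

Readings failing either test would be withheld or flagged before online publication, which matches the paper's observation that pH and DO sensors generate the most quality-check work.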


2018 ◽  
Vol 60 (1) ◽  
pp. 32-49 ◽  
Author(s):  
Mingnan Liu ◽  
Laura Wronski

This study examines the use of trap questions as indicators of data quality in online surveys. Trap questions are intended to identify respondents who are not paying close attention to survey questions, meaning they are providing sub-optimal responses not only to the trap question itself but to other questions in the survey. We conducted three experiments using an online non-probability panel. In the first experiment, we examine whether there is any difference in responses between surveys with one trap question and those with two trap questions. In the second, we examine responses to surveys with trap questions of varying difficulty. In the third, we test the level of difficulty, the placement of the trap question, and other forms of attention checks. In all studies, we correlate responses to the trap question(s) with other data quality checks, most of which were derived from the literature on satisficing. We also compare responses to several substantive questions by response to the trap questions, which tells us whether participants who failed the trap questions gave consistently different answers from those who passed. We find that the rate of passing various trap questions varies widely, from 27% to 87% among the types we tested. We also find evidence that some types of trap questions are more strongly correlated with other data quality measures.
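The study's pairing of trap-question outcomes with satisficing-based quality checks can be sketched minimally. The functions below are illustrative assumptions, not the study's instruments: one scores a trap question pass/fail, the other computes one common satisficing indicator, straightlining (identical answers across every item in a grid).

```python
def passed_trap(answer, expected="somewhat agree"):
    """A trap question directs the respondent to choose one specific option."""
    return answer == expected

def straightlined(grid_answers):
    """True when every item in a grid received the identical answer."""
    return len(set(grid_answers)) == 1

respondent = {"trap": "strongly agree", "grid": [3, 3, 3, 3, 3]}
print(passed_trap(respondent["trap"]), straightlined(respondent["grid"]))
```

Tabulating these two flags across respondents is the kind of cross-check the abstract describes: a trap question that usefully signals inattention should co-occur with indicators such as straightlining.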


2021 ◽  
Author(s):  
Clair Blacketer ◽  
Frank J Defalco ◽  
Patrick B Ryan ◽  
Peter R Rijnbeek

Advances in standardization of observational healthcare data have enabled methodological breakthroughs, rapid global collaboration, and generation of real-world evidence to improve patient outcomes. Standardizations in data structure, such as use of Common Data Models (CDM), need to be coupled with standardized approaches for data quality assessment. To ensure confidence in real-world evidence generated from the analysis of real-world data, one must first have confidence in the data itself. The Data Quality Dashboard is an open-source R package that reports potential quality issues in an OMOP CDM instance through the systematic execution and summarization of over 3,300 configurable data quality checks. We describe the implementation of check types across a data quality framework of conformance, completeness, and plausibility, with both verification and validation. We illustrate how data quality checks, paired with decision thresholds, can be configured to customize data quality reporting across a range of observational health data sources. We discuss how data quality reporting can become part of the overall real-world evidence generation and dissemination process to promote transparency and build confidence in the resulting output. Transparently communicating how well CDM standardized databases adhere to a set of quality measures adds a crucial piece that is currently missing from observational research. Assessing and improving the quality of our data will inherently improve the quality of the evidence we generate.
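The pairing of checks with decision thresholds described above can be sketched in the abstract. This is a hedged illustration of the idea only: the real Data Quality Dashboard is an R package with its own check definitions, and the check name and numbers below are hypothetical.

```python
def run_check(check_name, num_violated, num_denominator, threshold_pct):
    """A check fails when the percentage of violating rows exceeds its
    configured threshold; otherwise it passes."""
    pct = 100.0 * num_violated / num_denominator if num_denominator else 0.0
    return {
        "check": check_name,
        "pct_violated": round(pct, 2),
        "status": "FAIL" if pct > threshold_pct else "PASS",
    }

# e.g. a plausibility-style check: 12 of 10,000 rows violate, under a 1% threshold
result = run_check("plausibleBirthDate", num_violated=12,
                   num_denominator=10000, threshold_pct=1.0)
print(result)  # 0.12% violating -> PASS
```

Because the threshold is part of the configuration rather than the check itself, the same check suite can be tuned per data source, which is the customization the abstract describes.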


2019 ◽  
Vol 1 ◽  
pp. ed1
Author(s):  
Shaun Yon-Seng Khoo

Almost every open access neuroscience journal is pay-to-publish. This leaves neuroscientists with a choice of submitting to journals that not all of our colleagues can legitimately access and choosing to pay large sums of money to publish open access. Neuroanatomy and Behaviour is a new platinum open access journal published by a non-profit association of scientists. Since we do not charge fees, we will focus entirely on the quality of submitted articles and encourage the adoption of reproducibility-enhancing practices, like open data, preregistration, and data quality checks. We hope that our colleagues will join us in this endeavour so that we can support good neuroscience no matter where it comes from.


2020 ◽  
Vol 14 (1) ◽  
pp. 1-30 ◽  
Author(s):  
Arie Purwanto ◽  
Anneke Zuiderwijk ◽  
Marijn Janssen

Purpose Citizen engagement is key to the success of many Open Government Data (OGD) initiatives. However, not much is known regarding how this type of engagement emerges. This study aims to investigate the necessary conditions for the emergence of citizen-led engagement with OGD and to identify which factors stimulate this type of engagement. Design/methodology/approach First, the authors created a systematic overview of the literature to develop a conceptual model of conditions and factors of OGD citizen engagement at the societal, organizational and individual level. Second, the authors used the conceptual model to systematically study citizens’ engagement in the case of a particular OGD initiative, namely, the digitization of presidential election results data in Indonesia in 2014. The authors used multiple information sources, including interviews and documents, to explore the conditions and factors of OGD citizen-led engagement in this case. Findings From the literature the authors identified five conditions for the emergence of OGD citizen-led engagement as follows: the availability of a legal and political framework that grants a mandate to open up government data, sufficient budgetary resources allocated for OGD provision, the availability of OGD feedback mechanisms, citizens’ perceived ease of engagement and motivated citizens. In the literature, the authors found six factors contributing to OGD engagement as follows: democratic culture, the availability of supporting institutional arrangements, the technical factors of OGD provision, the availability of citizens’ resources, the influence of social relationships and citizens’ perceived data quality. Some of these conditions and factors were found to be less important in the studied case, namely, citizens’ perceived ease of engagement and citizens’ perceived data quality. 
Moreover, the authors found several new conditions that were not mentioned in the studied literature, namely, citizens’ sense of urgency, competition among citizen-led OGD engagement initiatives, the diversity of citizens’ skills and capabilities and the intensive use of social media. The difference between the conditions and factors that played an important role in the case and those derived from the literature review might be because of the type of OGD engagement that the authors studied, namely, citizen-led engagement, without any government involvement. Research limitations/implications The findings are derived using a single case study approach. Future research can investigate multiple cases and compare the conditions and factors for citizen-led engagement with OGD in different contexts. Practical implications The conditions and factors for citizen-led engagement with OGD have been evaluated in practice and discussed with public managers and practitioners through interviews. Governmental organizations should prioritize and stimulate those conditions and factors that enhance OGD citizen engagement to create more value with OGD. Originality/value While some research on government-led engagement with OGD exists, there is hardly any research on citizen-led engagement with OGD. This study is the first to develop a conceptual model of necessary conditions and factors for citizen engagement with OGD. Furthermore, the authors applied the developed multilevel conceptual model to a case study and gathered empirical evidence of OGD engagement and its contributions to solving societal problems, rather than staying at the conceptual level. This research can be used to investigate citizen engagement with OGD in other cases and offers possibilities for systematic cross-case lesson-drawing.


Informatics ◽  
2019 ◽  
Vol 6 (1) ◽  
pp. 10 ◽  
Author(s):  
Otmane Azeroual ◽  
Gunter Saake ◽  
Mohammad Abuosba

The topic of data integration from external data sources or independent IT systems has recently received increasing attention in IT departments as well as at management level, in particular concerning data integration in federated database systems. An example of the latter are commercial research information systems (RIS), which regularly import, cleanse, transform and prepare research information from a variety of institutional databases for analysis. All of these steps must also deliver an assured level of quality. As several internal and external data sources are loaded into the RIS, ensuring information quality is becoming increasingly challenging for research institutions. Before research information is transferred to a RIS, it must be checked and cleaned up. Data quality is therefore always a decisive factor for successful data integration. The removal of data errors (such as duplicates, inconsistent data and outdated data) and the harmonisation of the data structure are essential tasks of data integration using extract, transform, and load (ETL) processes: data are extracted from the source systems, transformed and loaded into the RIS. At this point, conflicts between different data sources are detected and resolved, and data quality issues arising during integration are eliminated. Against this background, our paper presents the process of data transformation in the context of RIS, which gives an overview of the quality of research information in an institution's internal and external data sources during its integration into the RIS. In addition, we address the question of how to control and improve data quality issues during the integration process.
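The transform step described above (harmonising structure, removing duplicates) can be sketched minimally. This is a hypothetical example, not a real RIS schema: the field names and mapping are assumptions used only to show the two operations.

```python
# Hypothetical mapping of source-system field names onto one target schema.
FIELD_MAP = {"Titel": "title", "Title": "title", "AuthorName": "author", "Author": "author"}

def transform(record):
    """Harmonise field names and normalise whitespace in values."""
    out = {}
    for key, value in record.items():
        out[FIELD_MAP.get(key, key.lower())] = " ".join(str(value).split())
    return out

def deduplicate(records):
    """Drop records that are exact duplicates after transformation."""
    seen, result = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result

# Two source systems deliver the same item with different field names/spacing.
raw = [{"Titel": "Data  Quality"}, {"Title": "Data Quality"}]
clean = deduplicate([transform(r) for r in raw])
print(clean)  # the two source rows collapse into one harmonised record
```

In a full ETL pipeline, this transform sits between extraction from the source systems and loading into the RIS, which is where the abstract locates conflict resolution and quality cleanup.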


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Girum S. Ejigu ◽  
Lakshmi Radhakrishnan ◽  
Paul McMurray ◽  
Roseanne English

Objective: Review the impact of applying regular data quality checks to assess completeness of core data elements that support syndromic surveillance.

Introduction: The National Syndromic Surveillance Program (NSSP) is a community-focused collaboration among federal, state, and local public health agencies and partners for timely exchange of syndromic data. These data, captured in nearly real time, are intended to improve the nation's situational awareness and responsiveness to hazardous events and disease outbreaks. During CDC's previous implementation of a syndromic surveillance system (BioSense 2), there was a reported lack of transparency and sharing of information on the data processing applied to data feeds, encumbering the identification and resolution of data quality issues. The BioSense Governance Group Data Quality Workgroup paved the way to rethink surveillance data flow and quality. Their work and collaboration with state and local partners led to NSSP redesigning the program's data flow. The new data flow provided a ripe opportunity for NSSP analysts to study the data landscape (e.g., capturing of HL7 messages and core data elements), assess end-to-end data flow, and make adjustments to ensure all data being reported were processed, stored, and made accessible to the user community. In addition, NSSP extensively documented the new data flow, providing the transparency the community needed to better understand the disposition of facility data. Even with a new and improved data flow, data quality issues that had gone unreported in the past remained in the new data; however, these issues were now identified, and the redesigned data flow provided opportunities to report and act on them, unlike previous versions. Therefore, an important component of the NSSP data flow was the implementation of regularly scheduled standard data quality checks and the release of standard data quality reports summarizing findings.

Methods: NSSP data were assessed for national-level completeness of chief complaint and discharge diagnosis data. Completeness is the rate of non-null values (Batini et al., 2009). It was defined as the percent of visits (e.g., emergency department, urgent care center) with a non-null value found among the one or more records associated with the visit. National completeness rates for visits in 2016 were compared with completeness rates for visits in 2017 (a partial year including visits through August 2017). In addition, facility-level progress was quantified after scoring each facility based on the percent completeness change between 2016 and 2017. Legacy data processed prior to introducing the new NSSP data flow were not included in this assessment.

Results: Nationally, the percent completeness of chief complaint for visits in 2016 was 82.06% (N=58,192,721), and for visits in 2017 it was 87.15% (N=80,603,991). Of the 2,646 facilities that sent visit data in 2016 and 2017, 114 (4.31%) showed an increase of at least 10% in chief complaint completeness in 2017 compared with 2016. For discharge diagnosis, percent completeness for 2016 visits was 50.83% (N=36,048,334) and for 2017 visits it was 59.23% (N=54,776,310). Of the same 2,646 facilities, 306 (11.56%) showed more than a 10% increase in percent completeness of discharge diagnosis in 2017 compared with 2016.

References: Batini, C., Cappiello, C., Francalanci, C. and Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM Computing Surveys, 41(3), 1–52.
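The completeness measure defined in the Methods (percent of visits with a non-null value among the one or more records associated with the visit) can be expressed directly. The data structure below is illustrative, not the NSSP schema.

```python
def completeness(visits, field):
    """Percent of visits where at least one associated record has a
    non-null, non-empty value for `field`."""
    if not visits:
        return 0.0
    complete = sum(
        1
        for records in visits.values()
        if any(rec.get(field) not in (None, "") for rec in records)
    )
    return 100.0 * complete / len(visits)

# Visit v1 has two records, one of which carries a chief complaint -> complete.
# Visit v2 has only an empty value -> incomplete.
visits = {
    "v1": [{"chief_complaint": None}, {"chief_complaint": "chest pain"}],
    "v2": [{"chief_complaint": ""}],
}
print(completeness(visits, "chief_complaint"))  # 50.0
```

Counting at the visit level rather than the record level matters here: a visit with several HL7 messages is complete as long as any one of them carries the value.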


2000 ◽  
Vol 1719 (1) ◽  
pp. 140-146 ◽  
Author(s):  
Cesar Quiroga ◽  
Russell Henk ◽  
Marc Jacobson

Described are the results of a pilot application intended to automate the data collection and data reduction phases of roadside origin-destination (O-D) studies. Most techniques used to obtain O-D data are quite labor intensive, during both the data collection and the data reduction phases. Frequently, they result in extensive data quality checks and long turnaround periods between the data collection work and the submittal of the corresponding survey report. The application described automates the data collection and data reduction phases by using portable, handheld data collection devices. These devices can be connected to a desktop or laptop computer to transfer the O-D data to a repository database. Included are a brief background discussion, a description of the hardware and software used and the design and development of the O-D applications, a description of two applications of the handheld data collection devices, and a list of lessons learned.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Harsh Vivek Harkare ◽  
Daniel J. Corsi ◽  
Rockli Kim ◽  
Sebastian Vollmer ◽  
S. V. Subramanian

Abstract The importance of data quality for correctly determining prevalence estimates of child anthropometric failures has been a contentious issue among policymakers and researchers. Our research objective was to ascertain the impact of improved DHS data quality on the prevalence estimates of stunting, wasting, and underweight; the study also looks for the drivers of data quality. Using five data quality indicators based on age, sex, anthropometric measurements, and normality of distribution, we arrive at two datasets of differential data quality and their estimates of anthropometric failures. For this purpose, we use the 2005–2006 and 2015–2016 NFHS data covering 311,182 observations from India. The prevalence estimates of stunting and underweight were virtually unchanged after the application of quality checks. The estimate of wasting fell by 2 percentage points, indicating an overestimation of the true prevalence. However, this differential impact on the estimate of wasting was driven by the sensitivity of the flagging procedure and was in accordance with empirical evidence from the existing literature. We found the DHS data to be of sufficiently high quality that the prevalence estimates of stunting and underweight did not change significantly after further improving data quality. The differential estimate of wasting is attributable to the sensitivity of the flagging procedure.
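The flagging procedure whose sensitivity drives the wasting result works by excluding biologically implausible z-scores before computing prevalence. The sketch below is a hedged illustration of that step, using the commonly cited WHO fixed flag limits; the actual NFHS/DHS pipeline applies additional indicators.

```python
# Commonly used WHO fixed flag limits for z-scores:
# height-for-age (haz, stunting), weight-for-height (whz, wasting),
# weight-for-age (waz, underweight).
FLAG_LIMITS = {"haz": (-6, 6), "whz": (-5, 5), "waz": (-6, 5)}

def prevalence_below(zscores, indicator, cutoff=-2.0):
    """Drop flagged (implausible) z-scores, then return the percent of the
    remaining observations below the cutoff (-2 SD defines failure)."""
    lo, hi = FLAG_LIMITS[indicator]
    valid = [z for z in zscores if lo <= z <= hi]
    if not valid:
        return 0.0
    return 100.0 * sum(1 for z in valid if z < cutoff) / len(valid)

whz = [-5.8, -2.4, -1.0, 0.3, -2.1]  # -5.8 is flagged as implausible
print(prevalence_below(whz, "whz"))  # 2 of the 4 valid values fall below -2
```

Tightening or loosening the flag limits changes which extreme values are excluded, which is exactly how the flagging procedure's sensitivity can move a prevalence estimate such as wasting.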

