Ensuring Data Readiness for Quality Requirements with Help from Procedure Reuse

Rada Chirkova; Jon Doyle; Juan Reutter

doi:10.1145/3428154


Journal of Data and Information Quality, Vol 13 (3), 2021, pp. 1-15

Keyword(s): Big Data; Data Quality; Data Cleaning; Data Migration; Quality Of Data; Design Data; Big Data Applications; Schema Design; Data Transforming

Assessing and improving the quality of data are fundamental challenges in Big-Data applications. These challenges have given rise to numerous solutions targeting transformation, integration, and cleaning of data. However, while schema design, data cleaning, and data migration are nowadays reasonably well understood in isolation, not much attention has been given to the interplay between standalone tools in these areas. In this article, we focus on the problem of determining whether the available data-transforming procedures can be used together to bring about the desired quality characteristics of the data in business or analytics processes. For example, to help an organization avoid building a data-quality solution from scratch when facing a new analytics task, we ask whether the data quality can be improved by reusing the tools that are already available, and if so, which tools to apply, and in which order, all without presuming knowledge of the internals of the tools, which may be external or proprietary. Toward addressing this problem, we conduct a formal study in which individual data cleaning, data migration, or other data-transforming tools are abstracted as black-box procedures with only some of the properties exposed, such as their applicability requirements, the parts of the data that the procedure modifies, and the conditions that the data satisfy once the procedure has been applied. As a proof of concept, we provide foundational results on sequential applications of procedures abstracted in this way, to achieve prespecified data-quality objectives, for the use case of relational data and for procedures described by standard relational constraints. We show that, while reasoning in this framework may be computationally infeasible in general, there exist well-behaved cases in which these foundational results can be applied in practice for achieving desired data-quality results on Big Data.
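The abstract describes each tool only by its applicability requirements, the parts of the data it modifies, and the conditions that hold once it has run. The Python sketch below illustrates that kind of black-box abstraction under stated simplifications. It is not the article's method. Conditions are opaque (part, label) pairs rather than relational constraints, and a plain breadth-first search stands in for the article's reasoning. The names `Procedure`, `find_sequence`, `migrate`, `dedupe`, `fill_nulls` and all conditions are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

# Assumption: a condition is a (data part, constraint label) pair such as
# ("Customer.id", "unique"); the article uses relational constraints instead.


@dataclass(frozen=True)
class Procedure:
    """Black-box data-transforming tool: only its interface is exposed."""
    name: str
    requires: frozenset  # conditions the data must satisfy before applying
    modifies: frozenset  # parts of the data the procedure may change
    ensures: frozenset   # conditions the data satisfy after applying

    def applicable(self, state: frozenset) -> bool:
        return self.requires <= state

    def apply(self, state: frozenset) -> frozenset:
        # Conditions on parts this procedure touches are no longer guaranteed,
        # unless the procedure itself re-establishes them.
        kept = frozenset(c for c in state if c[0] not in self.modifies)
        return kept | self.ensures


def find_sequence(initial, goal, procedures, max_len=5):
    """Breadth-first search for a shortest sequence of procedure names that
    makes every goal condition hold; returns None if none is found."""
    start, goal = frozenset(initial), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:
            return plan
        if len(plan) >= max_len:
            continue
        for proc in procedures:
            if proc.applicable(state):
                nxt = proc.apply(state)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [proc.name]))
    return None


# Hypothetical tools: a schema migration and two cleaning steps that
# only work on the migrated schema.
migrate = Procedure("migrate", frozenset(),
                    frozenset({"Customer", "Customer.id", "Customer.email"}),
                    frozenset({("Customer", "schema_v2")}))
dedupe = Procedure("dedupe", frozenset({("Customer", "schema_v2")}),
                   frozenset({"Customer.id"}),
                   frozenset({("Customer.id", "unique")}))
fill_nulls = Procedure("fill_nulls", frozenset({("Customer", "schema_v2")}),
                       frozenset({"Customer.email"}),
                       frozenset({("Customer.email", "no_nulls")}))

goal = {("Customer.id", "unique"), ("Customer.email", "no_nulls")}
print(find_sequence(set(), goal, [dedupe, migrate, fill_nulls]))
# prints ['migrate', 'dedupe', 'fill_nulls']
```

In this toy setting, order matters. Running `migrate` after the cleaning steps would discard their guarantees because it modifies every part they touch. As a result, the search places it first. Without `migrate`, neither cleaning step is ever applicable, and the search returns `None`.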


Data Quality Associated with Big Data Processing: A Survey

Journal of University of Shanghai for Science and Technology, Vol 23 (06), 2021, pp. 1011-1018. doi:10.51201/jusst/21/05386

Author(s): Aishrith P Rao; Raghavendra J C; Dr. Sowmyarani C N; Dr. Padmashree T; ...

Keyword(s): Big Data; Data Quality; Data Gathering; Cost Effective; Data Repository; Critical Approach; Critical Aspect; Quality Of Data; Data Group

With the advancement of technology and the large volume of data produced, processed, and stored, it is becoming increasingly important to maintain the quality of data in a cost-effective and productive manner. The most important aspects of Big Data (BD) are storage, processing, privacy, and analytics. The Big Data group has identified quality as a critical aspect of its maturity. Quality assurance is a critical practice that should be adopted early in the lifecycle and gradually extended to other primary processes. Companies rely heavily on, and derive profits from, the huge amounts of data they collect. When the consistency of these data deteriorates, the consequences are unpredictable and may lead to entirely undesirable conclusions. In the context of BD, determining data quality is difficult, but it is essential to ensure data quality before proceeding with any analytics. In this paper, we investigate data quality during the stages of BD processing: data gathering, preprocessing, data repository, and evaluation/analysis. Based on an elaboration and review of the identified problems, we also suggest related solutions.


Quality of Open Research Data: Values, Convergences and Governance

Information, Vol 11 (4), 2020, pp. 175. doi:10.3390/info11040175. Cited By ~ 3

Author(s): Tibor Koltay

Keyword(s): Big Data; Data Quality; Academic Research; Research Data; Data Governance; Quality Of Data; Open Research; Research Environments

This paper focuses on the characteristics of research data quality and aims to cover the most important issues related to it, giving particular attention to its attributes and to data governance. The corporate world’s considerable interest in the quality of data is evident in several ideas and issues reported in business-related publications, even though there are apparent differences between the values of, and approaches to, data in corporate and in academic (research) environments. The paper also takes into consideration that addressing data quality would be unimaginable without considering big data.


Enhancement of Data Quality in the Information Systems within Organization

International Journal for Research in Applied Science and Engineering Technology, Vol 9 (9), 2021, pp. 1490-1497. doi:10.22214/ijraset.2021.38226

Author(s): Nishita Shewale

Keyword(s): Information Systems; Adverse Effects; Data Quality; Data Cleaning; Quality Data; Quality Of Data; New Techniques; Data Profiling; The Subject

Introducing unified information systems gives organizations insight into how data-related activities take place and how their results can be obtained with assured quality. When data are collected in different information systems, problems such as data accumulation, replication, missing entities, incorrect formatting, and other anomalies can come to light. These problems can have an array of adverse effects on data quality, so the subject of data quality deserves more careful treatment. This paper inspects data quality problems in information systems and introduces new techniques that enable organizations to improve the quality of their data. Keywords: Information Systems (IS), Data Quality, Data Cleaning, Data Profiling, Standardization, Database, Organization


Designing Information Product (IP) Maps On the Process of Data Processing and Academic Information

International Journal of New Media Technology, Vol 4 (1), 2017, pp. 25-31. doi:10.31937/ijnmt.v4i1.534. Cited By ~ 1

Author(s): Diana Effendi

Keyword(s): Data Quality; Data Management; Information Management; Information Quality; Quality Data; Management Approach; Quality Of Data; Information Product; Academic Activities

The Information Product Approach (IP Approach) is an information management approach. It can be used to manage product information and to analyze data quality. Organizations can use an IP-Map to facilitate the management of knowledge in collecting, storing, maintaining, and using data in an organized way. The data management process for academic activities at X University does not yet use the IP approach, and X University has not paid attention to managing the quality of its information. Until now, X University has been concerned only with the system applications that support the automation of data management in academic activities. The IP-Map developed in this paper can be used as a basis for analyzing the quality of data and information. Using the IP-Map, X University is expected to identify which parts of the process need improvement in data and information quality management. Index terms: IP Approach, IP-Map, information quality, data quality.


Response Behavior and Quality of Survey Data: Comparing Elderly Respondents in Institutions and Private Households

Sociological Methods & Research, 2021. doi:10.1177/0049124121995534

Author(s): Jan-Lucas Schanze

Keyword(s): Data Quality; The Elderly; Response Behavior; Quality Of Data; Social Surveys; Private Households; Confounding Variables; Health Related; Survey Interviews

Increasing respondent age and cognitive impairment are usual suspects for growing difficulties in survey interviews and declining data quality. This is why survey researchers tend to label residents of retirement and nursing homes as hard to interview and exclude them from most social surveys. In this article, I examine to what extent this label is justified and whether the quality of data collected among residents of institutions for the elderly really differs from that of data collected within private households. For this purpose, I analyze response behavior and quality indicators in three waves of the Survey of Health, Ageing and Retirement in Europe. To control for confounding variables, I use propensity score matching to identify respondents in private households who share similar characteristics with institutionalized residents. My results confirm that most indicators of response behavior and data quality are worse in institutions than in private households. However, when controlling for sociodemographic and health-related variables, the differences become very small. These results suggest that health matters for data quality irrespective of the housing situation.


Thinking about police data: Analysts’ perceptions of data quality in Canadian policing

The Police Journal: Theory, Practice and Principles, 2021. doi:10.1177/0032258x211021461

Author(s): Christopher D O’Connor; John Ng; Dallas Hill; Tyler Frederick

Keyword(s): Big Data; Data Collection; Data Quality; Research Culture; Police Services; Police Data; Data Collection And Analysis; Quality Issues

Policing is increasingly being shaped by data collection and analysis. However, we still know little about the quality of the data police services acquire and utilize. Drawing on a survey of analysts from across Canada, this article examines several data collection, analysis, and quality issues. We argue that as we move towards an era of big data policing it is imperative that police services pay more attention to the quality of the data they collect. We conclude by discussing the implications of ignoring data quality issues and the need to develop a more robust research culture in policing.


Integrating Experimental and Analytic Approaches to Improve Data Quality in Genome-wide RNAi Screens

Journal of Biomolecular Screening, Vol 13 (5), 2008, pp. 378-389. doi:10.1177/1087057108317145. Cited By ~ 26

Author(s): Xiaohua Douglas Zhang; Amy S. Espeseth; Eric N. Johnson; Jayne Chin; Adam Gates; ...

Keyword(s): Data Quality; High Throughput Screening; Daily Practice; Systematic Errors; Quality Of Data; Improve Data Quality; Research Journal; Genome Wide; Assess Data Quality

RNA interference (RNAi) not only plays an important role in drug discovery but can also be developed directly into drugs. RNAi high-throughput screening (HTS) biotechnology allows us to conduct genome-wide RNAi research. A central challenge in genome-wide RNAi research is to integrate experimental and computational approaches to obtain high-quality RNAi HTS assays. Based on our daily practice in RNAi HTS experiments, we propose implementing three experimental and analytic processes to improve the quality of data from RNAi HTS biotechnology: (1) select effective biological controls; (2) adopt appropriate plate designs to display and/or adjust for systematic errors of measurement; and (3) use effective analytic metrics to assess data quality. Applications in five real RNAi HTS experiments demonstrate the effectiveness of integrating these processes to improve data quality. Because they are effective in improving data quality in RNAi HTS experiments, the methods and guidelines contained in the three experimental and analytic processes are likely to have broad utility in genome-wide RNAi research. (Journal of Biomolecular Screening 2008:378-389)


Improving the Quality of Land Parcel Data at the South Jakarta Administrative City Land Office (Peningkatan Kualitas Data Bidang Tanah di Kantor Pertanahan Kota Administrasi Jakarta Selatan)

Tunas Agraria, Vol 4 (2), 2021, pp. 168-174. doi:10.31292/jta.v4i2.143

Author(s): Maslusatun Mawadah

Keyword(s): Quality Improvement; Data Quality; Research Method; The South; Quality Of Data; Regional Division; Land Administration; Problems And Solutions; Descriptive Approach

The South Jakarta Administrative City Land Office serves one of the cities targeted to achieve complete land administration in 2020. The current land parcel data need updating, namely by improving data in quality categories KW1 through KW6 toward valid KW1. The purpose of this study is to determine the condition of land data quality in South Jakarta, how data quality improvement has been implemented, and the problems and solutions encountered in that implementation. The research method is qualitative with a descriptive approach. The results show that, after the improvement was implemented, KW1 rose from 86.45% to 87.01%. The roles of man, material, machine, and method have been fulfilled, but the implementation of data quality improvement does not comply with the 2019 Complete City Guidelines with respect to the territorial boundary inventory. Obstacles to improving land parcel data quality also remain: the land office lacks land books (buku tanah), survey certificates (surat ukur), and survey drawings (gambar ukur); regional divisions exist; sub-district boundaries are not yet definite; and some land parcels have been separated from the mapping without the office administrator noticing.


Big Data Quality for Data Mining in Business Intelligence Applications

Advances in Business Information Systems and Analytics - Integration Challenges for Analytics, Business Intelligence, and Data Mining, 2021, pp. 64-91. doi:10.4018/978-1-7998-5781-5.ch004

Author(s): Arun Thotapalli Sundararaman

Keyword(s): Data Mining; Big Data; Data Quality; Business Intelligence; Future Research; Quality Factors; Data Volume; Mining Algorithms; Data Mining Application

The study of data quality for data mining applications has always been a complex topic. In recent years, it has become even more complex with the advent of big data as the source for data mining and business intelligence (BI) applications. In a big data environment, data are consumed in various states and forms as input for data mining, and this is the main source of the added complexity. These new complexities and challenges arise from the underlying dimensions of big data (volume, variety, velocity, and value), together with the ability to consume data at various stages of the transition from raw data to standardized datasets. They have created a need to expand the traditional data quality (DQ) factors into BDQ (big data quality) factors, as well as a need for new BDQ assessment and measurement frameworks for data mining and BI applications. However, research and industry have made very limited progress on BDQ and its relevance and criticality for data mining and BI applications. Data quality in data mining refers to the quality of the patterns or results of the models built using mining algorithms. DQ for data mining in business intelligence applications should be aligned with the objectives of the BI application. There are three major approaches to measuring DQ for data mining: objective measures, training/modeling approaches, and subjective measures. However, there is not yet agreement on definitions, measurements, or interpretations of DQ for data mining. Defining the factors of DQ for data mining and their measurement for a BI system has been one of the major challenges for researchers as well as practitioners. This chapter provides an overview of existing research on BDQ definitions and measurement for data mining for BI, analyzes the gaps therein, and provides a direction for future research and practice in this area.


Evaluating the Data Quality of a National Sample of Young Sexual and Gender Minorities Recruited Using Social Media: The Influence of Different Design Formats

Social Science Computer Review, 2020. doi:10.1177/0894439320928240. Cited By ~ 1

Author(s): Michael J. Stern; Erin Fordyce; Rachel Carpenter; Melissa Heim Viox; Stuart Michaels; ...

Keyword(s): Social Media; Data Quality; National Sample; Quality Data; Quality Of Data; Gender Minorities; Population Recruitment; Youth Population; And Gender

Social media recruitment is no longer an uncharted avenue for survey research. The results thus far provide evidence of an engaging means of recruiting hard-to-reach populations. Questions remain, however, about whether this method of recruitment produces high-quality data. This article assesses one aspect that may influence the quality of data gathered through nonprobability sampling using social media advertisements for a hard-to-reach population of sexual and gender minority youth: recruitment design formats. The data come from the Survey of Today’s Adolescent Relationships and Transitions, which used a variety of advertisement formats as survey recruitment tools on Facebook, Instagram, and Snapchat. Results demonstrate that design decisions such as the format of the advertisement (e.g., video or static) and the use of eligibility language in the advertisements affect data quality as measured by break-off rates and the use of nonsubstantive responses. Additionally, the type of device used affected the measures of data quality.
