Semantic Integrity Constraint Rule Discovery and Outlier Detection in Relational Data as a Data Quality Mining Technique

R. VasanthKumarMehta; S. Rajalakshmi

doi:10.5120/15357-3819

Outlier Detection for Sensor Systems (ODSS): A MATLAB Macro for Evaluating Microphone Sensor Data Quality

Sensors ◽

10.3390/s17102329 ◽

2017 ◽

Vol 17 (10) ◽

pp. 2329 ◽

Cited By ~ 2

Author(s):

Robert Vasta ◽

Ian Crandell ◽

Anthony Millican ◽

Leanna House ◽

Eric Smith

Keyword(s):

Data Quality ◽

Outlier Detection ◽

Sensor Data ◽

Sensor Systems

Using Machine Learning for Dependable Outlier Detection in Environmental Monitoring Systems

ACM Transactions on Cyber-Physical Systems ◽

10.1145/3445812 ◽

2021 ◽

Vol 5 (3) ◽

pp. 1-30

Author(s):

Gonçalo Jesus ◽

António Casimiro ◽

Anabela Oliveira

Keyword(s):

Machine Learning ◽

Environmental Monitoring ◽

Data Quality ◽

Outlier Detection ◽

Prediction Models ◽

Sensor Data ◽

Natural Phenomenon ◽

Monitoring Systems ◽

Data Errors ◽

Redundant Data

Sensor platforms used in environmental monitoring applications are often subject to harsh environmental conditions while monitoring complex phenomena. Therefore, designing dependable monitoring systems is challenging given the external disturbances affecting sensor measurements. Even the apparently simple task of outlier detection in sensor data becomes a hard problem, amplified by the difficulty in distinguishing true data errors due to sensor faults from deviations due to natural phenomena, which look like data errors. Existing solutions for runtime outlier detection typically assume that the physical processes can be accurately modeled, or that outliers consist of large deviations that are easily detected and filtered by appropriate thresholds. Other solutions assume that it is possible to deploy multiple sensors providing redundant data to support voting-based techniques. In this article, we propose a new methodology for dependable runtime detection of outliers in environmental monitoring systems, aiming to increase data quality by detecting and correcting them. We propose the use of machine learning techniques to model each sensor's behavior, exploiting the existence of correlated data provided by other related sensors. Using these models, along with knowledge of processed past measurements, it is possible to obtain accurate estimations of the observed environment parameters and build failure detectors that use these estimations. When a failure is detected, these estimations also allow one to correct the erroneous measurements and hence improve the overall data quality. Our methodology not only allows one to distinguish truly abnormal measurements from deviations due to complex natural phenomena, but also allows the quantification of each measurement's quality, which is relevant from a dependability perspective.
We apply the methodology to real datasets from a complex aquatic monitoring system, measuring temperature and salinity parameters, through which we illustrate the process for building the machine learning prediction models using a technique based on Artificial Neural Networks, denoted ANNODE (ANN Outlier Detection). From this application, we also observe the effectiveness of our ANNODE approach for accurate outlier detection in harsh environments. We then validate these positive results by comparing ANNODE with state-of-the-art solutions for outlier detection. The results show that ANNODE improves on existing solutions in outlier detection accuracy.
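The general idea in this abstract (estimate a sensor's reading from a correlated sensor, flag readings that deviate from the estimate, and substitute the estimate when a failure is detected) can be illustrated with a minimal sketch. This is not ANNODE: a least-squares line stands in for the neural network, and the function names, the threshold `k`, the sample readings, and the assumption that the first `train_n` samples are clean are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Least-squares fit y ~ a*x + b (a simple stand-in for an ANN model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def detect_and_correct(neighbor, target, train_n, k=3.0):
    """Flag target readings that deviate from the neighbor-based estimate.

    neighbor, target: equal-length lists from two correlated sensors.
    train_n: number of leading samples assumed clean and used for fitting.
    k: threshold in units of training-residual standard deviation (assumed).
    """
    a, b = fit_line(neighbor[:train_n], target[:train_n])
    residuals = [t - (a * n + b)
                 for n, t in zip(neighbor[:train_n], target[:train_n])]
    sigma = stdev(residuals)
    flags, corrected = [], []
    for n, t in zip(neighbor[train_n:], target[train_n:]):
        estimate = a * n + b
        is_outlier = abs(t - estimate) > k * sigma
        flags.append(is_outlier)
        # Replace a flagged reading with the model estimate.
        corrected.append(estimate if is_outlier else t)
    return flags, corrected

# Hypothetical readings: the last target value is a deliberate spike.
neighbor = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5]
target = [21.1, 21.9, 23.05, 23.95, 25.1, 25.9, 27.0, 40.0]
flags, corrected = detect_and_correct(neighbor, target, train_n=6)
```

A fixed residual threshold is the simplest possible failure detector; the abstract describes a richer scheme that also uses processed past measurements and quantifies the quality of each measurement.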

The utility of multivariate outlier detection techniques for data quality evaluation in large studies: an application within the ONDRI project

BMC Medical Research Methodology ◽

10.1186/s12874-019-0737-5 ◽

2019 ◽

Vol 19 (1) ◽

Cited By ~ 12

Author(s):

Kelly M. Sunderland ◽

Derek Beaton ◽

Julia Fraser ◽

Donna Kwan ◽

...

Keyword(s):

Data Quality ◽

Outlier Detection ◽

Quality Evaluation ◽

Multivariate Outlier Detection ◽

Detection Techniques

Outlier detection in relational data: A case study in geographical information systems

Expert Systems with Applications ◽

10.1016/j.eswa.2011.09.125 ◽

2012 ◽

Vol 39 (5) ◽

pp. 4718-4728 ◽

Cited By ~ 9

Author(s):

Joris Maervoet ◽

Celine Vens ◽

Greet Vanden Berghe ◽

Hendrik Blockeel ◽

Patrick De Causmaecker

Keyword(s):

Information Systems ◽

Outlier Detection ◽

Geographical Information Systems ◽

Geographical Information ◽

Relational Data

Input data quality control for NDNQI national comparative statistics and quarterly reports: a contrast of three robust scale estimators for multiple outlier detection

BMC Research Notes ◽

10.1186/1756-0500-5-456 ◽

2012 ◽

Vol 5 (1) ◽

Cited By ~ 2

Author(s):

Qingjiang Hou ◽

Brandon Crosser ◽

Jonathan D Mahnken ◽

Byron J Gajewski ◽

Nancy Dunton

Keyword(s):

Quality Control ◽

Data Quality ◽

Outlier Detection ◽

Input Data ◽

Data Quality Control

Extended tuple constraint type as a complex integrity constraint type in XML data model - definition and enforcement

Computer Science and Information Systems ◽

10.2298/csis180324029v ◽

2018 ◽

Vol 15 (3) ◽

pp. 821-843

Author(s):

Jovana Vidakovic ◽

Sonja Ristic ◽

Slavica Kordic ◽

Ivan Lukovic

Keyword(s):

Data Model ◽

Relational Databases ◽

Database Management System ◽

Integrity Constraints ◽

Relational Data ◽

Integrity Constraint ◽

Xml Data ◽

Relational Data Model ◽

Unique Constraint ◽

Schema Languages

A database management system (DBMS) is based on a data model whose concepts are used to express a database schema. Each data model has a specific set of integrity constraint types. Some integrity constraint types, such as the key constraint, unique constraint and foreign key constraint, are supported by most DBMSs. Other, more complex constraint types are difficult to express and enforce and are mostly disregarded by actual DBMSs. Users have to manage those using custom procedures or triggers. eXtensible Markup Language (XML) has become the universal format for representing and exchanging data. Very often XML data are generated from relational databases and exported to a target application or another database. In this context, integrity constraints play an essential role in preserving the original semantics of data. Integrity constraints have been extensively studied in the relational data model. Mechanisms provided by XML schema languages rely on a simple form of constraints that is sufficient neither for expressing semantic constraints commonly found in databases nor for expressing more complex constraints induced by the business rules of the system under study. In this paper we present a classification of constraint types in the relational data model, discuss possible declarative mechanisms for their specification and enforcement in the XML data model, and illustrate our approach to the definition and enforcement of complex constraint types in the XML data model using the example of the extended tuple constraint type.
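To make the notion of a constraint managed by a custom procedure concrete, the sketch below checks ordinary tuple constraints, meaning predicates over the attribute values of a single row. It does not reproduce the paper's extended tuple constraint type or its XML enforcement mechanism; the constraint names, attributes, and sample rows are hypothetical.

```python
from datetime import date

# Hypothetical tuple constraints: a name plus a predicate over one row.
CONSTRAINTS = [
    ("ship_not_before_order",
     lambda r: r["ship_date"] is None or r["ship_date"] >= r["order_date"]),
    ("discount_at_most_total",
     lambda r: r["discount"] <= r["total"]),
]

def violations(rows):
    """Return (row_index, constraint_name) for every failed check."""
    return [(i, name)
            for i, row in enumerate(rows)
            for name, ok in CONSTRAINTS
            if not ok(row)]

# Hypothetical rows: the second one violates both constraints.
rows = [
    {"order_date": date(2018, 3, 1), "ship_date": date(2018, 3, 4),
     "total": 100.0, "discount": 10.0},
    {"order_date": date(2018, 3, 5), "ship_date": date(2018, 3, 2),
     "total": 50.0, "discount": 60.0},
]
found = violations(rows)
```

Keeping the constraints in a declarative list, separate from the checking loop, loosely mirrors the distinction the abstract draws between specifying a constraint and enforcing it.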

Data Quality Evaluation, Outlier Detection and Missing Data Imputation Methods for IoT in Smart Cities

Studies in Computational Intelligence - Machine Intelligence and Data Analytics for Sustainable Future Smart Cities ◽

10.1007/978-3-030-72065-0_1 ◽

2021 ◽

pp. 1-18

Author(s):

Vera Van Zoest ◽

Xiuming Liu ◽

Edith Ngai

Keyword(s):

Missing Data ◽

Data Quality ◽

Outlier Detection ◽

Quality Evaluation ◽

Smart Cities ◽

Data Imputation ◽

Missing Data Imputation ◽

Imputation Methods

A Data Quality Management of Chain Stores based on Outlier Detection

Studies in Classification, Data Analysis, and Knowledge Organization - Advanced Studies in Classification and Data Science ◽

10.1007/978-981-15-3311-2_27 ◽

2020 ◽

pp. 341-353

Author(s):

Linh Nguyen ◽

Tsukasa Ishigaki

Keyword(s):

Quality Management ◽

Data Quality ◽

Outlier Detection ◽

Data Quality Management ◽

Chain Stores

Ensuring high sensor data quality through use of online outlier detection techniques

International Journal of Sensor Networks ◽

10.1504/ijsnet.2010.033116 ◽

2010 ◽

Vol 7 (3) ◽

pp. 141 ◽

Cited By ~ 27

Author(s):

Yang Zhang ◽

Nirvana Meratnia ◽

Paul J.M. Havinga

Keyword(s):

Data Quality ◽

Outlier Detection ◽

Sensor Data ◽

Detection Techniques

Mining Rare Association Rules on Banpheo Hospital (Public Organization) via Apriori MSG-P Algorithm

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽

10.37936/ecti-cit.201262.54337 ◽

2012 ◽

Vol 6 (2) ◽

pp. 156-165

Author(s):

Taweechai Ouypornkochagorn

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Public Organization ◽

Rule Discovery ◽

Rule Mining ◽

Mining Technique ◽

Association Rule Discovery ◽

Hidden Knowledge ◽

The One ◽

Mining Association Rule

Association rule mining is one of the important data mining techniques for uncovering hidden knowledge in large databases. Businesses in many areas need this technique to examine their enormous volumes of information, and public health is one area with a particularly strong need. Much information lies hidden in daily operational data, such as the relation between visit time and symptom or between disease and patient age. However, association rule discovery via the traditional Apriori algorithm, the fundamental way to retrieve hidden rules, consumes tremendous resources and time. This research applies a modified association rule mining technique, Apriori MSG-P, to the operational database of Banpheo Hospital (Public Organization), Samut Sakhon province, Thailand. The objectives focus on epidemic information and patient behavior regarding the hospital's services. The results show that our implementation can extract a large amount of valuable information that can be used by both operational and executive staff. Moreover, the results demonstrate that Apriori MSG-P is a suitable technique for real-world databases.
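For orientation, the traditional Apriori algorithm that the abstract takes as its baseline can be sketched as follows. This is the classical level-wise frequent-itemset search only, not the Apriori MSG-P modification, whose details the abstract does not give. The transactions, item names, and the use of an absolute count for `min_support` are assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute transaction count (an assumption here).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical visit records, each a set of recorded symptoms.
visits = [
    {"fever", "cough"},
    {"fever", "cough", "rash"},
    {"fever", "rash"},
    {"cough"},
]
result = apriori(visits, min_support=2)
```

The repeated pass over all transactions at each level is the resource cost the abstract refers to. With a single global `min_support`, rare itemsets are discarded early, which is the usual motivation for variants aimed at rare rules.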
