Semantic Integrity Constraint Rule Discovery and Outlier Detection in Relational Data as a Data Quality Mining Technique

2014 ◽  
Vol 88 (6) ◽  
pp. 23-26
Author(s):  
R. VasanthKumarMehta ◽  
S. Rajalakshmi
Sensors ◽  
2017 ◽  
Vol 17 (10) ◽  
pp. 2329 ◽  
Author(s):  
Robert Vasta ◽  
Ian Crandell ◽  
Anthony Millican ◽  
Leanna House ◽  
Eric Smith

2021 ◽  
Vol 5 (3) ◽  
pp. 1-30
Author(s):  
Gonçalo Jesus ◽  
António Casimiro ◽  
Anabela Oliveira

Sensor platforms used in environmental monitoring applications are often subject to harsh environmental conditions while monitoring complex phenomena. Designing dependable monitoring systems is therefore challenging, given the external disturbances affecting sensor measurements. Even the apparently simple task of outlier detection in sensor data becomes a hard problem, amplified by the difficulty of distinguishing true data errors due to sensor faults from deviations due to natural phenomena that merely look like data errors. Existing solutions for runtime outlier detection typically assume that the physical processes can be accurately modeled, or that outliers consist of large deviations that are easily detected and filtered by appropriate thresholds. Other solutions assume that it is possible to deploy multiple sensors providing redundant data to support voting-based techniques. In this article, we propose a new methodology for dependable runtime detection of outliers in environmental monitoring systems, aiming to increase data quality by treating the detected outliers. We propose the use of machine learning techniques to model each sensor's behavior, exploiting the existence of correlated data provided by other related sensors. Using these models, along with knowledge of processed past measurements, it is possible to obtain accurate estimations of the observed environment parameters and build failure detectors that use these estimations. When a failure is detected, these estimations also allow one to correct the erroneous measurements and hence improve the overall data quality. Our methodology not only allows one to distinguish truly abnormal measurements from deviations due to complex natural phenomena, but also allows the quantification of each measurement's quality, which is relevant from a dependability perspective.
We apply the methodology to real datasets from a complex aquatic monitoring system measuring temperature and salinity parameters, through which we illustrate the process for building the machine learning prediction models using a technique based on Artificial Neural Networks, denoted ANNODE (ANN Outlier Detection). From this application, we also observe the effectiveness of our ANNODE approach for accurate outlier detection in harsh environments. We then validate these positive results by comparing ANNODE with state-of-the-art solutions for outlier detection. The results show that ANNODE improves on existing solutions in terms of outlier detection accuracy.
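The detect-then-correct idea described in this abstract can be illustrated with a minimal sketch. It is not the ANNODE method: the abstract names Artificial Neural Networks, whereas the sketch below swaps in a one-variable least-squares line purely to keep the example short. The function names, the sample readings, and the fixed residual threshold are all assumptions introduced for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ a*x + b for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def detect_and_correct(neighbour, target, a, b, threshold):
    """Flag target readings far from the estimate derived from a
    correlated neighbour sensor, and replace them with that estimate."""
    flags, cleaned = [], []
    for x, y in zip(neighbour, target):
        estimate = a * x + b
        is_outlier = abs(y - estimate) > threshold
        flags.append(is_outlier)
        cleaned.append(estimate if is_outlier else y)
    return flags, cleaned

# Hypothetical past readings from two correlated sensors (illustrative only).
train_neighbour = [10.0, 11.0, 12.0, 13.0, 14.0]
train_target = [20.0, 22.0, 24.0, 26.0, 28.0]
a, b = fit_line(train_neighbour, train_target)

# Hypothetical new readings; the second target value is implausible.
new_neighbour = [10.5, 12.5, 13.5]
new_target = [21.0, 40.0, 27.0]
flags, cleaned = detect_and_correct(new_neighbour, new_target, a, b, threshold=1.0)
print(flags)    # [False, True, False]
print(cleaned)  # [21.0, 25.0, 27.0]
```

A fixed threshold is the simplest possible failure detector. The abstract points out that simple thresholds struggle when natural phenomena resemble faults, which is the motivation for a richer learned model.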


2012 ◽  
Vol 39 (5) ◽  
pp. 4718-4728 ◽  
Author(s):  
Joris Maervoet ◽  
Celine Vens ◽  
Greet Vanden Berghe ◽  
Hendrik Blockeel ◽  
Patrick De Causmaecker

2018 ◽  
Vol 15 (3) ◽  
pp. 821-843
Author(s):  
Jovana Vidakovic ◽  
Sonja Ristic ◽  
Slavica Kordic ◽  
Ivan Lukovic

A database management system (DBMS) is based on a data model whose concepts are used to express a database schema. Each data model has a specific set of integrity constraint types. Some integrity constraint types, such as the key constraint, unique constraint and foreign key constraint, are supported by most DBMSs. Other, more complex constraint types are difficult to express and enforce, and are mostly disregarded by actual DBMSs; users have to manage them using custom procedures or triggers. eXtensible Markup Language (XML) has become the universal format for representing and exchanging data. Very often XML data are generated from relational databases and exported to a target application or another database. In this context, integrity constraints play an essential role in preserving the original semantics of the data. Integrity constraints have been extensively studied in the relational data model. Mechanisms provided by XML schema languages rely on a simple form of constraints that is sufficient neither for expressing semantic constraints commonly found in databases nor for expressing more complex constraints induced by the business rules of the system under study. In this paper we present a classification of constraint types in the relational data model, discuss possible declarative mechanisms for their specification and enforcement in the XML data model, and illustrate our approach to the definition and enforcement of complex constraint types in the XML data model using the example of the extended tuple constraint type.
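To make the notion of a constraint over exported XML data concrete, here is a minimal sketch of a procedural check for a plain tuple constraint (a condition over values within one record). It is not the declarative mechanism the paper proposes, and it does not implement the extended tuple constraint type, which the abstract does not define. The element names, the sample document, and the rule "ship date must not precede order date" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of a relational "orders" table.
XML = """
<orders>
  <order id="1"><order_date>2018-01-10</order_date><ship_date>2018-01-12</ship_date></order>
  <order id="2"><order_date>2018-02-05</order_date><ship_date>2018-02-01</ship_date></order>
</orders>
"""

def violating_orders(xml_text):
    """Return the ids of <order> elements whose ship_date precedes order_date."""
    root = ET.fromstring(xml_text)
    bad = []
    for order in root.findall("order"):
        ordered = order.findtext("order_date")
        shipped = order.findtext("ship_date")
        # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
        if shipped < ordered:
            bad.append(order.get("id"))
    return bad

violations = violating_orders(XML)
print(violations)  # ['2']
```

A check like this lives outside the schema, which is the kind of custom procedure the abstract says users fall back on when a constraint type cannot be expressed declaratively.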


Author(s):  
Taweechai Ouypornkochagorn

Association rule mining is one of the important data mining techniques for uncovering hidden knowledge in large databases. Businesses in many areas need this technique to examine their enormous volumes of information, and public health is one area with a particularly strong need for it. Daily operational data conceal many hidden relationships, such as the relation between visit time and symptom or between disease and patient age. However, association rule discovery via the traditional Apriori algorithm, the fundamental way to retrieve hidden rules, demands tremendous resources and time. This research applies a modified association rule mining technique, Apriori MSG-P, to the operational database of Banpheo hospital (Public Organization), Sumutsakon province, Thailand. The objectives target epidemic information and patient behavior with respect to the hospital’s services. The results show that our implementation can extract a large amount of valuable information that can be used by both operational and executive staff. Moreover, the results demonstrate that Apriori MSG-P is a suitable technique for real-world databases.
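For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional level-wise Apriori procedure, together with a rule confidence calculation. It is not Apriori MSG-P, whose modifications the abstract does not describe. The transactions (visit time and symptom labels), the minimum support value, and the function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for every itemset
    whose support (fraction of transactions containing it) >= min_support."""
    n = len(transactions)
    current = [frozenset([i]) for i in {item for t in transactions for item in t}]
    result = {}
    k = 1
    while current:
        # Count each candidate with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result

def confidence(support, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support[antecedent | consequent] / support[antecedent]

# Hypothetical visit records: visit time plus observed symptoms.
transactions = [
    {"morning", "fever"},
    {"morning", "fever", "cough"},
    {"evening", "cough"},
    {"morning", "fever"},
]
support = frequent_itemsets(transactions, min_support=0.5)
rule_conf = confidence(support, frozenset({"morning"}), frozenset({"fever"}))
print(support[frozenset({"morning", "fever"})])  # 0.75
print(rule_conf)                                 # 1.0
```

Each level rescans every transaction for every candidate, which hints at why the abstract describes the traditional algorithm as costly in resources and time on large operational databases.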

