Outlier Detection Methods for Industrial Applications

Author(s):  
Silvia Cateni ◽  
Valentina Colla ◽  
Marco Vannucci
2021 ◽  
Vol 15 (4) ◽  
pp. 1-20
Author(s):  
Georg Steinbuss ◽  
Klemens Böhm

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with various and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus, in principle, allows for a more meaningful evaluation of detection methods. Nonetheless, there have been only a few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty of achieving good coverage of different domains with synthetic data. In this work, we propose a generic process for generating datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We describe this generic process for benchmarking unsupervised outlier detection and then present three instantiations of it that generate outliers with specific characteristics, such as local outliers. To validate our process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of data reconstructed in this way. Besides showcasing the workflow, this confirms the usefulness of our proposed process. In particular, our process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.
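The core idea in this abstract, reconstructing regular instances from real data while generating outliers with known characteristics, can be illustrated with a minimal sketch. This is not the authors' process: the independent per-attribute Gaussian model, the function names (`fit_gaussians`, `sample_regular`, `sample_outliers`), and the 4 to 6 standard deviation displacement are all assumptions chosen only to keep the example short.

```python
import random
import statistics


def fit_gaussians(rows):
    """Estimate a (mean, std) pair per attribute from real regular instances.

    Assumption: attributes are modelled as independent Gaussians.
    """
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def sample_regular(params, n, rng):
    """Draw n synthetic regular instances from the fitted model."""
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


def sample_outliers(params, n, rng, low=4.0, high=6.0):
    """Draw n synthetic outliers with a known characteristic.

    Each outlier is a regular instance in which one randomly chosen
    attribute is moved `low` to `high` standard deviations from the mean
    (hypothetical choice, not taken from the paper).
    """
    outliers = []
    for _ in range(n):
        point = [rng.gauss(mu, sigma) for mu, sigma in params]
        j = rng.randrange(len(params))
        mu, sigma = params[j]
        sign = rng.choice((-1.0, 1.0))
        point[j] = mu + sign * rng.uniform(low, high) * sigma
        outliers.append(point)
    return outliers


# Usage: stand-in "real" data, then a labelled synthetic benchmark set.
rng = random.Random(0)
real = [[rng.gauss(0, 1), rng.gauss(10, 2)] for _ in range(200)]
params = fit_gaussians(real)
benchmark = (
    [(p, 0) for p in sample_regular(params, 95, rng)]
    + [(p, 1) for p in sample_outliers(params, 5, rng)]
)
```

Because the labels and the outlier characteristic are known by construction, a detector's scores on `benchmark` can be evaluated directly, which is the benefit of synthetic data that the abstract points to.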


Author(s):  
Fabrizio Angiulli

Data mining techniques can be grouped into four main categories: clustering, classification, dependency detection, and outlier detection. Clustering is the process of partitioning a set of objects into homogeneous groups, or clusters. Classification is the task of assigning objects to one of several predefined categories. Dependency detection searches for pairs of attribute sets which exhibit some degree of correlation in the data set at hand. The outlier detection task can be defined as follows: “Given a set of data points or objects, find the objects that are considerably dissimilar, exceptional or inconsistent with respect to the remaining data”. These exceptional objects are also referred to as outliers. Most of the early methods for outlier identification were developed in the field of statistics (Hawkins, 1980; Barnett & Lewis, 1994). Hawkins’ definition of an outlier clarifies the approach: “An outlier is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. Indeed, statistical techniques assume that the given data set follows a distribution model. Outliers are those points that satisfy a discordancy test, that is, points that are significantly far from their expected position given the hypothesized distribution. Many clustering, classification and dependency detection methods produce outliers as a by-product of their main task. For example, in classification, mislabeled objects are considered outliers and are therefore removed from the training set to improve the accuracy of the resulting classifier, while in clustering, objects that do not strongly belong to any cluster are considered outliers. Nevertheless, searching for outliers with techniques designed for tasks other than outlier detection may not be advantageous.
As an example, clusters can be distorted by outliers and, thus, the quality of the outliers returned is affected by their presence. Moreover, besides returning a solution of higher quality, dedicated outlier detection algorithms can be vastly more efficient than algorithms not designed for the task. While in many contexts outliers are considered noise that must be eliminated, as pointed out elsewhere, “one person’s noise could be another person’s signal”, and thus outliers themselves can be of great interest. Outlier mining is used in telecom and credit card fraud detection to identify atypical usage of telecom services or credit cards, in intrusion detection to detect unauthorized accesses, in medical analysis to test abnormal reactions to new medical therapies, in marketing and customer segmentation to identify customers spending much more or much less than the average customer, in surveillance systems, in data cleaning, and in many other fields.
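The statistical view described in this abstract, where outliers are points that fail a discordancy test under a hypothesized distribution, can be sketched with a simple z-score rule. This is only an illustration under stated assumptions: the data are taken to be roughly Gaussian, the function name `discordant_points` is hypothetical, and the threshold of three standard deviations is a common convention rather than something the abstract specifies.

```python
import statistics


def discordant_points(values, threshold=3.0):
    """Return the values lying more than `threshold` sample standard
    deviations from the sample mean.

    Assumption: the data follow an approximately Gaussian distribution,
    so distance from the mean in standard deviations (the z-score) is a
    reasonable measure of how unexpected a point is.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        # All values are identical, so no point deviates from the rest.
        return []
    return [x for x in values if abs(x - mu) / sigma > threshold]


# Usage: nineteen readings near 10 and one far away.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0]
print(discordant_points(readings))
```

Note that the mean and standard deviation are themselves inflated by the outliers they are meant to expose, so in small samples an extreme point can mask itself; robust estimates such as the median are a common alternative.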


2018 ◽  
Vol 64 ◽  
pp. 08006 ◽  
Author(s):  
Kummerow André ◽  
Nicolai Steffen ◽  
Bretschneider Peter

This survey addresses the uncovering of potentially critical events in mixed phasor measurement unit (PMU) data sets. An unsupervised procedure based on different outlier detection methods is introduced. To this end, different signal analysis techniques are used to generate features in the time and frequency domains, together with linear and non-linear dimension reduction techniques. This approach enables the exploration of critical grid dynamics in power systems without prior knowledge of existing failure patterns. Furthermore, new failure patterns can be extracted to create training data sets for online detection algorithms.


2020 ◽  
Vol 21 (13) ◽  
pp. 4574
Author(s):  
Elena Rosini ◽  
Paola D’Antona ◽  
Loredano Pollegioni

D-enantiomers of amino acids (D-AAs) are present only in low amounts in nature, frequently at trace levels, and for this reason their biological function was undervalued for a long time. In the past 25 years, improvements in analytical methods, such as gas chromatography, HPLC, and capillary electrophoresis, have made it possible to detect D-AAs in foodstuffs and biological samples and to attribute specific biological functions to them in mammals. These methods are time-consuming, expensive, and not suitable for online application; however, life science investigations and industrial applications require the rapid and selective determination of D-AAs that only biosensors can offer. In the present review, we provide a status update on biosensors for detecting and quantifying D-AAs and on their applications in food safety and quality, human health, and neurological research. The review reports the main challenges in the field: selectivity, in order to distinguish the different D-AAs present in a solution; the simultaneous assay of both L- and D-AAs; the production of implantable devices; and surface-scanning biosensors. These innovative tools will push future research aimed at investigating the neurological role of D-AAs, a vibrant field that is growing at an accelerating pace.


Author(s):  
Hongzuo Xu ◽  
Yongjun Wang ◽  
Zhiyue Wu ◽  
Yijie Wang

Non-IID categorical data is ubiquitous in real-world applications. Learning various kinds of couplings has proved to be a reliable approach to detecting outliers in such non-IID data. However, it is a critical yet challenging problem to model, represent, and utilise high-order complex value couplings. Existing outlier detection methods normally focus only on pairwise primary value couplings and fail to uncover the real relations hidden in complex couplings, resulting in suboptimal and unstable performance. This paper introduces a novel unsupervised embedding-based complex value coupling learning framework, EMAC, and its instance, SCAN, to address these issues. SCAN first models primary value couplings. Then, coupling bias is defined to capture complex value couplings with different granularities and highlight the essence of outliers. An embedding method is applied to the value network constructed via biased value couplings, which further learns high-order complex value couplings and embeds these couplings into a value representation matrix. Bidirectional selective value coupling learning is proposed to estimate value and object outlierness through value couplings. Substantial experiments show that SCAN (i) significantly outperforms five state-of-the-art outlier detection methods on thirteen real-world datasets; and (ii) has much better resilience to noise than its competitors.


2020 ◽  
Vol 64 (11) ◽  
pp. 1825-1833
Author(s):  
Jennifer S. Li ◽  
Andreas Hamann ◽  
Elisabeth Beaubien

2014 ◽  
Vol 602-605 ◽  
pp. 1594-1597
Author(s):  
Han Xin Chen ◽  
Shi Qi Yang

This paper investigates the ultrasonic mechanism of Time of Flight Diffraction (TOFD) by finite element analysis, with the aim of improving applications of ultrasonic TOFD detection technology. A welded steel plate with artificial defects is used in the finite element analysis model. The experimental A-scan signal, which has a high noise level, is filtered by the wavelet transform so that the wave diffracted by the defect is clearly visible. Ultrasound simulation software is used to present the propagation of the ultrasonic signal inside the sample. The simulation results are compared with the experimental results, which provides a valid basis for practical TOFD ultrasonic detection methods in industrial applications.


Author(s):  
Noam Amir ◽  
Oded Barzelay ◽  
Amir Yefet ◽  
Tal Pechter

Acoustic Pulse Reflectometry (APR) has been applied extensively to tubular systems in research laboratories for measuring input impedance, bore reconstruction, and fault detection. Industrial applications have been mentioned in the literature, though they have not been widely implemented. Academic APR systems are extremely bulky, often employing source tubes six meters in length, which severely limits their industrial use. Furthermore, the leak detection methods described in the literature are indirect: they carry out bore reconstruction and look for discrepancies between the expected and reconstructed bore. In this paper we describe an APR system designed specifically for detecting faults commonly found in industrial tube systems: leaks, increases in internal diameter caused by wall thinning, and constrictions. The system employs extremely short source tubes, on the order of 20 cm, which makes it highly portable but creates a large degree of overlap between forward and backward propagating waves in the system. A series of algorithmic innovations enables the system to perform the wave separation mathematically and then identify the above faults automatically, with a measurement time on the order of 10 seconds per tube. We present several case studies of condenser tube inspection, showing how different faults are identified and reported.

