Outlier detection methods are still effective even using virtual species created with the probabilistic approach

Canran Liu; Matt White; Graeme Newell

doi:10.1111/jbi.13872

Identifying mixture copula components using outlier detection methods and goodness-of-fit tests

The Journal of Risk ◽

10.21314/jor.2014.288 ◽

2014 ◽

Vol 16 (4) ◽

pp. 61-101 ◽

Cited By ~ 1

Author(s):

Gregor Weiß

Keyword(s):

Outlier Detection ◽

Goodness Of Fit ◽

Detection Methods ◽

Goodness Of Fit Tests

Download Full-text

Detection and Health Risk Associated with Low Virus Concentration in Drinking Water

Water Science & Technology ◽

10.2166/wst.1985.0100 ◽

1985 ◽

Vol 17 (10) ◽

pp. 97-103 ◽

Cited By ~ 4

Author(s):

P. Payment ◽

M. Trudel

Keyword(s):

Drinking Water ◽

Water Treatment ◽

Health Problem ◽

Conventional Treatment ◽

Probabilistic Approach ◽

Detection Methods ◽

Virus Concentration ◽

The Subject ◽

Treated Drinking Water ◽

Dangerous Level

During the last decade, with the amelioration of the detection methods and the increasing number of studies on the subject, the isolation of viruses in treated drinking water has been reported more frequently than ever. These reports have in common the very low number of viruses isolated and these viruses are usually found only after concentration procedures involving several hundred liters of water. Our own studies have shown that during the conventional treatment of drinking water 99.998% of the indigenous viruses are removed. The residual viral fraction does not exceed 10 viruses per 1 000 liters of water. Using a probabilistic approach this viral concentration in drinking water is well below any dangerous level of enteric viruses in water and the presence of these viruses should not be considered as a health problem but more as the limit of the water treatment methodology.

Download Full-text

Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3441453 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-20

Author(s):

Georg Steinbuss ◽

Klemens Böhm

Keyword(s):

Outlier Detection ◽

Synthetic Data ◽

Real Data ◽

Detection Methods ◽

Quality Of Data ◽

Benchmark Data ◽

Core Idea ◽

Generic Process ◽

Unsupervised Outlier Detection

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with various and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus allows for a more meaningful evaluation of detection methods in principle. Nonetheless, there have only been few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty to arrive at a good coverage of different domains with synthetic data. In this work, we propose a generic process for the generation of datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We propose and describe a generic process for the benchmarking of unsupervised outlier detection, as sketched so far. We then describe three instantiations of this generic process that generate outliers with specific characteristics, like local outliers. To validate our process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of data reconstructed in this way. Next to showcasing the workflow, this confirms the usefulness of our proposed process. In particular, our process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.

Download Full-text

Outlier Detection Techniques for Data Mining

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch228 ◽

2011 ◽

pp. 1483-1488

Author(s):

Fabrizio Angiulli

Keyword(s):

Data Mining ◽

Outlier Detection ◽

Credit Card ◽

Detection Methods ◽

Distribution Model ◽

Main Task ◽

Data Set ◽

Homogeneous Groups ◽

Definition Of ◽

Dependency Detection

Data mining techniques can be grouped in four main categories: clustering, classification, dependency detection, and outlier detection. Clustering is the process of partitioning a set of objects into homogeneous groups, or clusters. Classification is the task of assigning objects to one of several predefined categories. Dependency detection searches for pairs of attribute sets which exhibit some degree of correlation in the data set at hand. The outlier detection task can be defined as follows: “Given a set of data points or objects, find the objects that are considerably dissimilar, exceptional or inconsistent with respect to the remaining data”. These exceptional objects as also referred to as outliers. Most of the early methods for outlier identification have been developed in the field of statistics (Hawkins, 1980; Barnett & Lewis, 1994). Hawkins’ definition of outlier clarifies the approach: “An outlier is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. Indeed, statistical techniques assume that the given data set has a distribution model. Outliers are those points that satisfy a discordancy test, that is, that are significantly far from what would be their expected position given the hypothesized distribution. Many clustering, classification and dependency detection methods produce outliers as a by-product of their main task. For example, in classification, mislabeled objects are considered outliers and thus they are removed from the training set to improve the accuracy of the resulting classifier, while in clustering, objects that do not strongly belong to any cluster are considered outliers. Nevertheless, it must be said that searching for outliers through techniques specifically designed for tasks different from outlier detection could not be advantageous. As an example, clusters can be distorted by outliers and, thus, the quality of the outliers returned is affected by their presence. Moreover, other than returning a solution of higher quality, outlier detection algorithms can be vastly more efficient than non ad-hoc algorithms. While in many contexts outliers are considered as noise that must be eliminated, as pointed out elsewhere, “one person’s noise could be another person’s signal”, and thus outliers themselves can be of great interest. Outlier mining is used in telecom or credit card frauds to detect the atypical usage of telecom services or credit cards, in intrusion detection for detecting unauthorized accesses, in medical analysis to test abnormal reactions to new medical therapies, in marketing and customer segmentations to identify customers spending much more or much less than average customer, in surveillance systems, in data cleaning, and in many other fields.

Download Full-text

Outlier Detection Methods for Uncovering of Critical Events in Historical Phasor Measurement Records

E3S Web of Conferences ◽

10.1051/e3sconf/20186408006 ◽

2018 ◽

Vol 64 ◽

pp. 08006 ◽

Cited By ~ 1

Author(s):

Kummerow André ◽

Nicolai Steffen ◽

Bretschneider Peter

Keyword(s):

Power Systems ◽

Outlier Detection ◽

Training Data ◽

Detection Methods ◽

Data Sets ◽

Critical Events ◽

Failure Patterns ◽

Detection Algorithms ◽

Reduction Techniques ◽

Dimension Reduction Techniques

The scope of this survey is the uncovering of potential critical events from mixed PMU data sets. An unsupervised procedure is introduced with the use of different outlier detection methods. For that, different techniques for signal analysis are used to generate features in time and frequency domain as well as linear and non-linear dimension reduction techniques. That approach enables the exploration of critical grid dynamics in power systems without prior knowledge about existing failure patterns. Furthermore new failure patterns can be extracted for the creation of training data sets used for online detection algorithms.

Download Full-text

Embedding-Based Complex Feature Value Coupling Learning for Detecting Outliers in Non-IID Categorical Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015541 ◽

2019 ◽

Vol 33 ◽

pp. 5541-5548 ◽

Cited By ~ 2

Author(s):

Hongzuo Xu ◽

Yongjun Wang ◽

Zhiyue Wu ◽

Yijie Wang

Keyword(s):

Outlier Detection ◽

Categorical Data ◽

State Of The Art ◽

High Order ◽

Detection Methods ◽

Order Complex ◽

Value Network ◽

Learning Framework ◽

A Value ◽

Real World Datasets

Non-IID categorical data is ubiquitous and common in realworld applications. Learning various kinds of couplings has been proved to be a reliable measure when detecting outliers in such non-IID data. However, it is a critical yet challenging problem to model, represent, and utilise high-order complex value couplings. Existing outlier detection methods normally only focus on pairwise primary value couplings and fail to uncover real relations that hide in complex couplings, resulting in suboptimal and unstable performance. This paper introduces a novel unsupervised embedding-based complex value coupling learning framework EMAC and its instance SCAN to address these issues. SCAN first models primary value couplings. Then, coupling bias is defined to capture complex value couplings with different granularities and highlight the essence of outliers. An embedding method is performed on the value network constructed via biased value couplings, which further learns high-order complex value couplings and embeds these couplings into a value representation matrix. Bidirectional selective value coupling learning is proposed to show how to estimate value and object outlierness through value couplings. Substantial experiments show that SCAN (i) significantly outperforms five state-of-the-art outlier detection methods on thirteen real-world datasets; and (ii) has much better resilience to noise than its competitors.

Download Full-text

Outlier detection methods to improve the quality of citizen science data

International Journal of Biometeorology ◽

10.1007/s00484-020-01968-z ◽

2020 ◽

Vol 64 (11) ◽

pp. 1825-1833

Author(s):

Jennifer S. Li ◽

Andreas Hamann ◽

Elisabeth Beaubien

Keyword(s):

Outlier Detection ◽

Citizen Science ◽

Detection Methods ◽

Science Data

Download Full-text

Nullifying Malicious Users for Cooperative Spectrum Sensing in Cognitive Radio Networks Using Outlier Detection Methods

Ubiquitous Computing Application and Wireless Sensor - Lecture Notes in Electrical Engineering ◽

10.1007/978-94-017-9618-7_12 ◽

2015 ◽

pp. 123-131 ◽

Cited By ~ 2

Author(s):

Prakash Prasain ◽

Dong-You Choi

Keyword(s):

Cognitive Radio ◽

Outlier Detection ◽

Spectrum Sensing ◽

Cognitive Radio Networks ◽

Cooperative Spectrum Sensing ◽

Radio Networks ◽

Detection Methods

Download Full-text

Detection of bad data images in long-term MODIS land surface temperature image time series using statistical outlier detection methods

Journal of Applied Remote Sensing ◽

10.1117/1.jrs.13.048504 ◽

2019 ◽

Vol 13 (04) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Ritesh Mujawdiya ◽

Rajat S. Chatterjee ◽

Dheeraj Kumar ◽

Narendra Singh

Keyword(s):

Time Series ◽

Surface Temperature ◽

Outlier Detection ◽

Land Surface Temperature ◽

Land Surface ◽

Detection Methods ◽

Modis Land Surface Temperature ◽

Bad Data

Download Full-text

Outlier Detection in Growth Data: Beyond Biologically Implausible Values

Current Developments in Nutrition ◽

10.1093/cdn/nzaa056_021 ◽

2020 ◽

Vol 4 (Supplement_2) ◽

pp. 1174-1174

Author(s):

Paraskevi Massara ◽

Robert Bandsma ◽

Celine Bourdon ◽

Jonathon Maguire ◽

Elena Comelli ◽

...

Keyword(s):

Outlier Detection ◽

Sensitivity And Specificity ◽

Detection Method ◽

Nutritional Assessment ◽

Empirical Method ◽

Child Growth ◽

Detection Methods ◽

Healthy Children ◽

Growth Data ◽

Growth Standards

Abstract Objectives Eliminating anthropometry measurement error and employing outlier and biological implausible values (BIV) detection methods adapted to longitudinal measurements is important for the study of growth. This work aimed to review and assess the accuracy of the available BIV and outlier detection methods and propose a growth trajectory outlier detection method. Methods We included 2354 infants from the Applied Research Group for Kids (TARGet Kids! ) cohort-based in Toronto (ON, Canada) that recruits healthy children from birth to 5 years of age. We considered infants with at least 8 length and weight measurements available between the 1st and the 24th month of age. Weight-for-length z-scores (wflz) were calculated using the WHO growth standards. Outlier measurements were randomly introduced in 5% of the wflz measurements using a normal distribution (μ = 0, σ = 1). We employed 4 outlier detection methods; an empirical detection method for BIV using the cut-offs derived from the WHO Child Growth Standards, a clustering method, a method based on cluster prototypes for individual outlier measurements and a method based on cluster prototypes for entire growth trajectories. Each method was applied individually and evaluated using the sensitivity and specificity indexes based on the manually introduced outliers. We also calculated the Kappa statistic to evaluate the agreement of each method against the manual outliers. Results After excluding premature (<37 weeks), low birth weight (<1500 g) neonates and children with missing length and weight measurements, we analyzed 393 children with a total of 3144 measurements. Sensitivity and specificity for the four methods ranged between 4.4%–55.0% and 83.7% −99.7%, respectively, with kappa being non-significant (P > 0.05) only for the empirical. The clustering detection method reported a higher finding rate, while the empirical method found most of the BIV, but few of the rest of the outliers. Conclusions BIV account for a small portion of the possible outliers in growth datasets. We show that additional statistical or model-based methods are required for a more comprehensive outlier detection process, which has implications for growth analysis and nutritional assessment. Funding Sources Joannah and Brian Lawson Center for Child Nutrition, Connaught Fund, Onassis Foundation.

Download Full-text