Partition-Aware Scalable Outlier Detection Using Unsupervised Learning

Author(s):  
Pallabi Parveen ◽  
Melissa Lee ◽  
Austin Henslee ◽  
Matt Dugan ◽  
Brad Ford
2019 ◽  
Vol 17 (2) ◽  
pp. 272-280
Author(s):  
Adeel Hashmi ◽  
Tanvir Ahmad

Anomaly (outlier) detection is the process of finding abnormal data points in a dataset or data stream. Most anomaly detection algorithms require parameters to be set, and these significantly affect the algorithm's performance. The parameters are generally chosen by trial and error, so performance suffers when default or random values are used. In this paper, the authors propose a self-optimizing algorithm for anomaly detection based on the firefly meta-heuristic, named Firefly Algorithm for Anomaly Detection (FAAD). The proposed solution is a non-clustering, unsupervised learning approach to anomaly detection. The algorithm is implemented on Apache Spark for scalability, so the solution can also handle big data. Experiments were conducted on various datasets, and the results show that the proposed solution is considerably more accurate than standard anomaly detection algorithms.
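As a rough illustration of how a firefly meta-heuristic can tune a detector parameter, here is a minimal sketch of the generic firefly update: dimmer fireflies move toward brighter ones, with attraction that decays with distance and a shrinking random step. It is not FAAD itself. The abstract does not say which parameters FAAD tunes or how it scores them, so `validation_error`, its optimum at 7.0, the search range, and all hyperparameter values below are assumptions made only for the example.

```python
import math
import random


def firefly_minimize(objective, low, high, n_fireflies=15, n_iter=100,
                     beta0=1.0, gamma=0.01, alpha=0.5, seed=0):
    """Minimize a 1-D objective with a basic firefly search.

    A lower cost is treated as a brighter firefly. Returns the best
    position found, its cost, and the best-so-far cost per iteration.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_fireflies)]
    costs = [objective(x) for x in xs]

    best_i = min(range(n_fireflies), key=costs.__getitem__)
    best_x, best_cost = xs[best_i], costs[best_i]
    history = [best_cost]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is brighter than i
                    r = abs(xs[i] - xs[j])
                    # Attractiveness decays with squared distance.
                    beta = beta0 * math.exp(-gamma * r * r)
                    step = alpha * (rng.random() - 0.5)
                    moved = xs[i] + beta * (xs[j] - xs[i]) + step
                    xs[i] = min(high, max(low, moved))  # stay in range
                    costs[i] = objective(xs[i])
                    if costs[i] < best_cost:
                        best_x, best_cost = xs[i], costs[i]
        alpha *= 0.97  # shrink the random step over time
        history.append(best_cost)

    return best_x, best_cost, history


def validation_error(k):
    """Hypothetical stand-in for a detector's error at parameter value k."""
    return (k - 7.0) ** 2 + 1.0


best_k, best_err, history = firefly_minimize(validation_error, 0.0, 20.0)
print(best_k, best_err)
```

In a real setting the objective would be whatever score the detector produces for a candidate parameter value. The sketch keeps the best position seen so far, so the recorded best cost never gets worse from one iteration to the next.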


2021 ◽  
Vol 33 (6) ◽  
pp. 265-274
Author(s):  
Hyeon-Jae Kim ◽  
Dong-Hoon Kim ◽  
Chaewook Lim ◽  
Youngtak Shin ◽  
Sang-Chul Lee ◽  
...  

Outlier detection in ocean data has traditionally relied on statistical and distance-based machine learning algorithms. Recently, AI-based methods have received considerable attention, and supervised learning methods, which require classification information for the data, are mainly used. Supervised learning is time-consuming and costly because classification information (labels) must be manually assigned to all of the training data. In this study, an autoencoder based on unsupervised learning was applied as an outlier detection method to overcome this problem. Two experiments were designed: univariate learning, in which only SST data from the Deokjeok Island observations was used, and multivariate learning, in which SST, air temperature, wind direction, wind speed, air pressure, and humidity were used. The data cover 25 years, from 1996 to 2020, and pre-processing that considers the characteristics of ocean data was applied. Outliers in actual SST data were then detected using the trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. In a quantitative evaluation of these methods, the multivariate and univariate accuracies were about 96% and 91%, respectively, indicating that the multivariate autoencoder had better outlier detection performance. Outlier detection using an unsupervised learning-based autoencoder is expected to be widely applicable because it can reduce subjective classification errors as well as the cost and time required for data labeling.
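The idea behind autoencoder-based outlier detection is to flag points the model reconstructs poorly. A minimal sketch follows, with a linear, tied-weight autoencoder with a one-dimensional bottleneck standing in for the study's network, whose architecture is not described here. The two-variable synthetic data, the inserted error, the learning rate, and the mean-plus-three-standard-deviations threshold are all assumptions chosen for illustration.

```python
import random
import statistics


def train_linear_autoencoder(points, lr=0.05, n_iter=500):
    """Fit tied weights w by batch gradient descent.

    Encoder: z = w . x (one hidden unit). Decoder: x_hat = z * w.
    """
    w = [0.5, 0.5]
    n = len(points)
    for _ in range(n_iter):
        g = [0.0, 0.0]
        for x in points:
            z = w[0] * x[0] + w[1] * x[1]
            e = [x[0] - z * w[0], x[1] - z * w[1]]  # reconstruction residual
            ew = e[0] * w[0] + e[1] * w[1]
            # Half the negative gradient of the squared error w.r.t. w.
            g[0] += ew * x[0] + z * e[0]
            g[1] += ew * x[1] + z * e[1]
        w = [w[0] + lr * 2.0 * g[0] / n, w[1] + lr * 2.0 * g[1] / n]
    return w


def reconstruction_errors(points, w):
    """Squared reconstruction error of each point."""
    errors = []
    for x in points:
        z = w[0] * x[0] + w[1] * x[1]
        errors.append((x[0] - z * w[0]) ** 2 + (x[1] - z * w[1]) ** 2)
    return errors


def detect_outliers(points):
    """Return indices whose reconstruction error exceeds mean + 3 std."""
    mean_x = statistics.fmean(p[0] for p in points)
    mean_y = statistics.fmean(p[1] for p in points)
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    w = train_linear_autoencoder(centered)
    errors = reconstruction_errors(centered, w)
    threshold = statistics.fmean(errors) + 3.0 * statistics.pstdev(errors)
    return [i for i, err in enumerate(errors) if err > threshold]


# Synthetic two-variable data: the second variable tracks the first.
rng = random.Random(0)
data = []
for i in range(50):
    a = -1.0 + 2.0 * i / 49
    data.append((a, 2.0 * a + rng.gauss(0.0, 0.05)))
data.append((0.5, -1.5))  # artificially inserted error (hypothetical)

print(detect_outliers(data))
```

The inserted point breaks the relationship between the two variables, so the bottleneck cannot reconstruct it well. A single-variable model has no such cross-variable structure to exploit, which may be one reason the multivariate model scored higher in the study.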


2012 ◽  
Vol 2 (3) ◽  
pp. 98-101 ◽  
Author(s):  
E.Sateesh ◽  
M.L.Prasanthi
