Multi-Level Clustering-Based Outlier’s Detection (MCOD) Using Self-Organizing Maps

Menglu Li; Rasha Kashef; Ahmed Ibrahim

doi:10.3390/bdcc4040024

Big Data and Cognitive Computing ◽

10.3390/bdcc4040024 ◽

2020 ◽

Vol 4 (4) ◽

pp. 24

Author(s):

Menglu Li ◽

Rasha Kashef ◽

Ahmed Ibrahim

Keyword(s):

Outlier Detection ◽

Detection Method ◽

Detection Algorithm ◽

Detection Methods ◽

Self Organizing Maps ◽

Business Outcomes ◽

Online Transactions ◽

Business Applications ◽

Efficient Detection ◽

Multi Level

Outlier detection is critical in many business applications, as it recognizes unusual behaviours to prevent losses and optimize revenue. For example, illegitimate online transactions can be detected from their patterns using outlier detection. The performance of existing outlier detection methods is limited by the pattern/behaviour of the dataset; these methods may not perform well without prior knowledge of the dataset. This paper proposes a multi-level outlier detection algorithm (MCOD) that uses multi-level unsupervised learning to cluster the data and discover outliers. The proposed detection method is tested on datasets from different fields with different sizes and dimensions. Experimental analysis shows that the proposed MCOD algorithm improves the outlier detection rate compared with traditional anomaly detection methods. Enterprises and organizations can adopt the proposed MCOD algorithm to ensure sustainable and efficient detection of frauds/outliers, increasing profitability and/or enhancing business outcomes.

A multiple pattern complex event detection scheme based on decomposition and merge sharing for massive event streams

International Journal of Distributed Sensor Networks ◽

10.1177/1550147720961336 ◽

2020 ◽

Vol 16 (10) ◽

pp. 155014772096133

Author(s):

Jianhua Wang ◽

Bang Ji ◽

Feng Lin ◽

Shilei Lu ◽

Yubin Lan ◽

...

Keyword(s):

Event Detection ◽

Detection Method ◽

Detection Algorithm ◽

Detection Methods ◽

Detection Scheme ◽

Event Stream ◽

Detection Model ◽

Efficient Detection ◽

High Efficient ◽

Complex Events

Quickly detecting the related primitive events for multiple complex events in a massive event stream is a great challenge, because existing complex event detection methods handle only a single pattern. To solve this problem, this article proposes a multiple pattern complex event detection scheme based on decomposition and merge sharing. The contribution of this article is the use of decomposition and merge sharing to achieve highly efficient detection of multiple complex events in massive event streams. Specifically, the scheme first uses decomposition sharing to decompose pattern expressions into multiple subexpressions, which provides many sharing opportunities among subexpressions. It then uses merge sharing to construct a multiple pattern complex event by merging every identical prefix, suffix, or subpattern into one, based on the decomposition results. As a result, the proposed detection method effectively solves the above problem. The experimental results show that, taken as a whole, the proposed method outperforms some general detection methods in both detection model and detection algorithm for multiple pattern complex event detection.
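The prefix-sharing idea mentioned in this abstract can be illustrated with a small trie. This is only a sketch under assumptions, not the authors' scheme: patterns are taken to be plain event sequences, the names `build_prefix_trie` and `count_nodes` and the sample patterns are hypothetical, and suffix and subpattern sharing are not covered.

```python
def build_prefix_trie(patterns):
    """Merge sequence patterns into a trie so that a shared prefix is stored once.

    `patterns` maps a pattern name to its list of primitive event types.
    The key "$match" is reserved here for the names of patterns ending at a node.
    """
    root = {}
    for name, events in patterns.items():
        node = root
        for event in events:
            node = node.setdefault(event, {})
        node.setdefault("$match", []).append(name)
    return root


def count_nodes(trie):
    """Count the event nodes in the trie, ignoring the "$match" markers."""
    return sum(1 + count_nodes(child)
               for key, child in trie.items() if key != "$match")


# Hypothetical patterns: all three start with event A, two share the prefix A, B.
patterns = {"P1": ["A", "B", "C"], "P2": ["A", "B", "D"], "P3": ["A", "E"]}
trie = build_prefix_trie(patterns)
shared = count_nodes(trie)                          # steps after merging prefixes
unshared = sum(len(p) for p in patterns.values())   # steps if each pattern is separate
print(shared, unshared)
```

With merged prefixes, the events A and B are matched once for all patterns that begin with them, rather than once per pattern.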

Data Streams Oriented Outlier Detection Method: A Fast Minimal Infrequent Pattern Mining

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/6/14 ◽

2021 ◽

Author(s):

ZhongYu Zhou ◽

DeChang Pi

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Pattern Mining ◽

Detection Method ◽

Detection Algorithm ◽

Detection Methods ◽

Mining Method ◽

Telemetry Data ◽

Process Data ◽

Mining Data Streams

Outlier detection is a common method for analyzing data streams. Most existing outlier detection methods compute distances between points to solve specific outlier detection problems. However, these methods are computationally expensive and cannot process data streams quickly. Outlier detection methods based on pattern mining resolve these issues, but the existing ones are inefficient and cannot meet the requirement of mining data streams quickly. To improve efficiency, this paper proposes a new outlier detection method. First, a fast minimal infrequent pattern mining method is proposed to mine minimal infrequent patterns from data streams. Second, an efficient outlier detection algorithm based on minimal infrequent patterns is proposed to detect outliers in the data streams. The proposed algorithm is demonstrated on real telemetry data from a satellite in orbit. The experimental results show that the proposed method not only can be applied to satellite outlier detection but also is superior to the existing methods.
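To make the term concrete: an itemset is infrequent when its support falls below a threshold, and minimal infrequent when every proper subset is still frequent. The sketch below enumerates such itemsets by brute force on a static list of transactions. It is not the fast mining method the paper proposes; the function name, the sample transactions, and the threshold are assumptions for illustration.

```python
from itertools import combinations


def minimal_infrequent_itemsets(transactions, min_support):
    """Return itemsets with support < min_support whose proper subsets are all frequent.

    Brute-force, level-wise enumeration; exponential in the number of items.
    """
    items = sorted({item for t in transactions for item in t})
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    minimal = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A superset of a known minimal infrequent itemset cannot be minimal.
            if any(m <= candidate for m in minimal):
                continue
            if support(candidate) < min_support:
                minimal.append(candidate)
    return minimal


# Hypothetical transactions; "d" is rare, and "c" rarely co-occurs with "a" or "b".
transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
result = minimal_infrequent_itemsets(transactions, min_support=2)
print(result)
```

Records that contain one or more of these itemsets are the natural outlier candidates, though how the paper scores them is not stated in the abstract.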

Outlier Detection in Growth Data: Beyond Biologically Implausible Values

Current Developments in Nutrition ◽

10.1093/cdn/nzaa056_021 ◽

2020 ◽

Vol 4 (Supplement_2) ◽

pp. 1174-1174

Author(s):

Paraskevi Massara ◽

Robert Bandsma ◽

Celine Bourdon ◽

Jonathon Maguire ◽

Elena Comelli ◽

...

Keyword(s):

Outlier Detection ◽

Sensitivity And Specificity ◽

Detection Method ◽

Nutritional Assessment ◽

Empirical Method ◽

Child Growth ◽

Detection Methods ◽

Healthy Children ◽

Growth Data ◽

Growth Standards

Objectives: Eliminating anthropometric measurement error and employing outlier and biologically implausible value (BIV) detection methods adapted to longitudinal measurements are important for the study of growth. This work aimed to review and assess the accuracy of the available BIV and outlier detection methods and to propose a growth trajectory outlier detection method. Methods: We included 2354 infants from the Applied Research Group for Kids (TARGet Kids!) cohort, based in Toronto (ON, Canada), which recruits healthy children from birth to 5 years of age. We considered infants with at least 8 length and weight measurements available between the 1st and the 24th month of age. Weight-for-length z-scores (wflz) were calculated using the WHO growth standards. Outlier measurements were randomly introduced in 5% of the wflz measurements using a normal distribution (μ = 0, σ = 1). We employed 4 outlier detection methods: an empirical detection method for BIV using the cut-offs derived from the WHO Child Growth Standards, a clustering method, a method based on cluster prototypes for individual outlier measurements, and a method based on cluster prototypes for entire growth trajectories. Each method was applied individually and evaluated using the sensitivity and specificity indexes based on the manually introduced outliers. We also calculated the Kappa statistic to evaluate the agreement of each method against the manual outliers. Results: After excluding premature (<37 weeks) and low birth weight (<1500 g) neonates and children with missing length and weight measurements, we analyzed 393 children with a total of 3144 measurements. Sensitivity and specificity for the four methods ranged between 4.4%–55.0% and 83.7%–99.7%, respectively, with kappa being non-significant (P > 0.05) only for the empirical method. The clustering detection method reported a higher finding rate, while the empirical method found most of the BIV but few of the remaining outliers. Conclusions: BIV account for a small portion of the possible outliers in growth datasets. We show that additional statistical or model-based methods are required for a more comprehensive outlier detection process, which has implications for growth analysis and nutritional assessment. Funding Sources: Joannah and Brian Lawson Center for Child Nutrition, Connaught Fund, Onassis Foundation.
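The evaluation described in this abstract compares each method's flags against the manually introduced outliers using sensitivity, specificity, and the Kappa statistic. A minimal sketch of those three quantities follows, using the standard definitions (Cohen's kappa for two binary raters). The function name and the toy labels are assumptions, and the code assumes both outliers and non-outliers are present so that no denominator is zero.

```python
def evaluate_detection(truth, flagged):
    """Return (sensitivity, specificity, kappa) for binary outlier labels.

    `truth[i]` is 1 if measurement i is a known (injected) outlier,
    `flagged[i]` is 1 if the detection method marked it as an outlier.
    """
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)
    tn = sum(1 for t, f in pairs if not t and not f)
    fp = sum(1 for t, f in pairs if not t and f)
    fn = sum(1 for t, f in pairs if t and not f)
    n = tp + tn + fp + fn

    sensitivity = tp / (tp + fn)   # share of true outliers that were found
    specificity = tn / (tn + fp)   # share of normal points left unflagged

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Toy labels, not data from the study.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
flagged = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
sensitivity, specificity, kappa = evaluate_detection(truth, flagged)
print(sensitivity, specificity, kappa)
```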

A Parameter-Free Outlier Detection Algorithm Based on Dataset Optimization Method

Information ◽

10.3390/info11010026 ◽

2019 ◽

Vol 11 (1) ◽

pp. 26

Author(s):

Liying Wang ◽

Lei Shi ◽

Liancheng Xu ◽

Peiyu Liu ◽

Lindong Zhang ◽

...

Keyword(s):

Outlier Detection ◽

Detection Method ◽

Computational Cost ◽

Optimization Method ◽

Detection Algorithm ◽

Threshold Function ◽

Detection Algorithms ◽

Original Dataset ◽

Data Clusters ◽

High Computational Cost

Outlier detection has recently found widespread application in different areas. The task is to identify outliers in a dataset and extract potential information. Existing outlier detection algorithms largely fail to address parameter selection and high computational cost, which leaves room for further improvement. To solve these problems, this paper proposes a parameter-free outlier detection algorithm based on a dataset optimization method. First, we propose a dataset optimization method (DOM), which initializes the original dataset in which density is greater than a specific threshold. In this method, we propose the concepts of a partition function (P) and a threshold function (T). Second, we establish a parameter-free outlier detection method. Similarly, we propose the concept of the number of residual neighbors; the number of residual neighbors and the size of data clusters are used as the basis of outlier detection to obtain a more accurate outlier set. Finally, extensive experiments on a variety of datasets show that our method performs well in terms of outlier detection efficiency and time complexity.

A Review on Outliers-Detection Methods for Multivariate Data

Journal of Statistical Modelling and Analytics ◽

10.22452/josma.vol3no1.1 ◽

2021 ◽

Vol 3 (1) ◽

pp. 1-15

Author(s):

Sharifah Sakinah Syed Abd Mutalib ◽

Siti Zanariah Satari ◽

Wan Nur Syahidah Wan Yusoff

Keyword(s):

Cluster Analysis ◽

Multivariate Analysis ◽

Outlier Detection ◽

High Dimension ◽

Detection Method ◽

Multivariate Data ◽

Projection Pursuit ◽

Detection Methods ◽

And Cluster Analysis ◽

Detection Of Outliers

Data in practice are often high-dimensional and multivariate in nature. Detecting outliers has long been one of the problems in multivariate analysis. Detecting outliers in multivariate data is difficult, and graphical inspection alone is not sufficient. This paper gives a brief, nontechnical review of outlier detection methods for multivariate data: the projection pursuit method, methods based on robust distance, and cluster analysis. The strengths and weaknesses of each method are briefly discussed.

Outlier Detection Method for Flash Flood Disaster Monitoring Data based on Information Entropy

Journal of Physics Conference Series ◽

10.1088/1742-6596/2138/1/012013 ◽

2021 ◽

Vol 2138 (1) ◽

pp. 012013

Author(s):

Yongzhi Chen ◽

Ziao Xu ◽

Chaoqun Niu

Keyword(s):

Outlier Detection ◽

Information Entropy ◽

Detection Method ◽

Flash Flood ◽

False Positive Rate ◽

Flood Disaster ◽

Detection Methods ◽

Positive Rate ◽

Disaster Monitoring ◽

Local Outlier

In research on flash flood disaster monitoring and early warning, the Internet of Things is widely used for real-time information collection. The large amount of data collected by sensors contains abnormal situations such as noise, repetition, and errors, which lead to false alarms, lower prediction accuracy, and other problems. Exploiting the fact that outliers in a sensor data flow cause obvious fluctuations in information entropy, this paper proposes a local outlier detection method based on information entropy and optimized with a sliding window and LOF (Local Outlier Factor). This method can be used to improve data quality and thus the accuracy of disaster prediction. The method is applied to data stream processing for water sensors, and the experimental results show that it can accurately detect outliers. Compared with existing detection methods that rely only on data distance, the test positive rate is improved and the false positive rate is reduced.
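The observation that outliers make information entropy fluctuate can be shown with Shannon entropy computed over a sliding window. The sketch below covers only that part, under assumptions: readings are discretized into fixed-width bins, the window and bin sizes and sample readings are made up, and the LOF step of the paper's method is not included.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_entropies(stream, window, bin_width):
    """Entropy of each sliding window after binning readings to `bin_width`."""
    binned = [math.floor(x / bin_width) for x in stream]
    return [shannon_entropy(binned[i:i + window])
            for i in range(len(binned) - window + 1)]


# Hypothetical sensor readings with one abnormal spike (9.5).
stream = [1.0, 1.1, 1.0, 1.2, 1.1, 9.5, 1.0, 1.1]
entropies = window_entropies(stream, window=4, bin_width=1.0)
print(entropies)
```

Windows that contain the spike have clearly higher entropy than windows of steady readings, which is the signal a detector of this kind would threshold or pass on to a second-stage check.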

Establishment and Application of Reverse Transcription Recombinase-Aided Amplification as A Rapid Detection Method for SARS-COV-2

10.21203/rs.3.rs-92818/v1 ◽

2020 ◽

Author(s):

Lele Ai ◽

Wei Liu ◽

Fuqiang Ye ◽

Chenxi Ding ◽

Han Dai ◽

...

Keyword(s):

Nucleic Acid ◽

Reverse Transcription ◽

Rapid Detection ◽

Detection Method ◽

Detection Methods ◽

Influenza B Virus ◽

Influenza B ◽

The Novel ◽

Efficient Detection ◽

Novel Coronavirus

Background: By the end of August 2020, >23 million cases and 800,000 deaths were attributed to SARS-CoV-2 in >200 countries. The improvement of simple, rapid, and efficient detection methods is of great significance for the early detection, timely isolation, and protection of susceptible populations. This study aimed to provide an alternative method for the rapid detection of viral nucleic acid. Methods: This study provided a rapid nucleic acid detection method mediated by recombinant enzyme based on the novel coronavirus (SARS-CoV-2). Primers and probes were designed based on the N gene sequence of coronavirus. The method was performed at 39 °C, the detection time was short (<20 min), and the detection limit was up to 101 copies/mL. Results: The primer-probe did not show any cross-reaction with adenovirus, Zika virus, influenza B virus, and chikungunya virus, with good specificity. A total of 106 clinical throat swab samples were compared by reverse transcription recombinase-aided amplification (RT-RAA) and commercial reverse transcription-quantitative real-time polymerase chain reaction (RT-qPCR); the results were identical. Conclusions: The novel coronavirus RT-RAA method established in this study had high sensitivity, strong specificity, simple operation, and fast detection speed, and hence, is suitable for the rapid detection of novel coronavirus under the current epidemic situation.

Anomaly Pattern Detection in Streaming Data Based on the Transformation to Multiple Binary-Valued Data Streams

Journal of Artificial Intelligence and Soft Computing Research ◽

10.2478/jaiscr-2022-0002 ◽

2021 ◽

Vol 12 (1) ◽

pp. 19-27

Author(s):

Taegong Kim ◽

Cheong Hee Park

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Data Stream ◽

Detection Method ◽

Binary Classification ◽

Streaming Data ◽

Pattern Detection ◽

Detection Methods ◽

Anomaly Pattern ◽

Isolation Forest

Anomaly pattern detection in a data stream aims to detect the time point at which outliers begin to occur abnormally. Recently, a method for anomaly pattern detection has been proposed based on binary classification of outliers and statistical tests on the data stream of binary labels (normal or outlier). It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, since that anomaly pattern detection method is based on binary classification of outliers, most well-known outlier detection methods, which output real-valued outlier scores, cannot be used directly. In this paper, we propose an anomaly pattern detection method for data streams that transforms real-valued outlier scores into multiple binary-valued data streams. The proposed method is tested on artificial and real data sets using three outlier detection methods: Isolation Forest (IF), Autoencoder-based outlier detection, and Local Outlier Factor (LOF). The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.
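One simple way to picture the transformation described here is to apply several cut-offs to the same stream of real-valued outlier scores, producing one 0/1 stream per cut-off. This is an assumed reading for illustration only: the abstract does not say how the paper chooses its thresholds or combines the streams, and the function name, scores, and thresholds below are hypothetical.

```python
def to_binary_streams(scores, thresholds):
    """Map real-valued outlier scores to one binary stream per threshold.

    A point is labeled 1 (outlier) in a stream when its score exceeds
    that stream's threshold, and 0 (normal) otherwise.
    """
    return {th: [1 if s > th else 0 for s in scores] for th in thresholds}


# Hypothetical outlier scores, e.g. from any detector that outputs real values.
scores = [0.1, 0.4, 0.9, 0.35, 0.8]
streams = to_binary_streams(scores, thresholds=[0.3, 0.7])
print(streams)
```

Each resulting binary stream could then be monitored with a statistical test on its rate of 1s, in the spirit of the earlier binary-label method the abstract refers to.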

A Framework to Test Resistency of Detection Algorithms for Stepping-Stone Intrusion on Time-Jittering Manipulation

Wireless Communications and Mobile Computing ◽

10.1155/2021/1807509 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Lixin Wang ◽

Jianhua Yang ◽

Michael Workman ◽

Peng-Jun Wan

Keyword(s):

Data Stream ◽

Detection Method ◽

Detection Algorithm ◽

Efficient Algorithms ◽

Detection Methods ◽

The Internet ◽

Stepping Stones ◽

Stepping Stone ◽

Detection Algorithms ◽

A Chain

Hackers on the Internet usually send attack packets through compromised hosts, called stepping-stones, in order to avoid being detected and caught. In a stepping-stone attack, an intruder remotely logs in to these stepping-stones using programs such as SSH or telnet, uses a chain of Internet hosts as relay machines, and then sends the attack packets. A great number of detection approaches for stepping-stone intrusion (SSI) have been developed in the literature. Many of these existing detection methods work effectively only when session manipulation by intruders is not present. When the session is manipulated by attackers, there are few known effective detection methods for SSI. It is important to know whether a detection algorithm for SSI is resistant to session manipulation by attackers. For session manipulation with chaff perturbation, software tools such as Scapy can be used to inject meaningless packets into a data stream. However, to the best of our knowledge, there are no existing effective tools or efficient algorithms to produce time-jittered network traffic that can be used to test whether an SSI detection method is resistant to intruders’ time-jittering manipulation. In this paper, we propose a framework to test the resistance of detection algorithms for SSI to time-jittering manipulation. Our proposed framework can be used to test whether an existing or new SSI detection method is resistant to session manipulation by intruders with time-jittering.
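Time-jittering, in general terms, means delaying packets by small random amounts so that timing-based detectors see altered inter-packet gaps. The sketch below applies bounded random delays to a sorted list of timestamps while preserving packet order. It is not the framework proposed in the paper; the function name, the uniform delay model, and the sample values are assumptions.

```python
import random


def jitter_timestamps(timestamps, max_delay, seed=None):
    """Delay each packet by a random amount in [0, max_delay], keeping order.

    `timestamps` must be sorted in ascending order (seconds). A packet is
    never released before the packet ahead of it, so ordering is preserved.
    """
    rng = random.Random(seed)
    jittered = []
    previous = float("-inf")
    for t in timestamps:
        delayed = t + rng.uniform(0.0, max_delay)
        delayed = max(delayed, previous)  # do not overtake the previous packet
        jittered.append(delayed)
        previous = delayed
    return jittered


# Hypothetical send times of five packets in one session.
timestamps = [0.00, 0.01, 0.02, 0.50, 0.51]
jittered = jitter_timestamps(timestamps, max_delay=0.05, seed=42)
print(jittered)
```

Feeding both the original and the jittered timestamps to a detection method and comparing its output is one way to probe how sensitive the method is to this kind of manipulation.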

Detection of Outliers in a Time Series of Available Parking Spaces

Mathematical Problems in Engineering ◽

10.1155/2013/416267 ◽

2013 ◽

Vol 2013 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Yanjie Ji ◽

Dounan Tang ◽

Weihong Guo ◽

Phil T. Blythe ◽

Gang Ren

Keyword(s):

Outlier Detection ◽

Detection Method ◽

Experimental Tests ◽

Detection Algorithm ◽

Phase Detection ◽

Real World Data ◽

Two Phase ◽

Real Time Information ◽

The City ◽

Parking Facilities

With the provision of any source of real-time information, the timeliness and accuracy of the data provided are paramount to the effectiveness and success of the system and its acceptance by the users. In order to improve the accuracy and reliability of parking guidance systems (PGSs), the technique of outlier mining has been introduced for detecting and analysing outliers in available parking space (APS) datasets. To distinguish outlier features from the APS’s overall periodic tendency, and to simultaneously identify the two types of outliers which naturally exist in APS datasets with intrinsically distinct statistical features, a two-phase detection method is proposed, which incorporates an improved density-based detection algorithm named “local entropy based weighted outlier detection” (EWOD). Real-world data from parking facilities in the City of Newcastle upon Tyne were used to test the hypothesis. Thereafter, experimental tests were carried out for a comparative study in which the outlier detection performances of the two-phase detection method, a statistics-based method, and a traditional density-based method were compared and contrasted. The results showed that the proposed method can identify two different kinds of outliers simultaneously, with identification accuracies of 100% and 92.7% for the first and second types of outliers, respectively.
