Multi-Level Clustering-Based Outlier’s Detection (MCOD) Using Self-Organizing Maps

2020 ◽  
Vol 4 (4) ◽  
pp. 24
Author(s):  
Menglu Li ◽  
Rasha Kashef ◽  
Ahmed Ibrahim

Outlier detection is critical in many business applications, as it recognizes unusual behaviours to prevent losses and optimize revenue. For example, illegitimate online transactions can be detected from their patterns using outlier detection. The performance of existing outlier detection methods is limited by the pattern/behaviour of the dataset; these methods may not perform well without prior knowledge of the dataset. This paper proposes a multi-level outlier detection algorithm (MCOD) that uses multi-level unsupervised learning to cluster the data and discover outliers. The proposed detection method is tested on datasets from different fields with different sizes and dimensions. Experimental analysis shows that the proposed MCOD algorithm can improve the outlier detection rate compared to traditional anomaly detection methods. Enterprises and organizations can adopt the proposed MCOD algorithm to ensure sustainable and efficient detection of fraud/outliers, increasing profitability and/or enhancing business outcomes.

2020 ◽  
Vol 16 (10) ◽  
pp. 155014772096133
Author(s):  
Jianhua Wang ◽  
Bang Ji ◽  
Feng Lin ◽  
Shilei Lu ◽  
Yubin Lan ◽  
...  

Quickly detecting the related primitive events for multiple complex events in a massive event stream is a great challenge, because existing complex event detection methods are designed for a single pattern. To solve this problem, this article proposes a multiple-pattern complex event detection scheme based on decomposition and merge sharing. The contribution of this article is the use of decomposition and merge sharing to detect multiple complex events in massive event streams efficiently. Specifically, the scheme first uses decomposition sharing to decompose pattern expressions into multiple subexpressions, which creates many sharing opportunities among subexpressions. It then uses merge sharing to construct a multiple-pattern complex event by merging every identical prefix, suffix, or subpattern into one, based on the decomposition results. As a result, the proposed detection method can effectively solve the above problem. The experimental results show that, for multiple-pattern complex event detection as a whole, the proposed method outperforms some general detection methods in both detection model and detection algorithm.
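The prefix-merging idea described above can be illustrated with a small prefix tree. This is only a sketch under assumptions: patterns are treated as plain sequences of event-type names, and `build_prefix_tree`, `count_nodes`, and the `"$patterns"` marker key are hypothetical names chosen for illustration, not the authors' scheme, which also shares suffixes and subpatterns.

```python
def build_prefix_tree(patterns):
    """Merge patterns that start with the same event types into shared tree nodes."""
    root = {}
    for name, sequence in patterns.items():
        node = root
        for event_type in sequence:
            # Reuse the existing child if another pattern already created it.
            node = node.setdefault(event_type, {})
        # Record which pattern(s) complete at this node.
        node.setdefault("$patterns", []).append(name)
    return root


def count_nodes(tree):
    """Count event-type nodes, ignoring the '$patterns' marker entries."""
    return sum(
        1 + count_nodes(child)
        for key, child in tree.items()
        if key != "$patterns"
    )


patterns = {"P1": ["A", "B", "C"], "P2": ["A", "B", "D"]}
tree = build_prefix_tree(patterns)
print(count_nodes(tree))  # shared prefix A, B is stored once
```

Evaluated separately, the two patterns contain six steps in total; after merging, the shared prefix is represented once, so a detector walking the tree would match it a single time for both patterns.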


Author(s):  
ZhongYu Zhou ◽  
DeChang Pi

Outlier detection is a common method for analyzing data streams. Most existing outlier detection methods compute distances between points to solve specific outlier detection problems. However, these methods are computationally expensive and cannot process data streams quickly. Outlier detection based on pattern mining resolves these issues, but the existing pattern-mining methods are inefficient and cannot meet the requirement of mining data streams quickly. To improve efficiency, a new outlier detection method is proposed in this paper. First, a fast minimal infrequent pattern mining method is proposed to mine minimal infrequent patterns from data streams. Second, an efficient outlier detection algorithm based on minimal infrequent patterns is proposed for detecting outliers in the data streams. The proposed algorithm is demonstrated on real telemetry data from a satellite in orbit. The experimental results show that the proposed method not only can be applied to satellite outlier detection but also is superior to the existing methods.
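To make the notion of a minimal infrequent pattern concrete: an itemset is infrequent when its support falls below a threshold, and minimal when every proper subset is still frequent. The brute-force sketch below only illustrates that definition. It is not the fast mining method the paper proposes, and the function names and the `max_size` cap are assumptions made for the example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def minimal_infrequent_itemsets(transactions, min_support, max_size=3):
    """Enumerate itemsets by increasing size and keep the minimal infrequent ones."""
    items = sorted(set().union(*transactions))
    minimal = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A candidate containing a known infrequent itemset is not minimal.
            if any(m <= candidate for m in minimal):
                continue
            if support(candidate, transactions) < min_support:
                minimal.append(candidate)
    return minimal


transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
for itemset in minimal_infrequent_itemsets(transactions, min_support=2):
    print(sorted(itemset))
```

Because candidates are visited in order of increasing size, any infrequent proper subset has already been recorded by the time a larger candidate is examined, which is what makes the subset check sufficient.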


2020 ◽  
Vol 4 (Supplement_2) ◽  
pp. 1174-1174
Author(s):  
Paraskevi Massara ◽  
Robert Bandsma ◽  
Celine Bourdon ◽  
Jonathon Maguire ◽  
Elena Comelli ◽  
...  

Abstract Objectives: Eliminating anthropometry measurement error and employing outlier and biologically implausible value (BIV) detection methods adapted to longitudinal measurements are important for the study of growth. This work aimed to review and assess the accuracy of the available BIV and outlier detection methods and to propose a growth trajectory outlier detection method. Methods: We included 2354 infants from the Applied Research Group for Kids (TARGet Kids!) cohort, based in Toronto (ON, Canada), which recruits healthy children from birth to 5 years of age. We considered infants with at least 8 length and weight measurements available between the 1st and the 24th month of age. Weight-for-length z-scores (wflz) were calculated using the WHO growth standards. Outlier measurements were randomly introduced in 5% of the wflz measurements using a normal distribution (μ = 0, σ = 1). We employed 4 outlier detection methods: an empirical detection method for BIV using the cut-offs derived from the WHO Child Growth Standards, a clustering method, a method based on cluster prototypes for individual outlier measurements, and a method based on cluster prototypes for entire growth trajectories. Each method was applied individually and evaluated using the sensitivity and specificity indexes based on the manually introduced outliers. We also calculated the Kappa statistic to evaluate the agreement of each method against the manual outliers. Results: After excluding premature (<37 weeks), low birth weight (<1500 g) neonates and children with missing length and weight measurements, we analyzed 393 children with a total of 3144 measurements. Sensitivity and specificity for the four methods ranged between 4.4%–55.0% and 83.7%–99.7%, respectively, with kappa being non-significant (P > 0.05) only for the empirical method. The clustering detection method reported a higher finding rate, while the empirical method found most of the BIV, but few of the rest of the outliers.
Conclusions: BIV account for a small portion of the possible outliers in growth datasets. We show that additional statistical or model-based methods are required for a more comprehensive outlier detection process, which has implications for growth analysis and nutritional assessment. Funding Sources: Joannah and Brian Lawson Center for Child Nutrition, Connaught Fund, Onassis Foundation.
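The evaluation measures named in the Methods (sensitivity, specificity, and the Kappa statistic against the manually introduced outliers) can be computed from a 2×2 confusion table. The sketch below uses the standard definitions with Cohen's kappa; the function names and the toy labels are illustrative assumptions, not the study's data or code.

```python
def confusion_counts(truth, flagged):
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, f in zip(truth, flagged) if t and f)
    fp = sum(1 for t, f in zip(truth, flagged) if not t and f)
    fn = sum(1 for t, f in zip(truth, flagged) if t and not f)
    tn = sum(1 for t, f in zip(truth, flagged) if not t and not f)
    return tp, fp, fn, tn


def sensitivity(truth, flagged):
    tp, _, fn, _ = confusion_counts(truth, flagged)
    return tp / (tp + fn)


def specificity(truth, flagged):
    _, fp, _, tn = confusion_counts(truth, flagged)
    return tn / (tn + fp)


def cohen_kappa(truth, flagged):
    """Agreement beyond chance between the injected and the detected labels."""
    tp, fp, fn, tn = confusion_counts(truth, flagged)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy example: 1 marks an injected outlier (truth) or a flagged point (flagged).
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
flagged = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(sensitivity(truth, flagged), specificity(truth, flagged), cohen_kappa(truth, flagged))
```

When outliers are rare, as with the 5% injection rate here, specificity can be high even for a method that finds few outliers, which is why kappa is a useful companion measure.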


Information ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 26
Author(s):  
Liying Wang ◽  
Lei Shi ◽  
Liancheng Xu ◽  
Peiyu Liu ◽  
Lindong Zhang ◽  
...  

Outlier detection has recently found widespread application in different areas. The task is to identify outliers in a dataset and extract potential information. Existing outlier detection algorithms largely leave the problems of parameter selection and high computational cost unsolved, which leaves room for further improvement. To address these problems, this paper proposes a parameter-free outlier detection algorithm based on a dataset optimization method. Firstly, we propose a dataset optimization method (DOM), which initializes the original dataset in which density is greater than a specific threshold. In this method, we propose the concepts of a partition function (P) and a threshold function (T). Secondly, we establish a parameter-free outlier detection method. Similarly, we propose the concept of the number of residual neighbors; the number of residual neighbors and the size of data clusters are used as the basis of outlier detection to obtain a more accurate outlier set. Finally, extensive experiments are carried out on a variety of datasets, and the experimental results show that our method performs well in terms of outlier detection efficiency and time complexity.


2021 ◽  
Vol 3 (1) ◽  
pp. 1-15
Author(s):  
Sharifah Sakinah Syed Abd Mutalib ◽  
Siti Zanariah Satari ◽  
Wan Nur Syahidah Wan Yusoff

Data in practice are often high-dimensional and multivariate in nature. Detecting outliers has been one of the problems in multivariate analysis. Detecting outliers in multivariate data is difficult, and graphical inspection alone is not sufficient. This paper gives a brief, nontechnical review of outlier detection methods for multivariate data: the projection pursuit method, methods based on robust distance, and cluster analysis. The strengths and weaknesses of each method are briefly discussed.


2021 ◽  
Vol 2138 (1) ◽  
pp. 012013
Author(s):  
Yongzhi Chen ◽  
Ziao Xu ◽  
Chaoqun Niu

Abstract In the research of flash flood disaster monitoring and early warning, the Internet of Things is widely used for real-time information collection. The large amount of data collected by sensors contains abnormalities such as noise, repetition, and errors, which lead to false alarms, lower prediction accuracy, and other problems. Exploiting the characteristic that outliers in a sensor data stream cause obvious fluctuations in information entropy, this paper proposes a local outlier detection method based on information entropy and optimized by a sliding window and LOF (Local Outlier Factor). This method can be used to improve data quality, thus improving the accuracy of disaster prediction. The method is applied to data stream processing for water sensors, and the experimental results show that it can accurately detect outliers. Compared with existing detection methods that rely only on data distance, the test positive rate is improved and the false positive rate is reduced.
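The observation that outliers make the information entropy of a sensor stream fluctuate can be sketched with a sliding window over binned readings. The code below is a minimal illustration that covers only the entropy step; the LOF refinement is omitted, and the binning scheme, function names, and parameters are assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width):
    """Shannon entropy (in bits) of the values after grouping them into fixed-width bins."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sliding_entropies(stream, window, bin_width):
    """Entropy of each consecutive window of `window` readings."""
    return [
        shannon_entropy(stream[i:i + window], bin_width)
        for i in range(len(stream) - window + 1)
    ]


# A steady water-level reading with one spurious spike at index 10.
stream = [1.0] * 10 + [9.0] + [1.0] * 10
entropies = sliding_entropies(stream, window=5, bin_width=1.0)
suspect_windows = [i for i, h in enumerate(entropies) if h > 0]
print(suspect_windows)
```

Windows over the steady signal have zero entropy, while every window that covers the spike has positive entropy, so the entropy series localizes the region that a finer-grained detector such as LOF would then examine.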


2020 ◽  
Author(s):  
Lele Ai ◽  
Wei Liu ◽  
Fuqiang Ye ◽  
Chenxi Ding ◽  
Han Dai ◽  
...  

Abstract Background: By the end of August 2020, >23 million cases and 800,000 deaths were attributed to SARS-CoV-2 in >200 countries. The improvement of simple, rapid, and efficient detection methods is of great significance for the early detection, timely isolation, and protection of susceptible populations. This study aimed to provide an alternative method for the rapid detection of viral nucleic acid. Methods: This study provided a rapid nucleic acid detection method mediated by recombinant enzyme based on the novel coronavirus (SARS-CoV-2). Primers and probes were designed based on the N gene sequence of coronavirus. The method was performed at 39 °C, the detection time was short (<20 min), and the detection limit was up to 101 copies/mL. Results: The primer-probe did not show any cross-reaction with adenovirus, Zika virus, influenza B virus, and chikungunya virus, with good specificity. A total of 106 clinical throat swab samples were compared by reverse transcription recombinase-aided amplification (RT-RAA) and commercial reverse transcription-quantitative real-time polymerase chain reaction (RT-qPCR); the results were identical. Conclusions: The novel coronavirus RT-RAA method established in this study had high sensitivity, strong specificity, simple operation, and fast detection speed, and hence, is suitable for the rapid detection of novel coronavirus under the current epidemic situation.


Author(s):  
Taegong Kim ◽  
Cheong Hee Park

Abstract Anomaly pattern detection in a data stream aims to detect a time point where outliers begin to occur abnormally. Recently, a method for anomaly pattern detection has been proposed based on binary classification for outliers and statistical tests on the resulting data stream of binary labels (normal or outlier). It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, since the anomaly pattern detection method is based on binary classification for outliers, most well-known outlier detection methods, which output real-valued outlier scores, cannot be used directly. In this paper, we propose an anomaly pattern detection method in a data stream that transforms real-valued outlier scores into multiple binary-valued data streams. Using three outlier detection methods, Isolation Forest (IF), Autoencoder-based outlier detection, and Local Outlier Factor (LOF), the proposed anomaly pattern detection method is tested on artificial and real data sets. The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.
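One simple way to picture the transformation from real-valued outlier scores to multiple binary-valued streams is to apply several thresholds to the same score stream, one binary stream per threshold. The abstract does not say how the streams are derived, so the thresholding rule, the function name, and the example values below are assumptions for illustration only.

```python
def binarize_scores(scores, thresholds):
    """Return one 0/1 stream per threshold; 1 marks a score above that threshold."""
    return [[1 if s > t else 0 for s in scores] for t in thresholds]


# Hypothetical outlier scores, e.g. from Isolation Forest or LOF.
scores = [0.1, 0.5, 0.9]
streams = binarize_scores(scores, thresholds=[0.3, 0.7])
for threshold, stream in zip([0.3, 0.7], streams):
    print(threshold, stream)
```

Each resulting stream can then be fed to a detector that expects binary normal/outlier labels, so that no single threshold has to be chosen in advance.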


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Lixin Wang ◽  
Jianhua Yang ◽  
Michael Workman ◽  
Peng-Jun Wan

Hackers on the Internet usually send attacking packets using compromised hosts, called stepping-stones, in order to avoid being detected and caught. With stepping-stone attacks, an intruder remotely logs in to these stepping-stones using programs like SSH or telnet, uses a chain of Internet hosts as relay machines, and then sends the attacking packets. A great number of detection approaches have been developed for stepping-stone intrusion (SSI) in the literature. Many of these existing detection methods work effectively only when session manipulation by intruders is not present. When the session is manipulated by attackers, there are few known effective detection methods for SSI. It is important to know whether a detection algorithm for SSI is resistant to session manipulation by attackers. For session manipulation with chaff perturbation, software tools such as Scapy can be used to inject meaningless packets into a data stream. However, to the best of our knowledge, there are no existing effective tools or efficient algorithms to produce time-jittered network traffic that can be used to test whether an SSI detection method is resistant to intruders’ time-jittering manipulation. In this paper, we propose a framework to test the resistance of detection algorithms for SSI to time-jittering manipulation. Our proposed framework can be used to test whether an existing or new SSI detection method is resistant to session manipulation by intruders with time-jittering.
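Time-jittering can be pictured as adding a bounded random delay to each packet's send time while keeping the packets in their original order. The sketch below operates on a plain list of timestamps and is only a conceptual illustration: the function name, the uniform delay model, and the `max_delay` bound are assumptions, and it is not the framework proposed in the paper.

```python
import random


def jitter_timestamps(timestamps, max_delay, seed=None):
    """Delay each timestamp by a random amount in [0, max_delay], preserving order.

    `timestamps` must be sorted in non-decreasing order (seconds).
    """
    rng = random.Random(seed)
    jittered = []
    last = float("-inf")
    for t in timestamps:
        delayed = t + rng.uniform(0.0, max_delay)
        # A packet cannot overtake the one sent before it.
        delayed = max(delayed, last)
        jittered.append(delayed)
        last = delayed
    return jittered


original = [0.0, 0.1, 0.2, 1.0]
print(jitter_timestamps(original, max_delay=0.5, seed=1))
```

Feeding the jittered timestamps to an SSI detector alongside the original ones gives a simple check of whether its decision changes under timing perturbation of a chosen magnitude.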


2013 ◽  
Vol 2013 ◽  
pp. 1-12
Author(s):  
Yanjie Ji ◽  
Dounan Tang ◽  
Weihong Guo ◽  
Phil T. Blythe ◽  
Gang Ren

With the provision of any source of real-time information, the timeliness and accuracy of the data provided are paramount to the effectiveness and success of the system and its acceptance by the users. In order to improve the accuracy and reliability of parking guidance systems (PGSs), the technique of outlier mining has been introduced for detecting and analysing outliers in available parking space (APS) datasets. To distinguish outlier features from the APS’s overall periodic tendency, and to simultaneously identify the two types of outliers which naturally exist in APS datasets with intrinsically distinct statistical features, a two-phase detection method is proposed, which incorporates an improved density-based detection algorithm named “local entropy based weighted outlier detection” (EWOD). Real-world data from parking facilities in the City of Newcastle upon Tyne were used to test the hypothesis. Thereafter, experimental tests were carried out for a comparative study in which the outlier detection performances of the two-phase detection method, a statistic-based method, and a traditional density-based method were compared and contrasted. The results showed that the proposed method can identify two different kinds of outliers simultaneously, with identification accuracies of 100% and 92.7% for the first and second types of outliers, respectively.

