An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples

Shi Ying; Bingming Wang; Lu Wang; Qingshan Li; Yishi Zhao; Jianga Shang; Hao Huang; Guoli Cheng; Zhe Yang; Jiangyi Geng

doi:10.1145/3441448

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3441448 ◽

2021 ◽

Vol 15 (3) ◽

pp. 1-22

Author(s):

Shi Ying ◽

Bingming Wang ◽

Lu Wang ◽

Qingshan Li ◽

Yishi Zhao ◽

...

Keyword(s):

Anomaly Detection ◽

Pattern Mining ◽

Nearest Neighbor ◽

Detection Method ◽

Frequent Pattern ◽

Detection Methods ◽

K Nearest Neighbor ◽

Record System ◽

Log Data ◽

Sample Set

Logs that record abnormal system states (anomaly logs) can be regarded as outliers, and the k-Nearest Neighbor (kNN) algorithm achieves relatively high accuracy among outlier detection methods. Therefore, we use the kNN algorithm to detect anomalies in log data. However, using kNN for anomaly detection raises several problems, three of which are: excessive vector dimensionality makes the kNN algorithm inefficient, unlabeled log data cannot support the kNN algorithm, and the imbalance in the number of log samples distorts the classification decision of the kNN algorithm. To solve these three problems, we propose an efficient log anomaly detection method based on an improved kNN algorithm with an automatically labeled sample set. The method first introduces a log parsing method based on N-grams and frequent pattern mining (FPM), which reduces the dimension of the log vectors produced with Term Frequency-Inverse Document Frequency (TF-IDF). We then use clustering and self-training to obtain a labeled log sample set from historical logs automatically. Finally, we improve the kNN algorithm with an average weighting technique, which improves its accuracy on unbalanced samples. The method is validated on six log datasets of different types.
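As a rough illustration of the vectorization step mentioned in the abstract, the sketch below turns log lines into sparse TF-IDF vectors over token N-grams. It is not the authors' parser: the FPM-based dimension reduction is omitted, and the function names, the bigram default, and the sample log lines are assumptions made for this example.

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    """Return the contiguous token n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vectors(logs, n=2):
    """Map each log line to a sparse {ngram: tf-idf weight} dictionary."""
    docs = [Counter(ngrams(line.split(), n)) for line in logs]
    # Document frequency: in how many log lines does each n-gram occur?
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    total = len(docs)
    vectors = []
    for doc in docs:
        length = sum(doc.values()) or 1  # avoid dividing by zero on short lines
        vectors.append({
            gram: (count / length) * math.log(total / df[gram])
            for gram, count in doc.items()
        })
    return vectors

# Hypothetical log lines, invented for illustration only.
logs = [
    "connection from host accepted",
    "connection from host accepted",
    "kernel panic on node",
]
vectors = tfidf_vectors(logs)
```

N-grams that occur in few log lines receive larger weights than n-grams shared by many lines, which is the property a distance-based detector such as kNN relies on.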

A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection

Scientific Programming ◽

10.1155/2020/4365356 ◽

2020 ◽

Vol 2020 ◽

pp. 1-17

Author(s):

Bingming Wang ◽

Shi Ying ◽

Zhe Yang

Keyword(s):

Anomaly Detection ◽

Large Scale ◽

Nearest Neighbor ◽

Detection Method ◽

Tree Model ◽

Log Data ◽

Neighbor Search ◽

Different Types ◽

Efficient Selection ◽

Selection Of

Using the k-nearest neighbor (kNN) algorithm, a supervised learning method, to detect anomalies can yield more accurate results. However, kNN is inefficient at finding the k neighbors in large-scale log data; in addition, log data are imbalanced in quantity, so selecting proper k neighbors for different data distributions is a challenge. In this paper, we propose a log-based anomaly detection method with efficient neighbor searching and automatic selection of the k neighbors. First, we propose a neighbor search method based on minhash and the MVP-tree. The minhash algorithm groups similar logs into the same bucket, and an MVP-tree model is built for the samples in each bucket. This reduces both the effort of distance calculation and the number of neighbor samples that need to be compared, which improves the efficiency of finding neighbors. For selecting the k neighbors, we propose an automatic method based on the Silhouette Coefficient, which selects proper k neighbors to improve the accuracy of anomaly detection. Our method is verified on six different types of log data to demonstrate its universality and feasibility.
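The bucketing idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the MVP-tree and the Silhouette Coefficient step are left out, and the hash construction, the single-band bucket key, and all names and parameters are assumptions. Each log line is assumed to contain at least one token.

```python
import hashlib

def minhash_signature(tokens, num_hashes=16):
    """Return a MinHash signature: for each seed, the minimum hash over the token set."""
    unique = set(tokens)
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(),
                "big",
            )
            for tok in unique
        ))
    return tuple(signature)

def bucket_logs(logs, num_hashes=16, band_size=4):
    """Group log indices whose first signature band is identical (single-band LSH)."""
    buckets = {}
    for index, line in enumerate(logs):
        signature = minhash_signature(line.split(), num_hashes)
        buckets.setdefault(signature[:band_size], []).append(index)
    return buckets

# Hypothetical log lines: the first two share the same token set.
logs = ["disk read error", "error disk read", "user login ok"]
buckets = bucket_logs(logs)
groups = sorted(buckets.values())
```

Neighbor search for a new log would then be restricted to the samples in its own bucket instead of the whole sample set.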

OUTLIER DETECTION METHOD USE FOR THE NETWORK FLOW ANOMALY DETECTION / IŠSKIRČIŲ RADIMO METODŲ TAIKYMAS ANOMALIJOMS KOMPIUTERIŲ TINKLO PAKETŲ SRAUTUOSE APTIKTI

Mokslas - Lietuvos ateitis ◽

10.3846/mla.2016.928 ◽

2016 ◽

Vol 8 (3) ◽

pp. 327-333 ◽

Cited By ~ 3

Author(s):

Rimas Ciplinskas ◽

Nerijus Paulauskas

Keyword(s):

Anomaly Detection ◽

Network Flow ◽

Detection Method ◽

Attack Detection ◽

Detection Methods ◽

Cyber Attack ◽

Normal Network ◽

New Type ◽

Flow Detection ◽

F Measure

New and existing methods of cyber-attack detection are constantly being developed and improved because of the large number of attacks and the demand for protection against them. In practice, current attack detection methods operate like antivirus programs, i.e., signatures of known attacks are created and attacks are detected by matching against them. These methods have a drawback: they cannot detect new attacks. As a solution, anomaly detection methods are used. They make it possible to detect deviations from normal network behaviour that may indicate a new type of attack. This article introduces a new method that detects network flow anomalies using the local outlier factor algorithm. The research identified the groups of features that gave the best anomalous flow detection results, as measured by the highest values of precision, recall and F-measure.
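For readers unfamiliar with the local outlier factor (LOF), a compact sketch of the standard definition is given below. The abstract does not list the flow features used, so the 2-D points here are invented toy data; the sketch also assumes there are no duplicate points and ignores ties in the k-distance.

```python
import math

def local_outlier_factor(points, k=2):
    """Return the LOF score of every point; scores well above 1 suggest outliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding the point itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def local_reachability_density(i):
        # Reachability distance of i from j is max(k-distance(j), d(i, j)).
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        return len(reach) / sum(reach)

    density = [local_reachability_density(i) for i in range(n)]
    return [
        sum(density[j] for j in neighbours[i]) / (k * density[i])
        for i in range(n)
    ]

# Toy data: four points on a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = local_outlier_factor(points, k=2)
```

Points inside the dense square score close to 1, while the distant point scores far higher because its local density is much lower than that of its neighbours.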

Outlier Detection for Soft-Sensor Modeling Data Based on k-Nearest Neighbor

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.468-471.2504 ◽

2012 ◽

Vol 468-471 ◽

pp. 2504-2509

Author(s):

Qiang Da Yang ◽

Zhen Quan Liu

Keyword(s):

Outlier Detection ◽

Nearest Neighbor ◽

Detection Method ◽

Soft Sensor ◽

K Nearest Neighbor ◽

Sensor Model ◽

Sensor Technique ◽

On Line ◽

Modeling Data ◽

Sensor Modeling

The on-line estimation of key hard-to-measure process variables using soft-sensor techniques has received extensive attention in industrial production. The precision of on-line estimation is closely related to the accuracy of the soft-sensor model, which in turn depends strongly on the accuracy of the modeling data. Given the particular definition of outliers in the soft-sensor modeling process, this paper proposes an outlier detection method based on k-nearest neighbor (k-NN). The proposed method can be applied directly to the data, without a priori knowledge of or assumptions about the process. Simulation results and a practical application show that the proposed k-NN-based outlier detection method detects outliers well and has high application value.
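The abstract does not give the exact scoring rule, so the following is only a generic sketch of distance-based k-NN outlier detection: each sample is scored by its mean distance to its k nearest other samples and flagged when that score is far above the median. The threshold factor, function names, and sample values are assumptions.

```python
import math

def knn_outlier_scores(samples, k=2):
    """Score each sample by its mean distance to its k nearest other samples."""
    scores = []
    for i, p in enumerate(samples):
        dists = sorted(math.dist(p, q) for j, q in enumerate(samples) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def flag_outliers(samples, k=2, factor=3.0):
    """Flag samples whose score exceeds `factor` times the median score."""
    scores = knn_outlier_scores(samples, k)
    median = sorted(scores)[len(scores) // 2]
    return [score > factor * median for score in scores]

# Hypothetical modeling data: four consistent samples and one suspicious one.
samples = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.1), (5.0, 9.0)]
flags = flag_outliers(samples)
```

Like the method described in the abstract, this needs only the data themselves and no model of the underlying process.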

User click fraud detection method based on Top-Rank-k frequent pattern mining

International Journal of Modern Physics B ◽

10.1142/s0217979219501509 ◽

2019 ◽

Vol 33 (15) ◽

pp. 1950150 ◽

Cited By ~ 1

Author(s):

Lijiao Pan ◽

Shibiao Mu ◽

Yingyan Wang

Keyword(s):

Pattern Mining ◽

Detection Method ◽

Online Advertising ◽

Fraud Detection ◽

Frequent Pattern Mining ◽

Detection Algorithm ◽

Frequent Pattern ◽

Good Efficiency ◽

Click Fraud ◽

Evaluation Score

A user click fraud detection method based on the Top-Rank-k frequent pattern mining algorithm is presented to address the click fraud problem in current online advertising. First, the method combines the click frequencies of event samples, calculates the real evaluation score of the click stream, derives the click stream density function and the evaluation score expression under multi-dimensional variables, and then obtains the time complexity of the next user's click fraud process. Second, the click fraud detection algorithm is designed around Top-Rank-k frequent patterns, and fraudulent users are identified through analysis. The results show that the method is efficient and correct, and that it outperforms other similar algorithms.

egoDetect: Visual Detection and Exploration of Anomaly in Social Communication Network

Sensors ◽

10.3390/s20205895 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5895

Author(s):

Jiansu Pu ◽

Jingwen Zhang ◽

Hui Shao ◽

Tingting Zhang ◽

Yunbo Rao

Keyword(s):

Social Networks ◽

Anomaly Detection ◽

Communication Networks ◽

Social Communication ◽

Detection Method ◽

Detection Methods ◽

Visualization System ◽

Egocentric Network ◽

Unsupervised Anomaly Detection ◽

The Relationship

The development of the Internet has made social communication increasingly important for maintaining relationships between people. However, advertising and fraud are also growing rapidly and seriously affect daily life, e.g., by causing losses of money and time, junk information, and privacy problems. It is therefore very important to detect anomalies in social networks. However, existing anomaly detection methods cannot guarantee the accuracy of their results. Moreover, because labeled data are lacking, the detection results cannot be used directly. In other words, human analysts are still needed in the loop to provide judgment for decision making. To help experts analyze and explore the results of anomaly detection in social networks more objectively and effectively, we propose a novel visualization system, egoDetect, which can detect anomalies in social communication networks efficiently. Based on an unsupervised anomaly detection method, the system can detect anomalies without training and provide an overview quickly. We then explore an ego's topology and the relationship between egos and alters by designing a novel glyph based on the egocentric network. The system also provides rich interactions that let experts quickly navigate to users of interest for further exploration. We use an actual call dataset provided by an operator to evaluate our system. The results show that the proposed system is effective for anomaly detection in social networks.

Log-Based Anomaly Detection with the Improved K-Nearest Neighbor

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194020500114 ◽

2020 ◽

Vol 30 (02) ◽

pp. 239-262 ◽

Cited By ~ 1

Author(s):

Bingming Wang ◽

Shi Ying ◽

Guoli Cheng ◽

Rui Wang ◽

Zhe Yang ◽

...

Keyword(s):

Anomaly Detection ◽

Large Scale ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Keyword Search ◽

Mean Shift ◽

Recall Rate ◽

K Nearest Neighbor ◽

Mean Shift Clustering ◽

Negative Effect

Logs play an important role in the maintenance of large-scale systems. The number of logs that indicate normal operation (normal logs) differs greatly from the number of logs that indicate anomalies (abnormal logs), and the two types of logs differ in certain respects. Automatically identifying faults with the K-Nearest Neighbor (KNN) algorithm, an outlier detection method with high accuracy, is an effective way to detect anomalies from logs. However, logs are large in scale and their samples are very unevenly distributed, which affects the results of the KNN algorithm in log-based anomaly detection. We therefore propose a method based on an improved KNN algorithm that uses the existing mean-shift clustering algorithm to efficiently select the training set from massive logs. We then assign different weights to samples at different distances, which reduces the negative effect of the unbalanced distribution of log samples on the accuracy of the KNN algorithm. Comparative experiments on log sets from five supercomputers show that the proposed method can be applied effectively to log-based anomaly detection, and that its accuracy, recall rate and F-measure are higher than those of the traditional keyword search method.
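The weighting step can be illustrated with a small distance-weighted kNN classifier. The abstract does not state the weight function, so inverse distance is used here purely as an assumption, and the mean-shift selection of the training set is not shown. The training points and labels are invented.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by summing inverse-distance weights of its k nearest samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        # Closer samples contribute more; eps guards against a zero distance.
        votes[label] += 1.0 / (math.dist(features, query) + eps)
    return max(votes, key=votes.get)

# Hypothetical imbalanced training set: three normal samples, one abnormal.
train = [
    ((0.0, 0.0), "normal"),
    ((0.0, 1.0), "normal"),
    ((1.0, 0.0), "normal"),
    ((3.0, 3.0), "abnormal"),
]
prediction = weighted_knn_predict(train, (2.8, 2.9), k=3)
```

With k = 3, an unweighted majority vote would return "normal" (two votes to one), whereas the distance weighting lets the single nearby abnormal sample decide the result.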

Network Anomaly Detection by Using a Time-Decay Closed Frequent Pattern

Information ◽

10.3390/info10080262 ◽

2019 ◽

Vol 10 (8) ◽

pp. 262

Author(s):

Ying Zhao ◽

Junjun Chen ◽

Di Wu ◽

Jian Teng ◽

Nabin Sharma ◽

...

Keyword(s):

Anomaly Detection ◽

Network Traffic ◽

User Behavior ◽

Frequent Pattern ◽

Detection Methods ◽

Frequent Patterns ◽

Time Decay ◽

Network Behavior ◽

Detection Model ◽

Network Anomaly Detection

Anomaly detection in network traffic flows is a non-trivial problem in the field of network security because of the complexity of network traffic. However, most machine learning-based detection methods focus on network anomaly detection and ignore the detection of anomalous user behavior. In real scenarios, anomalous network behavior may harm users' interests. In this paper, we propose an anomaly detection model based on time-decay closed frequent patterns to address this problem. The model mines closed frequent patterns from the network traffic of each user and uses a time-decay factor to weight current network traffic differently from historical traffic. Because user network behavior is dynamic, the anomaly detection framework includes a strategy for updating the detection model. Additionally, the closed frequent patterns provide interpretable explanations for anomalies. Experimental results show that the proposed method can detect user behavior anomalies, and that its network anomaly detection performance is similar to that of state-of-the-art methods and significantly better than that of the baseline methods.
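To make the time-decay idea concrete, the sketch below computes a decayed support for a candidate pattern, so that recent traffic counts more than historical traffic. The exponential form of the decay, the timestamp unit, and the transaction layout are assumptions; the mining of closed patterns and the model update strategy are not shown.

```python
def decayed_support(transactions, pattern, now, decay=0.9):
    """Sum decay**age over the transactions that contain every item of `pattern`."""
    wanted = set(pattern)
    return sum(
        decay ** (now - timestamp)
        for timestamp, items in transactions
        if wanted <= set(items)
    )

# Hypothetical per-user traffic: (timestamp, set of observed items).
transactions = [
    (0, {"dns", "http"}),
    (1, {"dns"}),
    (4, {"dns", "http"}),
]
support_pair = decayed_support(transactions, {"dns", "http"}, now=4, decay=0.5)
support_dns = decayed_support(transactions, {"dns"}, now=4, decay=0.5)
```

A transaction observed at the current time contributes a full count of 1, while one observed four time units earlier contributes only 0.5 ** 4 = 0.0625 under this assumed decay.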

Causal cueing system for above ground anomaly detection of explosive hazards using support vector machine localized by K-nearest neighbor

2012 IEEE Symposium on Computational Intelligence for Security and Defence Applications ◽

10.1109/cisda.2012.6291519 ◽

2012 ◽

Cited By ~ 1

Author(s):

Derek T. Anderson ◽

Ozy Sjahputera ◽

Kevin Stone ◽

James M. Keller

Keyword(s):

Support Vector Machine ◽

Anomaly Detection ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor

Performance evaluation of frequent pattern mining algorithms using web log data for web usage mining

2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei.2017.8302317 ◽

2017 ◽

Author(s):

Yonas Gashaw ◽

Fang Liu

Keyword(s):

Performance Evaluation ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Web Usage Mining ◽

Frequent Pattern ◽

Log Data ◽

Web Log ◽

Web Usage ◽

Mining Algorithms

Real-time network anomaly detection architecture based on frequent pattern mining technique

2013 International Conference on Research and Innovation in Information Systems (ICRIIS) ◽

10.1109/icriis.2013.6716742 ◽

2013 ◽

Cited By ~ 1

Author(s):

Aiman Moyaid Said ◽

Dhanapal Durai Dominic ◽

Ibrahima Faye

Keyword(s):

Anomaly Detection ◽

Real Time ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Mining Technique ◽

Network Anomaly Detection
