An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples

2021 ◽  
Vol 15 (3) ◽  
pp. 1-22
Author(s):  
Shi Ying ◽  
Bingming Wang ◽  
Lu Wang ◽  
Qingshan Li ◽  
Yishi Zhao ◽  
...  

Logs that record abnormal system states (anomaly logs) can be regarded as outliers, and the k-Nearest Neighbor (kNN) algorithm achieves relatively high accuracy among outlier detection methods. We therefore use the kNN algorithm to detect anomalies in log data. However, applying kNN to anomaly detection raises several problems, three of which are: excessively high vector dimensionality makes the kNN algorithm inefficient, unlabeled log data cannot support the kNN algorithm, and the imbalance in the number of log samples of each class distorts the kNN classification decision. To solve these three problems, we propose an efficient log anomaly detection method based on an improved kNN algorithm with an automatically labeled sample set. The method first applies a log parsing approach based on N-grams and frequent pattern mining (FPM), which reduces the dimension of log vectors produced with the Term Frequency-Inverse Document Frequency (TF-IDF) technique. Then we use clustering and self-training to obtain a labeled log sample set automatically from historical logs. Finally, we improve the kNN algorithm with an average weighting technique, which improves its accuracy on imbalanced samples. The method is validated on six log datasets of different types.
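The abstract converts parsed logs into TF-IDF vectors before running kNN. A minimal Python sketch of that vectorization step follows, assuming each log sequence has already been parsed into a list of event-template IDs. The N-gram and FPM parsing that the paper uses to shrink the vocabulary is not reproduced, and the template IDs and the unsmoothed idf = log(N / df) variant are illustrative choices, not the authors' settings.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Convert each document (a list of event-template IDs) into a sparse TF-IDF dict.

    Uses tf = count / len(doc) and idf = log(N / df), one common TF-IDF
    variant chosen here for illustration.
    """
    n_docs = len(docs)
    # Document frequency: in how many sequences each template appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    # Hypothetical parsed log sequences; the template IDs are made up.
    sequences = [
        ["E1", "E2", "E3"],
        ["E1", "E2", "E2"],
        ["E1", "E9", "E9"],  # contains a rare template
    ]
    vecs = tfidf_vectors(sequences)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Templates that occur in every sequence (such as `E1` here) get an idf of zero and contribute nothing to the similarity; only rarer templates separate the sequences.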

2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Bingming Wang ◽  
Shi Ying ◽  
Zhe Yang

Using the supervised k-nearest neighbor (kNN) algorithm to detect anomalies can yield more accurate results. However, kNN is inefficient at finding the k neighbors in large-scale log data; at the same time, log data are imbalanced in quantity, so selecting a proper k for different data distributions is challenging. In this paper, we propose a log-based anomaly detection method with efficient neighbor search and automatic selection of k. First, we propose a neighbor search method based on minhash and the MVP-tree. The minhash algorithm groups similar logs into the same bucket, and an MVP-tree is built for the samples in each bucket. This reduces both the cost of distance calculation and the number of neighbor samples that must be compared, which improves the efficiency of finding neighbors. To select k, we propose an automatic method based on the Silhouette Coefficient, which chooses a proper k to improve the accuracy of anomaly detection. The method is evaluated on six different types of log data to demonstrate its generality and feasibility.
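The first stage of the neighbor search groups similar logs into buckets with minhash. The sketch below illustrates MinHash signatures plus standard LSH banding over each log line's token set. The seeded BLAKE2 hash family, 32 hashes and 8 bands are assumptions for illustration, log lines are assumed to be non-empty, and neither the per-bucket MVP-tree nor the Silhouette-based choice of k from the paper is shown.

```python
import hashlib
from collections import defaultdict


def _hash(token, seed):
    """Deterministic 64-bit hash of a token; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_hashes=32):
    """For each hash function, keep the minimum hash value over the token set."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(a, b, num_hashes=32):
    """Fraction of agreeing signature positions, an estimate of Jaccard similarity."""
    sa = minhash_signature(set(a.split()), num_hashes)
    sb = minhash_signature(set(b.split()), num_hashes)
    return sum(x == y for x, y in zip(sa, sb)) / num_hashes


def lsh_buckets(logs, num_hashes=32, bands=8):
    """Group log indices whose signatures agree on every row of at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for idx, line in enumerate(logs):
        sig = minhash_signature(set(line.split()), num_hashes)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(idx)
    # Keep only buckets that actually group two or more logs.
    return [members for members in buckets.values() if len(members) > 1]
```

Only logs that share a bucket need exact distance comparisons, which is where the savings come from; with a fixed number of hashes, more bands (fewer rows per band) make grouping more permissive.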


2016 ◽  
Vol 8 (3) ◽  
pp. 327-333 ◽  
Author(s):  
Rimas Ciplinskas ◽  
Nerijus Paulauskas

New methods of cyber-attack detection are constantly being developed and existing ones improved, because attacks are numerous and varied and there is strong demand for protection against them. In practice, current attack detection methods operate like antivirus programs, i.e., signatures of known attacks are created and attacks are detected by matching against them. The main drawback of these methods is that they cannot detect new, previously unknown attacks. Anomaly detection methods address this problem: they detect deviations from normal network behaviour that may indicate a new type of attack. This article introduces a new method that detects anomalies in computer network packet flows using the local outlier factor algorithm. The research identified the groups of features with which anomalous flows are detected best, i.e., which yield the highest precision, recall and F-measure.
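To make the local outlier factor concrete, here is a compact, unoptimized Python version of LOF over numeric feature vectors (for example, per-flow packet and byte counts; these are assumed features, not the groups selected in the article). It simplifies the textbook definition by taking exactly k neighbours and ignoring ties at the k-distance, and it assumes there are no duplicate points.

```python
import math


def _knn(points, i, k):
    """(distance, index) pairs of the k nearest neighbours of points[i], excluding i."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return dists[:k]


def local_outlier_factor(points, k=3):
    """Return the LOF score of every point; scores well above 1 suggest outliers."""
    n = len(points)
    neigh = [_knn(points, i, k) for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neigh]

    def reach_dist(j, d_ij):
        # Reachability distance of a point from neighbour j: max(k-distance(j), d).
        return max(k_dist[j], d_ij)

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(j, d) for d, j in neigh[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i]) for i in range(n)]
```

Scores near 1 mean a flow is about as dense as its neighbours; a score far above 1 places it in a much sparser region. For real workloads, scikit-learn provides `sklearn.neighbors.LocalOutlierFactor`.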


2012 ◽  
Vol 468-471 ◽  
pp. 2504-2509
Author(s):  
Qiang Da Yang ◽  
Zhen Quan Liu

The on-line estimation of key hard-to-measure process variables using soft-sensor techniques has received extensive attention in industrial production. The precision of on-line estimation is closely related to the accuracy of the soft-sensor model, which in turn depends strongly on the accuracy of the modeling data. Taking into account the particular way outliers are defined in the soft-sensor modeling process, this paper proposes an outlier detection method based on k-nearest neighbor (k-NN). The method can be applied directly to the data without a priori knowledge of, or assumptions about, the process. Simulation results and a practical application show that the proposed k-NN-based outlier detection method detects outliers effectively and has high practical value.
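The abstract does not spell out its outlier criterion, so the following Python sketch uses one common kNN-based formulation as an assumption: each modeling sample is scored by its mean distance to its k nearest neighbours, and samples whose score exceeds the mean plus c times the standard deviation of all scores are flagged. The threshold rule and the two-variable samples in the example are hypothetical.

```python
import math
import statistics


def knn_outlier_scores(samples, k=3):
    """Score each sample by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, x in enumerate(samples):
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(samples, k=3, c=2.0):
    """Indices whose score exceeds mean + c * stdev of all scores (assumed rule)."""
    scores = knn_outlier_scores(samples, k)
    threshold = statistics.mean(scores) + c * statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    # Hypothetical (temperature, flow) modeling samples; the last one is suspicious.
    data = [(20.0, 1.0), (20.5, 1.1), (21.0, 1.0), (20.2, 0.9),
            (20.8, 1.05), (20.4, 1.0), (35.0, 3.0)]
    print(flag_outliers(data))
```

Because Euclidean distance mixes units, process variables would normally be standardized before scoring; that step is omitted to keep the sketch short.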


2019 ◽  
Vol 33 (15) ◽  
pp. 1950150 ◽  
Author(s):  
Lijiao Pan ◽  
Shibiao Mu ◽  
Yingyan Wang

A user click fraud detection method based on the Top-Rank-k frequent pattern mining algorithm is presented to address the click fraud problem in current online advertising. First, the method combines the click frequency of event samples to calculate the actual evaluation score of the click stream, derives the click-stream density function and the evaluation-score expression under multi-dimensional variables, and further obtains the time complexity of the subsequent user click fraud process. Second, the click fraud detection algorithm is designed on the basis of Top-Rank-k frequent patterns, and fraudulent users are identified through analysis. The results show that the method is efficient and correct and outperforms similar algorithms.
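Top-Rank-k frequent patterns are commonly defined by rank rather than by a minimum support threshold: a pattern's rank is the number of distinct support values greater than or equal to its own support, and the patterns with rank at most k are returned. The brute-force Python sketch below illustrates only that definition on hypothetical click sessions (the ad identifiers are made up); it is not an efficient Top-Rank-k miner and does not reproduce the paper's click-stream scoring.

```python
from collections import Counter
from itertools import combinations


def top_rank_k_patterns(transactions, k, max_len=3):
    """Brute-force Top-Rank-k frequent patterns up to max_len items.

    A pattern's rank is the number of distinct support values >= its own
    support; patterns with rank <= k are returned, so ties share a rank.
    """
    support = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, min(max_len, len(items)) + 1):
            for pattern in combinations(items, size):
                support[pattern] += 1
    if not support:
        return {}
    # The k highest distinct support values define ranks 1..k.
    cutoff = sorted(set(support.values()), reverse=True)[:k][-1]
    return {p: s for p, s in support.items() if s >= cutoff}


if __name__ == "__main__":
    # Hypothetical click sessions: the ads clicked in each session.
    sessions = [["ad1", "ad2"], ["ad1", "ad2"], ["ad1", "ad3"], ["ad2"]]
    print(top_rank_k_patterns(sessions, k=2))
```

Ranking by distinct support values means k bounds the number of support levels returned, not the number of patterns; several patterns with equal support all share one rank.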


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5895
Author(s):  
Jiansu Pu ◽  
Jingwen Zhang ◽  
Hui Shao ◽  
Tingting Zhang ◽  
Yunbo Rao

The development of the Internet has made social communication increasingly important for maintaining relationships between people. However, advertising and fraud are also growing rapidly and seriously affect daily life, e.g., by causing losses of money and time, junk information, and privacy problems. It is therefore very important to detect anomalies in social networks. However, existing anomaly detection methods cannot guarantee accuracy. Moreover, owing to the lack of labeled data, the detection results cannot be used directly; human analysts are still needed in the loop to provide sufficient judgment for decision making. To help experts analyze and explore the results of anomaly detection in social networks more objectively and effectively, we propose a novel visualization system, egoDetect, which detects anomalies in social communication networks efficiently. Based on an unsupervised anomaly detection method, the system detects anomalies without training and quickly provides an overview. We then explore each ego's topology and the relationships between egos and alters through a novel glyph designed on the egocentric network. The system also provides rich interactions that let experts quickly navigate to users of interest for further exploration. We evaluate the system on a real call dataset provided by an operator. The results show that the proposed system is effective for anomaly detection in social networks.


Author(s):  
Bingming Wang ◽  
Shi Ying ◽  
Guoli Cheng ◽  
Rui Wang ◽  
Zhe Yang ◽  
...  

Logs play an important role in the maintenance of large-scale systems. The number of logs indicating normal states (normal logs) differs greatly from the number of logs indicating anomalies (abnormal logs), and the two types of logs differ in certain respects. Automatically identifying faults with the K-Nearest Neighbor (KNN) algorithm, an outlier detection method with high accuracy, is an effective way to detect anomalies from logs. However, logs are large in scale and their samples are highly imbalanced, which affects the results of the KNN algorithm in log-based anomaly detection. We therefore propose a method based on an improved KNN algorithm, which uses the existing mean-shift clustering algorithm to efficiently select the training set from massive logs. We then assign different weights to samples at different distances, which reduces the negative effect of the imbalanced distribution of log samples on the accuracy of the KNN algorithm. Comparative experiments on log sets from five supercomputers show that the proposed method can be effectively applied to log-based anomaly detection, and that its accuracy, recall and F-measure are higher than those of the traditional keyword search method.
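The distance weighting can be illustrated with inverse-distance voting, one standard way to weight kNN neighbours. The abstract does not state the exact weighting function, so 1 / (d + eps) is an assumption, as are the toy two-dimensional features; the mean-shift selection of the training set is not shown.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Distance-weighted kNN vote.

    train: list of (feature_vector, label) pairs.
    Each of the k nearest neighbours votes with weight 1 / (distance + eps),
    so close neighbours dominate even when their class is rare.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbours:
        votes[label] += 1.0 / (math.dist(features, query) + eps)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical imbalanced training set: many normal logs, few anomalies.
    train = [((0.0, 0.0), "normal"), ((0.1, 0.0), "normal"), ((0.0, 0.1), "normal"),
             ((0.1, 0.1), "normal"), ((0.2, 0.2), "normal"),
             ((3.0, 3.0), "anomaly"), ((3.1, 3.0), "anomaly")]
    print(weighted_knn_predict(train, (2.9, 3.0), k=5))
```

With k = 5 in this toy set, a plain majority vote near the anomalies would let three distant normal neighbours outvote two nearby anomalous ones; inverse-distance weights let the close anomalies decide, which is the imbalance effect the method targets.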


Information ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 262
Author(s):  
Ying Zhao ◽  
Junjun Chen ◽  
Di Wu ◽  
Jian Teng ◽  
Nabin Sharma ◽  
...  

Anomaly detection in network traffic flows is a non-trivial problem in network security because of the complexity of network traffic. However, most machine learning-based detection methods focus on network anomaly detection and ignore anomalous user behavior. In real scenarios, anomalous network behavior may harm users' interests. In this paper, we propose an anomaly detection model based on time-decay closed frequent patterns to address this problem. The model mines closed frequent patterns from each user's network traffic and uses a time-decay factor to weight current and historical traffic differently. Because user network behavior is dynamic, the anomaly detection framework includes a strategy for updating the detection model. Additionally, the closed frequent patterns provide interpretable explanations for anomalies. Experimental results show that the proposed method can detect anomalous user behavior, and that its network anomaly detection performance is comparable to state-of-the-art methods and significantly better than the baseline methods.
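A minimal sketch of the two ingredients named in the abstract, closed patterns and time-decayed support, over hypothetical per-user traffic records of the form `(timestamp, frozenset_of_items)`. A pattern counts as closed when it equals the intersection of all records containing it, i.e., no proper superset occurs in exactly the same records. The exponential decay exp(-lam * age), the item names and the brute-force enumeration are assumptions for illustration; the paper's decay factor, item encoding and model update strategy may differ.

```python
import math
from itertools import combinations


def decayed_support(pattern, records, now, lam=0.1):
    """Sum of exp(-lam * (now - t)) over records (t, items) whose items contain pattern."""
    return sum(math.exp(-lam * (now - t)) for t, items in records if pattern <= items)


def closed_patterns(records, max_len=3):
    """Patterns equal to the intersection of all records containing them (closed itemsets)."""
    closed = set()
    for _, items in records:
        for size in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(sorted(items), size):
                pattern = frozenset(combo)
                containing = [s for _, s in records if pattern <= s]
                # No proper superset occurs in exactly the same records.
                if frozenset(containing[0]).intersection(*containing[1:]) == pattern:
                    closed.add(pattern)
    return closed


def decayed_closed_patterns(records, now, lam=0.1, min_support=1.0, max_len=3):
    """Closed patterns whose time-decayed support reaches min_support."""
    result = {}
    for pattern in closed_patterns(records, max_len):
        support = decayed_support(pattern, records, now, lam)
        if support >= min_support:
            result[pattern] = support
    return result
```

Older records contribute exponentially less, so a pattern seen only long ago, or only once, can drop below the support threshold while recent recurring behavior stays in the user's profile.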

