A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection

Scientific Programming ◽ 10.1155/2020/4365356 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-17

Author(s): Bingming Wang ◽ Shi Ying ◽ Zhe Yang

Keyword(s): Anomaly Detection ◽ Large Scale ◽ Nearest Neighbor ◽ Detection Method ◽ Tree Model ◽ Log Data ◽ Neighbor Search ◽ Different Types ◽ Efficient Selection ◽ Selection Of

Using the supervised k-nearest neighbor (kNN) algorithm to detect anomalies can yield more accurate results. However, finding the k neighbors in large-scale log data is inefficient, and because log data are imbalanced in quantity, it is a challenge to select a proper k for different data distributions. In this paper, we propose a log-based anomaly detection method with efficient neighbor searching and automatic selection of k neighbors. First, we propose a neighbor search method based on minhash and the MVP-tree. The minhash algorithm groups similar logs into the same bucket, and an MVP-tree model is built for the samples in each bucket. This reduces the effort of distance calculation and the number of neighbor samples that need to be compared, which improves the efficiency of finding neighbors. For selecting the k neighbors, we propose an automatic method based on the Silhouette Coefficient, which selects proper k neighbors to improve the accuracy of anomaly detection. Our method is verified on six different types of log data to demonstrate its universality and feasibility.
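
The bucketing step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the seeded BLAKE2b hash family, the banding scheme, and all function names and parameter values (`num_hashes`, `band_size`) are assumptions chosen for illustration. The MVP-tree built inside each bucket is not shown, so any distance search would still have to run over the returned candidates.

```python
import hashlib
from collections import defaultdict


def tokens(line):
    """Split a log line into a set of lowercase whitespace-separated tokens."""
    return set(line.lower().split())


def _hash(seed, token):
    """Seeded 64-bit hash; each seed acts as one hash function (an assumption)."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(token_set, num_hashes=16):
    """Keep the minimum hash value per seed; empty sets get an all-zero signature."""
    return tuple(
        min((_hash(seed, t) for t in token_set), default=0)
        for seed in range(num_hashes)
    )


def build_buckets(lines, num_hashes=16, band_size=4):
    """Place each line index into one bucket per band of its signature."""
    buckets = defaultdict(set)
    for idx, line in enumerate(lines):
        sig = minhash_signature(tokens(line), num_hashes)
        for start in range(0, num_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].add(idx)
    return buckets


def candidate_neighbors(query, buckets, num_hashes=16, band_size=4):
    """Return indices of lines sharing at least one band bucket with the query."""
    sig = minhash_signature(tokens(query), num_hashes)
    found = set()
    for start in range(0, num_hashes, band_size):
        found |= buckets.get((start, sig[start:start + band_size]), set())
    return found
```

Only the lines returned by `candidate_neighbors` need a distance computation, which is the source of the saving the abstract describes. Smaller bands make buckets more permissive, larger bands make them stricter.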

An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples

ACM Transactions on Knowledge Discovery from Data ◽ 10.1145/3441448 ◽ 2021 ◽ Vol 15 (3) ◽ pp. 1-22

Author(s): Shi Ying ◽ Bingming Wang ◽ Lu Wang ◽ Qingshan Li ◽ Yishi Zhao ◽ ...

Keyword(s): Anomaly Detection ◽ Pattern Mining ◽ Nearest Neighbor ◽ Detection Method ◽ Frequent Pattern ◽ Detection Methods ◽ K Nearest Neighbor ◽ Record System ◽ Log Data ◽ Sample Set

Logs that record abnormal system states (anomaly logs) can be regarded as outliers, and the k-Nearest Neighbor (kNN) algorithm has relatively high accuracy among outlier detection methods. Therefore, we use the kNN algorithm to detect anomalies in log data. However, using the kNN algorithm for anomaly detection raises several problems, three of which are: an excessive vector dimension makes the kNN algorithm inefficient, unlabeled log data cannot support the kNN algorithm, and the imbalance in the number of log data distorts the classification decision of the kNN algorithm. To solve these three problems, we propose an efficient log anomaly detection method based on an improved kNN algorithm with an automatically labeled sample set. The method first introduces a log parsing approach based on N-grams and frequent pattern mining (FPM), which reduces the dimension of the log vector produced by Term Frequency-Inverse Document Frequency (TF-IDF) conversion. Then we use clustering and self-training to obtain a labeled log sample set from historical logs automatically. Finally, we improve the kNN algorithm using average weighting, which improves its accuracy on unbalanced samples. The method in this article is validated on six log datasets of different types.
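
The TF-IDF conversion mentioned in this abstract can be sketched as follows. This is a plain textbook variant under stated assumptions, not the authors' pipeline: the weighting formula (term count divided by document length, times the natural log of N over document frequency), the pre-tokenized input, and the name `tfidf_vectors` are illustrative, and the N-gram and FPM parsing that the abstract uses to shrink the vocabulary is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    vocabulary = sorted(doc_freq)
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors
```

Each vector has one component per vocabulary term, so the vector dimension grows with the number of distinct tokens. That growth is the inefficiency the abstract attributes to an excessive vector dimension.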

Image Tampering Detection Method Based on Approximate Nearest Neighbor Search

Laser & Optoelectronics Progress ◽ 10.3788/lop57.101102 ◽ 2020 ◽ Vol 57 (10) ◽ pp. 101102

Author(s): 王静 Wang Jing ◽ 张雨辰 Zhang Yuchen ◽ 霍占强 Huo Zhanqiang ◽ 贾利琴 Jia Liqin

Keyword(s): Nearest Neighbor ◽ Detection Method ◽ Nearest Neighbor Search ◽ Tampering Detection ◽ Approximate Nearest Neighbor Search ◽ Approximate Nearest Neighbor ◽ Image Tampering ◽ Neighbor Search ◽ Image Tampering Detection

A Meta-Algorithm for Improving Top-N Prediction Efficiency of Matrix Factorization Models in Collaborative Filtering

International Journal of Pattern Recognition and Artificial Intelligence ◽ 10.1142/s0218001420590077 ◽ 2019 ◽ Vol 34 (03) ◽ pp. 2059007

Author(s): A. Murat Yagci ◽ Tevfik Aytekin ◽ Fikret S. Gurgen

Keyword(s): Collaborative Filtering ◽ Matrix Factorization ◽ Large Scale ◽ Nearest Neighbor ◽ Nearest Neighbor Search ◽ Space Efficiency ◽ Neighbor Search ◽ Prediction Time ◽ Low Dimensional ◽ Prediction Efficiency

Matrix factorization models often reveal the low-dimensional latent structure in high-dimensional spaces while bringing space efficiency to large-scale collaborative filtering problems. Improving the training and prediction time efficiency of these models is also important, since an accurate model may raise practical concerns if it is slow to capture the changing dynamics of the system. For the training task, powerful improvements have been proposed, especially using SGD, ALS, and their parallel versions. In this paper, we focus on the prediction task and combine matrix factorization with approximate nearest neighbor search methods to improve the efficiency of top-N prediction queries. Our efforts result in a meta-algorithm, MMFNN, which can employ various common matrix factorization models, drastically improve their prediction efficiency, and still perform comparably to standard prediction approaches, or sometimes even better, in terms of predictive power. Using various batch, online, and incremental matrix factorization models, we present detailed empirical analysis results on many large implicit feedback datasets from different application domains.
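
To make the top-N prediction query concrete, the sketch below scores items by the inner product of latent vectors and returns the best n. It is not MMFNN: the function names, the dictionary layout of `item_vecs`, and the optional `candidates` argument are assumptions for illustration. The exhaustive scan is the standard baseline, and the `candidates` hook only marks where a shortlist from an approximate nearest neighbor index could replace it.

```python
import heapq


def dot(u, v):
    """Inner product of two equal-length latent vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_n_items(user_vec, item_vecs, n, candidates=None):
    """Return the ids of the n items with the highest predicted score.

    item_vecs maps item id -> latent vector. When `candidates` is given,
    only those item ids are scored instead of the whole catalogue.
    """
    ids = item_vecs.keys() if candidates is None else candidates
    return heapq.nlargest(n, ids, key=lambda i: dot(user_vec, item_vecs[i]))
```

Scoring every item costs time linear in the catalogue size for each query. Restricting the scan to a shortlist trades a possible loss of exactness for speed, which is the trade-off the abstract reports on.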

Large-scale high-dimensional nearest neighbor search using flash memory with in-store processing

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) ◽ 10.1109/reconfig.2015.7393324 ◽ 2015 ◽ Cited By ~ 1

Author(s): Sang-Woo Jun ◽ Chanwoo Chung ◽ Arvind

Keyword(s): Flash Memory ◽ Large Scale ◽ Nearest Neighbor ◽ Nearest Neighbor Search ◽ High Dimensional ◽ Neighbor Search

Comparison of Various Systems for On-Site Wastewater Treatment

Water Science & Technology ◽ 10.2166/wst.1990.0181 ◽ 1990 ◽ Vol 22 (3-4) ◽ pp. 41-48 ◽ Cited By ~ 1

Author(s): Frits A. Fastenau ◽ Jaap H. J. M. van der Graaf ◽ Gerard Martijnse

Keyword(s): Wastewater Treatment ◽ Large Scale ◽ Research Work ◽ Field Research ◽ Limiting Factors ◽ Land Claims ◽ Rotating Biological Contactor ◽ Advantages And Disadvantages ◽ Different Types ◽ Selection Of

Diffuse pollution, caused by direct discharges from individual houses, small built-up nuclei, farms, camp-sites, etc., for which connection to central wastewater treatment systems is unfeasible, may be significantly reduced by on-site treatment. Based on large-scale research, including intensive field-research work on 14 systems of different types and sizes covering a range of 5 to 200 population equivalents (p.e.), 8 different types of system were compared. The comparison involved technological features, such as removal efficiency, reliability, operational and maintenance aspects, environmental impacts and land claims, together with economical features showing significant differences. Advantages and disadvantages of each system are highlighted to enable a selection of suitable systems to be made. When no limiting factors are present, it was found that, in general, infiltration systems (infiltration pits; infiltration trenches) have the best features for on-site treatment up to 100 p.e. For larger capacities, or when infiltration is not possible, the rotating biological contactor will be the best solution, mainly because of the lower costs.

Log-Based Anomaly Detection with the Improved K-Nearest Neighbor

International Journal of Software Engineering and Knowledge Engineering ◽ 10.1142/s0218194020500114 ◽ 2020 ◽ Vol 30 (02) ◽ pp. 239-262 ◽ Cited By ~ 1

Author(s): Bingming Wang ◽ Shi Ying ◽ Guoli Cheng ◽ Rui Wang ◽ Zhe Yang ◽ ...

Keyword(s): Anomaly Detection ◽ Large Scale ◽ Clustering Algorithm ◽ Nearest Neighbor ◽ Keyword Search ◽ Mean Shift ◽ Recall Rate ◽ K Nearest Neighbor ◽ Mean Shift Clustering ◽ Negative Effect

Logs play an important role in the maintenance of large-scale systems. The number of logs that indicate normal operation (normal logs) differs greatly from the number of logs that indicate anomalies (abnormal logs), and the two types of logs have certain differences. Automatically detecting faults with the K-Nearest Neighbor (KNN) algorithm, an outlier detection method with high accuracy, is an effective way to detect anomalies from logs. However, logs are large in scale and their samples are very unevenly distributed, which affects the results of the KNN algorithm in log-based anomaly detection. Thus, we propose an improved KNN-based method that uses the existing mean-shift clustering algorithm to efficiently select the training set from massive logs. Then we assign different weights to samples at different distances, which reduces the negative effect of the unbalanced distribution of log samples on the accuracy of the KNN algorithm. Comparative experiments on log sets from five supercomputers show that the proposed method can be effectively applied to log-based anomaly detection, and that its accuracy, recall rate, and F-measure are higher than those of the traditional keyword search method.
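
The idea of weighting neighbors by distance can be shown with a short sketch. The abstract does not give the weighting function, so inverse-distance weighting is assumed here purely for illustration. The function name, the `eps` smoothing term, and the Euclidean metric are likewise assumptions, and the mean-shift selection of the training set is not shown.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by inverse-distance-weighted voting among its k neighbors.

    `train` is a list of (vector, label) pairs; vectors are equal-length tuples.
    """
    # Keep the k training samples closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = defaultdict(float)
    for vector, label in neighbors:
        # Closer samples contribute more; eps avoids division by zero.
        votes[label] += 1.0 / (math.dist(vector, query) + eps)
    return max(votes, key=votes.get)
```

With a single nearby abnormal sample and two distant normal samples among the k neighbors, an unweighted majority vote would return the normal label, while the weighted vote favors the close abnormal sample. This is the kind of imbalance effect the abstract aims to reduce.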

A systematic capsid evolution approach performed in vivo for the design of AAV vectors with tailored properties and tropism

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.1910061116 ◽ 2019 ◽ Vol 116 (52) ◽ pp. 27053-27062 ◽ Cited By ~ 25

Author(s): Marcus Davidsson ◽ Gang Wang ◽ Patrick Aldrin-Kirk ◽ Tiago Cardoso ◽ Sara Nolbrant ◽ ...

Keyword(s): Dopaminergic Neurons ◽ Large Scale ◽ Adeno Associated Virus ◽ Model Based Clustering ◽ Molecular Barcode ◽ In Vivo Screen ◽ Efficient Selection ◽ Selection Of ◽ Random Screening

Adeno-associated virus (AAV) capsid modification enables the generation of recombinant vectors with tailored properties and tropism. Most approaches to date depend on random screening, enrichment, and serendipity. The approach explored here, called BRAVE (barcoded rational AAV vector evolution), enables efficient selection of engineered capsid structures on a large scale using only a single screening round in vivo. The approach stands in contrast to previous methods that require multiple generations of enrichment. With the BRAVE approach, each virus particle displays a peptide, derived from a protein, of known function on the AAV capsid surface, and a unique molecular barcode in the packaged genome. The sequencing of RNA-expressed barcodes from a single-generation in vivo screen allows the mapping of putative binding sequences from hundreds of proteins simultaneously. Using the BRAVE approach and hidden Markov model-based clustering, we present 25 synthetic capsid variants with refined properties, such as retrograde axonal transport in specific subtypes of neurons, as shown for both rodent and human dopaminergic neurons.

Very large scale nearest neighbor search: ideas, strategies and challenges

International Journal of Multimedia Information Retrieval ◽ 10.1007/s13735-013-0046-4 ◽ 2013 ◽ Vol 2 (4) ◽ pp. 229-241 ◽ Cited By ~ 5

Author(s): Erik Gast ◽ Ard Oerlemans ◽ Michael S. Lew

Keyword(s): Large Scale ◽ Nearest Neighbor ◽ Nearest Neighbor Search ◽ Neighbor Search

Iterative Selection of Categorical Variables for Log Data Anomaly Detection

10.1007/978-3-030-88418-5_36 ◽ 2021 ◽ pp. 757-777

Author(s): Max Landauer ◽ Georg Höld ◽ Markus Wurzenberger ◽ Florian Skopik ◽ Andreas Rauber

Keyword(s): Anomaly Detection ◽ Categorical Variables ◽ Log Data ◽ Selection Of

QuickN: Practical and Secure Nearest Neighbor Search on Encrypted Large-Scale Data

IEEE Transactions on Cloud Computing ◽ 10.1109/tcc.2020.3009961 ◽ 2020 ◽ pp. 1-1

Author(s): Boyang Wang ◽ Yantian Hou ◽ Ming Li

Keyword(s): Large Scale ◽ Nearest Neighbor ◽ Nearest Neighbor Search ◽ Neighbor Search ◽ Large Scale Data ◽ Scale Data
