LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things

Jin Wang; Yangning Tang; Shiming He; Changqing Zhao; Pradip Kumar Sharma; Osama Alfarraj; Amr Tolba

doi:10.3390/s20092451

Sensors ◽

2020 ◽

Vol 20 (9) ◽

pp. 2451 ◽

Cited By ~ 4

Keyword(s):

Feature Extraction ◽

Internet Of Things ◽

Anomaly Detection ◽

Language Processing ◽

Large Scale ◽

Naive Bayes ◽

Computational Cost ◽

Naïve Bayes ◽

Computational Time ◽

Transformation Methods

Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and vectorize them; however, training word2vec is computationally expensive. Moreover, anomalies in logs depend not only on individual log messages but also on log message sequences. The word vectors produced by word2vec therefore cannot be used directly: they must first be transformed into log event vectors and then into log sequence vectors. To reduce computational cost and avoid these multiple transformations, this paper proposes an offline feature extraction model, named LogEvent2vec, which takes log events as the input of word2vec to extract the relevance between log events and vectorize them directly. LogEvent2vec can work with any coordinate transformation method and any anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors using bary or tf-idf, and train three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) to detect anomalies. We conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The experimental results demonstrate that, compared with word2vec, LogEvent2vec significantly reduces computational time, by a factor of 30, and improves accuracy. LogEvent2vec with bary and Random Forest achieves the best F1-score, while LogEvent2vec with tf-idf and Naive Bayes requires the least computational time.
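The step that turns log event vectors into a single log sequence vector can be sketched as follows. This is a minimal illustration under stated assumptions: each log event is assumed to already have a vector (for example, from word2vec trained on sequences of event IDs rather than words), "bary" is treated as the plain average (barycenter) of the event vectors, and tf-idf uses the simple tf × log(N/df) weighting; the paper's exact formulas may differ. The event IDs `E1` to `E3`, the toy vectors, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical 2-D vectors for three log events. In LogEvent2vec these would
# come from word2vec trained on log event sequences instead of words.
EVENT_VECTORS = {
    "E1": [1.0, 0.0],
    "E2": [0.0, 1.0],
    "E3": [1.0, 1.0],
}

def bary(sequence, vectors):
    """Barycenter (assumed meaning of 'bary'): mean of the event vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for event in sequence:
        for i, x in enumerate(vectors[event]):
            total[i] += x
    return [x / len(sequence) for x in total]

def tfidf_weights(sequences):
    """Per-sequence tf-idf weight of each event:
    tf = count / sequence length, idf = log(N / sequences containing event)."""
    n = len(sequences)
    doc_freq = Counter(e for seq in sequences for e in set(seq))
    weights = []
    for seq in sequences:
        counts = Counter(seq)
        weights.append({e: (c / len(seq)) * math.log(n / doc_freq[e])
                        for e, c in counts.items()})
    return weights

def tfidf_vector(weights, vectors):
    """Weighted mean of event vectors; zero vector if all weights are 0."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for event, w in weights.items():
        for i, x in enumerate(vectors[event]):
            total[i] += w * x
    norm = sum(weights.values())
    return [x / norm for x in total] if norm > 0 else total
```

For example, with the sequences `[["E1", "E1", "E2"], ["E2", "E3"]]`, `bary` maps the first sequence to `[2/3, 1/3]`, whereas tf-idf gives `E2` zero weight because it occurs in every sequence, so the first sequence maps to the vector of `E1`. The resulting sequence vectors are what a classifier such as Random Forest or Naive Bayes would then consume.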

Star Fruit Classification Using Naïve Bayes Based on RGB Color Features

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.17838 ◽

2017 ◽

Vol 11 (1) ◽

pp. 99 ◽

Cited By ~ 1

Author(s):

Fuzy Yustika Manik ◽

Kana Saputra Saragih

Keyword(s):

Feature Extraction ◽

Large Scale ◽

Naive Bayes ◽

Color Image ◽

Image Data ◽

Naïve Bayes ◽

Training Data ◽

Color Analysis ◽

Star Fruit ◽

Processing Techniques

Sorting is a key post-harvest issue for star fruit produced on a large or industrial scale. Currently, star fruit is classified by visually inspecting rind color with the human eye, a method that is neither effective nor efficient. This research aims to classify the sweetness level of star fruit using image processing techniques. The extracted features are the Red, Green, and Blue (RGB) values, which characterize the color image. These features are then used to classify the star fruit with the Naïve Bayes method. The dataset consists of 120 star fruit images: 90 for training and 30 for testing. Classification using RGB features achieved an accuracy of 80%. RGB color features alone are therefore not sufficient to fully characterize star fruit images.
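The classification step can be sketched as a Gaussian Naïve Bayes over a three-value RGB feature per image. This assumes each image is summarized by its mean R, G, and B values and that each feature follows a per-class normal distribution; the abstract does not state which Naïve Bayes variant or RGB summary the authors used. The sample values and the class labels `"sweet"` and `"sour"` are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(features, labels):
    """Estimate class priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A tiny epsilon keeps the variance positive for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[y] = (n / len(features), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest Gaussian log posterior."""
    def log_posterior(y):
        prior, means, variances = model[y]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)
```

Usage with made-up mean-RGB values: `fit_gaussian_nb([(200, 180, 40), (210, 190, 50), (120, 170, 60), (110, 160, 55)], ["sweet", "sweet", "sour", "sour"])` yields a model that labels `(205, 185, 45)` as `"sweet"`. Working in log space avoids underflow when many small likelihoods are multiplied.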

Twitter Sentiment Analysis towards COVID-19 Vaccines in the Philippines Using Naïve Bayes

Information ◽

10.3390/info12050204 ◽

2021 ◽

Vol 12 (5) ◽

pp. 204

Author(s):

Charlyn Villavicencio ◽

Julio Jerison Macrohon ◽

X. Alphonse Inbaraj ◽

Jyh-Horng Jeng ◽

Jer-Guang Hsieh

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Data Science ◽

Naive Bayes ◽

The Philippines ◽

Naïve Bayes ◽

Social Networking Site ◽

Bayes Model ◽

The Government ◽

Processing Techniques

A year into the COVID-19 pandemic and one of the longest recorded lockdowns in the world, the Philippines received its first delivery of COVID-19 vaccines on 1 March 2021 through WHO’s COVAX initiative. A month into the inoculation of all frontline health professionals and other priority groups, the authors of this study gathered data from the social networking site Twitter on Filipinos’ sentiment toward the Philippine government’s efforts. Natural language processing techniques were applied to understand the general sentiment, which can help the government analyze its response. The tweets were annotated by sentiment and used to train a Naïve Bayes model, built with the RapidMiner data science software, that classifies English and Filipino tweets into positive, neutral, and negative polarities. The model achieved 81.77% accuracy, higher than that reported by recent sentiment analysis studies using Twitter data from the Philippines.
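The authors worked in RapidMiner; the following is not their workflow but a self-contained sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing over the same three polarities. Whitespace tokenization, the smoothing constant `alpha`, the class name `SimpleMultinomialNB`, and the toy tweets are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

class SimpleMultinomialNB:
    """Bag-of-words multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant added to every word count

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on six toy tweets (two per polarity), the classifier labels "thank you great" as positive and "bad delay" as negative. A real pipeline would add proper tokenization for mixed English and Filipino text, which this sketch does not attempt.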

Concrete Crack Detection Based on Well-Known Feature Extractor Model and the YOLO_v2 Network

Applied Sciences ◽

10.3390/app11020813 ◽

2021 ◽

Vol 11 (2) ◽

pp. 813

Author(s):

Shuai Teng ◽

Zongchao Liu ◽

Gongfa Chen ◽

Li Cheng

Keyword(s):

Feature Extraction ◽

Crack Detection ◽

Computational Cost ◽

Concrete Structures ◽

Detection Algorithm ◽

Computational Time ◽

Image Size ◽

Important Indicator ◽

Feature Extractor ◽

Model Training

This paper compares the crack detection performance (in terms of precision and computational cost) of YOLO_v2 using 11 feature extractors, providing a basis for fast and accurate crack detection on concrete structures. Cracks on concrete structures are an important indicator of their durability and safety, and real-time crack detection is an essential task in structural maintenance. Object detection algorithms, especially the YOLO series of networks, have significant potential for crack detection, and the feature extractor is the most important component of YOLO_v2. Hence, this paper employs 11 well-known CNN models as the feature extractor of YOLO_v2 for crack detection. The results confirm that different feature extractor models lead to different detection results: the AP values are 0.89, 0, and 0 for ‘resnet18’, ‘alexnet’, and ‘vgg16’, respectively, while ‘googlenet’ (AP = 0.84) and ‘mobilenetv2’ (AP = 0.87) also achieve comparable AP values. In terms of computing speed, ‘alexnet’ requires the least computational time, followed by ‘squeezenet’ and ‘resnet18’; therefore, ‘resnet18’ is the best feature extractor model in terms of both precision and computational cost. Additionally, a parametric study of the training epoch, feature extraction layer, and testing image size shows that these parameters do affect the detection results. The study demonstrates that YOLO_v2 can achieve excellent crack detection results when an appropriate feature extractor model, training epoch, feature extraction layer, and testing image size are chosen.
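Average precision (AP), the metric quoted above, summarizes a detector's precision-recall curve. The sketch below computes all-point interpolated AP (the PASCAL VOC 2010+ style) from detections that are assumed to be already sorted by confidence and already matched to ground-truth cracks as true or false positives. The abstract does not say which AP variant or IoU matching threshold the authors used, so both are assumptions here, and `average_precision` is a hypothetical helper name.

```python
def average_precision(is_true_positive, num_ground_truth):
    """All-point interpolated AP.

    is_true_positive: bools for detections sorted by descending confidence
        (True = the detection matched a not-yet-matched ground-truth crack).
    num_ground_truth: total number of ground-truth objects.
    """
    if num_ground_truth == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated curve: precision times each recall step.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For four ground-truth cracks and detections ranked as hit, hit, miss, hit, this returns 0.6875. Under this definition, a detector whose detections never match a ground-truth object scores an AP of 0.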

Analytical Approach on Indian Classical Raga Measures by Feature Extraction with EM and Naive Bayes

International Journal of Computer Applications ◽

10.5120/18759-0035 ◽

2014 ◽

Vol 107 (6) ◽

pp. 41-46 ◽

Cited By ~ 1

Author(s):

Akhilesh K. Sharma ◽

Avinash Panwar ◽

Prasun Chakrabarti

Keyword(s):

Feature Extraction ◽

Analytical Approach ◽

Naive Bayes ◽

Naïve Bayes

An Improved Naive Bayes Classifier for Large Scale Text

Proceedings of the 2018 2nd International Conference on Artificial Intelligence: Technologies and Applications (ICAITA 2018) ◽

10.2991/icaita-18.2018.9 ◽

2018 ◽

Cited By ~ 1

Author(s):

Huaixin Chen ◽

Daocai Fu

Keyword(s):

Large Scale ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Locally-Adaptive Naïve Bayes Framework Design via Density-Based Clustering for Large Scale Datasets

Handbook of Research on Machine Learning Techniques for Pattern Recognition and Information Security - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-3299-7.ch016 ◽

2021 ◽

pp. 278-292

Author(s):

Faruk Bulut

Keyword(s):

Learning Style ◽

Large Scale ◽

Naive Bayes ◽

Empirical Studies ◽

Classification Performance ◽

Naïve Bayes ◽

Test Point ◽

Framework Design ◽

Locally Adaptive ◽

Generalized Framework

In this chapter, the local conditional probabilities of a query point are used for classification instead of a generalized framework with a single conditional probability estimate. In the proposed locally adaptive naïve Bayes (LANB) learning style, a certain number of local instances close to the test point are used to construct an adaptive probability estimate. In empirical studies on 53 benchmark UCI datasets, LANB achieved more accurate classification, with an overall 8.2% increase in classification accuracy compared with the conventional naïve Bayes model. According to paired t-test comparisons, LANB outperformed the conventional model with 31 wins, 14 ties, and 8 losses across all the UCI datasets.
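The core idea, estimating the naïve Bayes probabilities from instances near the query rather than from the whole training set, can be sketched as below. This is an assumption-laden illustration, not the chapter's method: neighbors are selected with plain Euclidean k-nearest neighbors (the chapter's title refers to density-based clustering, which is not reproduced here), features are modeled as Gaussians, and `k`, the data, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def gaussian_nb_predict(rows, labels, query, eps=1e-6):
    """Fit a Gaussian naive Bayes on (rows, labels) and classify query."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    best, best_score = None, -math.inf
    for y, members in by_class.items():
        n = len(members)
        score = math.log(n / len(rows))  # local class prior
        for j, v in enumerate(query):
            col = [m[j] for m in members]
            mean = sum(col) / n
            var = sum((c - mean) ** 2 for c in col) / n + eps  # eps avoids 0
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = y, score
    return best

def locally_adaptive_nb(train_x, train_y, query, k=5):
    """Classify query with a naive Bayes model fitted only on its k
    nearest training instances (Euclidean distance, an assumption)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return gaussian_nb_predict([train_x[i] for i in nearest],
                               [train_y[i] for i in nearest], query)
```

Because the model is refitted for every query, prediction costs a neighbor search plus a small naive Bayes fit each time; this is the price paid for probability estimates that adapt to the local region around the test point.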

Classification of EEG Signal for Detecting Cybersickness through Time Domain Feature Extraction using Naïve Bayes

2018 International Conference on Computer Engineering, Network and Intelligent Multimedia (CENIM) ◽

10.1109/cenim.2018.8711320 ◽

2018 ◽

Cited By ~ 3

Author(s):

Moch. Asyroful Mawalid ◽

Alfi Zuhriya Khoirunnisa ◽

Mauridhi Hery Purnomo ◽

Adhi Dharma Wibawa

Keyword(s):

Feature Extraction ◽

Time Domain ◽

Naive Bayes ◽

Naïve Bayes ◽

EEG Signal

Feature Extraction and Classification of Proteomics Data Using Stationary Wavelet Transform and Naive Bayes Classifier

2010 4th International Conference on Bioinformatics and Biomedical Engineering ◽

10.1109/icbbe.2010.5516610 ◽

2010 ◽

Author(s):

Dan Liu ◽

Yuan-yuan Huang ◽

Chen-xiang Ma

Keyword(s):

Feature Extraction ◽

Wavelet Transform ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Stationary Wavelet Transform ◽

Proteomics Data ◽

Naïve Bayes Classifier

Opinion mining feature-level using Naive Bayes and feature extraction based analysis dependencies

10.1063/1.4936448 ◽

2015 ◽

Author(s):

Regi Sanda ◽

Z. K. Abdurahman Baizal ◽

Fhira Nhita

Keyword(s):

Feature Extraction ◽

Opinion Mining ◽

Naive Bayes ◽

Naïve Bayes

Automatic Keyword Annotation System Using Newspapers

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2014.p0340 ◽

2014 ◽

Vol 18 (3) ◽

pp. 340-346 ◽

Cited By ~ 1

Author(s):

Tomoki Takada ◽

◽

Mizuki Arai ◽

Tomohiro Takagi

Keyword(s):

Information Retrieval ◽

Language Processing ◽

High Speed ◽

Naive Bayes ◽

High Accuracy ◽

Naïve Bayes ◽

Annotation System ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Index Terms

Nowadays, an increasingly large amount of information exists on the web, and finding the necessary information quickly is becoming increasingly difficult for users. Information retrieval systems like Google and recommendation systems like that on Amazon are used to solve this problem. In this paper, we focus on information retrieval systems. These systems require index terms, which affect the precision of retrieval. Index terms are generally decided by one of two methods. One analyzes a text using natural language processing and decides index terms using various statistics. In the other, a person chooses document keywords as index terms. However, the latter method requires too much time and effort and becomes more impractical as information grows. Therefore, we propose the Nikkei annotator system, which is based on a model of the human brain, learns patterns from past keyword annotations, and automatically outputs the keywords that users prefer. The proposed method aims to automate manual keyword annotation and to achieve fast, highly accurate keyword annotation. Experimental results showed that the proposed method is more accurate than TFIDF and Naive Bayes in P@5 and P@10, and that it could annotate about 19 times faster than Naive Bayes.
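TFIDF, one of the baselines above, and the P@k metric used in the comparison can be sketched as follows. This assumes whitespace tokenization and plain tf × log(N/df) weighting; the Nikkei annotator itself is not reproduced, and the articles, keyword sets, and function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(documents, doc_index, top_n):
    """Rank the words of one document by tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(w for doc in tokenized for w in set(doc))
    counts = Counter(tokenized[doc_index])
    length = len(tokenized[doc_index])
    scores = {w: (c / length) * math.log(n_docs / doc_freq[w])
              for w, c in counts.items()}
    # Descending score; ties broken alphabetically so the order is stable.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]

def precision_at_k(predicted, relevant, k):
    """P@k: fraction of the top-k predicted keywords that are relevant."""
    return sum(1 for w in predicted[:k] if w in relevant) / k
```

For the toy corpus `["yen falls as bank cuts rates", "bank profits rise", "rates steady as yen holds"]`, the top two TFIDF keywords of the first article are `["cuts", "falls"]`, because they appear in no other article. If the human-chosen keywords were `{"cuts", "rates"}`, P@2 would be 0.5.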