Ensemble-Based Classification Using Neural Networks and Machine Learning Models for Windows PE Malware Detection

Robertas Damaševičius; Algimantas Venčkauskas; Jevgenijus Toldinas; Šarūnas Grigaliūnas

doi:10.3390/electronics10040485

Ensemble-Based Classification Using Neural Networks and Machine Learning Models for Windows PE Malware Detection

Electronics ◽

10.3390/electronics10040485 ◽

2021 ◽

Vol 10 (4) ◽

pp. 485 ◽

Cited By ~ 1

Author(s):

Robertas Damaševičius ◽

Algimantas Venčkauskas ◽

Jevgenijus Toldinas ◽

Šarūnas Grigaliūnas

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Information Systems ◽

Malware Detection ◽

Ensemble Classification ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Neural Net ◽

Linear Discriminant ◽

Stage Classification

Information security is among the greatest challenges facing organizations and institutions. Cybercrime has risen in frequency and magnitude in recent years, with new ways to steal, change and destroy information or disable information systems appearing every day. Malware is one of the ways attackers penetrate information systems that process confidential information. An attacker injects malware into a computer system and thereby gains full or partial access to critical information held in it. This paper proposes an ensemble classification-based methodology for malware detection. The first-stage classification is performed by a stacked ensemble of dense (fully connected) and convolutional neural networks (CNN), while the final-stage classification is performed by a meta-learner. For the meta-learner, we explore and compare 14 classifiers. For a baseline comparison, 13 machine learning methods are used: K-Nearest Neighbors, Linear Support Vector Machine (SVM), Radial basis function (RBF) SVM, Random Forest, AdaBoost, Decision Tree, ExtraTrees, Linear Discriminant Analysis, Logistic, Neural Net, Passive Classifier, Ridge Classifier and Stochastic Gradient Descent classifier. We present the results of experiments performed on the Classification of Malware with PE headers (ClaMP) dataset. The best performance is achieved by an ensemble of five dense and convolutional neural networks, with the ExtraTrees classifier as the meta-learner.
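
The two-stage idea described in this abstract (base networks whose outputs feed a meta-learner) can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the authors' implementation: the data is synthetic rather than ClaMP, the first stage uses three small `MLPClassifier` networks instead of the paper's dense and convolutional networks, and all layer sizes and hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for PE-header features (assumption: NOT the ClaMP dataset).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# First stage: a few small dense networks with hypothetical layer sizes.
base_learners = [
    (f"mlp_{i}", MLPClassifier(hidden_layer_sizes=sizes, max_iter=500, random_state=i))
    for i, sizes in enumerate([(32,), (64, 32), (16, 16)])
]

# Final stage: ExtraTrees is trained on the cross-validated class
# probabilities produced by the first-stage networks.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=ExtraTreesClassifier(n_estimators=100, random_state=0),
    stack_method="predict_proba",
    cv=3,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Swapping `final_estimator` for another classifier is how one would compare candidate meta-learners in a setup like this.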

Introduction and Implementation of Machine Learning Algorithms in R

Advances in Business Information Systems and Analytics - Sentiment Analysis and Knowledge Discovery in Contemporary Business ◽

10.4018/978-1-5225-4999-4.ch008 ◽

2019 ◽

pp. 126-147

Author(s):

S. R. Mani Sekhar ◽

G. M. Siddesh

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Support Vector Machine ◽

Discriminant Analysis ◽

Computer Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Linear Discriminant ◽

The Given

Machine learning is an important area of computer science. It helps provide optimized solutions to real-world problems by using past knowledge or data from previous experience. Many types of machine learning algorithms exist. This chapter provides an overview of selected machine learning algorithms: linear regression, linear discriminant analysis, support vector machine, naive Bayes classifier, neural networks, and decision trees. Each method is illustrated in detail with an example and R code, which helps readers develop their own solutions to given problems.

Detection of Online Fake News Using Blending Ensemble Learning

Scientific Programming ◽

10.1155/2021/3434458 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Arvin Hansrajh ◽

Timothy T. Adeliyi ◽

Jeanette Wing

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Fake News ◽

Learning Models ◽

Linear Discriminant ◽

Proposed Model ◽

Machine Learning Models

The exponential growth in fake news and its inherent threat to democracy, public trust, and justice has escalated the necessity for fake news detection and mitigation. Detecting fake news is a complex challenge because it is intentionally written to mislead and hoodwink. Humans are not good at identifying fake news: they are reported to detect it at a rate of 54%, with an additional 4% reported in the literature as speculative. The significance of fighting fake news is exemplified by the present pandemic. Consequently, social networks are ramping up the use of detection tools and educating the public in recognising fake news. The literature shows that several machine learning algorithms have been applied to the detection of fake news with limited and mixed success. However, several advanced machine learning models are not being applied, although recent studies demonstrate the efficacy of the ensemble machine learning approach; hence, the purpose of this study is to assist in the automated detection of fake news. An ensemble approach is adopted to help close the identified gap. This study proposes a blended machine learning ensemble model developed from logistic regression, support vector machine, linear discriminant analysis, stochastic gradient descent, and ridge regression, which is then used on a publicly available dataset to predict whether a news report is true or not. The proposed model is compared with popular classical machine learning models, using performance metrics such as AUC, ROC, recall, accuracy, precision, and F1-score. The results show that the proposed model outperformed the other popular classical machine learning models.
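
A blending ensemble of the kind named in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the study's model: the data is synthetic, `RidgeClassifier` stands in for "ridge regression", `LinearSVC` stands in for the support vector machine, and the logistic-regression meta-model is a choice made here because the abstract does not name one. Unlike cross-validated stacking, blending fits the meta-model on a separate hold-out split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data (assumption: stand-in for vectorised news articles).
X, y = make_classification(
    n_samples=900, n_features=20, n_informative=8, random_state=1
)

# Three disjoint splits: fit base models, fit the blender, evaluate.
X_fit, X_rest, y_fit, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=1, stratify=y
)
X_blend, X_test, y_blend, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=1, stratify=y_rest
)

base_models = [
    LogisticRegression(max_iter=1000),
    LinearSVC(),
    LinearDiscriminantAnalysis(),
    SGDClassifier(random_state=1),
    RidgeClassifier(),
]
for model in base_models:
    model.fit(X_fit, y_fit)


def base_scores(features):
    """Stack each base model's decision scores into one column per model."""
    return np.column_stack([m.decision_function(features) for m in base_models])


# The meta-model only ever sees base-model outputs on data the base models
# were not trained on.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(base_scores(X_blend), y_blend)

blend_accuracy = meta_model.score(base_scores(X_test), y_test)
print(f"blended accuracy on synthetic data: {blend_accuracy:.3f}")
```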

Breast Tumor Classification Using an Ensemble Machine Learning Method

Journal of Imaging ◽

10.3390/jimaging6060039 ◽

2020 ◽

Vol 6 (6) ◽

pp. 39 ◽

Cited By ~ 1

Author(s):

Adel S. Assiri ◽

Saima Nazir ◽

Sergio A. Velastin

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

State Of The Art ◽

Majority Voting ◽

Ensemble Classification ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Breast Cancer Dataset ◽

Machine Learning Classification ◽

Voting Mechanism

Breast cancer is the most common cause of death for women worldwide. Thus, the ability of artificial intelligence systems to detect possible breast cancer is very important. In this paper, an ensemble classification mechanism is proposed based on a majority voting mechanism. First, the performance of different state-of-the-art machine learning classification algorithms was evaluated on the Wisconsin Breast Cancer Dataset (WBCD). The three best classifiers were then selected based on their F3 score, which is used to emphasize the importance of false negatives (recall) in breast cancer classification. These three classifiers (simple logistic regression learning, support vector machine learning with stochastic gradient descent optimization, and a multilayer perceptron network) are then used for ensemble classification with a voting mechanism. We also evaluated the performance of hard and soft voting mechanisms. For hard voting, a majority-based voting mechanism was used; for soft voting, we used voting methods based on the average, product, maximum and minimum of probabilities. The hard voting (majority-based voting) mechanism shows better performance, with 99.42%, than the state-of-the-art algorithm for WBCD.
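
The F3 score and the voting rules mentioned in this abstract can be illustrated in a few lines of plain Python. The sketch assumes the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), and uses made-up probabilities from three hypothetical classifiers; none of the numbers come from the paper.

```python
from math import prod
from statistics import mean


def f_beta(precision, recall, beta=3.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def hard_vote(labels):
    """Majority vote over the class labels predicted by each classifier."""
    return max(set(labels), key=labels.count)


def soft_vote(probas, rule="average"):
    """Combine per-classifier class probabilities and return the winning class."""
    combine = {"average": mean, "product": prod, "max": max, "min": min}[rule]
    n_classes = len(probas[0])
    scores = [combine([p[c] for p in probas]) for c in range(n_classes)]
    return scores.index(max(scores))


# Hypothetical [P(class 0), P(class 1)] outputs of three classifiers for one sample.
probas = [[0.60, 0.40], [0.55, 0.45], [0.10, 0.90]]
labels = [p.index(max(p)) for p in probas]  # each classifier's own decision

print("hard vote:", hard_vote(labels))
for rule in ("average", "product", "max", "min"):
    print(f"soft vote ({rule}):", soft_vote(probas, rule))

# With beta = 3, high recall is rewarded far more than high precision.
print("F3 at P=0.5, R=1.0:", round(f_beta(0.5, 1.0), 4))
print("F3 at P=1.0, R=0.5:", round(f_beta(1.0, 0.5), 4))
```

In this toy case two lukewarm votes for class 0 win the hard vote, while one confident vote for class 1 wins under every soft rule, which is why the two mechanisms can disagree.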

Windows PE Malware Detection Using Ensemble Learning

Informatics ◽

10.3390/informatics8010010 ◽

2021 ◽

Vol 8 (1) ◽

pp. 10

Author(s):

Nureni Ayofe Azeez ◽

Oluwanifise Ebunoluwa Odufuwa ◽

Sanjay Misra ◽

Jonathan Oluranti ◽

Robertas Damaševičius

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Ensemble Learning ◽

Learning Algorithm ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Internet Age ◽

Stage Classification ◽

End Stage

In this Internet age, users face a growing number of threats to their security and safety every day. One such threat is malicious software, otherwise known as malware (ransomware, Trojans, viruses, etc.). Malware can lead to the loss or malicious replacement of important information (such as bank account details). Malware creators have been able to bypass traditional methods of malware detection, which can be time-consuming and unreliable for unknown malware. This motivates the need for intelligent ways to detect malware, especially new malware that has not been evaluated or studied before. Machine learning provides an intelligent way to detect malware and comprises two stages: feature extraction and classification. This study suggests an ensemble learning-based method for malware detection. The base-stage classification is done by a stacked ensemble of fully-connected and one-dimensional convolutional neural networks (CNNs), whereas the end-stage classification is done by a machine learning algorithm. For the meta-learner, we analyzed and compared 15 machine learning classifiers. For comparison, five machine learning algorithms were used: naïve Bayes, decision tree, random forest, gradient boosting, and AdaBoosting. The results of experiments made on the Windows Portable Executable (PE) malware dataset are presented. The best results were obtained by an ensemble of seven neural networks and the ExtraTrees classifier as the final-stage classifier.

Introduction and Implementation of Machine Learning Algorithms in R

Research Anthology on Artificial Intelligence Applications in Security ◽

10.4018/978-1-7998-7705-9.ch004 ◽

2021 ◽

pp. 67-88

Author(s):

S. R. Mani Sekhar ◽

G. M. Siddesh

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Support Vector Machine ◽

Discriminant Analysis ◽

Computer Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Linear Discriminant ◽

The Given

An Optimum Tea Fermentation Detection Model Based on Deep Convolutional Neural Networks

Data ◽

10.3390/data5020044 ◽

2020 ◽

Vol 5 (2) ◽

pp. 44

Author(s):

Gibson Kimutai ◽

Alexander Ngenzi ◽

Rutabayiro Ngoga Said ◽

Ambrose Kiprop ◽

Anna Förster

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Deep Convolutional Neural Networks ◽

Linear Discriminant ◽

Learning Techniques ◽

Classification Tasks

Tea is one of the most popular beverages in the world, and its processing involves a number of steps, including fermentation. Tea fermentation is the most important step in determining the quality of tea. Currently, optimum fermentation of tea is detected by tasters using either of two methods: monitoring the change in the color of the tea as fermentation progresses, or tasting and smelling the tea as fermentation progresses. These manual methods are not accurate and consequently compromise the quality of tea. This study proposes a deep learning model dubbed TeaNet based on Convolutional Neural Networks (CNN). The input data to TeaNet are images from the tea fermentation and LabelMe datasets. We compared the performance of TeaNet with other standard machine learning techniques: Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), and Naive Bayes (NB). TeaNet was superior to the other machine learning techniques in the classification tasks. However, we will confirm the stability of TeaNet in the classification tasks in future studies when we deploy it in a tea factory in Kenya. The research also released a tea fermentation dataset that is available for use by the community.

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the number of cancer patients and deaths continues to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: To study the application of machine learning in predicting anticancer drug activity, this review selects several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), and lists examples of their application in anticancer drug design. Results: Machine learning contributes a lot to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of successful identification and prediction of anticancer drug activity are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Machine Learning for Sensorless Temperature Estimation of a BLDC Motor

Sensors ◽

10.3390/s21144655 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4655

Author(s):

Dariusz Czerwinski ◽

Jakub Gęca ◽

Krzysztof Kolano

Keyword(s):

Machine Learning ◽

Temperature Measurement ◽

Stochastic Gradient Descent ◽

Estimation Accuracy ◽

Coefficient Of Determination ◽

Percentage Error ◽

Support Vector ◽

Bldc Motor ◽

Temperature Estimation ◽

Motor Operation

In this article, the authors propose two models for BLDC motor winding temperature estimation using machine learning methods. For the purposes of the research, measurements were collected over more than 160 h of motor operation and then preprocessed. The algorithms of linear regression, ElasticNet, stochastic gradient descent regressor, support vector machines, decision trees, and AdaBoost were used for predictive modeling. The ability of the models to generalize was achieved by hyperparameter tuning with the use of cross-validation. The conducted research led to promising results for winding temperature estimation accuracy. In the case of sensorless temperature prediction (model 1), the mean absolute percentage error (MAPE) was below 4.5% and the coefficient of determination R² was above 0.909. In addition, extending the model with a temperature measurement on the casing (model 2) reduced the error to about 1% and increased R² to 0.990. The results obtained for the first proposed model show that overheating protection of the motor can be ensured without direct temperature measurement. In addition, the introduction of a simple casing temperature measurement system allows for an estimation with accuracy suitable for compensating the motor output torque changes related to temperature.
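
The two error measures quoted in this abstract can be computed directly from their standard definitions: MAPE = (100 / n) * sum(|y_i - yhat_i| / |y_i|) and R² = 1 - SS_res / SS_tot. The sketch below uses made-up temperatures purely for illustration; they are not measurements from the study.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Requires non-zero targets."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical winding temperatures in degrees Celsius (assumption: illustrative only).
measured = [40.0, 50.0, 60.0, 80.0]
estimated = [41.0, 49.0, 61.5, 78.0]

print(f"MAPE = {mape(measured, estimated):.3f} %")
print(f"R^2  = {r_squared(measured, estimated):.4f}")
```

Because MAPE divides by the true value, it is only meaningful when the targets stay away from zero, which holds for temperatures expressed on a scale where the operating range is well above zero.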

414 Deep Neural Networks: A Survey Tool for Obstructive Sleep Apnea Prediction

SLEEP ◽

10.1093/sleep/zsab072.413 ◽

2021 ◽

Vol 44 (Supplement_2) ◽

pp. A164-A164

Author(s):

Pahnwat Taweesedt ◽

JungYoon Kim ◽

Jaehyun Park ◽

Jangwoon Park ◽

Munish Sharma ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Obstructive Sleep Apnea ◽

Sleep Apnea ◽

Deep Neural Networks ◽

Support Vector ◽

Learning Models ◽

Obstructive Sleep ◽

Screening Questionnaires ◽

Machine Learning Models

Introduction: Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder estimated to affect one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive and not readily available in many parts of the world. Many screening questionnaires and scores with high sensitivity and low specificity have been proposed for OSA prediction. The present study is intended to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods: Subjects who underwent full-night polysomnography in Torr sleep center, Texas and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed using an Apnea-Hypopnea Index ≥ 5. We trained five machine learning models: Deep Neural Networks with scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), K-Nearest Neighbors classifier (KNC), and Support Vector Machine Classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurements, and snoring and sleepiness history, were obtained from the 5 OSA screening questionnaires/scores (STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score and No-Apnea score). Performance metrics were used to compare the machine learning models. Results: Of 180 subjects, 51.5% were male, and the mean (SD) age was 53.6 (15.1). One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score was 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58 and 0.58, respectively. DNN-PCA showed the highest AUROC, with sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion: Our results showed that DNN-PCA outperforms OSA screening questionnaires, scores and other machine learning models.
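
The screening metrics reported in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and chosen only to make the arithmetic easy to follow, not taken from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true cases that were caught
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical test-set counts (assumption: illustrative only).
metrics = screening_metrics(tp=40, fp=5, tn=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When positives dominate the test set, as in a sleep-clinic population, positive predictive value and F1 can look strong even when specificity is modest, so it helps to read all five numbers together.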

Classification of sleep apnea using EMD-based features and PSO-trained neural networks

Biomedical Engineering / Biomedizinische Technik ◽

10.1515/bmt-2021-0025 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Sajjad Afrakhteh ◽

Ahmad Ayatollahi ◽

Fatemeh Soltani

Keyword(s):

Neural Networks ◽

Sleep Apnea ◽

Back Propagation ◽

Pso Algorithm ◽

Bp Algorithm ◽

Support Vector ◽

Intrinsic Mode Functions ◽

Linear Discriminant ◽

Mode Decomposition ◽

Obstructive Sleep

In this study, we propose a method for detecting obstructive sleep apnea (OSA) based on features extracted with empirical mode decomposition (EMD) and neural networks trained by particle swarm optimization (PSO) in the classification phase. After the features were extracted from the intrinsic mode functions (IMF) of each heart rate variability (HRV) signal of each segment, they were applied to the input of popular classifiers such as multi-layer perceptron neural networks (MLPNN), Naïve Bayes, linear discriminant analysis (LDA), k-nearest neighbors (KNN), and support vector machines (SVM). The results show that the MLPNN trained with the back propagation (BP) algorithm has a diagnostic accuracy of less than 90%, which may be due to the derivative-based nature of the BP algorithm, which can cause it to become trapped in local minima. To improve the MLPNN’s performance, we used the PSO algorithm instead of the BP method for training. As a result, the MLPNN’s accuracy improved from 89.36% to 97.66%. The proposed method also reached 97.78% and 97.96% in sensitivity and specificity, respectively. It can therefore be concluded that the proposed method achieves better or comparable results when compared with previous works in this field.
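
Training a small perceptron with particle swarm optimization instead of back propagation can be sketched in plain Python. This is a generic global-best PSO under assumptions, not the authors' configuration: the XOR data, the 2-3-1 network, the swarm size, and the inertia and acceleration coefficients are all hypothetical. Each particle is one complete weight vector, and no gradients are computed.

```python
import math
import random

# Toy XOR data (assumption: stand-in for the HRV-derived features).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_HIDDEN = 3
# Layout of the weight vector: input weights, hidden biases, output weights, output bias.
N_WEIGHTS = 2 * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1


def forward(w, x):
    """2-3-1 perceptron with tanh hidden units and a sigmoid output."""
    out = w[-1]  # output bias
    for h in range(N_HIDDEN):
        pre = w[2 * h] * x[0] + w[2 * h + 1] * x[1] + w[2 * N_HIDDEN + h]
        out += w[3 * N_HIDDEN + h] * math.tanh(pre)
    out = max(-60.0, min(60.0, out))  # keep exp() within floating-point range
    return 1.0 / (1.0 + math.exp(-out))


def loss(w):
    """Mean squared error over the toy data set."""
    return sum((forward(w, x) - y) ** 2 for x, y in DATA) / len(DATA)


def pso_train(n_particles=20, n_iters=200, inertia=0.7, c_personal=1.5,
              c_global=1.5, seed=0):
    """Global-best PSO over the network's weight vector."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(n_particles)]
    vel = [[0.0] * N_WEIGHTS for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_loss.__getitem__)
    g_pos, g_loss = best_pos[g][:], best_loss[g]
    initial_loss = g_loss

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(N_WEIGHTS):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            current = loss(pos[i])
            if current < best_loss[i]:
                best_loss[i], best_pos[i] = current, pos[i][:]
                if current < g_loss:  # swarm best updates as soon as it is beaten
                    g_loss, g_pos = current, pos[i][:]
    return g_pos, initial_loss, g_loss


weights, initial_loss, final_loss = pso_train()
print(f"best loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Because the swarm only compares loss values, the same loop works for objectives that are not differentiable, at the cost of many more loss evaluations than a gradient step would need.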
