An Aggregated Mutual Information Based Feature Selection with Machine Learning Methods for Enhancing IoT Botnet Attack Detection

Mohammed Al-Sarem; Faisal Saeed; Eman H. Alkhammash; Norah Saleh Alghamdi

doi:10.3390/s22010185

Sensors ◽

10.3390/s22010185 ◽

2021 ◽

Vol 22 (1) ◽

pp. 185

Author(s):

Mohammed Al-Sarem ◽

Faisal Saeed ◽

Eman H. Alkhammash ◽

Norah Saleh Alghamdi

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Mutual Information ◽

Intrusion Detection ◽

Intrusion Detection Systems ◽

Computational Time ◽

Support Vector ◽

Learning Methods ◽

Detection Systems ◽

Machine Learning Methods

Due to the wide availability and usage of connected devices in Internet of Things (IoT) networks, the number of attacks on these networks is continually increasing. A particularly serious and dangerous type of attack in the IoT environment is the botnet attack, in which attackers take control of IoT systems to build enormous networks of “bot” devices that carry out malicious activities. To detect this type of attack, several Intrusion Detection Systems (IDSs) based on machine learning and deep learning methods have been proposed for IoT networks. Because IoT devices typically have limited battery power and processing capacity, maximizing the efficiency of intrusion detection systems for IoT networks remains a research challenge. Detection methods must be both effective and efficient, achieving high detection rates at low computational cost. This paper proposes an aggregated mutual information-based feature selection approach with machine learning methods to enhance the detection of IoT botnet attacks. In this study, the N-BaIoT benchmark dataset, which contains real traffic data gathered from nine commercial IoT devices, was used to detect botnet attack types. The dataset supports both binary and multi-class classification. The feature selection method incorporates the Mutual Information (MI) technique, Principal Component Analysis (PCA) and the ANOVA F-test at a fine-grained detection level to select the features relevant for improving the performance of IoT botnet classifiers. In the classification step, several ensemble and individual classifiers were used, including Random Forest (RF), XGBoost (XGB), Gaussian Naïve Bayes (GNB), k-Nearest Neighbor (k-NN), Logistic Regression (LR) and Support Vector Machine (SVM). The experimental results showed the efficiency and effectiveness of the proposed approach, which outperformed other techniques on various evaluation metrics.
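The abstract does not say how the MI, PCA and ANOVA scores are aggregated. The following Python sketch assumes one simple scheme, rank averaging. Each feature is ranked separately by mutual information and by the ANOVA F statistic, and the features with the best mean rank are kept. PCA is omitted. The function names (`mutual_information`, `anova_f`, `aggregate_select`) are hypothetical. Mutual information is computed here on discrete feature values only, so continuous N-BaIoT statistics would first need binning.

```python
import math
from collections import Counter, defaultdict


def mutual_information(x, y):
    """Mutual information (in nats) between a discrete feature x and labels y."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # I(X;Y) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum((c / n) * math.log(c * n / (px[xv] * py[yv]))
               for (xv, yv), c in joint.items())


def anova_f(x, y):
    """One-way ANOVA F statistic of a numeric feature x grouped by label y."""
    groups = defaultdict(list)
    for xv, yv in zip(x, y):
        groups[yv].append(xv)
    n, k = len(x), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(x) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        # Zero spread inside every class: treat as maximally discriminative.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def ranks(scores):
    """Rank 1 = highest score; ties keep the original feature order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def aggregate_select(columns, y, k):
    """Return indices of the k features with the best mean MI/ANOVA rank."""
    mi_ranks = ranks([mutual_information(col, y) for col in columns])
    f_ranks = ranks([anova_f(col, y) for col in columns])
    mean_rank = [(a + b) / 2 for a, b in zip(mi_ranks, f_ranks)]
    return sorted(range(len(columns)), key=lambda i: mean_rank[i])[:k]


# Toy example (illustrative values, not N-BaIoT data): label 0 = benign, 1 = botnet.
labels = [0, 0, 0, 1, 1, 1]
features = [
    [1, 1, 1, 5, 5, 5],  # separates the classes perfectly
    [2, 3, 2, 3, 2, 3],  # mostly noise
    [1, 1, 2, 4, 5, 5],  # informative but with some spread inside each class
]
print(aggregate_select(features, labels, k=2))  # [0, 2]
```

Rank averaging is used here because MI (in nats) and F statistics live on very different scales, so averaging the raw scores would let one criterion dominate.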

A Review of Intrusion Detection Systems: Datasets and machine learning methods

10.1145/3454127.3456576 ◽

2021 ◽

Author(s):

Aouatif ARQANE ◽

Omar Boutkhoum ◽

Hicham Boukhriss ◽

Abdelmajid El Moutaouakkil

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection Systems ◽

Learning Methods ◽

Detection Systems ◽

Machine Learning Methods

Modeling Realistic Adversarial Attacks against Network Intrusion Detection Systems

Digital Threats: Research and Practice ◽

10.1145/3469659 ◽

2021 ◽

Author(s):

Giovanni Apruzzese ◽

Mauro Andreolini ◽

Luca Ferretti ◽

Mirco Marchetti ◽

Michele Colajanni

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Machine Learning Algorithms ◽

Intrusion Detection Systems ◽

Network Intrusion Detection ◽

Learning Methods ◽

Detection Systems ◽

Machine Learning Methods ◽

Network Intrusion ◽

Network Intrusion Detection Systems

The growing use of machine learning algorithms to support cybersecurity is creating novel defensive opportunities but also new types of risk. Multiple studies have shown that machine learning methods are vulnerable to adversarial attacks, which introduce tiny perturbations aimed at reducing the effectiveness of threat detection. We observe that the existing literature assumes threat models that are inappropriate for realistic cybersecurity scenarios, because they consider opponents who have complete knowledge of the cyber detector or who can freely interact with the target systems. Focusing on Network Intrusion Detection Systems based on machine learning methods, we identify and model the real capabilities and circumstances that an attacker needs in order to carry out a feasible and successful adversarial attack. We then apply our model to several adversarial attacks proposed in the literature and highlight the limits and merits they would exhibit as actual adversarial attacks. The contributions of this paper can help harden defensive systems by letting cyber defenders address the most critical and realistic issues, and can benefit researchers by allowing them to devise novel forms of adversarial attacks based on realistic threat models.

Performance Analysis of Intrusion Detection Systems Using a Feature Selection Method on the UNSW-NB15 Dataset

Journal Of Big Data ◽

10.1186/s40537-020-00379-6 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Sydney M. Kasongo ◽

Yanxia Sun

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Computer Networks ◽

Feature Selection Method ◽

Selection Method ◽

Feature Reduction ◽

Intrusion Detection Systems ◽

Support Vector ◽

Test Accuracy ◽

Detection Systems

Computer network intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are critical to the success of an organization. Over the past years, IDSs and IPSs using different approaches have been developed and implemented to ensure that computer networks within enterprises are secure, reliable and available. In this paper, we focus on IDSs that are built using machine learning (ML) techniques. ML-based IDSs are effective and accurate in detecting network attacks. However, their performance decreases in high-dimensional data spaces. Therefore, it is crucial to implement an appropriate feature extraction method that can prune features that have little impact on the classification process. Moreover, many ML-based IDSs suffer from an increased false positive rate and low detection accuracy when the models are trained on highly imbalanced datasets. In this paper, we present an analysis of the UNSW-NB15 intrusion detection dataset, which is used for training and testing our models. We then apply a filter-based feature reduction technique using the XGBoost algorithm and implement the following ML approaches on the reduced feature space: Support Vector Machine (SVM), k-Nearest-Neighbour (kNN), Logistic Regression (LR), Artificial Neural Network (ANN) and Decision Tree (DT). In our experiments, we considered both binary and multiclass classification configurations. The results demonstrated that the XGBoost-based feature selection method allows methods such as DT to improve their test accuracy. For example, DT's test accuracy increased from 88.13% to 90.85% in the binary classification scheme.
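One common way to use XGBoost as a filter is to rank features by the importance scores of a fitted model and keep the most important ones. The abstract does not give the paper's exact selection rule, so this Python sketch assumes a cumulative-importance cut-off. Features are kept in order of importance until their normalized importances reach a chosen coverage. In practice the scores could come from the `feature_importances_` attribute of a fitted `xgboost.XGBClassifier`. Here they are passed in as plain numbers. The function name and the 0.85 coverage are hypothetical choices.

```python
def select_by_cumulative_importance(names, importances, coverage=0.95):
    """Keep the most important features until their share of total importance reaches coverage."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    # Sort features from most to least important.
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, importance in ranked:
        selected.append(name)
        cumulative += importance / total
        if cumulative >= coverage:
            break
    return selected


# Illustrative importance scores for a few UNSW-NB15 feature names (not real results).
names = ["dur", "sbytes", "dbytes", "sttl", "proto"]
scores = [0.05, 0.30, 0.10, 0.50, 0.05]
print(select_by_cumulative_importance(names, scores, coverage=0.85))
# ['sttl', 'sbytes', 'dbytes']
```

A fixed top-k or an absolute importance threshold are equally plausible alternatives. The cumulative rule simply adapts the number of kept features to how concentrated the importances are.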

Modified Mutual Information-based Feature Selection for Intrusion Detection Systems in Decision Tree Learning

Journal of Computers ◽

10.4304/jcp.9.7.1542-1546 ◽

2014 ◽

Vol 9 (7) ◽

Cited By ~ 2

Author(s):

Jingping Song ◽

Zhiliang Zhu ◽

Peter Scully ◽

Chris Price

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection Systems ◽

Decision Tree Learning ◽

Detection Systems ◽

Selection For

Mutual information-based feature selection for intrusion detection systems

Journal of Network and Computer Applications ◽

10.1016/j.jnca.2011.01.002 ◽

2011 ◽

Vol 34 (4) ◽

pp. 1184-1199 ◽

Cited By ~ 183

Author(s):

Fatemeh Amiri ◽

MohammadMahdi Rezaei Yousefi ◽

Caro Lucas ◽

Azadeh Shakery ◽

Nasser Yazdani

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Intrusion Detection ◽

Intrusion Detection Systems ◽

Detection Systems ◽

Selection For

ANOMALY DETECTION USING MACHINE LEARNING APPROACHES

Azerbaijan Journal of High Performance Computing ◽

10.32010/26166127.2020.3.2.196.206 ◽

2020 ◽

Vol 3 (2) ◽

pp. 196-206

Author(s):

Mausumi Das Nath ◽

Tapalina Bhattasali

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Vital Role ◽

Machine Learning Algorithms ◽

Intrusion Detection Systems ◽

Abnormal Behavior ◽

Support Vector ◽

Learning Approaches ◽

Detection Systems ◽

Internet Users

Due to the enormous usage of the Internet, users share resources and exchange voluminous amounts of data. This increases the risk of data theft and other types of attacks. Network security plays a vital role in protecting the electronic exchange of data and in preventing financial losses or service disruptions caused by unknown threats proliferating in the network. Intrusion Detection Systems (IDSs) are commonly used to detect such unknown attacks and unauthorized access in a network. Researchers have put forward many intrusion detection approaches with satisfactory results, ranging from traditional approaches to Artificial Intelligence (AI) based approaches. AI-based techniques have gained an edge over other statistical techniques in the research community due to their enormous benefits. Procedures can be designed to exhibit behavior learned from previous experience. Machine learning algorithms are used to analyze abnormal instances in a particular network, and supervised learning is essential for training models to recognize abnormal behavior in a network. In this paper, we propose a model based on Naïve Bayes and SVM (Support Vector Machine) to detect anomalies. We also propose an ensemble approach that addresses the weaknesses of the individual models and improves on their poor detection results.
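The abstract does not specify how the Naïve Bayes and SVM outputs are combined. The following minimal Python sketch assumes soft voting. Each model supplies class probabilities for a sample; for an SVM these would have to come from a probability-calibrated model. The probabilities are averaged with optional weights, and the class with the highest average wins. The function name and the example probabilities are hypothetical.

```python
def soft_vote(model_probabilities, weights=None):
    """Combine per-model class-probability dicts for one sample by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(model_probabilities)
    if len(weights) != len(model_probabilities):
        raise ValueError("one weight per model is required")
    total_weight = sum(weights)
    averaged = {}
    for probs, w in zip(model_probabilities, weights):
        for label, p in probs.items():
            averaged[label] = averaged.get(label, 0.0) + w * p / total_weight
    # Predict the class with the highest averaged probability.
    prediction = max(averaged, key=averaged.get)
    return prediction, averaged


# Hypothetical outputs of the two models for one network connection.
naive_bayes = {"normal": 0.70, "anomaly": 0.30}
svm = {"normal": 0.20, "anomaly": 0.80}
label, combined = soft_vote([naive_bayes, svm])
print(label)  # anomaly
```

With only two models, hard majority voting would tie whenever they disagree. Averaging probabilities resolves such cases by confidence, and the weights let a more reliable model count for more.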

Empirical Evaluation of Noise Influence on Supervised Machine Learning Algorithms Using Intrusion Detection Datasets

Security and Communication Networks ◽

10.1155/2021/8836057 ◽

2021 ◽

Vol 2021 ◽

pp. 1-28

Author(s):

Khalid M. Al-Gethami ◽

Mousa T. Al-Akhras ◽

Mohammed Alawairdhi

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Extreme Values ◽

Empirical Evaluation ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Intrusion Detection Systems ◽

Support Vector ◽

Detection Systems ◽

Ensembles Of Classifiers

Optimizing the detection of intrusions is becoming more crucial due to the continuously rising rate and ferocity of cyber threats and attacks. One popular way to optimize the accuracy of intrusion detection systems (IDSs) is to employ machine learning (ML) techniques. However, many factors affect the accuracy of ML-based IDSs. One of these factors is noise, which can take the form of mislabelled instances, outliers, or extreme values. Determining the extent to which noise affects accuracy helps to design and build more robust ML-based IDSs. This paper empirically examines the effect of noise on the accuracy of ML-based IDSs through a wide set of experiments. The ML algorithms used are decision tree (DT), random forest (RF), support vector machine (SVM), artificial neural networks (ANNs), and Naïve Bayes (NB). The experiments are conducted on two widely used intrusion datasets, NSL-KDD and UNSW-NB15. The paper also investigates the use of these ML algorithms as base classifiers within two ensemble learning methods, bagging and boosting. The detailed results and findings are presented and discussed in this paper.
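Of the noise types listed, mislabelled instances are the simplest to simulate. The following minimal Python sketch assumes that noise is injected by flipping a fixed fraction of randomly chosen labels to a different class. The abstract does not describe the paper's actual injection procedure, and the function name and parameters are hypothetical. Outliers and extreme values would need a separate, feature-level perturbation.

```python
import random


def inject_label_noise(labels, rate, seed=0):
    """Flip round(rate * n) randomly chosen labels to a different, randomly chosen class."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be in [0, 1]")
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("need at least two classes to mislabel instances")
    rng = random.Random(seed)  # seeded so experiments are reproducible
    noisy = list(labels)
    n_flip = round(rate * len(labels))
    # Sample distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(labels)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


labels = ["normal"] * 80 + ["attack"] * 20
noisy = inject_label_noise(labels, rate=0.10, seed=42)
print(sum(a != b for a, b in zip(labels, noisy)))  # 10
```

Training on the noisy labels while evaluating on the clean ones, at several rates, is one way to measure how quickly each classifier's accuracy degrades.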

Effects of Feature Selection and Normalization on Network Intrusion Detection

10.36227/techrxiv.12480425.v2 ◽

2020 ◽

Author(s):

Mubarak Albarka Umar ◽

Chen Zhanfang

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection Systems ◽

Computational Time ◽

Network Intrusion Detection ◽

Detection Systems ◽

Defense Systems ◽

Network Intrusion ◽

Depth Analysis ◽

Negative Impacts

The rapid rise of cyberattacks and the gradual failure of traditional defense systems and approaches led to the use of Machine Learning (ML) techniques to build more efficient and reliable Intrusion Detection Systems (IDSs). However, the advent of larger IDS datasets has had negative impacts on the performance and computational time of ML-based IDSs. To overcome such issues, many researchers have used data preprocessing techniques such as feature selection and normalization. While most of these researchers reported the success of these preprocessing techniques at a shallow level, very few studies have examined their effects on a wider scale. Furthermore, the performance of an IDS model depends not only on the preprocessing techniques used but also on the dataset and the ML algorithm. Most existing studies on preprocessing techniques give little emphasis to these factors. Thus, this study provides an in-depth analysis of the effects of feature selection and normalization on various IDS models built using four separate IDS datasets and five different ML algorithms. Wrapper-based decision tree and min-max scaling are used for feature selection and normalization, respectively. The models are evaluated and compared using popular IDS evaluation metrics. The study found normalization to be more important than feature selection in improving the performance and computational time of models on both datasets. Feature selection on UNSW-NB15 failed to reduce the models' computational time and, for models built using NSL-KDD, decreased their performance. The study also reveals that, compared to the UNSW-NB15 dataset, the NSL-KDD dataset is less complex and unsuitable for building reliable modern-day IDS models. Furthermore, the best performance on both datasets is achieved by Random Forest, with accuracy of 99.75% and 98.51% on NSL-KDD and UNSW-NB15, respectively.
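Min-max normalization rescales each feature to [0, 1] using x' = (x - min) / (max - min). The minimal Python sketch below assumes that the minimum and maximum are fitted on the training split only and then reused for test data. This is a common choice that avoids leaking test statistics into training; the abstract does not say how the study handled it. As a result, test values outside the training range map outside [0, 1]. The function names are hypothetical.

```python
def fit_min_max(rows):
    """Compute per-column (min, max) from training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(rows, params):
    """Rescale each column with the fitted (min, max); constant columns map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(row, params)
        ])
    return scaled


train = [[0, 10], [5, 20], [10, 30]]
test = [[5, 25], [12, 10]]
params = fit_min_max(train)
print(transform_min_max(test, params))  # [[0.5, 0.75], [1.2, 0.0]]
```

The value 1.2 in the output shows the effect of fitting on training data only. Clipping to [0, 1] is an optional extra step if a model requires bounded inputs.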

A deep learning methods for intrusion detection systems based machine learning in MANET

Proceedings of the 4th International Conference on Smart City Applications - SCA '19 ◽

10.1145/3368756.3369021 ◽

2019 ◽

Author(s):

Safaa Laqtib ◽

Khalid El Yassini ◽

Moulay Lahcen Hasnaoui

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Intrusion Detection ◽

Intrusion Detection Systems ◽

Learning Methods ◽

Detection Systems

A System to automate the development of anomaly-based network intrusion detection model

Journal of Physics Conference Series ◽

10.1088/1742-6596/2089/1/012006 ◽

2021 ◽

Vol 2089 (1) ◽

pp. 012006

Author(s):

B Padmaja ◽

K Sai Sravan ◽

E Krishna Rao Patro ◽

G Chandra Sekhar

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Real Time ◽

Network Traffic ◽

Intrusion Detection Systems ◽

Network Intrusion Detection ◽

Support Vector ◽

The Internet ◽

Detection Systems ◽

Network Intrusion

Cyber security is a major concern in today's world. Over the past couple of decades, the Internet has grown to such an extent that almost every individual on the planet has access to it. This can be viewed as one of the major achievements of the human race. On the flip side, it has given rise to many security issues for every individual or company accessing the web. Hackers are active and constantly monitor networks for any opportunity to attack a system and profit from its vulnerabilities. To safeguard the privacy of people and organizations in cyberspace, different network intrusion detection systems have been developed to detect hackers' presence in networks. These systems fall into two categories: signature-based and anomaly-based intrusion detection systems. This paper uses an anomaly-based intrusion detection technique to develop an automation system that both trains and tests supervised machine learning models for classifying real-time network traffic as malicious or not. Considering both detection success rate and false positive rate, the best current models are Artificial Neural Networks (ANN), followed by Support Vector Machines (SVM). In this paper, it is verified that ANN-based machine learning with wrapper feature selection outperforms the SVM technique when classifying network traffic as harmful or harmless. To evaluate the performance of the system, the NSL-KDD dataset is first used to train and test the SVM and ANN models, which are then used to classify real-time network traffic. The system can be used to build models automatically on new datasets and to classify the behaviour of a provided dataset without writing code.
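Wrapper feature selection scores candidate feature subsets by training and evaluating the classifier itself. The minimal Python sketch below shows one wrapper strategy, greedy sequential forward selection. The abstract does not state which search strategy the system uses, so this is an assumption. The `score` callback is hypothetical. In a real pipeline it would train the ANN or SVM on the given feature subset and return, for example, its validation accuracy.

```python
def forward_select(n_features, score, max_features=None):
    """Greedy sequential forward selection driven by a subset-scoring callback."""
    limit = n_features if max_features is None else max_features
    selected, best_score = [], float("-inf")
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        # Score every subset obtained by adding one more feature.
        scored = [(score(selected + [f]), f) for f in candidates]
        candidate_score, candidate = max(scored)
        if candidate_score <= best_score:
            break  # no single addition improves the score, so stop
        selected.append(candidate)
        best_score = candidate_score
    return selected


# Hypothetical scoring function standing in for classifier validation accuracy:
# features 0 and 2 are useful, feature 1 is not, and each feature has a small cost.
usefulness = {0: 0.5, 1: 0.0, 2: 0.3}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.1 * len(subset)


print(forward_select(3, toy_score))  # [0, 2]
```

Because every step retrains the model once per remaining feature, wrapper methods cost far more than filter methods such as MI or ANOVA ranking. In exchange, they optimize the subset for the specific classifier being deployed.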
