An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques

Taher Al-Shehari; Rakan A. Alsowail

doi:10.3390/e23101258


Entropy ◽

10.3390/e23101258 ◽

2021 ◽

Vol 23 (10) ◽

pp. 1258

Author(s):

Taher Al-Shehari ◽

Rakan A. Alsowail

Keyword(s):

Machine Learning ◽

Detection System ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Sensitive Period ◽

Insider Threat ◽

Leakage Detection ◽

Insider Threats ◽

Insider Attack ◽

Proposed Model

Insider threats are malicious acts carried out by authorized employees within an organization. They represent a major cybersecurity challenge for private and public organizations, as an insider attack can cause far more damage to an organization's assets than an external attack. Most existing approaches in the field of insider threat detection focus on general insider attack scenarios. However, insider attacks can be carried out in different ways, and the most dangerous is a data leakage attack executed by a malicious insider before leaving an organization. This paper proposes a machine learning-based model for detecting such serious insider threat incidents. The proposed model addresses the possible bias in detection results that can arise from an inappropriate encoding process by employing feature scaling and one-hot encoding. Furthermore, the class imbalance of the dataset is addressed using the synthetic minority oversampling technique (SMOTE). Well-known machine learning algorithms are evaluated to identify the most accurate classifier for detecting data leakage events executed by malicious insiders during the sensitive period before they leave an organization. We provide a proof of concept for our model by applying it to the CMU-CERT Insider Threat Dataset and comparing its performance with the ground truth. The experimental results show that our model detects insider data leakage events with an AUC-ROC value of 0.99, outperforming existing approaches validated on the same dataset. The proposed model provides effective methods for addressing possible bias and class imbalance, with the aim of devising an effective insider data leakage detection system.
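
The preprocessing steps named in this abstract (feature scaling, one-hot encoding, SMOTE) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the toy records, the function names, and the choice of `k` are all assumptions made for illustration, and the abstract does not say which library or parameters were used.

```python
import random

def one_hot(values):
    """Map each categorical value to a 0/1 vector with one slot per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical toy records: (activity type, bytes transferred, label); 1 = leakage.
records = [
    ("email", 120, 0), ("http", 300, 0), ("device", 80, 0),
    ("email", 200, 0), ("http", 150, 0), ("email", 90, 0),
    ("device", 900, 1), ("email", 1000, 1), ("http", 950, 1),
]
categories, encoded = one_hot([r[0] for r in records])
scaled = min_max_scale([r[1] for r in records])
features = [row + [s] for row, s in zip(encoded, scaled)]
labels = [r[2] for r in records]

# Oversample the minority class until both classes have the same size.
minority = [f for f, y in zip(features, labels) if y == 1]
new_points = smote(minority, labels.count(0) - len(minority))
balanced_features = features + new_points
balanced_labels = labels + [1] * len(new_points)
```

One-hot encoding avoids implying an order among categories, which an integer code such as email=0, http=1, device=2 would do. Note that plain interpolation produces fractional values in the one-hot columns; variants of SMOTE designed for mixed numeric and categorical data handle those columns separately.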


Adaptive Ensemble Multi-Agent Based Intrusion Detection Model

Developing Advanced Web Services through P2P Computing and Autonomous Agents - Advances in Web Technologies and Engineering ◽

10.4018/978-1-61520-973-6.ch003 ◽

2010 ◽

pp. 36-48 ◽

Cited By ~ 1

Author(s):

Tarek Helmy

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Algorithms ◽

Agent Based ◽

Detection Model ◽

Detection Analysis ◽

Proposed Model ◽

Multi Agent

An intrusion detection system monitors the events occurring in a computer system or network and analyzes them for signs of intrusion. The performance of an intrusion detection system can be improved by combining anomaly and misuse analysis. This chapter proposes an ensemble multi-agent-based intrusion detection model. The proposed model combines anomaly, misuse, and host-based detection analysis. The agents in the proposed model use rules to check for intrusions and adopt machine learning algorithms to recognize unknown actions and to update or create new rules automatically. Each agent in the proposed model encapsulates a specific classification technique and gives its belief about any packet event in the network. These agents collaborate to reach a decision about any event, can generalize, and can detect novel attacks. Empirical results indicate that the proposed model is efficient and outperforms other intrusion detection models.
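
The idea of agents that each report a belief about an event, then reach a joint decision, can be sketched as follows. This is an illustration only: the three agents, the event fields, the constants, and the averaging rule are all hypothetical, since the abstract does not specify how beliefs are combined.

```python
from statistics import mean

# Hypothetical constants used by the toy agents below.
KNOWN_BAD_PAYLOADS = {"' OR 1=1 --", "../../etc/passwd"}
BASELINE_RATE = 100.0  # packets per second considered normal

def signature_agent(event):
    """Misuse-style agent: full belief when a known bad pattern is present."""
    return 1.0 if event["payload"] in KNOWN_BAD_PAYLOADS else 0.0

def rate_agent(event):
    """Anomaly-style agent: belief grows as the rate exceeds the baseline."""
    excess = event["packets_per_sec"] - BASELINE_RATE
    return max(0.0, min(1.0, excess / BASELINE_RATE))

def host_agent(event):
    """Host-based agent: belief driven by repeated failed logins."""
    return 0.8 if event["failed_logins"] >= 3 else 0.1

def ensemble_decision(event, agents, threshold=0.5):
    """Average the agents' beliefs and flag the event above the threshold."""
    score = mean(agent(event) for agent in agents)
    return score, score >= threshold

agents = [signature_agent, rate_agent, host_agent]
benign = {"payload": "GET /index.html", "packets_per_sec": 120, "failed_logins": 0}
attack = {"payload": "' OR 1=1 --", "packets_per_sec": 250, "failed_logins": 5}

benign_score, benign_flagged = ensemble_decision(benign, agents)
attack_score, attack_flagged = ensemble_decision(attack, agents)
```

Averaging is the simplest combination rule; weighted voting or a learned combiner would be equally plausible readings of "collaborate".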


A Review of Insider Threat Detection: Classification, Machine Learning Techniques, Datasets, Open Challenges, and Recommendations

Applied Sciences ◽

10.3390/app10155208 ◽

2020 ◽

Vol 10 (15) ◽

pp. 5208

Author(s):

Mohammed Nasser Al-Mhiqani ◽

Rabiah Ahmad ◽

Z. Zainal Abidin ◽

Warusia Yassin ◽

Aslinda Hassan ◽

...

Keyword(s):

Machine Learning ◽

Conceptual Understanding ◽

Machine Learning Techniques ◽

Insider Threat ◽

Threat Detection ◽

Insider Threats ◽

Fast Detection ◽

Detection Systems ◽

Learning Techniques ◽

Security Property

Insider threat has become a widely accepted issue and one of the major challenges in cybersecurity. This phenomenon indicates that threats require special detection systems, methods, and tools, which entail the ability to facilitate accurate and fast detection of a malicious insider. Several studies on insider threat detection and related areas in dealing with this issue have been proposed. Various studies aimed to deepen the conceptual understanding of insider threats. However, there are many limitations, such as a lack of real cases, biases in making conclusions, which are a major concern and remain unclear, and the lack of a study that surveys insider threats from many different perspectives and focuses on the theoretical, technical, and statistical aspects of insider threats. The survey aims to present a taxonomy of contemporary insider types, access, level, motivation, insider profiling, effect security property, and methods used by attackers to conduct attacks and a review of notable recent works on insider threat detection, which covers the analyzed behaviors, machine-learning techniques, dataset, detection methodology, and evaluation metrics. Several real cases of insider threats have been analyzed to provide statistical information about insiders. In addition, this survey highlights the challenges faced by other researchers and provides recommendations to minimize obstacles.


Insider Threat Detection Using Supervised Machine Learning Algorithms on an Extremely Imbalanced Dataset

International Journal of Cyber Warfare and Terrorism ◽

10.4018/ijcwt.2020040101 ◽

2020 ◽

Vol 10 (2) ◽

pp. 1-26

Author(s):

Naghmeh Moradpoor Sheykhkanloo ◽

Adam Hall

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Third Party ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Insider Threat ◽

Threat Detection ◽

Imbalanced Dataset ◽

The Impact

An insider threat can take on many forms and fall under different categories. This includes malicious insider, careless/unaware/uneducated/naïve employee, and the third-party contractor. Machine learning techniques have been studied in published literature as a promising solution for such threats. However, they can be biased and/or inaccurate when the associated dataset is hugely imbalanced. Therefore, this article addresses the insider threat detection on an extremely imbalanced dataset which includes employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset using this technique did not improve performance metrics, it did improve the time taken to build the model and the time taken to test the model. Additionally, the authors realised that running the chosen classifiers with parameters other than the default ones has an impact on both balanced and imbalanced scenarios, but the impact is significantly stronger when using the imbalanced dataset.
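
A balancing step of the kind described here can be sketched as random undersampling with a cap on the ratio between the most common and the rarest class. This assumes that "spread subsample" refers to that behaviour (the name matches a filter in the Weka toolkit, but the abstract does not state which implementation was used); the function name, parameters, and toy data are hypothetical.

```python
import random
from collections import defaultdict

def spread_subsample(rows, labels, max_spread=1.0, seed=0):
    """Randomly drop rows from larger classes so that no class has more than
    max_spread times as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    rarest = min(len(members) for members in by_class.values())
    cap = int(rarest * max_spread)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        kept = members if len(members) <= cap else rng.sample(members, cap)
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels

# Hypothetical, extremely imbalanced toy dataset: 990 normal rows, 10 malicious.
rows = list(range(1000))
labels = ["normal"] * 990 + ["malicious"] * 10

even_rows, even_labels = spread_subsample(rows, labels, max_spread=1.0)
loose_rows, loose_labels = spread_subsample(rows, labels, max_spread=3.0)
```

Because undersampling shrinks the training set, it is consistent with the reported effect of shorter build and test times even where the performance metrics do not improve.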


Design and Development of an Efficient Network Intrusion Detection System Using Machine Learning Techniques

Wireless Communications and Mobile Computing ◽

10.1155/2021/9974270 ◽

2021 ◽

Vol 2021 ◽

pp. 1-35

Author(s):

Thomas Rincy N ◽

Roopam Gupta

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Detection System ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

Hybrid Network ◽

Machine Learning Techniques ◽

Feature Subset ◽

Network Connection ◽

Network Intrusion

Today’s internet is made up of nearly half a million different networks. In any network connection, identifying attacks by type is a difficult task, as different attacks may involve various connections, and their number may vary from a few to hundreds of network connections. To solve this problem, the manuscript proposes a novel hybrid network IDS called NID-Shield that classifies the dataset according to different attack types. Furthermore, the attack names found within each attack type are classified individually, which helps considerably in predicting the vulnerability of individual attacks in various networks. The hybrid NID-Shield NIDS applies an efficient feature subset selection technique called CAPPER together with distinct machine learning methods. The UNSW-NB15 and NSL-KDD datasets are used for evaluation. Machine learning algorithms are trained on the reduced, accurate, high-merit feature subsets obtained from CAPPER and then assessed by cross-validation on the reduced attributes. Various performance metrics show that the hybrid NID-Shield NIDS with the CAPPER approach achieves a good accuracy rate and a low FPR on the UNSW-NB15 and NSL-KDD datasets and performs well when compared with approaches found in the existing literature.
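
The cross-validation step mentioned in this abstract can be illustrated with a small index-splitting sketch. It shows generic k-fold splitting only; it does not reproduce CAPPER, and the function name and the fold count are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.
    Each sample lands in exactly one test fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Ten samples split into three folds of sizes 4, 3, and 3.
folds = list(k_fold_indices(10, 3))
```

In practice the rows are usually shuffled (or stratified by class) before splitting, so that each fold contains a representative mix of attack types.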


Cyberattacks Detection in IoT-Based Smart City Applications Using Machine Learning Techniques

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17249347 ◽

2020 ◽

Vol 17 (24) ◽

pp. 9347 ◽

Cited By ~ 1

Author(s):

Md Mamunur Rashid ◽

Joarder Kamruzzaman ◽

Mohammad Mehedi Hassan ◽

Tasadduq Imam ◽

Steven Gordon

Keyword(s):

Machine Learning ◽

Smart City ◽

Service Providers ◽

Detection System ◽

Smart Cities ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Increased Risk ◽

Iot Devices

In recent years, the widespread deployment of the Internet of Things (IoT) applications has contributed to the development of smart cities. A smart city utilizes IoT-enabled technologies, communications and applications to maximize operational efficiency and enhance both the service providers’ quality of services and people’s wellbeing and quality of life. With the growth of smart city networks, however, comes the increased risk of cybersecurity threats and attacks. IoT devices within a smart city network are connected to sensors linked to large cloud servers and are exposed to malicious attacks and threats. Thus, it is important to devise approaches to prevent such attacks and protect IoT devices from failure. In this paper, we explore an attack and anomaly detection technique based on machine learning algorithms (LR, SVM, DT, RF, ANN and KNN) to defend against and mitigate IoT cybersecurity threats in a smart city. Contrary to existing works that have focused on single classifiers, we also explore ensemble methods such as bagging, boosting and stacking to enhance the performance of the detection system. Additionally, we consider an integration of feature selection, cross-validation and multi-class classification for the discussed domain, which has not been well considered in the existing literature. Experimental results with the recent attack dataset demonstrate that the proposed technique can effectively identify cyberattacks and the stacking ensemble model outperforms comparable models in terms of accuracy, precision, recall and F1-Score, implying the promise of stacking in this domain.
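
The four metrics named at the end of this abstract follow directly from the confusion counts. The sketch below computes them for a binary case with made-up labels; the paper itself considers multi-class classification, where these values are typically computed per class and then averaged.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = attack, 0 = normal traffic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Here three of four attacks are caught (recall 0.75) and three of four alarms are real (precision 0.75), so F1 is also 0.75, while accuracy is 0.8 because most normal traffic is classified correctly.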


Parallel processing using big data and machine learning techniques for intrusion detection

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v9.i3.pp553-560 ◽

2020 ◽

Vol 9 (3) ◽

pp. 553

Author(s):

Alaeddine Boukhalfa ◽

Nabil Hmina ◽

Habiba Chaoni

Keyword(s):

Machine Learning ◽

Big Data ◽

Processing Time ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Detection Accuracy ◽

New Approach ◽

Learning Techniques ◽

Proposed Model ◽

Multiple Devices

Currently, information technology is used in all domains of life. Multiple devices produce data and transfer them across the network; these transfers are not always secured and can contain new threats that are invisible to current security devices. Moreover, the large amount and variety of the exchanged data cause difficulties related to detection time. To solve these issues, we suggest in this paper a new approach based on storing the large amount and variety of network traffic data using Big Data techniques and analyzing these data with Machine Learning algorithms in a distributed and parallel way, in order to detect new hidden intrusions with less processing time. According to the results of the experiments, the detection accuracy of the Machine Learning methods reaches 99.9%, and their processing time is reduced considerably by applying them in a parallel and distributed way, which indicates that our proposed model is effective for the detection of new intrusions.


Disease Identification in Chilli Leaves using Machine Learning Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1061.1291s319 ◽

2019 ◽

Vol 9 (1S3) ◽

pp. 325-329

Keyword(s):

Neural Network ◽

Machine Learning ◽

Detection System ◽

Network Models ◽

Detection Algorithm ◽

Machine Learning Techniques ◽

Neural Network Models ◽

The Past ◽

Learning Techniques ◽

Proposed Model

Crop diseases reduce the yield of a crop or may even kill it. Over the past two years, according to the I.C.A.R., the production of chilies in the state of Goa has fallen drastically due to the presence of a virus. Most of the plants flower very little or stop flowering completely. In the rare cases when a plant manages to flower, the yield is substantially lower. The proposed model detects the presence of disease in crops by examining the symptoms. The model uses an object detection algorithm together with supervised image recognition and feature extraction using a convolutional neural network to classify crops as infected or healthy. Google's machine learning libraries, TensorFlow and Keras, are used to build the neural network models. An Android application is developed around the model to make the disease detection system easy to use.


Optimization of IDS using Filter-Based Feature Selection and Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b8278.1210220 ◽

2020 ◽

Vol 10 (2) ◽

pp. 96-102

Author(s):

Neha Sharma ◽

Harsh Vardhan Bhandari ◽

Narendra Singh Yadav ◽

Harsh Vardhan Jonathan Shroff

Keyword(s):

Machine Learning ◽

Secure Communication ◽

Detection System ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Data Set ◽

Normal Behavior ◽

Active Processes ◽

High Level ◽

Use Of Internet

Nowadays it is imperative to maintain a high level of security to ensure secure communication of information between various institutions and organizations. With the growing use of the internet over the years, the number of attacks over the internet has escalated. A powerful Intrusion Detection System (IDS) is required to ensure the security of a network. The aim of an IDS is to monitor the active processes in a network and to detect any deviation from the normal behavior of the system. When it comes to machine learning, optimization is the process of obtaining the maximum accuracy from a model. Optimization is vital for IDSs in order to predict a wide variety of attacks with the utmost accuracy. The effectiveness of an IDS depends on its ability to correctly predict and classify any anomaly faced by a computer system. During the last two decades, KDD_CUP_99 has been the most widely used data set for evaluating the performance of such systems. In this study, we apply different Machine Learning techniques to this data set and compare which technique yields the best results.


Adaptive Ensemble Multi-Agent Based Intrusion Detection Model

Machine Learning ◽

10.4018/978-1-60960-818-7.ch317 ◽

2012 ◽

pp. 647-659

Author(s):

Tarek Helmy

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Algorithms ◽

Agent Based ◽

Detection Model ◽

Detection Analysis ◽

Proposed Model ◽

Multi Agent


Multi-level host-based intrusion detection system for Internet of things

Journal of Cloud Computing Advances Systems and Applications ◽

10.1186/s13677-020-00206-6 ◽

2020 ◽

Vol 9 (1) ◽

Author(s):

Robin Gassais ◽

Naser Ezzati-Jivan ◽

Jose M. Fernandez ◽

Daniel Aloise ◽

Michel R. Dagenais

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

Intrusion Detection ◽

Detection System ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Network Activity ◽

Machine Learning Techniques ◽

Smart Devices ◽

Automation System

The growth of the Internet of things (IoT) has ushered in a new area of inter-connectivity and innovation in the home. Many devices, once separate, can now be interacted with remotely, improving efficiency and organization. This, however, comes at the cost of rising security vulnerabilities. Vendors are competing to create and quickly release innovative connected objects without focusing on the security issues. As a consequence, attacks involving smart devices, or targeting them, are proliferating, creating threats to users' privacy and even their physical security. Additionally, the heterogeneous technologies involved in IoT make attempts to develop protection on smart devices much harder. Most of the intrusion detection systems developed for those platforms are based on network activity. However, on many systems, intrusions cannot easily or reliably be detected from network traces. We propose a novel host-based automated framework for intrusion detection. Our work combines user space and kernel space information with machine learning techniques to detect various kinds of intrusions in smart devices. Our solution uses tracing techniques to automatically capture device behavior, processes this data into numeric arrays to train several machine learning algorithms, and raises alerts whenever an intrusion is found. We implemented several machine learning algorithms, including deep learning ones, to achieve high detection capabilities while adding little overhead on the monitored devices. We tested our solution within a realistic home automation system with actual threats.
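
The step of turning traced device behavior into numeric arrays can be sketched as counting event types per fixed-size window. The event vocabulary, the window size, and the function names are hypothetical; the abstract does not describe the actual feature layout.

```python
from collections import Counter

# Hypothetical vocabulary of traced event types; events outside it are ignored.
EVENT_TYPES = ["open", "read", "write", "connect", "exec"]

def window_to_vector(events, vocabulary=EVENT_TYPES):
    """Count how often each known event type occurs in one window."""
    counts = Counter(events)
    return [counts.get(name, 0) for name in vocabulary]

def trace_to_matrix(trace, window_size):
    """Split a trace into consecutive windows and vectorize each one."""
    return [
        window_to_vector(trace[i:i + window_size])
        for i in range(0, len(trace), window_size)
    ]

trace = ["open", "read", "read", "write", "connect", "exec", "exec", "exec"]
matrix = trace_to_matrix(trace, window_size=4)
```

Each row of `matrix` has the same length regardless of how many events the window contained, which is the property a classifier needs from its input.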
