A Combined Static and Dynamic Analysis Approach to Detect Malicious Browser Extensions

Security and Communication Networks ◽

10.1155/2018/7087239 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16

Author(s):

Yao Wang ◽

Wandong Cai ◽

Pin Lyu ◽

Wei Shao

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

Feature Selection Method ◽

Machine Learning Techniques ◽

Security Risk ◽

Test Set ◽

Static And Dynamic Analysis ◽

Detection Model ◽

Validation Set ◽

Browser Extensions

Ill-intentioned browser extensions pose an emerging security risk and, owing to their wide popularity and high privileges, have become one of the most common attack vectors on the Internet. Once installed, malicious extensions are executed and attempt to compromise the victim's browser. Security researchers have put forward several techniques for detecting malicious browser extensions; these primarily concentrate on the API calls made by malicious extensions, on imposing restrictive policies on extensions, and on monitoring extensions' activities. In this paper, we propose a machine-learning-based approach to detect malicious extensions. We apply static and dynamic techniques to analyse an extension and extract features: the analysis draws features both from the source code (JavaScript code, HTML pages, and CSS files) and from the extension's execution activities. To ensure the robustness of the features, a feature selection method is then applied to retain the most relevant features while discarding weakly correlated ones. Detection models based on machine-learning techniques are subsequently constructed from these features. Evaluation on a dataset of over 4,600 labelled extension samples shows that our detection model detects malicious extensions with an accuracy of 96.52% on the validation set and 95.18% on the test set, with a false positive rate of 2.38% on the validation set and 3.66% on the test set.
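The abstract does not name its feature selection method. As a minimal sketch, assuming a simple correlation filter, each feature can be scored by the absolute Pearson correlation between its values and the malicious/benign label, and features below a threshold are discarded. The feature names, the data, and the 0.3 threshold are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(rows, labels, names, threshold=0.3):
    """Return the names of features whose |correlation| with the label is at
    least `threshold`, ordered from most to least correlated."""
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scored.append((abs(pearson(column, labels)), name))
    scored.sort(reverse=True)
    return [name for score, name in scored if score >= threshold]


# Hypothetical per-extension features (label 1 = malicious, 0 = benign).
names = ["uses_eval", "injects_script", "css_rule_count"]
rows = [
    [1, 1, 5],
    [1, 0, 3],
    [0, 0, 4],
    [0, 1, 6],
]
labels = [1, 1, 0, 0]
print(select_features(rows, labels, names))  # ['uses_eval', 'css_rule_count']
```

Here `injects_script` is uncorrelated with the label and is dropped, while `css_rule_count` survives because its (negative) correlation is strong enough in absolute value.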

Detecting Website Defacements Based on Machine Learning Techniques and Attack Signatures

Computers ◽

10.3390/computers8020035 ◽

2019 ◽

Vol 8 (2) ◽

pp. 35 ◽

Cited By ~ 2

Author(s):

Xuan Dau Hoang ◽

Ngoc Tuong Nguyen

Keyword(s):

Machine Learning ◽

Web Applications ◽

False Positive Rate ◽

Training Data ◽

Machine Learning Techniques ◽

Web Pages ◽

Government Organizations ◽

Detection Model ◽

Learning Techniques ◽

Positive Rate

Defacement attacks have long been considered one of the prime threats to the websites and web applications of companies, enterprises, and government organizations. Defacement attacks can bring serious consequences to website owners, including immediate interruption of website operations and damage to the owner's reputation, which may result in huge financial losses. Many solutions have been researched and deployed for monitoring and detecting website defacement attacks, such as those based on checksum comparison, diff comparison, DOM tree analysis, and complicated algorithms. However, some solutions only work on static websites, and others demand extensive computing resources. This paper proposes a hybrid defacement detection model that combines machine learning-based detection with signature-based detection. The machine learning-based component first constructs a detection profile from training data of both normal and defaced web pages, and then uses the profile to classify monitored web pages as either normal or attacked. This component can effectively detect defacements on both static and dynamic pages. The signature-based component, on the other hand, is used to boost the model's processing performance for common types of defacement. Extensive experiments show that our model achieves an overall accuracy of more than 99.26% and a false positive rate of about 0.27%. Moreover, because it does not demand extensive computing resources, our model is suitable for implementing a real-time website defacement monitoring system.
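One plausible way to wire the two components together is sketched below, assuming the cheap signature check runs first and only pages that match no signature reach the machine learning component. The regex signatures, the token list, and `ml_score` are hypothetical placeholders; `ml_score` merely stands in for a trained classifier and is not the authors' model.

```python
import re

# Hypothetical defacement signatures (assumptions, not taken from the paper).
SIGNATURES = [
    re.compile(r"hacked\s+by", re.IGNORECASE),
    re.compile(r"owned\s+by", re.IGNORECASE),
]


def ml_score(page_text):
    """Placeholder for a trained classifier: the fraction of hypothetical
    suspicious tokens present in the page. Replace with a real model."""
    suspicious = {"defaced", "pwned", "greetz"}
    tokens = set(re.findall(r"[a-z]+", page_text.lower()))
    return len(tokens & suspicious) / len(suspicious)


def classify_page(page_text, threshold=0.3):
    """Return 'attacked' or 'normal'.

    Signature matching is tried first because it is cheap; only pages that
    match no signature are passed to the (more expensive) ML component."""
    for signature in SIGNATURES:
        if signature.search(page_text):
            return "attacked"
    return "attacked" if ml_score(page_text) >= threshold else "normal"


print(classify_page("<h1>Hacked by X</h1>"))        # attacked (signature hit)
print(classify_page("Welcome to our store"))        # normal
print(classify_page("site defaced greetz to all"))  # attacked (ML fallback)
```

Putting signatures first means common, well-known defacements never pay the cost of feature extraction and model inference, which matches the performance role the abstract assigns to the signature component.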

An Ensemble-Based Malware Detection Model Using Minimum Feature Set

MENDEL ◽

10.13164/mendel.2019.2.001 ◽

2019 ◽

Vol 25 (2) ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Ivan Zelinka ◽

Eslam Amer

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

Malware Detection ◽

Machine Learning Techniques ◽

Detection Methods ◽

Detection Model ◽

Learning Techniques ◽

Proposed Model ◽

Positive Rate ◽

Minimum Number

Current commercial antivirus detection engines still rely on signature-based methods. However, with the huge increase in the number of new malware samples, these methods are becoming unsuitable. In this paper, we introduce a malware detection model based on ensemble learning. The model is trained using the minimum number of significant features extracted from the file header. Evaluations show that the ensemble models slightly outperform individual classification models, and experiments show that our model can predict unseen malware with an accuracy of 0.998 and a false positive rate of 0.002. The paper also compares the performance of the proposed model with that of different machine learning techniques. We emphasize the use of machine learning-based approaches to replace conventional signature-based methods.
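The abstract does not say how its ensemble combines base models. A minimal sketch, assuming plurality voting, is shown below. The three lambda "models" and the header field names are hypothetical stand-ins for trained classifiers, and the tie-breaking rule toward "malware" is also an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Plurality vote over per-model labels for one sample.
    Ties are resolved in favour of 'malware' (an assumed, conservative rule)."""
    counts = Counter(predictions)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    return "malware" if "malware" in winners else winners[0]


def ensemble_predict(models, sample):
    """Ask every base model for a label, then combine the labels by voting."""
    return majority_vote([model(sample) for model in models])


# Hypothetical stand-ins for trained base classifiers over header fields.
models = [
    lambda s: "malware" if s["number_of_sections"] > 8 else "benign",
    lambda s: "malware" if s["timestamp_is_zero"] else "benign",
    lambda s: "malware" if not s["has_debug_directory"] else "benign",
]

suspicious = {"number_of_sections": 10, "timestamp_is_zero": False,
              "has_debug_directory": False}
ordinary = {"number_of_sections": 4, "timestamp_is_zero": False,
            "has_debug_directory": True}
print(ensemble_predict(models, suspicious))  # malware
print(ensemble_predict(models, ordinary))    # benign
```

Voting lets individually weak rules outvote one another's mistakes, which is one common reason ensembles edge out single classifiers.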

Detecting DDoS Attacks in IoT Environment

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2021040108 ◽

2021 ◽

Vol 15 (2) ◽

pp. 145-180

Author(s):

Yasmine Labiod ◽

Abdelaziz Amara Korba ◽

Nacira Ghoualmi-Zine

Keyword(s):

Machine Learning ◽

Cyber Security ◽

Detection System ◽

False Positive Rate ◽

Cyber Attacks ◽

Machine Learning Techniques ◽

Ddos Attacks ◽

High Detection Rate ◽

Detection Techniques ◽

Detection Model

With the great potential of Internet of Things (IoT) infrastructure in different domains, cyber-attacks are rising commensurately. Distributed denial-of-service (DDoS) attacks are one such cyber security threat. This paper focuses on DDoS attacks and presents the design of an intrusion detection system (IDS) tailored to IoT systems. Machine learning techniques are investigated to distinguish network traffic flows, which include both normal and DDoS traffic, and these techniques are used to build a refined detection model for identifying different types of DDoS attacks. The performance of the proposed machine learning-based solution is validated on the N-BaIoT dataset and compared using different evaluation metrics. The experimental results show that the proposed IDS not only detects DDoS attack types but also achieves a high detection rate and a low false positive rate, which supports the usefulness of the proposed approach in comparison with several existing DDoS attack detection techniques.
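The detection rate and false positive rate cited here are standard confusion-matrix ratios. The sketch below shows how they could be computed when each flow carries either a specific DDoS type or "normal"; treating every non-normal label as an attack is an assumption, and the labels are hypothetical.

```python
def detection_metrics(y_true, y_pred, normal_label="normal"):
    """Return (detection_rate, false_positive_rate, accuracy) for attack
    detection, treating every label other than `normal_label` as an attack.

    detection rate      = TP / (TP + FN)
    false positive rate = FP / (FP + TN)
    """
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        actual_attack = actual != normal_label
        predicted_attack = predicted != normal_label
        if actual_attack and predicted_attack:
            tp += 1  # attack flagged (even if the DDoS type is wrong)
        elif predicted_attack:
            fp += 1  # normal flow flagged as an attack
        elif actual_attack:
            fn += 1  # attack missed
        else:
            tn += 1  # normal flow correctly passed
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return detection_rate, fpr, accuracy


# Hypothetical flow labels.
y_true = ["syn_flood", "udp_flood", "syn_flood", "udp_flood",
          "normal", "normal", "normal", "normal"]
y_pred = ["syn_flood", "syn_flood", "udp_flood", "normal",
          "normal", "normal", "normal", "udp_flood"]
print(detection_metrics(y_true, y_pred))  # (0.75, 0.25, 0.75)
```

Note that this binary view counts a flow flagged with the wrong DDoS type as detected; judging type identification would need a per-class (multi-class) confusion matrix instead.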

Machine Learning Based Assembly of Fragments of Ancient Papyrus

Journal on Computing and Cultural Heritage ◽

10.1145/3460961 ◽

2021 ◽

Vol 14 (3) ◽

pp. 1-21

Author(s):

Roy Abitbol ◽

Ilan Shimshoni ◽

Jonathan Ben-Dov

Keyword(s):

Machine Learning ◽

Spatial Information ◽

Real Life ◽

Dead Sea ◽

Machine Learning Techniques ◽

Automated Classification ◽

Learning Techniques ◽

Test Batch ◽

Validation Set ◽

Local Edge

The task of assembling fragments in a puzzle-like manner into a composite picture plays a significant role in archaeology, as it supports researchers in their attempts to reconstruct historic artifacts. In this article, we propose a method for matching and assembling pairs of ancient papyrus fragments containing mostly unknown scriptures. Papyrus paper is manufactured from papyrus plants and therefore exhibits typical thread patterns resulting from the plants' stems. The proposed algorithm is founded on the hypothesis that these thread patterns contain unique local attributes, such that nearby fragments show similar patterns reflecting the continuations of the threads. We posit that these patterns can be exploited using image processing and machine learning techniques to identify matching fragments. The algorithm and system we present support the quick, automated classification of matching pairs of papyrus fragments as well as the geometric alignment of each pair. The algorithm consists of a series of steps and is based on deep learning and machine learning methods. The first step decomposes the problem of matching fragments into the smaller problem of finding thread-continuation matches in local edge areas (squares) between pairs of fragments. This phase is solved using a convolutional neural network that ingests raw images of the edge areas and produces local matching scores. This stage yields very high recall but low precision. We therefore use these scores to draw conclusions about the matching of entire fragment pairs through an elaborate voting mechanism. We enhance this voting with geometric alignment techniques, from which we extract additional spatial information. Finally, we feed all the data collected in these steps into a Random Forest classifier, producing a higher-order classifier capable of predicting whether a pair of fragments is a match.
Our algorithm was trained on a batch of fragments that was excavated from the Dead Sea caves and is dated to circa the 1st century BCE. The algorithm shows excellent results on a validation set of similar origin and condition. We then ran the algorithm on a real-life set of fragments for which we have no prior knowledge or labeling of matches. This test batch is considered extremely challenging because of its poor condition and the small size of its fragments; numerous researchers have already sought matches within this batch with very little success. Our algorithm's performance on this batch was sub-optimal, returning a relatively large ratio of false positives. However, the algorithm was quite useful in eliminating 98% of the possible matches, thus reducing the amount of work needed for manual inspection. Indeed, experts who reviewed the results identified some positive matches as potentially true and referred them for further investigation.
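The abstract calls its voting mechanism "elaborate" without giving details. The sketch below is a deliberately simplified stand-in: each edge square whose CNN matching score exceeds a threshold casts a vote, and a fragment pair is kept as a candidate only if it gathers enough votes in absolute and relative terms. All thresholds are hypothetical, and the returned vote ratio is only one example of a pair-level feature that could be passed on to a downstream classifier.

```python
def vote_pair(square_scores, score_threshold=0.9, min_votes=3, min_ratio=0.2):
    """Aggregate local CNN matching scores into a fragment-pair decision.

    square_scores: matching scores in [0, 1], one per compared edge square.
    A square votes for a match when its score exceeds `score_threshold`;
    the pair is kept when it collects enough votes, both in absolute number
    and as a fraction of all compared squares. All thresholds are hypothetical.
    Returns (is_candidate_match, vote_ratio).
    """
    if not square_scores:
        return False, 0.0
    votes = sum(1 for s in square_scores if s > score_threshold)
    ratio = votes / len(square_scores)
    return votes >= min_votes and ratio >= min_ratio, ratio


pair_a = [0.95, 0.97, 0.2, 0.92, 0.1, 0.4, 0.3, 0.05, 0.99, 0.6]
pair_b = [0.95, 0.1, 0.2]
print(vote_pair(pair_a))     # (True, 0.4)
print(vote_pair(pair_b)[0])  # False
```

Requiring several agreeing squares is what turns high-recall, low-precision local scores into a more selective pair-level decision: a single spuriously high square (as in `pair_b`) is not enough.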

IoT Dataset Validation Using Machine Learning Techniques for Traffic Anomaly Detection

Electronics ◽

10.3390/electronics10222857 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2857

Author(s):

Laura Vigoya ◽

Diego Fernandez ◽

Victor Carneiro ◽

Francisco Nóvoa

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

False Positive Rate ◽

Machine Learning Techniques ◽

Support Vector ◽

High Detection Rate ◽

Security Vulnerabilities ◽

Smart Systems ◽

Learning Techniques ◽

Positive Rate

With advancements in engineering and science, the application of smart systems is increasing, generating faster growth in IoT network traffic. The restricted power and computing capabilities of IoT devices also raise concerns about security vulnerabilities. Machine learning-based techniques have recently gained credibility through their successful application to the detection of network anomalies, including in IoT networks. However, machine learning techniques cannot work without representative data. Given the scarcity of IoT datasets, the DAD dataset emerged as an instrument for understanding the behavior of dedicated IoT-MQTT networks. This paper aims to validate the DAD dataset by applying Logistic Regression, Naive Bayes, Random Forest, AdaBoost, and Support Vector Machine to detect traffic anomalies in IoT. To obtain the best results, techniques for handling unbalanced data, feature selection, and grid search for hyperparameter optimization were used. The experimental results show that the proposed dataset can achieve a high detection rate in all experiments, with the tree-based models providing the best mean accuracy of 0.99 and a low false positive rate, ensuring effective anomaly detection.
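The abstract mentions handling unbalanced data without naming the technique. A minimal sketch, assuming random oversampling, is shown below: randomly chosen minority-class records are duplicated until every class is as large as the largest one. The data are hypothetical, and in practice only the training split should be resampled so that duplicates do not leak into evaluation.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each minority class until every
    class has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical imbalanced training set: 8 normal records, 2 anomalous ones.
X = list(range(10))
y = ["normal"] * 8 + ["anomaly"] * 2
X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # [('anomaly', 8), ('normal', 8)]
```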

IntruDTree: A Machine Learning-Based Cyber Security Intrusion Detection Model

10.20944/preprints202004.0481.v1 ◽

2020 ◽

Author(s):

Iqbal H. Sarker ◽

Yoosef B. Abushark ◽

Fawaz Alsolami ◽

Asif Irshad Khan

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Cyber Security ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Techniques ◽

Support Vector ◽

Security Model ◽

K Nearest Neighbor ◽

Detection Model

Cyber security has recently received enormous attention owing to the popularity of the Internet of Things (IoT), the tremendous growth of computer networks, and the huge number of relevant applications. Detecting various cyber-attacks or anomalies in a network and building an effective intrusion detection system are therefore becoming increasingly important to today's security. Artificial intelligence, particularly machine learning, can be used to build such a data-driven, intelligent intrusion detection system. To achieve this goal, in this paper we present IntruDTree (Intrusion Detection Tree), a machine-learning-based security model that first ranks security features according to their importance and then builds a generalized tree-based intrusion detection model from the selected important features. This model is not only effective in terms of prediction accuracy for unseen test cases but also minimizes computational complexity by reducing the feature dimensions. Finally, we examined the effectiveness of IntruDTree by conducting experiments on cybersecurity datasets and computing precision, recall, F-score, accuracy, and ROC values. We also compare the results of IntruDTree with several traditional, popular machine learning methods, such as the naive Bayes classifier, logistic regression, support vector machines, and k-nearest neighbors, to analyze the effectiveness of the resulting security model.
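The abstract ranks features "according to their importance" without naming the measure. The sketch below assumes information gain (entropy reduction), the splitting criterion of ID3-style decision trees. The top-ranked features would then be handed to a tree learner. Feature names and records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return feature names sorted from most to least informative."""
    scores = {name: information_gain([r[j] for r in rows], labels)
              for j, name in enumerate(names)}
    return sorted(names, key=lambda name: scores[name], reverse=True)


# Hypothetical connection records.
names = ["proto", "flag", "service"]
rows = [
    ["tcp", "S0", "http"],
    ["tcp", "SF", "http"],
    ["udp", "S0", "http"],
    ["udp", "SF", "dns"],
]
labels = ["attack", "attack", "normal", "normal"]
print(rank_features(rows, labels, names))  # ['proto', 'service', 'flag']
```

Keeping only the top-k features from such a ranking is how the feature dimension, and with it the cost of building and evaluating the tree, can be reduced.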

Network traffic analysis using Machine Learning Techniques in IoT Network

International Journal of Software Innovation ◽

10.4018/ijsi.289172 ◽

2021 ◽

Vol 9 (4) ◽

pp. 0-0

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Network Traffic ◽

Traffic Analysis ◽

Machine Learning Techniques ◽

Support Vector ◽

Static And Dynamic Analysis ◽

Network Traffic Analysis ◽

Cyber Threats ◽

Network Topologies

Internet of Things devices have limited intelligence and are resource-constrained, which makes them vulnerable to cyber threats. Such threats can cause serious harm: infecting machines, disrupting network topologies, and denying services to legitimate users. Artificial intelligence-driven methods and advanced machine learning-based network investigation can protect the network from malicious traffic. In this research, a support vector machine (SVM) learning technique was used to classify normal and abnormal traffic. Network traffic analysis was performed to detect malicious traffic and protect the network against it, and static and dynamic analysis of malware was also carried out. The Mininet emulator was selected for network design and VMware Fusion for creating a virtual environment; the host OS was Ubuntu Linux, and the network topology was a tree topology. Wireshark was used to open an existing pcap file containing network traffic. The SVM classifier demonstrated the best performance, with 99% accuracy.
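To show what an SVM does when separating normal from abnormal traffic, here is a from-scratch linear SVM trained with Pegasos-style stochastic sub-gradient descent. This is a swapped-in illustration, not the authors' setup (the abstract does not state which SVM implementation or kernel was used). The two standardised flow features and their values are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos-style stochastic sub-gradient descent.

    X: feature vectors; y: labels in {-1, +1} (+1 = abnormal traffic here).
    A constant 1.0 feature is appended so the last weight acts as a bias term
    (this bias is regularised too, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # L2 shrinkage, then a hinge-loss step if the margin is violated.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function: +1 abnormal, -1 normal."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical standardised flow features: (packet-rate z-score, payload-size z-score).
X = [(2.0, -1.5), (1.5, -2.0), (1.8, -1.8),   # abnormal
     (-1.5, 2.0), (-2.0, 1.5), (-1.8, 1.7)]   # normal
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A library implementation with kernel support would normally replace this in practice; the sketch only makes the margin-maximising hinge-loss update explicit.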

Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing

Clinical Chemistry ◽

10.1373/clinchem.2019.308213 ◽

2019 ◽

Vol 66 (1) ◽

pp. 239-246 ◽

Cited By ~ 1

Author(s):

Chao Wu ◽

Xiaonan Zhao ◽

Mark Welsh ◽

Kellianne Costello ◽

Kajia Cao ◽

...

Keyword(s):

Machine Learning ◽

Next Generation Sequencing ◽

Clinical Laboratory ◽

Next Generation ◽

Single Nucleotide Variants ◽

Test Set ◽

Clinical Laboratories ◽

Bona Fide ◽

Validation Set ◽

Generation Sequencing

Abstract BACKGROUND Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. We present a machine learning–based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by next-generation sequencing from nonformalin-fixed paraffin-embedded tumor specimens. METHODS A cohort of 11278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of clinical laboratory workflow. A 3-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned with the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label “uncertain” variants. RESULTS The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5587 SNVs of the test set. Overall, 1252 of 1341 true-positive variants were identified as real, 4143 of 4246 false-positive calls were deemed artifacts, whereas only 192 (3.4%) SNVs were labeled as “uncertain,” with zero misclassification between the true positives and artifacts in the test set. CONCLUSIONS We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received definitive labels and thus were exempt from manual review. This framework could improve quality and efficiency of the variant review process in clinical laboratories.
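The test-set counts in the results can be reconciled with a few lines of arithmetic. Because the abstract reports zero misclassification between true positives and artifacts, every test variant that did not receive its correct definitive label must have been labelled "uncertain".

```python
# Counts reported for the test set in the abstract above.
test_total = 5587
true_pos_total, true_pos_called_real = 1341, 1252
artifacts_total, artifacts_called_artifact = 4246, 4143
uncertain = 192

# Every test SNV is either a true positive or an artifact.
assert true_pos_total + artifacts_total == test_total

# With zero cross-misclassification, the remainder of each group is "uncertain".
uncertain_tp = true_pos_total - true_pos_called_real          # 89
uncertain_art = artifacts_total - artifacts_called_artifact   # 103
assert uncertain_tp + uncertain_art == uncertain

print(f"uncertain: {uncertain / test_total:.1%}")                   # uncertain: 3.4%
print(f"definitive: {(test_total - uncertain) / test_total:.1%}")   # definitive: 96.6%
```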

Research on Distributed Intrusion Detection Model Based on Information Fusion

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.121-122.528 ◽

2010 ◽

Vol 121-122 ◽

pp. 528-533

Author(s):

Ping Du ◽

Wei Xu

Keyword(s):

Intrusion Detection ◽

Information Fusion ◽

Detection System ◽

False Positive Rate ◽

Prototype System ◽

Security Risk ◽

Source Information ◽

Detection Model ◽

Distributed Intrusion Detection ◽

Positive Rate

The current state of research on intrusion detection systems (IDS) is analyzed. Because existing IDSs suffer from defects such as a high false positive rate and an inability to effectively detect attacks that are coordinated across time and space, this paper introduces the idea of multi-source information fusion and presents a multi-level IDS reasoning framework and prototype system. The prototype adds an analysis engine to the existing IDS sensor. A Bayesian network is used as the tool for multi-source information fusion, and a goal tree is used to analyze attempted coordinated attacks and to quantify the security risk to the system. Compared with existing IDSs, the prototype is more integrated, is more capable of finding coordinated attacks, and has a lower false positive rate.
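The paper's Bayesian network structure and probabilities are not given in the abstract. As a minimal sketch of how Bayesian fusion can suppress false positives, assume the simplest two-level network: one hidden "attack" node whose children are sensor alerts, each conditionally independent given the attack state. The sensor names, detection rates, false alarm rates, and prior are all hypothetical.

```python
def fuse_alerts(prior, sensors, observations):
    """Posterior P(attack | observations) for a two-level Bayesian network in
    which one hidden 'attack' node is the parent of every sensor node.

    prior:        P(attack)
    sensors:      {name: (P(alert | attack), P(alert | no attack))}
    observations: {name: True/False}, whether each sensor raised an alert
    """
    p_attack, p_normal = prior, 1.0 - prior
    for name, alerted in observations.items():
        p_alert_attack, p_alert_normal = sensors[name]
        if alerted:
            p_attack *= p_alert_attack
            p_normal *= p_alert_normal
        else:
            p_attack *= 1.0 - p_alert_attack
            p_normal *= 1.0 - p_alert_normal
    return p_attack / (p_attack + p_normal)


# Hypothetical sensors: (detection rate, false alarm rate).
sensors = {"host_ids": (0.9, 0.05), "net_ids": (0.8, 0.1)}
one = fuse_alerts(0.01, sensors, {"host_ids": True, "net_ids": False})
both = fuse_alerts(0.01, sensors, {"host_ids": True, "net_ids": True})
print(f"{one:.3f}")   # 0.039
print(f"{both:.3f}")  # 0.593
```

An alert from a single sensor barely moves the posterior, while corroborating alerts from two sources raise it sharply; this is the intuition behind fusing multiple sensors to lower the false positive rate.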

Development of a Food Image Recognition Algorithm Using Machine Learning

10.21203/rs.3.rs-15859/v1 ◽

2020 ◽

Author(s):

Serge Assaad ◽

Lawrence Carin ◽

Anthony Joseph Viera

Keyword(s):

Machine Learning ◽

Food Choices ◽

Recognition Algorithm ◽

Machine Learning Techniques ◽

Operating Characteristics ◽

Food Items ◽

Trade Offs ◽

Learning Techniques ◽

Food Image ◽

Validation Set

Abstract Background: Researchers and consumers have limited options for objectively collecting or tracking data related to food choices. Objective: To develop and pilot test an algorithm that could accurately categorize food items from a meal photograph. Methods: We used a dataset of 7721 meal photographs taken by patrons in a cafeteria setting. We designed 22 broad categories recognizable by image that are parents of the original 1239 types of items in the photographs. We split the dataset into 3 mutually exclusive subsets: a training set (5250 images), a validation set (1312 images), and a test set (1159 images). Using a convolutional neural network and standard machine learning techniques, we tested the operating characteristics of the algorithm. Results: Salad recognition had the lowest specificity (0.74), while multiple categories had specificities close to 1.0 (e.g. cereals, pastries, sushi, yogurt). Areas under the ROC curve (AUCs), reflecting trade-offs between sensitivity and specificity, ranged from 0.73 (for yogurt) to 0.97 (for sushi). Conclusions: This work provides proof-of-concept for an algorithm that can categorize food items from a meal photograph.
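The three mutually exclusive subsets described in the methods (5250 + 1312 + 1159 = 7721 images, roughly a 68/17/15 split) can be produced with a seeded shuffle. This is a sketch assuming a simple random split; the abstract does not say whether the authors stratified by category or grouped photos by patron. The file names are placeholders.

```python
import random


def three_way_split(items, n_train, n_val, n_test, seed=0):
    """Shuffle items and split them into mutually exclusive train/val/test sets."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must add up to the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


photos = [f"photo_{i:04d}.jpg" for i in range(7721)]  # placeholder file names
train, val, test = three_way_split(photos, 5250, 1312, 1159)
print(len(train), len(val), len(test))  # 5250 1312 1159
```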
