Detecting Website Defacements Based on Machine Learning Techniques and Attack Signatures

Xuan Dau Hoang; Ngoc Tuong Nguyen

doi:10.3390/computers8020035

Computers ◽

2019 ◽

Vol 8 (2) ◽

pp. 35

Keyword(s):

Machine Learning ◽

Web Applications ◽

False Positive Rate ◽

Training Data ◽

Machine Learning Techniques ◽

Web Pages ◽

Government Organizations ◽

Detection Model ◽

Learning Techniques ◽

Positive Rate

Defacement attacks have long been considered one of the prime threats to the websites and web applications of companies, enterprises, and government organizations. Defacement attacks can have serious consequences for website owners, including immediate interruption of website operations and damage to the owner's reputation, which may result in huge financial losses. Many solutions have been researched and deployed for monitoring and detecting website defacement attacks, such as those based on checksum comparison, diff comparison, DOM tree analysis, and complicated algorithms. However, some solutions only work on static websites, and others demand extensive computing resources. This paper proposes a hybrid defacement detection model that combines machine learning-based detection with signature-based detection. The machine learning-based detection first constructs a detection profile using training data of both normal and defaced web pages. It then uses the profile to classify monitored web pages as either normal or attacked. The machine learning-based component can effectively detect defacements on both static and dynamic pages. The signature-based detection, in turn, is used to boost the model's processing performance for common types of defacements. Extensive experiments show that our model achieves an overall accuracy of more than 99.26% and a false positive rate of about 0.27%. Moreover, our model is suitable for implementing a real-time website defacement monitoring system because it does not demand extensive computing resources.
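
The hybrid idea can be sketched as a two-stage check. This assumes the signature stage is a list of regular expressions and the machine learning stage is any function that maps page text to a defacement verdict. The patterns in `SIGNATURES`, the function `detect_defacement`, and the stand-in `toy_classifier` are all hypothetical. The abstract does not specify the paper's signatures, features, or learning algorithm.

```python
import re
from typing import Callable, Iterable

# Hypothetical signatures of common defacements (illustrative only; the
# paper's real signature set is not given in the abstract).
SIGNATURES = [
    re.compile(r"hacked\s+by", re.IGNORECASE),
    re.compile(r"defaced\s+by", re.IGNORECASE),
]


def detect_defacement(
    page_text: str,
    classifier: Callable[[str], bool],
    signatures: Iterable[re.Pattern] = SIGNATURES,
) -> str:
    """Classify a monitored page as 'attacked' or 'normal'.

    Cheap signature matching runs first; only pages that match no
    signature are passed to the (more expensive) ML classifier.
    """
    for pattern in signatures:
        if pattern.search(page_text):
            return "attacked"
    return "attacked" if classifier(page_text) else "normal"


def toy_classifier(page_text: str) -> bool:
    """Hypothetical stand-in for a trained model: flags near-empty pages."""
    return len(page_text.split()) < 5


print(detect_defacement("HACKED BY someone", toy_classifier))  # attacked (signature hit)
print(detect_defacement("Welcome to our company homepage and news", toy_classifier))  # normal
```

Running the cheap signature check first means that common defacements never reach the classifier. This matches the abstract's stated purpose for the signature component.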

An Ensemble-Based Malware Detection Model Using Minimum Feature Set

MENDEL ◽

10.13164/mendel.2019.2.001 ◽

2019 ◽

Vol 25 (2) ◽

pp. 1-10

Author(s):

Ivan Zelinka ◽

Eslam Amer

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

Malware Detection ◽

Machine Learning Techniques ◽

Detection Methods ◽

Detection Model ◽

Learning Techniques ◽

Proposed Model ◽

Positive Rate ◽

Minimum Number

Current commercial antivirus detection engines still rely on signature-based methods. However, with the huge increase in the number of new malware samples, current detection methods are no longer suitable. In this paper, we introduce a malware detection model based on ensemble learning. The model is trained using the minimum number of significant features extracted from the file header. Evaluations show that the ensemble models slightly outperform individual classification models. Experimental evaluations show that our model can predict unseen malware with an accuracy of 0.998 and a false positive rate of 0.002. The paper also compares the performance of the proposed model with that of different machine learning techniques. We emphasize the use of machine learning-based approaches to replace conventional signature-based methods.
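
Ensemble learning can be illustrated with hard (majority) voting. This is a sketch under stated assumptions. The base models below are hypothetical threshold rules on hypothetical PE-header features, standing in for trained classifiers. The abstract does not say which learners or which combination rule the paper uses.

```python
from collections import Counter
from typing import Callable, Sequence

# A base model maps a feature vector to 0 (benign) or 1 (malware).
Model = Callable[[Sequence[float]], int]


def majority_vote(models: Sequence[Model], features: Sequence[float]) -> int:
    """Hard-voting ensemble: return the label predicted by most models.

    Use an odd number of binary models to avoid ties; on a tie,
    Counter.most_common returns the label that was counted first.
    """
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical threshold rules on hypothetical file-header features:
# features[0] = number of sections, features[1] = max section entropy,
# features[2] = size of image in MB.
def many_sections(f: Sequence[float]) -> int:
    return int(f[0] > 8)


def high_entropy(f: Sequence[float]) -> int:
    return int(f[1] > 7.0)


def tiny_image(f: Sequence[float]) -> int:
    return int(f[2] < 0.1)


ensemble = [many_sections, high_entropy, tiny_image]
print(majority_vote(ensemble, [10, 7.5, 2.0]))  # 1: two of three rules vote malware
print(majority_vote(ensemble, [3, 7.5, 2.0]))   # 0: only one rule votes malware
```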

IoT Dataset Validation Using Machine Learning Techniques for Traffic Anomaly Detection

Electronics ◽

10.3390/electronics10222857 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2857

Author(s):

Laura Vigoya ◽

Diego Fernandez ◽

Victor Carneiro ◽

Francisco Nóvoa

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

False Positive Rate ◽

Machine Learning Techniques ◽

Support Vector ◽

High Detection Rate ◽

Security Vulnerabilities ◽

Smart Systems ◽

Learning Techniques ◽

Positive Rate

With advancements in engineering and science, the application of smart systems is increasing, driving faster growth of IoT network traffic. The limited power and computing capabilities of IoT devices also raise concerns about security vulnerabilities. Machine learning-based techniques have recently gained credibility through their successful application to the detection of network anomalies, including in IoT networks. However, machine learning techniques cannot work without representative data. Given the scarcity of IoT datasets, the DAD dataset emerged as an instrument for understanding the behavior of dedicated IoT-MQTT networks. This paper aims to validate the DAD dataset by applying Logistic Regression, Naive Bayes, Random Forest, AdaBoost, and Support Vector Machine to detect traffic anomalies in IoT. To obtain the best results, techniques for handling imbalanced data, feature selection, and grid search for hyperparameter optimization have been used. The experimental results show that the proposed dataset achieves a high detection rate in all experiments. The tree-based models provide the best mean accuracy of 0.99 with a low false-positive rate, ensuring effective anomaly detection.
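
The abstract does not name the technique used for handling imbalanced data. Random oversampling is one common option and is sketched here as an assumption. The function `random_oversample` and the toy flow records are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    samples: Sequence[T], labels: Sequence[int], seed: int = 0
) -> Tuple[List[T], List[int]]:
    """Duplicate random minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)  # seeded for reproducibility
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, count in counts.items():
        members = [x for x, y in zip(samples, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(members))
            out_y.append(label)
    return out_x, out_y


# Toy flow records: label 0 = normal traffic, 1 = anomaly (hypothetical data).
flows = [f"flow{i}" for i in range(10)]
labels = [0] * 8 + [1] * 2
X, y = random_oversample(flows, labels)
print(Counter(y))  # Counter({0: 8, 1: 8})
```

Oversampling should be applied only to the training split. Duplicating records before splitting lets copies of the same record land in both the training and test data, which inflates the measured accuracy.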

Early Weed Detection Using Image Processing and Machine Learning Techniques in an Australian Chilli Farm

Agriculture ◽

10.3390/agriculture11050387 ◽

2021 ◽

Vol 11 (5) ◽

pp. 387

Author(s):

Nahina Islam ◽

Md Mamunur Rashid ◽

Santoso Wibowo ◽

Cheng-Yuan Xu ◽

Ahsan Morshed ◽

...

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Weed Detection ◽

Learning Techniques ◽

Positive Rate ◽

Uav Images

This paper explores the potential of machine learning algorithms for weed and crop classification from UAV images. Identifying weeds in crops is a challenging task that has been addressed through orthomosaicing of images, feature extraction, and labelling of images to train machine learning algorithms. In this paper, the performance of several machine learning algorithms, namely random forest (RF), support vector machine (SVM), and k-nearest neighbours (KNN), is analysed for detecting weeds in UAV images collected from a chilli crop field located in Australia. The evaluation metrics used to compare performance were accuracy, precision, recall, false positive rate, and kappa coefficient. MATLAB is used to simulate the machine learning algorithms. The achieved weed detection accuracies are 96% using RF, 94% using SVM, and 63% using KNN. Based on this study, the RF and SVM algorithms are efficient and practical to use, and they can easily be implemented for detecting weeds from UAV images.
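
All of the evaluation metrics named above can be computed from a binary confusion matrix. The sketch below treats weed as the positive class. The counts in the example are hypothetical and are not the paper's results.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BinaryCounts:
    tp: int  # weed predicted as weed
    fp: int  # crop predicted as weed
    fn: int  # weed predicted as crop
    tn: int  # crop predicted as crop


def evaluate(c: BinaryCounts) -> Dict[str, float]:
    """Accuracy, precision, recall, false positive rate and Cohen's kappa."""
    n = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / n
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # true positive rate
    fpr = c.fp / (c.fp + c.tn)     # false positive rate
    # Chance agreement: sum over classes of (predicted share x actual share).
    p_chance = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "fpr": fpr, "kappa": kappa}


# Hypothetical counts: accuracy 0.8, precision 0.8, recall 0.8, fpr 0.2, kappa about 0.6
print(evaluate(BinaryCounts(tp=40, fp=10, fn=10, tn=40)))
```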

Detection of Drive-by Download Attacks Using Machine Learning Approach

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch082 ◽

2020 ◽

pp. 1598-1611

Author(s):

Monther Aldwairi ◽

Musaab Hasan ◽

Zayed Balbahaith

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

Detection Accuracy ◽

Web Pages ◽

Financial Loss ◽

Detection Model ◽

Detection Systems ◽

Novel Approach ◽

Positive Rate ◽

Using Data

Drive-by download refers to attacks that automatically download malware to a user's computer without the user's knowledge or consent. This type of attack is accomplished by exploiting vulnerabilities in web browsers and plugins. The damage may include data leakage leading to financial loss. Traditional antivirus and intrusion detection systems are not efficient against such attacks. Researchers have proposed many detection approaches, mostly passive blacklisting. A few have proposed dynamic classification techniques, but these suffer from clear shortcomings. In this paper, we propose a novel approach to detect web pages infected with drive-by downloads based on features extracted from their source code. We test 23 different machine learning classifiers on a data set of 5,435 web pages and, based on detection accuracy, select the top five to build our detection model. The approach is expected to serve as a basis for implementing and developing anti-drive-by download programs. We also develop a graphical user interface program that allows the end user to examine a URL before visiting the website. The Bagged Trees classifier exhibited the highest accuracy, 90.1%, with a 96.24% true positive rate and a 26.07% false positive rate.
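
Static features extracted from page source can be as simple as pattern counts. The patterns and feature names below are hypothetical examples of constructs often abused in drive-by pages. The chapter's real feature set is not listed in the abstract. Vectors like these would then be fed to classifiers such as the Bagged Trees model mentioned above.

```python
import re
from typing import Dict

# Hypothetical static features; the chapter's actual features are not listed.
PATTERNS = {
    "iframe_count": re.compile(r"<iframe\b", re.IGNORECASE),
    # iframes with width or height 0 are a common way to hide exploit loaders
    "hidden_iframe_count": re.compile(
        r"<iframe[^>]*\b(?:width|height)\s*=\s*[\"']?0\b", re.IGNORECASE
    ),
    "eval_count": re.compile(r"\beval\s*\("),
    "unescape_count": re.compile(r"\bunescape\s*\("),
    "document_write_count": re.compile(r"document\.write\s*\("),
}


def extract_features(html: str) -> Dict[str, int]:
    """Count occurrences of each pattern in a page's source code."""
    return {name: len(pattern.findall(html)) for name, pattern in PATTERNS.items()}


page = '<iframe src="x" width="0" height="0"></iframe><script>eval(unescape("%61"));</script>'
print(extract_features(page))
# {'iframe_count': 1, 'hidden_iframe_count': 1, 'eval_count': 1,
#  'unescape_count': 1, 'document_write_count': 0}
```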

Challenges for Tractogram Filtering

Mathematics and Visualization - Anisotropy Across Fields and Scales ◽

10.1007/978-3-030-56215-1_7 ◽

2021 ◽

pp. 149-168

Author(s):

Daniel Jörgens ◽

Maxime Descoteaux ◽

Rodrigo Moreno

Keyword(s):

Machine Learning ◽

White Matter ◽

False Positive Rate ◽

Machine Learning Techniques ◽

Post Processing ◽

Learning Techniques ◽

Processing Step ◽

Positive Rate ◽

Neural Fiber ◽

Modern Machine

Tractography aims at describing the most likely neural fiber paths in white matter. A general issue of current tractography methods is their high false-positive rate. One approach to this problem is tractogram filtering, in which anatomically implausible streamlines are discarded in a post-processing step after tractography. In this chapter, we review the main approaches and methods from the literature that are relevant for the application of tractogram filtering. Moreover, we give a perspective on the central challenges for the development of new methods in this field over the next few years, including modern machine learning techniques.
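
Filtering as a post-processing step can be illustrated by discarding streamlines that are too short or turn too sharply. These two rules, their thresholds, and the assumption that coordinates are in millimetres are hypothetical simplifications. The methods reviewed in the chapter are considerably more sophisticated.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Streamline = Sequence[Point]


def length_mm(s: Streamline) -> float:
    """Sum of segment lengths along the streamline."""
    return sum(math.dist(a, b) for a, b in zip(s, s[1:]))


def max_turn_deg(s: Streamline) -> float:
    """Largest angle (degrees) between consecutive segments; 0 for straight lines."""
    worst = 0.0
    for a, b, c in zip(s, s[1:], s[2:]):
        u = [bi - ai for ai, bi in zip(a, b)]
        v = [ci - bi for bi, ci in zip(b, c)]
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0 or nv == 0:
            continue  # skip repeated points
        cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        # clamp against rounding error before acos
        worst = max(worst, math.degrees(math.acos(max(-1.0, min(1.0, cos)))))
    return worst


def filter_tractogram(
    streamlines: Sequence[Streamline], min_len: float = 20.0, max_turn: float = 60.0
) -> List[Streamline]:
    """Keep only streamlines that pass both (hypothetical) plausibility rules."""
    return [s for s in streamlines
            if length_mm(s) >= min_len and max_turn_deg(s) <= max_turn]


straight = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (30, 0, 0)]
short = [(0, 0, 0), (5, 0, 0)]              # too short
sharp = [(0, 0, 0), (15, 0, 0), (15, 15, 0)]  # 90-degree turn
print(len(filter_tractogram([straight, short, sharp])))  # 1
```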

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, business goals focus on findings that yield valuable insights from available data. A large part of Indian cinema is Bollywood, a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop. For this, machine learning techniques (classification and prediction) will be applied. To build a classifier or prediction model, the first step is the learning stage. In this stage, a training data set is used to train the model by applying some technique or algorithm. The rules generated in this stage form a model that helps predict future trends in different types of organizations.
Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN will be applied to find efficient and effective results. All these functionalities can be applied with GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate.
Result: To build a classifier or prediction model, the first step is the learning stage. In this stage, a training data set is used to train the model by applying some technique or algorithm. The rules generated in this stage form a model that helps predict future trends in different types of organizations.
Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix, with the aim of identifying the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to its predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns and relationships in large data sets, which helps solve business problems and predict forthcoming trends. This prediction can help production houses with advertisement propaganda and cost planning. By accounting for these factors, they can make a movie more profitable.
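
The comparative analysis rests on computing a confusion matrix and an accuracy score for each model and keeping the best one. The sketch below covers the five success classes. The actual labels and per-model predictions are invented for illustration. `best_model` is a hypothetical helper, not part of any GUI workflow tool.

```python
from typing import Dict, List, Sequence

CLASSES = ["Blockbuster", "Superhit", "Hit", "Average", "Flop"]


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes (order of CLASSES)."""
    index = {c: i for i, c in enumerate(CLASSES)}
    m = [[0] * len(CLASSES) for _ in CLASSES]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m


def accuracy(m: List[List[int]]) -> float:
    """Fraction of movies on the diagonal (predicted class == actual class)."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


def best_model(actual: Sequence[str], predictions: Dict[str, Sequence[str]]) -> str:
    """Return the name of the model with the highest accuracy."""
    return max(predictions,
               key=lambda name: accuracy(confusion_matrix(actual, predictions[name])))


# Hypothetical data for five movies and two models.
actual = ["Hit", "Flop", "Flop", "Average", "Blockbuster"]
predictions = {
    "DecisionTree": ["Hit", "Flop", "Average", "Average", "Superhit"],
    "NaiveBayes": ["Flop", "Flop", "Flop", "Hit", "Superhit"],
}
print(best_model(actual, predictions))  # DecisionTree (3/5 correct vs 2/5)
```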

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi into three classes, i.e., positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e., Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets. SVM was observed to perform best compared with the other classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural and social benefits, such as predicting or fighting riots.
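
Converting string data into a numerical matrix is commonly done with a bag-of-words representation. This is a minimal sketch. It assumes that normalization consists of lowercasing, tokenization, and collapsing elongated spellings (e.g. "achhhi" to "achhi"). The abstract does not describe how the system actually handles Romanized or abbreviated text, and the example reviews are hypothetical.

```python
import re
from typing import List, Sequence, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase, collapse runs of 3+ identical characters to 2, and tokenize."""
    text = text.lower()
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # crude fix for elongated spellings
    return re.findall(r"[a-z0-9]+", text)


def build_matrix(docs: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, document-term count matrix) for the given documents."""
    tokenized = [normalize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for t in toks:
            row[index[t]] += 1
        matrix.append(row)
    return vocab, matrix


vocab, matrix = build_matrix(["Film bahut achhhi thi", "film bekaar thi"])
print(vocab)   # ['achhi', 'bahut', 'bekaar', 'film', 'thi']
print(matrix)  # [[1, 1, 0, 1, 1], [0, 0, 1, 1, 1]]
```

Each row can then serve as the feature vector for a classifier such as the Naive Bayes, J48, or SVM models mentioned above.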

Artificially Generated Training Data-sets for Supervised Machine Learning Techniques in Magnetic Resonance Imaging: An Example in Myocardial Segmentation

2019 Computing in Cardiology Conference (CinC) ◽

10.22489/cinc.2019.220 ◽

2019

Author(s):

Christos Xanthis ◽

Kostas Haris ◽

Dimitrios Filos ◽

Anthony Aletras

Keyword(s):

Magnetic Resonance Imaging ◽

Machine Learning ◽

Magnetic Resonance ◽

Training Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Sets ◽

Resonance Imaging ◽

Learning Techniques ◽

Myocardial Segmentation

Training a deep learning model for single-cell segmentation without manual annotation

Scientific Reports ◽

10.1038/s41598-021-03299-4 ◽

2021 ◽

Vol 11 (1)

Author(s):

Nizam Ud Din ◽

Ji Yu

Keyword(s):

Machine Learning ◽

Training Data ◽

Machine Learning Techniques ◽

Cell Segmentation ◽

Learning Techniques ◽

Signal Characteristics ◽

Human Operators ◽

Deep Learning Model ◽

Microscopy Images ◽

Segmentation Models

Advances in artificial neural networks have made machine learning techniques increasingly important in image analysis tasks. Recently, convolutional neural networks (CNNs) have been applied to the problem of cell segmentation from microscopy images. However, previous methods used a supervised training paradigm to create an accurate segmentation model. This strategy requires a large number of manually labeled cellular images, in which accurate pixel-level segmentations are produced by human operators. Generating training data is expensive and is a major hindrance to the wider adoption of machine learning-based methods for cell segmentation. Here we present an alternative strategy that trains CNNs without any human-labeled data. We show that our method produces accurate segmentation models, is applicable to both fluorescence and bright-field images, and requires little to no prior knowledge of the signal characteristics.

A Novel Approach for Computer-Aided Diagnosis for Distinction Between Benign and Malignant of Lung Nodules Based on Machine Learning Techniques

Handbook of Research on Information Security in Biomedical Signal Processing - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-5152-2.ch014 ◽

2018 ◽

pp. 281-290

Author(s):

Shashidhara Bola

Keyword(s):

False Positive Rate ◽

Lung Nodule ◽

Machine Learning Techniques ◽

Lung Nodules ◽

Data Set ◽

Novel Approach ◽

Learning Techniques ◽

Nodule Shape ◽

Positive Rate ◽

Aided Diagnosis

A new method is proposed to classify lung nodules as benign or malignant. The method is based on analysis of lung nodule shape, contour, and texture for better classification. The data set consists of 39 lung nodules from 39 patients, comprising 19 benign and 20 malignant nodules. Lung regions are segmented using morphological operators, and lung nodules are detected based on shape and area features. The proposed algorithm was tested on LIDC (Lung Image Database Consortium) datasets, and the results were found to be satisfactory. The method's ability to distinguish between benign and malignant nodules was evaluated using receiver operating characteristic (ROC) analysis. The method achieved an area under the ROC curve of 0.903 and reduced the false positive rate.
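
The area under the ROC curve equals the probability that a randomly chosen malignant nodule receives a higher classifier score than a randomly chosen benign one, with ties counting half. Below is a sketch of that pairwise computation. The scores are hypothetical. A quadratic loop is adequate at the scale of the 39 nodules described above.

```python
from typing import Sequence


def roc_auc(malignant_scores: Sequence[float], benign_scores: Sequence[float]) -> float:
    """AUC as the fraction of (malignant, benign) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in malignant_scores:
        for n in benign_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical scores: 5 of the 6 pairs are ordered correctly.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 0.8333333333333334
```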
