DDoS Botnets Attacks Detection in Anomaly Traffic: A Comparative Study.

2020 ◽  
Vol 3 (1) ◽  
pp. 64-74
Author(s):  
Ahmed A. Elsherif ◽  
Arwa A. Aldaej

One of the major challenges facing the acceptance and growth of business and governmental sites is the Botnet-based DDoS attack. A flooding DDoS attack strikes a victim machine by sending a vast amount of malicious traffic, causing a significant drop in quality of service (QoS) for IoT devices. Nonetheless, flooding DDoS attacks are not easy to detect and tackle, owing to the large number of attacking machines, the use of source-address spoofing, and the overlap between legitimate and malicious traffic. New kinds of attacks are identified daily, and some remain undiscovered; accordingly, this paper aims to improve the classification of network traffic that hackers attempt to make ambiguous or misleading. Recorded simulated traffic was used for both samples, normal and DDoS attack traffic, with approximately 104,000 cases of each; both datasets, created for this study, serve as the input data for building a classification model to be used as a tool to mitigate the risk of being attacked. The next step is putting the datasets into a format suitable for classification. This is done through preprocessing techniques that convert categorical data into numerical data. A classification process is then applied to the captured datasets to create a classification model, using five classification algorithms: Decision Tree, Support Vector Machine, Naive Bayes, K-Neighbours, and Random Forest. The classification core is implemented in Python and controlled through a user interface. The highest prediction, precision, and accuracy are obtained with the Decision Tree and Random Forest algorithms, which also have the lowest processing time.
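The abstract reports that the classification core is written in Python; a minimal sketch of such a five-classifier comparison using scikit-learn might look as follows. The file name, feature columns, and label encoding here are assumptions for illustration, not the authors' actual pipeline:

```python
# Hypothetical sketch of the five-classifier comparison described above;
# "traffic.csv" and the "label" column (0 = normal, 1 = DDoS) are assumptions.
import time
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score

df = pd.read_csv("traffic.csv")                                # assumed combined dataset
X = OrdinalEncoder().fit_transform(df.drop(columns="label"))   # categorical -> numerical
y = df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "K-Neighbours": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
}
for name, clf in models.items():
    start = time.perf_counter()
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"prec={precision_score(y_te, pred):.3f} "
          f"time={time.perf_counter() - start:.2f}s")
```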

2021 ◽  
pp. 71
Author(s):  
Alejandro Coca-Castro ◽  
Maycol A. Zaraza-Aguilera ◽  
Yilsey T. Benavides-Miranda ◽  
Yeimy M. Montilla-Montilla ◽  
Heidy B. Posada-Fandiño ◽  
...  

Building change detection based on remote sensing imagery is a key task for land management and planning, e.g., detection of illegal settlements, updating land records, and disaster response. Under the post-classification comparison approach, this research aimed to evaluate the feasibility of several classification algorithms to identify and capture buildings and their change between two time steps using very-high-resolution images (<1 m/pixel) across rural areas and urban/rural perimeter boundaries. Through an App implemented on the Google Earth Engine (GEE) platform, we selected two study areas in Colombia with different images and input data. In total, eight traditional classification algorithms available in GEE were trained: three unsupervised (K-Means, X-Means, and Cascade K-Means) and five supervised (Random Forest, Support Vector Machine, Naive Bayes, GMO Maximum Entropy, and Minimum Distance). Additionally, a deep neural network, the Feature Pyramid Network (FPN), was added and trained using a pre-trained EfficientNetB3 model. Three evaluation zones per study area were proposed to quantify the performance of the algorithms through the Intersection over Union (IoU) metric. This metric, ranging between 0 and 1, represents the degree of overlap between two regions, with higher IoU values indicating higher agreement. The results indicate that the models configured with the FPN network performed best, followed by the traditional supervised algorithms, with performance differences specific to each study area. For the rural area, the best FPN configuration obtained an IoU, averaged over both time steps, of 0.4, four times higher than the best supervised model, a Support Vector Machine with a linear kernel, with an average IoU of 0.1. For the urban/rural perimeter boundaries, this difference was less marked: an average IoU of 0.53 compared with 0.38 obtained by the best supervised classification model, in this case Random Forest. The results are relevant for institutions tracking the dynamics of building areas from cloud computing platforms and for future assessments of classifiers on similar platforms in other contexts.
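For reference, the IoU metric described above is straightforward to compute for two binary building masks; the following minimal NumPy sketch (not the authors' GEE implementation) illustrates it:

```python
# Minimal sketch of the Intersection over Union (IoU) metric for two
# equally shaped binary masks (1 = building, 0 = background).
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0  # both masks empty -> perfect agreement

pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
print(iou(pred, truth))  # intersection 2 / union 4 = 0.5
```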


2021 ◽  
Vol 12 (11) ◽  
pp. 1886-1891
Author(s):  
Sarthika Dutt et al.

Dysgraphia is a disorder that affects writing skills. Identifying Dysgraphia at an early age of a child's development is a difficult task. It can be identified through the problematic skills associated with the Dysgraphia difficulty. In this study, motor ability, spatial knowledge, copying skill, and visual-spatial response are among the features included for Dysgraphia identification. The features that affect the Dysgraphia disability are analyzed using an Elastic Net (EN) feature selection technique. The significant features are then classified using machine learning techniques. The classification models compared on the Dysgraphia dataset are KNN (K-Nearest Neighbors), Naïve Bayes, Decision Tree, Random Forest, and SVM (Support Vector Machine). Results indicate that the Random Forest classification model performs best for Dysgraphia identification.
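As an illustration of the approach described, a hedged sketch of elastic-net-based feature selection followed by a Random Forest classifier is shown below, using scikit-learn's elastic-net-penalized logistic regression as the selector and a synthetic dataset in place of the Dysgraphia data:

```python
# Hypothetical sketch: Elastic Net feature selection, then Random Forest.
# The dataset is synthetic; the selector hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=30, n_informative=8, random_state=0)

# Elastic Net = mixed L1/L2 penalty; features whose coefficients shrink
# to zero are dropped before the classification stage.
selector = SelectFromModel(
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=0.5, max_iter=5000)
)
model = make_pipeline(selector, RandomForestClassifier(random_state=0))
print(cross_val_score(model, X, y, cv=5).mean())
```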


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of the arecanut planting area and in the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, the monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized by the RF features were determined as 74.82%, 83.67%, and 88.30%, with Kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and Kappa coefficient. The overall accuracy of the SVM, BPNN, and RF models following feature optimization was improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification model; the Kappa coefficient also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. Furthermore, RF is shown to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
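A minimal sketch of the feature-space optimization step, assuming Random Forest importances are used to rank and retain the strongest spectral/textural variables (the synthetic data and the cutoff of 15 features are illustrative, not the authors' settings):

```python
# Sketch of feature-space optimization via Random Forest importances;
# X stands in for the stacked spectral/textural variables per pixel or object.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=40, n_informative=10, random_state=1)

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]  # most important first
top_k = order[:15]                                 # illustrative cutoff
X_opt = X[:, top_k]                                # optimized space for SVM/BPNN/RF
print("selected feature indices:", top_k)
```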


2021 ◽  
Author(s):  
Jeremy Watts ◽  
Anahita Khojandi ◽  
Rama Vasudevan ◽  
Fatta B. Nahab ◽  
Ritesh Ramdhani

Abstract Parkinson's disease (PD) medication treatment planning is generally based on subjective data gathered through in-office, physician-patient interactions. The Personal KinetiGraph™ (PKG) has shown promise in enabling objective, continuous remote health monitoring for Parkinson's patients. In this proof-of-concept study, we propose to use objective sensor data from the PKG and apply machine learning to subtype patients based on levodopa regimens and response. We apply k-means clustering to a dataset of within-subject Parkinson's medication changes, clinically assessed by the PKG and Hoehn & Yahr (H&Y) staging. A random forest classification model was then used to predict patients' cluster allocation based on their respective PKG data and demographic information. Clinically relevant clusters were developed based on longitudinal dopaminergic regimens, partitioned by levodopa dose, administration frequency, and total levodopa equivalent daily dose, with the PKG increasing cluster granularity compared to the H&Y staging. A random forest classifier was able to accurately classify subjects of the two most demographically similar clusters with an accuracy of 87.9 ± 1.3%.
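A hedged sketch of the two-stage design, k-means clustering on medication features followed by a Random Forest predicting cluster membership, is given below; all arrays are synthetic stand-ins for the PKG and demographic data:

```python
# Illustrative two-stage pipeline: cluster on medication regimen features,
# then predict cluster labels from PKG/demographic features. Synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
med_features = rng.normal(size=(120, 3))   # stand-ins: dose, frequency, total LEDD
pkg_features = rng.normal(size=(120, 6))   # stand-ins: PKG scores + demographics

clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(med_features)
rf = RandomForestClassifier(random_state=0)
print(cross_val_score(rf, pkg_features, clusters, cv=5).mean())
```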


2021 ◽  
Author(s):  
Mostafa Sa'eed Yakoot ◽  
Adel Mohamed Salem Ragab ◽  
Omar Mahmoud

Abstract Well integrity has become a crucial field with increased focus, and it is being published on intensively in industry research. It is important to maintain the integrity of each individual well to ensure that wells operate as expected for their designated life (or longer), with all risks kept as low as reasonably practicable, or as specified. Machine learning (ML) and artificial intelligence (AI) models are used intensively in the oil and gas industry nowadays. The ML concept is based on powerful algorithms and a robust database. Developing an efficient classification model for well integrity (WI) anomalies is now feasible because of the enormous number of well failures, well barrier integrity tests, and analyses in the database. Circa 9,000 data points were collected from WI tests performed on 800 wells in the Gulf of Suez, Egypt, over almost 10 years. Moreover, those data have been quality-controlled and quality-assured by experienced engineers. The data contain different forms of WI failures. The contributing parameter set includes a total of 23 barrier elements. The data were structured and fed into 11 different ML algorithms to build an automated, systematic tool for calculating the imposed risk category of any well. A comparison analysis of the deployed models was performed to infer the best predictive model that can be relied on. The 11 models include both supervised and ensemble learning algorithms, such as random forest, support vector machine (SVM), decision tree, and scalable boosting techniques. Of the 11 models, the results showed that extreme gradient boosting (XGB), categorical boosting (CatBoost), and decision tree are the most reliable algorithms. Moreover, novel evaluation metrics for the confusion matrix of each model were introduced to overcome the problem that existing metrics do not consider domain knowledge during model evaluation. The resulting model will help to utilize company resources efficiently and dedicate personnel efforts to high-risk wells, yielding progressive improvements in safety, the environment, and business performance. This paper would be a milestone in the design and creation of a Well Integrity Database Management Program through the combination of integrity engineering and ML.
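As an illustration only, the following sketch compares a few of the named algorithms on a synthetic stand-in for the 23 barrier-element features; the xgboost package is assumed installed, and the risk categories and data are invented:

```python
# Illustrative comparison of three of the 11 algorithms named above;
# the 23 features and 3 risk categories are synthetic, not field data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
from xgboost import XGBClassifier  # assumes xgboost is installed

X, y = make_classification(n_samples=9000, n_features=23, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, clf in {"Decision Tree": DecisionTreeClassifier(),
                  "Random Forest": RandomForestClassifier(),
                  "XGBoost": XGBClassifier()}.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(name, accuracy_score(y_te, pred))
    print(confusion_matrix(y_te, pred))  # basis for per-model evaluation metrics
```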


Geosciences ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 265
Author(s):  
Stefan Rauter ◽  
Franz Tschuchnigg

The classification of soils into categories with a similar range of properties is a fundamental geotechnical engineering procedure. At present, this classification is based on various types of cost- and time-intensive laboratory and/or in situ tests. These soil investigations are essential for each individual construction site and have to be performed prior to the design of a project. Since Machine Learning could play a key role in reducing the costs and time needed for a suitable site investigation program, the basic ability of Machine Learning models to classify soils from Cone Penetration Tests (CPT) is evaluated. To find an appropriate classification model, 24 different Machine Learning models, based on three different algorithms, are built and trained on a dataset consisting of 1339 CPTs. The applied algorithms are a Support Vector Machine, an Artificial Neural Network, and a Random Forest. As input features, different combinations of direct cone penetration test data (tip resistance qc, sleeve friction fs, friction ratio Rf, depth d), combined with “defined” (i.e., not directly measured) data (total vertical stress σv, effective vertical stress σ’v, and hydrostatic pore pressure u0), are used. Standard soil classes based on grain size distributions and soil classes based on soil behavior types according to Robertson are applied as targets. The different models are compared with respect to their prediction performance and the required learning time. The best results for all targets were obtained with models using a Random Forest classifier. For the soil classes based on grain size distribution, an accuracy of about 75%, and for soil classes according to Robertson, an accuracy of about 97–99%, was reached.
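A minimal sketch of such a Random Forest soil classifier on direct CPT inputs follows; the CSV file and column names (qc, fs, Rf, d, soil_class) are assumptions for illustration:

```python
# Hypothetical Random Forest soil classifier on direct CPT measurements:
# tip resistance qc, sleeve friction fs, friction ratio Rf, depth d.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

cpt = pd.read_csv("cpt_dataset.csv")      # assumed file of labeled CPT records
X = cpt[["qc", "fs", "Rf", "d"]]          # direct CPT input features
y = cpt["soil_class"]                     # grain-size or Robertson class target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, rf.predict(X_te)))
```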


Chronic Kidney Disease (CKD) is a worldwide concern that affects roughly 10% of the adult population globally. For most people, an early diagnosis of CKD is often not possible. Therefore, the utilization of modern computer-aided strategies is important to help make the conventional CKD diagnosis framework more effective and precise. In this project, six modern machine learning techniques, namely Multilayer Perceptron Neural Network, Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, Decision Tree, and Logistic Regression, were used; then, to enhance the performance of the model, ensemble algorithms such as AdaBoost, Gradient Boosting, Random Forest, Majority Voting, Bagging, and Weighted Average were applied to the Chronic Kidney Disease dataset from the UCI Repository. The model was finely tuned to obtain the best hyperparameters. Performance was evaluated using Accuracy, Precision, Recall, F1-score, Matthews Correlation Coefficient, and the ROC-AUC curve. The experiment was first performed on the individual classifiers and then on the ensemble classifiers. The ensemble classifiers Random Forest and AdaBoost performed better, with 100% Accuracy, Precision, and Recall, compared to the best individual classifier, the Decision Tree algorithm, with 99.16% Accuracy, 98.8% Precision, and 100% Recall.
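A hedged sketch of the majority-voting ensemble described above, built from three of the individual classifiers with scikit-learn, is shown below; the file name and the "class" label column for the UCI CKD dataset are assumptions:

```python
# Illustrative majority-voting ensemble over three base classifiers;
# "chronic_kidney_disease.csv" and its "class" column are assumed names.
import pandas as pd
from sklearn.ensemble import VotingClassifier, RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("chronic_kidney_disease.csv").dropna()  # assumed preprocessed data
X, y = df.drop(columns="class"), df["class"]

ensemble = VotingClassifier([
    ("dt", DecisionTreeClassifier()),
    ("rf", RandomForestClassifier()),
    ("ada", AdaBoostClassifier()),
], voting="hard")                                        # hard = majority voting
print(cross_val_score(ensemble, X, y, cv=5, scoring="accuracy").mean())
```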


2021 ◽  
Vol 23 (08) ◽  
pp. 532-537
Author(s):  
Cherlakola Abhinav Reddy ◽  
Sai Nitesh Gadiraju ◽  
Dr. Samala Nagaraj ◽  
...  

Online media has progressively become integral to the way billions of individuals experience news and events, frequently bypassing journalists, the conventional gatekeepers of breaking news. Real-world events create a corresponding spike of posts (tweets) on Twitter. This places a great deal of significance on the credibility of information found on online media platforms like Twitter. We used various supervised learning techniques, such as Naïve Bayes, Decision Trees, and Support Vector Machines, on the data to separate tweets into genuine and fake news. For our ML models, we used tweet and user features as our predictors. We achieved an accuracy of 88% using the Random Forest classifier and 88% using the Decision Tree. However, we believe that analyzing user accounts would further increase the accuracy of our models.
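For illustration only, a sketch of a Random Forest over hand-crafted tweet and user features follows; the feature names and file are invented stand-ins, not the authors' predictors:

```python
# Hypothetical fake-news tweet classifier over tweet + user features;
# "tweets.csv" and all column names below are invented for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

tweets = pd.read_csv("tweets.csv")                 # assumed labeled dataset
features = ["retweet_count", "favorite_count", "follower_count",
            "account_age_days", "verified"]        # tweet + user predictors
X, y = tweets[features], tweets["is_fake"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, rf.predict(X_te)))
```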

