Machine Learning-Based Detection for Cyber Security Attacks on Connected and Autonomous Vehicles

Qiyi He; Xiaolin Meng; Rong Qu; Ruijie Xi

doi:10.3390/math8081311

Mathematics ◽

10.3390/math8081311 ◽

2020 ◽

Vol 8 (8) ◽

pp. 1311

Author(s):

Qiyi He ◽

Xiaolin Meng ◽

Rong Qu ◽

Ruijie Xi

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Cyber Security ◽

Unified Modeling Language ◽

Attack Detection ◽

Machine Learning Algorithms ◽

Training Data ◽

Cyber Attack ◽

Tree Model ◽

Data Set

Connected and Autonomous Vehicle (CAV)-related initiatives have become some of the fastest expanding in recent years and have started to affect people's daily lives. More and more companies and research organizations have announced their initiatives, and some have started CAV road trials. Governments around the world have also introduced policies to support and accelerate the deployment of CAVs. Alongside these developments, issues such as CAV cyber security have become prominent, forming an essential part of the complications of CAV deployment. There is, however, no universally agreed upon or recognized framework for CAV cyber security. In this paper, following the UK CAV cyber security principles, we propose a UML (Unified Modeling Language)-based CAV cyber security framework, on the basis of which we classify the potential vulnerabilities of CAV systems. With this framework, a new CAV communication cyber-attack data set (named CAV-KDD) is generated based on the widely tested benchmark data set KDD99. This data set focuses on communication-based CAV cyber-attacks. Two classification models are developed using two machine learning algorithms, namely Decision Tree and Naive Bayes, based on the CAV-KDD training data set. The accuracy, precision and runtime of these two models when identifying each type of communication-based attack are compared and analysed. It is found that the Decision Tree model requires a shorter runtime and is more appropriate for CAV communication attack detection.
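The abstract compares the two models by accuracy and precision per attack type. As a minimal sketch of how those two metrics are usually computed from true and predicted labels (the label names and toy data below are invented for illustration and are not taken from CAV-KDD or from the paper's code):

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive):
    """Of the samples predicted as `positive`, the fraction that truly are."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # convention chosen here: no positive predictions gives 0
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


# Hypothetical toy labels, used only to exercise the functions.
y_true = ["normal", "dos", "dos", "probe", "normal", "dos"]
y_pred = ["normal", "dos", "normal", "probe", "dos", "dos"]

print(accuracy(y_true, y_pred))
print(precision(y_true, y_pred, "dos"))
```

Computing precision separately for each attack class, as done here for the hypothetical "dos" label, mirrors the per-attack-type comparison the abstract describes.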

Can Short and Partial Observations Reduce Model Error and Facilitate Machine Learning Prediction?

Entropy ◽

10.3390/e22101075 ◽

2020 ◽

Vol 22 (10) ◽

pp. 1075

Author(s):

Nan Chen

Keyword(s):

Machine Learning ◽

Model Error ◽

Machine Learning Algorithms ◽

Training Data ◽

Conditional Sampling ◽

Data Set ◽

Partial Observations ◽

Sampling Algorithm ◽

Highly Nonlinear ◽

Non Gaussian

Predicting complex nonlinear turbulent dynamical systems is an important and practical topic. However, due to the lack of a complete understanding of nature, the ubiquitous model error may greatly affect the prediction performance. Machine learning algorithms can overcome the model error, but they are often impeded by inadequate and partial observations in predicting nature. In this article, an efficient and dynamically consistent conditional sampling algorithm is developed, which incorporates the conditional path-wise temporal dependence into a two-step forward-backward data assimilation procedure to sample multiple distinct nonlinear time series conditioned on short and partial observations using an imperfect model. The resulting sampled trajectories succeed in reducing the model error and greatly enrich the training data set for machine learning forecasts. For a rich class of nonlinear and non-Gaussian systems, the conditional sampling is carried out by solving a simple stochastic differential equation, which is computationally efficient and accurate. The sampling algorithm is applied to create massive training data of multiscale compressible shallow water flows from highly nonlinear and indirect observations. The resulting machine learning prediction significantly outperforms the imperfect model forecast. The sampling algorithm also facilitates the machine learning forecast of a highly non-Gaussian climate phenomenon using extremely short observations.

Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network

Diagnostics ◽

10.3390/diagnostics9030104 ◽

2019 ◽

Vol 9 (3) ◽

pp. 104 ◽

Cited By ~ 11

Author(s):

Ahmed ◽

Yigit ◽

Isik ◽

Alpkocak

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Leukemia Data

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosing all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results showed that our CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy classification and in multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than the other well-known machine learning algorithms.
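The evaluation relies on 5-fold cross-validation. A minimal sketch of how sample indices can be split into k folds follows; the function name `k_fold_indices` is hypothetical, the folds are contiguous and unshuffled for simplicity, and nothing here reproduces the authors' actual pipeline:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample when n_samples is not
    divisible by k. No shuffling is done in this sketch.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample appears in exactly one test fold across the 5 rounds.
for train, test in k_fold_indices(10, 5):
    print(len(train), test)
```

In practice the indices would typically be shuffled (and often stratified by class) before splitting; that step is left out here to keep the sketch short.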

Future Prediction of Diabetics using XG Booster Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5144.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2128-2132

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

The Body ◽

Machine Learning Algorithms ◽

Support Vector ◽

Common Disease ◽

Data Set ◽

Glucose Content

Diabetes is one of the most common diseases today. This work proposes predicting the disease with machine learning techniques, so that its risk factors can be identified and kept from worsening. Early prediction helps control the disease and can save lives. For early prediction, we collected a data set with 8 attributes from 200 diabetic patients. The patients' sugar level is assessed from features such as glucose content in the body and age. The main machine learning algorithms considered are support vector machine (SVM), naive Bayes (NB), k-nearest neighbor (KNN) and decision tree (DT). In the existing system, Naive Bayes reaches an accuracy of 66% and Decision Tree 70 to 71%, which is not satisfactory. With XGBoost classifiers, the accuracy is 74% for Naïve Bayes and 89 to 90% for Decision Tree. In the proposed system the accuracy ranges are reported properly. A data set of 729 patients is stored in MongoDB; the reports of 129 patients are used for prediction and the remainder are used for training.

Rotor Unbalance Kind and Severity Identification by Current Signature Analysis with Adaptative Update to Multiclass Machine Learning Algorithms

Studies in Engineering and Technology ◽

10.11114/set.v8i1.5213 ◽

2021 ◽

Vol 8 (1) ◽

pp. 28

Author(s):

S. L. Ávila ◽

H. M. Schaberle ◽

S. Youssef ◽

F. S. Pacheco ◽

C. A. Penz

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Signature Analysis ◽

Data Set ◽

Learning Techniques ◽

Environmental Variations ◽

Current Signature

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. The more information is available, the easier it becomes to diagnose the machine's operational condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using harmonic analysis of the electric stator current, this paper presents a comparison study among Support-Vector Machines, Decision Tree classifiers, and the One-vs-One strategy to identify the kind and severity of rotor unbalance, a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier so that it deals better with changes produced by environmental variations and natural machinery usage. The adaptative update means to update the training data set with an amount of recent data, saving the entire original historical data. It is relevant for engineering maintenance. Our results show that current signature analysis is appropriate for identifying the type and severity of the rotor unbalance problem. Moreover, we show that machine learning techniques can be effective for an industrial application.

An Efficient Classifier for U2R, R2L, DoS Attack

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a1942.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 644-647

Keyword(s):

Machine Learning ◽

Network Security ◽

Learning Algorithms ◽

Research Area ◽

Attack Detection ◽

Machine Learning Algorithms ◽

The Internet ◽

Detection Accuracy ◽

Cyber Attack ◽

Detection Systems

The internet has become an irreplaceable communication and information tool in the current world. With the ever-growing importance and massive use of the internet today, there has been interest from researchers in finding the ideal Cyber Attack Detection Systems (CADSs), also referred to as Intrusion Detection Systems (IDSs), to protect against the vulnerabilities of network security. CADSs presently exist in many variants but can be largely categorized into two broad classes, signature-based detection and anomaly detection CADSs, based on their approach to recognizing attack packets. Signature-based CADSs use the well-known signatures or fingerprints of attack packets to signal their entry across the gateways of secured networks. Signature-based CADSs can only recognize threats that use a known signature; new attacks with unknown signatures can, therefore, strike without notice. Alternatively, anomaly-based CADSs are able to detect and report any abnormal traffic within the network. There are many ways of identifying anomalies, and different machine learning algorithms have been introduced to counter such threats. Most systems, however, fall short of complete attack prevention in the real world due to system administration and configuration, system complexity and abuse of authorized access. Several scholars and researchers have achieved significant milestones in the development of CADSs owing to the importance of computer and network security. This paper reviews the current trends of CADSs, analyzing the efficiency or level of detection accuracy of machine learning algorithms for cyber-attack detection with the aim of identifying the best. CADS is a developing research area that continues to attract researchers due to its critical objective.

Crucial Role of Data Analytics in the Prevention and Detection of Cyber Security Attacks

Advances in Digital Crime, Forensics, and Cyber Terrorism - Confluence of AI, Machine, and Deep Learning in Cyber Forensics ◽

10.4018/978-1-7998-4900-1.ch004 ◽

2021 ◽

pp. 67-80

Author(s):

Charulatha B. S. ◽

A. Neela Madheswari ◽

Shanthi K. ◽

Chamundeswari Arumugam

Keyword(s):

Business Process ◽

Cyber Security ◽

Data Analytics ◽

User Behavior ◽

Relevant Information ◽

Cyber Attacks ◽

Social Needs ◽

Machine Learning Algorithms ◽

Cyber Attack ◽

Data Set

Data analytics plays a major role in retrieving relevant information, as well as in avoiding unwanted data and missed values, producing good visualization and interpretation, and supporting decision making for any business or social need. Many organizations are affected by cyber-attacks at a greater frequency once their business is exposed to the internet. Cyber-attacks are plentiful, and tracking them is really difficult work. A cyber-attack may enter through different events in the business process. Detecting the attack is laborious, and collecting the data is still a hard task. Detecting the source of an attack for the various events in the business process, as well as tracking the corresponding data, requires an investigation procedure. This chapter concentrates on applying machine learning algorithms to study user behavior in the process in order to detect network anomalies. Data from the KDD'99 data set is collected and analyzed using decision tree, isolation forest, bagging classifier, and AdaBoost classifier algorithms.

Spatial–Temporal Analysis of Land Cover Change at the Bento Rodrigues Dam Disaster Area Using Machine Learning Techniques

Remote Sensing ◽

10.3390/rs11212548 ◽

2019 ◽

Vol 11 (21) ◽

pp. 2548

Author(s):

Dong Luo ◽

Douglas G. Goodin ◽

Marcellus M. Caldas

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Land Cover ◽

Decision Tree ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Disaster Area ◽

Mine Sites

Disasters are an unpredictable way to change land use and land cover. Improving the accuracy of mapping a disaster area at different times is an essential step in analyzing the relationship between human activity and the environment. The goals of this study were to test the performance of different processing procedures and to examine the effect of adding the normalized difference vegetation index (NDVI) as an additional classification feature for mapping land cover changes due to a disaster. Using Landsat ETM+ and OLI images of the Bento Rodrigues mine tailing disaster area, we created two datasets, one with six bands and the other with six bands plus the NDVI. We used support vector machine (SVM) and decision tree (DT) algorithms to build classifier models and validated model performance using 10-fold cross-validation, resulting in accuracies higher than 90%. The processed results indicated that the accuracy could reach or exceed 80%, and the support vector machine performed better than the decision tree. We also calculated each land cover type's sensitivity (true positive rate) and found that Agriculture, Forest and Mine sites had higher values, while Bareland and Water had lower values. Then, we visualized land cover maps for 2000 and 2017 and found that the Mine sites area had roughly doubled in size, while Forest decreased by 12.43%. Our findings showed that it is feasible to create a training data pool and use machine learning algorithms to classify a different year's Landsat products, and that NDVI can improve the classification of vegetation-covered land. Furthermore, this approach can provide a way to analyze land pattern change in a disaster area over time.
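The second dataset described above appends NDVI to the six spectral bands. NDVI is conventionally defined as (NIR - Red) / (NIR + Red). A minimal per-pixel sketch follows; the band ordering, the reflectance values, and the helper name `add_ndvi_feature` are assumptions made for illustration and do not come from the paper:

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen here for the undefined 0/0 case
    return (nir - red) / total


def add_ndvi_feature(bands, nir_index, red_index):
    """Return the band values with NDVI appended as one extra feature."""
    return list(bands) + [ndvi(bands[nir_index], bands[red_index])]


# Hypothetical six-band reflectances; positions 3 (NIR) and 2 (red) are assumed.
pixel = [0.05, 0.08, 0.10, 0.50, 0.30, 0.20]
features = add_ndvi_feature(pixel, nir_index=3, red_index=2)
print(len(features), features[-1])
```

Dense vegetation reflects strongly in the near infrared and weakly in red, so its NDVI approaches 1, which is why the index can help separate vegetated classes.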

Selección de tutores académicos en la educación superior usando árboles de decisión

REOP - Revista Española de Orientación y Psicopedagogía ◽

10.5944/reop.vol.29.num.1.2018.23297 ◽

2018 ◽

Vol 29 (1) ◽

pp. 108

Author(s):

Argelia B. Urbina Nájera ◽

Jorge De la Calleja

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Learning Community ◽

Machine Learning Algorithms ◽

Voluntary Participation ◽

Educación Superior ◽

Decision Tree Algorithm ◽

Data Set ◽

Educacion Superior ◽

Real World Problems

In this paper, we present a method for improving the academic tutoring process in higher education. The method includes identifying the main skills of tutors in an automated manner using decision trees, one of the most used algorithms in the machine learning community for solving real-world problems with high accuracy. In our study, the decision tree algorithm was able to identify the skills and personal affinities between students and tutors. Experiments were carried out using a data set of 277 students and 19 tutors, who were selected by simple random sampling and voluntary participation, respectively. Preliminary results show that the most important attributes for tutors are communication, self-direction and digital skills. At the same time, we introduce a tutoring process where the tutor assignment is based on these attributes, assuming that it can help to strengthen the student skills demanded by today's society. In the same way, the decision tree obtained can be used to create clusters of tutors and clusters of students based on their personal abilities and affinities using other machine learning algorithms. The application of the suggested tutoring process could set the tone for viewing the tutoring process individually, without linking it to processes of academic performance or school dropout.

Performance of Machine Learning Algorithms and Diversity in Data

MATEC Web of Conferences ◽

10.1051/matecconf/201821004019 ◽

2018 ◽

Vol 210 ◽

pp. 04019 ◽

Cited By ~ 1

Author(s):

Hyontai SUG

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Real World ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Real World Data ◽

Random Data ◽

Data Set ◽

World Data

Recent Go matches between human players and the artificial intelligence AlphaGo showed the great advancement of machine learning technologies. While AlphaGo was trained using real world data, AlphaGo Zero was trained using massive random data, and the fact that AlphaGo Zero defeated AlphaGo decisively revealed that diversity and size of training data are important for better performance of machine learning algorithms, especially deep learning algorithms based on neural networks. On the other hand, artificial neural networks and decision trees are widely accepted machine learning algorithms because of their robustness to errors and their comprehensibility, respectively. In this paper, in order to show empirically that diversity and size of data are important factors for better performance of machine learning algorithms, these two representative algorithms are used for experiments. A real world data set called breast tissue was chosen because it consists of real numbers, which is a very good property for artificial random data generation. The result of the experiment showed that the diversity and size of data are very important factors for better performance.

Deep Neural Network for Multi-Class Prediction of Student Performance in Educational Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2155.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 5073-5081

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Student Performance ◽

Activation Function ◽

Machine Learning Algorithms ◽

Training Data ◽

Fine Tuning ◽

Academic Excellence ◽

Data Set

Prediction of student performance is a significant part of processing educational data, and machine learning algorithms play a leading role in this process. Deep learning is one of the important concepts in machine learning. In this paper, we applied a deep learning technique to predict the academic excellence of students using R programming. The Keras and TensorFlow libraries were used to build a neural network model on a Kaggle dataset. The data is separated into testing and training data sets. We plotted the neural network model using the neuralnet method and created a deep learning model with two hidden layers using the ReLU activation function and one output layer using the softmax activation function. After fine-tuning until the changes stabilized, the model achieved an accuracy of 85%.
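The model described above uses ReLU in the hidden layers and softmax in the output layer. The paper itself works in R with Keras; the following is only a plain-Python sketch of what those two activation functions compute, with invented input values:

```python
import math


def relu(values):
    """ReLU: keep positive inputs, replace negative inputs with 0."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Softmax: turn raw scores into probabilities that sum to 1."""
    largest = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three performance classes.
hidden = relu([-1.5, 0.0, 2.0])
probabilities = softmax([1.0, 2.0, 3.0])
print(hidden, probabilities)
```

In a multi-class setting the predicted class is the index with the highest softmax probability.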
