Analysis of flight delays in aviation system using different classification algorithms and feature selection methods

A. B. A. Anderson; A. J. Sanjeev Kumar; A. B. Arockia Christopher

doi:10.1017/aer.2019.72

Analysis of flight delays in aviation system using different classification algorithms and feature selection methods

The Aeronautical Journal ◽

10.1017/aer.2019.72 ◽

2019 ◽

Vol 123 (1267) ◽

pp. 1415-1436 ◽

Cited By ~ 1

Author(s):

A. B. A. Anderson ◽

A. J. Sanjeev Kumar ◽

A. B. Arockia Christopher

Keyword(s):

Data Mining ◽

Feature Selection ◽

Classification Model ◽

System Level ◽

Support Vector ◽

Flight Delays ◽

Data Mining Techniques ◽

Mining Methods ◽

Artificial Neural Network Ann ◽

Aircraft System

ABSTRACTData mining is a process of finding correlations and collecting and analysing a huge amount of data in a database to discover patterns or relationships. Flight delay creates significant problems in the present aviation system. Data mining techniques are desired for analysing the performance in which micro-level causes propagate to make system-level patterns of delay. Analysing flight delays is very difficult – both when looking from a historical view as well as when estimating delays with forecast demand. This paper proposes using Decision Tree (DT), Support Vector Machine (SVM), Naive Bayesian (NB), K-nearest neighbour (KNN) and Artificial Neural Network (ANN) to study and analyse delays among aircrafts. The performance of different data mining methods is found in the different regions of the updated datasets on these classifiers. Finally, the result shows a significant variation in the performance of different data mining methods and feature selection for this problem. This paper aims to deal with how data mining techniques can be used to understand difficult aircraft system delays in aviation. Our aim is to develop a classification model for studying and reducing delay using different data mining methods and, in this manner, to show that DT has a greater classification accuracy. The different feature selectors are used in this study in order to reduce the number of initial attributes. Our results clearly demonstrate the value of DT for analysing and visualising how system-level effects happen from subsystem-level causes.

Download Full-text

Email Worm Detection Using Data Mining

Techniques and Applications for Advanced Information Privacy and Security ◽

10.4018/978-1-60566-210-7.ch002 ◽

2011 ◽

pp. 20-34

Author(s):

Mohammad M. Masud ◽

Latifur Khan ◽

Bhavani Thuraisingham

Keyword(s):

Data Mining ◽

Feature Selection ◽

Principal Component ◽

Classification Model ◽

Support Vector ◽

Two Phase ◽

Feature Selection Technique ◽

Worm Detection ◽

Phase Selection ◽

Using Data

This chapter applies data mining techniques to detect email worms. Email messages contain a number of different features such as the total number of words in message body/subject, presence/absence of binary attachments, type of attachments, and so on. The goal is to obtain an efficient classification model based on these features. The solution consists of several steps. First, the number of features is reduced using two different approaches: feature-selection and dimension-reduction. This step is necessary to reduce noise and redundancy from the data. The feature-selection technique is called Two-phase Selection (TPS), which is a novel combination of decision tree and greedy selection algorithm. The dimensionreduction is performed by Principal Component Analysis. Second, the reduced data is used to train a classifier. Different classification techniques have been used, such as Support Vector Machine (SVM), Naïve Bayes and their combination. Finally, the trained classifiers are tested on a dataset containing both known and unknown types of worms. These results have been compared with published results. It is found that the proposed TPS selection along with SVM classification achieves the best accuracy in detecting both known and unknown types of worms.

Download Full-text

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for a cellphone to receive spam messages. Great number of received messages making it difficult for human to classify those messages to Spam or no Spam. One way to overcome this problem is to use Data Mining for automatic classifications. In this paper, we investigate various data mining techniques, named Support Vector Machine, Multinomial Naïve Bayes and Decision Tree for automatic spam detection. Our experimental results show that Support Vector Machine algorithm is the best algorithm over three evaluated algorithms. Support Vector Machine achieves 98.33%, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree is at 97.10 % accuracy.

Download Full-text

The Spinning Quality Control Management Based on Decision Making by Data Mining Techniques

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v7i1.25 ◽

2018 ◽

Vol 7 (1) ◽

pp. 72

Author(s):

Khalid AA Abakar ◽

Chongwen Yu

Keyword(s):

Data Mining ◽

Kernel Functions ◽

Support Vector ◽

Ann Model ◽

Data Mining Techniques ◽

Yarn Quality ◽

Yarn Properties ◽

Svm Model ◽

Rbf Kernel

This work demonstrated the possibility of using the data mining techniques such as artificial neural networks (ANN) and support vector machine (SVM) based model to predict the quality of the spinning yarn parameters. Three different kernel functions were used as SVM kernel functions which are Polynomial and Radial Basis Function (RBF) and Pearson VII Function-based Universal Kernel (PUK) and ANN model were used as data mining techniques to predict yarn properties. In this paper, it was found that the SVM model based on Person VII kernel function (PUK) have the same performance in prediction of spinning yarn quality in comparison with SVM based RBF kernel. The comparison with the ANN model showed that the two SVM models give a better prediction performance than an ANN model.

Download Full-text

A Comparative Study on Heart Disease Prediction Using Data Mining Techniques and Feature Selection

2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST) ◽

10.1109/icrest51555.2021.9331158 ◽

2021 ◽

Author(s):

Farzana Tasnim ◽

Sultana Umme Habiba

Keyword(s):

Data Mining ◽

Feature Selection ◽

Heart Disease ◽

Comparative Study ◽

Disease Prediction ◽

Data Mining Techniques ◽

Using Data

Download Full-text

A Hybrid Swarm and Gravitation-based feature selection algorithm for handwritten Indic script classification problem

Complex & Intelligent Systems ◽

10.1007/s40747-020-00237-1 ◽

2021 ◽

Author(s):

Ritam Guha ◽

Manosij Ghosh ◽

Pawan Kumar Singh ◽

Ram Sarkar ◽

Mita Nasipuri

Keyword(s):

Feature Selection ◽

Character Recognition ◽

Optical Character Recognition ◽

Classification Problem ◽

Classification Model ◽

Support Vector ◽

Intermediate Step ◽

Hybrid Swarm ◽

Feature Vectors ◽

Indic Script

AbstractIn any multi-script environment, handwritten script classification is an unavoidable pre-requisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, this complex pattern classification problem has been solved by researchers proposing various feature vectors mostly having large dimensions, thereby increasing the computation complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them only to the essential and relevant features. In the present work, we have addressed this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied over three feature vectors introduced in the literature recently—Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely, Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at block, text line, and word level, consisting of officially recognized 12 Indic scripts, are prepared for experimentation. An average improvement in the range of 2–5% is achieved in the classification accuracy by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The codes used for implementing HSGFS can be found in the following Github link: https://github.com/Ritam-Guha/HSGFS.

Download Full-text

CLASSIFICATION OF HIGH-DIMENSIONAL MICROARRAY DATA WITH A TWO-STEP PROCEDURE VIA A WILCOXON CRITERION AND MULTILAYER PERCEPTRON

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026811002969 ◽

2011 ◽

Vol 10 (01) ◽

pp. 1-14

Author(s):

VLADIMIR NIKULIN ◽

TIAN-HSIANG HUANG ◽

GEOFFREY J. MCLACHLAN

Keyword(s):

Data Mining ◽

Feature Selection ◽

High Dimensional ◽

Second Step ◽

Support Vector ◽

Step Procedure ◽

Leave One Out ◽

Natural Combination ◽

Feature Selection Techniques

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is a key element (first step) in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier such as linear regression, support vector machine or neural networks. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation type Wilcoxon-based criterion for all final submissions. The method presented in this paper was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.

Download Full-text

A data mining approach to the diagnosis of failure modes for two serial fastened sandwich composite plates

Journal of Composite Materials ◽

10.1177/0021998316679720 ◽

2016 ◽

Vol 51 (20) ◽

pp. 2853-2862 ◽

Cited By ~ 2

Author(s):

Serkan Ballı

Keyword(s):

Data Mining ◽

Random Forest ◽

Failure Modes ◽

Composite Plates ◽

Study Data ◽

Sandwich Composite ◽

Support Vector ◽

Geometrical Parameters ◽

Mining Methods

The aim of this study is to diagnose and classify the failure modes for two serial fastened sandwich composite plates using data mining techniques. The composite material used in the study was manufactured using glass fiber reinforced layer and aluminum sheets. Obtained results of previous experimental study for sandwich composite plates, which were mechanically fastened with two serial pins or bolts were used for classification of failure modes. Furthermore, experimental data from previous study consists of different geometrical parameters for various applied preload moments as 0 (pinned), 2, 3, 4, and 5 Nm (bolted). In this study, data mining methods were applied by using these geometrical parameters and pinned/bolted joint configurations. Therefore, three geometrical parameters and 100 test data were used for classification by utilizing support vector machine, Naive Bayes, K-Nearest Neighbors, Logistic Regression, and Random Forest methods. According to experiments, Random Forest method achieved better results than others and it was appropriate for diagnosing and classification of the failure modes. Performances of all data mining methods used were discussed in terms of accuracy and error ratios.

Download Full-text

Detection of financial statement fraud and feature selection using data mining techniques

Decision Support Systems ◽

10.1016/j.dss.2010.11.006 ◽

2011 ◽

Vol 50 (2) ◽

pp. 491-500 ◽

Cited By ~ 174

Author(s):

P. Ravisankar ◽

V. Ravi ◽

G. Raghava Rao ◽

I. Bose

Keyword(s):

Data Mining ◽

Feature Selection ◽

Financial Statement ◽

Financial Statement Fraud ◽

Data Mining Techniques ◽

Using Data

Download Full-text

Hybrid classification model to detect advanced intrusions using data mining techniques

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.4.10031 ◽

2018 ◽

Vol 7 (2.4) ◽

pp. 10

Author(s):

V Mala ◽

K Meena

Keyword(s):

Data Mining ◽

Learning Algorithm ◽

Detection System ◽

Classification Model ◽

Detection Methods ◽

Data Mining Techniques ◽

Detection Systems ◽

Intruder Detection ◽

Hybrid Classification ◽

Using Data

Traditional signature based approach fails in detecting advanced malwares like stuxnet, flame, duqu etc. Signature based comparison and correlation are not up to the mark in detecting such attacks. Hence, there is crucial to detect these kinds of attacks as early as possible. In this research, a novel data mining based approach were applied to detect such attacks. The main innovation lies on Misuse signature detection systems based on supervised learning algorithm. In learning phase, labeled examples of network packets systems calls are (gave) provided, on or after which algorithm can learn about the attack which is fast and reliable to known. In order to detect advanced attacks, unsupervised learning methodologies were employed to detect the presence of zero day/ new attacks. The main objective is to review, different intruder detection methods. To study the role of Data Mining techniques used in intruder detection system. Hybrid –classification model is utilized to detect advanced attacks.

Download Full-text

A Survey on Phishing Detection and The Importance of Feature Selection In Data Mining Classification Algorithms

Issue 4 - Journal of Science and Technology ◽

10.46243/jst.2020.v5.i6.pp11-18 ◽

2020 ◽

pp. 11-18

Keyword(s):

Data Mining ◽

Feature Selection ◽

Support Vector ◽

Classification Algorithms ◽

End User ◽

Preparation Methods ◽

Survey Paper ◽

Vector Machines ◽

Feature Selection Techniques ◽

Phishing Detection

: In this era of Internet, the issue of security of information is at its peak. One of the main threats in this cyber world is phishing attacks which is an email or website fraud method that targets the genuine webpage or an email and hacks it without the consent of the end user. There are various techniques which help to classify whether the website or an email is legitimate or fake. The major contributors in the process of detection of these phishing frauds include the classification algorithms, feature selection techniques or dataset preparation methods and the feature extraction that plays an important role in detection as well as in prevention of these attacks. This Survey Paper studies the effect of all these contributors and the approaches that are applied in the study conducted on the recent papers. Some of the classification algorithms that are implemented includes Decision tree, Random Forest , Support Vector Machines, Logistic Regression , Lazy K Star, Naive Bayes and J48 etc.

Download Full-text