Relevant SMS Spam Feature Selection Using Wrapper Approach and XGBoost Algorithm

Diyari Jalal Mussa; Noor Ghazi M. Jameel

doi:10.24017/science.2019.2.11


Kurdistan Journal of Applied Research ◽

10.24017/science.2019.2.11 ◽

2019 ◽

Vol 4 (2) ◽

pp. 110-120

Author(s):

Diyari Jalal Mussa ◽

Noor Ghazi M. Jameel

Keyword(s):

Feature Selection ◽

Structural Features ◽

Selection Method ◽

Web Pages ◽

Good Representation ◽

Forward Selection ◽

Wrapper Approach ◽

And Performance ◽

Sequential Forward Selection ◽

Selection Algorithms

In recent years, with the widespread use of mobile devices, the problem of SMS spam has increased dramatically. Continuously receiving these unwanted messages can frustrate users, and the messages can sometimes be harmful, for example when they link to fake web pages designed to steal users’ confidential information. Despite the number of hazardous actions that spam enables, only a limited number of spam filtering tools are available. In this paper, the XGBoost algorithm is used to address the SMS spam detection problem. A number of structural features were collected from previous studies, and 15 structural features were extracted from Tiago’s dataset, which is the dataset most frequently used by researchers. To select the optimal relevant features, two wrapper feature selection algorithms were used to reduce the feature set: sequential forward selection and sequential backward selection. The accuracy and performance obtained with the features selected by sequential backward selection were better than those obtained with sequential forward selection. The nine optimal features extracted can be a good representation of a spam SMS message. The classification accuracy obtained by the proposed method, using the nine optimal features with the XGBoost algorithm, is 98.64% under 10-fold cross-validation.
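The wrapper idea in this abstract can be sketched in a few lines. This is a minimal, dependency-free illustration of sequential backward selection scored by 10-fold cross-validation, not the authors' implementation: a nearest-centroid classifier stands in for XGBoost, and the toy data, the three feature columns and the function names (`centroid_accuracy`, `cv_score`, `backward_selection`) are all assumptions made for the example.

```python
from statistics import mean


def centroid_accuracy(train, test, cols):
    """Accuracy of a nearest-centroid classifier using only feature indices `cols`.

    Stand-in evaluator (assumption): the paper uses XGBoost here.
    """
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append([x[c] for c in cols])
    centroids = {y: [mean(col) for col in zip(*rows)] for y, rows in by_label.items()}

    def predict(x):
        v = [x[c] for c in cols]
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(v, centroids[y])))

    return mean([1.0 if predict(x) == y else 0.0 for x, y in test])


def cv_score(data, cols, k=10):
    """Mean k-fold cross-validated accuracy for the feature subset `cols`."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(centroid_accuracy(train, test, cols))
    return mean(scores)


def backward_selection(data, n_features, target):
    """Start from all features and repeatedly drop the one whose removal scores best."""
    selected = list(range(n_features))
    while len(selected) > target:
        candidates = [[c for c in selected if c != drop] for drop in selected]
        selected = max(candidates, key=lambda cols: cv_score(data, cols))
    return selected


# Hypothetical toy data: column 0 separates the two classes,
# columns 1 and 2 take the same values in both classes.
data = [([10.0 * y, j % 4, j % 5], y) for y in (0, 1) for j in range(20)]

print(backward_selection(data, n_features=3, target=1))  # [0]
```

On this toy data every subset that contains column 0 scores 1.0, so the search discards the two label-independent columns. Swapping in a real classifier would only change `centroid_accuracy`; the search loop stays the same.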


A Correlation - Sequential Forward Selection Based Feature Selection Method for Healthcare Data Analysis

2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON) ◽

10.1109/gucon48875.2020.9231205 ◽

2020 ◽

Author(s):

Priyanka Saha ◽

Srabani Patikar ◽

Sarmistha Neogy

Keyword(s):

Feature Selection ◽

Data Analysis ◽

Feature Selection Method ◽

Selection Method ◽

Forward Selection ◽

Healthcare Data ◽

Sequential Forward Selection


Feature Selection and Analysis EEG Signals with Sequential Forward Selection Algorithm and Different Classifiers

2020 28th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu49456.2020.9302482 ◽

2020 ◽

Author(s):

Sule Bekiryazici ◽

Ahmet Demir ◽

Gunes Yilmaz

Keyword(s):

Feature Selection ◽

Selection Algorithm ◽

Forward Selection ◽

EEG Signals ◽

Sequential Forward Selection


A HYBRID FEATURE SELECTION METHOD FOR TEXT CATEGORIZATION

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488507004492 ◽

2007 ◽

Vol 15 (02) ◽

pp. 133-151 ◽

Cited By ~ 2

Author(s):

E. MONTAÑÉS ◽

J. R. QUEVEDO ◽

E. F. COMBARRO ◽

I. DÍAZ ◽

J. RANILLA

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Hybrid Approach ◽

Feature Selection Method ◽

Selection Method ◽

Fast Method ◽

Evaluation Function ◽

Wrapper Approach ◽

Wrapper Method ◽

Filtering Approach

Feature Selection is an important task within Text Categorization, where irrelevant or noisy features are usually present and cause a loss in classifier performance. Feature Selection in Text Categorization has usually been performed with a filtering approach that selects the features with the highest score according to certain measures. Measures of this kind come from the Information Retrieval, Information Theory and Machine Learning fields. Wrapper approaches are known to perform better in Feature Selection than filtering approaches, although they are time-consuming and sometimes infeasible, especially in text domains. However, a wrapper that explores a reduced number of feature subsets and uses a fast method as its evaluation function could overcome these difficulties. The wrapper presented in this paper satisfies these properties. Since exploring a reduced number of subsets could lead to less promising subsets, a hybrid approach that combines the wrapper method with some scoring measures makes it possible to explore more promising feature subsets. A comparison among some scoring measures, the wrapper method and the hybrid approach is performed. The results reveal that the hybrid approach outperforms both the wrapper approach and the scoring measures, particularly for corpora whose features are less scattered over the categories.


Generalized Sequential Forward Selection Method for Channel Selection in EEG Signals for Classification of Left or Right Hand Movement in BCI

2019 9th International Conference on Computer and Knowledge Engineering (ICCKE) ◽

10.1109/iccke48569.2019.8965159 ◽

2019 ◽

Cited By ~ 2

Author(s):

Moein Radman ◽

Ali Chaibakhsh ◽

Nader Nariman-zadeh ◽

Huiguang He

Keyword(s):

Hand Movement ◽

Channel Selection ◽

Selection Method ◽

Forward Selection ◽

EEG Signals ◽

Right Hand ◽

Sequential Forward Selection


A Pedestrian Recognition and Locating Method under the Condition of Surveillance

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.602-605.1666 ◽

2014 ◽

Vol 602-605 ◽

pp. 1666-1669

Author(s):

Xiao Qing Wu ◽

Xiang Long ◽

Xiong Yang

Keyword(s):

Neural Network ◽

Edge Detection ◽

Detection Method ◽

Image Sequence ◽

Selection Method ◽

Forward Selection ◽

Scatter Matrix ◽

SOM Neural Network ◽

Edge Detection Method ◽

Sequential Forward Selection

In our previous work, we proposed a motion edge detection method to extract the contour of a pedestrian in an image sequence. To locate and recognize the pedestrian in an image after its contour has been extracted, we propose the BLBP method, which describes the binary texture of the pedestrian's contour, and we use the BLBP histogram as the recognition feature of the pedestrian. We then use the scatter matrix and the sequential forward selection method to select useful features, and use the SOM neural network to perform the recognition. The final part of the paper presents experimental results, which show that the method is satisfactory.


Feature selection using Sequential Forward Selection and classification applying Artificial Metaplasticity Neural Network

IECON 2010 - 36th Annual Conference on IEEE Industrial Electronics Society ◽

10.1109/iecon.2010.5675075 ◽

2010 ◽

Cited By ~ 44

Author(s):

A. Marcano-Cedeno ◽

J. Quintanilla-Dominguez ◽

M. G. Cortina-Januchs ◽

D. Andina

Keyword(s):

Neural Network ◽

Feature Selection ◽

Forward Selection ◽

Sequential Forward Selection


Feature Selection with Sequential Forward Selection Algorithm from Emotion Estimation based on EEG Signals

Sakarya University Journal of Science ◽

10.16984/saufenbilder.501799 ◽

2019 ◽

pp. 1096-1105 ◽

Cited By ~ 1

Author(s):

Talha Burak ALAKUŞ ◽

İbrahim TÜRKOĞLU

Keyword(s):

Feature Selection ◽

Selection Algorithm ◽

Forward Selection ◽

EEG Signals ◽

Sequential Forward Selection ◽

Emotion Estimation


Diagnosis Of Heart Disease Using K-Nearest Neighbor Method Based On Forward Selection

Journal of Applied Intelligent System ◽

10.33633/jais.v4i2.2749 ◽

2020 ◽

Vol 4 (2) ◽

pp. 39-47

Author(s):

Junta Zeniarja ◽

Anisatawalanita Ukhifahdhina ◽

Abu Salam

Keyword(s):

Data Mining ◽

Feature Selection ◽

Heart Disease ◽

Early Diagnosis ◽

Nearest Neighbor ◽

Selection Method ◽

K Nearest Neighbor ◽

Forward Selection ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

The heart is one of the essential organs and plays a significant part in the human body. However, heart disease is also a major cause of death. World Health Organization (WHO) data from 2012 showed that, of all deaths from cardiovascular disease, 7.4 million (42.3%) were caused by heart disease. The increasing number of cases calls for early prevention efforts, including early diagnosis of heart disease. In this research, early diagnosis of heart disease is performed with a data mining process in the form of classification. The algorithm used is the K-Nearest Neighbor algorithm with the Forward Selection method. The K-Nearest Neighbor algorithm is used for classification in order to obtain a decision from the diagnosis of heart disease, while forward selection is used for feature selection, with the purpose of increasing accuracy. Forward selection works by starting from an empty attribute set and adding attributes one at a time, so attributes that are irrelevant to the classification process are left out. In this research, the accuracy of heart disease diagnosis with the K-Nearest Neighbor algorithm alone is 73.44%, while the accuracy of the K-Nearest Neighbor algorithm with the feature selection method is 78.66%. Combining the K-Nearest Neighbor algorithm with the forward selection method therefore improved the accuracy. Keywords: K-Nearest Neighbor, Classification, Heart Disease, Forward Selection, Data Mining
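The combination described in this abstract, a K-Nearest Neighbor classifier wrapped in forward selection, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the two-column toy data is invented, leave-one-out accuracy is used as the evaluation measure (the abstract does not say how accuracy was estimated), and the function names `knn_predict`, `loo_accuracy` and `forward_selection` are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, cols, k=3):
    """Majority label among the k training rows nearest to `query` on features `cols`."""
    def dist(x):
        return sum((x[c] - query[c]) ** 2 for c in cols)

    nearest = sorted(train, key=lambda row: dist(row[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, cols, k=3):
    """Leave-one-out accuracy of k-NN restricted to the feature indices `cols`."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, cols, k) == y
    return hits / len(data)


def forward_selection(data, n_features, k=3):
    """Start empty, add the feature that most improves accuracy, stop when none helps."""
    selected, best = [], 0.0
    while len(selected) < n_features:
        remaining = [c for c in range(n_features) if c not in selected]
        scores = {c: loo_accuracy(data, selected + [c], k) for c in remaining}
        winner = max(scores, key=scores.get)
        if scores[winner] <= best:
            break  # no remaining feature improves the score
        selected.append(winner)
        best = scores[winner]
    return selected, best


# Hypothetical toy data: column 0 separates the classes, column 1 does not.
data = [([10.0 * y + j / 10, j], y) for y in (0, 1) for j in range(10)]

selected, best = forward_selection(data, n_features=2)
print(selected, best)  # [0] 1.0
```

The search never adds column 1 because it cannot raise the score above what column 0 already achieves, which is how forward selection ends up excluding irrelevant attributes.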


Feature Selection Method Based on Feature’s Classification Bias and Performance

Algorithms and Architectures for Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-27122-4_16 ◽

2015 ◽

pp. 227-240

Author(s):

Jun Wang ◽

Jinmao Wei ◽

Lu Zhang

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

And Performance


Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/5529389 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yaoxin Wang ◽

Yingjie Xu ◽

Zhenyu Yang ◽

Xiaoqing Liu ◽

Qi Dai

Keyword(s):

Feature Selection ◽

Random Forest ◽

Feature Selection Method ◽

Structural Features ◽

Selection Method ◽

Prediction Methods ◽

Information Redundancy ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

Many combinations of protein features are used to improve protein structural class prediction, but information redundancy is often ignored. To select the important features with strong classification ability, we propose a recursive feature selection method based on random forest to improve protein structural class prediction. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction. Fewer than 5% of the features are used, yet the prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
