Relevant SMS Spam Feature Selection Using Wrapper Approach and XGBoost Algorithm

2019 ◽  
Vol 4 (2) ◽  
pp. 110-120
Author(s):  
Diyari Jalal Mussa ◽  
Noor Ghazi M. Jameel

In recent years, with the widespread use of mobile devices, the problem of SMS spam has increased dramatically. Continuously receiving such undesired messages can frustrate users, and it can sometimes be harmful, as when SMS messages link to fake web pages in order to steal users’ confidential information. Despite the many hazardous actions associated with spam, only a limited number of spam filtering tools are available. In this paper, the XGBoost algorithm is used to address the SMS spam detection problem. A number of structural features were collected from previous studies, and 15 structural features were extracted from Tiago’s dataset, which is the dataset most frequently used by researchers. To reduce the feature set and select the most relevant features, two wrapper feature selection algorithms were used. The accuracy and performance obtained with the features selected by the sequential backward selection method were better than those obtained with the sequential forward selection method. The nine optimal features extracted can be a good representation of a spam SMS message. The classification accuracy obtained by the proposed method using the nine optimal features with the XGBoost algorithm is 98.64% under 10-fold cross-validation.
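As a rough illustration of the sequential backward selection idea named in this abstract, the following minimal Python sketch starts from all features and greedily drops one per round. It is not the authors' implementation: the feature names and the toy `score` function are hypothetical, and in a setting like the paper's the score would presumably be the cross-validated accuracy of an XGBoost classifier trained on the candidate subset.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def sequential_backward_selection(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    target_size: int,
) -> Tuple[FrozenSet[str], float]:
    """Greedy wrapper: start from all features and drop one per round."""
    selected = frozenset(features)
    if not 1 <= target_size <= len(selected):
        raise ValueError("target_size must be between 1 and the feature count")
    while len(selected) > target_size:
        # Try removing each remaining feature and keep the removal that
        # leaves the best-scoring subset; sorted() makes ties deterministic.
        drop = max(sorted(selected), key=lambda f: score(selected - {f}))
        selected = selected - {drop}
    return selected, score(selected)


# Hypothetical structural features; the "useful" set is made up for the demo.
USEFUL = frozenset({"has_url", "num_digits", "msg_length"})
ALL_FEATURES = ["has_url", "num_digits", "msg_length", "num_spaces", "has_emoticon"]


def toy_score(subset: FrozenSet[str]) -> float:
    # Stand-in for cross-validated accuracy: reward useful features,
    # lightly penalise the rest.
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


chosen, chosen_score = sequential_backward_selection(ALL_FEATURES, toy_score, 3)
```

Sequential forward selection is the mirror image: it starts from the empty set and adds the feature that most improves the score in each round.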

Author(s):  
E. MONTAÑÉS ◽  
J. R. QUEVEDO ◽  
E. F. COMBARRO ◽  
I. DÍAZ ◽  
J. RANILLA

Feature Selection is an important task within Text Categorization, where irrelevant or noisy features are usually present and cause a loss in classifier performance. Feature Selection in Text Categorization has usually been performed with a filtering approach that selects the features with the highest scores according to certain measures. Measures of this kind come from the Information Retrieval, Information Theory and Machine Learning fields. Wrapper approaches are known to perform better in Feature Selection than filtering approaches, although they are time-consuming and sometimes infeasible, especially in text domains. A wrapper that explores a reduced number of feature subsets and uses a fast method as its evaluation function could, however, overcome these difficulties. The wrapper presented in this paper satisfies these properties. Since exploring a reduced number of subsets could lead to less promising subsets, a hybrid approach that combines the wrapper method with some scoring measures makes it possible to explore more promising feature subsets. A comparison among some scoring measures, the wrapper method and the hybrid approach is performed. The results reveal that the hybrid approach outperforms both the wrapper approach and the scoring measures, particularly for corpora whose features are less scattered over the categories.
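One simple way to combine a scoring measure with a wrapper, sketched below in Python, is to rank the features by the filter score and let the wrapper evaluate only the nested top-k prefixes of that ranking. This is an assumption made for illustration, not the specific hybrid described in the paper; the term names, filter scores and `toy_evaluate` function are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def hybrid_select(
    filter_scores: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Rank features by filter score, then wrap over top-k prefixes only."""
    # Highest filter score first; the name breaks ties deterministically.
    ranking = sorted(filter_scores, key=lambda f: (-filter_scores[f], f))
    best_subset: List[str] = []
    best_value = float("-inf")
    # Only len(ranking) subsets are evaluated instead of all 2**n subsets.
    for k in range(1, len(ranking) + 1):
        candidate = ranking[:k]
        value = evaluate(candidate)
        if value > best_value:
            best_subset, best_value = candidate, value
    return best_subset, best_value


# Hypothetical filter scores for four terms.
SCORES = {"goal": 0.9, "match": 0.7, "the": 0.1, "and": 0.05}
INFORMATIVE = {"goal", "match"}


def toy_evaluate(subset: Sequence[str]) -> float:
    # Stand-in for a fast classifier's validation performance.
    good = sum(1 for f in subset if f in INFORMATIVE)
    noise = len(subset) - good
    return good - 0.5 * noise


subset, value = hybrid_select(SCORES, toy_evaluate)
```

The filter ranking steers the search toward promising subsets, while the wrapper's evaluation function decides where to cut the ranking.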


2014 ◽  
Vol 602-605 ◽  
pp. 1666-1669
Author(s):  
Xiao Qing Wu ◽  
Xiang Long ◽  
Xiong Yang

In our previous work, we proposed a motion edge detection method to extract the contour of a pedestrian in an image sequence. To locate and recognize the pedestrian once the contour has been extracted, we propose the BLBP method to describe the binary texture of the pedestrian's contour, and we use the BLBP histogram as the recognition feature. We then use the scatter matrix and the sequential forward selection method to select useful features, and a SOM neural network to perform the recognition. The final part of the paper presents experimental results, which show that the method is satisfactory.


2020 ◽  
Vol 4 (2) ◽  
pp. 39-47
Author(s):  
Junta Zeniarja ◽  
Anisatawalanita Ukhifahdhina ◽  
Abu Salam

The heart is one of the essential organs and plays a significant part in the human body. However, heart disease can be fatal. World Health Organization (WHO) data from 2012 showed that, of all deaths from cardiovascular disease, 7.4 million (42.3%) were caused by heart disease. The increasing number of cases calls for early prevention through early diagnosis of heart disease. This research performs early diagnosis of heart disease using a data mining process in the form of classification. The algorithm used is the K-Nearest Neighbor algorithm with the Forward Selection method. The K-Nearest Neighbor algorithm is used for classification in order to obtain a decision from the diagnosis of heart disease, while forward selection is used as a feature selection step whose purpose is to increase accuracy. Forward selection works by leaving out attributes that are irrelevant to the classification process. In this research, the accuracy of heart disease diagnosis with the K-Nearest Neighbor algorithm alone is 73.44%, while the accuracy of the K-Nearest Neighbor algorithm with the feature selection method is 78.66%. Combining the K-Nearest Neighbor algorithm with the forward selection method has therefore improved accuracy.
Keywords: K-Nearest Neighbor, Classification, Heart Disease, Forward Selection, Data Mining
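The combination of K-Nearest Neighbor and forward selection described in this abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are made up, leave-one-out accuracy is used as the selection criterion (the abstract does not say how accuracy was estimated), and `k=3` is an arbitrary choice.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train_x, train_y, query, cols: Sequence[int], k: int) -> int:
    """Majority vote among the k nearest rows, using only columns `cols`."""
    dists = sorted(
        (sum((row[c] - query[c]) ** 2 for c in cols), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(x: List[List[float]], y: List[int], cols, k: int) -> float:
    """Leave-one-out accuracy of k-NN restricted to columns `cols`."""
    hits = 0
    for i in range(len(x)):
        rest_x, rest_y = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_x, rest_y, x[i], cols, k) == y[i]
    return hits / len(x)


def forward_selection(x, y, k: int = 3) -> Tuple[List[int], float]:
    """Add one column per round while leave-one-out accuracy keeps improving."""
    remaining = list(range(len(x[0])))
    selected: List[int] = []
    best_acc = 0.0
    while remaining:
        scored = [(loo_accuracy(x, y, selected + [c], k), c) for c in remaining]
        acc, col = max(scored, key=lambda t: t[0])  # first maximum wins ties
        if acc <= best_acc:
            break  # no candidate improves accuracy, so stop
        selected.append(col)
        remaining.remove(col)
        best_acc = acc
    return selected, best_acc


# Made-up data: column 0 separates the classes, column 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 9.0], [1.1, 3.0],
     [3.0, 4.0], [3.2, 8.0], [2.9, 2.0], [3.1, 6.0]]
Y = [0, 0, 0, 0, 1, 1, 1, 1]
columns, accuracy = forward_selection(X, Y, k=3)
```

On this toy data the noisy column is never added, because adding it cannot raise the accuracy already reached with column 0 alone.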


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yaoxin Wang ◽  
Yingjie Xu ◽  
Zhenyu Yang ◽  
Xiaoqing Liu ◽  
Qi Dai

Many combinations of protein features are used to improve protein structural class prediction, but information redundancy among them is often ignored. To select the important features with strong classification ability, we propose a recursive feature selection method based on random forest to improve protein structural class prediction. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction. Fewer than 5% of the features are used, yet the prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
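The general shape of a recursive, importance-driven feature selection loop can be sketched in Python as below. It is only an illustration of the idea: the `importance` callable stands in for the feature importances a random forest would report after being retrained on the current feature set, and the feature names, weights and `drop_fraction` parameter are hypothetical rather than taken from the paper.

```python
from typing import Callable, Dict, List, Sequence


def recursive_elimination(
    features: Sequence[str],
    importance: Callable[[Sequence[str]], Dict[str, float]],
    keep: int,
    drop_fraction: float = 0.5,
) -> List[str]:
    """Repeatedly re-score the current features and drop the weakest ones."""
    current = list(features)
    if not 1 <= keep <= len(current):
        raise ValueError("keep must be between 1 and the feature count")
    if not 0 < drop_fraction < 1:
        raise ValueError("drop_fraction must be strictly between 0 and 1")
    while len(current) > keep:
        # Importances are recomputed each round on the surviving features.
        scores = importance(current)
        n_drop = max(1, int(len(current) * drop_fraction))
        n_drop = min(n_drop, len(current) - keep)  # never go below `keep`
        weakest = set(sorted(current, key=lambda f: scores[f])[:n_drop])
        current = [f for f in current if f not in weakest]
    return current


# Hypothetical fixed weights standing in for random forest importances.
WEIGHTS = {"f1": 0.30, "f2": 0.02, "f3": 0.25, "f4": 0.01,
           "f5": 0.20, "f6": 0.03, "f7": 0.15, "f8": 0.04}


def toy_importance(feats: Sequence[str]) -> Dict[str, float]:
    return {f: WEIGHTS[f] for f in feats}


kept = recursive_elimination(list(WEIGHTS), toy_importance, keep=2)
```

Dropping a fraction of the features per round, rather than one at a time, keeps the number of model refits small when the starting feature set is large.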

