A Feature Subset Selection Algorithm Automatic Recommendation Method

2013 ◽  
Vol 47 ◽  
pp. 1-34 ◽  
Author(s):  
G. Wang ◽  
Q. Song ◽  
H. Sun ◽  
X. Zhang ◽  
B. Xu ◽  
...  

Many feature subset selection (FSS) algorithms have been proposed, but not all of them are appropriate for a given feature selection problem, and there is still no reliable way to choose an appropriate FSS algorithm for the problem at hand. Automatic recommendation of FSS algorithms is therefore important and practically useful. In this paper, a meta-learning based method for automatic FSS algorithm recommendation is presented. The proposed method first identifies the data sets most similar to the one at hand using the k-nearest neighbor algorithm, with distances between data sets computed from commonly used data set characteristics. It then ranks all candidate FSS algorithms according to their performance on these similar data sets and recommends the best-performing ones. The performance of the candidate FSS algorithms is evaluated by a multi-criteria metric that takes into account not only the classification accuracy over the selected features, but also the runtime of feature selection and the number of selected features. The proposed recommendation method is extensively tested on 115 real-world data sets with 22 well-known and frequently used FSS algorithms and five representative classifiers. The results show the effectiveness of the proposed FSS algorithm recommendation method.
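The recommendation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the data set names, meta-feature vectors, candidate algorithm names, and performance numbers below are all invented for demonstration.

```python
import math

# Hypothetical meta-features of historical data sets and the measured
# performance of three candidate FSS algorithms on each of them.
# All names and numbers are illustrative only.
META = {"d1": [0.2, 0.9], "d2": [0.3, 0.8], "d3": [0.9, 0.1]}
PERF = {"d1": {"relief": 0.82, "cfs": 0.78, "mrmr": 0.80},
        "d2": {"relief": 0.79, "cfs": 0.84, "mrmr": 0.75},
        "d3": {"relief": 0.60, "cfs": 0.62, "mrmr": 0.90}}

def recommend(new_meta, k=2):
    """Rank candidate FSS algorithms by their mean performance on the
    k historical data sets nearest (Euclidean) to the new one."""
    neighbours = sorted(META, key=lambda d: math.dist(new_meta, META[d]))[:k]
    algos = PERF[neighbours[0]].keys()
    scores = {a: sum(PERF[d][a] for d in neighbours) / k for a in algos}
    return sorted(scores, key=scores.get, reverse=True)

ranking = recommend([0.25, 0.85])  # nearest neighbours are d1 and d2
```

In the paper the per-algorithm scores would come from the multi-criteria metric (accuracy, runtime, subset size) rather than a single number as here.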

Author(s):  
Guansong Pang ◽  
Longbing Cao ◽  
Ling Chen ◽  
Huan Liu

This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent; they bond together) in constructing a noise-resilient outlier scoring function to produce a reliable outlier ranking in each iteration. We show that HOUR (i) retains a 2-approximation outlier ranking to the optimal one; and (ii) significantly outperforms five state-of-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.
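The wrapper idea of coupling feature search to outlier scoring can be illustrated with a deliberately simplified sketch. This is not HOUR itself: the z-score-based scoring function, the greedy search, and the top-k margin objective below are stand-ins chosen for brevity, and the toy data are invented (feature 2 is pure noise, the last row is the planted outlier).

```python
import statistics

# Toy data: rows are objects, columns are features; feature 2 is noise.
DATA = [[1.0, 1.1, 5.0], [1.1, 0.9, 1.0], [0.9, 1.0, 9.0],
        [5.0, 6.0, 2.0]]  # last row is the planted outlier

def scores(subset):
    """Outlier score per object: summed absolute z-scores on `subset`."""
    out = [0.0] * len(DATA)
    for j in subset:
        col = [row[j] for row in DATA]
        mu, sd = statistics.mean(col), statistics.pstdev(col) or 1.0
        for i, v in enumerate(col):
            out[i] += abs(v - mu) / sd
    return out

def topk_margin(s, k=1):
    """Objective: gap between the k-th ranked score and the next one."""
    ranked = sorted(s, reverse=True)
    return ranked[k - 1] - ranked[k]

def greedy_wrapper(n_features=3):
    """Greedily grow the feature subset while the top-k margin improves,
    so feature selection is driven by the outlier-ranking objective."""
    subset, best = [], float("-inf")
    while True:
        cand = [(topk_margin(scores(subset + [j])), j)
                for j in range(n_features) if j not in subset]
        gain, j = max(cand)
        if gain <= best:
            return subset
        subset, best = subset + [j], gain

chosen = greedy_wrapper()  # the noisy feature 2 is never selected
```

Because the objective evaluates the resulting outlier ranking rather than the features in isolation, the noisy feature hurts the top-k margin and is excluded; this is the property the wrapper formulation is designed to exploit.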


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255307
Author(s):  
Fujun Wang ◽  
Xing Wang

Feature selection is an important task in big data analysis and information retrieval. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed, called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position-update formula is modified to achieve better results. Third, the filter and wrapper models are dynamically balanced by damping oscillation theory to find an optimal feature subset. MKMDIGWO thus combines the efficiency of the filter model with the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms; its maximum ACC value is at least 0.5% higher than theirs on 10 data sets.
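The filter criterion, pairing a Kendall rank coefficient (relevance to the labels) with Euclidean distance (diversity among chosen features), can be sketched as follows. The weighting scheme and the exact way the two terms are combined here are assumptions for illustration, not the paper's formula.

```python
import math
from itertools import combinations

def kendall(x, y):
    """Kendall rank coefficient: (concordant - discordant) / total pairs."""
    c = d = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        c += s > 0
        d += s < 0
    return (c - d) / (len(x) * (len(x) - 1) / 2)

def subset_score(cols, labels, w=0.5):
    """Filter criterion (illustrative): mean |Kendall| with the labels
    (relevance) plus mean pairwise Euclidean distance between the
    selected feature columns (low redundancy), weighted by w."""
    rel = sum(abs(kendall(c, labels)) for c in cols) / len(cols)
    if len(cols) < 2:
        return rel
    pairs = list(combinations(cols, 2))
    div = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return w * rel + (1 - w) * div

tau = kendall([1, 2, 3, 4], [1, 2, 4, 3])        # one discordant pair
score = subset_score([[1, 2, 3, 4], [4, 3, 2, 1]], [1, 2, 3, 4])
```

In MKMDIGWO a score of this kind filters candidate subsets before the grey-wolf wrapper refines them; the damping-oscillation schedule that trades the two models off over time is omitted here.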


Author(s):  
Amit Saxena ◽  
John Wang

This paper presents a two-phase scheme to select a reduced number of features from a dataset using a Genetic Algorithm (GA) and to test the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach to selecting a subset of features is applied: the GA stochastically selects a reduced number of features with Sammon error as the fitness function, yielding several different feature subsets. In the second phase, each reduced feature set is used to test the CA of the dataset, validated with the supervised k-nearest neighbor (k-nn) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is investigated for CA using k-nn classification with different Minkowski metrics, i.e., non-Euclidean norms instead of the conventional Euclidean norm (L2). Final results are presented with extensive simulations on seven real and one synthetic data set. The investigation reveals that using different norms produces better CA and hence offers scope for better feature subset selection.
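The second-phase classifier, k-nn under an arbitrary Minkowski norm, is easy to state directly. The toy training points below are invented for illustration; only the distance definition and the majority vote reflect the scheme described above.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p; p=2 is Euclidean, p=1 Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train, labels, query, k=3, p=2):
    """k-nearest-neighbour majority vote under a Minkowski-p norm."""
    order = sorted(range(len(train)),
                   key=lambda i: minkowski(train[i], query, p))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

# Two well-separated toy clusters, classified under non-Euclidean norms.
train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = [0, 0, 0, 1, 1, 1]
pred = knn_predict(train, labels, [0.5, 0.5], k=3, p=1)
```

Sweeping `p` over values other than 2 for each GA-selected subset is exactly the variation the paper investigates; on real data different norms can reorder the neighbours and change the vote.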


2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Gürcan Yavuz ◽  
Doğan Aydin

Optimal feature subset selection is an important and difficult task in pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select an optimal feature subset and increase accuracy. A large number of features in a dataset increases the computational complexity, leading to performance degradation. In this paper, to overcome this problem, the angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem instead of representing it as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation and to determine the efficiency of the proposed method, six variants of the Artificial Bee Colony (ABC) algorithm employing angle modulation are evaluated for feature selection. Experimental results on six high-dimensional datasets show that the Angle Modulated ABC algorithms improved classification accuracy with smaller feature subsets.
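The angle-modulation decoding that turns a 4-D continuous vector into an n-bit feature mask can be sketched as below. The generating function used here, g(x) = sin(2π(x−a)·b·cos(2π(x−a)·c)) + d, is the form commonly used in angle-modulated binary optimizers; the sample points and thresholds are illustrative assumptions.

```python
import math

def am_bits(a, b, c, d, n):
    """Angle-modulation decoder: sample
    g(x) = sin(2*pi*(x-a)*b*cos(2*pi*(x-a)*c)) + d
    at x = 0, 1, ..., n-1 and emit bit 1 where g(x) > 0.
    The 4-D tuple (a, b, c, d) thus encodes an n-bit feature mask,
    so the optimizer only ever searches a 4-D continuous space."""
    bits = []
    for i in range(n):
        g = math.sin(2 * math.pi * (i - a) * b *
                     math.cos(2 * math.pi * (i - a) * c)) + d
        bits.append(1 if g > 0 else 0)
    return bits

mask = am_bits(0.0, 0.5, 0.8, 0.0, 8)  # one candidate 8-feature mask
```

An ABC (or any continuous optimizer) then evolves (a, b, c, d) directly, scoring each tuple by the classifier accuracy of the decoded mask; the dimensionality of the search no longer grows with the number of features.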


2021 ◽  
pp. 08-16
Author(s):  
Mohamed Abdel-Basset ◽  
Mohamed Elhoseny

In the current epidemic situation, people face several mental disorders related to Depression, Anxiety, and Stress (DAS). Numerous scales have been developed for computing DAS levels, DAS-21 being one of them. At the same time, machine learning (ML) models are widely applied to solve classification problems efficiently, and feature selection (FS) approaches can be designed to improve classifier results. In this respect, this paper develops an intelligent feature selection with ML-based risk management (IFSML-RM) technique for DAS prediction. The IFSML-RM technique follows a two-stage process: quantum elephant herd optimization-based FS (QEHO-FS) and decision tree (DT) based classification. The QEHO algorithm first selects a valuable subset of features from the input data. The chosen features are then fed into the DT classifier to determine the presence or absence of DAS. A detailed experimental process is carried out on a benchmark dataset, and the results showcase the superiority of the IFSML-RM technique in terms of different performance measures.
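The two-stage pipeline, a search-based feature selector feeding a tree classifier, can be sketched generically. The QEHO search itself is not reproduced here; an exhaustive scan over single features scored by a one-level decision stump stands in for it, and the synthetic data (feature 0 carries the signal, feature 1 is noise) are invented.

```python
import random

# Synthetic data standing in for the DAS benchmark: feature 0 carries
# the class signal, feature 1 is pure noise.
random.seed(0)
X = [[i, random.random()] for i in range(20)]
y = [1 if row[0] >= 10 else 0 for row in X]

def stump_acc(feature):
    """Best achievable accuracy of a one-level decision stump (the
    simplest decision tree) thresholding a single feature."""
    best = 0.0
    for t in sorted({row[feature] for row in X}):
        preds = [1 if row[feature] >= t else 0 for row in X]
        acc = sum(p == label for p, label in zip(preds, y)) / len(y)
        best = max(best, acc, 1 - acc)  # also allow the inverted stump
    return best

def select_feature():
    """Stage 1 stand-in: keep the feature whose stump classifies best
    (a real system would run QEHO over feature subsets here)."""
    return max(range(2), key=stump_acc)

best_f = select_feature()  # stage 2 would fit a full DT on this subset
```

The structure mirrors the paper's flow: stage 1 scores candidate feature sets by downstream classifier quality, and stage 2 trains the final decision tree on the winner.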


Author(s):  
Basabi Chakraborty

Selecting an optimal subset of features from a large feature set is an important pre-processing step for pattern classification, data mining, and machine learning applications. Feature subset selection basically comprises defining a criterion function for evaluating a feature subset and developing a search strategy to find the best subset among a large number of candidates. Many mathematical and statistical techniques have been proposed so far. Recently, biologically inspired computing has been gaining popularity for solving real-world problems because of its greater flexibility compared with traditional statistical or mathematical techniques. In this chapter, the role of Particle Swarm Optimization (PSO), one of the recently developed bio-inspired evolutionary computational (EC) approaches, in designing algorithms for producing an optimal feature subset from a large feature set is examined. A state-of-the-art review of PSO algorithms and their hybrids with other soft computing techniques for feature subset selection is presented, followed by the author's proposals for PSO-based algorithms. Simple simulation experiments with benchmark data sets and their results are shown to evaluate the respective effectiveness and comparative performance of these algorithms in selecting the best feature subset.
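A minimal binary PSO for feature subset selection, in the sigmoid-velocity form of Kennedy and Eberhart's binary variant, can be sketched as follows. The fitness function here is a toy stand-in (agreement with a fixed target mask); in practice it would be the classifier accuracy obtained with the selected features. All parameters and the target mask are illustrative assumptions.

```python
import math
import random

random.seed(1)
TARGET = [1, 0, 1, 1, 0, 1]  # toy "ideal subset" stand-in
N = len(TARGET)

def fitness(bits):
    """Toy criterion: agreement with TARGET. A real system would score
    a classifier trained on the features where bits[j] == 1."""
    return sum(b == t for b, t in zip(bits, TARGET))

def bpso(swarm=8, iters=40, w=0.7, c1=1.5, c2=1.5):
    """Minimal binary PSO: velocities are updated as usual, then
    squashed by a sigmoid into per-bit probabilities of setting 1."""
    pos = [[random.randint(0, 1) for _ in range(N)] for _ in range(swarm)]
    vel = [[0.0] * N for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(iters):
        for i in range(swarm):
            for j in range(N):
                vel[i][j] = (w * vel[i][j]
                             + c1 * random.random() * (pbest[i][j] - pos[i][j])
                             + c2 * random.random() * (gbest[j] - pos[i][j]))
                prob = 1 / (1 + math.exp(-vel[i][j]))  # sigmoid squash
                pos[i][j] = 1 if random.random() < prob else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pos[i]) > fitness(gbest):
                    gbest = pos[i][:]
    return gbest

best = bpso()  # best feature mask found by the swarm
```

The hybrids reviewed in the chapter typically replace or augment pieces of this loop, e.g. the velocity update or the fitness evaluation, with other soft computing components, while keeping the same swarm structure.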

