A Feature Subset Selection Algorithm Automatic Recommendation Method

2013 ◽  
Vol 47 ◽  
pp. 1-34 ◽  
Author(s):  
G. Wang ◽  
Q. Song ◽  
H. Sun ◽  
X. Zhang ◽  
B. Xu ◽  
...  

Many feature subset selection (FSS) algorithms have been proposed, but not all of them are appropriate for a given feature selection problem, and there is still no reliable way to choose an appropriate FSS algorithm for the problem at hand. Automatic recommendation of FSS algorithms is therefore important and practically useful. In this paper, a meta-learning based method for automatic FSS algorithm recommendation is presented. The proposed method first identifies the data sets most similar to the one at hand using the k-nearest neighbor algorithm, with distances between data sets computed from commonly used data set characteristics. It then ranks all candidate FSS algorithms according to their performance on these similar data sets and recommends the best-performing ones. The performance of the candidate FSS algorithms is evaluated by a multi-criteria metric that takes into account not only the classification accuracy over the selected features, but also the runtime of feature selection and the number of selected features. The proposed recommendation method is extensively tested on 115 real-world data sets with 22 well-known and frequently used FSS algorithms and five representative classifiers. The results show the effectiveness of the proposed FSS algorithm recommendation method.
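The recommendation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the data set names, meta-feature vectors, candidate algorithm names, and performance numbers below are all invented for demonstration.

```python
import math

# Hypothetical meta-features of historical data sets and the measured
# performance of three candidate FSS algorithms on each of them.
# All names and numbers are illustrative only.
META = {"d1": [0.2, 0.9], "d2": [0.3, 0.8], "d3": [0.9, 0.1]}
PERF = {"d1": {"relief": 0.82, "cfs": 0.78, "mrmr": 0.80},
        "d2": {"relief": 0.79, "cfs": 0.84, "mrmr": 0.75},
        "d3": {"relief": 0.60, "cfs": 0.62, "mrmr": 0.90}}

def recommend(new_meta, k=2):
    """Rank candidate FSS algorithms by their mean performance on the
    k historical data sets nearest (Euclidean) to the new one."""
    neighbours = sorted(META, key=lambda d: math.dist(new_meta, META[d]))[:k]
    algos = PERF[neighbours[0]].keys()
    scores = {a: sum(PERF[d][a] for d in neighbours) / k for a in algos}
    return sorted(scores, key=scores.get, reverse=True)

ranking = recommend([0.25, 0.85])  # nearest neighbours are d1 and d2
```

In the paper the per-algorithm scores would come from the multi-criteria metric (accuracy, runtime, subset size) rather than a single number as here.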

Author(s):  
Guansong Pang ◽  
Longbing Cao ◽  
Ling Chen ◽  
Huan Liu

This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent; they bond together) in constructing a noise-resilient outlier scoring function to produce a reliable outlier ranking in each iteration. We show that HOUR (i) retains a 2-approximation outlier ranking to the optimal one; and (ii) significantly outperforms five state-of-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.
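The wrapper idea of coupling feature search to outlier scoring can be illustrated with a deliberately simplified sketch. This is not HOUR itself: the z-score-based scoring function, the greedy search, and the top-k margin objective below are stand-ins chosen for brevity, and the toy data are invented (feature 2 is pure noise, the last row is the planted outlier).

```python
import statistics

# Toy data: rows are objects, columns are features; feature 2 is noise.
DATA = [[1.0, 1.1, 5.0], [1.1, 0.9, 1.0], [0.9, 1.0, 9.0],
        [5.0, 6.0, 2.0]]  # last row is the planted outlier

def scores(subset):
    """Outlier score per object: summed absolute z-scores on `subset`."""
    out = [0.0] * len(DATA)
    for j in subset:
        col = [row[j] for row in DATA]
        mu, sd = statistics.mean(col), statistics.pstdev(col) or 1.0
        for i, v in enumerate(col):
            out[i] += abs(v - mu) / sd
    return out

def topk_margin(s, k=1):
    """Objective: gap between the k-th ranked score and the next one."""
    ranked = sorted(s, reverse=True)
    return ranked[k - 1] - ranked[k]

def greedy_wrapper(n_features=3):
    """Greedily grow the feature subset while the top-k margin improves,
    so feature selection is driven by the outlier-ranking objective."""
    subset, best = [], float("-inf")
    while True:
        cand = [(topk_margin(scores(subset + [j])), j)
                for j in range(n_features) if j not in subset]
        gain, j = max(cand)
        if gain <= best:
            return subset
        subset, best = subset + [j], gain

chosen = greedy_wrapper()  # the noisy feature 2 is never selected
```

Because the objective evaluates the resulting outlier ranking rather than the features in isolation, the noisy feature hurts the top-k margin and is excluded; this is the property the wrapper formulation is designed to exploit.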


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255307
Author(s):  
Fujun Wang ◽  
Xing Wang

Feature selection is an important task in big data analysis and information retrieval. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed, called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position-update formula is modified to achieve better results. Third, the filter and wrapper models are dynamically balanced by damping oscillation theory to find an optimal feature subset. MKMDIGWO thus combines the efficiency of the filter model with the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms; its maximum ACC value is at least 0.5% higher than theirs on 10 data sets.
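The filter criterion, pairing a Kendall rank coefficient (relevance to the labels) with Euclidean distance (diversity among chosen features), can be sketched as follows. The weighting scheme and the exact way the two terms are combined here are assumptions for illustration, not the paper's formula.

```python
import math
from itertools import combinations

def kendall(x, y):
    """Kendall rank coefficient: (concordant - discordant) / total pairs."""
    c = d = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        c += s > 0
        d += s < 0
    return (c - d) / (len(x) * (len(x) - 1) / 2)

def subset_score(cols, labels, w=0.5):
    """Filter criterion (illustrative): mean |Kendall| with the labels
    (relevance) plus mean pairwise Euclidean distance between the
    selected feature columns (low redundancy), weighted by w."""
    rel = sum(abs(kendall(c, labels)) for c in cols) / len(cols)
    if len(cols) < 2:
        return rel
    pairs = list(combinations(cols, 2))
    div = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return w * rel + (1 - w) * div

tau = kendall([1, 2, 3, 4], [1, 2, 4, 3])        # one discordant pair
score = subset_score([[1, 2, 3, 4], [4, 3, 2, 1]], [1, 2, 3, 4])
```

In MKMDIGWO a score of this kind filters candidate subsets before the grey-wolf wrapper refines them; the damping-oscillation schedule that trades the two models off over time is omitted here.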


Author(s):  
Amit Saxena ◽  
John Wang

This paper presents a two-phase scheme to select a reduced number of features from a dataset using a Genetic Algorithm (GA) and to test the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach to selecting a subset of features is applied: the GA stochastically selects a reduced number of features with Sammon error as the fitness function, yielding several different feature subsets. In the second phase, each reduced feature set is used to test the CA of the dataset, validated with the supervised k-nearest neighbor (k-nn) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is investigated for CA using k-nn classification with different Minkowski metrics, i.e., non-Euclidean norms instead of the conventional Euclidean norm (L2). Final results are presented with extensive simulations on seven real and one synthetic data set. The investigation reveals that using different norms produces better CA and hence offers scope for better feature subset selection.
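The second-phase classifier, k-nn under an arbitrary Minkowski norm, is easy to state directly. The toy training points below are invented for illustration; only the distance definition and the majority vote reflect the scheme described above.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p; p=2 is Euclidean, p=1 Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train, labels, query, k=3, p=2):
    """k-nearest-neighbour majority vote under a Minkowski-p norm."""
    order = sorted(range(len(train)),
                   key=lambda i: minkowski(train[i], query, p))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

# Two well-separated toy clusters, classified under non-Euclidean norms.
train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = [0, 0, 0, 1, 1, 1]
pred = knn_predict(train, labels, [0.5, 0.5], k=3, p=1)
```

Sweeping `p` over values other than 2 for each GA-selected subset is exactly the variation the paper investigates; on real data different norms can reorder the neighbours and change the vote.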


2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Gürcan Yavuz ◽  
Doğan Aydin

Optimal feature subset selection is an important and difficult task in pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select an optimal feature subset and increase accuracy. A large number of features in a dataset increases the computational complexity, leading to performance degradation. In this paper, to overcome this problem, the angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem instead of representing it as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation and to determine the efficiency of the proposed method, six variants of the Artificial Bee Colony (ABC) algorithm employing angle modulation are evaluated for feature selection. Experimental results on six high-dimensional datasets show that the Angle Modulated ABC algorithms improved classification accuracy with smaller feature subsets.
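The angle-modulation decoding that turns a 4-D continuous vector into an n-bit feature mask can be sketched as below. The generating function used here, g(x) = sin(2π(x−a)·b·cos(2π(x−a)·c)) + d, is the form commonly used in angle-modulated binary optimizers; the sample points and thresholds are illustrative assumptions.

```python
import math

def am_bits(a, b, c, d, n):
    """Angle-modulation decoder: sample
    g(x) = sin(2*pi*(x-a)*b*cos(2*pi*(x-a)*c)) + d
    at x = 0, 1, ..., n-1 and emit bit 1 where g(x) > 0.
    The 4-D tuple (a, b, c, d) thus encodes an n-bit feature mask,
    so the optimizer only ever searches a 4-D continuous space."""
    bits = []
    for i in range(n):
        g = math.sin(2 * math.pi * (i - a) * b *
                     math.cos(2 * math.pi * (i - a) * c)) + d
        bits.append(1 if g > 0 else 0)
    return bits

mask = am_bits(0.0, 0.5, 0.8, 0.0, 8)  # one candidate 8-feature mask
```

An ABC (or any continuous optimizer) then evolves (a, b, c, d) directly, scoring each tuple by the classifier accuracy of the decoded mask; the dimensionality of the search no longer grows with the number of features.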


2021 ◽  
pp. 08-16
Author(s):  
Mohamed Abdel-Basset ◽  
Mohamed Elhoseny

In the current epidemic situation, people face several mental disorders related to Depression, Anxiety, and Stress (DAS). Numerous scales have been developed for computing DAS levels, DAS-21 being one of them. At the same time, machine learning (ML) models are widely applied to solve classification problems efficiently, and feature selection (FS) approaches can be designed to improve classifier results. In this respect, this paper develops an intelligent feature selection with ML-based risk management (IFSML-RM) technique for DAS prediction. The IFSML-RM technique follows a two-stage process: quantum elephant herd optimization-based FS (QEHO-FS) and decision tree (DT) based classification. The QEHO algorithm first selects a valuable subset of features from the input data. The chosen features are then fed into the DT classifier to determine the presence or absence of DAS. A detailed experimental process is carried out on a benchmark dataset, and the results showcase the superiority of the IFSML-RM technique in terms of different performance measures.
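The two-stage pipeline, a search-based feature selector feeding a tree classifier, can be sketched generically. The QEHO search itself is not reproduced here; an exhaustive scan over single features scored by a one-level decision stump stands in for it, and the synthetic data (feature 0 carries the signal, feature 1 is noise) are invented.

```python
import random

# Synthetic data standing in for the DAS benchmark: feature 0 carries
# the class signal, feature 1 is pure noise.
random.seed(0)
X = [[i, random.random()] for i in range(20)]
y = [1 if row[0] >= 10 else 0 for row in X]

def stump_acc(feature):
    """Best achievable accuracy of a one-level decision stump (the
    simplest decision tree) thresholding a single feature."""
    best = 0.0
    for t in sorted({row[feature] for row in X}):
        preds = [1 if row[feature] >= t else 0 for row in X]
        acc = sum(p == label for p, label in zip(preds, y)) / len(y)
        best = max(best, acc, 1 - acc)  # also allow the inverted stump
    return best

def select_feature():
    """Stage 1 stand-in: keep the feature whose stump classifies best
    (a real system would run QEHO over feature subsets here)."""
    return max(range(2), key=stump_acc)

best_f = select_feature()  # stage 2 would fit a full DT on this subset
```

The structure mirrors the paper's flow: stage 1 scores candidate feature sets by downstream classifier quality, and stage 2 trains the final decision tree on the winner.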


Author(s):  
Basabi Chakraborty

Selecting an optimal subset of features from a large feature set is an important pre-processing step for pattern classification, data mining, and machine learning applications. Feature subset selection basically comprises defining a criterion function for evaluating a feature subset and developing a search strategy to find the best subset among a large number of candidates. Many mathematical and statistical techniques have been proposed so far. Recently, biologically inspired computing has been gaining popularity for solving real-world problems because of its greater flexibility compared with traditional statistical or mathematical techniques. In this chapter, the role of Particle Swarm Optimization (PSO), one of the recently developed bio-inspired evolutionary computational (EC) approaches, in designing algorithms for producing an optimal feature subset from a large feature set is examined. A state-of-the-art review of PSO algorithms and their hybrids with other soft computing techniques for feature subset selection is presented, followed by the author's proposals for PSO-based algorithms. Simple simulation experiments with benchmark data sets and their results are shown to evaluate the respective effectiveness and comparative performance of these algorithms in selecting the best feature subset.
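A minimal binary PSO for feature subset selection, in the sigmoid-velocity form of Kennedy and Eberhart's binary variant, can be sketched as follows. The fitness function here is a toy stand-in (agreement with a fixed target mask); in practice it would be the classifier accuracy obtained with the selected features. All parameters and the target mask are illustrative assumptions.

```python
import math
import random

random.seed(1)
TARGET = [1, 0, 1, 1, 0, 1]  # toy "ideal subset" stand-in
N = len(TARGET)

def fitness(bits):
    """Toy criterion: agreement with TARGET. A real system would score
    a classifier trained on the features where bits[j] == 1."""
    return sum(b == t for b, t in zip(bits, TARGET))

def bpso(swarm=8, iters=40, w=0.7, c1=1.5, c2=1.5):
    """Minimal binary PSO: velocities are updated as usual, then
    squashed by a sigmoid into per-bit probabilities of setting 1."""
    pos = [[random.randint(0, 1) for _ in range(N)] for _ in range(swarm)]
    vel = [[0.0] * N for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(iters):
        for i in range(swarm):
            for j in range(N):
                vel[i][j] = (w * vel[i][j]
                             + c1 * random.random() * (pbest[i][j] - pos[i][j])
                             + c2 * random.random() * (gbest[j] - pos[i][j]))
                prob = 1 / (1 + math.exp(-vel[i][j]))  # sigmoid squash
                pos[i][j] = 1 if random.random() < prob else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pos[i]) > fitness(gbest):
                    gbest = pos[i][:]
    return gbest

best = bpso()  # best feature mask found by the swarm
```

The hybrids reviewed in the chapter typically replace or augment pieces of this loop, e.g. the velocity update or the fitness evaluation, with other soft computing components, while keeping the same swarm structure.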

