Feature Subset Selection Problem using Wrapper Approach in Supervised Learning

2010 ◽  
Vol 1 (7) ◽  
pp. 13-17 ◽  
Author(s):  
Asha Gowda Karegowda ◽  
A.S. Manjunath ◽  
M.A. Jayaram
2006 ◽  
Vol 169 (2) ◽  
pp. 477-489 ◽  
Author(s):  
Félix García López ◽  
Miguel García Torres ◽  
Belén Melián Batista ◽  
José A. Moreno Pérez ◽  
J. Marcos Moreno-Vega

Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 76 ◽  
Author(s):  
Mehreen Naz ◽  
Kashif Zafar ◽  
Ayesha Khan

Feature subset selection is the process of choosing a set of relevant features from a high-dimensional dataset to improve the performance of classifiers. The meaningful words extracted from the data form the feature set for sentiment analysis. Many evolutionary algorithms, such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied to the feature subset selection problem, yet computational performance can still be improved. This research presents a solution to the feature subset selection problem for classification of sentiments using ensemble-based classifiers. It consists of a hybrid technique of minimum redundancy and maximum relevance (mRMR) and Forest Optimization Algorithm (FOA)-based feature selection. Ensemble-based classification is implemented to optimize the results of the individual classifiers. The Forest Optimization Algorithm as a feature selection technique has been applied to various classification datasets from the UCI machine learning repository. The classifiers used in the ensemble methods for the UCI repository datasets are the k-Nearest Neighbor (k-NN) and Naïve Bayes (NB). For the classification of sentiments, a 15–20% improvement has been recorded. The dataset used for classification of sentiments is Blitzer’s dataset, consisting of reviews of electronic products. The results are further improved by an ensemble of k-NN, NB, and Support Vector Machine (SVM), with an accuracy of 95% on the sentiment classification task.
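The core of the ensemble step described above is combining the per-sample predictions of the individual classifiers. A minimal sketch of majority voting, with hypothetical label outputs standing in for trained k-NN, NB, and SVM models (the abstract does not specify the combination rule, so simple plurality voting is assumed here):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label predictions by majority vote.

    predictions: list of lists, one inner list of predicted labels
    per classifier, all over the same test samples.
    """
    combined = []
    for labels in zip(*predictions):  # labels for one sample across classifiers
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Hypothetical outputs of k-NN, NB, and SVM on five review samples.
knn = ["pos", "neg", "pos", "pos", "neg"]
nb  = ["pos", "pos", "pos", "neg", "neg"]
svm = ["neg", "neg", "pos", "pos", "neg"]

print(majority_vote([knn, nb, svm]))  # → ['pos', 'neg', 'pos', 'pos', 'neg']
```

With three base classifiers, every sample has a strict majority unless all three disagree, which cannot happen with two sentiment labels; this is one reason odd-sized ensembles are common for binary tasks.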


1997 ◽  
Vol 12 (2) ◽  
pp. 145-153 ◽  
Author(s):  
Bin Chen ◽  
Jiarong Hong ◽  
Yadong Wang

Author(s):  
PABLO BERMEJO ◽  
JOSE A. GAMEZ ◽  
JOSE M. PUERTA

This paper deals with the problem of feature subset selection in classification-oriented datasets with a (very) large number of attributes. In such datasets, complex classical wrapper approaches become intractable due to the high number of wrapper evaluations to be carried out. One way to alleviate this problem is the so-called filter-wrapper approach, or Incremental Wrapper-based Subset Selection (IWSS), which first constructs a ranking of the predictive attributes using a filter measure and then applies a wrapper approach following that rank. In this way the number of wrapper evaluations is linear in the number of predictive attributes. In this paper we present two contributions to the IWSS approach. The first is related to obtaining more compact subsets, and enables not only the addition of new attributes but also their interchange with some of those already included in the selected subset. Our second contribution, termed early stopping, sets an adaptive threshold on the number of attributes in the ranking to be considered. The advantages of these new approaches are analyzed both theoretically and experimentally. The results over a set of 12 high-dimensional datasets corroborate the success of our proposals.
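The baseline IWSS scheme the paper builds on can be sketched compactly: walk the filter-ranked attributes once, and keep each one only if the wrapper evaluation improves. The sketch below uses a toy stand-in for the wrapper score (the names `gains` and `evaluate` are illustrative, not from the paper; a real wrapper would cross-validate a classifier on the candidate subset):

```python
def iwss(ranked_features, evaluate):
    """Incremental Wrapper-based Subset Selection (simplified sketch).

    ranked_features: feature names already sorted by a filter score, best first.
    evaluate: wrapper function mapping a feature subset to an accuracy estimate.
    Each ranked feature gets exactly one wrapper evaluation, so the total
    cost is linear in the number of attributes.
    """
    selected, best = [], evaluate([])
    for f in ranked_features:
        score = evaluate(selected + [f])
        if score > best:          # keep the feature only if it helps
            selected, best = selected + [f], score
    return selected, best

# Toy wrapper: additive per-feature accuracy gains, not a real classifier.
gains = {"f1": 0.30, "f2": 0.10, "f3": 0.00, "f4": 0.05}
evaluate = lambda subset: 0.5 + sum(gains[f] for f in subset)

selected, best = iwss(["f1", "f2", "f3", "f4"], evaluate)
print(selected)  # f3 adds nothing and is skipped
```

The paper's first contribution modifies the `if score > best` step so that a new attribute may also *replace* one already in `selected`, yielding more compact subsets; early stopping would additionally cut the loop short once the remaining ranked attributes fall below an adaptive threshold.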


2018 ◽  
Vol 306 ◽  
pp. 94-107 ◽  
Author(s):  
Hakan Ezgi Kiziloz ◽  
Ayça Deniz ◽  
Tansel Dokeroglu ◽  
Ahmet Cosar
