Empowering Simultaneous Feature and Instance Selection in Classification Problems through the Adaptation of Two Selection Algorithms

Abstract: A novel approach for instance selection in classification problems is presented. This adaptive instance selection is designed to simultaneously decrease the amount of computational resources required and increase the classification quality achieved. The approach generates new training samples during the evolutionary process and changes the training set for the algorithm. The instance selection is guided by changing probabilities, so that the algorithm concentrates on problematic examples which are difficult to classify. A hybrid fuzzy classification algorithm with a self-configuration procedure is used as the problem solver. The classification quality is tested on nine data sets from the KEEL repository. A special balancing strategy is used in the instance selection approach to improve the classification quality on imbalanced datasets. The results demonstrate the usefulness of the proposed approach as compared with other classification methods.
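A minimal Python sketch of probability-guided instance selection in the spirit of the abstract above. The function names, the sampling scheme, and the weight update rule (`boost`, `decay`) are illustrative assumptions, not the authors' method: the abstract does not give the actual formulas, and the balancing strategy is omitted here.

```python
import random

def sample_subset(weights, size, rng):
    """Draw `size` distinct instance indices, favouring larger weights."""
    pool = list(range(len(weights)))
    chosen = []
    for _ in range(min(size, len(pool))):
        pick = rng.choices(pool, weights=[weights[i] for i in pool], k=1)[0]
        chosen.append(pick)
        pool.remove(pick)  # sample without replacement
    return chosen

def update_weights(weights, subset, misclassified, boost=2.0, decay=0.5):
    """Hypothetical rule: raise the weight of misclassified instances in the
    subset and lower the weight of those classified correctly."""
    new = list(weights)
    for i in subset:
        if i in misclassified:
            new[i] = new[i] * boost
        else:
            new[i] = max(new[i] * decay, 1e-6)  # keep every weight positive
    return new
```

In such a scheme, instances that keep being misclassified accumulate weight and are drawn into the training subset more often, which matches the idea of concentrating on examples that are difficult to classify.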

Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

Journal of Healthcare Engineering ◽ 10.1155/2018/1817479 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-9 ◽ Cited By ~ 7

Author(s): Min-Wei Huang ◽ Wei-Chao Lin ◽ Chih-Fong Tsai

Keyword(s): Missing Values ◽ Positive Impact ◽ Numerical Data ◽ Data Type ◽ Mixed Data ◽ Instance Selection ◽ Missing Value ◽ Missing Value Imputation ◽ Noisy Information ◽ Selection Algorithms

Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem by estimating the missing values through a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for numerical medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for mixed-type medical datasets. However, instance selection does not have a clearly positive impact on the imputation result for categorical medical datasets.
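The pipeline described above (select instances from the complete data, then impute) can be sketched in Python as follows. This is only an illustration under stated assumptions: a simple z-score filter stands in for the instance selection step (it is not DROP3, GA, or IB3), the nearest-neighbour mean is a basic form of KNN imputation, and the names `drop_outliers`, `knn_impute`, and the threshold `max_z` are hypothetical.

```python
import math

def drop_outliers(rows, max_z=1.5):
    """Stand-in instance selection: keep rows whose every attribute lies
    within `max_z` population standard deviations of its column mean."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]

    def ok(row):
        return all(s == 0 or abs(v - m) / s <= max_z
                   for v, m, s in zip(row, means, stds))

    return [r for r in rows if ok(r)]

def knn_impute(row, complete_rows, k=3):
    """Fill None entries in `row` with the mean of the k nearest complete
    rows, where distance is measured on the observed attributes only."""
    observed = [j for j, v in enumerate(row) if v is not None]

    def dist(other):
        return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

    neighbours = sorted(complete_rows, key=dist)[:k]
    return [v if v is not None
            else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)]
```

Filtering first means that an outlying complete row cannot become a neighbour and distort the estimate, which is the effect the paper sets out to measure.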

Bagging of Instance Selection Algorithms

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science ◽ 10.1007/978-3-319-07176-3_4 ◽ 2014 ◽ pp. 40-51 ◽ Cited By ~ 9

Author(s): Marcin Blachnik ◽ Mirosław Kordos

Keyword(s): Instance Selection ◽ Selection Algorithms

Multi-Objective Evolutionary Algorithms for Filter Based Feature Selection in Classification

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213013500243 ◽ 2013 ◽ Vol 22 (04) ◽ pp. 1350024 ◽ Cited By ~ 24

Author(s): Bing Xue ◽ Liam Cervante ◽ Lin Shang ◽ Will N. Browne ◽ Mengjie Zhang

Keyword(s): Feature Selection ◽ Evaluation Criteria ◽ Classification Performance ◽ Classification Problems ◽ Multi Objective ◽ Conflicting Objectives ◽ Benchmark Datasets ◽ Selection Algorithms ◽ Filter Algorithms ◽ Single Objective

Feature selection is a multi-objective problem with the two main conflicting objectives of minimising the number of features and maximising the classification performance. However, most existing feature selection algorithms are single objective and do not appropriately reflect the actual need. There are a small number of multi-objective feature selection algorithms, which are wrapper based and accordingly are computationally expensive and less general than filter algorithms. Evolutionary computation techniques are particularly suitable for multi-objective optimisation because they use a population of candidate solutions and are able to find multiple non-dominated solutions in a single run. However, the two well-known evolutionary multi-objective algorithms, non-dominated sorting based multi-objective genetic algorithm II (NSGAII) and strength Pareto evolutionary algorithm 2 (SPEA2), have not been applied to filter based feature selection. In this work, based on NSGAII and SPEA2, we develop two multi-objective, filter based feature selection frameworks. Four multi-objective feature selection methods are then developed by applying mutual information and entropy as two different filter evaluation criteria in each of the two proposed frameworks. The proposed multi-objective algorithms are examined and compared with a single objective method and three traditional methods (two filters and one wrapper) on eight benchmark datasets. A decision tree is employed to test the classification performance. Experimental results show that the proposed multi-objective algorithms can automatically evolve a set of non-dominated solutions that include a smaller number of features and achieve better classification performance than using all features. NSGAII and SPEA2 outperform the single objective algorithm, the two traditional filter algorithms and even the traditional wrapper algorithm in terms of both the number of features and the classification performance in most cases. NSGAII achieves similar performance to SPEA2 for the datasets that consist of a small number of features and slightly better results when the number of features is large. This work represents the first study on NSGAII and SPEA2 for filter feature selection in classification problems, with both providing field-leading classification performance.
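Two ideas from the abstract above can be illustrated with a short Python sketch: mutual information as a filter criterion, and the notion of a non-dominated solution when trading off feature count against relevance. This is not NSGAII or SPEA2; the function names and the representation of a candidate as a `(n_features, relevance)` pair are assumptions made for the example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def non_dominated(candidates):
    """Keep candidates (n_features, relevance) that no other candidate
    dominates. Fewer features and higher relevance are both better."""
    def dominates(a, b):
        return a[0] <= b[0] and a[1] >= b[1] and a != b

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

For example, among the candidates `(1, 0.4)`, `(2, 0.9)`, `(3, 0.9)` and `(2, 0.3)`, the last two are dominated, so only the first two remain as trade-off solutions.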

Comparison of Instance Selection Algorithms II. Results and Comments

Lecture Notes in Computer Science - Artificial Intelligence and Soft Computing - ICAISC 2004 ◽ 10.1007/978-3-540-24844-6_87 ◽ 2004 ◽ pp. 580-585 ◽ Cited By ~ 35

Author(s): Marek Grochowski ◽ Norbert Jankowski

Keyword(s): Instance Selection ◽ Selection Algorithms

Feature Selection Algorithm Using Relative Odds for Data Mining Classification

Big Data Analytics for Sustainable Computing - Advances in Data Mining and Database Management ◽ 10.4018/978-1-5225-9750-6.ch005 ◽ 2020 ◽ pp. 81-106 ◽ Cited By ~ 3

Author(s): Donald Douglas Atsa'am

Keyword(s): Feature Selection ◽ Binary Classification ◽ Initial Step ◽ Selection Algorithm ◽ Feature Selection Algorithm ◽ Classification Problems ◽ Odds Ratios ◽ Relative Odds ◽ Importance Ranking ◽ Selection Algorithms

A filter feature selection algorithm is developed and its performance is tested. In the initial step, the algorithm dichotomizes the dataset and then separately computes the association between each predictor and the class variable using relative odds (odds ratios). The value of each odds ratio becomes the importance ranking of the corresponding explanatory variable in determining the output. Logistic regression classification is deployed to test the performance of the new algorithm in comparison with three existing feature selection algorithms: the Fisher index, Pearson's correlation, and the varImp function. A number of experimental datasets are employed, and in most cases, the subsets selected by the new algorithm produced models with higher classification accuracy than the subsets suggested by the existing feature selection algorithms. Therefore, the proposed algorithm is a reliable alternative in filter feature selection for binary classification problems.
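The ranking step described above can be sketched in Python as follows, assuming the predictors have already been dichotomized to 0/1 and the class is binary. The function names, the 0.5 continuity correction for empty cells, and ranking by the raw odds ratio in descending order are assumptions for illustration; the abstract does not specify how zero cells or odds ratios below 1 are treated.

```python
def odds_ratio(feature, label, smoothing=0.5):
    """Odds ratio of a binary feature against a binary label,
    computed from the 2x2 contingency table."""
    a = b = c = d = 0
    for x, y in zip(feature, label):
        if x and y:
            a += 1          # feature present, class positive
        elif x and not y:
            b += 1          # feature present, class negative
        elif not x and y:
            c += 1          # feature absent, class positive
        else:
            d += 1          # feature absent, class negative
    if 0 in (a, b, c, d):
        # Assumed correction so the ratio stays finite with empty cells.
        a, b, c, d = (v + smoothing for v in (a, b, c, d))
    return (a * d) / (b * c)

def rank_features(columns, label):
    """Return (name, odds ratio) pairs, largest odds ratio first."""
    scores = {name: odds_ratio(values, label)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A feature subset could then be formed by taking the top-ranked predictors and passing them to a classifier such as logistic regression, as in the evaluation the abstract describes.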
