Scaling up Instance Selection Algorithms by Dividing-and-Conquering

Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem: the missing values are estimated by a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimates of the missing values may be unreliable or may even differ considerably from the real values. The aim of this paper is to examine whether combining instance selection from the observed data with missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms (DROP3, GA, and IB3) and three imputation algorithms (KNNI, MLP, and SVM) are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for medical datasets of the numerical data type, and that specific combinations of instance selection and imputation methods can improve the imputation results for medical datasets of the mixed data type. However, instance selection does not have a clearly positive impact on the imputation result for categorical medical datasets.
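The pipeline described in this abstract (select instances from the complete data first, then impute) can be illustrated with a minimal sketch. It is not the paper's implementation: `select_instances` is a hypothetical distance-based outlier filter standing in for DROP3, GA, or IB3, and `knn_impute` is a bare-bones nearest-neighbour imputer in the spirit of KNNI. The data, function names, and parameters `k` and `factor` are all assumptions made for illustration.

```python
import math

def distance(a, b, dims):
    """Euclidean distance restricted to the feature indices in dims."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

def select_instances(rows, k=2, factor=2.0):
    """Stand-in filter (assumption, not DROP3/GA/IB3): drop rows whose mean
    distance to their k nearest neighbours exceeds factor times the median."""
    dims = range(len(rows[0]))
    scores = []
    for i, r in enumerate(rows):
        dists = sorted(distance(r, o, dims) for j, o in enumerate(rows) if j != i)
        scores.append(sum(dists[:k]) / k)
    median = sorted(scores)[len(scores) // 2]
    return [r for r, s in zip(rows, scores) if s <= factor * median]

def knn_impute(row, complete_rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows,
    measuring distance on the observed features only."""
    observed = [d for d, v in enumerate(row) if v is not None]
    nearest = sorted(complete_rows, key=lambda o: distance(row, o, observed))[:k]
    return [v if v is not None else sum(o[d] for o in nearest) / k
            for d, v in enumerate(row)]

# Toy data: the last complete row is an outlier in the second feature.
complete = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 50.0]]
incomplete = [1.0, None]

kept = select_instances(complete)
imputed_without = knn_impute(incomplete, complete)  # outlier can be a neighbour
imputed_with = knn_impute(incomplete, kept)         # outlier filtered out first
```

On this toy data the outlier is close to the incomplete row in the observed feature, so imputing without selection averages it in, while imputing after selection uses only the three consistent rows.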

Bagging of Instance Selection Algorithms

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science ◽ 10.1007/978-3-319-07176-3_4 ◽ 2014 ◽ pp. 40-51

Cited By ~ 9

Author(s): Marcin Blachnik ◽ Mirosław Kordos

Keyword(s): Instance Selection ◽ Selection Algorithms

Comparison of Instance Selection Algorithms II. Results and Comments

Lecture Notes in Computer Science - Artificial Intelligence and Soft Computing - ICAISC 2004 ◽ 10.1007/978-3-540-24844-6_87 ◽ 2004 ◽ pp. 580-585

Cited By ~ 35

Author(s): Marek Grochowski ◽ Norbert Jankowski

Keyword(s): Instance Selection ◽ Selection Algorithms

Ensembles of instance selection methods: A comparative study

International Journal of Applied Mathematics and Computer Science ◽ 10.2478/amcs-2019-0012 ◽ 2019 ◽ Vol 29 (1) ◽ pp. 151-168

Author(s): Marcin Blachnik

Keyword(s): Prediction Accuracy ◽ Additive Noise ◽ Instance Selection ◽ Selection Methods ◽ Empirical Comparison ◽ Training Set ◽ Objective Criterion ◽ Single Dataset ◽ Selection Algorithms ◽ First Time

Abstract: Instance selection is often performed as one of the preprocessing methods which, along with feature selection, allows a significant reduction in computational complexity and an increase in prediction accuracy. So far, only a few authors have considered ensembles of instance selection methods, while ensembles of final predictive models attract many researchers. To bridge that gap, in this paper we compare four ensembles adapted to instance selection: Bagging, Feature Bagging, AdaBoost and Additive Noise. The last one is introduced for the first time in this paper. The study is based on an empirical comparison performed on 43 datasets and 9 base instance selection methods. The experiments are divided into three scenarios. In the first one, evaluated on a single dataset, we demonstrate the influence of the ensembles on the compression–accuracy relation. In the second scenario, the goal is to achieve the highest prediction accuracy. In the third one, both accuracy and the level of dataset compression constitute a multi-objective criterion. The obtained results indicate that ensembles of instance selection improve on the base instance selection algorithms, at the expense of compression, except for unstable methods such as CNN and IB3. In the comparison, Bagging and AdaBoost lead in most of the scenarios. In the experiments we evaluate three classifiers: 1NN, kNN and SVM. We also note a deterioration in prediction accuracy for robust classifiers (kNN and SVM) trained on data filtered by any instance selection method (including the ensembles) when compared with the results obtained when the entire training set was used to train these classifiers.
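As a rough illustration of the Bagging ensemble mentioned in the abstract above, the sketch below runs a base selector on bootstrap samples and keeps the instances that the selector retains in a large enough share of the rounds in which they appear. The abstract does not specify the voting rule or its parameters, so the acceptance `threshold`, the number of rounds, and the choice of Edited Nearest Neighbour (ENN) as the base selector are assumptions; all function names are hypothetical.

```python
import random
from collections import Counter

def enn_select(X, y, idx, k=3):
    """Edited Nearest Neighbour on the subset idx: keep an instance when the
    majority label of its k nearest neighbours (within idx) matches its own."""
    kept = set()
    for i in idx:
        others = sorted(
            (j for j in idx if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
        )
        votes = Counter(y[j] for j in others[:k])
        if votes and votes.most_common(1)[0][0] == y[i]:
            kept.add(i)
    return kept

def bagging_selection(X, y, n_rounds=25, threshold=0.5, seed=0):
    """Assumed voting scheme: keep instance i if the base selector retained it
    in at least `threshold` of the bootstrap rounds in which i was drawn."""
    rng = random.Random(seed)
    n = len(X)
    appeared = [0] * n
    selected = [0] * n
    for _ in range(n_rounds):
        # Bootstrap sample of size n; duplicate draws are collapsed.
        sample = sorted({rng.randrange(n) for _ in range(n)})
        kept = enn_select(X, y, sample)
        for i in sample:
            appeared[i] += 1
            if i in kept:
                selected[i] += 1
    return [i for i in range(n)
            if appeared[i] and selected[i] / appeared[i] >= threshold]

# Two well-separated classes plus one mislabelled point (index 12).
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [0.15, 0.05],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05], [5.15, 5.05],
     [0.05, 0.0]]
y = [0] * 6 + [1] * 6 + [1]

kept_indices = bagging_selection(X, y)
```

Lowering `threshold` keeps more instances (less compression), which is one way to see the compression and accuracy trade-off the abstract refers to.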

Instance selection algorithms of balanced class distribution based on Hubness for time series

Journal of Computer Applications ◽ 10.3724/sp.j.1087.2012.03034 ◽ 2013 ◽ Vol 32 (11) ◽ pp. 3034-3037

Author(s): Ting-ting ZHAI ◽ Zhen-feng HE

Keyword(s): Time Series ◽ Instance Selection ◽ Class Distribution ◽ Selection Algorithms
