Comparison of Instance Selection and Construction Methods with Various Classifiers

2020 ◽  
Vol 10 (11) ◽  
pp. 3933 ◽  
Author(s):  
Marcin Blachnik ◽  
Mirosław Kordos

Instance selection and construction methods were originally designed to improve the performance of the k-nearest neighbors classifier by increasing its speed and improving its classification accuracy. These goals were achieved by eliminating redundant and noisy samples, thus reducing the size of the training set. In this paper, the performance of instance selection methods is investigated in terms of classification accuracy and reduction of training set size. The classification accuracy of the following classifiers is evaluated: decision trees, random forest, Naive Bayes, linear model, support vector machine and k-nearest neighbors. The obtained results indicate that for most of the classifiers, compressing the training set degrades prediction performance, and only a small group of instance selection methods can be recommended as a general-purpose preprocessing step: learning vector quantization-based algorithms, along with Drop2 and Drop3. Other methods are less efficient or provide a low compression ratio.
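The noise-elimination side of instance selection can be illustrated with one classical editing scheme, Wilson's edited nearest neighbor (ENN). This is a generic sketch for illustration only, not one of the LVQ- or Drop-family methods the paper recommends:

```python
from collections import Counter

def edited_nearest_neighbour(X, y, k=3):
    """Wilson-style ENN editing: drop every training instance that is
    misclassified by a majority vote of its k nearest neighbours."""
    def neighbours(i):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
                       for j in range(len(X)) if j != i)
        return [j for _, j in dists[:k]]
    keep = [i for i in range(len(X))
            if Counter(y[j] for j in neighbours(i)).most_common(1)[0][0] == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]
```

A mislabeled point sitting inside the opposite class's cluster is voted out by its neighbours, shrinking the training set while cleaning its decision boundary.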

2021 ◽  
Vol 40 (1) ◽  
pp. 521-533
Author(s):  
Junhai Zhai ◽  
Jiaxing Qi ◽  
Sufang Zhang

The condensed nearest neighbor (CNN) is a pioneering instance selection algorithm for the 1-nearest neighbor classifier. Many variants of CNN for K-nearest neighbors have been proposed by different researchers. However, few studies have been conducted on condensed fuzzy K-nearest neighbor. In this paper, we present a condensed fuzzy K-nearest neighbor (CFKNN) algorithm that starts from an initial instance set S and iteratively selects informative instances from the training set T, moving them from T to S. Specifically, CFKNN consists of three steps. First, for each instance x ∈ T, it finds the K nearest neighbors in S and calculates the fuzzy membership degrees of those neighbors using S rather than T. Second, it computes the fuzzy membership degrees of x using the fuzzy K-nearest neighbor algorithm. Finally, it calculates the information entropy of x and selects an instance according to the calculated value. Extensive experiments on 11 datasets were conducted to compare CFKNN with four state-of-the-art algorithms (CNN, edited nearest neighbor (ENN), Tomek links, and OneSidedSelection) regarding the number of selected instances, the testing accuracy, and the compression ratio. The experimental results show that CFKNN provides excellent performance and outperforms the other four algorithms.
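The three steps can be sketched as follows. The abstract does not give the exact membership formula, so the standard Keller-style inverse-distance weighting with fuzzifier `m` is assumed here; the selection rule picks the instance in T with the highest membership entropy (i.e. the most ambiguous one):

```python
import math

def knn(S, x, k):
    """K nearest (distance, label) pairs of x among labelled instances in S."""
    return sorted(((math.dist(x, s), y) for s, y in S))[:k]

def fuzzy_memberships(S, x, k, classes, m=2.0, eps=1e-9):
    """Fuzzy K-NN class membership degrees of x, computed from its K
    nearest neighbours in S (assumed Keller-style inverse-distance weights)."""
    w = [(1.0 / (d ** (2.0 / (m - 1.0)) + eps), y) for d, y in knn(S, x, k)]
    total = sum(wi for wi, _ in w)
    return {c: sum(wi for wi, y in w if y == c) / total for c in classes}

def entropy(u):
    """Information entropy of a membership distribution."""
    return -sum(p * math.log(p) for p in u.values() if p > 0)

def cfknn_step(S, T, k, classes):
    """One CFKNN-style iteration: move the most informative instance
    (highest membership entropy) from T to S."""
    i = max(range(len(T)),
            key=lambda i: entropy(fuzzy_memberships(S, T[i][0], k, classes)))
    S.append(T.pop(i))
    return S, T
```

Iterating `cfknn_step` until a stopping criterion (e.g. an entropy threshold) yields the condensed set S; the exact criterion is a detail of the paper not reproduced here.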


2018 ◽  
Vol 29 (1) ◽  
pp. 1122-1134
Author(s):  
H. M. Keerthi Kumar ◽  
B. S. Harish

In the recent internet era, micro-blogging sites produce an enormous amount of short textual information, which appears in the form of users' opinions or sentiments. Sentiment analysis of short text is challenging due to the use of informal language, misspellings, and shortened forms of words, which lead to high dimensionality and sparsity. To deal with these challenges, this paper proposes a novel, simple, and yet effective feature selection method that selects frequently distributed features related to each class. The method is based on class-wise information, identifying the relevant features related to each class. We evaluate the proposed feature selection method by comparing it with existing feature selection methods such as chi-square (χ2), entropy, information gain, and mutual information. Performance is evaluated using the classification accuracy obtained from support vector machine, K-nearest neighbors, and random forest classifiers on two publicly available datasets, the Stanford Twitter dataset and the Ravikiran Janardhana dataset. To demonstrate the effectiveness of the proposed feature selection method, we conducted extensive experimentation with different feature sets. The proposed feature selection method outperforms the existing feature selection methods in terms of classification accuracy on the Stanford Twitter dataset, and performs equally competitively compared to other feature selection methods on most of the feature subsets of the Ravikiran Janardhana dataset.
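The class-wise idea can be sketched in a few lines. The per-class score used below (raw term frequency within the class) is an illustrative stand-in, since the abstract does not specify the exact statistic:

```python
from collections import Counter

def classwise_features(docs, labels, per_class):
    """Class-wise feature selection sketch: for each class, keep the
    terms occurring most frequently in that class's documents, then
    take the union across classes as the final feature set."""
    selected = set()
    for c in set(labels):
        counts = Counter(t for d, y in zip(docs, labels) if y == c for t in d)
        selected.update(t for t, _ in counts.most_common(per_class))
    return selected
```

Because each class contributes its own top terms, minority-class vocabulary survives even when it is globally rare, which is the advantage class-wise schemes claim over corpus-wide scores.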


Author(s):  
B. Venkatesh ◽  
J. Anuradha

In microarray data, it is difficult to achieve high classification accuracy due to the high dimensionality and the presence of irrelevant and noisy data. Such data also contain many gene expression values but few samples. To increase the classification accuracy and the processing speed of the model, an optimal subset of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks produced by the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods; a fuzzy Gaussian membership function ordering is used for aggregating the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, with an RBF kernel-based Support Vector Machine (SVM) classifier as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using performance metrics such as accuracy, recall, precision, and F1-score. The experimental results show that the proposed method outperforms the other feature selection methods.
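The filter-phase aggregation can be sketched generically. The paper orders features with a fuzzy Gaussian membership function; a simple mean-rank aggregation is substituted here as an illustration of the ensemble idea:

```python
def aggregate_ranks(rankings):
    """Ensemble rank aggregation sketch: `rankings` maps a filter name
    (e.g. Relief, mRMR, FC) to a complete feature ordering, best first.
    Returns the features re-ordered by their mean rank across filters
    (a simplified stand-in for the paper's fuzzy Gaussian ordering)."""
    features = set().union(*rankings.values())
    mean_rank = {f: sum(r.index(f) for r in rankings.values()) / len(rankings)
                 for f in features}
    # Ties broken alphabetically for a deterministic result.
    return sorted(features, key=lambda f: (mean_rank[f], f))
```

The aggregated ordering then seeds the wrapper phase, where a search procedure (IBPSO in the paper) evaluates candidate subsets with the SVM.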


Author(s):  
Hedieh Sajedi ◽  
Mehran Bahador

In this paper, a new approach for segmentation and recognition of Persian handwritten numbers is presented. This method combines the framing feature technique with the outer profile feature, which we call the adapted framing feature. In our proposed approach, segmentation of the numbers into digits is carried out automatically. In the classification stage of the proposed method, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated digit level on IFHCDB, a recognition rate of 99.27% is achieved using an SVM with a polynomial kernel; at the isolated digit level on HODA, the recognition rate is 99.07%, also with a polynomial-kernel SVM. The experiments illustrate that applying our proposed method results in higher accuracy compared to previous research.
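The outer profile is a standard OCR feature; a minimal sketch of it follows. The paper's adapted framing feature combines it with zoned framing features, and its exact variant may differ from this generic version:

```python
def outer_profile(img, side="left"):
    """Outer-profile feature of a binary character image: for each row,
    the distance from the chosen side to the first foreground (1) pixel,
    or the row width if the row is empty."""
    width = len(img[0])
    profile = []
    for row in img:
        cells = row if side == "left" else row[::-1]
        profile.append(next((i for i, p in enumerate(cells) if p), width))
    return profile
```

Concatenating left and right profiles (and the analogous top/bottom columns) gives a fixed-length shape descriptor that a classifier such as SVM or k-NN can consume directly.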


Author(s):  
Ahmed.T. Sahlol ◽  
Aboul Ella Hassanien

There are still many obstacles to achieving high recognition accuracy in Arabic handwritten optical character recognition systems: each character has a different shape, and there are similarities between characters. In this chapter, several feature selection-based bio-inspired optimization algorithms, including the Bat Algorithm, Grey Wolf Optimization, the Whale Optimization Algorithm, Particle Swarm Optimization, and the Genetic Algorithm, are presented, and Arabic handwritten character recognition is chosen as an application to assess their ability to recognize Arabic characters accurately. The experiments were performed on a benchmark dataset, CENPARMI, with k-nearest neighbors, linear discriminant analysis, and random forests as classifiers. The results show that the features selected by the optimization algorithms outperform the whole feature set in terms of both classification accuracy and processing time.
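All of these optimizers share the same wrapper loop: propose a binary feature mask, score it with a classifier, keep the best. The sketch below uses random masks scored by leave-one-out 1-NN as a stand-in for the bio-inspired search strategies, which differ only in how candidate masks are proposed:

```python
import random

def knn1_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only features where mask[j] == 1."""
    feats = [j for j, b in enumerate(mask) if b]
    if not feats:
        return 0.0
    def dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in feats)
    correct = 0
    for i in range(len(X)):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: dist(X[i], X[k]))
        correct += y[j] == y[i]
    return correct / len(X)

def random_wrapper_search(X, y, iters=50, seed=0):
    """Generic wrapper feature selection: random binary masks stand in
    for PSO/GWO/GA candidate generation; fitness is LOO 1-NN accuracy,
    with fewer selected features as a tie-breaker."""
    rng = random.Random(seed)
    n = len(X[0])
    best_mask, best_acc = [1] * n, knn1_accuracy(X, y, [1] * n)
    for _ in range(iters):
        mask = [rng.randint(0, 1) for _ in range(n)]
        acc = knn1_accuracy(X, y, mask)
        if (acc, -sum(mask)) > (best_acc, -sum(best_mask)):
            best_mask, best_acc = mask, acc
    return best_mask, best_acc
```

On data where one feature is informative and another is noise, the search discards the noisy feature, which is exactly the effect the chapter reports for the bio-inspired optimizers.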


Author(s):  
Ángel Freddy Godoy Viera

Machine learning techniques continue to be widely used for text mining. For this article, a literature review of scientific journals published in 2010 and 2011 was carried out, with the aim of identifying the main machine learning techniques employed for text mining. Descriptive statistics were used to organize, summarize, and analyze the data found, and a brief description of the main techniques identified is presented. In the articles analyzed, 13 techniques applied to text mining were found; 83% of the articles mentioned one to three machine learning techniques. The main techniques used by the authors in the articles studied were support vector machine (svm), k-means (k-m), k-nearest neighbors (k-nn), naive bayes (nb), and self-organizing maps (som). The pairs that appear together most frequently are svm/nb, svm/k-nn, and svm/decision tree.

