SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

The Scientific World JOURNAL ◽

10.1155/2014/795624 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 30

Author(s):

Mei-Ling Huang ◽

Yung-Hsiang Hung ◽

W. M. Lee ◽

R. K. Li ◽

Bo-Ru Jiang

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Explanatory Power ◽

Disease Diagnosis ◽

Parameters Optimization ◽

Recursive Feature Elimination ◽

Support Vector ◽

Svm Classifier ◽

Classification Problems ◽

Class Variable

Support vector machines (SVMs) perform well on classification and prediction tasks and are widely used in disease diagnosis and medical assistance. However, SVM functions well only on two-class classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems on the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order of explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. The Taguchi method was combined with the SVM classifier to optimize the parameters C and γ and thereby increase accuracy for multiclass classification. The experimental results show that classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.
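The Taguchi step in the abstract above can be illustrated with a minimal sketch. It assumes three candidate levels for each of C and γ and a larger-the-better signal-to-noise ratio; the abstract does not state the level values or the orthogonal array used, so the function names, the level values, and `toy_accuracy` (a made-up stand-in for cross-validated SVM accuracy) are all hypothetical.

```python
import math
from itertools import product


def sn_larger_is_better(values):
    """Taguchi signal-to-noise ratio (dB) for a larger-the-better response."""
    return -10.0 * math.log10(sum(1.0 / (v * v) for v in values) / len(values))


def taguchi_best_levels(c_levels, gamma_levels, evaluate):
    """Pick, for each factor, the level with the highest mean S/N ratio.

    evaluate(c, gamma) must return a list of replicate accuracies in (0, 1].
    """
    # With two three-level factors, the first two columns of an L9
    # orthogonal array enumerate the full 3 x 3 grid, so every pair is run.
    sn = {
        (c, g): sn_larger_is_better(evaluate(c, g))
        for c, g in product(c_levels, gamma_levels)
    }

    def main_effect(position, level):
        # Mean S/N over all runs in which this factor sits at this level.
        rows = [value for key, value in sn.items() if key[position] == level]
        return sum(rows) / len(rows)

    best_c = max(c_levels, key=lambda c: main_effect(0, c))
    best_gamma = max(gamma_levels, key=lambda g: main_effect(1, g))
    return best_c, best_gamma


def toy_accuracy(c, gamma):
    """Hypothetical accuracy surface (NOT data from the paper)."""
    base = 0.95 - 0.02 * abs(math.log10(c) - 1) - 0.03 * abs(math.log10(gamma) + 1)
    return [base - 0.005, base + 0.005]  # two artificial replicates


if __name__ == "__main__":
    print(taguchi_best_levels([1, 10, 100], [0.01, 0.1, 1], toy_accuracy))
```

In practice `evaluate` would train and cross-validate an RBF-kernel SVM for each (C, γ) pair; the main-effects analysis stays the same.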


A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

With microarray data, high classification accuracy is difficult to achieve because of high dimensionality and irrelevant, noisy data; such data also have many gene expression features and few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets. Accuracy, Recall, Precision, and F1-Score are used as performance metrics. The experimental results show that the proposed method outperforms the other feature selection methods.
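One plausible reading of the filter-phase aggregation is sketched below. The abstract does not give the membership parameters or the combination rule, so the Gaussian centre (rank 1), the width `sigma`, the summation of memberships, and all function and feature names are assumptions made for illustration only.

```python
import math


def gaussian_membership(rank, center=1.0, sigma=2.0):
    """Gaussian membership of a rank; equals 1.0 at the best rank (assumed centre)."""
    return math.exp(-((rank - center) ** 2) / (2.0 * sigma ** 2))


def aggregate_ranks(rank_lists, sigma=2.0):
    """Order features by the sum of their memberships across several rankers.

    rank_lists: list of dicts mapping feature name -> rank (1 = best).
    Ties are broken alphabetically so the result is deterministic.
    """
    features = list(rank_lists[0])
    score = {
        f: sum(gaussian_membership(ranks[f], sigma=sigma) for ranks in rank_lists)
        for f in features
    }
    return sorted(features, key=lambda f: (-score[f], f))


if __name__ == "__main__":
    # Hypothetical ranks from three filters (e.g. Relief, mRMR, FC).
    relief = {"g1": 1, "g2": 2, "g3": 3}
    mrmr = {"g1": 2, "g2": 1, "g3": 3}
    fc = {"g1": 1, "g2": 3, "g3": 2}
    print(aggregate_ranks([relief, mrmr, fc]))
```

The aggregated order would then seed the wrapper phase, which is not shown here.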


Feature Selection Method Based on Mutual Information and Support Vector Machine

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800142150021x ◽

2021 ◽

pp. 2150021

Author(s):

Gang Liu ◽

Chunlei Yang ◽

Sen Liu ◽

Chunbao Xiao ◽

Bin Song

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Mutual Information ◽

Classification Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Standard Data ◽

Feature Dimension

A feature selection method based on mutual information and support vector machine (SVM) is proposed in order to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated by mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated by analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined, and the degree of influence of input variables on output variables for the SVM network is calculated based on MIV. The importance weights of the features described by MIV are sorted in descending order. Finally, the SVM classifier is used to implement feature selection according to the classification accuracy of feature combinations, taking the MIV order of the features as a reference. Simulation experiments are carried out with three standard data sets from UCI, and the results show that this method not only effectively reduces the feature dimension and achieves high classification accuracy but also ensures good robustness.
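The MIV ranking step can be sketched as follows, assuming the common definition in which each input is perturbed by plus and minus 10% and the mean change in model output is recorded. The abstract does not state the perturbation size, so `delta`, the function names, and the linear `predict` used in the demo (standing in for a trained SVM's output) are hypothetical.

```python
def mean_impact_values(predict, rows, delta=0.10):
    """Mean change in predict() when each feature is scaled up and down by delta."""
    n_features = len(rows[0])
    mivs = []
    for j in range(n_features):
        diffs = []
        for row in rows:
            up = list(row)
            up[j] = row[j] * (1 + delta)
            down = list(row)
            down[j] = row[j] * (1 - delta)
            diffs.append(predict(up) - predict(down))
        mivs.append(sum(diffs) / len(diffs))
    return mivs


def rank_by_miv(mivs):
    """Feature indices sorted by descending absolute MIV."""
    return sorted(range(len(mivs)), key=lambda j: -abs(mivs[j]))


if __name__ == "__main__":
    # Hypothetical model: feature 0 matters most, feature 1 not at all.
    predict = lambda x: 3 * x[0] + 0 * x[1] - 1 * x[2]
    rows = [[1, 1, 1], [2, 2, 2]]
    mivs = mean_impact_values(predict, rows)
    print(mivs, rank_by_miv(mivs))
```

The sign of each MIV indicates the direction of influence; only the magnitude is used for ranking here.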


The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification

Iraqi Journal of Science ◽

10.24996/ijs.2020.61.12.28 ◽

2020 ◽

pp. 3397-3407

Author(s):

Nur Syafiqah Mohd Nafis ◽

Suryanti Awang

Keyword(s):

Feature Selection ◽

Text Classification ◽

Training Dataset ◽

Recursive Feature Elimination ◽

High Dimensional ◽

Significant Feature ◽

Support Vector ◽

Svm Classifier ◽

Text Documents ◽

Text Document

Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique can measure a feature’s importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and hence obtain promising text classification accuracy. TF-IDF acts as a filter approach that measures the importance of features in the text documents at the first stage. SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research runs sets of experiments using a text document retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
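The backward elimination used in the second stage can be sketched for the simplest case: a binary, linear SVM whose squared weights serve as the ranking criterion. This is an illustrative assumption rather than the paper's implementation; the trainer below is a plain subgradient-descent routine, the TF-IDF stage is omitted, and all names and the toy data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit w, b by batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep=1):
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is small noise,
    # feature 2 is constant.
    X = [[2, 0.1, 0], [1.5, -0.2, 0], [1.8, 0.05, 0],
         [-2, 0.1, 0], [-1.6, -0.1, 0], [-1.9, 0.15, 0]]
    y = [1, 1, 1, -1, -1, -1]
    print(svm_rfe(X, y, n_keep=1))
```

Features eliminated last are the most important, so reversing `eliminated` and appending `remaining` gives a full ranking from which the top-ranked subset can be taken.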


Hybrid adapted fast correlation FCBF-support vector machine recursive feature elimination for feature selection

Intelligent Decision Technologies ◽

10.3233/idt-190014 ◽

2020 ◽

Vol 14 (3) ◽

pp. 269-279

Author(s):

Hayet Djellali ◽

Nacira Ghoualmi-Zine ◽

Souad Guessoum

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Recursive Feature Elimination ◽

Support Vector ◽

Svm Classifier ◽

Hybrid Architecture ◽

Features Selection ◽

K Nearest Neighbors ◽

Correlation Based Feature Selection ◽

Embedded Method

This paper investigates feature selection methods based on a hybrid architecture using a feature selection algorithm called Adapted Fast Correlation Based Feature selection and Support Vector Machine Recursive Feature Elimination (AFCBF-SVMRFE). AFCBF-SVMRFE has three stages and is composed of the SVMRFE embedded method with Correlation based Features Selection. The first stage is relevance analysis, the second is redundancy analysis, and the third is performance evaluation and feature restoration. Experiments show that the proposed method, tested with different classifiers, Support Vector Machine (SVM) and K nearest neighbors (KNN), provides the best accuracy on various datasets. The SVM classifier outperforms the KNN classifier on these data. AFCBF-SVMRFE outperforms the FCBF multivariate filter, SVMRFE, Particle Swarm Optimization (PSO), and Artificial Bee Colony (ABC).


Feature selection in classification using self-adaptive owl search optimization algorithm with elitism and mutation strategies

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200258 ◽

2021 ◽

Vol 40 (1) ◽

pp. 535-550

Author(s):

Ashis Kumar Mandal ◽

Rikta Sen ◽

Basabi Chakraborty

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Classification Accuracy ◽

Heuristic Algorithms ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Feature Subset ◽

Search Optimization ◽

Self Adaptive

The fundamental aim of feature selection is to reduce the dimensionality of data by removing irrelevant and redundant features. As finding the best subset of features among all possible subsets is computationally expensive, especially for high-dimensional data sets, meta-heuristic algorithms are often used as a promising way to address the task. In this paper, a variant of the recent meta-heuristic Owl Search Optimization algorithm (OSA) is proposed for solving the feature selection problem within a wrapper-based framework. Several strategies are incorporated to strengthen BOSA (the binary version of OSA) in searching for the global best solution. The meta-parameter of BOSA is initialized dynamically and then adjusted using a self-adaptive mechanism during the search process. In addition, elitism and mutation operations are combined with BOSA to better control exploitation and exploration. This improved BOSA is named the Modified Binary Owl Search Algorithm (MBOSA). A Decision Tree (DT) classifier is used for the wrapper-based fitness function, and the final classification performance of the selected feature subset is evaluated by a Support Vector Machine (SVM) classifier. Simulation experiments are conducted on twenty well-known benchmark datasets from UCI to evaluate the proposed algorithm, and the results are reported in terms of classification accuracy, number of selected features, and execution time. BOSA and three common meta-heuristic algorithms, Binary Bat Algorithm (BBA), Binary Particle Swarm Optimization (BPSO), and Binary Genetic Algorithm (BGA), are used for comparison. Simulation results show that the proposed approach outperforms similar methods by significantly reducing the number of features while maintaining a comparable level of classification accuracy.
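How elitism and mutation interact on binary feature masks can be shown with a generic sketch. It does not reproduce the owl search position update or the self-adaptive meta-parameter, which the abstract does not specify; the function names, the mutation rate, and the toy fitness (standing in for a wrapper classifier's score) are assumptions.

```python
import random


def mutate(mask, rate, rng):
    """Flip each bit of a feature mask independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def evolve(population, fitness, generations=20, n_elite=1, rate=0.1, seed=0):
    """Keep the n_elite best masks unchanged each generation and mutate the rest."""
    rng = random.Random(seed)
    history = []
    for _ in range(generations):
        population = sorted(population, key=fitness, reverse=True)
        history.append(fitness(population[0]))  # best fitness so far
        elite = population[:n_elite]            # exploitation: survive as-is
        rest = [mutate(mask, rate, rng) for mask in population[n_elite:]]  # exploration
        population = elite + rest
    best = max(population, key=fitness)
    return best, history


def toy_fitness(mask):
    """Hypothetical score: reward features 0 and 3, penalise subset size."""
    return mask[0] + mask[3] - 0.1 * sum(mask)


if __name__ == "__main__":
    start = [[1, 1, 1, 1, 1], [0, 0, 0, 0, 0], [1, 0, 1, 0, 1], [0, 1, 0, 1, 0]]
    best, history = evolve(start, toy_fitness)
    print(best, history[-1])
```

Because the elite masks are copied unchanged, the best fitness recorded in `history` can never decrease from one generation to the next, which is the property elitism is meant to guarantee.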


Breast and Colon Cancer Classification from Gene Expression Profiles Using Data Mining Techniques

10.20944/preprints202002.0324.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Mohamed Loey ◽

Mohammed Wajeeh Jasim ◽

Hazem M. EL-Bakry ◽

Mohamed Hamed N. Taha ◽

Nour Eldeen M. Khalifa

Keyword(s):

Gene Expression ◽

Classification Accuracy ◽

Information Gain ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Disease Diagnosis ◽

Performance Measure ◽

Support Vector ◽

Svm Classifier ◽

Cancer Type

Early detection of cancer increases the probability of recovery. This paper presents an intelligent decision support system (IDSS) for the early diagnosis of cancer based on gene expression profiles collected using DNA microarrays. Such datasets pose a challenge because of the small number of samples (no more than a few hundred) relative to the large number of genes (on the order of thousands). Therefore, a method of reducing the number of features (genes) that are not relevant to the disease of interest is necessary to avoid overfitting. The proposed methodology uses the information gain (IG) to select the most important features from the input patterns. Then, the selected features (genes) are reduced by applying the grey wolf optimization (GWO) algorithm. Finally, the methodology employs a support vector machine (SVM) classifier for cancer type classification. The proposed methodology was applied to two datasets (Breast and Colon) and was evaluated based on its classification accuracy, which is the most important performance measure in disease diagnosis. The experimental results indicate that the proposed methodology is able to enhance the stability of the classification accuracy as well as the feature selection.


Breast and Colon Cancer Classification from Gene Expression Profiles Using Data Mining Techniques

Symmetry ◽

10.3390/sym12030408 ◽

2020 ◽

Vol 12 (3) ◽

pp. 408 ◽

Cited By ~ 3

Author(s):

Mohamed Loey Ramadan AbdElNabi ◽

Mohammed Wajeeh Jasim ◽

Hazem M. EL-Bakry ◽

Mohamed Hamed N. Taha ◽

Nour Eldeen M. Khalifa

Keyword(s):

Gene Expression ◽

Classification Accuracy ◽

Information Gain ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Disease Diagnosis ◽

Performance Measure ◽

Support Vector ◽

Svm Classifier ◽

Cancer Type

Early detection of cancer increases the probability of recovery. This paper presents an intelligent decision support system (IDSS) for the early diagnosis of cancer based on gene expression profiles collected using DNA microarrays. Such datasets pose a challenge because of the small number of samples (no more than a few hundred) relative to the large number of genes (on the order of thousands). Therefore, a method of reducing the number of features (genes) that are not relevant to the disease of interest is necessary to avoid overfitting. The proposed methodology uses the information gain (IG) to select the most important features from the input patterns. Then, the selected features (genes) are reduced by applying the grey wolf optimization (GWO) algorithm. Finally, the methodology employs a support vector machine (SVM) classifier for cancer type classification. The proposed methodology was applied to two datasets (Breast and Colon) and was evaluated based on its classification accuracy, which is the most important performance measure in disease diagnosis. The experimental results indicate that the proposed methodology is able to enhance the stability of the classification accuracy as well as the feature selection.
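The information gain filter named in the abstract can be sketched as below. The sketch assumes expression values have already been discretised (for example into "high" and "low"), which the abstract does not describe; the gene names, labels, and function names are hypothetical, and the grey wolf optimization stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(class) - H(class | feature) for a discretised feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_information_gain(columns, labels, k):
    """Names of the k features with the highest information gain."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains, key=lambda name: (-gains[name], name))[:k]


if __name__ == "__main__":
    labels = ["tumor", "tumor", "normal", "normal"]
    columns = {
        "gene_a": ["high", "high", "low", "low"],  # perfectly informative
        "gene_b": ["high", "low", "high", "low"],  # uninformative
    }
    print(top_k_by_information_gain(columns, labels, k=1))
```

Only the genes that survive this filter would be passed to the subsequent optimization and SVM classification steps.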


Automatic COVID-19 Detection Using Exemplar Hybrid Deep Features with X-ray Images

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18158052 ◽

2021 ◽

Vol 18 (15) ◽

pp. 8052

Author(s):

Prabal Datta Barua ◽

Nadia Fareeda Muhammad Gowdh ◽

Kartini Rahmat ◽

Norlisah Ramli ◽

Wei Lin Ng ◽

...

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Classification Accuracy ◽

Support Vector ◽

Svm Classifier ◽

Feature Generation ◽

X Ray ◽

Deep Feature ◽

Deep Networks ◽

Fully Connected

COVID-19 and pneumonia detection using medical images is a topic of immense interest in medical and healthcare research. Various advanced medical imaging and machine learning techniques have been presented to detect these respiratory disorders accurately. In this work, we propose a novel COVID-19 detection system using an exemplar and hybrid fused deep feature generator with X-ray images. The proposed Exemplar COVID-19FclNet9 comprises three basic steps: exemplar deep feature generation, iterative feature selection and classification. The novelty of this work is the feature extraction using three pre-trained convolutional neural networks (CNNs) in the presented feature extraction phase. These pre-trained CNNs, AlexNet, VGG16 and VGG19, share the property of having three fully connected layers. The fully connected layers of these networks are used to generate deep features using an exemplar structure, yielding nine feature generation methods. The loss values of these feature extractors are computed, and the best three extractors are selected. The features of the top three extractors are merged. An iterative selector is used to select the most informative features. The chosen features are classified using a support vector machine (SVM) classifier. The proposed COVID-19FclNet9 applies nine deep feature extraction methods by using three deep networks together. Selection of the most appropriate deep feature generation model and iterative feature selection are employed to combine their advantages. With these techniques, the image classification ability of the three deep networks is improved. The presented model is developed using four X-ray image corpora (DB1, DB2, DB3 and DB4) with two, three and four classes. The proposed Exemplar COVID-19FclNet9 achieved classification accuracies of 97.60%, 89.96%, 98.84% and 99.64% using the SVM classifier with 10-fold cross-validation on the four datasets, respectively. Our Exemplar COVID-19FclNet9 model achieved high classification accuracy on all four databases and may be deployed for clinical application.


Weed Detection and Classification using ICA Based SVM Classifier

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c5410.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 1557-1560

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Learning Algorithm ◽

Feature Weighting ◽

Training Data ◽

Support Vector ◽

Svm Classifier ◽

Classification Problems ◽

Weed Detection ◽

Data Set

Support vector machine (SVM) is a well-known, efficient supervised learning algorithm for classification problems. However, the classification accuracy of the SVM classifier depends on its training parameters as well as the training data set. The main objective of this paper is to optimize its parameters and feature weighting simultaneously in order to improve the strength of the SVM. In this paper, the Imperialist Competitive Algorithm based Support Vector Machine (ICA-SVM) classifier is proposed for efficient weed detection and classification. This enhanced ICA-SVM classifier is able to select the appropriate input features and to optimize the parameters of the SVM, improving the classification accuracy. Experimental results show that the ICA-SVM classification algorithm reduces the computational complexity tremendously and improves classification accuracy.


A Transfer Learning Architecture Based on a Support Vector Machine for Histopathology Image Classification

Applied Sciences ◽

10.3390/app11146380 ◽

2021 ◽

Vol 11 (14) ◽

pp. 6380

Author(s):

Jiayi Fan ◽

JangHyeon Lee ◽

YongKeun Lee

Keyword(s):

Support Vector Machine ◽

Transfer Learning ◽

Classification Accuracy ◽

Support Vector ◽

Svm Classifier ◽

Breast Cancer Dataset ◽

Classification Problems ◽

Cancer Dataset ◽

Softmax Classifier ◽

Histopathology Images

Digital pathology has recently become an essential application for clinical practice and medical research. Due to the lack of large annotated datasets, the deep transfer learning technique is often used to classify histopathology images. A softmax classifier is often used to perform classification tasks. A Support Vector Machine (SVM) classifier is also widely employed, especially for binary classification problems. Accurately determining the category of histopathology images is vital for the diagnosis of diseases. In this paper, the conventional softmax classifier and the SVM classifier-based transfer learning approach are evaluated for classifying histopathology cancer images in a binary breast cancer dataset and a multiclass lung and colon cancer dataset. In order to achieve better classification accuracy, a methodology that attaches an SVM classifier to the fully-connected (FC) layer of the softmax-based transfer learning model is proposed. The proposed architecture involves a first step that trains the newly added FC layer on the target dataset using the softmax-based model and a second step that trains the SVM classifier with the newly trained FC layer. Cross-validation is used to avoid bias in evaluating the performance of the models. Experimental results reveal that the conventional SVM classifier-based model is the least accurate on both the binary and the multiclass cancer datasets. The conventional softmax-based model shows moderate classification accuracy, while the proposed synthetic architecture achieves the best classification accuracy.
