Input Feature Selection Method Based on Feature Set Equivalence and Mutual Information Gain Maximization

Feature Selection Method Based on Mutual Information and Support Vector Machine

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800142150021x ◽

2021 ◽

pp. 2150021

Author(s):

Gang Liu ◽

Chunlei Yang ◽

Sen Liu ◽

Chunbao Xiao ◽

Bin Song

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Mutual Information ◽

Classification Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

SVM Classifier ◽

Standard Data ◽

Feature Dimension

A feature selection method based on mutual information and a support vector machine (SVM) is proposed to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated using mutual information. The correlation reflects the information inclusion relationship between features, so features are evaluated and redundant features are eliminated by analyzing the correlation. Next, the concept of mean impact value (MIV) is defined, and the degree of influence of the input variables on the output variables of the SVM network is calculated based on MIV. The importance weights of the features, as described by MIV, are sorted in descending order. Finally, the SVM classifier is used to perform feature selection according to the classification accuracy of feature combinations, taking the MIV order of the features as a reference. Simulation experiments are carried out on three standard data sets from UCI, and the results show that this method not only effectively reduces the feature dimension and achieves high classification accuracy, but also ensures good robustness.
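The redundancy-elimination step described above can be illustrated with a minimal sketch. The abstract does not give the paper's definitions of local and overall correlation, so the code below assumes discrete-valued features, a plain plug-in estimate of mutual information, and a hypothetical greedy rule with an arbitrary threshold; it is not the authors' procedure.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def drop_redundant(features, threshold):
    """Hypothetical greedy filter (an assumption, not the paper's rule):
    keep a feature only if its MI with every already kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(mutual_information(values, features[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" duplicates "a", while "c" is statistically independent of "a".
features = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],
    "b": [0, 0, 1, 1, 0, 1, 0, 1],
    "c": [0, 1, 0, 1, 1, 0, 0, 1],
}
kept = drop_redundant(features, threshold=0.5)
print(kept)
```

In this toy case the duplicated feature shares one full bit with the original and is discarded, while the independent feature is retained. A real pipeline would then rank the surviving features (by MIV in the paper) and evaluate feature combinations with an SVM.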


A Robust Gene Selection Method for Microarray-based Cancer Classification

Cancer Informatics ◽

10.4137/cin.s3794 ◽

2010 ◽

Vol 9 ◽

pp. CIN.S3794 ◽

Cited By ~ 21

Author(s):

Xiaosheng Wang ◽

Osamu Gotoh

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Information Gain ◽

Expression Profiles ◽

Feature Selection Method ◽

Gene Expression Profiles ◽

Molecular Classification ◽

Selection Method ◽

Chi Square

Gene selection is of vital importance in the molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is crucial. We investigated the properties of a feature selection approach proposed in our previous work, which generalizes the feature selection method based on the depended degree of attribute in rough sets. We compared this method with established methods (the depended degree, chi-square, information gain, Relief-F, and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended-degree-of-attribute method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can expose the inherent classification difficulty of different gene expression datasets, indicating the inherent biology of specific cancers.


A Feature Selection Method Using a Fuzzy Mutual Information Measure

Advances in Soft Computing - Innovations in Hybrid Intelligent Systems ◽

10.1007/978-3-540-74972-1_9 ◽

2007 ◽

pp. 56-63 ◽

Cited By ~ 2

Author(s):

Javier Grande ◽

María del Rosario Suárez ◽

José Ramón Villar

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Feature Selection Method ◽

Selection Method ◽

Information Measure


Mini-Batch Normalized Mutual Information: A Hybrid Feature Selection Method

IEEE Access ◽

10.1109/access.2019.2936346 ◽

2019 ◽

Vol 7 ◽

pp. 116875-116885 ◽

Cited By ~ 4

Author(s):

G. S. Thejas ◽

Sajal Raj Joshi ◽

S. S. Iyengar ◽

N. R. Sunitha ◽

Prajwal Badrinath

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Feature Selection Method ◽

Selection Method ◽

Normalized Mutual Information


A hybrid feature selection method based on genetic algorithm and information gain

2016 5th International Conference on Computer Science and Network Technology (ICCSNT) ◽

10.1109/iccsnt.2016.8070172 ◽

2016 ◽

Cited By ~ 1

Author(s):

Fei He ◽

Huamin Yang ◽

Yu Miao ◽

Rainbow Louis

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method


Feature Selection Method Based on Maximum Conditional and Joint Mutual Information

2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC) ◽

10.1109/icivc47709.2019.8981340 ◽

2019 ◽

Author(s):

Jun Qian ◽

Yingchi Mao ◽

Jianghong Tang ◽

Longbao Wang

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Feature Selection Method ◽

Selection Method


Optimization of a Computer-Aided Detection Scheme Using a Logistic Regression Model and Information Gain Feature Selection Method

Global Journal of Breast Cancer Research ◽

10.14205/2309-4419.2013.01.01.1 ◽

2013 ◽

Author(s):

Zheng

Keyword(s):

Logistic Regression ◽

Feature Selection ◽

Regression Model ◽

Logistic Regression Model ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method ◽

Computer Aided Detection ◽

Detection Scheme ◽

Computer Aided


A New Feature Selection Method for Text Classification

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001407005466 ◽

2007 ◽

Vol 21 (02) ◽

pp. 423-438 ◽

Cited By ~ 9

Author(s):

Gulden Uchyigit ◽

Keith Clark

Keyword(s):

Feature Selection ◽

Text Classification ◽

Information Gain ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

Computational Time ◽

Small Subset ◽

Selection Methods ◽

New Feature

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major difficulty in text classification is the high dimensionality of the feature space. Only a small subset of the words are feature words that can be used to determine a document's class, while the rest add noise, which can make the results unreliable and significantly increase computational time. A common approach to this problem is feature selection, in which the number of words in the feature space is significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated in this study, including a new feature selection method called the GU metric. The other feature selection methods evaluated are: Chi-Squared (χ²) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.
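As an illustration of how one of the baseline metrics listed above reduces the feature space, the sketch below scores terms by Information Gain over binary term presence and keeps the top k. The GU metric itself is not defined in the abstract and is not reproduced here; the toy documents, labels, and helper names are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(term) = H(C) - [P(t) H(C | t present) + P(not t) H(C | t absent)]."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_terms(docs, labels, k):
    """Return the k vocabulary terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)[:k]


# Toy corpus (hypothetical): each document is a set of the words it contains.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "team"}, {"vote", "law"}]
labels = ["sport", "sport", "politics", "politics"]
selected = top_terms(docs, labels, k=2)
print(selected)
```

Here "goal" and "vote" each separate the two classes perfectly and so receive the highest score, whereas "team" appears equally in both classes and carries no information about the class.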
