A Feature Selection Method for Multi-Label Text Based on Feature Importance

Lu Zhang; Qingling Duan

doi:10.3390/app9040665

Applied Sciences, 2019, Vol 9 (4), pp. 665. doi:10.3390/app9040665. Cited by ~4.

Author(s): Lu Zhang, Qingling Duan

Keyword(s): Feature Selection, Feature Selection Method, Feature Space, Selection Method, Efficient Technology, Discrimination Ability, Assignment Method, Public Data, Feature Importance, Category Discrimination

Multi-label text classification assigns a text to multiple categories simultaneously, which corresponds to real-world texts that are associated with multiple topics. The feature space generated by text data is high-dimensional and sparse. Feature selection is an efficient technique that removes useless and redundant features, reduces the dimension of the feature space, and avoids the curse of dimensionality. This paper proposes a feature selection method for multi-label text based on feature importance. First, multi-label texts are transformed into single-label texts using the label assignment method. Second, the importance of each feature is calculated using a method based on Category Contribution (CC). Finally, the features with the highest importance are selected to construct the feature space. In the proposed method, feature importance is calculated from the perspective of the category, which ensures that the selected features have strong category discrimination ability. Specifically, the contribution of each feature to each category is calculated from two aspects, inter-category and intra-category, and the two are combined to obtain the feature's importance. The proposed method is tested on six public data sets, and the experimental results demonstrate its effectiveness.
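
The three steps in this abstract (label assignment, per-feature importance, top-k selection) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the CC formula, so `toy_importance` is a hypothetical stand-in that combines an intra-category share with an inter-category difference, and `all_label_assignment` assumes the simple strategy of copying a document once per label. All function names are invented for this sketch.

```python
from collections import defaultdict

def all_label_assignment(docs):
    """Copy each multi-label document once per label (one possible assignment strategy)."""
    return [(tokens, label) for tokens, labels in docs for label in labels]

def toy_importance(single_label_docs):
    """Illustrative score, NOT the paper's CC formula.

    intra: share of a category's documents that contain the term.
    inter: intra minus the mean share over the other categories.
    A term's importance is its largest intra * inter over all categories.
    """
    n_docs = defaultdict(int)                    # category -> document count
    df = defaultdict(lambda: defaultdict(int))   # term -> category -> document frequency
    for tokens, label in single_label_docs:
        n_docs[label] += 1
        for term in set(tokens):
            df[term][label] += 1
    cats = list(n_docs)
    scores = {}
    for term, per_cat in df.items():
        share = {c: per_cat.get(c, 0) / n_docs[c] for c in cats}
        best = 0.0
        for c in cats:
            others = [share[o] for o in cats if o != c]
            inter = share[c] - (sum(others) / len(others) if others else 0.0)
            best = max(best, share[c] * inter)
        scores[term] = best
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy corpus: (tokens, labels) pairs, made up for this example.
docs = [
    (["goal", "match"], ["sport"]),
    (["vote", "match"], ["politics", "sport"]),
    (["vote", "law"], ["politics"]),
]
single = all_label_assignment(docs)
scores = toy_importance(single)
selected = select_top_k(scores, 2)
```

Scoring per category and then taking the maximum mirrors the abstract's idea that importance is judged from the perspective of the category; the actual combination rule used in the paper may differ.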

A New Feature Selection Method for Text Classification Based on Independent Feature Space Search

Mathematical Problems in Engineering, 2020, Vol 2020, pp. 1-14. doi:10.1155/2020/6076272. Cited by ~3.

Author(s): Yong Liu, Shenggen Ju, Junfeng Wang, Chong Su

Keyword(s): Feature Selection, Text Classification, Predictive Accuracy, Feature Selection Method, Feature Space, Selection Method, The Other, Feature Subset, Search Range, Text Documents

Feature selection methods select representative feature subsets from the original feature set using different evaluations of feature relevance, with the aim of reducing the dimension of the features while maintaining the predictive accuracy of a classifier. In this study, we propose a feature selection method for text classification based on independent feature space search. First, a relative document-term frequency difference (RDTFD) method is proposed to divide the features in all text documents into two independent feature sets according to the features' ability to discriminate between positive and negative samples. This division has two important functions: one is to increase the correlation between features and classes while reducing the correlation among features; the other is to reduce the search range of the feature space while maintaining appropriate feature redundancy. Second, a feature search strategy is used to search for the optimal feature subset in the independent feature space, which can improve the performance of text classification. Finally, we evaluate the method in several experiments conducted on six benchmark corpora; the experimental results show that the RDTFD method based on independent feature space search is more robust than the other feature selection methods.

A NEW FEATURE SELECTION METHOD FOR TEXT CLASSIFICATION

International Journal of Pattern Recognition and Artificial Intelligence, 2007, Vol 21 (02), pp. 423-438. doi:10.1142/s0218001407005466. Cited by ~9.

Author(s): GULDEN UCHYIGIT, KEITH CLARK

Keyword(s): Feature Selection, Text Classification, Information Gain, Feature Selection Method, Feature Space, Selection Method, Computational Time, Small Subset, Selection Methods, New Feature

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major difficulty in text classification is the high dimensionality of the feature space. Only a small subset of the words are feature words that can be used to determine a document's class, while the rest add noise, can make the results unreliable, and significantly increase computational time. A common approach to this problem is feature selection, in which the number of words in the feature space is significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated in this study, including a new feature selection method called the GU metric. The other feature selection methods evaluated are: Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.
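
As a concrete instance of one of the baseline metrics listed above, the Chi-Squared statistic for a term and a class can be computed from a 2x2 table of document counts. The sketch below uses the standard textbook form of the statistic; the GU metric itself is not defined in the abstract and is not reproduced here. The function name and the example counts are illustrative assumptions.

```python
def chi_squared(a, b, c, d):
    """Chi-squared association between a term and a class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero margin means the term or the class never (or always) occurs.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom

# A term that appears in every in-class document and in no other document.
perfect = chi_squared(10, 0, 0, 10)
# A term that is spread evenly across both groups.
independent = chi_squared(5, 5, 5, 5)
```

Terms are then ranked by this score per class, and only the highest-ranked terms are kept in the feature space.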

Performance Analysis of Classification Techniques With Feature Selection Method for Prediction of Chronic Kidney Disease

Innovations in Digital Branding and Content Marketing - Advances in Marketing, Customer Relationship Management, and E-Services, 2021, pp. 220-244. doi:10.4018/978-1-7998-4420-4.ch010.

Author(s): Noopur Goel

Keyword(s): Chronic Kidney Disease, Feature Selection, Kidney Disease, Feature Selection Method, Selection Method, Target Feature, Feature Importance, The World, Independent Features

Chronic kidney disease has become a very prevalent problem worldwide: almost 10% of the population suffers from it, and millions of people die from it every year. Researchers around the world have applied numerous machine learning and data mining techniques to diagnose chronic kidney disease, so that patients can receive proper healthcare follow-up. In this chapter, Experiment 1 applies five different classifiers to the original chronic kidney disease dataset. In Experiment 2, feature selection using the feature importance method is used to reduce the dataset, yielding a subset of 15 independent features and one target feature, 'class'. The same steps are then repeated on the reduced dataset. Comparing the results of Experiments 1 and 2 shows that the accuracy of the classifiers with feature selection is far better than without feature selection.

A novel hybrid feature selection method based on dynamic feature importance

Applied Soft Computing, 2020, Vol 93, pp. 106337. doi:10.1016/j.asoc.2020.106337. Cited by ~3.

Author(s): Guangfen Wei, Jie Zhao, Yanli Feng, Aixiang He, Jun Yu

Keyword(s): Feature Selection, Feature Selection Method, Selection Method, Dynamic Feature, Feature Importance

Unsupervised Optimal Discriminant Vector Based Feature Selection Method

Mathematical Problems in Engineering, 2013, Vol 2013, pp. 1-7. doi:10.1155/2013/396780. Cited by ~1.

Author(s): Su-Qun Cao, Jonathan H. Manton

Keyword(s): Feature Selection, Fault Diagnosis, Objective Function, Feature Selection Method, Selection Method, Feature Subset, Fisher Criterion, Feature Importance, Discriminant Vector, Class Labels

An efficient unsupervised feature selection method based on the unsupervised optimal discriminant vector is developed to find important features without using class labels. Features are ranked according to a feature importance measurement based on the unsupervised optimal discriminant vector, in the following steps. First, the fuzzy Fisher criterion is adopted as the objective function to derive the optimal discriminant vector in an unsupervised manner. Second, a feature importance measurement based on the elements of the unsupervised optimal discriminant vector is defined to determine the importance of each feature. Features with a low importance measurement are removed from the feature subset. Experiments on UCI data and fault diagnosis are carried out to show that the proposed method is very efficient and able to deliver reliable results.

Spam Review Classification Using Ensemble of Global and Local Feature Selectors

Cybernetics and Information Technologies, 2018, Vol 18 (4), pp. 29-42. doi:10.2478/cait-2018-0046. Cited by ~1.

Author(s): Gunjan Ansari, Tanvir Ahmad, Mohammad Najmud Doja

Keyword(s): Feature Selection, Performance Metrics, Feature Selection Method, Feature Space, Selection Method, Experimental Results, High Dimensionality, Spam Detection, Feature Selector, Global And Local

In our work, we propose an ensemble of local and global filter-based feature selection methods to reduce the high dimensionality of the feature space and increase the accuracy of spam review classification. The selected features are then used to train various classifiers for spam detection. Experimental results with four classifiers on two available datasets of hotel reviews show that the proposed feature selector improves the performance of spam classification in terms of well-known performance metrics such as AUC score.

Improvement of feature selection method in spam filtering

Journal of Computer Applications, 2009, Vol 29 (10), pp. 2812-2815. doi:10.3724/sp.j.1087.2009.02812.

Author(s): Yang-zhu LU, Xin-you ZHANG, Yu QI

Keyword(s): Feature Selection, Feature Selection Method, Selection Method, Spam Filtering

Feature Selection for Histopathological Image Classification using levy Flight Salp Swarm Optimizer

Recent Patents on Computer Science, 2019, Vol 12 (4), pp. 329-337. doi:10.2174/2213275912666181210165129. Cited by ~2.

Author(s): Venubabu Rachapudi, Golagani Lavanya Devi

Keyword(s): Feature Selection, Image Classification, Feature Selection Method, Selection Method, Lévy Flight, Levy Flight, Local Optima, Histopathological Image, Surf Features, Histopathological Image Classification

Background: An efficient feature selection method plays an important role in histopathological image classification by eliminating irrelevant and redundant features. This paper therefore proposes a new feature selection method based on a Levy flight salp swarm optimizer. Methods: The proposed method applies Levy flight steps to each follower salp to move it away from local optima. The best solution returns the relevant and non-redundant features, which are fed to different classifiers for efficient and robust image classification. Results: The efficiency of the proposed Levy flight salp swarm optimizer has been verified on 20 benchmark functions, where it outperforms the other meta-heuristic approaches considered. Furthermore, the proposed feature selection method achieves a greater reduction in SURF features than the other methods considered and performs well for histopathological image classification. Conclusion: This paper proposes an efficient Levy flight salp swarm optimizer obtained by modifying the step size of the follower salps. The modification reduces the chance of getting stuck in local optima. The optimizer has also been used to select optimum features from SURF features for histopathological image classification. The simulation results indicate that the proposed method provides optimal values and high classification performance in comparison with the other methods.
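
For readers unfamiliar with Levy flight steps, one widely used way to draw them is Mantegna's algorithm, sketched below. The abstract does not say which generator the authors used or how the step enters the follower update, so the choice of Mantegna's algorithm, the default `beta=1.5`, and the function names are assumptions made for illustration.

```python
import math
import random

def mantegna_sigma(beta=1.5):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step: u / |v|^(1/beta) with Gaussian u and v."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:
        # Resample the (practically impossible) exact zero to avoid division by zero.
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

rng = random.Random(0)   # fixed seed so the run is repeatable
steps = [levy_step(rng) for _ in range(1000)]
sigma = mantegna_sigma(1.5)
```

Most steps drawn this way are small, but occasional very large ones occur, which is the property that helps a search agent jump out of a local optimum.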

The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science, 2016, Vol 8, pp. 5. doi:10.22385/jctecs.v8i0.96. Cited by ~1.

Author(s): Fatemeh Alighardashi, Mohammad Ali Zare Chahooki

Keyword(s): Feature Selection, Feature Selection Method, Selection Method, Machine Learning Algorithms, Fault Prediction, Filter Method, Selection Methods, Software Projects, Software Fault Prediction, Software Fault

Improving software product quality through periodic testing before release is one of the most expensive activities in software projects. Because resources for testing modules are limited, it is important to identify fault-prone modules and concentrate the testing resources on them. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules. Extensive studies are being done in this field to find the connection between the features of software modules and their fault-proneness. Some of the features used by predictive algorithms are ineffective and reduce the accuracy of the prediction process, so feature selection methods are widely used to increase the performance of prediction models for fault-prone modules. In this study, we propose a feature selection method that selects features effectively by combining several filter feature selection methods; the combination is presented as the fused weighted filter method. The proposed method improves both the convergence rate of feature selection and the prediction accuracy. The results obtained on ten datasets from NASA and PROMISE indicate the effectiveness of the proposed method in improving the accuracy and convergence of software fault prediction.
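
One simple way to combine several filter methods into a single weighted ranking is sketched below. The abstract does not specify how the filters are fused or how the weights are chosen, so the min-max normalization, the weighted sum, the filter names, the feature names, and the weights here are all hypothetical choices made for illustration, not the authors' procedure.

```python
def min_max(scores):
    """Rescale one filter's scores to [0, 1] so that different filters are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {f: (s - lo) / span if span else 0.0 for f, s in scores.items()}

def fuse(filter_scores, weights):
    """Weighted sum of normalized scores across filters (an assumed fusion rule)."""
    fused = {}
    for name, scores in filter_scores.items():
        for feat, s in min_max(scores).items():
            fused[feat] = fused.get(feat, 0.0) + weights[name] * s
    return fused

# Made-up scores from two hypothetical filters for three software metrics.
filter_scores = {
    "filter_a": {"loc": 8.0, "complexity": 2.0, "fan_out": 5.0},
    "filter_b": {"loc": 1.0, "complexity": 3.0, "fan_out": 2.0},
}
weights = {"filter_a": 0.75, "filter_b": 0.25}

fused = fuse(filter_scores, weights)
ranking = sorted(fused, key=lambda f: (-fused[f], f))
```

Normalizing before the weighted sum keeps a filter with a large numeric range from dominating the fused ranking.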

A lazy feature selection method for multi-label classification

Intelligent Data Analysis, 2021, Vol 25 (1), pp. 21-34. doi:10.3233/ida-194878.

Author(s): Rafael B. Pereira, Alexandre Plastino, Bianca Zadrozny, Luiz H.C. Merschmann

Keyword(s): Feature Selection, Text Categorization, Feature Selection Method, Selection Method, Video Classification, Classification Problems, Class Label, New Feature, Feature Selection Techniques, Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific for the multi-label context. Experimental results show that the proposed technique is competitive when compared to multi-label feature selection techniques currently used in the literature, and is clearly more scalable, in a scenario where there is an increasing amount of data.
