Partial Classifier Chains with Feature Selection by Exploiting Label Correlation in Multi-Label Classification

Zhenwu Wang; Tielin Wang; Benting Wan; Mengjie Han

doi:10.3390/e22101143

Multi-Label Feature Selection Based on High-Order Label Correlation Assumption

Entropy, 2020, Vol 22 (7), pp. 797, doi:10.3390/e22070797

Author(s): Ping Zhang, Wanfu Gao, Juncheng Hu, Yonghao Li

Keyword(s): Feature Selection, State Of The Art, Classification Performance, High Order, Selection Methods, Label Data, Cumulative Summation, Label Correlations, Classification Information, Correlation Term

Multi-label data often involve high-dimensional features and complicated label correlations, which makes multi-label learning challenging. Feature selection plays an important role in handling such data, and exploring label correlations is crucial for multi-label feature selection. Previous information-theoretic methods evaluate candidate features with a cumulative summation approximation, which considers only low-order label correlations. In fact, high-order label correlations exist in the label set: labels naturally cluster into several groups, with similar labels tending to fall into the same group and dissimilar labels into different groups. However, the cumulative summation approximation tends to select features related to the groups containing more labels while ignoring the classification information of groups containing fewer labels. As a result, many features related to similar labels are selected, which leads to poor classification performance. To address this, a Max-Correlation term that considers high-order label correlations is proposed. Additionally, we combine the Max-Correlation term with a feature redundancy term to ensure that the selected features are relevant to different label groups. Finally, a new method named Multi-label Feature Selection considering Max-Correlation (MCMFS) is proposed. Experimental results demonstrate the classification superiority of MCMFS in comparison to eight state-of-the-art multi-label feature selection methods.
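
The abstract does not give the MCMFS scoring formula, so the following is only a rough sketch of the general idea it describes: rank a candidate feature by its strongest correlation with any label (instead of a sum over all labels) and subtract its average redundancy with the features already chosen. The score used here, the function names `mutual_info` and `select_features`, and the toy data are all assumptions made for illustration and should not be read as the authors' criterion.

```python
from collections import Counter
from math import log2


def mutual_info(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def score(name, features, labels, selected):
    """Assumed score: max relevance to any label minus mean redundancy."""
    relevance = max(mutual_info(features[name], lab) for lab in labels.values())
    if not selected:
        return relevance
    redundancy = sum(mutual_info(features[name], features[s])
                     for s in selected) / len(selected)
    return relevance - redundancy


def select_features(features, labels, k):
    """Greedy forward selection of k feature names using the assumed score."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(f, features, labels, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): f1 and f2 both track label A, f3 tracks label B.
labels = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1]}
features = {"f1": [0, 0, 1, 1], "f2": [0, 0, 1, 1], "f3": [0, 1, 0, 1]}
print(select_features(features, labels, 2))
```

In this toy case the redundancy term steers the second pick away from `f2`, which duplicates `f1`, and toward `f3`, the only feature that carries information about label B.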

MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble

ACM Transactions on Knowledge Discovery from Data, 2021, Vol 16 (1), pp. 1-24, doi:10.1145/3451392

Author(s): Yaojin Lin, Qinghua Hu, Jinghua Liu, Xingquan Zhu, Xindong Wu

Keyword(s): Empirical Studies, Feature Space, Training Data, Data Sets, Learning Framework, Feature Spaces, Public Data, Margin Distribution, Label Correlations, Label Correlation

In multi-label learning, label correlations commonly exist in the data. Such correlations not only provide useful information but also impose significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the feature embedding space is created from a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. After that, MULFE uses the label correlation to optimize the margin distribution of the base classifiers that are induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the goal of maximum-margin multi-label classification through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.

Protein Remote Homology Detection Based on an Ensemble Learning Approach

BioMed Research International, 2016, Vol 2016, pp. 1-11, doi:10.1155/2016/5813645

Cited By: 3

Author(s): Junjie Chen, Bingquan Liu, Dong Huang

Keyword(s): State Of The Art, Predictive Performance, Ensemble Classifier, Homology Detection, Weighted Voting, Remote Homology, Sequence Composition, Feature Spaces, Voting Strategy, Remote Homology Detection

Protein remote homology detection is one of the central problems in bioinformatics. Although some computational methods have been proposed, the problem is still far from being solved. In this paper, an ensemble classifier for protein remote homology detection, called SVM-Ensemble, is proposed with a weighted voting strategy. SVM-Ensemble combines three basic classifiers based on different feature spaces: Kmer, ACC, and SC-PseAAC. These features consider the characteristics of proteins from various perspectives, incorporating both the sequence composition and the sequence-order information along the protein sequences. Experimental results on a widely used benchmark dataset show that the proposed SVM-Ensemble clearly improves the predictive performance for protein remote homology detection. Moreover, it achieves the best performance and outperforms other state-of-the-art methods.
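
The abstract does not say how the three base classifiers are weighted, so the following is only a minimal sketch of a weighted vote over signed decision scores. The function name `weighted_vote`, the example scores, and the weights are hypothetical and are not values from the paper.

```python
from typing import Sequence


def weighted_vote(scores: Sequence[float],
                  weights: Sequence[float],
                  threshold: float = 0.0) -> int:
    """Return +1 or -1 from the weighted sum of base-classifier scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(w * s for w, s in zip(weights, scores))
    return 1 if total > threshold else -1


# Hypothetical decision scores from classifiers built on the Kmer, ACC and
# SC-PseAAC feature spaces, and hypothetical weights for each of them.
scores = [0.8, -0.3, -0.2]
weights = [0.5, 0.3, 0.2]
print(weighted_vote(scores, weights))
```

With these made-up numbers the first classifier's confident positive score outweighs the two weak negative scores, so the ensemble predicts the positive class.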

An Effective Multi-Label Feature Selection Model Towards Eliminating Noisy Features

Applied Sciences, 2020, Vol 10 (22), pp. 8093, doi:10.3390/app10228093

Author(s): Jun Wang, Yuanyuan Xu, Hengpeng Xu, Zhe Sun, Zhenglu Yang, ...

Keyword(s): Machine Learning, Feature Selection, Learning Performance, Space Structures, Learning Tasks, Feature Spaces, Selection Approach, Label Correlations, Feature Selection Approach, Low Dimensional

Considerable effort has consistently been devoted to feature selection as a means of dimension reduction for various machine learning tasks. Existing feature selection models focus on selecting the most discriminative features for learning targets. However, this strategy is weak in handling two kinds of features, namely the irrelevant and the redundant ones, which are collectively referred to as noisy features. These features may hamper the construction of optimal low-dimensional subspaces and compromise the learning performance of downstream tasks. In this study, we propose a novel multi-label feature selection approach that embeds label correlations (dubbed ELC) to address these issues. In particular, we extract label correlations to obtain reliable label space structures and employ them to steer feature selection. In this way, the label and feature spaces can be expected to be consistent, and noisy features can be effectively eliminated. An extensive experimental evaluation on public benchmarks validates the superiority of ELC.

Genetic Programming based Feature Manipulation for Skin Cancer Image Classification

2021, doi:10.26686/wgtn.17151719.v1

Author(s): Qurrat Ul Ain

Keyword(s): Feature Extraction, Feature Selection, Image Classification, State Of The Art, Classification Performance, Image Features, Feature Construction, Classification Methods, Wrapper Approach, High Level

Skin image classification involves the development of computational methods for solving problems such as cancer detection in lesion images, and their use for biomedical research and clinical care. Such methods aim at extracting relevant information or knowledge from skin images that can significantly assist in the early detection of disease. Skin images are very large and come with various artifacts that hinder effective feature extraction, leading to inaccurate classification. Feature selection and feature construction can significantly reduce the amount of data while improving classification performance by selecting prominent features and constructing high-level features. Existing approaches mostly rely on expert intervention and follow multiple stages for pre-processing, feature extraction, and classification, which decreases reliability and increases computational complexity. Good generalization accuracy is not always the primary objective: clinicians are also interested in analyzing specific features such as pigment network, streaks, and blobs responsible for developing the disease, so interpretable methods are favored. In Evolutionary Computation, Genetic Programming (GP) can automatically evolve an interpretable model and address the curse of dimensionality (through feature selection and construction). GP has been successfully applied to many areas, but its potential for feature selection, feature construction, and classification in skin images has not been thoroughly investigated. The overall goal of this thesis is to develop a new GP approach to skin image classification by using GP to evolve programs that are capable of automatically selecting prominent image features, constructing new high-level features, and interpreting useful image features that can help dermatologists diagnose a type of cancer, and that are robust when processing skin images captured from specialized instruments and standard cameras.
This thesis focuses on using a wide range of texture, color, frequency-based, local, and global image properties at the terminal nodes of GP to classify skin cancer images from multiple modalities effectively. This thesis develops new two-stage GP methods using embedded and wrapper feature selection and construction approaches to automatically generate a feature vector of selected and constructed features for classification. The results show that the wrapper approach outperforms the embedded approach, the existing baseline GP, and other machine learning methods, but the embedded approach is faster than the wrapper approach. This thesis develops a multi-tree GP-based embedded feature selection approach for melanoma detection using domain-specific and domain-independent features. It explores suitable crossover and mutation operators to evolve GP classifiers effectively and further extends this approach using a weighted fitness function. The results show that these multi-tree approaches outperform single-tree GP and other classification methods. They identify that a specific feature extraction method extracts the most suitable features for particular images taken from a specific optical instrument. This thesis develops the first GP method using frequency-based wavelet features, where the wrapper-based feature selection and construction methods automatically evolve useful constructed features to improve the classification performance. The results provide evidence of successful feature construction, significantly outperforming existing GP approaches, a state-of-the-art CNN, and other classification methods. This thesis develops a GP approach to multiple feature construction for ensemble learning in classification. The results show that the ensemble method outperforms existing GP approaches, state-of-the-art skin image classification methods, and commonly used ensemble methods.
Further analysis of the evolved constructed features identified important image features that can potentially help the dermatologist identify further medical procedures in real-world situations.

Genetic Algorithm Based Feature Selection In a Recognition Scheme Using Adaptive Neuro Fuzzy Techniques

International Journal of Computers Communications & Control, 2010, Vol 5 (4), pp. 458, doi:10.15837/ijccc.2010.4.2495

Cited By: 4

Author(s): Mahua Bhattacharya, Arpita Das

Keyword(s): Genetic Algorithm, Feature Selection, Feature Space, Classification Performance, Significant Feature, Feature Subset, Neuro Fuzzy, Feature Spaces, Fuzzy Techniques

The problem of feature selection consists of finding a significant feature subset of the input training and test patterns that can describe all the information required to classify a particular pattern. In the present paper we focus on this particular problem, which plays a key role in machine learning. In fact, before building a model for feature selection, our goal is to identify and reject the features that degrade the classification performance of a classifier. This is especially true when the available input feature space is very large and an efficient search algorithm is needed to combine these feature spaces into a few significant ones that are capable of representing a particular class. Here, the authors describe two approaches for reducing the large feature spaces to an efficient number of features, using Genetic Algorithm and Fuzzy Clustering techniques. Finally, the classification of patterns is achieved using adaptive neuro-fuzzy techniques. The aim of the entire work is to implement a recognition scheme for the classification of tumor lesions that appear in the human brain as space-occupying lesions identified in CT and MR images. A part of the work is presented in this paper. The proposed model indicates a promising direction for adaptation in a changing environment.

Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients

BMC Cancer, 2019, Vol 19 (1), doi:10.1186/s12885-019-6338-1

Cited By: 2

Author(s): Johannes Smolander, Alexey Stupnikov, Galina Glazko, Matthias Dehmer, Frank Emmert-Streib

Keyword(s): Lung Cancer, Feature Selection, State Of The Art, Classification Performance, Belief Networks, Classification Methods, Deep Belief Networks, Protein Coding, Non Coding RNAs

Background: Deciphering the meaning of human DNA is an outstanding goal that would revolutionize medicine and the way we treat diseases. In recent years, non-coding RNAs have attracted much attention and have been shown to be functional in part. Yet the importance of these RNAs, especially for higher biological functions, remains under investigation. Methods: In this paper, we analyze RNA-seq data, including non-coding and protein-coding RNAs, from patients with lung adenocarcinoma, a histologic subtype of non-small-cell lung cancer, with deep learning neural networks and other state-of-the-art classification methods. The purpose of our paper is three-fold. First, we compare the classification performance of different versions of deep belief networks with SVMs, decision trees, and random forests. Second, we compare the classification capabilities of protein-coding and non-coding RNAs. Third, we study the influence of feature selection on the classification performance. Results: First, we find that deep belief networks perform at least competitively with other state-of-the-art classifiers. Second, data from non-coding RNAs perform better than coding RNAs across a number of different classification methods. This demonstrates the equivalence of predictive information as captured by non-coding RNAs compared to protein-coding RNAs, which are conventionally used in computational diagnostics tasks. Third, we find that feature selection has in general a negative effect on the classification performance, which means that unfiltered data with all features give the best classification results. Conclusions: Our study is the first to use ncRNAs beyond miRNAs for the computational classification of cancer and to perform a direct comparison of the classification capabilities of protein-coding RNAs and non-coding RNAs.

Feature selection of the armature winding broken coils in synchronous motor using genetic algorithm and mahalanobis distance

Archives of Metallurgy and Materials, 2012, Vol 57 (3), pp. 829-835, doi:10.2478/v10172-012-0091-7

Cited By: 1

Author(s): Z. Głowacz, J. Kozik

Keyword(s): Genetic Algorithm, Feature Selection, Mahalanobis Distance, Distance Measure, Synchronous Motor, Medical Diagnostics, Motor Current, Feature Spaces, Multidimensional Feature Spaces, Selection Of

The paper describes a procedure for the automatic selection of symptoms accompanying a break in the armature winding coils of a synchronous motor. This procedure, called feature selection, chooses from the full set of features describing the problem a subset that best distinguishes between the healthy and damaged states. The amplitudes of the spectral components of the motor current signals were used as features. The full spectra of the current signals are treated as multidimensional feature spaces, and their subspaces are tested. Particular subspaces are chosen with the aid of a genetic algorithm, and their quality is tested using the Mahalanobis distance measure. The algorithm searches for the subspaces for which this distance is greatest. The algorithm is very efficient and, as confirmed by research, leads to good results. The proposed technique is successfully applied in many other fields of science and technology, including medical diagnostics.
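
As a rough illustration of the fitness measure described above, the sketch below computes the squared Mahalanobis distance between the healthy and damaged class means on a chosen feature subset, using a pooled covariance estimate. Note two deliberate simplifications: the genetic algorithm is replaced by an exhaustive search over subsets of a fixed size, and the data are made-up numbers rather than motor current spectra. The function names are hypothetical as well.

```python
from itertools import combinations


def mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov(a, b):
    """Pooled within-class covariance matrix of two samples."""
    d = len(a[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows in (a, b):
        m = mean(rows)
        for r in rows:
            for i in range(d):
                for j in range(d):
                    cov[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    dof = len(a) + len(b) - 2
    return [[v / dof for v in row] for row in cov]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(a, b, subset):
    """Squared Mahalanobis distance between class means on a feature subset."""
    sa = [[r[i] for i in subset] for r in a]
    sb = [[r[i] for i in subset] for r in b]
    diff = [x - y for x, y in zip(mean(sa), mean(sb))]
    return sum(d * z for d, z in zip(diff, solve(pooled_cov(sa, sb), diff)))


def best_subset(a, b, k):
    """Exhaustive stand-in for the GA: the size-k subset with largest distance."""
    return max(combinations(range(len(a[0])), k),
               key=lambda s: mahalanobis_sq(a, b, s))


# Made-up samples: only feature 0 separates the two states.
healthy = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.5], [0.8, 6.0, 1.5], [1.0, 5.5, 2.2]]
damaged = [[3.0, 5.2, 2.1], [3.2, 4.1, 2.4], [2.8, 5.9, 1.6], [3.0, 5.4, 2.3]]
print(best_subset(healthy, damaged, 1))
```

Exhaustive search is only practical for a handful of features; for full current spectra the number of subsets grows combinatorially, which is the situation in which a heuristic search such as a genetic algorithm becomes attractive.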

An Optimized Approach for Breast Cancer Classification for Histopathological Images Based on Hybrid Feature Set

Current Medical Imaging (formerly Current Medical Imaging Reviews), 2020, Vol 16, doi:10.2174/1573405616666200423085826

Cited By: 1

Author(s): Inzamam Mashood Nasir, Muhammad Rashid, Jamal Hussain Shah, Muhammad Sharif, Muhammad Yahiya Haider Awan, ...

Keyword(s): Breast Cancer, Cancer Detection, State Of The Art, Hybrid Approach, Classification Performance, Diagnose Breast Cancer, Histopathological Images, And Performance, Learned Features, Intelligent Healthcare

Background: Breast cancer is considered the most dangerous disease among women worldwide, and the ratio of new cases is increasing yearly. Many researchers have proposed efficient algorithms to diagnose breast cancer at early stages, which have increased efficiency and performance by using the learned features of gold-standard histopathological images. Objective: Most of these systems have used either traditional handcrafted features or deep features, which contain a lot of noise and redundancy and ultimately decrease the performance of the system. Methods: A hybrid approach is proposed that fuses and optimizes the properties of handcrafted and deep features to classify breast cancer images. HOG and LBP features are serially fused with the pretrained models VGG19 and InceptionV3. PCR and ICR are used to evaluate the classification performance of the proposed method. Results: The method concentrates on histopathological images to classify breast cancer. The performance is compared with state-of-the-art techniques, and an overall patient-level accuracy of 97.2% and an image-level accuracy of 96.7% are recorded. Conclusion: The proposed hybrid method achieves the best performance compared to previous methods, and it can be used for intelligent healthcare systems and early breast cancer detection.
