Combining Multiple Feature-Ranking Techniques and Clustering of Variables for Feature Selection

Genetic Programming for Biomarker Detection in Classification of Mass Spectrometry Data

10.26686/wgtn.17013680 ◽

2021 ◽

Author(s):

Soha Ahmed

Keyword(s):

Mass Spectrometry ◽

Feature Selection ◽

Classification Performance ◽

Feature Ranking ◽

Feature Construction ◽

Biomarker Detection ◽

Multi Objective ◽

Multiple Feature ◽

High Level

Mass spectrometry (MS) is currently the most commonly used technology in biochemical research for proteomic analysis. The primary goal of proteomic profiling using mass spectrometry is the classification of samples from different experimental states. Classifying MS samples requires identifying the proteins or peptides that are expressed differently between the classes (biomarker detection). However, due to the high dimensionality of the data and the small number of samples, classification of MS data is extremely challenging. Another important aspect of biomarker detection is verification of the detected biomarkers, which acts as an intermediate step before the biomarkers are passed to the experimental validation stage. Biomarker detection aims to alter the input space of the learning algorithm to improve classification of proteomic or metabolomic data. This task is performed through feature manipulation, which consists of three aspects: feature ranking, feature selection, and feature construction. Genetic programming (GP) is an evolutionary computation algorithm that is intrinsically capable of all three aspects of feature manipulation. The ability of GP for feature manipulation in proteomic biomarker discovery has not been fully investigated. This thesis, therefore, proposes an embedded methodology for these three aspects of feature manipulation in high-dimensional MS data using GP. The thesis also presents a GP-based method for biomarker verification, and investigates the use of GP for both single-objective and multi-objective feature selection and construction. In feature ranking, the thesis proposes a GP-based method for ranking subsets of features by using GP as an ensemble approach. The proposed algorithm uses GP's capability to combine the advantages of different feature ranking metrics and evolve a new ranking scheme for the subset of features selected from the top-ranked features.
The capability of GP as a classifier is also investigated by this method. The results show that GP can select a smaller number of features and provide a better ranking of the selected features, which can improve the classification performance of five classifiers. In feature construction, the thesis proposes a novel multiple feature construction method, which uses a single GP tree to generate a new set of high-level features from the original set of selected features. The results show that the proposed algorithm outperforms two feature selection algorithms. In feature selection, the thesis introduces the first GP multi-objective method for biomarker detection, which simultaneously increases the classification accuracy and reduces the number of detected features. The proposed multi-objective method can obtain better subsets of features than the single-objective algorithm and two traditional multi-objective approaches for feature selection. The thesis also develops the first multi-objective multiple feature construction algorithm for MS data. The proposed method aims at both maximising the classification performance and minimising the cardinality of the constructed high-level features. The results show that GP can discover the complex relationships between the features, significantly improve classification performance, and reduce the cardinality. For biomarker verification, the thesis proposes the first GP biomarker verification method, based on measuring peptide detectability. The method solves the class imbalance problem in the data and shows improvement over the benchmark algorithms. The algorithm also outperforms a well-known peptide detection method. Finally, the thesis introduces a new GP method for alignment of MS data as a preprocessing stage, which will further help in improving the biomarker detection process.
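
The two objectives named in the abstract above (higher classification accuracy, fewer features) can be compared through Pareto dominance. The following is a minimal sketch of that comparison only, not the thesis's GP method; the `Candidate` type, the function names, and the accuracy and feature-count values are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate: (classification accuracy, number of selected features).
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and better on one.

    Accuracy is maximised, the number of features is minimised.
    """
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up feature subsets, for illustration only.
candidates = [(0.90, 12), (0.88, 5), (0.90, 20), (0.80, 5), (0.93, 30)]
front = pareto_front(candidates)
print(front)
```

A multi-objective search returns the whole front rather than a single subset, so the trade-off between accuracy and subset size is left to the user.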

Ranked MSD: A New Feature Ranking and Feature Selection Approach for Biomarker Identification

Lecture Notes in Computer Science - Machine Learning and Knowledge Extraction ◽

10.1007/978-3-030-29726-8_10 ◽

2019 ◽

pp. 147-167

Author(s):

Ghanshyam Verma ◽

Alokkumar Jha ◽

Dietrich Rebholz-Schuhmann ◽

Michael G. Madden

Keyword(s):

Feature Selection ◽

Feature Ranking ◽

Biomarker Identification ◽

Selection Approach ◽

New Feature ◽

Feature Selection Approach

PET Imaging of Tau Pathology and Amyloid-β, and MRI for Alzheimer’s Disease Feature Fusion and Multimodal Classification

Journal of Alzheimer’s Disease ◽

10.3233/jad-210064 ◽

2021 ◽

pp. 1-18

Author(s):

Mehdi Shojaie ◽

Solale Tabarestani ◽

Mercedes Cabrerizo ◽

Steven T. DeKosky ◽

David E. Vaillancourt ◽

...

Keyword(s):

Alzheimer’s Disease ◽

Feature Selection ◽

Mutual Information ◽

Feature Fusion ◽

Early Stage ◽

Model Performance ◽

Amyloid-β ◽

Feature Ranking ◽

Promising Tool

Background: Machine learning is a promising tool for biomarker-based diagnosis of Alzheimer’s disease (AD). Performing multimodal feature selection and studying the interaction between biological and clinical AD can help to improve the performance of the diagnosis models. Objective: This study aims to formulate a feature ranking metric based on the mutual information index to assess the relevance and redundancy of regional biomarkers and improve the AD classification accuracy. Methods: From the Alzheimer’s Disease Neuroimaging Initiative (ADNI), 722 participants with three modalities, including florbetapir-PET, flortaucipir-PET, and MRI, were studied. The multivariate mutual information metric was utilized to capture the redundancy and complementarity of the predictors and develop a feature ranking approach. This was followed by evaluating the capability of single-modal and multimodal biomarkers in predicting the cognitive stage. Results: Although amyloid-β deposition is an earlier event in the disease trajectory, tau PET with feature selection yielded a higher early-stage classification F1-score (65.4%) compared to amyloid-β PET (63.3%) and MRI (63.2%). The SVC multimodal scenario with feature selection improved the F1-score to 70.0% and 71.8% for the early and late-stage, respectively. When age and risk factors were included, the scores improved by 2 to 4%. The Amyloid-Tau-Neurodegeneration [AT(N)] framework helped to interpret the classification results for different biomarker categories. Conclusion: The results underscore the utility of a novel feature selection approach to reduce the dimensionality of multimodal datasets and enhance model performance. The AT(N) biomarker framework can help to explore the misclassified cases by revealing the relationship between neuropathological biomarkers and cognition.
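
The relevance and redundancy trade-off described in the abstract above can be illustrated with a small greedy ranking. This sketch assumes discrete features and uses pairwise mutual information in a simplified greedy scheme (relevance minus mean redundancy); it is a stand-in for illustration, not the multivariate metric the paper formulates. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def rank_features(features: Dict[str, Sequence[int]],
                  labels: Sequence[int]) -> List[str]:
    """Greedy ranking: relevance to labels minus mean redundancy with picks."""
    ranked: List[str] = []
    remaining = list(features)

    def score(name: str) -> float:
        relevance = mutual_information(features[name], labels)
        if not ranked:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[r]) for r in ranked
        ) / len(ranked)
        return relevance - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data (hypothetical): "a_copy" duplicates "a"; "c" is weaker but distinct.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 0],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 0],
    "c": [0, 1, 1, 0, 1, 1, 1, 1],
}
ranking = rank_features(features, labels)
print(ranking)
```

In this toy case the duplicated feature is pushed to the end of the ranking even though it is as relevant as the first pick, because it adds no information beyond what was already selected.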

Intra-Inter Feature Ranking based Feature Selection Method for Bearing Fault Classification

2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) ◽

10.1109/icccnt45670.2019.8944884 ◽

2019 ◽

Author(s):

Sandeep S. Udmale ◽

Sanjay Kumar Singh

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Fault Classification ◽

Feature Ranking ◽

Bearing Fault

An Adaptive Multiple Feature Subset Method for Feature Ranking and Selection

2010 International Conference on Technologies and Applications of Artificial Intelligence ◽

10.1109/taai.2010.50 ◽

2010 ◽

Cited By ~ 1

Author(s):

Fu Chang ◽

Jen-Cheng Chen

Keyword(s):

Ranking And Selection ◽

Feature Ranking ◽

Feature Subset ◽

Multiple Feature

AdaBoost Multiple Feature Selection and Combination for Face Recognition

Pattern Recognition and Image Analysis - Lecture Notes in Computer Science ◽

10.1007/978-3-642-02172-5_44 ◽

2009 ◽

pp. 338-345 ◽

Cited By ~ 2

Author(s):

Francisco Martínez-Contreras ◽

Carlos Orrite-Uruñuela ◽

Jesús Martínez-del-Rincón

Keyword(s):

Feature Selection ◽

Face Recognition ◽

Multiple Feature

Generalized N-dimensional independent component analysis and its application to multiple feature selection and fusion for image classification

Neurocomputing ◽

10.1016/j.neucom.2012.09.020 ◽

2013 ◽

Vol 103 ◽

pp. 186-197 ◽

Cited By ~ 5

Author(s):

Danni Ai ◽

Guifang Duan ◽

Xianhua Han ◽

Yen-Wei Chen

Keyword(s):

Feature Selection ◽

Independent Component Analysis ◽

Image Classification ◽

Component Analysis ◽

Independent Component ◽

Multiple Feature

A New Ensemble Method for Feature Ranking in Text Mining

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500103 ◽

2013 ◽

Vol 22 (03) ◽

pp. 1350010 ◽

Cited By ~ 6

Author(s):

Sabereh Sadeghi ◽

Hamid Beigy

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Dimensionality Reduction ◽

Ensemble Methods ◽

Ease Of Use ◽

Convergence Time ◽

Feature Ranking ◽

Initial Population ◽

Feature Subset ◽

Ranking Methods

Dimensionality reduction is a necessary task in data mining when working with high-dimensional data. One type of dimensionality reduction is feature selection. Feature selection based on feature ranking has received much attention from researchers, mainly because of its scalability, ease of use, and fast computation. Feature ranking methods can be divided into different categories and may use different measures for ranking features. Recently, ensemble methods have entered the field of ranking and achieved higher accuracy than other approaches. Accordingly, this paper proposes a heterogeneous ensemble-based algorithm for feature ranking. The base ranking methods in this ensemble structure are chosen from different categories, such as information-theoretic, distance-based, and statistical methods. The results of the base ranking methods are then fused into a final feature subset by means of a genetic algorithm. The diversity of the base methods improves the quality of the initial population of the genetic algorithm and thus reduces its convergence time. In most ranking methods, it is the user's task to determine the threshold for choosing the appropriate subset of features, which may force the user to try many different values to find a good one. The proposed algorithm reduces the difficulty of determining a proper threshold. The performance of the algorithm is evaluated on four different text datasets, and the experimental results show that the proposed method outperforms the five other feature ranking methods used for comparison. One advantage of the proposed method is that it is independent of the classification method used.
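
The idea of fusing several base rankings into one, as described in the abstract above, can be sketched with a Borda count. This is a deliberately simpler aggregation rule swapped in for illustration; the paper itself fuses the rankings with a genetic algorithm. The function name `borda_fuse`, the feature names, and the three example rankings are hypothetical.

```python
from typing import Dict, List


def borda_fuse(rankings: List[List[str]]) -> List[str]:
    """Fuse ranked lists: a feature earns more points the higher it is ranked."""
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # Top position earns n points, last position earns 1 point.
            scores[feature] = scores.get(feature, 0) + (n - position)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical output of three base rankers from different categories.
information_theoretic = ["w1", "w2", "w3", "w4"]
statistical = ["w2", "w1", "w4", "w3"]
distance_based = ["w2", "w3", "w1", "w4"]

fused = borda_fuse([information_theoretic, statistical, distance_based])
print(fused)
```

Unlike the genetic-algorithm fusion in the paper, a Borda count still leaves the user to choose a cut-off on the fused list.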

Novel feature ranking criteria for interval valued feature selection

2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI) ◽

10.1109/icacci.2016.7732039 ◽

2016 ◽

Cited By ~ 4

Author(s):

D S Guru ◽

N Vinay Kumar

Keyword(s):

Feature Selection ◽

Feature Ranking ◽

Ranking Criteria ◽

Interval Valued
