Integrating Biological Information for Feature Selection in Microarray Data Classification

A Fusion-Based Feature Selection Framework for Microarray Data Classification

Lecture Notes on Data Engineering and Communications Technologies - Innovative Systems for Intelligent Health Informatics ◽

10.1007/978-3-030-70713-2_52 ◽

2021 ◽

pp. 565-576

Author(s):

Talal Almutiri ◽

Faisal Saeed ◽

Manar Alassaf ◽

Essa Abdullah Hezzam

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Data Classification ◽

Selection Framework

Approach to The Selection of Significant Features in Solving Biomedical Problems of Binary Classification of Microarray Data

Mathematical Biology and Bioinformatics (Математическая биология и биоинформатика) ◽

10.17537/2020.15.4 ◽

2020 ◽

Vol 15 (1) ◽

pp. 4-19 ◽

Cited By ~ 1

Author(s):

I.Y. Boyko ◽

D.S. Anisimov ◽

L.L. Smolyakova ◽

M.A. Ryazanov

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Binary Classification ◽

Biological Information ◽

Features Selection ◽

Data Set ◽

Early Diagnosis Of Cancer ◽

Latent Structures ◽

Classification Quality ◽

Selection Of

In modern biomedical research aimed at finding methods for the early diagnosis of cancer, microarrays containing biological information about patients are used. Based on these data, patients are assigned to one of two classes, corresponding to the presence or absence of a diagnosis. One of the steps with a decisive influence on classification quality is the selection of significant features. This paper proposes a criterion for selecting significant features based on the ledge-coefficient of correlation, which was previously used to estimate the degree of interrelation between numerical and binary features. For two sets of microarray data, comparative examples of binary classification are presented using three feature selection algorithms, three dimensionality reduction methods, and six classification models. Feature selection with the ledge-criterion gave a classification quality comparable to that of common feature selection methods such as the t-test and U-test. For the peptide microarray data set considered in the paper, the projection to latent structures method had previously been found effective. Combining this method with the selection of significant features by the ledge-criterion yielded a higher classification quality measure.
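The abstract above names the t-test as a common baseline for feature selection. The following is a minimal sketch of that baseline only, not of the ledge-criterion, which the abstract does not define. It ranks features by the absolute value of Welch's two-sample t statistic; the function names `welch_t` and `rank_features` and the toy data are illustrative assumptions, not taken from the paper.

```python
from math import copysign, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for two lists of numbers."""
    diff = mean(a) - mean(b)
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0.0:
        # Both groups are constant: no spread to compare against.
        return 0.0 if diff == 0.0 else copysign(float("inf"), diff)
    return diff / denom


def rank_features(X, y, k):
    """Return the indices of the k features with the largest |t|
    between class 1 and class 0 (rows of X are samples)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Toy data (made up): 6 samples x 3 features; only feature 1 separates the classes.
X = [
    [0.9, 5.1, 0.2],
    [1.1, 4.9, 0.8],
    [1.0, 5.3, 0.5],
    [1.0, 1.0, 0.4],
    [0.8, 1.2, 0.9],
    [1.2, 0.9, 0.3],
]
y = [1, 1, 1, 0, 0, 0]
selected = rank_features(X, y, k=1)
print(selected)
```

On real microarray data the ranking would normally be computed inside each cross-validation fold, so that the selected features do not leak information from the test samples.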

Comparative analysis of ReliefF-SVM and CFS-SVM for microarray data classification

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i4.pp3393-3402 ◽

2021 ◽

Vol 11 (4) ◽

pp. 3393

Author(s):

Mochamad Agusta Naofal Hakim ◽

Adiwijaya Adiwijaya ◽

Widi Astuti

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Causes Of Death ◽

Data Classification ◽

World Health ◽

Support Vector ◽

Feature Selection Technique ◽

Cancer Disease ◽

Cancer Symptoms ◽

The World

Cancer is one of the main causes of death in the world; the World Health Organization (WHO) recognized it as among the top causes of death in 2018. Detecting cancer symptoms is therefore paramount in order to cure the disease and reduce the number of deaths it causes. Many studies have developed data mining approaches that detect symptoms of cancer by classifying human gene expression data. One popular approach uses DNA microarray data. However, DNA microarray data has many dimensions, which can have a detrimental effect on classification accuracy. Therefore, before classification, a feature selection technique must be used to eliminate features that carry no information important to the classification process. The feature selection techniques used in this study were ReliefF and correlation-based feature selection (CFS), and the classification technique was the support vector machine (SVM). Several testing schemes were applied to compare the performance of ReliefF and CFS with SVM. The results showed that ReliefF outperformed CFS for microarray data classification.
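To illustrate the kind of weighting that Relief-family methods compute, here is a minimal sketch of the basic two-class Relief algorithm. It is a simplification and an assumption on our part: ReliefF, as used in the study, extends this idea with k nearest neighbours and multi-class handling, and the SVM stage is omitted. The function name `relief_weights` and the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic two-class Relief: reward features that differ on the nearest
    sample of the other class (miss) and penalize features that differ on
    the nearest sample of the same class (hit). Every sample is visited
    once, so the result is deterministic."""
    n, d = len(X), len(X[0])
    # Value range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        span = max(column) - min(column)
        spans.append(span if span > 0 else 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Toy data (made up): only feature 1 separates the two classes.
X = [
    [0.9, 5.1, 0.2],
    [1.1, 4.9, 0.8],
    [1.0, 5.3, 0.5],
    [1.0, 1.0, 0.4],
    [0.8, 1.2, 0.9],
    [1.2, 0.9, 0.3],
]
y = [1, 1, 1, 0, 0, 0]
weights = relief_weights(X, y)
best = max(range(len(weights)), key=lambda j: weights[j])
print(best, weights)
```

Features with the highest weights would then be kept and passed to the classifier; how many to keep is a tuning choice that the abstract does not specify.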

A novel feature selection method for microarray data classification based on hidden Markov model

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2019.103213 ◽

2019 ◽

Vol 95 ◽

pp. 103213 ◽

Cited By ~ 2

Author(s):

Mohammadreza Momenzadeh ◽

Mohammadreza Sehhati ◽

Hossein Rabbani

Keyword(s):

Feature Selection ◽

Markov Model ◽

Hidden Markov Model ◽

Microarray Data ◽

Hidden Markov ◽

Feature Selection Method ◽

Data Classification ◽

Selection Method

Feature selection in independent component subspace for microarray data classification

Neurocomputing ◽

10.1016/j.neucom.2006.02.006 ◽

2006 ◽

Vol 69 (16-18) ◽

pp. 2407-2410 ◽

Cited By ~ 53

Author(s):

Chun-Hou Zheng ◽

De-Shuang Huang ◽

Li Shang

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Data Classification ◽

Independent Component

Microarray Data Classification Based on Evolutionary Multiple Classifier System

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.130-134.2077 ◽

2011 ◽

Vol 130-134 ◽

pp. 2077-2080

Author(s):

Zheng Gang Gu ◽

Kun Hong Liu

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Selection Process ◽

Data Classification ◽

Research Area ◽

Ensemble Method ◽

Multiple Classifier System ◽

Classifier System ◽

Classifier Selection ◽

Multiple Classifier

Designing an evolutionary multiple classifier system (MCS) is a relatively new research area. In this paper, we propose a genetic algorithm (GA) based MCS for microarray data classification. We first construct a feature pool with different feature selection methods, and then apply a multi-objective GA to carry out the ensemble feature selection process. When the GA stops, a set of base classifiers has been generated. We use all the nondominated individuals in the last generation to build an ensemble system, and we test both this ensemble method and a method that applies a classifier selection process to choose suitable classifiers from all the individuals in the last generation. The experimental results show that the proposed ensemble method is robust and can lead to promising results.
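The step of keeping "all the nondominated individuals in the last generation" can be sketched as a Pareto filter. The abstract does not state which objectives the GA optimizes, so the two used here (error rate and number of selected features, both minimized) are assumptions, as are the names `dominates` and `nondominated` and the toy population.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and
    strictly better somewhere (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(population):
    """Return the individuals that no other individual dominates."""
    return [
        p
        for p in population
        if not any(dominates(q["objectives"], p["objectives"]) for q in population)
    ]


# Toy last generation (made up): objectives = (error rate, number of features).
population = [
    {"name": "A", "objectives": (0.10, 30)},
    {"name": "B", "objectives": (0.12, 12)},
    {"name": "C", "objectives": (0.15, 40)},  # dominated by A and B
    {"name": "D", "objectives": (0.20, 5)},
    {"name": "E", "objectives": (0.12, 20)},  # dominated by B
]
front = [p["name"] for p in nondominated(population)]
print(front)
```

Each individual on the front would correspond to one base classifier trained on its own feature subset; the ensemble then combines their predictions, for example by majority vote.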

Distributed feature selection: An application to microarray data classification

Applied Soft Computing ◽

10.1016/j.asoc.2015.01.035 ◽

2015 ◽

Vol 30 ◽

pp. 136-150 ◽

Cited By ~ 82

Author(s):

V. Bolón-Canedo ◽

N. Sánchez-Maroño ◽

A. Alonso-Betanzos

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Data Classification

A novel feature extraction approach based on ensemble feature selection and modified discriminant independent component analysis for microarray data classification

Biocybernetics and Biomedical Engineering ◽

10.1016/j.bbe.2016.05.001 ◽

2016 ◽

Vol 36 (3) ◽

pp. 521-529 ◽

Cited By ~ 23

Author(s):

Maryam Mollaee ◽

Mohammad Hossein Moattar

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Independent Component Analysis ◽

Microarray Data ◽

Data Classification ◽

Component Analysis ◽

Independent Component

GENE EXPRESSION DATA CLASSIFICATION COMBINING HIERARCHICAL REPRESENTATION AND EFFICIENT FEATURE SELECTION

Journal of Biological Systems ◽

10.1142/s0218339012400025 ◽

2012 ◽

Vol 20 (04) ◽

pp. 349-375 ◽

Cited By ~ 5

Author(s):

MATTIA BOSIO ◽

PAU BELLOT ◽

PHILIPPE SALEMBIER ◽

ALBERT OLIVERAS-VERGÉS

Keyword(s):

Feature Selection ◽

Error Rate ◽

Selection Process ◽

Enrichment Analysis ◽

Data Classification ◽

Gene Set Enrichment Analysis ◽

Biological Information ◽

Study Phase ◽

Forward Selection ◽

Linear Discriminant

A general framework for microarray data classification is proposed in this paper. It produces precise and reliable classifiers through a two-step approach. First, the original feature set is enhanced by a new set of features called metagenes. These new features are obtained through a hierarchical clustering process on the original data. Two different metagene generation rules have been analyzed, called Treelets clustering and Euclidean clustering. Metagene creation is attractive for several reasons. First, metagenes can improve classification, since they broaden the available feature space and capture the common behavior of similar genes, reducing the residual measurement noise. Furthermore, by analyzing some of the metagenes chosen for classification with gene set enrichment analysis algorithms, it is shown how metagenes can summarize the behavior of functionally related probe sets. Additionally, metagenes can point out still undocumented, highly discriminant probe sets that are numerically related to other probes endowed with prior biological information, and so contribute to the knowledge discovery process. The second step of the framework is feature selection, which applies the Improved Sequential Floating Forward Selection algorithm (IFFS) to choose a subset for classification from the available feature set of genes and metagenes. Considering the microarray sample scarcity problem, a reliability measure is introduced alongside the classical error rate to improve the feature selection process. Different scoring schemes are studied to choose the best one using both error rate and reliability. The Linear Discriminant Analysis classifier (LDA) has been used throughout this work because of its good characteristics, but the proposed framework can be used with almost any classifier. The potential of the proposed framework has been evaluated by analyzing all the publicly available datasets offered by the Micro Array Quality Control Study, phase II (MAQC). The comparative results showed that the proposed framework can compete with a wide variety of state-of-the-art alternatives and that it can obtain the best mean performance if a particular setup is chosen. A Monte Carlo simulation confirmed that the proposed framework obtains stable and repeatable results.
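The selection step can be illustrated with plain sequential forward selection, which is the greedy core that floating variants such as IFFS build on. This sketch is deliberately simpler than IFFS: it has no backward or replacement steps. The scoring function is a made-up stand-in for the paper's combination of error rate and reliability, and all names (`forward_selection`, `toy_score`, the feature labels) are hypothetical.

```python
def forward_selection(candidates, score, max_size):
    """Greedy sequential forward selection: repeatedly add the candidate
    that gives the highest score, and stop when no addition improves it."""
    selected = []
    best = float("-inf")
    while len(selected) < max_size:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        top = max(remaining, key=lambda c: score(selected + [c]))
        top_score = score(selected + [top])
        if top_score <= best:
            break  # no candidate improves the current subset
        selected.append(top)
        best = top_score
    return selected, best


# Made-up scoring: per-feature relevance, a penalty for a redundant pair,
# and a small cost per selected feature. "m1" stands for a metagene.
relevance = {"g1": 0.5, "g2": 0.45, "m1": 0.3, "g3": 0.05}
redundancy = {frozenset(("g1", "g2")): 0.4}


def toy_score(subset):
    total = sum(relevance[f] for f in subset)
    for pair, penalty in redundancy.items():
        if pair <= set(subset):
            total -= penalty
    return total - 0.1 * len(subset)


selected, best = forward_selection(list(relevance), toy_score, max_size=4)
print(selected, best)
```

In a setting like the one the abstract describes, `score` would wrap a cross-validated classifier evaluation over the pooled genes and metagenes rather than a lookup table.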

Comparison of population based metaheuristics for feature selection: Application to microarray data classification

2008 IEEE/ACS International Conference on Computer Systems and Applications ◽

10.1109/aiccsa.2008.4493515 ◽

2008 ◽

Cited By ~ 23

Author(s):

E-G. Talbi ◽

L. Jourdan ◽

J. Garcia-Nieto ◽

E. Alba

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Data Classification ◽

Population Based
