Swarm Intelligence Algorithms in Gene Selection Profile Based on Classification of Microarray Data: A Review

Alan Jahwar; Nawzat Ahmed

doi:10.38094/jastt20161

Journal of Applied Science and Technology Trends ◽

10.38094/jastt20161 ◽

2021 ◽

Vol 2 (01) ◽

pp. 01-09

Author(s):

Alan Jahwar ◽

Nawzat Ahmed

Keyword(s):

Swarm Intelligence ◽

Microarray Data ◽

Gene Selection ◽

Classification Algorithms ◽

Data Sets ◽

Paper Briefly ◽

Large Gene ◽

Microarray Datasets ◽

Selection Of

Microarray data play a major role in diagnosing and treating cancer. In several microarray data sets, many gene fragments are not associated with the target disease, so gene selection becomes important when analyzing large gene datasets. The key task is to find a subset of genes that classifies the samples with the highest accuracy. Various gene classification algorithms have been proposed in past studies; however, they struggle with selecting genes from high-dimensional microarray data. This paper reviews classification and feature selection on different microarray datasets, with a focus on swarm intelligence algorithms. It briefly explains microarray data and its types, introduces the most common swarm intelligence algorithms, and reviews their use in gene selection for the classification of microarray data.
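
As an illustration of how a swarm intelligence algorithm can be applied to gene selection, the following is a minimal sketch of binary particle swarm optimization, assuming a synthetic expression matrix, a nearest-centroid classifier scored by leave-one-out accuracy, and a small penalty per selected gene. None of these choices come from the reviewed paper; the function names, parameters, and data are hypothetical.

```python
import math
import random

def make_data(rng, n_samples=20, n_genes=30, informative=(0, 1, 2)):
    """Synthetic expression matrix (assumption): only `informative` genes differ by class."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
            dist = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a penalty per selected gene, so smaller subsets are preferred."""
    genes = [g for g, bit in enumerate(mask) if bit]
    return loo_accuracy(X, y, genes) - penalty * len(genes)

def binary_pso(X, y, rng, n_particles=12, n_iter=20, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each particle is a bit mask over genes; returns (best mask, best fitness)."""
    n_genes = len(X[0])
    pos = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(n_genes):
                # Velocity is pulled toward the personal and global best bits.
                vel[k][d] = (w * vel[k][d]
                             + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                             + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-4.0, min(4.0, vel[k][d]))
                # A sigmoid of the velocity gives the probability that the bit is set.
                pos[k][d] = rng.random() < 1.0 / (1.0 + math.exp(-vel[k][d]))
            fit = fitness(X, y, pos[k])
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return gbest, gbest_fit
```

Calling `binary_pso(X, y, random.Random(0))` on data from `make_data` returns a gene mask and its fitness. In practice the fitness would typically wrap whichever classifier and validation scheme a given study uses.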

Maximal Covariance Complexity-Based Penalized Likelihood Method in High Dimensional Data

Regular issue - International Journal of Management and Humanities ◽

10.35940/ijmh.i0904.0641020 ◽

2020 ◽

Vol 4 (10) ◽

pp. 24-32

Keyword(s):

Microarray Data ◽

Gene Selection ◽

High Dimensional Data ◽

Evaluation Criteria ◽

Likelihood Method ◽

High Dimensional ◽

Scale Invariant ◽

Selection Operator ◽

Selection Of

Classification of cancer and selection of genes are among the most important applications of DNA microarray data. Because microarray data are high-dimensional, classification and gene selection techniques are frequently employed to support professional systems in diagnosing cancer with higher classification precision. The least absolute shrinkage and selection operator (LASSO) is one of the most popular methods for cancer classification and gene selection in high-dimensional data. However, LASSO is biased and cannot select more variables than the sample size (n) in gene selection and classification of high-dimensional microarray data. To address these problems, LASSO-C1F was proposed; it uses a scale-invariant measure of the maximal information complexity of the covariance matrix, with weight modifications, as a data-adaptive alternative to the fairly arbitrary choice of the regularization term in LASSO. The results indicate that the proposed LASSO-C1F is more effective than the classical LASSO: on the evaluation criteria, it performs better in terms of AUC and the number of genes selected.
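
For readers unfamiliar with how the classical LASSO selects variables, here is a minimal sketch of cyclic coordinate descent with soft-thresholding for the squared-error LASSO. It is not the LASSO-C1F method: the complexity-based weight modification described in the abstract is not reproduced, and the function names and the 1/(2n) loss scaling are assumptions made for this example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

Coefficients that end at exactly zero correspond to genes that are dropped. The soft-threshold step is also the source of the bias mentioned above, since every retained coefficient is shrunk toward zero.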

Gene selection ensembles and classifier ensembles for medical diagnosis

Biometrical Letters ◽

10.2478/bile-2019-0007 ◽

2019 ◽

Vol 56 (2) ◽

pp. 117-138

Author(s):

Małgorzata Ćwiklińska-Jurkowska

Keyword(s):

Microarray Data ◽

Gene Selection ◽

Data Sets ◽

Selection Methods ◽

Classifier Ensembles ◽

K Nearest Neighbors ◽

Data Set ◽

Cancer Data ◽

Misclassification Rates ◽

Selection Of

Summary: The usefulness of combining methods is examined using the example of microarray cancer data sets, where expression levels of huge numbers of genes are reported. Problems of discrimination into two groups are examined on three data sets relating to the expression of huge numbers of genes. For the three examined microarray data sets, the cross-validation errors evaluated on the remaining half of the whole data set, not used earlier for the selection of genes, were used as measures of classifier performance. Two common single procedures for the selection of genes, Prediction Analysis of Microarrays (PAM) and Significance Analysis of Microarrays (SAM), were compared with the fusion of eight selection procedures, or of a smaller subset of five of them, excluding SAM or PAM. Merging five or eight selection methods gave similar results. Based on the misclassification rates for the three examined microarray data sets, for any examined ensemble of classifiers, the combining of gene selection methods was not superior to single PAM or SAM selection for two of the examined data sets. Additionally, the procedure of heterogeneous combining of five base classifiers (k-nearest neighbors, linear SVM and radial SVM with parameter c=1, the shrunken centroids regularized classifier SCRDA, and the nearest mean classifier) proved to significantly outperform resampling classifiers such as bagging decision trees. Heterogeneously combined classifiers also outperformed double bagging for some ranges of gene numbers and data sets, but merging is generally not superior to random forests. The preliminary step of combining gene rankings was generally not essential for the performance of either heterogeneously or homogeneously combined classifiers.
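
One simple way to fuse several gene rankings is to average each gene's rank position across methods. The sketch below assumes this mean-rank rule; the abstract does not state which fusion rule the paper uses, and the function name and gene identifiers are hypothetical.

```python
def combine_rankings(rankings, top_k):
    """Merge several gene rankings (best gene first) by average rank position."""
    genes = set(rankings[0])
    for r in rankings:
        if set(r) != genes:
            raise ValueError("every ranking must order the same genes")
    mean_rank = {g: sum(r.index(g) for r in rankings) / len(rankings) for g in genes}
    # Ties are broken by gene name so the result is deterministic.
    return sorted(genes, key=lambda g: (mean_rank[g], g))[:top_k]
```

For example, three rankings of four genes that mostly agree on the top two will return those two genes for `top_k=2`, even if no single method placed them in the same order.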

Robust and Stable Gene Selection via Maximum-Minimum Correntropy Criterion

10.1101/029538 ◽

2015 ◽

Author(s):

Majid Mohammadi ◽

Hossein Sharifi Noghabi ◽

Ghosheh Abed Hodtani ◽

Habib Rajabi Mashhadi

Keyword(s):

Microarray Data ◽

Gene Selection ◽

Optimal Number ◽

Support Vector ◽

Stable Gene ◽

Data Sets ◽

Data Set ◽

Selection Algorithms ◽

Maximum Minimum ◽

Selection Of

One of the central challenges in cancer research is identifying significant genes among thousands of others on a microarray. Since preventing the onset and progression of cancer is the ultimate goal in bioinformatics and computational biology, detecting the genes that are most involved is vital. In this article, we propose a Maximum-Minimum Correntropy Criterion (MMCC) approach for selecting biologically meaningful genes from microarray data sets. The approach is stable, fast, robust against diverse noise and outliers, and competitively accurate in comparison with other algorithms. Moreover, the optimal number of features for each data set is determined via an evolutionary optimization process. Through broad experimental evaluation on 25 commonly used microarray data sets, MMCC is shown to perform significantly better than other well-known gene selection algorithms. Notably, a Support Vector Machine (SVM) achieves high classification accuracy with fewer than 10 genes selected by MMCC in all cases.
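
The robustness to outliers claimed above comes from the correntropy measure itself. The following minimal sketch computes the empirical correntropy of two vectors with an unnormalized Gaussian kernel and compares it with the mean squared error. It is not the MMCC selection algorithm; the kernel width `sigma` and the function names are assumptions.

```python
import math

def correntropy(x, y, sigma=1.0):
    """Empirical correntropy: mean Gaussian-kernel similarity of paired samples."""
    return sum(math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))
               for a, b in zip(x, y)) / len(x)

def mean_squared_error(x, y):
    """Mean squared difference, shown for contrast: a single outlier can dominate it."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
```

Because the kernel saturates for large differences, one grossly corrupted sample can lower the correntropy by at most 1/n, whereas its contribution to the mean squared error grows without bound.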

Gene Selection and Classification of Human Lymphoma from Microarray Data

Biological and Medical Data Analysis - Lecture Notes in Computer Science ◽

10.1007/11573067_38 ◽

2005 ◽

pp. 379-390

Author(s):

Joarder Kamruzzaman ◽

Suryani Lim ◽

Iqbal Gondal ◽

Rezaul Begg

Keyword(s):

Microarray Data ◽

Gene Selection ◽

Human Lymphoma

Hybridization of Genetic and Quantum Algorithm for gene selection and classification of Microarray data

2009 IEEE International Symposium on Parallel & Distributed Processing ◽

10.1109/ipdps.2009.5161116 ◽

2009 ◽

Cited By ~ 2

Author(s):

Allani Abderrahim ◽

El-Ghazali Talbi ◽

Mellouli Khaled

Keyword(s):

Microarray Data ◽

Gene Selection ◽

Quantum Algorithm

Gene Selection in Cancer Classification Using Sparse Logistic Regression with L1/2 Regularization

Applied Sciences ◽

10.3390/app8091569 ◽

2018 ◽

Vol 8 (9) ◽

pp. 1569 ◽

Cited By ~ 3

Author(s):

Shengbing Wu ◽

Hongkun Jiang ◽

Haiwei Shen ◽

Ziyi Yang

Keyword(s):

Logistic Regression ◽

Gene Selection ◽

Classification Performance ◽

Cancer Classification ◽

Sparse Logistic Regression ◽

The Subject ◽

Selection For ◽

Microarray Datasets ◽

Sparse Methods

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve prediction accuracy. Recently, logistic regression with L1 regularization has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. However, L1 regularization yields biased gene selection and does not have the oracle property. To address these problems, we investigate L1/2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods (L1 and L_EN) in terms of classification performance.
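
For orientation, a penalized logistic regression objective with an L1/2 penalty can be written as follows. This is a sketch under assumed notation, not taken from the paper: $n$ samples, $p$ genes, labels $y_i \in \{0, 1\}$, expression vectors $x_i$, coefficients $\beta$, regularization parameter $\lambda$, a 1/n scaling of the loss, and no intercept term.

```latex
\hat{\beta} = \arg\min_{\beta}
  \left\{
    -\frac{1}{n} \sum_{i=1}^{n}
      \bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr]
    + \lambda \sum_{j=1}^{p} |\beta_j|^{1/2}
  \right\},
\qquad
p_i = \frac{1}{1 + \exp(-x_i^{\top} \beta)}
```

Replacing the exponent 1/2 with 1 recovers the L1 (LASSO) penalty. The L1/2 penalty is non-convex and penalizes large coefficients less heavily than L1 does.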

SVM-Based Local Search for Gene Selection and Classification of Microarray Data

Communications in Computer and Information Science - Bioinformatics Research and Development ◽

10.1007/978-3-540-70600-7_39 ◽

2008 ◽

pp. 499-508 ◽

Cited By ~ 3

Author(s):

Jose Crispin Hernandez Hernandez ◽

Béatrice Duval ◽

Jin-Kao Hao

Keyword(s):

Local Search ◽

Microarray Data ◽

Gene Selection

Hybrid Ensemble Learning Methods for Classification of Microarray Data

Data Analytics in Medicine ◽

10.4018/978-1-7998-1204-3.ch038 ◽

2020 ◽

pp. 707-725

Author(s):

Sujata Dash

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Microarray Data ◽

Classification Model ◽

Rotation Forest ◽

Ensemble Technique ◽

Basic Characteristics ◽

Microarray Datasets ◽

Feature Selection Techniques

Efficient classification and feature extraction techniques pave an effective way for diagnosing cancers from microarray datasets. Conventional classification techniques have been observed to have major limitations in discriminating the genes accurately. However, such problems can be addressed to a great extent by an ensemble technique. In this paper, a hybrid RotBagg ensemble framework is proposed to address this problem. The technique integrates the Rotation Forest and Bagging ensembles, which preserves the basic characteristics of an ensemble architecture, i.e., diversity and accuracy. Three different feature selection techniques are employed to select subsets of genes to improve the effectiveness and generalization of the RotBagg ensemble. Its efficiency is validated on five microarray datasets and compared with the results of the base learners. The experimental results show that correlation-based FRFR combined with the PCA-based RotBagg ensemble forms a highly efficient classification model.
