Swarm Intelligence Algorithms in Gene Selection Profile Based on Classification of Microarray Data: A Review

2021 ◽  
Vol 2 (01) ◽  
pp. 01-09
Author(s):  
Alan Jahwar ◽  
Nawzat Ahmed

Microarray data plays a major role in diagnosing and treating cancer. In many microarray data sets, many gene fragments are not associated with the target disease, so solving the gene selection problem becomes important when analyzing large gene datasets. The key task is to find a representation of the genes that gives the best accuracy when classifying the samples. Past studies have proposed various gene classification algorithms; however, these struggled with gene selection, particularly in high-dimensional microarray data. This paper reviews classification and feature selection on different microarray datasets, with a focus on swarm intelligence algorithms. It briefly explains microarray data and its types, introduces the most common swarm intelligence algorithms, and reviews their use in gene selection for the classification of microarray data.
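As a rough illustration of how a swarm intelligence algorithm can drive gene selection, the following is a minimal sketch of binary particle swarm optimization (PSO) over gene subsets. It is not taken from the review: the toy `fitness` function, the set of "relevant" genes, and all parameter values (`w`, `c1`, `c2`, swarm size) are assumptions chosen for illustration. In a real wrapper approach the fitness would typically be the cross-validated accuracy of a classifier trained on the selected genes.

```python
import math
import random


def fitness(mask, relevant, penalty=0.1):
    """Toy fitness (an assumption): +1 per relevant gene kept, minus a size penalty."""
    hits = sum(1 for j, bit in enumerate(mask) if bit and j in relevant)
    return hits - penalty * sum(mask)


def binary_pso(n_genes, relevant, n_particles=20, n_iter=60, seed=0):
    """Search for a 0/1 gene mask with binary PSO (sigmoid transfer function)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (illustrative)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, relevant) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_genes):
                r1, r2 = rng.random(), rng.random()
                # Velocity is pulled toward the personal and global best masks.
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep exp() stable
                # The sigmoid of the velocity is the probability the gene is selected.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], relevant)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    mask, score = binary_pso(n_genes=12, relevant={1, 5, 8})
    print("selected genes:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

The size penalty in the toy fitness stands in for the usual trade-off in gene selection between classification accuracy and the number of genes kept.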

Cancer classification and gene selection are among the most important applications of DNA microarray data. Because microarray data is high-dimensional, classification and gene selection techniques are frequently employed to support professional systems in diagnosing cancer with higher classification precision. The least absolute shrinkage and selection operator (LASSO) is one of the most popular methods for cancer classification and gene selection in high-dimensional data. However, LASSO is biased and cannot select more variables than the sample size (n) in gene selection and classification of high-dimensional microarray data. To address these problems, LASSO-C1F was proposed. It uses a scale-invariant measure of the maximal information complexity of the covariance matrix, with weight modifications, as a data-adaptive alternative to the fairly arbitrary choice of the regularization term in LASSO. The results indicated that the proposed LASSO-C1F is more effective than the classical LASSO: it performs better in terms of AUC and the number of genes selected.
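To make the selection and bias behaviour of the classical LASSO concrete, here is a minimal sketch of coordinate descent for the objective (1/2n)·||y − Xβ||² + λ·||β||₁ in plain Python. It illustrates only the standard LASSO, not LASSO-C1F, whose weighting scheme is not described in enough detail above to reproduce. The tiny orthogonal design matrix and the value of `lam` are assumptions for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (zj / n) if zj > 0 else 0.0
    return beta


if __name__ == "__main__":
    # Three mutually orthogonal "genes"; only the first one drives the response.
    X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
    y = [2 * row[0] for row in X]  # true coefficients are (2, 0, 0)
    print(lasso_cd(X, y, lam=0.5))  # -> [1.5, 0.0, 0.0]
```

The two irrelevant coefficients are set exactly to zero (selection), while the relevant one is estimated as 1.5 instead of 2 (shrinkage bias). This is the kind of bias the abstract refers to.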


2019 ◽  
Vol 56 (2) ◽  
pp. 117-138
Author(s):  
Małgorzata Ćwiklińska-Jurkowska

Summary: The usefulness of combining methods is examined using the example of microarray cancer data sets, where expression levels of huge numbers of genes are reported. Problems of discrimination into two groups are examined on three data sets relating to the expression of huge numbers of genes. For the three examined microarray data sets, the cross-validation errors evaluated on the remaining half of the whole data set, not used earlier for the selection of genes, were used as measures of classifier performance. Common single procedures for the selection of genes, namely Prediction Analysis of Microarrays (PAM) and Significance Analysis of Microarrays (SAM), were compared with the fusion of eight selection procedures, or of a smaller subset of five of them, excluding SAM or PAM. Merging five or eight selection methods gave similar results. Based on the misclassification rates for the three examined microarray data sets, for any examined ensemble of classifiers, the combining of gene selection methods was not superior to single PAM or SAM selection for two of the examined data sets. Additionally, the procedure of heterogeneous combining of five base classifiers (k-nearest neighbors, SVM linear and SVM radial with parameter c=1, shrunken centroids regularized classifier (SCRDA) and nearest mean classifier) proved to significantly outperform resampling classifiers such as bagging decision trees. Heterogeneously combined classifiers also outperformed double bagging for some ranges of gene numbers and data sets, but merging is generally not superior to random forests. The preliminary step of combining gene rankings was generally not essential for the performance for either heterogeneously or homogeneously combined classifiers.
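The summary does not say how the individual gene rankings were fused. As one plausible illustration, the sketch below merges several rankings with a Borda count. The aggregation rule, the function name `borda_fuse`, and the gene identifiers are all assumptions, not the author's procedure.

```python
def borda_fuse(rankings):
    """Fuse ranked gene lists: each list awards n points to its top gene, n-1 to the next, ..."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] = scores.get(gene, 0) + (n - position)
    # Highest total score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


if __name__ == "__main__":
    # Hypothetical rankings from three selection methods (best gene first).
    method_a = ["g1", "g2", "g3", "g4"]
    method_b = ["g2", "g1", "g4", "g3"]
    method_c = ["g1", "g3", "g2", "g4"]
    print(borda_fuse([method_a, method_b, method_c]))  # -> ['g1', 'g2', 'g3', 'g4']
```

The top k genes of the fused list would then be passed to the classifiers in place of a single method's top k.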


2015 ◽  
Author(s):  
Majid Mohammadi ◽  
Hossein Sharifi Noghabi ◽  
Ghosheh Abed Hodtani ◽  
Habib Rajabi Mashhadi

One of the central challenges in cancer research is identifying significant genes among thousands of others on a microarray. Since preventing the outbreak and progression of cancer is the ultimate goal in bioinformatics and computational biology, detecting the genes that are most involved is vital. In this article, we propose a Maximum-Minimum Correntropy Criterion (MMCC) approach for selecting biologically meaningful genes from microarray data sets. The approach is stable, fast, robust against diverse noise and outliers, and competitively accurate in comparison with other algorithms. Moreover, the optimal number of features for each data set is determined via an evolutionary optimization process. Through broad experimental evaluation, MMCC is shown to be significantly better than other well-known gene selection algorithms on 25 commonly used microarray data sets. Surprisingly, high classification accuracy with a Support Vector Machine (SVM) is achieved with fewer than 10 genes selected by MMCC in all of the cases.
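The robustness claim rests on correntropy, a kernel-based similarity measure. The sketch below computes a sample correntropy estimate with an unnormalized Gaussian kernel and contrasts it with mean squared error when one value is an outlier. It illustrates the measure only and is not the MMCC algorithm; the kernel width `sigma` and the unnormalized kernel are assumptions.

```python
import math


def correntropy(x, y, sigma=1.0):
    """Sample correntropy: mean of exp(-(x_i - y_i)^2 / (2 sigma^2)) over all pairs."""
    return sum(math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for a, b in zip(x, y)) / len(x)


def mse(x, y):
    """Mean squared error, shown for contrast."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    clean = [1, 2, 3, 4]
    noisy = [1, 2, 3, 100]  # one gross outlier
    print(correntropy(clean, clean))  # -> 1.0
    print(correntropy(clean, noisy))  # -> 0.75
    print(mse(clean, noisy))          # -> 2304.0
```

The single outlier dominates the squared error but lowers the correntropy only by its one-in-four share, because the Gaussian kernel saturates for large differences.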


2018 ◽  
Vol 8 (9) ◽  
pp. 1569 ◽  
Author(s):  
Shengbing Wu ◽  
Hongkun Jiang ◽  
Haiwei Shen ◽  
Ziyi Yang

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, regularized logistic regression using $L_1$ regularization has been successfully applied in high-dimensional cancer classification to estimate the gene coefficients and perform gene selection simultaneously. However, $L_1$ regularization gives a biased gene selection and does not have the oracle property. To address these problems, we investigate $L_{1/2}$ regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods ($L_1$ and $L_{EN}$) in terms of classification performance.
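For orientation, the penalized logistic regression problem can be written in a generic form. The notation below ($n$ samples, $p$ genes, expression vector $x_i$, binary label $y_i$, tuning parameter $\lambda$) is assumed here and may differ from the paper's own.

```latex
\hat{\beta} = \arg\min_{\beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]
  + \lambda \sum_{j=1}^{p} |\beta_j|^{q}
\right\},
\qquad
p_i = \frac{1}{1 + \exp(-x_i^{\top} \beta)}
```

Setting $q = 1$ gives the $L_1$ penalty and $q = 1/2$ gives the $L_{1/2}$ penalty. With $q = 1/2$ the penalty is non-convex and grows more slowly for large $|\beta_j|$, so large coefficients are shrunk less than under $L_1$.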


2020 ◽  
pp. 707-725
Author(s):  
Sujata Dash

Efficient classification and feature extraction techniques pave an effective way for diagnosing cancers from microarray datasets. Conventional classification techniques have been observed to have major limitations in discriminating the genes accurately. However, such problems can be addressed to a great extent by an ensemble technique. In this paper, a hybrid RotBagg ensemble framework is proposed to address this problem. The technique integrates the Rotation Forest and Bagging ensembles, which preserves the basic characteristics of an ensemble architecture, i.e., diversity and accuracy. Three different feature selection techniques are employed to select subsets of genes to improve the effectiveness and generalization of the RotBagg ensemble. The efficiency is validated on five microarray datasets and compared with the results of the base learners. The experimental results show that the correlation based FRFR with PCA-based RotBagg ensemble forms a highly efficient classification model.
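The Bagging half of such an ensemble can be sketched as follows: train several base learners on bootstrap resamples and combine them by majority vote. This is only an illustration of bagging, not the RotBagg framework (the Rotation Forest step is omitted). The nearest-mean base learner, the single expression feature, and the per-class (stratified) bootstrap are assumptions made to keep the example short.

```python
import random
from collections import Counter


def fit_nearest_mean(samples):
    """samples: list of (value, label) pairs; returns {label: mean value}."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in by_label.items()}


def predict_nearest_mean(model, value):
    """Assign the label whose class mean is closest to value."""
    return min(model, key=lambda label: abs(value - model[label]))


def bagging_fit(samples, n_models=11, seed=0):
    """Train n_models base learners, each on a bootstrap resample drawn per class."""
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    models = []
    for _ in range(n_models):
        boot = []
        for group in by_label.values():
            # Sample with replacement within each class so no class goes missing.
            boot.extend(rng.choice(group) for _ in group)
        models.append(fit_nearest_mean(boot))
    return models


def bagging_predict(models, value):
    """Majority vote over the base learners."""
    votes = Counter(predict_nearest_mean(m, value) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical expression levels of one gene: label 0 = normal, 1 = tumour.
    data = [(0.0, 0), (0.4, 0), (1.0, 0), (9.0, 1), (9.5, 1), (10.0, 1)]
    ensemble = bagging_fit(data)
    print(bagging_predict(ensemble, 0.7), bagging_predict(ensemble, 8.8))  # -> 0 1
```

Diversity comes from each base learner seeing a different resample, while the vote averages out their individual errors.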


Author(s):  
Shinn-Ying Ho ◽  
Chong-Cheng Lee ◽  
Hung-Ming Chen ◽  
Hui-Ling Huang
