Gene selection via BPSO and Backward generation for cancer classification

2019 ◽  
Vol 53 (1) ◽  
pp. 269-288 ◽  
Author(s):  
Ahmed Bir-Jmel ◽  
Sidi Mohamed Douiri ◽  
Souad Elbernoussi

Gene expression data (DNA microarray) enable researchers to simultaneously measure the levels of expression of several thousand genes. These levels of expression are very important in the classification of different types of tumors. In this work, we are interested in gene selection, which is an essential step in the data pre-processing for cancer classification. This selection makes it possible to represent a small subset of genes from a large set, and to eliminate the redundant, irrelevant or noisy genes. The combinatorial nature of the selection problem requires the development of specific techniques such as filters and Wrappers, or hybrids combining several optimization processes. In this context, we propose two hybrid approaches (RBPSO-1NN and FBPSO-SVM) for the gene selection problem, based on the combination of the filter methods (the Fisher criterion and the ReliefF algorithm), the BPSO metaheuristic algorithms and the Backward algorithm using the classifiers (SVM and 1NN) for the evaluation of the relevance of the candidate subsets. In order to verify the performance of our methods, we have tested them on eight well-known microarray datasets of high dimensions varying from 2308 to 11225 genes. The experiments carried out on the different datasets show that our methods prove to be very competitive with the existing works.

2018 ◽  
Vol 8 (9) ◽  
pp. 1569 ◽  
Author(s):  
Shengbing Wu ◽  
Hongkun Jiang ◽  
Haiwei Shen ◽  
Ziyi Yang

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, regularized logistic regression using the L 1 regularization has been successfully applied in high-dimensional cancer classification to tackle both the estimation of gene coefficients and the simultaneous performance of gene selection. However, the L 1 has a biased gene selection and dose not have the oracle property. To address these problems, we investigate L 1 / 2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods ( L 1 and L E N ) in terms of classification performance.


2012 ◽  
Vol 23 (02) ◽  
pp. 431-444 ◽  
Author(s):  
ALLANI ABDERRAHIM ◽  
EL-GHAZALI TALBI ◽  
MELLOULI KHALED

In this work, we hybridize the Genetic Quantum Algorithm with the Support Vector Machines classifier for gene selection and classification of high dimensional Microarray Data. We named our algorithm GQA SVM. Its purpose is to identify a small subset of genes that could be used to separate two classes of samples with high accuracy. A comparison of the approach with different methods of literature, in particular GA SVM and PSO SVM [2], was realized on six different datasets issued of microarray experiments dealing with cancer (leukemia, breast, colon, ovarian, prostate, and lung) and available on Web. The experiments clearified the very good performances of the method. The first contribution shows that the algorithm GQA SVM is able to find genes of interest and improve the classification on a meaningful way. The second important contribution consists in the actual discovery of new and challenging results on datasets used.


2022 ◽  
Author(s):  
Rabia Musheer Aziz

Abstract A modified Artificial Bee Colony (ABC) metaheuristics optimization technique is applied for cancer classification, that reduces the classifier's prediction errors and allows for faster convergence by selecting informative genes. Cuckoo search (CS) algorithm was used in the onlooker bee phase (exploitation phase)of ABC to boost performance by maintaining the balance between exploration and exploitation of ABC. Tuned the modified ABC algorithm by using Naïve Bayes (NB) classifiers to improve the further accuracy of the model. Independent Component Analysis (ICA) is used for dimensionality reduction. In the first step, the reduced dataset is optimized by using Modified ABC and after that, in the second step, the optimized dataset is used to train the NB classifier. Extensive experiments were performed for comprehensive comparative analysis of the proposed algorithm with well-known metaheuristic algorithms, namely Genetic Algorithm (GA) when used with the same framework for the classification of six high-dimensional cancer datasets. The comparison results showed that the proposed model with the CS algorithm achieves the highest performance as maximum classification accuracy with less count of selected genes. This shows the effectiveness of the proposed algorithm which is validated using ANOVA for cancer classification.


2014 ◽  
Author(s):  
Hong-Dong Li ◽  
Qing-Song Xu ◽  
Yi-Zeng Liang

Identifying a small subset of discriminate genes is important for predicting clinical outcomes and facilitating disease diagnosis. Based on the model population analysis framework, we present a method, called PHADIA, which is able to output a phase diagram displaying the predictive ability of each variable, which provides an intuitive way for selecting informative variables. Using two publicly available microarray datasets, it’s demonstrated that our method can selects a few informative genes and achieves significantly better or comparable classification accuracy compared to the reported results in the literature. The source codes are freely available at: www.libpls.net.


2019 ◽  
Vol 2019 ◽  
pp. 1-20 ◽  
Author(s):  
Ahmed Bir-Jmel ◽  
Sidi Mohamed Douiri ◽  
Souad Elbernoussi

The recent advance in the microarray data analysis makes it easy to simultaneously measure the expression levels of several thousand genes. These levels can be used to distinguish cancerous tissues from normal ones. In this work, we are interested in gene expression data dimension reduction for cancer classification, which is a common task in most microarray data analysis studies. This reduction has an essential role in enhancing the accuracy of the classification task and helping biologists accurately predict cancer in the body; this is carried out by selecting a small subset of relevant genes and eliminating the redundant or noisy genes. In this context, we propose a hybrid approach (MWIS-ACO-LS) for the gene selection problem, based on the combination of a new graph-based approach for gene selection (MWIS), in which we seek to minimize the redundancy between genes by considering the correlation between the latter and maximize gene-ranking (Fisher) scores, and a modified ACO coupled with a local search (LS) algorithm using the classifier 1NN for measuring the quality of the candidate subsets. In order to evaluate the proposed method, we tested MWIS-ACO-LS on ten well-replicated microarray datasets of high dimensions varying from 2308 to 12600 genes. The experimental results based on ten high-dimensional microarray classification problems demonstrated the effectiveness of our proposed method.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Russul Alanni ◽  
Jingyu Hou ◽  
Hasseeb Azzawi ◽  
Yong Xiang

Abstract Background Microarray datasets consist of complex and high-dimensional samples and genes, and generally the number of samples is much smaller than the number of genes. Due to this data imbalance, gene selection is a demanding task for microarray expression data analysis. Results The gene set selected by DGS has shown its superior performances in cancer classification. DGS has a high capability of reducing the number of genes in the original microarray datasets. The experimental comparisons with other representative and state-of-the-art gene selection methods also showed that DGS achieved the best performance in terms of the number of selected genes, classification accuracy, and computational cost. Conclusions We provide an efficient gene selection algorithm can select relevant genes which are significantly sensitive to the samples’ classes. With the few discriminative genes and less cost time by the proposed algorithm achieved much high prediction accuracy on several public microarray data, which in turn verifies the efficiency and effectiveness of the proposed gene selection method.


2021 ◽  
Vol 2 (01) ◽  
pp. 01-09
Author(s):  
Alan Jahwar ◽  
Nawzat Ahmed

Microarray data plays a major role in diagnosing and treating cancer. In several microarray data sets, many gene fragments are not associated with the target diseases. A solution to the gene selection problem might become important when analyzing large gene datasets. The key task is to better represent genes through optimum accuracy in classifying the samples. Different gene classification algorithms have been provided in past studies; after all, they suffered due to the selection of several genes mostly in high-dimensional microarray data. This paper aims to review classification and feature selection with different microarray datasets focused on swarm intelligence algorithms. We explain microarray data and its types in this paper briefly. Moreover, our paper presents an introduction to most common swarm intelligence algorithms. A review on swarm intelligence algorithms in gene selection profile based on classification of Microarray Data is presented in this paper.


2021 ◽  
Vol 5 (2) ◽  
pp. 15-21
Author(s):  
Fathima Fajila ◽  
Yuhanis Yusof

Although numerous methods of using microarray data analysis for classification have been reported, there is space in the field of cancer classification for new inventions in terms of informative gene selection. This study introduces a new incremental search-based gene selection approach for cancer classification. The strength of wrappers in determining relevant genes in a gene pool can be increased as they evaluate each possible gene’s subset. Nevertheless, the searching algorithms play a major role in gene’s subset selection. Hence, there is the possibility of finding more informative genes with incremental application. Thus, we introduce an approach which utilizes two searching algorithms in gene’s subset selection. The approach was efficient enough to classify five out of six microarray datasets with 100% accuracy using only a few biomarkers while the rest classified with only one misclassification.


Sign in / Sign up

Export Citation Format

Share Document