A phase diagram for gene selection and disease classification

Mapping Intimacies ◽

10.1101/002360 ◽

2014 ◽

Author(s):

Hong-Dong Li ◽

Qing-Song Xu ◽

Yi-Zeng Liang

Keyword(s):

Phase Diagram ◽

Gene Selection ◽

Population Analysis ◽

Predictive Ability ◽

Disease Diagnosis ◽

Disease Classification ◽

Small Subset ◽

Analysis Framework ◽

Source Codes ◽

Microarray Datasets

Identifying a small subset of discriminative genes is important for predicting clinical outcomes and facilitating disease diagnosis. Based on the model population analysis framework, we present a method, called PHADIA, that outputs a phase diagram displaying the predictive ability of each variable, providing an intuitive way to select informative variables. Using two publicly available microarray datasets, it is demonstrated that our method can select a few informative genes and achieve significantly better or comparable classification accuracy compared to the results reported in the literature. The source codes are freely available at: www.libpls.net.
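The abstract does not spell out how PHADIA is computed, so the following is only a rough sketch of the general model population analysis idea: build many sub-models on random variable subsets and compare, for each variable, the error of sub-models that include it with the error of those that exclude it. The nearest-centroid classifier, the toy data, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import random
from statistics import mean


def nearest_centroid_error(train, test, features):
    """Error rate of a nearest-centroid classifier that sees only `features`."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    wrong = 0
    for x, y in test:
        point = [x[j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        wrong += predicted != y
    return wrong / len(test)


def variable_effects(train, test, n_models=200, subset_size=2, seed=0):
    """For each variable: (mean error of sub-models with it, mean error without it)."""
    rng = random.Random(seed)
    n_vars = len(train[0][0])
    with_var = [[] for _ in range(n_vars)]
    without_var = [[] for _ in range(n_vars)]
    for _ in range(n_models):
        subset = rng.sample(range(n_vars), subset_size)  # one random sub-model
        err = nearest_centroid_error(train, test, subset)
        for j in range(n_vars):
            (with_var[j] if j in subset else without_var[j]).append(err)
    return [(mean(with_var[j]), mean(without_var[j])) for j in range(n_vars)]


def make_toy_data(n, seed):
    """Hypothetical data: variable 0 separates the classes, variables 1-3 are noise."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        x = [4.0 * label + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(3)]
        data.append((x, label))
    return data


train, test = make_toy_data(40, seed=1), make_toy_data(40, seed=2)
effects = variable_effects(train, test)
for j, (err_in, err_out) in enumerate(effects):
    print(f"variable {j}: error with = {err_in:.2f}, without = {err_out:.2f}")
```

Plotting the two numbers for each variable against each other would give one possible two-dimensional view of predictive ability; whether this matches the axes of the published phase diagram is not stated in the abstract.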

A phase diagram for gene selection and disease classification

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2017.06.008 ◽

2017 ◽

Vol 167 ◽

pp. 208-213 ◽

Cited By ~ 4

Author(s):

Hong-Dong Li ◽

Qing-Song Xu ◽

Yi-Zeng Liang

Keyword(s):

Phase Diagram ◽

Gene Selection ◽

Disease Classification

Probability Based Most Informative Gene Selection From Microarray Data

International Journal of Rough Sets and Data Analysis ◽

10.4018/ijrsda.2018010101 ◽

2018 ◽

Vol 5 (1) ◽

pp. 1-12

Author(s):

Sunanda Das ◽

Asit Kumar Das

Keyword(s):

Gene Selection ◽

Relevant Information ◽

Subsequent Treatment ◽

Disease Classification ◽

Informative Gene ◽

Biologically Relevant ◽

Database Technology ◽

Ambiguous Data ◽

Microarray Datasets ◽

Research Analysis

Microarray datasets are widely used in bioinformatics research. Analyzing this kind of high-throughput data, which measures the expression levels of thousands of genes, can help in finding the cause and subsequent treatment of a disease. There are many techniques in gene analysis to extract biologically relevant information from inconsistent and ambiguous data. In this paper, the concepts of functional dependency and attribute closure from database technology are used to find the most important set of genes for cancer detection. First, the method computes a similarity factor between each pair of genes. Based on the similarity factors, a set of gene dependencies is formed, from which the closure set is obtained. Subsequently, conditional-probability-based interestingness measures are used to determine the most informative gene for disease classification. The proposed method is applied to some publicly available cancer gene expression datasets. The results show the effectiveness and robustness of the algorithm.
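As a small illustration of the database concept the abstract borrows, the sketch below computes the closure of a set of attributes under a list of functional dependencies, using the standard fixed-point procedure. How the paper derives gene dependencies from similarity factors is not described here, so the gene names and the dependency list are purely hypothetical.

```python
def closure(genes, dependencies):
    """Closure of `genes` under dependencies given as (lhs_set, rhs_set) pairs."""
    result = set(genes)
    changed = True
    while changed:  # repeat until no dependency adds anything new
        changed = False
        for lhs, rhs in dependencies:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical dependencies: g1 -> g2, g2 -> g3, g4 -> g5
deps = [({"g1"}, {"g2"}), ({"g2"}, {"g3"}), ({"g4"}, {"g5"})]
print(sorted(closure({"g1"}, deps)))  # ['g1', 'g2', 'g3']
```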

Bayesian Gene Selection Based on Pathway Information and Network-Constrained Regularization

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/7471516 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Ming Cao ◽

Yue Fan ◽

Qinke Peng

Keyword(s):

Bayesian Methods ◽

Gene Selection ◽

Structural Information ◽

Disease Diagnosis ◽

Bayesian Lasso ◽

Pathway Information ◽

High Throughput Data ◽

Group Information ◽

Microarray Datasets ◽

Better Than

High-throughput data make it possible to study the expression levels of thousands of genes simultaneously under a particular condition. However, only a few of the genes are discriminatively expressed. Identifying these biomarkers precisely is significant for disease diagnosis, prognosis, and therapy. Many studies have used pathway information to identify the biomarkers. However, most of these studies incorporate only the group information, while the pathway structural information is ignored. In this paper, we propose a Bayesian gene selection method with network-constrained regularization, which can incorporate the pathway structural information as priors to perform gene selection. All the priors are conjugate; thus, the parameters can be estimated effectively through Gibbs sampling. We present the application of our method to 6 microarray datasets, comparing it with Bayesian Lasso, Bayesian Elastic Net, and Bayesian Fused Lasso. The results show that our method performs better than the other Bayesian methods and that pathway structural information can improve the result.
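The abstract does not give the exact form of the network-constrained prior. A commonly used network-constrained penalty is the quadratic form of the normalized graph Laplacian, which sums, over pathway edges, the squared difference of degree-scaled coefficients. The sketch below evaluates that penalty; treating it as the quantity behind the paper's prior is an assumption, and the function name and edge list are hypothetical.

```python
import math


def network_penalty(beta, edges):
    """Sum over edges (u, v) of (beta_u / sqrt(d_u) - beta_v / sqrt(d_v))^2."""
    degree = [0] * len(beta)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(
        (beta[u] / math.sqrt(degree[u]) - beta[v] / math.sqrt(degree[v])) ** 2
        for u, v in edges
    )


# Two genes linked in a pathway: equal coefficients are not penalized,
# opposite coefficients are.
print(network_penalty([1.0, 1.0], [(0, 1)]))   # 0.0
print(network_penalty([1.0, -1.0], [(0, 1)]))  # 4.0
```

The penalty favors coefficients that vary smoothly over connected genes, which is one way structural information can enter beyond plain group membership.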

Gene selection via BPSO and Backward generation for cancer classification

RAIRO - Operations Research ◽

10.1051/ro/2018059 ◽

2019 ◽

Vol 53 (1) ◽

pp. 269-288 ◽

Cited By ~ 2

Author(s):

Ahmed Bir-Jmel ◽

Sidi Mohamed Douiri ◽

Souad Elbernoussi

Keyword(s):

Gene Selection ◽

Cancer Classification ◽

Selection Problem ◽

Small Subset ◽

Large Set ◽

High Dimensions ◽

Hybrid Approaches ◽

Filter Methods ◽

Microarray Datasets

Gene expression data (DNA microarray) enable researchers to simultaneously measure the levels of expression of several thousand genes. These levels of expression are very important in the classification of different types of tumors. In this work, we are interested in gene selection, which is an essential step in the data pre-processing for cancer classification. This selection makes it possible to represent a small subset of genes from a large set, and to eliminate redundant, irrelevant or noisy genes. The combinatorial nature of the selection problem requires the development of specific techniques such as filters and wrappers, or hybrids combining several optimization processes. In this context, we propose two hybrid approaches (RBPSO-1NN and FBPSO-SVM) for the gene selection problem, based on the combination of filter methods (the Fisher criterion and the ReliefF algorithm), the BPSO metaheuristic algorithm and the Backward algorithm, using classifiers (SVM and 1NN) to evaluate the relevance of the candidate subsets. To verify the performance of our methods, we have tested them on eight well-known high-dimensional microarray datasets, ranging from 2308 to 11225 genes. The experiments carried out on the different datasets show that our methods are very competitive with existing works.
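To make the filter step concrete, here is a minimal sketch of ranking genes by a two-class Fisher criterion, taken here as the squared difference of class means divided by the sum of class variances. Several variants of this score exist and the abstract does not say which one the authors use, so the formula, the function names, and the toy values are assumptions. The BPSO and Backward stages are not shown.

```python
from statistics import mean, variance


def fisher_score(class_a, class_b):
    """(mean_a - mean_b)^2 / (var_a + var_b) for one gene; needs nonzero variance."""
    return (mean(class_a) - mean(class_b)) ** 2 / (variance(class_a) + variance(class_b))


def rank_genes(expr_a, expr_b):
    """Rank genes by Fisher score, best first. Inputs map gene -> list of values."""
    scores = {gene: fisher_score(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression values for two genes in two tumor classes
expr_a = {"g1": [1.0, 2.0, 3.0], "g2": [1.0, 2.0, 3.0]}
expr_b = {"g1": [7.0, 8.0, 9.0], "g2": [2.0, 3.0, 4.0]}
print(rank_genes(expr_a, expr_b))  # ['g1', 'g2']
```

In a hybrid scheme of the kind described, a filter like this would typically shrink the candidate pool before the more expensive wrapper search runs.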

Development of a biomarker database toward performing disease classification and finding disease interrelations

Database ◽

10.1093/database/baab011 ◽

2021 ◽

Vol 2021 ◽

Author(s):

Shaikh Farhad Hossain ◽

Ming Huang ◽

Naoaki Ono ◽

Aki Morita ◽

Shigehiko Kanaya ◽

...

Keyword(s):

Rapid Detection ◽

Molecular Mechanisms ◽

Clustering Algorithm ◽

Disease Diagnosis ◽

Disease Classification ◽

Network Clustering ◽

Medical Field ◽

Database Development ◽

Abnormal State ◽

Comprehensive Information

A biomarker is a measurable indicator of a disease or abnormal state of a body that plays an important role in disease diagnosis, prognosis and treatment. The biomarker has become a significant topic due to its versatile usage in the medical field and in rapid detection of the presence or severity of some diseases. The volume of biomarker data is rapidly increasing and the identified data are scattered. To provide comprehensive information, the explosively growing data need to be recorded in a single platform. There is no open-source, freely available, comprehensive online biomarker database. To fulfill this purpose, we have developed a human biomarker database as part of the KNApSAcK family databases, which contain a vast quantity of information on the relationships between biomarkers and diseases. We have classified the diseases into 18 disease classes, mostly according to the National Center for Biotechnology Information definitions. Apart from this database development, we have also performed disease classification by separately using protein and metabolite biomarkers based on the network clustering algorithm DPClusO and hierarchical clustering. Finally, we reached a conclusion about the relationships among the disease classes. The human biomarker database can be accessed online and the inter-disease relationships may be helpful in understanding the molecular mechanisms of diseases. To our knowledge, this is one of the first approaches to classify diseases based on biomarkers. Database URL: http://www.knapsackfamily.com/Biomarker/top.php

Gene Selection and Classification in Microarray Datasets using a Hybrid Approach of PCC-BPSO/GA with Multi Classifiers

Journal of Computer Science ◽

10.3844/jcssp.2018.868.880 ◽

2018 ◽

Vol 14 (6) ◽

pp. 868-880 ◽

Cited By ~ 3

Author(s):

Shilan S. Hameed ◽

Fahmi F. Muhammad ◽

Rohayanti Hassan ◽

Faisal Saeed

Keyword(s):

Gene Selection ◽

Hybrid Approach ◽

Microarray Datasets

Gene Selection in Cancer Classification Using Sparse Logistic Regression with L1/2 Regularization

Applied Sciences ◽

10.3390/app8091569 ◽

2018 ◽

Vol 8 (9) ◽

pp. 1569 ◽

Cited By ~ 3

Author(s):

Shengbing Wu ◽

Hongkun Jiang ◽

Haiwei Shen ◽

Ziyi Yang

Keyword(s):

Logistic Regression ◽

Gene Selection ◽

Classification Performance ◽

Cancer Classification ◽

Sparse Logistic Regression ◽

The Subject ◽

Selection For ◽

Microarray Datasets ◽

Sparse Methods

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, regularized logistic regression using L1 regularization has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. However, L1 regularization yields biased gene selection and does not have the oracle property. To address these problems, we investigate L1/2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods (L1 and L_EN) in terms of classification performance.
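The difference between the two penalties can be seen by evaluating the objective directly. The sketch below computes a mean logistic loss plus a penalty of the form lam * sum(|b_j|^q), with q = 1 for L1 and q = 1/2 for L1/2. It only evaluates the objective and is not the authors' solver; the 0/1 label coding, the absence of an intercept, and the function names are assumptions.

```python
import math


def penalty(beta, q):
    """Sum of |b|^q over coefficients: q=1 gives L1, q=0.5 gives L1/2."""
    return sum(abs(b) ** q for b in beta)


def penalized_logistic_loss(X, y, beta, lam, q=0.5):
    """Mean logistic loss (labels in {0, 1}) plus lam * penalty(beta, q)."""
    total = 0.0
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(z)) written in a numerically stable form
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - label * z
    return total / len(y) + lam * penalty(beta, q)


beta = [4.0, 0.0]
print(penalty(beta, 1.0))  # 4.0  (L1)
print(penalty(beta, 0.5))  # 2.0  (L1/2)
```

For coefficients larger than 1 in magnitude, the L1/2 penalty grows more slowly than L1, so large effects are shrunk less, while small coefficients are still pushed toward zero.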

Pathway-guided deep neural network toward interpretable and predictive modeling of drug sensitivity

10.1101/2020.02.06.930503 ◽

2020 ◽

Author(s):

Lei Deng ◽

Yideng Cai ◽

Wenhao Zhang ◽

Wenyi Yang ◽

Bo Gao ◽

...

Keyword(s):

Neural Network ◽

Cancer Cells ◽

Drug Targets ◽

Deep Neural Network ◽

Drug Sensitivity ◽

Predictive Ability ◽

Biological Knowledge ◽

Data Sets ◽

Source Codes ◽

Sensitivity Data

Motivation: To efficiently save cost and reduce risk in drug research and development, there is a pressing demand to develop in-silico methods to predict drug sensitivity to cancer cells. With the exponentially increasing number of multi-omics data derived from high-throughput techniques, machine learning-based methods have been applied to the prediction of drug sensitivities. However, these methods have drawbacks either in the interpretability of the mechanism of drug action or in limited performance in modeling drug sensitivity. Results: In this paper, we present a pathway-guided deep neural network model, referred to as pathDNN, to predict the drug sensitivity to cancer cells. Biological pathways describe a group of molecules in a cell that collaborate to control various biological functions such as cell proliferation and death; abnormal function of pathways can therefore result in disease. To take advantage of both the excellent predictive ability of deep neural networks and the biological knowledge of pathways, we reshape the canonical DNN structure by incorporating a layer of pathway nodes and their connections to input gene nodes, which makes the DNN model more interpretable and predictive than the canonical DNN. We have conducted extensive performance evaluations on multiple independent drug sensitivity data sets, and demonstrate that pathDNN significantly outperforms the canonical DNN model and seven other classical regression models. Most importantly, we observed remarkable activity decreases of disease-related pathway nodes during forward propagation upon inputs of drug targets, which implicitly corresponds to the inhibition effect on disease-related pathways induced by drug treatment of cancer cells. Our empirical experiments show that pathDNN achieves pharmacological interpretability and predictive ability in modeling drug sensitivity to cancer cells. Availability: The web server, as well as the processed data sets and source codes for reproducing our work, is available at http://pathdnn.denglab.org
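One simple way to picture a layer of pathway nodes connected only to their member genes is a linear layer whose weights are multiplied by a 0/1 gene-pathway membership mask. The sketch below is an illustration under that assumption; the ReLU activation, the mask, and the function name are hypothetical and are not taken from the pathDNN source code.

```python
def pathway_layer(x, weights, mask, bias):
    """Masked dense layer: pathway p only sees genes g where mask[p][g] == 1."""
    activations = []
    for w_row, m_row, b in zip(weights, mask, bias):
        z = b + sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        activations.append(max(z, 0.0))  # ReLU (an assumption)
    return activations


# Three genes, two pathways: pathway 0 contains genes 0 and 1, pathway 1 contains gene 2
mask = [[1, 1, 0], [0, 0, 1]]
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
bias = [0.0, 0.0]
print(pathway_layer([1.0, 2.0, 3.0], weights, mask, bias))   # [3.0, 3.0]
print(pathway_layer([1.0, 2.0, 10.0], weights, mask, bias))  # [3.0, 10.0]
```

Because each pathway node depends only on its own genes, a change in one gene's input moves only the pathways that contain it, which is what makes the node activations readable in biological terms.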

A novel gene selection algorithm for cancer classification using microarray datasets

BMC Medical Genomics ◽

10.1186/s12920-018-0447-6 ◽

2019 ◽

Vol 12 (1) ◽

Cited By ~ 10

Author(s):

Russul Alanni ◽

Jingyu Hou ◽

Hasseeb Azzawi ◽

Yong Xiang

Keyword(s):

Gene Selection ◽

Cancer Classification ◽

Selection Algorithm ◽

Novel Gene ◽

Microarray Datasets ◽

Gene Selection Algorithm

A hybrid method for gene selection in microarray datasets

2014 IEEE International Conference on Granular Computing (GrC) ◽

10.1109/grc.2014.6982825 ◽

2014 ◽

Author(s):

Yungho Leu ◽

Chien-Pan Lee ◽

Ai-Chen Chang

Keyword(s):

Hybrid Method ◽

Gene Selection ◽

Microarray Datasets
