A phase diagram for gene selection and disease classification

2014 ◽  
Author(s):  
Hong-Dong Li ◽  
Qing-Song Xu ◽  
Yi-Zeng Liang

Identifying a small subset of discriminative genes is important for predicting clinical outcomes and facilitating disease diagnosis. Based on the model population analysis framework, we present a method, called PHADIA, that outputs a phase diagram displaying the predictive ability of each variable, providing an intuitive way to select informative variables. Using two publicly available microarray datasets, we demonstrate that our method selects a few informative genes and achieves classification accuracy significantly better than or comparable to the results reported in the literature. The source code is freely available at: www.libpls.net.

2017 ◽  
Vol 167 ◽  
pp. 208-213 ◽  
Author(s):  
Hong-Dong Li ◽  
Qing-Song Xu ◽  
Yi-Zeng Liang

2018 ◽  
Vol 5 (1) ◽  
pp. 1-12
Author(s):  
Sunanda Das ◽  
Asit Kumar Das

Microarray datasets are widely used in bioinformatics research. Analyzing this kind of high-throughput data, which measures the expression levels of thousands of genes, can help in finding the cause of a disease and its subsequent treatment. There are many gene analysis techniques for extracting biologically relevant information from inconsistent and ambiguous data. In this paper, the database concepts of functional dependency and attribute closure are used to find the most important set of genes for cancer detection. First, the method computes a similarity factor between each pair of genes. Based on the similarity factors, a set of gene dependencies is formed, from which the closure set is obtained. Subsequently, conditional-probability-based interestingness measures are used to determine the most informative gene for disease classification. The proposed method is applied to several publicly available cancer gene expression datasets. The results show the effectiveness and robustness of the algorithm.
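The closure step mentioned in this abstract is the standard attribute-closure computation from database theory. The following is a minimal sketch of that textbook procedure only. The gene names and dependencies are hypothetical, and how the paper derives dependencies from similarity factors is not specified in the abstract.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A dependency (lhs, rhs) reads "the genes in lhs determine the genes in rhs".
Dependency = Tuple[FrozenSet[str], FrozenSet[str]]


def attribute_closure(attrs: Iterable[str], dependencies: List[Dependency]) -> Set[str]:
    """Return every attribute reachable from `attrs` under `dependencies`."""
    closure = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in dependencies:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# Hypothetical gene dependencies, for illustration only.
deps: List[Dependency] = [
    (frozenset({"g1"}), frozenset({"g2"})),
    (frozenset({"g2"}), frozenset({"g3"})),
    (frozenset({"g4"}), frozenset({"g5"})),
]
print(sorted(attribute_closure({"g1"}, deps)))
```

Under these assumed dependencies, the closure of g1 contains g1, g2 and g3, so a selection step could treat g2 and g3 as represented by g1.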


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Ming Cao ◽  
Yue Fan ◽  
Qinke Peng

High-throughput data make it possible to study the expression levels of thousands of genes simultaneously under a particular condition. However, only a few of the genes are discriminatively expressed. Identifying these biomarkers precisely is important for disease diagnosis, prognosis, and therapy. Many studies have used pathway information to identify biomarkers. However, most of these studies incorporate only the group information and ignore the pathway structural information. In this paper, we propose a Bayesian gene selection method with network-constrained regularization, which can incorporate the pathway structural information as priors to perform gene selection. All the priors are conjugate; thus, the parameters can be estimated efficiently through Gibbs sampling. We apply our method to 6 microarray datasets and compare it with Bayesian Lasso, Bayesian Elastic Net, and Bayesian Fused Lasso. The results show that our method performs better than the other Bayesian methods and that pathway structural information can improve the result.
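To illustrate what "network-constrained" can mean, here is a small sketch of the graph-Laplacian smoothness penalty often used for this purpose, written in its plain (non-Bayesian) form. It is an assumption that the paper's prior corresponds to a penalty of this kind. The abstract does not give the exact form, and the gene names, coefficients and edges below are hypothetical.

```python
from typing import Dict, List, Tuple


def laplacian_penalty(coefs: Dict[str, float], edges: List[Tuple[str, str]]) -> float:
    """Sum of squared coefficient differences over pathway edges.

    For an unweighted graph with Laplacian L = D - A this equals beta^T L beta,
    so genes linked in a pathway are pushed toward similar coefficients.
    """
    return sum((coefs[u] - coefs[v]) ** 2 for u, v in edges)


# Hypothetical coefficients and pathway edges, for illustration only.
coefs = {"g1": 1.0, "g2": 0.5, "g3": -1.0}
edges = [("g1", "g2"), ("g2", "g3")]
print(laplacian_penalty(coefs, edges))
```

A group-only approach would treat g1, g2 and g3 as an unordered set, whereas this penalty depends on which genes are actually connected.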


2019 ◽  
Vol 53 (1) ◽  
pp. 269-288 ◽  
Author(s):  
Ahmed Bir-Jmel ◽  
Sidi Mohamed Douiri ◽  
Souad Elbernoussi

Gene expression data (DNA microarray) enable researchers to measure the expression levels of several thousand genes simultaneously. These expression levels are very important in the classification of different types of tumors. In this work, we are interested in gene selection, which is an essential step in data pre-processing for cancer classification. This selection makes it possible to represent a large set of genes by a small subset and to eliminate redundant, irrelevant or noisy genes. The combinatorial nature of the selection problem requires the development of specific techniques such as filters and wrappers, or hybrids combining several optimization processes. In this context, we propose two hybrid approaches (RBPSO-1NN and FBPSO-SVM) for the gene selection problem, based on the combination of filter methods (the Fisher criterion and the ReliefF algorithm), the BPSO metaheuristic algorithm and the Backward algorithm, using classifiers (SVM and 1NN) to evaluate the relevance of the candidate subsets. To verify the performance of our methods, we tested them on eight well-known high-dimensional microarray datasets with between 2308 and 11225 genes. The experiments carried out on the different datasets show that our methods are very competitive with existing work.
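As a rough illustration of the filter stage, the sketch below ranks genes with one common two-class form of the Fisher criterion: squared difference of class means divided by the sum of class variances. The exact variant used in the paper is not stated in the abstract, and the expression values and function names here are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List


def fisher_score(values: List[float], labels: List[int]) -> float:
    """Two-class Fisher criterion for one gene: (m0 - m1)^2 / (v0 + v1)."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    spread = pvariance(class0) + pvariance(class1)
    gap = (mean(class0) - mean(class1)) ** 2
    if spread == 0:  # both classes constant: perfectly separated or useless
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_genes(expression: Dict[str, List[float]], labels: List[int], top_k: int) -> List[str]:
    """Return the top_k gene names, highest Fisher score first."""
    scores = {gene: fisher_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical data: six samples, the first three in class 0.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "geneA": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],  # well separated between classes
    "geneB": [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],  # overlapping between classes
}
print(rank_genes(expression, labels, top_k=1))
```

In a hybrid scheme of the kind the abstract describes, a filter like this would only shrink the candidate pool; the wrapper search (BPSO with a classifier) would then operate on the reduced set.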


Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Shaikh Farhad Hossain ◽  
Ming Huang ◽  
Naoaki Ono ◽  
Aki Morita ◽  
Shigehiko Kanaya ◽  
...  

A biomarker is a measurable indicator of a disease or abnormal state of a body that plays an important role in disease diagnosis, prognosis and treatment. The biomarker has become a significant topic due to its versatile usage in the medical field and in rapid detection of the presence or severity of some diseases. The volume of biomarker data is rapidly increasing and the identified data are scattered. To provide comprehensive information, the explosively growing data need to be recorded in a single platform. There is no open-source freely available comprehensive online biomarker database. To fulfill this purpose, we have developed a human biomarker database as part of the KNApSAcK family databases which contain a vast quantity of information on the relationships between biomarkers and diseases. We have classified the diseases into 18 disease classes, mostly according to the National Center for Biotechnology Information definitions. Apart from this database development, we also have performed disease classification by separately using protein and metabolite biomarkers based on the network clustering algorithm DPClusO and hierarchical clustering. Finally, we reached a conclusion about the relationships among the disease classes. The human biomarker database can be accessed online and the inter-disease relationships may be helpful in understanding the molecular mechanisms of diseases. To our knowledge, this is one of the first approaches to classify diseases based on biomarkers. Database URL: http://www.knapsackfamily.com/Biomarker/top.php


2018 ◽  
Vol 14 (6) ◽  
pp. 868-880 ◽  
Author(s):  
Shilan S. Hameed ◽  
Fahmi F. Muhammad ◽  
Rohayanti Hassan ◽  
Faisal Saeed

2018 ◽  
Vol 8 (9) ◽  
pp. 1569 ◽  
Author(s):  
Shengbing Wu ◽  
Hongkun Jiang ◽  
Haiwei Shen ◽  
Ziyi Yang

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, regularized logistic regression using $L_1$ regularization has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. However, $L_1$ regularization yields biased gene selection and does not have the oracle property. To address these problems, we investigate $L_{1/2}$ regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods ($L_1$ and $L_{EN}$) in terms of classification performance.
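For orientation, penalized logistic regression of this kind is usually written as the objective below. The notation is assumed rather than taken from the paper: $n$ samples, $m$ genes, labels $y_i \in \{0, 1\}$, expression vectors $x_i$, and tuning parameter $\lambda$. Setting $q = 1$ gives the L1 penalty and $q = 1/2$ gives the L1/2 penalty.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n}
    \Big[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \Big]
    + \lambda \sum_{j=1}^{m} |\beta_j|^{q} \right\},
\qquad
\pi_i = \frac{1}{1 + \exp\!\big( -(\beta_0 + x_i^{\top} \beta) \big)}
```

For $q < 1$ the penalty is non-convex, which is why such models typically need a dedicated solver rather than a standard convex optimizer.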


2020 ◽  
Author(s):  
Lei Deng ◽  
Yideng Cai ◽  
Wenhao Zhang ◽  
Wenyi Yang ◽  
Bo Gao ◽  
...  

Motivation: To save cost and reduce risk in drug research and development, there is a pressing demand for in-silico methods that predict the sensitivity of cancer cells to drugs. With the exponentially increasing volume of multi-omics data derived from high-throughput techniques, machine learning-based methods have been applied to the prediction of drug sensitivity. However, these methods have drawbacks, either in the interpretability of the mechanism of drug action or in their limited performance in modeling drug sensitivity.
Results: In this paper, we present a pathway-guided deep neural network model, referred to as pathDNN, to predict the sensitivity of cancer cells to drugs. Biological pathways describe groups of molecules in a cell that collaborate to control various biological functions such as cell proliferation and death, so abnormal pathway function can result in disease. To take advantage of both the excellent predictive ability of deep neural networks and the biological knowledge of pathways, we reshape the canonical DNN structure by incorporating a layer of pathway nodes and their connections to input gene nodes, which makes the DNN model more interpretable and predictive than the canonical DNN. We have conducted extensive performance evaluations on multiple independent drug sensitivity datasets, and demonstrate that pathDNN significantly outperforms the canonical DNN model and seven other classical regression models. Most importantly, we observed remarkable decreases in the activity of disease-related pathway nodes during forward propagation upon input of drug targets, which implicitly corresponds to the inhibition of disease-related pathways induced by drug treatment of cancer cells. Our empirical experiments show that pathDNN achieves pharmacological interpretability and predictive ability in modeling drug sensitivity to cancer cells.
Availability: The web server, as well as the processed datasets and source code for reproducing our work, is available at http://pathdnn.denglab.org
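The idea of a pathway layer connected only to its member genes can be sketched as a sparsely connected layer. This is a simplified illustration and not the pathDNN implementation: the gene and pathway names, the weights, and the choice of ReLU activation are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def pathway_layer(
    gene_values: Dict[str, float],
    membership: Dict[str, List[str]],
    weights: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Compute one activation per pathway from its member genes only.

    Genes outside a pathway have no connection to that pathway node, which is
    what makes each node readable as "activity of this pathway".
    """
    activations = {}
    for pathway, genes in membership.items():
        total = sum(weights[(pathway, g)] * gene_values[g] for g in genes)
        activations[pathway] = max(0.0, total)  # ReLU, an assumed activation
    return activations


# Hypothetical inputs, for illustration only.
gene_values = {"gene_a": 1.0, "gene_b": 2.0, "gene_c": 3.0}
membership = {"pathway_1": ["gene_a", "gene_b"], "pathway_2": ["gene_c"]}
weights = {
    ("pathway_1", "gene_a"): 0.5,
    ("pathway_1", "gene_b"): -1.0,
    ("pathway_2", "gene_c"): 0.25,
}
print(pathway_layer(gene_values, membership, weights))
```

In a full model the weights would be learned and further dense layers would follow. The sketch shows only the connectivity constraint that ties each hidden node to a named pathway.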

