Boosting Gene Expression Clustering with System-Wide Biological Information: A Robust Autoencoder Approach

2017 ◽  
Author(s):  
Hongzhu Cui ◽  
Chong Zhou ◽  
Xinyu Dai ◽  
Yuting Liang ◽  
Randy Paffenroth ◽  
...  

Abstract: Gene expression analysis provides genome-wide insights into the transcriptional activity of a cell. One of the first computational steps in the exploration and analysis of gene expression data is clustering. Although a number of standard clustering methods are routinely used, most of them do not take prior biological information into account. In this paper, we propose a new approach for gene expression clustering analysis. The approach benefits from a new deep learning architecture, Robust Autoencoder, which provides a more accurate high-level representation of the feature sets, and from incorporating prior biological information into the clustering process. We tested our approach on two distinct gene expression datasets and compared its performance with two widely used clustering methods, hierarchical clustering and k-means, as well as with a recent deep learning clustering approach. Our approach outperformed all other clustering methods on the labeled yeast gene expression dataset. Furthermore, we showed that it is better than k-means at identifying functionally common clusters on the unlabeled human gene expression dataset. The results demonstrate that our new deep learning architecture can generalize the specific properties of gene expression profiles well. They also confirm our hypothesis that prior biological network knowledge can be helpful in the gene expression clustering task.


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0242858
Author(s):  
Liviu Badea ◽  
Emil Stănescu

Linking phenotypes to specific gene expression profiles is an extremely important problem in biology, which has been approached mainly by correlation methods or, more fundamentally, by studying the effects of gene perturbations. However, genome-wide perturbations involve extensive experimental efforts, which may be prohibitive for certain organisms. On the other hand, the characterization of the various phenotypes frequently requires an expert’s subjective interpretation, such as a histopathologist’s description of tissue slide images in terms of complex visual features (e.g. ‘acinar structures’). In this paper, we use Deep Learning to eliminate the inherent subjective nature of these visual histological features and link them to genomic data, thus establishing a more precisely quantifiable correlation between transcriptomes and phenotypes. Using a dataset of whole slide images with matching gene expression data from 39 normal tissue types, we first developed a Deep Learning tissue classifier with an accuracy of 94%. Then we searched for genes whose expression correlates with features inferred by the classifier and demonstrate that Deep Learning can automatically derive visual (phenotypical) features that are well correlated with the transcriptome and therefore biologically interpretable. As we are particularly concerned with interpretability and explainability of the inferred histological models, we also develop visualizations of the inferred features and compare them with gene expression patterns determined by immunohistochemistry. This can be viewed as a first step toward bridging the gap between the level of genes and the cellular organization of tissues.
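The search for genes whose expression correlates with classifier-derived features can be illustrated with a minimal sketch. The abstract does not state which correlation measure was used, so Pearson correlation is an assumption here, as are the function names and the toy gene labels (`geneA`, `geneB`, `geneC`), which are made up for illustration and are not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_genes_by_feature(expression, feature):
    """Rank genes by |correlation| with one visual feature.

    expression: dict mapping gene name -> per-sample expression values
    feature:    per-sample activations of one classifier feature
    Returns a list of (gene, correlation), strongest association first.
    """
    scores = {gene: pearson(values, feature) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: four samples, three genes, one feature.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 1.0, 3.0],
}
toy_feature = [2.0, 4.0, 6.0, 8.0]
ranked = rank_genes_by_feature(toy_expression, toy_feature)
```

Ranking by absolute value keeps strongly anti-correlated genes near the top as well, which is one possible design choice rather than something the abstract specifies.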


2021 ◽  
Vol 22 (12) ◽  
pp. 6556
Author(s):  
Junjun Huang ◽  
Xiaoyu Li ◽  
Xin Chen ◽  
Yaru Guo ◽  
Weihong Liang ◽  
...  

ATP-binding cassette (ABC) transporter proteins are a gene super-family in plants and play vital roles in growth, development, and response to abiotic and biotic stresses. The ABC transporters have been identified in crop plants such as rice and buckwheat, but little is known about them in soybean. Soybean is an important oil crop and is one of the five major crops in the world. In this study, 255 ABC genes that putatively encode ABC transporters were identified from soybean through bioinformatics and then categorized into eight subfamilies: 7 ABCAs, 52 ABCBs, 48 ABCCs, 5 ABCDs, 1 ABCE, 10 ABCFs, 111 ABCGs, and 21 ABCIs. Their phylogenetic relationships, gene structure, and gene expression profiles were characterized. Segmental duplication was the main reason for the expansion of the GmABC genes. Ka/Ks analysis suggested that intense purifying selection accompanied the evolution of GmABC genes. The genome-wide collinearity of soybean with other species showed that GmABCs were relatively conserved and that collinear ABCs between species may have originated from the same ancestor. Gene expression analysis of GmABCs revealed distinct expression patterns across tissues and developmental stages. The candidate genes GmABCB23, GmABCB25, GmABCB48, GmABCB52, GmABCI1, GmABCI5, and GmABCI13 were responsive to Al toxicity. This work on the GmABC gene family provides useful information for future studies on ABC transporters in soybean and potential targets for the cultivation of new germplasm resources of aluminum-tolerant soybean.


2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Li Teng ◽  
Laiwan Chan

Summary: Traditional analysis of gene expression profiles uses clustering to find groups of coexpressed genes that have similar expression patterns. However, clustering is time-consuming and can be difficult for very large datasets. We propose the idea of Discovering Distinct Patterns (DDP) in gene expression profiles. Because the patterns shown by gene expression reveal the underlying regulatory mechanisms, it is important to find all the different patterns in a dataset when there is little prior knowledge. It is also a helpful starting point for further analysis. We propose an algorithm for DDP that iteratively picks out pairs of gene expression patterns with the largest dissimilarities. This method can also be used as a preprocessing step to initialize centers for clustering methods such as K-means. Experiments on both synthetic and real gene expression datasets show that our method is efficient and very effective at finding distinct patterns that have gene functional significance.
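The idea of iteratively picking the most dissimilar pairs can be sketched as follows. This is not the authors' DDP algorithm, whose details the summary does not give. It is a minimal greedy illustration that assumes Euclidean distance as the dissimilarity and adds a hypothetical `exclusion_radius` rule so that later pairs differ from earlier ones. All names are illustrative.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_distinct_patterns(profiles, n_pairs, exclusion_radius):
    """Greedy sketch: repeatedly pick the most dissimilar remaining pair.

    After each pick, every profile within `exclusion_radius` of either
    picked profile is dropped (an assumption made for this sketch).
    Returns the indices of the picked profiles, in picking order.
    """
    remaining = set(range(len(profiles)))
    picked = []
    for _ in range(n_pairs):
        if len(remaining) < 2:
            break
        # Most dissimilar pair among the profiles still in play.
        i, j = max(
            combinations(sorted(remaining), 2),
            key=lambda pair: euclidean(profiles[pair[0]], profiles[pair[1]]),
        )
        picked.extend([i, j])
        # Drop the picked profiles and their near neighbours.
        remaining = {
            k for k in remaining
            if euclidean(profiles[k], profiles[i]) > exclusion_radius
            and euclidean(profiles[k], profiles[j]) > exclusion_radius
        }
    return picked


# Hypothetical toy profiles (two conditions per gene).
toy_profiles = [
    [0.0, 0.0], [0.1, 0.0],     # two near-identical low profiles
    [10.0, 10.0], [10.0, 10.1], # two near-identical high profiles
    [0.0, 10.0], [10.0, 0.0],   # two opposite mixed profiles
]
seed_indices = pick_distinct_patterns(toy_profiles, n_pairs=2, exclusion_radius=1.0)
```

The returned indices could serve as initial centers for K-means, in line with the preprocessing use mentioned in the summary. The exhaustive pair search is quadratic in the number of profiles, so this sketch only suits small inputs.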


2013 ◽  
Vol 71 (Suppl 3) ◽  
pp. 418.3-418
Author(s):  
J. Fernandez-Tajes ◽  
A. Soto-Hermida ◽  
M. Fernandez-Moreno ◽  
M.E. Vazquez-Mosquera ◽  
N. Oreiro ◽  
...  

2010 ◽  
Vol 61 (1) ◽  
pp. 1-6 ◽  
Author(s):  
Sok Kean Khoo ◽  
Karl Dykema ◽  
Naga Manjari Vadlapatla ◽  
David LaHaie ◽  
Saul Valle ◽  
...  
