ADAGE analysis of publicly available gene expression data collections illuminates Pseudomonas aeruginosa-host interactions

2015
Author(s): Jie Tan, John H Hammond, Deborah A Hogan, Casey S Greene

The growth in genome-scale assays of gene expression for different species in publicly available databases presents new opportunities for computational methods that aid in hypothesis generation and biological interpretation of these data. Here, we present an unsupervised machine-learning approach, ADAGE (Analysis using Denoising Autoencoders of Gene Expression), and apply it to the interpretation of all of the publicly available gene expression data for Pseudomonas aeruginosa, an important opportunistic bacterial pathogen. In post-hoc positive control analyses using curated knowledge, the P. aeruginosa ADAGE model found that co-operonic genes often participated in similar processes and accurately predicted which genes had similar functions. By analyzing newly generated data and previously published microarray and RNA-seq data, the ADAGE model identified gene expression differences between strains, modeled the cellular response to low oxygen, and predicted the involvement of biological processes despite low-level expression differences in directly involved genes. Comparison of ADAGE with PCA and ICA revealed that ADAGE extracts distinct signals. We provide the ADAGE model with analysis of all publicly available P. aeruginosa GeneChip experiments, and we provide open source code for use in other species and settings.
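To make the denoising-autoencoder idea concrete, the sketch below trains a textbook single-hidden-layer denoising autoencoder with tied weights, sigmoid units, masking noise, and a cross-entropy loss on expression values scaled to [0, 1], then lists the genes with the largest absolute weight to a node. This is a minimal illustration under those assumptions, not the ADAGE code: the node count, corruption level, learning rate, epoch count, and the toy gene names are all hypothetical choices.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def encode(x, W, bh):
    # Hidden-node activities for one sample; W[j][i] links gene i to node j.
    return [sigmoid(sum(w * xi for w, xi in zip(W[j], x)) + bh[j]) for j in range(len(W))]


def decode(h, W, bv):
    # Tied weights: the decoder reuses W transposed.
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(W))) + bv[i]) for i in range(len(bv))]


def train_dae(data, n_hidden, corruption=0.1, lr=0.1, epochs=300, seed=0):
    # Plain SGD over samples (lists of gene values scaled to [0, 1]).
    rng = random.Random(seed)
    n_genes = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_genes)] for _ in range(n_hidden)]
    bh, bv = [0.0] * n_hidden, [0.0] * n_genes
    for _ in range(epochs):
        for x in data:
            # Masking noise: zero each input value with probability `corruption`.
            x_tilde = [0.0 if rng.random() < corruption else v for v in x]
            h = encode(x_tilde, W, bh)
            z = decode(h, W, bv)
            # Cross-entropy gradient at the decoder pre-activation, against the CLEAN input.
            dz = [z[i] - x[i] for i in range(n_genes)]
            dh = [sum(W[j][i] * dz[i] for i in range(n_genes)) * h[j] * (1 - h[j])
                  for j in range(n_hidden)]
            for j in range(n_hidden):
                for i in range(n_genes):
                    # A tied weight collects both its decoder and encoder gradient terms.
                    W[j][i] -= lr * (dz[i] * h[j] + dh[j] * x_tilde[i])
                bh[j] -= lr * dh[j]
            for i in range(n_genes):
                bv[i] -= lr * dz[i]
    return W, bh, bv


def reconstruction_error(data, W, bh, bv, eps=1e-12):
    # Mean per-value cross-entropy between clean inputs and their reconstructions.
    total = 0.0
    for x in data:
        z = decode(encode(x, W, bh), W, bv)
        for xi, zi in zip(x, z):
            zi = min(max(zi, eps), 1 - eps)
            total -= xi * math.log(zi) + (1 - xi) * math.log(1 - zi)
    return total / (len(data) * len(data[0]))


def top_genes(W, node, genes, k=2):
    # Genes with the largest absolute weight to one node.
    order = sorted(range(len(genes)), key=lambda i: abs(W[node][i]), reverse=True)
    return [genes[i] for i in order[:k]]


if __name__ == "__main__":
    genes = ["geneA", "geneB", "geneC", "geneD"]  # hypothetical genes in two modules
    data = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9],
            [0.8, 0.9, 0.2, 0.1], [0.2, 0.1, 0.8, 0.9]]
    W, bh, bv = train_dae(data, n_hidden=2)
    print(reconstruction_error(data, W, bh, bv), top_genes(W, 0, genes))
```

Calling `train_dae(data, 2, epochs=0)` returns the untrained initial weights, which gives a baseline reconstruction error to compare against after training.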

2015, Vol 11 (11), pp. 3137-3148
Author(s): Nazanin Hosseinkhan, Peyman Zarrineh, Hassan Rokni-Zadeh, Mohammad Reza Ashouri, Ali Masoudi-Nejad

Gene co-expression analysis is one of the main aspects of systems biology that uses high-throughput gene expression data.
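As a concrete picture of what co-expression analysis computes, the sketch below builds a simple co-expression network: it correlates every pair of gene profiles across samples and keeps pairs whose absolute Pearson correlation reaches a threshold. The choice of Pearson correlation, the 0.8 cutoff, and the gene names are illustrative assumptions, not the method of the entry above.

```python
import itertools
import math


def pearson(x, y):
    # Pearson correlation of two expression profiles measured on the same samples.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, threshold=0.8):
    # expr maps gene name -> expression values; returns (gene1, gene2, r) edges.
    edges = []
    for g1, g2 in itertools.combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


expr = {  # hypothetical profiles over four samples
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, 3, 2, 1],
}
print(coexpression_edges(expr))
```

Here geneA and geneB are perfectly correlated, geneC is perfectly anti-correlated with both, and geneD stays unconnected because its correlations are weak.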


2019
Author(s): Yuumi Okuzono, Takashi Hoshino

Abstract The recent rise of microarray and next-generation sequencing in genome-related fields has simplified obtaining gene expression data for all genes, and the biological interpretation of gene signatures related to life phenomena and diseases has become very important. However, conventional methods rely on numerical comparisons of the overlap and distribution bias of gene signatures, pathways, and Gene Ontology (GO) terms, and they cannot compare the specificity and importance of the genes contained in gene signatures the way a human expert does. This study proposes the gene signature vector (GsVec), a method for interpreting gene signatures that clarifies the semantic relationships between them by incorporating distributed document representation from natural language processing (NLP). In the proposed algorithm, a gene-topic vector is created by multiplying a feature vector based on the gene's distributed representation by the probability of the gene signature topic and by the low frequency of occurrence of that gene across all gene signatures. These vectors are concatenated for the genes included in each gene signature to create a signature vector. The similarity between signature vectors is obtained from their cosine distances, which quantifies the relevance between gene signatures. Using this algorithm, GsVec learned approximately 5,000 canonical pathway and GO biological process gene signatures published in the Molecular Signatures Database (MSigDB). GsVec was then validated against the BioCarta pathway database, whose biological significance is known, and on actual gene expression data (differentially expressed genes); both validations yielded biologically valid results. In addition, compared with conventional pathway enrichment analysis based on Fisher's exact test, GsVec produced equally or more biologically valid signatures.
Furthermore, although NLP tools are generally developed in Python, GsVec can execute the entire process in R alone, a language widely used in bioinformatics.
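The vector construction described above can be sketched as follows. Everything here is an assumption for illustration: the gene embeddings and per-gene topic probabilities are hard-coded toy values (GsVec learns them), the rarity weight is a plain inverse document frequency, and because signatures contain different numbers of genes, this sketch concatenates each gene's topic-scaled copies of its embedding and then averages over the signature's genes so that all signature vectors share one length. It is not the GsVec implementation.

```python
import math


def idf(gene, signatures):
    # Genes that occur in fewer signatures receive larger weights.
    df = sum(1 for members in signatures.values() if gene in members)
    return math.log(len(signatures) / df)


def gene_topic_vector(gene, embedding, topic_probs, weight):
    # One copy of the gene embedding per topic, scaled by P(topic) and the rarity weight.
    vec = []
    for p in topic_probs[gene]:
        vec.extend(v * p * weight for v in embedding[gene])
    return vec


def signature_vector(name, signatures, embedding, topic_probs):
    # Assumed aggregation: average the gene-topic vectors of the signature's genes.
    members = signatures[name]
    total = None
    for gene in members:
        v = gene_topic_vector(gene, embedding, topic_probs, idf(gene, signatures))
        total = v if total is None else [a + b for a, b in zip(total, v)]
    return [a / len(members) for a in total]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy inputs: three signatures, 2-D embeddings, two topics.
signatures = {"S1": ["g1", "g2"], "S2": ["g1", "g2", "g3"], "S3": ["g4", "g5"]}
embedding = {"g1": [1.0, 0.0], "g2": [0.9, 0.1], "g3": [0.8, 0.2],
             "g4": [0.0, 1.0], "g5": [0.1, 0.9]}
topic_probs = {"g1": [0.9, 0.1], "g2": [0.8, 0.2], "g3": [0.7, 0.3],
               "g4": [0.1, 0.9], "g5": [0.2, 0.8]}
vecs = {s: signature_vector(s, signatures, embedding, topic_probs) for s in signatures}
print(cosine(vecs["S1"], vecs["S2"]), cosine(vecs["S1"], vecs["S3"]))
```

S1 and S2 share genes and topics, so their cosine similarity is high, while S1 and S3 load on different embedding dimensions and topics and come out nearly orthogonal.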


2020, Vol 21 (1)
Author(s): Blaise Hanczar, Farida Zehraoui, Tina Issa, Mathieu Arles

Abstract Background The use of predictive gene signatures to assist clinical decision-making is becoming increasingly important. Deep learning has great potential for predicting phenotypes from gene expression profiles. However, neural networks are viewed as black boxes that provide accurate predictions without any explanation. The requirement for these models to be interpretable is growing, especially in the medical field. Results We focus on explaining the predictions of a deep neural network model built from gene expression data. The most important neurons and genes influencing the predictions are identified and linked to biological knowledge. Our experiments on cancer prediction show that: (1) the deep learning approach outperforms classical machine learning methods on large training sets; (2) our approach produces interpretations more coherent with biology than state-of-the-art approaches; (3) we can provide a comprehensive explanation of the predictions for biologists and physicians. Conclusion We propose an original approach for the biological interpretation of deep learning models for phenotype prediction from gene expression data. Since the model can find relationships between the phenotype and gene expression, we may assume that there is a link between the identified genes and the phenotype. The interpretation can, therefore, lead to new biological hypotheses to be investigated by biologists.
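To illustrate what "identifying the most important neurons and genes" can mean in practice, the sketch below scores genes with gradient × input and hidden neurons by their contribution to the output logit, for a tiny one-hidden-layer network with hand-set weights. Gradient × input is a generic attribution technique swapped in for illustration; it is not necessarily the authors' method, and the weights and three-gene input are hypothetical.

```python
import math


def forward(x, W1, b1, w2, b2):
    # One tanh hidden layer and a sigmoid output giving P(phenotype).
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    logit = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, 1.0 / (1.0 + math.exp(-logit))


def gene_relevance(x, W1, b1, w2, b2):
    # Gradient x input: d(logit)/d(x_i) * x_i, with d tanh(a)/da = 1 - tanh(a)^2.
    h, _ = forward(x, W1, b1, w2, b2)
    grads = [sum(w2[j] * (1 - h[j] ** 2) * W1[j][i] for j in range(len(h)))
             for i in range(len(x))]
    return [g * xi for g, xi in zip(grads, x)]


def neuron_relevance(x, W1, b1, w2, b2):
    # Contribution of each hidden neuron to the output logit.
    h, _ = forward(x, W1, b1, w2, b2)
    return [w * hj for w, hj in zip(w2, h)]


def rank(scores):
    # Indices ordered by decreasing absolute relevance.
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)


# Hypothetical network: gene 0 feeds neuron 0, gene 1 feeds neuron 1, gene 2 is unused.
W1 = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [1.5, -0.5]
b2 = 0.0
x = [0.5, 0.5, 0.5]
print(rank(gene_relevance(x, W1, b1, w2, b2)), rank(neuron_relevance(x, W1, b1, w2, b2)))
```

The unused gene receives zero relevance, and the ranking follows the weights through the network; mapping the top-ranked genes to pathways would be the step that links them to biological knowledge.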


2019
Author(s): Lara H Urban, Christian W Remmele, Marcus Dittrich, Roland F Schwarz, Tobias Müller

Abstract Objective The biological interpretation of gene expression measurements is a challenging task. While ordination methods are routinely used to identify clusters of samples or co-expressed genes, these methods do not take sample or gene annotations into account. We aim to provide a tool that allows users of all backgrounds to assess and visualize the intrinsic correlation structure of complex annotated gene expression data and discover the covariates that jointly affect expression patterns. Results The Bioconductor package covRNA provides a convenient and fast interface for testing and visualizing complex relationships between sample and gene covariates mediated by gene expression data in an entirely unsupervised setting. The relationships between sample and gene covariates are tested by statistical permutation tests and visualized by ordination. The methods are inspired by the fourth-corner and RLQ analyses used in ecological research for the analysis of species abundance data, which we modified to suit the distributional characteristics of both RNA-Seq read counts and microarray intensities, and for which we provide a high-performance parallelized implementation for the analysis of large-scale gene expression data on multi-core computational systems. covRNA also provides modules for unsupervised gene filtering, as well as plotting functions, to ensure a smooth and coherent analysis workflow.
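The core fourth-corner idea can be sketched as a correlation between a sample covariate and a gene covariate in which each (sample, gene) cell is weighted by its expression, tested by permuting the sample covariate across samples. This is a simplified textbook version under stated assumptions (numeric covariates, non-negative expression as weights, one permutation model, a hypothetical toy matrix); it does not reproduce covRNA's modified statistics or its R interface.

```python
import math
import random


def fourth_corner(L, r, q):
    # L[i][j]: non-negative expression of gene j in sample i;
    # r[i]: numeric sample covariate; q[j]: numeric gene covariate.
    total = sum(sum(row) for row in L)
    w = [[v / total for v in row] for row in L]
    cells = [(i, j) for i in range(len(r)) for j in range(len(q))]
    mr = sum(w[i][j] * r[i] for i, j in cells)
    mq = sum(w[i][j] * q[j] for i, j in cells)
    cov = var_r = var_q = 0.0
    for i, j in cells:
        dr, dq = r[i] - mr, q[j] - mq
        cov += w[i][j] * dr * dq
        var_r += w[i][j] * dr * dr
        var_q += w[i][j] * dq * dq
    # Expression-weighted correlation between the two covariates.
    return cov / math.sqrt(var_r * var_q)


def permutation_test(L, r, q, n_perm=999, seed=0):
    # Shuffle the sample covariate across samples; two-sided p-value on |statistic|.
    rng = random.Random(seed)
    observed = fourth_corner(L, r, q)
    hits = 0
    for _ in range(n_perm):
        r_perm = r[:]
        rng.shuffle(r_perm)
        if abs(fourth_corner(L, r_perm, q)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: samples with r = 1 express genes with q = 1 more strongly.
L = [[10, 1]] * 4 + [[1, 10]] * 4
r = [0, 0, 0, 0, 1, 1, 1, 1]
q = [0, 1]
print(permutation_test(L, r, q))
```

In this toy matrix the statistic is 72/88 (about 0.82), and only the observed and the fully flipped covariate assignments are as extreme, so the permutation p-value is small.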


2008, Vol 2, pp. BBI.S441
Author(s): Katrin Fundel, Robert Küffner, Thomas Aigner, Ralf Zimmer

Introduction Numerous methods exist for basic processing of microarray gene expression data, such as normalization. These methods have an important effect on the final analysis outcome. Therefore, it is crucial to select methods appropriate for a given dataset to ensure the validity and reliability of expression data analysis. Furthermore, biological interpretation requires expression values for genes, which are often represented by several spots or probe sets on a microarray. How best to integrate spot/probe set values into gene values has so far been a somewhat neglected problem. Results We present a case study comparing different between-array normalization methods with respect to the identification of differentially expressed genes. Our results show that it is feasible and necessary to use prior knowledge about gene expression measurements to select an adequate normalization method for the given data. Furthermore, we provide evidence that combining spot/probe set p-values into gene p-values for detecting differentially expressed genes has advantages over combining spot/probe set expression values into gene expression values. The comparison of different methods suggests using Stouffer's method for this purpose. The study was conducted on gene expression experiments investigating human joint cartilage samples from osteoarthritis-related groups: a cDNA microarray data set (83 samples, four groups) and an Affymetrix data set (26 samples, two groups). Conclusion The apparently straightforward steps of gene expression data analysis, e.g. between-array normalization and detection of differentially regulated genes, can be accomplished by numerous different methods. We analyzed multiple methods and their possible effects, thereby demonstrating the importance of each individual decision taken during data processing. We give guidelines for evaluating normalization outcomes.
An overview of these effects, using appropriate measures and plots compared against prior knowledge, is essential for the biological interpretation of gene expression measurements.
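Stouffer's method, recommended above for turning probe-set p-values into gene p-values, converts each one-sided p-value to a standard normal score, sums the scores, and rescales by the square root of their number. The sketch below shows the unweighted form; the probe-set IDs, gene names, and p-values are hypothetical, and the study may have used a weighted or otherwise adapted variant.

```python
import math
from collections import defaultdict
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def stouffer(pvalues):
    # Unweighted Stouffer: z_i = Phi^{-1}(1 - p_i), Z = sum(z_i) / sqrt(k), p = 1 - Phi(Z).
    # Each p_i must lie strictly between 0 and 1.
    z = sum(_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - _STD_NORMAL.cdf(z)


def gene_pvalues(probe_p, probe_to_gene):
    # Group probe-set p-values by gene, then combine each group into one gene p-value.
    groups = defaultdict(list)
    for probe, p in probe_p.items():
        groups[probe_to_gene[probe]].append(p)
    return {gene: stouffer(ps) for gene, ps in groups.items()}


# Hypothetical probe sets: two weak but consistent signals for GENE1, none for GENE2.
probe_p = {"probe_1": 0.05, "probe_2": 0.05, "probe_3": 0.5}
probe_to_gene = {"probe_1": "GENE1", "probe_2": "GENE1", "probe_3": "GENE2"}
print(gene_pvalues(probe_p, probe_to_gene))
```

Two probe sets that each reach p = 0.05 combine to roughly p = 0.01 for their gene, which is the kind of gain from pooling consistent evidence that motivates combining p-values rather than averaging expression values.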

