ADAGE analysis of publicly available gene expression data collections illuminates Pseudomonas aeruginosa-host interactions

10.1101/030650 ◽

2015 ◽

Cited By ~ 4

Author(s):

Jie Tan ◽

John H Hammond ◽

Deborah A Hogan ◽

Casey S Greene

Keyword(s):

Gene Expression ◽

Pseudomonas Aeruginosa ◽

Gene Expression Data ◽

Cellular Response ◽

Bacterial Pathogen ◽

Positive Control ◽

Expression Data ◽

Low Oxygen ◽

Biological Interpretation ◽

Data Collections

The growth in genome-scale assays of gene expression for different species in publicly available databases presents new opportunities for computational methods that aid in hypothesis generation and biological interpretation of these data. Here, we present an unsupervised machine-learning approach, ADAGE (Analysis using Denoising Autoencoders of Gene Expression), and apply it to the interpretation of all of the publicly available gene expression data for Pseudomonas aeruginosa, an important opportunistic bacterial pathogen. In post-hoc positive control analyses using curated knowledge, the P. aeruginosa ADAGE model found that co-operonic genes often participated in similar processes and accurately predicted which genes had similar functions. By analyzing newly generated data and previously published microarray and RNA-seq data, the ADAGE model identified gene expression differences between strains, modeled the cellular response to low oxygen, and predicted the involvement of biological processes despite low-level expression differences in directly involved genes. Comparison of ADAGE with PCA and ICA revealed that ADAGE extracts distinct signals. We provide the ADAGE model with analysis of all publicly available P. aeruginosa GeneChip experiments, and we provide open source code for use in other species and settings.

Co-expressional conservation in virulence and stress related genes of three Gammaproteobacterial species: Escherichia coli, Salmonella enterica and Pseudomonas aeruginosa

Molecular BioSystems ◽

10.1039/c5mb00353a ◽

2015 ◽

Vol 11 (11) ◽

pp. 3137-3148

Author(s):

Nazanin Hosseinkhan ◽

Peyman Zarrineh ◽

Hassan Rokni-Zadeh ◽

Mohammad Reza Ashouri ◽

Ali Masoudi-Nejad

Keyword(s):

Gene Expression ◽

Escherichia Coli ◽

Pseudomonas Aeruginosa ◽

Systems Biology ◽

Gene Expression Data ◽

High Throughput ◽

Expression Data ◽

P Gene ◽

Stress Related Genes ◽

High Throughput Gene Expression

Gene co-expression analysis is one of the main aspects of systems biology that uses high-throughput gene expression data.

BioLattice: A framework for the biological interpretation of microarray gene expression data using concept lattice analysis

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2007.10.003 ◽

2008 ◽

Vol 41 (2) ◽

pp. 232-241 ◽

Cited By ~ 10

Author(s):

Jihun Kim ◽

Hee-Joon Chung ◽

Yong Jung ◽

Kack-Kyun Kim ◽

Ju Han Kim

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Concept Lattice ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Biological Interpretation ◽

Microarray Gene

Comprehensive biological interpretation of gene signatures using semantic distributed representation

10.1101/846691 ◽

2019 ◽

Author(s):

Yuumi Okuzono ◽

Takashi Hoshino

Keyword(s):

Gene Expression ◽

Conventional Method ◽

Gene Expression Data ◽

Gene Signature ◽

Pathway Enrichment Analysis ◽

Distributed Representation ◽

Expression Data ◽

Gene Signatures ◽

Biological Interpretation ◽

Signature Vector

Abstract. The recent rise of microarray and next-generation sequencing in genome-related fields has simplified obtaining gene expression data at the whole-gene level, and biological interpretation of gene signatures related to life phenomena and diseases has become very important. However, the conventional method is a numerical comparison of gene signature, pathway, and gene ontology (GO) overlap and distribution bias, and it cannot compare the specificity and importance of genes contained in gene signatures as humans do. This study proposes the gene signature vector (GsVec), a unique method for interpreting gene signatures that clarifies the semantic relationship between gene signatures by incorporating a method of distributed document representation from natural language processing (NLP). In the proposed algorithm, a gene-topic vector is created by multiplying the feature vector based on the gene's distributed representation by the probability of the gene signature topic and the low frequency of occurrence of the corresponding gene in all gene signatures. These vectors are concatenated for genes included in each gene signature to create a signature vector. The degrees of similarity between signature vectors are obtained from the cosine distances, and the levels of relevance between gene signatures are quantified. Using the above algorithm, GsVec learned approximately 5,000 types of canonical pathway and GO biological process gene signatures published in the Molecular Signatures Database (MSigDB). Then, validation with the pathway database BioCarta, which has known biological significance, and validation using actual gene expression data (differentially expressed genes) were performed, and both yielded biologically valid results. In addition, comparison with the pathway enrichment analysis by Fisher's exact test used in the conventional method yielded equivalent or more biologically valid signatures. Furthermore, although NLP is generally developed in Python, GsVec can execute the entire process in the R language alone, the main language of bioinformatics.
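
The final comparison step described in the abstract, scoring two signature vectors by their cosine, can be sketched as follows. This is a minimal illustration only, not the GsVec implementation (which the abstract says runs in R): the function names and the toy vectors are assumptions, and the construction of the signature vectors themselves is not shown.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical signature vectors (made-up numbers, for illustration only).
signature_a = [0.2, 0.7, 0.1]
signature_b = [0.4, 1.4, 0.2]   # same direction as signature_a
signature_c = [0.9, 0.0, 0.3]

print(cosine_similarity(signature_a, signature_b))
print(cosine_similarity(signature_a, signature_c))
```

Because the cosine ignores vector length, two signatures that point in the same direction score as maximally similar regardless of magnitude.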

Biological interpretation of deep neural network for phenotype prediction based on gene expression

BMC Bioinformatics ◽

10.1186/s12859-020-03836-4 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Blaise Hanczar ◽

Farida Zehraoui ◽

Tina Issa ◽

Mathieu Arles

Keyword(s):

Neural Network ◽

Gene Expression ◽

Deep Learning ◽

Gene Expression Data ◽

Deep Neural Network ◽

Expression Profiles ◽

Biological Knowledge ◽

Expression Data ◽

Phenotype Prediction ◽

Biological Interpretation

Abstract. Background: The use of predictive gene signatures to assist clinical decisions is becoming more and more important. Deep learning has huge potential in the prediction of phenotype from gene expression profiles. However, neural networks are viewed as black boxes, where accurate predictions are provided without any explanation. The requirements for these models to become interpretable are increasing, especially in the medical field. Results: We focus on explaining the predictions of a deep neural network model built from gene expression data. The most important neurons and genes influencing the predictions are identified and linked to biological knowledge. Our experiments on cancer prediction show that: (1) the deep learning approach outperforms classical machine learning methods on large training sets; (2) our approach produces interpretations more coherent with biology than the state-of-the-art approaches; (3) we can provide a comprehensive explanation of the predictions for biologists and physicians. Conclusion: We propose an original approach for biological interpretation of deep learning models for phenotype prediction from gene expression data. Since the model can find relationships between the phenotype and gene expression, we may assume that there is a link between the identified genes and the phenotype. The interpretation can, therefore, lead to new biological hypotheses to be investigated by biologists.

covRNA - Discovering covariate associations in large-scale gene expression data

10.21203/rs.2.17618/v1 ◽

2019 ◽

Author(s):

Lara H Urban ◽

Christian W Remmele ◽

Marcus Dittrich ◽

Roland F Schwarz ◽

Tobias Müller

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

High Performance ◽

Large Scale ◽

Expression Patterns ◽

Species Abundance ◽

Expression Data ◽

Analysis Workflow ◽

Biological Interpretation ◽

Complex Relationships

Abstract. Objective: The biological interpretation of gene expression measurements is a challenging task. While ordination methods are routinely used to identify clusters of samples or co-expressed genes, these methods do not take sample or gene annotations into account. We aim to provide a tool that allows users of all backgrounds to assess and visualize the intrinsic correlation structure of complex annotated gene expression data and discover the covariates that jointly affect expression patterns. Results: The Bioconductor package covRNA provides a convenient and fast interface for testing and visualizing complex relationships between sample and gene covariates mediated by gene expression data in an entirely unsupervised setting. The relationships between sample and gene covariates are tested by statistical permutation tests and visualized by ordination. The methods are inspired by the fourthcorner and RLQ analyses used in ecological research for the analysis of species abundance data, which we modified to make them suitable for the distributional characteristics of both RNA-Seq read counts and microarray intensities, and to provide a high-performance parallelized implementation for the analysis of large-scale gene expression data on multi-core computational systems. covRNA provides additional modules for unsupervised gene filtering and plotting functions to ensure a smooth and coherent analysis workflow.

The effects of pre-processing and parameter choices on searches through large gene expression data collections

2009 IEEE International Workshop on Genomic Signal Processing and Statistics ◽

10.1109/gensips.2009.5174357 ◽

2009 ◽

Author(s):

Matthew A. Hibbs

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

Large Gene ◽

Data Collections

Normalization and Gene p-Value Estimation: Issues in Microarray Data Processing

Bioinformatics and Biology Insights ◽

10.4137/bbi.s441 ◽

2008 ◽

Vol 2 ◽

pp. BBI.S441 ◽

Cited By ~ 14

Author(s):

Katrin Fundel ◽

Robert Küffner ◽

Thomas Aigner ◽

Ralf Zimmer

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Data Processing ◽

Prior Knowledge ◽

Gene Expression Data ◽

Expression Data ◽

P Values ◽

Array Normalization ◽

Biological Interpretation ◽

Probe Set

Introduction: Numerous methods exist for basic processing, e.g. normalization, of microarray gene expression data. These methods have an important effect on the final analysis outcome. Therefore, it is crucial to select methods appropriate for a given dataset in order to assure the validity and reliability of expression data analysis. Furthermore, biological interpretation requires expression values for genes, which are often represented by several spots or probe sets on a microarray. How to best integrate spot/probe set values into gene values has so far been a somewhat neglected problem. Results: We present a case study comparing different between-array normalization methods with respect to the identification of differentially expressed genes. Our results show that it is feasible and necessary to use prior knowledge on gene expression measurements to select an adequate normalization method for the given data. Furthermore, we provide evidence that combining spot/probe set p-values into gene p-values for detecting differentially expressed genes has advantages compared to combining expression values for spots/probe sets into gene expression values. The comparison of different methods suggests using Stouffer's method for this purpose. The study has been conducted on gene expression experiments investigating human joint cartilage samples of osteoarthritis-related groups: a cDNA microarray (83 samples, four groups) and an Affymetrix (26 samples, two groups) data set. Conclusion: The apparently straightforward steps of gene expression data analysis, e.g. between-array normalization and detection of differentially regulated genes, can be accomplished by numerous different methods. We analyzed multiple methods and the possible effects and thereby demonstrate the importance of the single decisions taken during data processing. We give guidelines for evaluating normalization outcomes. An overview of these effects via appropriate measures and plots compared to prior knowledge is essential for the biological interpretation of gene expression measurements.
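
For readers unfamiliar with Stouffer's method, the idea of combining several spot/probe set p-values into one gene p-value can be sketched as below. This is a hedged illustration, not the authors' pipeline: it assumes one-sided p-values, equal weights, and independence between probe sets, and the function name `stouffer_combine` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values):
    """Combine one-sided p-values with the unweighted Stouffer Z-score method.

    Assumption: the p-values are independent and equally weighted.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Convert each p-value to a Z-score: z_i = Phi^{-1}(1 - p_i).
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    # The sum of k independent standard normals has variance k,
    # so dividing by sqrt(k) gives a standard normal under the null.
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    # Convert the combined Z-score back to a one-sided p-value.
    return 1.0 - std_normal.cdf(combined_z)


# Hypothetical probe-set p-values for a single gene.
probe_set_p_values = [0.05, 0.05, 0.05]
print(stouffer_combine(probe_set_p_values))
```

Several moderately small probe-set p-values that agree combine into a smaller gene-level p-value, while a single p-value is returned essentially unchanged.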

Cancer Classification from Gene Expression data using Fuzzy-Rough techniques: An Empirical Study

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i6.415420 ◽

2018 ◽

Vol 6 (6) ◽

pp. 415-420

Author(s):

Ansuman Kumar ◽

Anindya Halder

Keyword(s):

Gene Expression ◽

Empirical Study ◽

Gene Expression Data ◽

Cancer Classification ◽

Expression Data

Statistical methods for analysis of time course gene expression data

Frontiers in Bioscience ◽

10.2741/a743 ◽

2002 ◽

Vol 7 (1) ◽

pp. a90-98 ◽

Cited By ~ 5

Author(s):

Hongzhe Li

Keyword(s):

Gene Expression ◽

Statistical Methods ◽

Gene Expression Data ◽

Time Course ◽

Expression Data

Faculty Opinions recommendation of A new type of stochastic dependence revealed in gene expression data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1032265.370760 ◽

2006 ◽

Author(s):

Arcady Mushegian

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Data ◽

Stochastic Dependence ◽

New Type
