ADAGE signature analysis: differential expression analysis with data-defined gene sets

2017 ◽  
Author(s):  
Jie Tan ◽  
Matthew Huyck ◽  
Dongbo Hu ◽  
René A. Zelaya ◽  
Deborah A. Hogan ◽  
...  

Background: Gene set enrichment analysis and overrepresentation analysis are commonly used to determine the biological processes affected by a differential expression experiment. These approaches require biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data. Results: Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and Δanr mutant cells were grown as biofilms on cystic fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared them with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr. Conclusions: We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and a web server to provide ADAGE signature analysis to the community.
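
The two-step idea in this abstract (score each signature's activity per sample, then ask which signatures differ between conditions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weighted-sum activity, the Welch t statistic, the signature names, and the toy expression values are all hypothetical and are not taken from ADAGEpath or the web server.

```python
from math import sqrt
from statistics import mean, variance


def signature_activity(weights, sample):
    """Weighted sum of one sample's expression over a signature's genes (assumed definition)."""
    return sum(w * sample[gene] for gene, w in weights.items())


def welch_t(a, b):
    """Welch t statistic, used here as a stand-in for the real differential test."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_signatures(signatures, group_a, group_b):
    """Rank signatures by absolute difference in activity between two sample groups."""
    scores = {}
    for name, weights in signatures.items():
        a = [signature_activity(weights, s) for s in group_a]
        b = [signature_activity(weights, s) for s in group_b]
        scores[name] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical signatures: gene -> weight learned by some feature extractor.
signatures = {
    "sig_A": {"g1": 0.9, "g2": 0.8},
    "sig_B": {"g3": 0.7, "g4": -0.6},
}
# Hypothetical expression values; g1 and g2 drop in the mutant, g3 and g4 do not.
wild_type = [
    {"g1": 5.0, "g2": 5.2, "g3": 3.0, "g4": 3.1},
    {"g1": 5.1, "g2": 5.0, "g3": 3.2, "g4": 2.9},
    {"g1": 4.9, "g2": 5.1, "g3": 3.1, "g4": 3.0},
]
mutant = [
    {"g1": 2.0, "g2": 2.2, "g3": 3.1, "g4": 3.0},
    {"g1": 2.1, "g2": 1.9, "g3": 3.0, "g4": 3.2},
    {"g1": 1.8, "g2": 2.0, "g3": 3.2, "g4": 3.1},
]

ranked = rank_signatures(signatures, wild_type, mutant)
for name, t in ranked:
    print(f"{name}: t = {t:.2f}")
```

In this toy data the signature built from g1 and g2 ranks first, because only those genes change between the two groups.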

2018 ◽  
Vol 21 (2) ◽  
pp. 74-83
Author(s):  
Tzu-Hung Hsiao ◽  
Yu-Chiao Chiu ◽  
Yu-Heng Chen ◽  
Yu-Ching Hsu ◽  
Hung-I Harry Chen ◽  
...  

Aim and Objective: The number of anticancer drugs available currently is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” of known cancer treatment compounds can speed up the development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Through the use of breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after the treatment. Conclusion: Our method can effectively identify compounds with a potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.


2019 ◽  
Author(s):  
Heonjong Han ◽  
Sangyoung Lee ◽  
Insuk Lee

Gene set enrichment analysis (GSEA) is a popular tool to identify underlying biological processes in clinical samples using their gene expression phenotypes. GSEA measures the enrichment of annotated gene sets that represent biological processes for differentially expressed genes (DEGs) in clinical samples. GSEA may be suboptimal for functional gene sets, however, because DEGs from the expression dataset may not be functional genes per se but dysregulated genes perturbed by bona fide functional genes. To overcome this shortcoming, we developed network-based GSEA (NGSEA), which measures the enrichment score of functional gene sets using the expression difference of not only individual genes but also their neighbors in the functional network. We found that NGSEA outperformed GSEA in identifying pathway gene sets for matched gene expression phenotypes. We also observed that NGSEA substantially improved the ability to retrieve known anti-cancer drugs from patient-derived gene expression data using drug-target gene sets compared with another method, Connectivity Map. We also repurposed FDA-approved drugs using NGSEA and experimentally validated budesonide as a chemical with anti-cancer effects for colorectal cancer. We, therefore, expect that NGSEA will facilitate both pathway interpretation of gene expression phenotypes and anti-cancer drug repositioning. NGSEA is freely available at www.inetbio.org/ngsea.
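
To make the neighbor idea concrete, here is a minimal Python sketch of a network-adjusted gene score. The scoring rule (a gene's own absolute score plus the mean absolute score of its network neighbors), the function name, and the toy network are assumptions chosen for illustration; the abstract does not give the exact NGSEA formula.

```python
def network_adjusted_scores(scores, network):
    """Add the mean absolute score of each gene's neighbors to its own absolute score.

    scores:  gene -> expression-difference score (for example a log fold change)
    network: gene -> list of neighboring genes in a functional network
    """
    adjusted = {}
    for gene, score in scores.items():
        neighbors = [n for n in network.get(gene, ()) if n in scores]
        if neighbors:
            neighbor_term = sum(abs(scores[n]) for n in neighbors) / len(neighbors)
        else:
            neighbor_term = 0.0
        adjusted[gene] = abs(score) + neighbor_term
    return adjusted


# Hypothetical example: TF1 barely changes itself, but its two targets change strongly.
scores = {"TF1": 0.1, "T1": 2.0, "T2": -1.8, "X": 0.2}
network = {"TF1": ["T1", "T2"], "T1": ["TF1"], "T2": ["TF1"], "X": []}

adjusted = network_adjusted_scores(scores, network)
for gene, value in sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}: {value:.2f}")
```

Under this assumed rule, a regulator whose own expression hardly moves still receives a high score when its neighbors are strongly dysregulated, which is the situation the abstract describes for functional genes that are not themselves DEGs.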


2018 ◽  
Vol 35 (14) ◽  
pp. 2362-2370 ◽  
Author(s):  
Catharina Lippmann ◽  
Alfred Ultsch ◽  
Jörn Lötsch

Motivation: The genetic architecture of diseases is becoming increasingly well known. This raises difficulties in picking suitable targets for further research among an increasing number of candidates. Although expression-based methods of gene set reduction are applied to laboratory-derived genetic data, the analysis of topical sets of genes gathered from knowledge bases requires a modified approach, as no quantitative information about gene expression is available. Results: We propose a computational functional genomics-based approach to reducing sets of genes to the most relevant items based on the importance of the gene within the polyhierarchy of biological processes characterizing the disease. Knowledge bases about the biological roles of genes can provide a valid description of traits or diseases represented as a directed acyclic graph (DAG) picturing the polyhierarchy of disease-relevant biological processes. The proposed method uses a gene importance score derived from the location of the gene-related biological processes in the DAG. It attempts to recreate the DAG, and thereby the roles of the original gene set, with the least number of genes in descending order of importance. This obtained precision and recall of over 70% in recreating the components of the DAG characterizing the biological functions of n=540 genes relevant to pain with a subset of only the k=29 best-scoring genes. Conclusions: A new method for reduction of gene sets is shown that is able to reproduce over 70% of the biological processes in which the full gene set is involved while using only ∼5% of the original genes. Availability and implementation: The necessary numerical parameters for the calculation of gene importance are implemented in the R package dbtORA at https://github.com/IME-TMP-FFM/dbtORA. Supplementary information: Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Yan Tan ◽  
Jernej Godec ◽  
Felix Wu ◽  
Pablo Tamayo ◽  
Jill P. Mesirov ◽  
...  

Gene set enrichment analysis (GSEA) is a widely employed method for analyzing gene expression profiles. The approach uses annotated sets of genes, identifies those that are coordinately up- or down-regulated in a biological comparison of interest, and thereby elucidates underlying biological processes relevant to the comparison. As the number of gene sets available in various collections for enrichment analysis has grown, the resulting lists of significant differentially regulated gene sets may also become larger, leading to the need for additional downstream analysis of GSEA results. Here we present a method that allows the rapid identification of a small number of co-regulated groups of genes – “leading edge metagenes” (LEMs) – from high scoring sets in GSEA results. LEMs are sub-signatures that are common to multiple gene sets and that “explain” their enrichment specific to the experimental dataset of interest. We show that LEMs contain more refined lists of context-dependent and biologically meaningful genes than the parental gene sets. LEM analysis of the human vaccine response using a large database of immune signatures identified core biological processes induced by five different vaccines in datasets from human peripheral blood mononuclear cells (PBMC). Further study of these biological processes over time following vaccination showed that at day 3 post-vaccination, vaccines derived from viruses or viral subunits exhibit patterns of biological processes that are distinct from protein conjugate vaccines; however, by day 7 these differences were less pronounced. This suggests that the immune responses to diverse vaccines eventually converge to a common transcriptional response. LEM analysis can significantly reduce the dimensionality of enriched gene sets, improve the identification of core biological processes active in a comparison of interest, and simplify the biological interpretation of GSEA results. Author Summary: Genome-wide expression profiling is a widely used tool to identify biological mechanisms in a comparison of interest. One analytic method, gene set enrichment analysis (GSEA), uses annotated sets of genes and identifies those that are coordinately up- or down-regulated in a biological comparison of interest. This approach capitalizes on the fact that alterations in biological processes often cause the coordinated change of a large number of genes. However, as the number of gene sets available in various collections for enrichment analysis has grown, the resulting lists of significant differentially regulated gene sets may also become larger, leading to the need for additional downstream analysis of GSEA results. Here we present a method that allows the identification of a small number of co-regulated groups of genes – “leading edge metagenes” (LEMs) – from high scoring sets in GSEA results. We show that LEMs contain more refined lists of context-dependent biologically meaningful genes than the parental gene sets and demonstrate the utility of this approach in analyzing the transcriptional response to vaccination. LEM analysis can significantly reduce the dimensionality of enriched gene sets, improve the identification of core biological processes active in a comparison of interest, and facilitate the biological interpretation of GSEA results.
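
For readers unfamiliar with the "leading edge" that LEMs are built from, the Python sketch below computes a simplified enrichment score and the leading-edge genes of a set. It assumes an unweighted running-sum statistic and handles only positive enrichment; the gene names and sets are hypothetical, and the metagene extraction step itself is not reproduced here.

```python
def enrichment_with_leading_edge(ranked_genes, gene_set):
    """Return (enrichment score, leading-edge genes) for one gene set.

    ranked_genes is ordered from most up-regulated to most down-regulated.
    The running sum steps up at set members and down at all other genes;
    the leading edge is the set members ranked at or before the peak.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best, best_pos = 0.0, 0.0, -1
    for pos, is_hit in enumerate(hits):
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if running > best:
            best, best_pos = running, pos
    leading_edge = [g for g in ranked_genes[: best_pos + 1] if g in gene_set]
    return best, leading_edge


# Hypothetical ranked list and two overlapping gene sets.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h"]
es1, le1 = enrichment_with_leading_edge(ranked, {"a", "b", "g"})
es2, le2 = enrichment_with_leading_edge(ranked, {"a", "b", "c", "h"})

# Genes shared by the leading edges of several enriched sets are the kind of
# overlap that a leading-edge analysis tries to summarize.
shared = [g for g in le1 if g in le2]
print(le1, le2, shared)
```

Here both sets score as enriched largely because of the same two top-ranked genes, so their leading edges overlap even though the full sets differ.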


2008 ◽  
Vol 6 ◽  
pp. CIN.S867 ◽  
Author(s):  
Irina Dinu ◽  
Qi Liu ◽  
John D. Potter ◽  
Adeniyi J. Adewale ◽  
Gian S. Jhangri ◽  
...  

Gene-set analysis of microarray data evaluates biological pathways, or gene sets, for their differential expression by a phenotype of interest. In contrast to the analysis of individual genes, gene-set analysis utilizes existing biological knowledge of genes and their pathways in assessing differential expression. This paper evaluates the biological performance of five gene-set analysis methods testing “self-contained null hypotheses” via subject sampling, along with the most popular gene-set analysis method, Gene Set Enrichment Analysis (GSEA). We use three real microarray analyses in which differentially expressed gene sets are predictable biologically from the phenotype. Two types of gene sets are considered for this empirical evaluation: one type contains “truly positive” sets that should be identified as differentially expressed; and the other type contains “truly negative” sets that should not be identified as differentially expressed. Our evaluation suggests advantages of SAM-GS, Global, and ANCOVA Global methods over GSEA and the other two methods.
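
A self-contained null hypothesis asks whether any gene in the set is associated with the phenotype, and it is tested by relabeling subjects rather than by resampling genes. The Python sketch below shows that idea with an exhaustive relabeling of a small two-group design. The set statistic (sum of squared mean differences) is a simplification chosen for brevity and is not the statistic of SAM-GS, Global, or ANCOVA Global; the data are made up.

```python
from itertools import combinations
from statistics import mean


def set_statistic(expr, group_a, group_b):
    """Sum over the set's genes of the squared difference in group means."""
    return sum(
        (mean(row[i] for i in group_a) - mean(row[i] for i in group_b)) ** 2
        for row in expr
    )


def self_contained_p(expr, group_a):
    """Permutation p-value from all relabelings of subjects (columns).

    expr:    one row per gene in the set, one column per subject
    group_a: column indices of the first phenotype group
    """
    subjects = range(len(expr[0]))
    group_b = [i for i in subjects if i not in group_a]
    observed = set_statistic(expr, group_a, group_b)
    count = total = 0
    for relabeled in combinations(subjects, len(group_a)):
        rest = [i for i in subjects if i not in relabeled]
        total += 1
        # Small tolerance so that exact ties count as "at least as extreme".
        if set_statistic(expr, relabeled, rest) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical gene set whose two genes differ between subjects 0-3 and 4-7.
changed_set = [
    [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2],
    [2.0, 2.1, 1.9, 2.2, 6.0, 6.1, 5.9, 6.2],
]
# Hypothetical gene set with identical group means.
unchanged_set = [[1.0, 1.2, 0.9, 1.1, 1.1, 0.9, 1.2, 1.0]]

p_changed = self_contained_p(changed_set, (0, 1, 2, 3))
p_unchanged = self_contained_p(unchanged_set, (0, 1, 2, 3))
print(p_changed, p_unchanged)
```

With eight subjects there are 70 possible relabelings, so the smallest attainable p-value is 2/70 (the observed split and its mirror image); real studies with more subjects would sample permutations at random instead of enumerating them.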


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Chih-Yi Chien ◽  
Ching-Wei Chang ◽  
Chen-An Tsai ◽  
James J. Chen

Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions, up- and downregulation, when studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.


2019 ◽  
Author(s):  
Tao Fang ◽  
Iakov Davydov ◽  
Daniel Marbach ◽  
Jitao David Zhang

Motivation: Canonical methods for gene-set enrichment analysis assume independence between gene-sets. In practice, heterogeneous gene-sets from diverse sources are frequently combined and used, resulting in gene-sets with overlapping genes. These overlaps compromise statistical modelling and complicate interpretation of results. Results: We rephrase gene-set enrichment as a regression problem. Given some genes of interest (e.g. a list of hits from an experiment) and gene-sets (e.g. functional annotations or pathways), we aim to identify a sparse list of gene-sets for the genes of interest. In a regression framework, this amounts to identifying a minimum set of gene-sets that optimally predicts whether any gene belongs to the given genes of interest. To accommodate redundancy between gene-sets, we propose regularized regression techniques such as the elastic net. We report that regression-based results are consistent with established gene-set enrichment methods but more parsimonious and interpretable. Availability: We implement the model in gerr (gene-set enrichment with regularized regression), an R package freely available at https://github.com/TaoDFang/gerr and submitted to Bioconductor. Code and data required to reproduce the results of this study are available at https://github.com/TaoDFang/GeneModuleAnnotationPaper. Contact: Jitao David Zhang ([email protected]), Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124, 4070 Basel, Switzerland.
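
The regression view of enrichment can be illustrated with a small Python sketch: each gene is an observation, each gene-set is a 0/1 membership column, and the response marks whether the gene is a hit. The sketch fits a linear elastic net by coordinate descent written from scratch; it is not the gerr implementation, and the penalty settings (`lam`, `alpha`), gene names, and gene-sets are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso part of the update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(columns, y, lam=0.05, alpha=0.5, sweeps=200):
    """Coordinate descent for a linear elastic net on centered data.

    Minimizes (1/2n)*||y - Xb||^2 + lam*(alpha*|b|_1 + (1 - alpha)/2*|b|_2^2).
    columns: gene-set name -> membership vector over all genes
    y:       1.0 for genes of interest, 0.0 otherwise
    """
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]
    centered = {}
    for name, col in columns.items():
        col_mean = sum(col) / n
        centered[name] = [v - col_mean for v in col]
    coef = {name: 0.0 for name in columns}
    for _ in range(sweeps):
        for name, x in centered.items():
            old = coef[name]
            rho = sum(xi * (ri + xi * old) for xi, ri in zip(x, residual)) / n
            z = sum(xi * xi for xi in x) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            if new != old:
                residual = [ri - xi * (new - old) for xi, ri in zip(x, residual)]
                coef[name] = new
    return coef


# Hypothetical universe of genes, overlapping gene-sets, and experimental hits.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
gene_sets = {
    "A": {"g1", "g2", "g3", "g4"},
    "A_sub": {"g1", "g2", "g3"},  # redundant with A
    "B": {"g5", "g6"},
    "C": {"g1", "g7", "g8"},
}
hits = {"g1", "g2", "g3", "g4"}

columns = {name: [1.0 if g in members else 0.0 for g in genes]
           for name, members in gene_sets.items()}
y = [1.0 if g in hits else 0.0 for g in genes]

coef = elastic_net(columns, y)
for name, value in coef.items():
    print(f"{name}: {value:.3f}")
```

In this toy example most of the weight goes to the gene-set that matches the hits, the redundant subset receives only a small coefficient, and the unrelated set B is shrunk to zero, which is the parsimony the abstract refers to.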


2019 ◽  
Vol 8 (10) ◽  
pp. 1580 ◽  
Author(s):  
Kyoung Min Moon ◽  
Kyueng-Whan Min ◽  
Mi-Hye Kim ◽  
Dong-Hoon Kim ◽  
Byoung Kwan Son ◽  
...  

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated to gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.

