Fast gene set enrichment analysis

Author(s):  
Gennady Korotkevich ◽  
Vladimir Sukhov ◽  
Alexey Sergushichev

Preranked gene set enrichment analysis (GSEA) is a widely used method for interpreting gene expression data in terms of biological processes. Here we present FGSEA, a method that can estimate arbitrarily low GSEA P-values more accurately and much faster than other implementations. We also present a polynomial-time algorithm for calculating GSEA P-values exactly, which we use to confirm the accuracy of the method in practice.
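
For readers unfamiliar with preranked GSEA, the quantity whose P-value FGSEA estimates can be sketched in Python. This is a minimal sketch, not the FGSEA algorithm: it computes the weighted running-sum enrichment score (weight exponent 1, a common default) and a naive gene-set permutation P-value compared only against null scores of the same sign. The function names, parameters, and toy ranking are illustrative assumptions. Naive permutation needs at least on the order of 1/P random gene sets to resolve a P-value P, which is the bottleneck that makes fast estimation of very low P-values worthwhile.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def enrichment_score(ranked_genes: Sequence[str], stats: Dict[str, float],
                     gene_set: Set[str], weight: float = 1.0) -> float:
    """Signed maximum deviation of the GSEA running sum from zero.

    ranked_genes must be sorted by decreasing statistic.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    # Hits step up in proportion to |stat|^weight; misses step down uniformly.
    hit_norm = sum(abs(stats[g]) ** weight for g, h in zip(ranked_genes, hits) if h)
    if hit_norm == 0:
        raise ValueError("all statistics in the gene set are zero")
    running, best = 0.0, 0.0
    for g, h in zip(ranked_genes, hits):
        if h:
            running += abs(stats[g]) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_genes: List[str], stats: Dict[str, float],
                       gene_set: Set[str], n_perm: int = 1000,
                       seed: int = 0) -> Tuple[float, float]:
    """Naive P-value from random gene sets of the same size (not FGSEA)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, stats, gene_set)
    size = len(gene_set & set(ranked_genes))
    null = [enrichment_score(ranked_genes, stats, set(rng.sample(ranked_genes, size)))
            for _ in range(n_perm)]
    # Compare only against null scores with the same sign as the observed score.
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    extreme = sum(1 for es in same_sign if abs(es) >= abs(observed))
    return observed, (extreme + 1) / (len(same_sign) + 1)


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(10)]
    stats = dict(zip(genes, [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]))
    es, p = permutation_pvalue(genes, stats, {"g0", "g1", "g2"})
    print(f"ES = {es:.3f}, P = {p:.4f}")
```

In the toy run, all three set members sit at the top of the ranking, so the running sum climbs to its maximum of 1 before any miss is encountered.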

2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13882 ◽  
Author(s):  
Binghuang Cai ◽  
Xia Jiang

Analyzing biological system abnormalities in cancer patients based on measurements of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies two existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. The results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways known to be associated with cancer, such as the p53 and PI3K-Akt signal transduction pathways, also ranked high in LUSC, as did other pathways such as the bladder cancer and thyroid cancer pathways.


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258554
Author(s):  
Marty O. Visscher ◽  
Ping Hu ◽  
Andrew N. Carr ◽  
Charles C. Bascom ◽  
Robert J. Isfort ◽  
...  

At birth, human infants are poised to survive in harsh, hostile conditions. Understanding the state of newborn skin development and maturation is key to maintaining health and to an optimal response to injury, healing, and disease. This observational study collected full-thickness newborn skin samples from 27 infants at surgery and compared them with skin samples from 43 adult sites protected from ultraviolet radiation exposure, which served as the standard for stable, mature skin. Transcriptomic profiling and gene set enrichment analysis were performed. Statistical analysis identified over 25,000 differentially regulated probe sets, representing 10,647 distinct genes, in infant skin compared with adult skin. Gene set enrichment analysis showed a significant increase in 143 biological processes (adjusted p < 0.01) in infant skin versus adult skin samples, including extracellular matrix (ECM) organization, cell adhesion, collagen fibril organization, and fatty acid metabolic process. ECM organization and ECM structure organization were the biological processes with the lowest adjusted p-values in infant skin. Genes involved in epidermal development, immune function, cell differentiation, and the hair cycle were overexpressed in adults, representing 101 significantly enriched biological processes (adjusted p < 0.01). The processes with the largest significant differences related to skin and epidermal development, e.g., keratinocyte differentiation, keratinization, cornification, intermediate filament cytoskeleton organization, and hair cycle. The enriched Gene Ontology (GO) biological processes also involved immune function, including antigen processing and presentation. Our comparison with ultraviolet radiation-protected adult skin provides essential insight into infant skin and its ability to support the newborn’s preparedness to survive and flourish, despite a new environment laden with microbes, high oxygen tension, and potential irritants.
This fundamental knowledge is expected to guide strategies to protect and preserve the features of unperturbed, young skin.
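
The abstract reports adjusted p-values without naming the correction. Assuming the Benjamini–Hochberg false discovery rate procedure, a common choice for gene set results, the adjustment and the p < 0.01 cutoff can be sketched as follows; the function names and toy inputs are illustrative, not taken from the study.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_sets(pvalues: Dict[str, float], alpha: float = 0.01) -> List[str]:
    """Names of gene sets whose adjusted p-value falls below alpha."""
    names = list(pvalues)
    adjusted = benjamini_hochberg([pvalues[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q < alpha]
```

The running minimum enforces monotonicity: a gene set can never receive a larger adjusted p-value than a set with a larger raw p-value.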


PLoS ONE ◽  
2014 ◽  
Vol 9 (9) ◽  
pp. e107629 ◽  
Author(s):  
Pui Shan Wong ◽  
Michihiro Tanaka ◽  
Yoshihiko Sunaga ◽  
Masayoshi Tanaka ◽  
Takeaki Taniguchi ◽  
...  

2019 ◽  
Author(s):  
Ludwig Geistlinger ◽  
Gergely Csaba ◽  
Mara Santarelli ◽  
Marcel Ramos ◽  
Lucas Schiffer ◽  
...  

Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations.
Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated.
Conclusion: We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted. We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods.
Availability: http://bioconductor.org/packages/GSEABenchmarkeR
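
Criteria (ii) and (iii) can be illustrated with two simple measures. This is a hedged sketch, not the scoring implemented in GSEABenchmarkeR: it computes the fraction of tested gene sets called enriched at an assumed significance level, and precision at k, which asks how many of a method's top-k gene sets also appear among the top-k of the a priori relevance ranking. Both function names and the default cutoff are hypothetical choices for illustration.

```python
from typing import Sequence


def fraction_enriched(adjusted_pvalues: Sequence[float], alpha: float = 0.05) -> float:
    """Share of tested gene sets called enriched at significance level alpha."""
    if not adjusted_pvalues:
        raise ValueError("no gene sets were tested")
    return sum(p < alpha for p in adjusted_pvalues) / len(adjusted_pvalues)


def precision_at_k(method_ranking: Sequence[str], relevance_ranking: Sequence[str],
                   k: int) -> float:
    """Fraction of the method's top-k gene sets found in the top-k relevant sets."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_method = set(method_ranking[:k])
    top_relevant = set(relevance_ranking[:k])
    return len(top_method & top_relevant) / k
```

Precision at k ignores order within the top k; a rank-weighted score would additionally reward methods that place the most relevant gene sets first.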


2021 ◽  
Vol 11 ◽  
Author(s):  
Junyu Huo ◽  
Liqun Wu ◽  
Yunjin Zang

Background: The high mutation rate of TP53 in hepatocellular carcinoma (HCC) makes it an attractive potential therapeutic target. However, the mechanism by which TP53 mutation affects the prognosis of HCC is not fully understood.
Material and Approach: We downloaded gene expression profiles and clinical information from The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) databases. We used Gene Set Enrichment Analysis (GSEA) to compare gene expression patterns between HCC samples with wild-type TP53 (n = 258) and mutant TP53 (n = 116) in the TCGA cohort. We screened prognosis-related genes by univariate Cox regression analysis and Kaplan–Meier (KM) survival analysis. We constructed a six-gene prognostic signature in the TCGA training group (n = 184) by Lasso and multivariate Cox regression analysis. To assess the predictive capability and applicability of the signature in HCC, we conducted internal validation, external validation, integrated analysis, and subgroup analysis.
Results: A prognostic signature consisting of six genes (EIF2S1, SEC61A1, CDC42EP2, SRM, GRM8, and TBCD) performed well in predicting the prognosis of HCC. The areas under the receiver operating characteristic (ROC) curve (AUC) for 1-, 2-, and 3-year survival were all greater than 0.7 in each independent cohort (internal testing cohort, n = 181; TCGA cohort, n = 365; ICGC cohort, n = 229; whole cohort, n = 594; subgroup, n = 9). Importantly, using gene set variation analysis (GSVA) and single-sample gene set enrichment analysis (ssGSEA), we identified three possible causes of poor prognosis in HCC: high proliferative activity, low metabolic activity, and immunosuppression.
Conclusion: Our study provides a reliable method for prognostic risk assessment in HCC with great potential for clinical translation.
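
The signature step can be sketched under stated assumptions. A Cox-model signature is commonly applied as a linear risk score (the sum of each gene's fitted coefficient times its expression), after which patients are split into high- and low-risk groups whose survival is compared with Kaplan–Meier curves. The abstract gives neither the coefficients nor the cutoff, so this sketch takes coefficients as input and assumes a median split; fitting the Lasso and Cox models is not shown, and the gene names and values used to exercise it are placeholders.

```python
import statistics
from typing import Dict, List, Sequence, Set, Tuple


def risk_scores(expression: Dict[str, Dict[str, float]],
                coefficients: Dict[str, float]) -> Dict[str, float]:
    """Linear predictor per sample: sum over genes of coefficient * expression."""
    return {sample: sum(coef * genes[gene] for gene, coef in coefficients.items())
            for sample, genes in expression.items()}


def median_split(scores: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
    """High-risk samples score above the median; the rest are low-risk."""
    cutoff = statistics.median(scores.values())
    high = {s for s, v in scores.items() if v > cutoff}
    return high, set(scores) - high


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """(time, survival) at each distinct event time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Group all observations sharing this time; censored ones still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Running `kaplan_meier` separately on the high- and low-risk groups yields the two curves that a log-rank test or time-dependent ROC analysis would then compare.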


2019 ◽  
Author(s):  
Heonjong Han ◽  
Sangyoung Lee ◽  
Insuk Lee

Gene set enrichment analysis (GSEA) is a popular tool for identifying the biological processes underlying clinical samples from their gene expression phenotypes. GSEA measures whether annotated gene sets representing biological processes are enriched among the differentially expressed genes (DEGs) of clinical samples. However, GSEA may be suboptimal for functional gene sets, because DEGs from the expression data set may not be functional genes per se but rather dysregulated genes perturbed by bona fide functional genes. To overcome this shortcoming, we developed network-based GSEA (NGSEA), which scores the enrichment of functional gene sets using the expression differences of not only individual genes but also their neighbors in a functional network. NGSEA outperformed GSEA in identifying pathway gene sets for matched gene expression phenotypes. Compared with another method, Connectivity Map, NGSEA also substantially improved the retrieval of known anti-cancer drugs from patient-derived gene expression data using drug-target gene sets. We also repurposed FDA-approved drugs using NGSEA and experimentally validated budesonide as a chemical with anti-cancer effects against colorectal cancer. We therefore expect NGSEA to facilitate both pathway interpretation of gene expression phenotypes and anti-cancer drug repositioning. NGSEA is freely available at www.inetbio.org/ngsea.
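
The core idea of scoring each gene by its own expression change plus that of its network neighbors can be illustrated with a minimal sketch. The aggregation rule here (own statistic plus the mean statistic of direct neighbors) is an assumption chosen for clarity and may differ from NGSEA's exact weighting; the resulting ranking would then be passed to a standard preranked enrichment test. The network is a hypothetical undirected adjacency map in which each edge is listed in both directions.

```python
from typing import Dict, List, Set


def network_scores(stats: Dict[str, float],
                   network: Dict[str, Set[str]]) -> Dict[str, float]:
    """Own statistic plus the mean statistic of measured network neighbors."""
    combined = {}
    for gene, own in stats.items():
        # Only neighbors with a measured statistic contribute.
        neighbors = sorted(n for n in network.get(gene, set()) if n in stats)
        neighbor_mean = (sum(stats[n] for n in neighbors) / len(neighbors)
                         if neighbors else 0.0)
        combined[gene] = own + neighbor_mean
    return combined


def rank_genes(scores: Dict[str, float]) -> List[str]:
    """Genes ordered by decreasing score, ready for preranked enrichment."""
    return sorted(scores, key=scores.get, reverse=True)
```

With statistics A = 0.1, B = 3.0, C = 2.0, D = -1.0 and edges A-B and A-C, gene A has little change of its own but two strongly changed neighbors, so it moves from third to second place. This mirrors the abstract's point that functional genes may be less differentially expressed than the genes they perturb.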


2020 ◽  
Author(s):  
Menglan Cai ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka ◽  
Limin Li

Gene set enrichment analysis (GSEA) has been widely used to identify gene sets that differ significantly between cases and controls relative to a large background set of genes. GSEA needs both phenotype labels and gene expression data. However, gene expression is measured more often in model organisms than in minor species. More importantly, human gene expression often cannot be measured under specific conditions, such as non-approved treatments or gene knockouts, because direct experiments carry high health risks, so mouse models are often used as a substitute. Thus, predicting the enrichment significance (on a phenotype) of a given gene set of one species (the target, say human) using gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), which has three steps: 1) running GSEA on the source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets based on the representation from step 2. We extensively validated XGSEA on four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available at https://github.com/LiminLi-xjtu/XGSEA.
Author summary: Gene set enrichment analysis (GSEA) is a powerful tool for differential analysis of gene sets given a ranked gene list. GSEA requires complete data: gene expression with phenotype labels. However, human gene expression often cannot be measured under specific conditions, such as non-approved treatments or gene knockouts, because direct experiments carry high risks, so mouse models are often used as a substitute. The unavailability of gene expression data thus leads to a more challenging problem, the CROSS-species Gene Set Enrichment Problem (XGSEP), in which the enrichment significance (on a phenotype) of a given gene set of one species (the target, say human) is predicted using gene expression measured under the same phenotype in another species (the source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, with three steps: 1) GSEA; 2) domain adaptation; and 3) regression. Results on four real data sets and a case study show that XGSEA significantly outperformed three baseline methods and confirm its reliability.
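
To make the three-step structure concrete, here is a deliberately simplified stand-in, not XGSEA itself. Step 1 is assumed done (source p-values are given); step 2 replaces domain adaptation with a plain ortholog mapping plus Jaccard overlap against each source gene set; step 3 replaces the learned regression with a similarity-weighted average of source -log10 p-values. The ortholog table, gene names, and the choice to return 1.0 when nothing overlaps are all hypothetical.

```python
import math
from typing import Dict, Set


def map_orthologs(gene_set: Set[str], orthologs: Dict[str, str]) -> Set[str]:
    """Translate target-species genes to source-species genes; unmapped genes are dropped."""
    return {orthologs[g] for g in gene_set if g in orthologs}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_target_pvalue(target_set: Set[str], orthologs: Dict[str, str],
                          source_sets: Dict[str, Set[str]],
                          source_pvalues: Dict[str, float]) -> float:
    """Similarity-weighted average of source -log10 p-values, mapped back to a p-value."""
    mapped = map_orthologs(target_set, orthologs)
    weights = {name: jaccard(mapped, genes) for name, genes in source_sets.items()}
    total = sum(weights.values())
    if total == 0:
        return 1.0  # no overlap with any source gene set: no evidence of enrichment
    score = sum(w * -math.log10(source_pvalues[name])
                for name, w in weights.items()) / total
    return 10 ** (-score)
```

Averaging on the -log10 scale keeps a single very small source p-value from being swamped by many unremarkable ones; a learned regressor, as in the paper's step 3, could instead weight features non-uniformly.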

