Statistical power of gene-set enrichment analysis is a function of gene set correlation structure

Mapping Intimacies

10.1101/186288

2017

Author(s):

David M. Swanson

Keyword(s):

Statistical Power, Sequence Data, Enrichment Analysis, Analytical Framework, Gene Set Enrichment Analysis, Correlation Structure, Gene Set, Type 1 Error, Gene Set Testing, Supplementary Material

Abstract

Motivation: We describe why statistical power for both self-contained and competitive gene-set tests is a function of the correlation structure of co-expressed genes, and why this characteristic is undesirable for gene-set analyses. Variable statistical power as a function of gene correlation structure has not been observed or studied previously. The observation matters in part because gene-set testing methodology is well developed, yet this fundamental feature of many of its tests is unknown; it could prompt a reinterpretation of past gene-set test results and guide future implementations, including those using sequence data. Type 1 error inflation is also amenable to study in our statistical framework; although it has been well studied and described for both self-contained and competitive tests, it has less often been examined analytically. Our observations apply to four commonly used gene-set testing approaches for microarrays (CAMERA, ROAST, SAFE, and GAGE) and to a recently proposed one for RNA-seq, MAST. Results: We characterize situations in which power is especially small relative to the effect sizes of genes in a set, for both competitive and self-contained gene-set tests. We propose three alternative tests. One replicates the properties of permutation-based self-contained tests but avoids the need for even the recently proposed rotation-based approximations to permutations. The other two have the unique property that their statistical power is not a function of co-expression correlation in the gene set, which makes them the preferred methodology. We compare our proposed tests to leading gene-set tests and apply them to an already-published study of smoking exposure in pregnant women. Supplementary Material: Online supplementary material includes additional simulation results supporting the relationship between the “mixed” and “directional” gene-set tests of ROAST, as well as closed-form implementations of them.
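To make the central claim concrete, here is a toy power calculation, not one of the paper's tests. It assumes every gene in a set of size m has the same mean shift δ in its z-statistic, every pair of genes has the same correlation ρ, and the set statistic is the sum of gene z-statistics standardized by its correct null standard deviation. The variance of that sum is m(1 + (m − 1)ρ), the same factor 1 + (m − 1)ρ that CAMERA calls the variance inflation factor, so power falls as ρ grows even though the per-gene effects are unchanged. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def variance_inflation(m: int, rho: float) -> float:
    """Factor by which equal pairwise correlation rho inflates Var(sum of m z-stats)."""
    return 1.0 + (m - 1) * rho


def set_test_power(m: int, delta: float, rho: float, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test on the sum of m gene-level z-statistics.

    Toy model (an assumption, not the paper's setup): every gene in the set has
    the same mean shift `delta`, and every pair of genes has correlation `rho`.
    """
    if m < 1:
        raise ValueError("gene set must contain at least one gene")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only handles 0 <= rho <= 1")
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    # Var(sum) = m * (1 + (m - 1) * rho); the statistic is standardized by its null SD,
    # so the type 1 error stays at alpha for every rho.
    sd_sum = sqrt(m * variance_inflation(m, rho))
    shift = m * delta / sd_sum  # mean of the standardized statistic under the alternative
    return 1.0 - _STD_NORMAL.cdf(z_crit - shift)


if __name__ == "__main__":
    # Same set size and per-gene effect; only the co-expression correlation changes.
    for rho in (0.0, 0.05, 0.2, 0.5):
        power = set_test_power(m=50, delta=0.3, rho=rho)
        print(f"rho={rho:.2f}  power={power:.3f}")
```

Because the mean of the sum grows like m while its standard deviation grows like m·sqrt(ρ) for large m, adding more correlated genes with the same effect does little to rescue power in this toy model.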

Variance-adjusted Mahalanobis (VAM): a fast and accurate method for cell-specific gene set scoring

10.1101/2020.02.18.954321

2020

Author(s):

H. Robert Frost

Keyword(s):

Single Cell, Statistical Power, Enrichment Analysis, Gene Set Enrichment Analysis, Specific Gene, Technical Noise, Gene Set, Pathway Gene, Gene Set Testing, A Cell

Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful tool for analyzing complex tissues, with recent advances enabling the transcriptomic profiling of thousands to tens of thousands of individual cells. Although scRNA-seq provides unprecedented insights into the biology of heterogeneous cell populations, analyzing such data on a gene-by-gene basis is challenging because of the large number of tested hypotheses, the high level of technical noise, and inflated zero counts. One promising approach for addressing these challenges is gene set testing, or pathway analysis. By combining the expression data for all genes in a pathway, gene set testing can mitigate the impacts of sparsity and noise and improve interpretation, replication and statistical power. Unfortunately, statistical and biological differences between single-cell and bulk expression measurements make it challenging to apply gene set testing methods originally developed for bulk tissue to scRNA-seq data, and progress on single-cell-specific methods has been limited. To address this challenge, we have developed a new gene set testing method, variance-adjusted Mahalanobis (VAM), that integrates seamlessly with the Seurat framework and is designed to accommodate the technical noise, sparsity and large sample sizes characteristic of scRNA-seq data. The VAM method computes cell-specific pathway scores to transform a cell-by-gene matrix into a cell-by-pathway matrix that can be used both for exploratory data visualization and for statistical gene set enrichment analysis. Because the distribution of these scores under the null of uncorrelated technical noise has an accurate gamma approximation, inference can be performed at both the population and single-cell levels. As we demonstrate using both simulation studies and real data analyses, the VAM method provides superior classification accuracy at a lower computational cost relative to existing single-sample gene set testing approaches.
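The idea of a cell-specific Mahalanobis score with a gamma null can be sketched as follows. This is a simplified stand-in, not the VAM package: it assumes a diagonal covariance built from known per-gene technical variances and independent Gaussian noise, under which each cell's squared distance over the m set genes is exactly chi-square with m degrees of freedom, i.e. Gamma(shape = m/2, scale = 2). VAM's actual technical-variance estimation and gamma fitting are not reproduced, and all function names here are hypothetical.

```python
from math import exp, lgamma, log


def gamma_sf(x: float, shape: float, scale: float) -> float:
    """Upper-tail probability P(X > x) for X ~ Gamma(shape, scale).

    Uses the power series for the regularized lower incomplete gamma function;
    adequate for the moderate arguments in this sketch, not tuned for extreme tails.
    """
    if x <= 0.0:
        return 1.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, 10_000):
        term *= z / (shape + n)
        total += term
        if term < total * 1e-16:
            break
    lower = total * exp(shape * log(z) - z - lgamma(shape))
    return min(1.0, max(0.0, 1.0 - lower))


def cell_set_scores(expr, means, tech_vars, set_idx):
    """Squared Mahalanobis distance per cell, restricted to the genes in set_idx.

    Assumes a diagonal covariance made of per-gene technical variances, so the
    distance reduces to a sum of squared standardized deviations.
    """
    return [
        sum((cell[g] - means[g]) ** 2 / tech_vars[g] for g in set_idx)
        for cell in expr
    ]


def cell_set_pvalues(scores, set_size):
    # With independent N(0, 1) standardized noise, each score is chi-square with
    # `set_size` degrees of freedom, i.e. Gamma(shape=set_size/2, scale=2).
    return [gamma_sf(s, shape=set_size / 2.0, scale=2.0) for s in scores]


if __name__ == "__main__":
    expr = [[0.1, -0.2, 0.05, 3.0],   # cell 0: set genes near their means
            [2.5, 2.8, 3.1, 0.0]]     # cell 1: set genes shifted upward
    means, tech_vars, set_idx = [0.0] * 4, [1.0] * 4, [0, 1, 2]
    scores = cell_set_scores(expr, means, tech_vars, set_idx)
    print(scores, cell_set_pvalues(scores, len(set_idx)))
```

Stacking the per-cell scores for many gene sets gives the cell-by-pathway matrix described above; the fourth gene in the toy data lies outside the set and therefore does not affect either score.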

Gene set enrichment analysis for genome-wide DNA methylation data

Genome Biology

10.1186/s13059-021-02388-x

2021

Vol 22 (1)

Author(s):

Jovana Maksimovic, Alicia Oshlack, Belinda Phipson

Keyword(s):

DNA Methylation, Enrichment Analysis, R Package, Gene Set Enrichment Analysis, Methylation Array, Gene Set, Genome Wide, Genome Methylation, Unbiased Gene, Gene Set Testing

Abstract

DNA methylation is one of the most commonly studied epigenetic marks because of its role in disease and development. Illumina methylation arrays have been used extensively to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate that GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.
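For context on what "unbiased" is correcting, the sketch below shows the plain hypergeometric over-representation test that gene set testing after differential methylation analysis typically starts from. It assumes every gene is equally likely to be called significant. On methylation arrays that assumption fails because genes covered by more CpG probes are more likely to be called differentially methylated, and GOmeth adjusts for this; that adjustment is not reproduced here, and the function name and counts are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe: int, n_in_set: int, n_significant: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X counts significant genes that fall in the set when `n_significant` genes are
    drawn without replacement from `n_universe`, of which `n_in_set` are in the set.
    Every gene is assumed equally likely to be called significant.
    """
    if not (0 <= n_in_set <= n_universe and 0 <= n_significant <= n_universe):
        raise ValueError("inconsistent counts")
    upper = min(n_in_set, n_significant)
    numerator = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_significant - k)
        for k in range(max(n_overlap, 0), upper + 1)
    )
    # Exact integer arithmetic; Python's int / int division returns a correctly rounded float.
    return numerator / comb(n_universe, n_significant)


if __name__ == "__main__":
    # Hypothetical counts: 20,000 genes, a 200-gene set, 1,000 significant genes, 25 overlapping.
    print(ora_pvalue(20_000, 200, 1_000, 25))
```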

Easy and efficient ensemble gene set testing with EGSEA

F1000Research

10.12688/f1000research.12544.1

2017

Vol 6

pp. 2010

Cited By ~ 17

Author(s):

Monther Alhamdoosh, Charity W. Law, Luyi Tian, Julie M. Sheridan, Milica Ng, ...

Keyword(s):

Gene Expression, Differential Expression, Expression Analysis, Differential Expression Analysis, Enrichment Analysis, Gene Set Enrichment Analysis, Gene Set, P Gene, Wide Range, Gene Set Testing

Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher-level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package addresses this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data, using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene-signature-specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
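One simple way to turn results from several gene set tests into a consensus ranking is average-rank aggregation, sketched below. This is an assumed, generic scheme for illustration only; EGSEA offers its own choices for combining p-values and ranks, which this code does not reproduce. The method names and p-values are hypothetical.

```python
from statistics import mean


def consensus_ranking(pvalues_by_method):
    """Rank gene sets by their average rank across methods.

    `pvalues_by_method` maps a method name to {gene_set: p_value}. Only gene sets
    scored by every method are ranked; ties within a method are broken by name.
    """
    if not pvalues_by_method:
        raise ValueError("need results from at least one method")
    shared = set.intersection(*(set(p) for p in pvalues_by_method.values()))
    per_method_ranks = []
    for pvals in pvalues_by_method.values():
        # Smaller p-value -> better (smaller) rank within this method.
        ordered = sorted(shared, key=lambda gs: (pvals[gs], gs))
        per_method_ranks.append({gs: r for r, gs in enumerate(ordered, start=1)})
    avg_rank = {gs: mean(ranks[gs] for ranks in per_method_ranks) for gs in shared}
    return sorted(shared, key=lambda gs: (avg_rank[gs], gs)), avg_rank


if __name__ == "__main__":
    results = {
        "method_a": {"set1": 0.01, "set2": 0.20, "set3": 0.05},
        "method_b": {"set1": 0.03, "set2": 0.50, "set3": 0.01},
        "method_c": {"set1": 0.001, "set2": 0.04, "set3": 0.30},
    }
    order, avg = consensus_ranking(results)
    print(order, avg)
```

Averaging ranks rather than raw p-values keeps one method with very small p-values from dominating the consensus, at the cost of discarding how strong each signal is.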

Gene set enrichment analysis for genome-wide DNA methylation data

10.1101/2020.08.24.265702

2020

Cited By ~ 1

Author(s):

Jovana Maksimovic, Alicia Oshlack, Belinda Phipson

Keyword(s):

DNA Methylation, Enrichment Analysis, R Package, Gene Set Enrichment Analysis, Methylation Array, Gene Set, Genome Wide, Genome Methylation, Unbiased Gene, Gene Set Testing

Abstract

DNA methylation is one of the most commonly studied epigenetic marks because of its role in disease and development. Illumina methylation arrays have been used extensively to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalisation and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate that GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.

Higher Acid-Base Imbalance Associated with Respiratory Failure Could Decrease the Survival of Patients with Scrub Typhus during Intensive Care Unit Stay: A Gene Set Enrichment Analysis

Journal of Clinical Medicine

10.3390/jcm8101580

2019

Vol 8 (10)

pp. 1580

Cited By ~ 1

Author(s):

Kyoung Min Moon, Kyueng-Whan Min, Mi-Hye Kim, Dong-Hoon Kim, Byoung Kwan Son, ...

Keyword(s):

Intensive Care Unit, Intensive Care, Respiratory Failure, Scrub Typhus, Enrichment Analysis, Gene Set Enrichment Analysis, Acid Base, Gene Set Enrichment, Gene Set, Gene Sets

Ninety percent of patients with scrub typhus (SC), a vasculitis-like syndrome, recover after mild symptoms; however, 10% develop serious complications, such as acute respiratory failure (ARF), and require admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators that predict aggressive SC. We evaluated 91 patients with SC and 81 non-SC patients admitted to the ICU, together with 32 cases from a public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC, and performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated with gene sets reflecting inflammatory/apoptotic responses and airway inflammation. ABI can serve as an indicator of ARF in patients with SC and assist with early detection.
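The GSEA step mentioned above rests on a running-sum enrichment score. The sketch below computes that score in the weighted form introduced with the original GSEA method (Subramanian et al., 2005), with weight 1 by default. It deliberately omits phenotype permutations, normalization to an NES, and multiple-testing control, so it is an illustration of the statistic rather than a replacement for a GSEA implementation; the function name and toy data are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score.

    `ranked_genes` must be sorted by `scores` in decreasing order. Hits add
    |score|**weight / (total over hits); misses subtract 1 / (number of misses).
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all hits have zero score")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    genes = ["a", "b", "c", "d"]
    ranking_metric = [4.0, 3.0, 2.0, 1.0]   # e.g. signal-to-noise, sorted descending
    print(enrichment_score(genes, ranking_metric, {"a", "b"}))  # set at the top of the list
    print(enrichment_score(genes, ranking_metric, {"c", "d"}))  # set at the bottom
```

A set concentrated at the top of the ranking yields a positive score and one at the bottom a negative score; significance would normally come from comparing against scores recomputed under permuted phenotype labels.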

IKBIP is a novel EMT-related biomarker and predicts poor survival in glioma

Translational Neuroscience

10.1515/tnsci-2021-0002

2021

Vol 12 (1)

pp. 009-019

Author(s):

Ying Yang, Jin Wang, Shihai Xu, Wen Lv, Fei Shi, ...

Keyword(s):

Epithelial To Mesenchymal Transition, Molecular Subtype, Enrichment Analysis, Gene Set Enrichment Analysis, Transcriptional Level, WHO Grade, Interacting Protein, Mesenchymal Transition, Gene Set, Substrate Adhesion

Abstract

Background: In cancer, kappa B-interacting protein (IKBIP) has rarely been reported. This study aimed to investigate its expression pattern and biological function in brain glioma at the transcriptional level. Methods: We selected 301 glioma patients with microarray data from the CGGA database and 697 glioma patients with RNA-seq data from the TCGA database. Transcriptional and clinical data from these 998 samples were analyzed. Statistical analysis and figure generation were performed in R. Results: IKBIP expression was positively correlated with the WHO grade of glioma. IKBIP expression was increased in isocitrate dehydrogenase (IDH) wild-type and mesenchymal molecular subtypes of glioma. Gene ontology analysis showed that IKBIP was strongly associated with extracellular matrix organization, cell–substrate adhesion and response to wounding in both pan-glioma and glioblastoma. Subsequent gene set enrichment analysis revealed that IKBIP was particularly correlated with epithelial-to-mesenchymal transition (EMT). To further elucidate the relationship between IKBIP and EMT, we performed gene set variation analysis to screen EMT-related signaling pathways and found that IKBIP expression was significantly associated with the PI3K/AKT, hypoxia and TGF-β pathways. Moreover, IKBIP expression was synergistic with key EMT biomarkers, especially N-cadherin, vimentin, snail, slug and TWIST1. Finally, higher IKBIP expression indicated significantly shorter survival in glioma patients. Conclusions: IKBIP was associated with more aggressive glioma phenotypes. Furthermore, IKBIP was significantly involved in EMT and could serve as an independent prognostic factor in glioma.
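Gene set variation analysis turns a gene-by-sample matrix into pathway-by-sample scores that can then be correlated with a gene such as IKBIP. GSVA's default algorithm uses kernel-estimated expression distributions and a rank-based running-sum statistic; the sketch below instead implements the simpler combined z-score, which the GSVA Bioconductor package also offers as an alternative method. It is not the study's pipeline, and the function name, gene list, and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def combined_z_scores(expr, gene_set):
    """Per-sample gene-set score: sum of gene-wise z-scores divided by sqrt(k).

    `expr` maps gene -> list of expression values (one per sample). Genes absent
    from `expr` or with zero variance are skipped; k counts the genes actually used.
    """
    totals = None
    k = 0
    for gene in gene_set:
        values = expr.get(gene)
        if values is None:
            continue
        sd = pstdev(values)
        if sd == 0:
            continue
        mu = mean(values)
        if totals is None:
            totals = [0.0] * len(values)
        # Standardize the gene across samples and accumulate its z-scores.
        for j, v in enumerate(values):
            totals[j] += (v - mu) / sd
        k += 1
    if k == 0:
        raise ValueError("no usable genes from the set were found in the data")
    return [t / sqrt(k) for t in totals]


if __name__ == "__main__":
    # Toy values for three samples; gene names are EMT markers named in the abstract.
    expr = {"VIM": [1.0, 2.0, 3.0], "CDH2": [2.0, 4.0, 6.0], "TWIST1": [5.0, 5.0, 5.0]}
    print(combined_z_scores(expr, ["VIM", "CDH2", "TWIST1", "SNAI2"]))
```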

Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

BMC Bioinformatics

10.1186/s12859-020-03929-0

2021

Vol 22 (1)

Author(s):

Mike Fang, Brian Richardson, Cheryl M. Cameron, Jean-Eudes Dazard, Mark J. Cameron

Keyword(s):

Drug Targets, T Regulatory Cells, Enrichment Analysis, Gene Set Enrichment Analysis, Regulatory Cells, Gene Set Enrichment, Gene Set, Gene Sets, Gastroenteropancreatic Neuroendocrine Tumor, Public Datasets

Abstract

Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes the biological directionality of drug-derived gene sets. Results: We detail our dpGSEA method and show its effectiveness in detecting specific drug perturbations in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, dpGSEA detected phenotypically relevant drug targets, such as those involved in virion replication, cell cycle dysfunction, and mitochondrial dysfunction, in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering the directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug-targeting molecules.
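Directionality matters because a drug is interesting when it pushes genes opposite to the disease, not merely when it touches the same genes. The sketch below is a hypothetical, generic illustration of that idea, not dpGSEA's algorithm (the authors' code is at the repository given above): a signed "reversal" score averages drug direction times disease statistic over shared genes, and a gene-permutation null compares it with random gene sets carrying the same signs. All names and values are assumptions.

```python
import random
from statistics import mean


def reversal_score(disease_t, drug_direction):
    """Hypothetical signed score: positive when the drug pushes genes opposite to the disease.

    `disease_t` maps gene -> disease-vs-control statistic (e.g. a t-statistic).
    `drug_direction` maps gene -> +1 (drug up-regulates) or -1 (drug down-regulates).
    """
    shared = [g for g in drug_direction if g in disease_t]
    if not shared:
        raise ValueError("drug signature shares no genes with the disease statistics")
    return -mean(drug_direction[g] * disease_t[g] for g in shared)


def permutation_pvalue(disease_t, drug_direction, n_perm=1000, seed=0):
    """Compare the observed score with random gene sets carrying the same signs."""
    rng = random.Random(seed)
    observed = reversal_score(disease_t, drug_direction)
    signs = [drug_direction[g] for g in drug_direction if g in disease_t]
    genes = list(disease_t)
    hits = 0
    for _ in range(n_perm):
        sampled = rng.sample(genes, len(signs))
        if reversal_score(disease_t, dict(zip(sampled, signs))) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    disease_t = {"a": 3.0, "b": -2.0, "c": 0.5, "d": 0.1}   # toy disease statistics
    drug = {"a": -1, "b": 1}   # drug lowers gene a and raises gene b
    print(reversal_score(disease_t, drug), permutation_pvalue(disease_t, drug))
```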

Key pathways involved in prostate cancer based on gene set enrichment analysis and meta analysis

Genetics and Molecular Research

10.4238/2011.december.14.10

2011

Vol 10 (4)

pp. 3856-3887

Cited By ~ 11

Author(s):

Q.Y. Ning, J.Z. Wu, N. Zang, J. Liang, Y.L. Hu, ...

Keyword(s):

Prostate Cancer, Meta Analysis, Enrichment Analysis, Gene Set Enrichment Analysis, Gene Set Enrichment, Gene Set, Key Pathways

GSEA (gene set enrichment analysis)

Encyclopedia of Genetics, Genomics, Proteomics and Informatics

10.1007/978-1-4020-6754-9_7187

2008

pp. 827-827

Keyword(s):

Enrichment Analysis, Gene Set Enrichment Analysis, Gene Set Enrichment, Gene Set

Development of a Novel Prognostic Signature Based on Antigen Processing and Presentation in Patients with Breast Cancer

Pathology & Oncology Research

10.3389/pore.2021.600727

2021

Vol 27

Author(s):

Aoshuang Qi, Mingyi Ju, Yinfeng Liu, Jia Bi, Qian Wei, ...

Keyword(s):

Breast Cancer, Antigen Processing, Cox Regression, Enrichment Analysis, Gene Signature, Gene Set Enrichment Analysis, The Cancer Genome Atlas, Prognostic Indicators, Gene Set, Antigen Processing And Presentation

Background: Complex antigen processing and presentation processes are involved in the development and progression of breast cancer (BC). A single biomarker is unlikely to adequately reflect the complex interplay between immune cells and cancer; however, there have been few attempts to find a robust antigen processing and presentation-related signature to predict the survival of BC patients with respect to tumor immunology. We therefore aimed to develop an accurate gene signature based on immune-related genes for prognosis prediction in BC. Methods: Data on BC patients were obtained from The Cancer Genome Atlas. Gene set enrichment analysis was used to confirm the gene set related to antigen processing and presentation that contributed to BC. Cox proportional hazards regression, multivariate Cox regression, and stratified analysis were used to assess the prognostic power of the gene signature. Differentially expressed mRNAs between high- and low-risk groups were characterized by KEGG pathway analysis. Results: A three-gene signature comprising HSPA5 (heat shock protein family A member 5), PSME2 (proteasome activator subunit 2), and HLA-F (major histocompatibility complex, class I, F) was significantly associated with overall survival (OS). HSPA5 and PSME2 were protective (hazard ratio (HR) < 1), and HLA-F was a risk factor (HR > 1). Risk score, estrogen receptor (ER) status, progesterone receptor (PR) status and PD-L1 were independent prognostic indicators. KIT and ACACB may play important roles in the mechanism by which the gene signature influences BC prognosis. Conclusion: The proposed three-gene signature is a promising biomarker for estimating survival outcomes in BC patients.
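A multi-gene Cox signature is usually applied as a linear risk score, the sum of each gene's fitted coefficient times its expression, with patients then split into high- and low-risk groups. The sketch below assumes a median split, which the abstract does not specify, and uses invented coefficients: only their signs follow the abstract (HSPA5 and PSME2 protective, so negative; HLA-F a risk factor, so positive). Function names and patient data are hypothetical.

```python
from statistics import median

# Hypothetical coefficients: only the signs follow the abstract (HSPA5 and PSME2
# protective, HLA-F a risk factor); the magnitudes are invented for illustration.
HYPOTHETICAL_COEFS = {"HSPA5": -0.3, "PSME2": -0.2, "HLA-F": 0.4}


def risk_score(expression, coefs=HYPOTHETICAL_COEFS):
    """Linear predictor sum(beta_g * x_g) from a fitted Cox model."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def stratify_by_median(patients, coefs=HYPOTHETICAL_COEFS):
    """Label each patient 'high' or 'low' risk relative to the cohort median score."""
    scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
    return groups, scores


if __name__ == "__main__":
    patients = {
        "p1": {"HSPA5": 1.0, "PSME2": 1.0, "HLA-F": 0.0},
        "p2": {"HSPA5": 0.0, "PSME2": 0.0, "HLA-F": 1.0},
        "p3": {"HSPA5": 0.5, "PSME2": 0.5, "HLA-F": 0.5},
    }
    print(stratify_by_median(patients))
```

Because exp(risk score) is proportional to the modeled hazard, a negative coefficient lowers the score and corresponds to a hazard ratio below 1 for that gene.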