Statistical power of gene-set enrichment analysis is a function of gene set correlation structure

2017 ◽  
Author(s):  
David M. Swanson

Abstract Motivation: We describe why statistical power for both self-contained and competitive gene-set tests is a function of the correlation structure of co-expressed genes, and why this characteristic is undesirable for gene-set analyses. Variable statistical power as a function of gene correlation structure has not been observed or studied previously. The observation is important in part because gene-set testing methodology is well developed, yet this fundamental feature of many of its tests has gone unrecognized; it has the potential to reinterpret past gene-set test results and to guide future implementations, including those using sequence data. Type 1 error inflation is also amenable to study in our statistical framework; although it has been well studied and described previously for both self-contained and competitive tests, it has less often been studied analytically. Our observations apply to four commonly used gene-set testing approaches for microarrays (CAMERA, ROAST, SAFE, and GAGE) and to a recently proposed one for RNA-seq, MAST. Results: We characterize situations in which power is especially small relative to the effect sizes of genes in a set, for both competitive and self-contained gene-set tests. We propose three alternative tests, one of which replicates the properties of permutation-based self-contained tests but avoids the need for even the recently proposed rotation-based approximations to permutations. The other two proposed tests have the unique property that their statistical power is not a function of co-expression correlation in the gene set, and they are therefore the preferred methodology. We compare our proposed tests to leading gene-set tests and apply them to an already-published study of smoking exposure in pregnant women. Supplementary Material: Online supplementary material includes additional simulation results supporting the relationship between the “mixed” and “directional” gene-set tests of ROAST, and closed-form implementations of them.
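
The dependence of power on gene-gene correlation can be seen in a deliberately simplified model. This toy model is ours and is not the paper's derivation. Assume every gene in a set of size m has the same standardized effect delta and every pair of genes has correlation rho. Under those assumptions, the mean of the gene-level z-statistics has variance (1 + (m - 1) rho) / m. A self-contained test that standardizes by that correct variance therefore has a noncentrality that shrinks as rho grows, even though no individual gene's effect has changed. The function name and parameters below are illustrative only.

```python
from statistics import NormalDist


def mean_z_power(m: int, delta: float, rho: float, alpha: float = 0.05) -> float:
    """Two-sided power of a test on the mean of m gene-level z-statistics.

    Toy assumptions (illustrative, not from the paper): every gene has
    standardized effect `delta`, every gene pair has correlation `rho`,
    and the test divides by the correct standard deviation of the mean.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    var_mean = (1.0 + (m - 1) * rho) / m  # Var(mean z) under equicorrelation
    if var_mean <= 0:
        raise ValueError("rho is too negative for a valid correlation matrix")
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    ncp = delta / var_mean ** 0.5  # mean of the standardized test statistic
    # P(|T| > z_crit) where T ~ N(ncp, 1)
    return (1.0 - std.cdf(z_crit - ncp)) + std.cdf(-z_crit - ncp)


for rho in (0.0, 0.1, 0.3, 0.5):
    print(f"rho={rho:.1f}  power={mean_z_power(20, 0.5, rho):.3f}")
```

With delta and m held fixed, the printed power falls as rho rises. This is the kind of correlation-driven variation in power the abstract argues is undesirable.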

2020 ◽  
Author(s):  
H. Robert Frost

Abstract Single-cell RNA sequencing (scRNA-seq) is a powerful tool for analyzing complex tissues, with recent advances enabling the transcriptomic profiling of thousands to tens of thousands of individual cells. Although scRNA-seq provides unprecedented insights into the biology of heterogeneous cell populations, analyzing such data on a gene-by-gene basis is challenging because of the large number of tested hypotheses, the high level of technical noise and inflated zero counts. One promising approach for addressing these challenges is gene set testing, or pathway analysis. By combining the expression data for all genes in a pathway, gene set testing can mitigate the impacts of sparsity and noise and improve interpretation, replication and statistical power. Unfortunately, statistical and biological differences between single-cell and bulk expression measurements make it challenging to apply gene set testing methods originally developed for bulk tissue to scRNA-seq data, and progress on single-cell-specific methods has been limited. To address this challenge, we have developed a new gene set testing method, variance-adjusted Mahalanobis (VAM), that integrates seamlessly with the Seurat framework and is designed to accommodate the technical noise, sparsity and large sample sizes characteristic of scRNA-seq data. The VAM method computes cell-specific pathway scores to transform a cell-by-gene matrix into a cell-by-pathway matrix that can be used for both exploratory data visualization and statistical gene set enrichment analysis. Because the distribution of these scores under the null hypothesis of uncorrelated technical noise has an accurate gamma approximation, inference can be performed at both the population and single-cell levels. As we demonstrate using both simulation studies and real data analyses, the VAM method provides superior classification accuracy at a lower computational cost relative to existing single-sample gene set testing approaches.
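
The idea of turning a cell-by-gene matrix into a cell-by-pathway matrix of Mahalanobis-style scores can be sketched as follows. This sketch assumes a diagonal covariance built from per-gene technical variances that the caller supplies. VAM's own variance adjustment and its fitting of the gamma null are not reproduced here, and the function and argument names are hypothetical. Under the sketch's assumptions (Gaussian noise, correct variances), a set of k genes gives a score that is approximately chi-square with k degrees of freedom. That is a gamma distribution with shape k/2 and scale 2.

```python
from typing import Dict, List, Sequence


def pathway_scores(
    expr: Sequence[Sequence[float]],       # rows = cells, columns = genes
    genes: Sequence[str],                  # column names of expr
    gene_sets: Dict[str, Sequence[str]],   # pathway name -> member genes
    tech_var: Sequence[float],             # assumed per-gene technical variance
) -> List[Dict[str, float]]:
    """Cell-by-pathway squared Mahalanobis distances with a diagonal covariance.

    Hypothetical simplification in the spirit of VAM, not its implementation.
    """
    n_cells = len(expr)
    col = {g: j for j, g in enumerate(genes)}
    # Centre each gene on its mean across cells.
    means = [sum(row[j] for row in expr) / n_cells for j in range(len(genes))]
    scores = []
    for row in expr:
        cell = {}
        for name, members in gene_sets.items():
            cols = [col[g] for g in members if g in col]  # skip genes absent from data
            cell[name] = sum((row[j] - means[j]) ** 2 / tech_var[j] for j in cols)
        scores.append(cell)
    return scores


expr = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
print(pathway_scores(expr, ["a", "b"], {"S": ["a", "b"], "A": ["a"]}, [1.0, 4.0]))
```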


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

Abstract DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been used extensively to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate that GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.
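
"Unbiased" here is measured against the plain over-representation test. That test asks whether a gene set contains more significant genes than a hypergeometric draw would give. As we understand it, methylation-specific methods such as GOmeth adjust this baseline because genes covered by more CpG probes are more likely to be called differentially methylated. The sketch below shows only the unadjusted hypergeometric baseline and does not implement GOmeth's correction. The function name is hypothetical.

```python
from math import comb


def hypergeom_ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail P(X >= k) for X ~ Hypergeometric.

    N: genes in the universe, K: genes in the set,
    n: significant genes overall, k: significant genes in the set.
    Plain (unadjusted) over-representation test.
    """
    upper = min(K, n)
    # comb() returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


print(hypergeom_ora_pvalue(N=1000, K=50, n=100, k=12))
```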


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2010 ◽  
Author(s):  
Monther Alhamdoosh ◽  
Charity W. Law ◽  
Luyi Tian ◽  
Julie M. Sheridan ◽  
Milica Ng ◽  
...  

Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package solves this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene-signature-specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators, and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
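
One simple way to turn several methods' rankings into a consensus is to order gene sets by their average rank across methods. This is a generic scheme assumed here for illustration. EGSEA's own combining and sorting options are documented in the package and may differ. The function name and input layout are hypothetical.

```python
from typing import Dict, List


def consensus_ranking(rankings: Dict[str, Dict[str, int]]) -> List[str]:
    """Order gene sets by average rank across methods.

    rankings[method][gene_set] is the rank a method assigned (1 = most
    significant). Every method must rank the same collection of gene sets.
    """
    methods = list(rankings)
    if not methods:
        return []
    gene_sets = set(rankings[methods[0]])
    for m in methods[1:]:
        if set(rankings[m]) != gene_sets:
            raise ValueError(f"method {m!r} ranks a different collection of gene sets")
    avg = {gs: sum(rankings[m][gs] for m in methods) / len(methods) for gs in gene_sets}
    # Lower average rank = stronger consensus; ties broken alphabetically.
    return sorted(gene_sets, key=lambda gs: (avg[gs], gs))


ranks = {
    "method1": {"A": 1, "B": 2, "C": 3},
    "method2": {"A": 2, "B": 1, "C": 3},
    "method3": {"A": 1, "B": 3, "C": 2},
}
print(consensus_ranking(ranks))
```

Averaging ranks rather than raw p-values keeps methods with very differently calibrated p-values on a common scale. The trade-off is that it discards how strongly each method separates the sets.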


2019 ◽  
Vol 8 (10) ◽  
pp. 1580 ◽  
Author(s):  
Kyoung Min Moon ◽  
Kyueng-Whan Min ◽  
Mi-Hye Kim ◽  
Dong-Hoon Kim ◽  
Byoung Kwan Son ◽  
...  

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from a public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation, and with worse survival. GSEA revealed that SC correlated with gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.
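
GSEA, as used above, scores a gene set by walking down a ranked gene list and keeping a running sum. The sum steps up at genes in the set (weighted by their ranking statistic) and steps down at genes outside it. The sketch below computes that weighted running-sum enrichment score in its standard form. Permutation-based significance testing is omitted, the function name is hypothetical, and the settings used in the cited study are not known from the abstract.

```python
from typing import Sequence, Set


def enrichment_score(
    ranked_genes: Sequence[str],
    scores: Sequence[float],
    gene_set: Set[str],
    p: float = 1.0,
) -> float:
    """Weighted running-sum enrichment score (signed maximum deviation from 0).

    `ranked_genes` must be sorted by `scores` in decreasing order,
    e.g. by correlation with the phenotype.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_hit == 0 or n_miss == 0 or hit_weight == 0:
        raise ValueError("set must partly overlap the list and have non-zero scores")
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up by the gene's weight on a hit, down by a constant on a miss.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


genes = ["a", "b", "c", "d"]
print(enrichment_score(genes, [4.0, 3.0, 2.0, 1.0], {"a", "b"}))
```

A set concentrated at the top of the list gives a score near +1, and one concentrated at the bottom gives a score near -1.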


2021 ◽  
Vol 12 (1) ◽  
pp. 009-019
Author(s):  
Ying Yang ◽  
Jin Wang ◽  
Shihai Xu ◽  
Wen Lv ◽  
Fei Shi ◽  
...  

Abstract Background: In cancer, kappa B-interacting protein (IKBIP) has rarely been reported. This study aimed to investigate its expression pattern and biological function in brain glioma at the transcriptional level. Methods: We selected 301 glioma patients with microarray data from the CGGA database and 697 glioma patients with RNA-seq data from the TCGA database. Transcriptional and clinical data from the 998 samples were analyzed. Statistical analysis and figure generation were performed in R. Results: We found that IKBIP expression was positively correlated with the WHO grade of glioma. IKBIP was increased in isocitrate dehydrogenase (IDH) wild-type and mesenchymal molecular subtype glioma. Gene Ontology analysis demonstrated that IKBIP was strongly associated with extracellular matrix organization, cell–substrate adhesion and response to wounding in both pan-glioma and glioblastoma. Subsequent gene set enrichment analysis revealed that IKBIP was particularly correlated with epithelial-to-mesenchymal transition (EMT). To further elucidate the relationship between IKBIP and EMT, we performed gene set variation analysis to screen EMT-related signaling pathways and found that IKBIP expression was significantly associated with the PI3K/AKT, hypoxia and TGF-β pathways. Moreover, IKBIP expression was found to be synergistic with key biomarkers of EMT, especially N-cadherin, vimentin, snail, slug and TWIST1. Finally, higher IKBIP expression indicated significantly shorter survival for glioma patients. Conclusions: IKBIP was associated with more aggressive phenotypes of gliomas. Furthermore, IKBIP was significantly involved in EMT and could serve as an independent prognosticator in glioma.
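
Gene set variation analysis assigns each sample its own pathway score, which can then be correlated with a gene such as IKBIP. GSVA's default statistic is rank- and kernel-based. The sketch below swaps in a simpler per-sample score for illustration: the combined z-score, in which each gene is standardized across samples and the set's z-scores are summed and divided by the square root of the set size. The function name and input layout are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Set


def combined_z_scores(expr: Dict[str, Sequence[float]], gene_set: Set[str]) -> List[float]:
    """Per-sample gene-set score: sum of per-gene z-scores / sqrt(k).

    expr maps gene -> expression values across the same samples.
    Zero-variance genes are skipped because they cannot be standardized.
    """
    z_rows = []
    for g in sorted(gene_set):
        values = expr.get(g)
        if values is None:
            continue  # gene not measured
        sd = pstdev(values)
        if sd == 0:
            continue
        mu = mean(values)
        z_rows.append([(v - mu) / sd for v in values])
    if not z_rows:
        raise ValueError("no usable genes from the set were found in expr")
    k = len(z_rows)
    # zip(*z_rows) iterates over samples, collecting each gene's z-score.
    return [sum(col) / sqrt(k) for col in zip(*z_rows)]


expr = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [5.0, 5.0, 5.0]}
print(combined_z_scores(expr, {"a", "b", "c"}))
```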


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mike Fang ◽  
Brian Richardson ◽  
Cheryl M. Cameron ◽  
Jean-Eudes Dazard ◽  
Mark J. Cameron

Abstract Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes the biological directionality of drug-derived gene sets. Results: We detail our dpGSEA method and show its effectiveness in detecting drug-specific perturbations in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets, such as those involved in virion replication, cell cycle dysfunction, and mitochondrial dysfunction, in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering the directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug-targeting molecules.
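
To make "directionality" concrete, the hypothetical sketch below computes a simple sign-reversal fraction. Among genes with non-zero fold changes in both a disease signature and a drug signature, it reports the fraction whose drug-induced direction opposes the disease direction. This is only an illustration of the concept. It is not dpGSEA's statistic, which is documented in the linked repository, and the function name is invented.

```python
from typing import Dict


def reversal_fraction(disease_lfc: Dict[str, float], drug_lfc: Dict[str, float]) -> float:
    """Fraction of shared, non-zero genes whose drug direction opposes the
    disease direction (1.0 = drug reverses every shared gene).

    Hypothetical illustration of directionality, not dpGSEA's method.
    """
    shared = [g for g in disease_lfc
              if g in drug_lfc and disease_lfc[g] != 0 and drug_lfc[g] != 0]
    if not shared:
        raise ValueError("no shared genes with non-zero fold changes")
    opposed = sum((disease_lfc[g] > 0) != (drug_lfc[g] > 0) for g in shared)
    return opposed / len(shared)


disease = {"a": 1.0, "b": -2.0, "c": 0.5}
drug = {"a": -0.3, "b": -1.0, "d": 2.0}
print(reversal_fraction(disease, drug))
```

An undirected overlap statistic would treat genes "a" and "b" identically. The signed comparison separates the reversed gene from the concordant one.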


2011 ◽  
Vol 10 (4) ◽  
pp. 3856-3887 ◽  
Author(s):  
Q.Y. Ning ◽  
J.Z. Wu ◽  
N. Zang ◽  
J. Liang ◽  
Y.L. Hu ◽  
...  



2021 ◽  
Vol 27 ◽  
Author(s):  
Aoshuang Qi ◽  
Mingyi Ju ◽  
Yinfeng Liu ◽  
Jia Bi ◽  
Qian Wei ◽  
...  

Background: Complex antigen processing and presentation processes are involved in the development and progression of breast cancer (BC). A single biomarker is unlikely to adequately reflect the complex interplay between immune cells and cancer; however, there have been few attempts to find a robust antigen processing and presentation-related signature to predict the survival outcome of BC patients with respect to tumor immunology. Therefore, we aimed to develop an accurate gene signature based on immune-related genes for prognosis prediction in BC. Methods: Information on BC patients was obtained from The Cancer Genome Atlas. Gene set enrichment analysis was used to confirm the gene set related to antigen processing and presentation that contributed to BC. Cox proportional hazards regression, multivariate Cox regression, and stratified analysis were used to assess the prognostic power of the gene signature. Differentially expressed mRNAs between high- and low-risk groups were characterized by KEGG pathway analysis. Results: A three-gene signature comprising HSPA5 (heat shock protein family A member 5), PSME2 (proteasome activator subunit 2), and HLA-F (major histocompatibility complex, class I, F) was significantly associated with overall survival (OS). HSPA5 and PSME2 were protective (hazard ratio (HR) < 1), and HLA-F was a risk factor (HR > 1). Risk score, estrogen receptor (ER), progesterone receptor (PR) and PD-L1 were independent prognostic indicators. KIT and ACACB may have important roles in the mechanism by which the gene signature regulates prognosis in BC. Conclusion: The proposed three-gene signature is a promising biomarker for estimating survival outcomes in BC patients.
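
A gene-signature risk score of this kind is commonly the Cox linear predictor, the sum of each gene's coefficient times its expression. Patients are then split into high- and low-risk groups. In the sketch below, the coefficients are hypothetical. Only their signs follow the abstract: HR < 1 corresponds to a negative coefficient and HR > 1 to a positive one, since HR = exp(beta). The median cut-off is likewise an assumed convention, because the abstract does not state the threshold used.

```python
from statistics import median
from typing import Dict

# Hypothetical Cox coefficients (log hazard ratios). Only the signs follow the
# abstract: protective genes (HR < 1) get beta < 0, the risk gene (HR > 1) beta > 0.
HYPOTHETICAL_COEFS = {"HSPA5": -0.3, "PSME2": -0.2, "HLA-F": 0.4}


def risk_scores(expr: Dict[str, Dict[str, float]],
                coefs: Dict[str, float] = HYPOTHETICAL_COEFS) -> Dict[str, float]:
    """Linear predictor sum(beta_g * x_g) for each patient."""
    return {pid: sum(beta * genes[g] for g, beta in coefs.items())
            for pid, genes in expr.items()}


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label patients above the median risk score 'high', the rest 'low'."""
    cut = median(scores.values())
    return {pid: "high" if s > cut else "low" for pid, s in scores.items()}


expr = {
    "p1": {"HSPA5": 1.0, "PSME2": 1.0, "HLA-F": 1.0},
    "p2": {"HSPA5": 0.0, "PSME2": 0.0, "HLA-F": 2.0},
    "p3": {"HSPA5": 2.0, "PSME2": 2.0, "HLA-F": 0.0},
}
print(split_by_median(risk_scores(expr)))
```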

