Statistical and Biological Evaluation of Different Gene Set Analysis Methods

A Biological Evaluation of Six Gene Set Analysis Methods for Identification of Differentially Expressed Pathways in Microarray Data

Cancer Informatics ◽

10.4137/cin.s867 ◽

2008 ◽

Vol 6 ◽

pp. CIN.S867 ◽

Cited By ~ 10

Author(s):

Irina Dinu ◽

Qi Liu ◽

John D. Potter ◽

Adeniyi J. Adewale ◽

Gian S. Jhangri ◽

...

Keyword(s):

Differential Expression ◽

Microarray Data ◽

Biological Evaluation ◽

Gene Set Enrichment Analysis ◽

The Other ◽

Differentially Expressed ◽

Gene Set Analysis ◽

Gene Set ◽

Analysis Methods ◽

Gene Sets

Gene-set analysis of microarray data evaluates biological pathways, or gene sets, for their differential expression by a phenotype of interest. In contrast to the analysis of individual genes, gene-set analysis utilizes existing biological knowledge of genes and their pathways in assessing differential expression. This paper evaluates the biological performance of five gene-set analysis methods testing “self-contained null hypotheses” via subject sampling, along with the most popular gene-set analysis method, Gene Set Enrichment Analysis (GSEA). We use three real microarray analyses in which differentially expressed gene sets are predictable biologically from the phenotype. Two types of gene sets are considered for this empirical evaluation: one type contains “truly positive” sets that should be identified as differentially expressed; and the other type contains “truly negative” sets that should not be identified as differentially expressed. Our evaluation suggests advantages of SAM-GS, Global, and ANCOVA Global methods over GSEA and the other two methods.
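
The idea of testing a self-contained null hypothesis via subject sampling can be illustrated with a small sketch. It is not SAM-GS, Global, or ANCOVA Global: the set statistic (sum of squared mean differences), the function names, and the toy data are all assumptions made for illustration. Phenotype labels are reassigned among subjects and the gene-set statistic is recomputed each time; because the toy data are tiny, every relabelling is enumerated instead of sampled.

```python
from itertools import combinations


def set_statistic(expr, group1_idx):
    """Sum over genes of the squared difference in group means (illustrative choice)."""
    group1 = set(group1_idx)
    total = 0.0
    for gene_values in expr:
        a = [v for i, v in enumerate(gene_values) if i in group1]
        b = [v for i, v in enumerate(gene_values) if i not in group1]
        diff = sum(a) / len(a) - sum(b) / len(b)
        total += diff * diff
    return total


def subject_sampling_p(expr, group1_idx):
    """Exact permutation p-value: fraction of relabellings scoring >= the observed one."""
    n_subjects = len(expr[0])
    observed = set_statistic(expr, group1_idx)
    null = [
        set_statistic(expr, idx)
        for idx in combinations(range(n_subjects), len(group1_idx))
    ]
    return sum(s >= observed for s in null) / len(null)


# Hypothetical gene set with two genes measured on six subjects;
# subjects 0-2 carry the phenotype of interest.
toy_expr = [
    [10, 11, 12, 1, 2, 3],
    [5, 6, 7, 0, 1, 2],
]
p_value = subject_sampling_p(toy_expr, (0, 1, 2))
```

Only the genes in the set enter the test, which is what makes the null hypothesis self-contained; genes outside the set are never consulted.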

Performance Comparison of Two Gene Set Analysis Methods for Genome-wide Association Study Results: GSA-SNP vs i-GSEA4GWAS

Genomics & Informatics ◽

10.5808/gi.2012.10.2.123 ◽

2012 ◽

Vol 10 (2) ◽

pp. 123 ◽

Cited By ~ 3

Author(s):

Ji-sun Kwon ◽

Jihye Kim ◽

Dougu Nam ◽

Sangsoo Kim

Keyword(s):

Association Study ◽

Genome Wide Association Study ◽

Performance Comparison ◽

Genome Wide Association ◽

Gene Set Analysis ◽

Gene Set ◽

Analysis Methods ◽

Genome Wide ◽

Study Results

Measuring consistency among gene set analysis methods: A systematic study

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019400109 ◽

2019 ◽

Vol 17 (05) ◽

pp. 1940010 ◽

Cited By ~ 1

Author(s):

Farhad Maleki ◽

Katie L. Ovens ◽

Daniel J. Hogan ◽

Elham Rezaei ◽

Alan M. Rosenberg ◽

...

Keyword(s):

Gene Set Analysis ◽

Rna Seq ◽

Systematic Analysis ◽

Gene Set ◽

Large Gene ◽

Analysis Methods ◽

Gene Sets ◽

Significant Gene ◽

Biological Insight ◽

Relevant Gene

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
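
Agreement between the top-ranked gene sets of two methods can be quantified in several ways. The abstract does not say which measure the authors used, so the sketch below assumes a Jaccard index over the top k entries; the function name and the example rankings are hypothetical.

```python
def top_k_jaccard(ranked_a, ranked_b, k=20):
    """Jaccard index between the k most significant gene sets of two rankings."""
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    union = top_a | top_b
    # Two empty lists are treated as being in full agreement.
    return len(top_a & top_b) / len(union) if union else 1.0


# Hypothetical rankings, most significant gene set first.
method_1 = ["set_a", "set_b", "set_c", "set_d"]
method_2 = ["set_b", "set_c", "set_e", "set_a"]
agreement_top3 = top_k_jaccard(method_1, method_2, k=3)
```

A value of 1 means both methods report the same top k gene sets, and 0 means the two lists share nothing.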

Gene set analysis methods: statistical models and methodological differences

Briefings in Bioinformatics ◽

10.1093/bib/bbt002 ◽

2013 ◽

Vol 15 (4) ◽

pp. 504-518 ◽

Cited By ~ 77

Author(s):

H. Maciejewski

Keyword(s):

Statistical Models ◽

Gene Set Analysis ◽

Gene Set ◽

Analysis Methods

Comparative evaluation of gene-set analysis methods

BMC Bioinformatics ◽

10.1186/1471-2105-8-431 ◽

2007 ◽

Vol 8 (1) ◽

pp. 431 ◽

Cited By ~ 69

Author(s):

Qi Liu ◽

Irina Dinu ◽

Adeniyi J Adewale ◽

John D Potter ◽

Yutaka Yasui

Keyword(s):

Comparative Evaluation ◽

Gene Set Analysis ◽

Gene Set ◽

Analysis Methods

Gene Set Overlap: An Impediment to Achieving High Specificity in Over-representation Analysis

10.1101/319145 ◽

2018 ◽

Cited By ~ 1

Author(s):

Farhad Maleki ◽

Anthony J. Kusalik

Keyword(s):

Systematic Approach ◽

False Positive Rate ◽

High Specificity ◽

Gene Set Analysis ◽

Strong Negative Correlation ◽

Gene Set ◽

Analysis Methods ◽

Positive Rate ◽

New Gene ◽

Analyze Data

Gene set analysis methods are widely used to analyze data from high-throughput “omics” technologies. One drawback of these methods is their low specificity or high false positive rate. Over-representation analysis is one of the most commonly used gene set analysis methods. In this paper, we propose a systematic approach to investigate the hypothesis that gene set overlap is an underlying cause of low specificity in over-representation analysis. We quantify gene set overlap and show that it is a ubiquitous phenomenon across gene set databases. Statistical analysis indicates a strong negative correlation between gene set overlap and the specificity of over-representation analysis. We conclude that gene set overlap is an underlying cause of the low specificity. This result highlights the importance of considering gene set overlap in gene set analysis and explains the lack of specificity of methods that ignore gene set overlap. This research also establishes the direction for developing new gene set analysis methods.
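
Over-representation analysis is commonly formulated as a one-sided hypergeometric test. The sketch below assumes that standard formulation; the abstract does not spell out the exact test or the overlap measure the authors used, and the function name and numbers are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, set_size, n_selected, n_overlap):
    """Upper-tail hypergeometric probability of seeing >= n_overlap set members
    among n_selected genes drawn from a universe of universe_size genes."""
    max_overlap = min(set_size, n_selected)
    total = comb(universe_size, n_selected)
    tail = sum(
        comb(set_size, x) * comb(universe_size - set_size, n_selected - x)
        for x in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical case: 10 genes in the universe, a gene set of 5,
# 5 differentially expressed genes, all of which fall in the set.
p_enriched = ora_p_value(10, 5, 5, 5)
```

Each gene set is tested on its own, so two sets that share many genes tend to be reported together, which is one way overlap can affect specificity.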

Effect of the absolute statistic on gene-sampling gene-set analysis methods

Statistical Methods in Medical Research ◽

10.1177/0962280215574014 ◽

2015 ◽

Vol 26 (3) ◽

pp. 1248-1260 ◽

Cited By ~ 2

Author(s):

Dougu Nam

Keyword(s):

False Positive ◽

Genome Wide Association Study ◽

False Positive Rate ◽

Gene Set Enrichment Analysis ◽

Gene Set Analysis ◽

Receiver Operating Curve ◽

Gene Set ◽

Analysis Methods ◽

Positive Rate ◽

The Absolute

Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
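
The mechanics of gene-sampling with a signed versus an absolute gene statistic can be sketched as follows. This is a simplified illustration, not the procedure evaluated in the paper: the set score (a plain mean), the random-draw null, the function names, and the toy statistics are all assumptions. It shows only how a bidirectional gene set is captured, not the reported effect on the false-positive rate.

```python
import random


def set_score(stats, gene_set, absolute):
    """Mean gene statistic over the set, optionally using absolute values."""
    values = [abs(stats[g]) if absolute else stats[g] for g in gene_set]
    return sum(values) / len(values)


def gene_sampling_p(stats, gene_set, absolute, n_draws=1000, seed=0):
    """Compare the set score against scores of random gene sets of equal size."""
    rng = random.Random(seed)
    genes = sorted(stats)
    observed = set_score(stats, gene_set, absolute)
    hits = sum(
        set_score(stats, rng.sample(genes, len(gene_set)), absolute) >= observed
        for _ in range(n_draws)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_draws + 1)


# Hypothetical per-gene statistics: 100 weak background genes plus a set
# whose members change strongly in opposite directions.
stats = {f"bg{i}": ((i % 7) - 3) * 0.1 for i in range(100)}
stats.update({"up1": 3.0, "up2": 3.0, "down1": -3.0, "down2": -3.0})
bidirectional_set = ["up1", "up2", "down1", "down2"]

p_signed = gene_sampling_p(stats, bidirectional_set, absolute=False)
p_absolute = gene_sampling_p(stats, bidirectional_set, absolute=True)
```

With the signed statistic the up- and down-regulated genes cancel and the set scores zero, whereas the absolute statistic keeps the magnitude of every change.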

Silver: Forging almost Gold Standard Datasets

Genes ◽

10.3390/genes12101523 ◽

2021 ◽

Vol 12 (10) ◽

pp. 1523

Author(s):

Farhad Maleki ◽

Katie Ovens ◽

Ian McQuillan ◽

Anthony J. Kusalik

Keyword(s):

Gold Standard ◽

Best Practice ◽

Evaluation Studies ◽

A Priori ◽

Real Data ◽

Gene Set Analysis ◽

Gene Set ◽

Analysis Methods ◽

Gene Sets ◽

New Gene

Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory recommendations of which methods are superior. Therefore, an unbiased quantitative framework for evaluations of gene set analysis methods will be valuable. Such a framework requires gene expression datasets where enrichment status of gene sets is known a priori. In the absence of such gold standard datasets, artificial datasets are commonly used for evaluations of gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluation of gene set analysis methods by synthesizing expression datasets using real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene–gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods to achieve higher specificity without sacrificing sensitivity.

Gene set analysis methods: a systematic comparison

BioData Mining ◽

10.1186/s13040-018-0166-8 ◽

2018 ◽

Vol 11 (1) ◽

Cited By ~ 18

Author(s):

Ravi Mathur ◽

Daniel Rotroff ◽

Jun Ma ◽

Ali Shojaie ◽

Alison Motsinger-Reif

Keyword(s):

Gene Set Analysis ◽

Systematic Comparison ◽

Gene Set ◽

Analysis Methods

Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16

BMC Proceedings ◽

10.1186/1753-6561-3-s7-s96 ◽

2009 ◽

Vol 3 (S7) ◽

Cited By ~ 30

Author(s):

Nathan L Tintle ◽

Bryce Borchers ◽

Marshall Brown ◽

Airat Bekmetjev

Keyword(s):

Single Nucleotide Polymorphism ◽

Genetic Analysis ◽

Genetic Analysis Workshop ◽

Gene Set Analysis ◽

Single Nucleotide Polymorphism Data ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Gene Set ◽

Analysis Methods ◽

Polymorphism Data
