Benchmarker: an unbiased, association-data-driven strategy to evaluate gene prioritization algorithms

bioRxiv

10.1101/497602

2018

Author(s):

Rebecca S. Fine, Tune H. Pers, Tiffany Amariuta, Soumya Raychaudhuri, Joel N. Hirschhorn

Keyword(s):

Association Studies, Unmet Need, Gene Prioritization, Data Sources, Data Driven, Genome Wide Association Studies, Gene Sets, Genome Wide, Combining Data, Association Data

Genome-wide association studies (GWAS) are valuable for understanding human biology, but associated loci typically contain multiple associated variants and genes. Thus, algorithms that prioritize likely causal genes and variants for a given phenotype can provide biological interpretations of association data. However, a critical, currently missing capability is to objectively compare performance of such algorithms. Typical comparisons rely on "gold standard" genes harboring causal coding variants, but such gold standards may be biased and incomplete. To address this issue, we developed Benchmarker, an unbiased, data-driven benchmarking method that compares performance of prioritization strategies to each other (and to random chance) by leave-one-chromosome-out cross-validation with stratified linkage disequilibrium (LD) score regression. We first applied Benchmarker to twenty well-powered GWAS and compared gene prioritization based on strategies employing three different data sources, including annotated gene sets and gene expression. No individual strategy clearly outperformed the others, but genes prioritized by multiple strategies had higher per-SNP heritability than those prioritized by one strategy only. We also compared two gene prioritization methods, DEPICT and MAGMA; genes prioritized by both methods strongly outperformed genes prioritized by only one. Our results suggest that combining data sources and algorithms should pinpoint higher quality genes for follow-up. Benchmarker provides an unbiased approach to evaluate any method that provides genome-wide prioritization of gene sets, genes, or variants, and can determine the best such method for any particular GWAS. Our method addresses an important unmet need for rigorous tool assessment and can assist in mapping genetic associations to causal function.
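Benchmarker's leave-one-chromosome-out (LOCO) design can be illustrated with a minimal sketch, assuming gene-level scores as input: for each chromosome, a prioritization strategy is fit only on data from the other chromosomes, and its picks on the held-out chromosome are kept for evaluation. The function names, the toy gene-set strategy, and the data are hypothetical, and the evaluation step itself (stratified LD score regression on the held-out picks) is not implemented here.

```python
from typing import Callable, Dict, List, Set

# A prioritization strategy: given training gene scores (other chromosomes only)
# and candidate genes (held-out chromosome), return the candidates it prioritizes.
Strategy = Callable[[Dict[str, float], List[str]], Set[str]]


def loco_prioritize(gene_chrom: Dict[str, int],
                    gene_scores: Dict[str, float],
                    prioritize: Strategy) -> Dict[int, Set[str]]:
    """Leave-one-chromosome-out: the strategy never sees scores from the
    chromosome whose genes it is asked to prioritize."""
    picks_by_chrom: Dict[int, Set[str]] = {}
    for chrom in sorted(set(gene_chrom.values())):
        training = {g: s for g, s in gene_scores.items() if gene_chrom[g] != chrom}
        candidates = [g for g, c in gene_chrom.items() if c == chrom]
        picks = prioritize(training, candidates)
        # Keep only genes that actually lie on the held-out chromosome.
        picks_by_chrom[chrom] = {g for g in picks if gene_chrom.get(g) == chrom}
    return picks_by_chrom


def make_gene_set_strategy(gene_sets: Dict[str, Set[str]]) -> Strategy:
    """Hypothetical toy strategy: find the gene set whose training genes have the
    highest mean score, then prioritize held-out candidates in that set."""
    def strategy(training: Dict[str, float], candidates: List[str]) -> Set[str]:
        best, best_mean = None, float("-inf")
        for name, members in gene_sets.items():
            scores = [training[g] for g in members if g in training]
            if scores and sum(scores) / len(scores) > best_mean:
                best, best_mean = name, sum(scores) / len(scores)
        if best is None:
            return set()
        return {g for g in candidates if g in gene_sets[best]}
    return strategy


# Toy usage: two chromosomes, two hypothetical gene sets.
gene_chrom = {"A": 1, "B": 1, "C": 2, "D": 2}
gene_scores = {"A": 5.0, "B": 0.1, "C": 4.0, "D": 0.2}
gene_sets = {"set1": {"A", "C"}, "set2": {"B", "D"}}
result = loco_prioritize(gene_chrom, gene_scores, make_gene_set_strategy(gene_sets))
print(result)  # {1: {'A'}, 2: {'C'}}
```

Holding out the chromosome being scored keeps a strategy from being rewarded simply for having seen that chromosome's own association signal, which is the circularity a LOCO design is meant to avoid.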

Genome-wide meta-analysis of insomnia in over 2.3 million individuals implicates involvement of specific biological pathways through gene-prioritization

10.1101/2020.12.07.20245209

2020

Author(s):

Kyoko Watanabe, Philip R. Jansen, Jeanne E. Savage, Priyanka Nandakumar, Xin Wang, ...

Keyword(s):

Statistical Power, Association Studies, Meta Analysis, Tissue Expression, Gene Prioritization, Specific Gene, Genome Wide Association Studies, Gene Sets, Genome Wide, Novel Strategy

Insomnia is a heritable, highly prevalent sleep disorder for which no sufficient treatment currently exists. Previous genome-wide association studies (GWASs) with up to 1.3 million subjects identified over 200 associated loci. This extreme polygenicity suggested that many more loci remained to be discovered. The current study almost doubled the sample size to over 2.3 million individuals, thereby increasing statistical power. We identified 554 risk loci (confirming 190 previously associated loci and detecting 364 novel ones), and, capitalizing on this large number of loci, we propose a novel strategy to prioritize genes using external biological resources and information on functional interactions between genes across risk loci. Of all 3,898 genes naively implicated by the risk loci, we prioritize 289. For these, we find brain-tissue expression specificity and enrichment in specific gene sets for synaptic signaling functions and neuronal differentiation. We show that the novel gene prioritization strategy yields specific hypotheses on causal mechanisms underlying insomnia that would not have been fully detected using traditional approaches.

Guilt by rewiring: gene prioritization through network rewiring in Genome Wide Association Studies

Human Molecular Genetics

10.1093/hmg/ddt668

2013

Vol 23 (10)

pp. 2780-2790

Cited By ~ 32

Author(s):

L. Hou, M. Chen, C. K. Zhang, J. Cho, H. Zhao

Keyword(s):

Association Studies, Gene Prioritization, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Network Rewiring

RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID

PLoS Genetics

10.1371/journal.pgen.1009315

2021

Vol 17 (1)

pp. e1009315

Author(s):

Ardalan Naseri, Junjie Shi, Xihong Lin, Shaojie Zhang, Degui Zhi

Keyword(s):

Large Scale, Association Studies, Scale Up, Data Driven, Genome Wide Association Studies, Inference Method, Genome Wide, Familial Relationship, Kinship Coefficients, Data Driven Approach

Inferring relationships from the whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficient (ϕ) and the genome-wide probability of zero IBD sharing (π0) for all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale to very large cohorts (e.g., sample sizes >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI first leverages the efficient RaPID method to call identity-by-descent (IBD) segments, and then estimates ϕ and π0 from the detected IBD segments. This inference uses a data-driven approach that adjusts the estimates based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust to phasing/genotyping errors, admixture events, and varying marker densities, and achieves higher accuracy than KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI to offer fast and accurate relatedness inference for even larger cohorts.
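The link between IBD sharing and the two quantities RAFFI reports is standard: if k1 and k2 are the fractions of the genome shared IBD1 and IBD2, then ϕ = k1/4 + k2/2 and π0 = 1 - k1 - k2. The sketch below computes these naive estimates from a list of IBD segments. The segment representation and function names are assumptions, segment calling (RaPID's role) is not shown, and RAFFI's data-driven adjustment for phasing and genotyping quality is deliberately omitted.

```python
from typing import List, Tuple

# One IBD segment between individuals i and j on a single concatenated genetic map:
# (start_cM, end_cM, haplotype index of i (0/1), haplotype index of j (0/1)).
Segment = Tuple[float, float, int, int]


def ibd_fractions(segments: List[Segment], genome_cM: float) -> Tuple[float, float]:
    """Return (k1, k2): fractions of the genome shared IBD1 and IBD2."""
    # Split the map at every segment boundary so each elementary interval is
    # either fully inside or fully outside every segment.
    points = sorted({p for s in segments for p in (s[0], s[1])})
    ibd1 = ibd2 = 0.0
    for left, right in zip(points, points[1:]):
        active = [(hi, hj) for (s, e, hi, hj) in segments if s <= left and right <= e]
        if not active:
            continue
        # IBD2: both haplotypes of i are matched to two different haplotypes of j.
        if any(a[0] != b[0] and a[1] != b[1] for a in active for b in active):
            ibd2 += right - left
        else:
            ibd1 += right - left
    return ibd1 / genome_cM, ibd2 / genome_cM


def kinship_and_pi0(segments: List[Segment], genome_cM: float) -> Tuple[float, float]:
    """Naive estimates: phi = k1/4 + k2/2 and pi0 = 1 - k1 - k2."""
    k1, k2 = ibd_fractions(segments, genome_cM)
    return k1 / 4 + k2 / 2, 1 - k1 - k2


# Toy full-sibling-like pair on a 100 cM "genome".
phi, pi0 = kinship_and_pi0([(0, 50, 0, 0), (25, 75, 1, 1)], genome_cM=100)
print(phi, pi0)  # 0.25 0.25
```

The toy pair shares 50% of the genome IBD1 and 25% IBD2, giving ϕ = 0.25 and π0 = 0.25, the expected values for full siblings.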

Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases

10.1101/2020.09.08.20190561

2020

Cited By ~ 1

Author(s):

Elle M Weeks, Jacob C Ulirsch, Nathan Y Cheng, Brian L Trippe, Rebecca S Fine, ...

Keyword(s):

Complex Traits, Association Studies, Gene Prioritization, Protein Interaction Data, Large Set, Genome Wide Association Studies, Protein Protein Interaction, Genome Wide, Causal Genes, Red Blood Cell Count

Genome-wide association studies (GWAS) are a valuable tool for understanding the biology of complex traits, but the associations found rarely point directly to causal genes. Here, we introduce a new method to identify the causal genes by integrating GWAS summary statistics with gene expression, biological pathway, and predicted protein-protein interaction data. We further propose an approach that effectively leverages both polygenic and locus-specific genetic signals by combining results across multiple gene prioritization methods, increasing confidence in prioritized genes. Using a large set of gold standard genes to evaluate our approach, we prioritize 8,402 unique gene-trait pairs with greater than 75% estimated precision across 113 complex traits and diseases, including known genes such as SORT1 for LDL cholesterol, SMIM1 for red blood cell count, and DRD2 for schizophrenia, as well as novel genes such as TTC39B for cholelithiasis. Our results demonstrate that a polygenic approach is a powerful tool for gene prioritization and, in combination with locus-specific signal, improves upon existing methods.
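Gold-standard evaluation and the benefit of combining prioritization methods can be illustrated with simple set arithmetic. This is a hypothetical sketch, not the authors' approach: the gene names and gold standard are invented, "combining" is reduced to requiring agreement between two methods, and precision is simply the fraction of prioritized genes that belong to the gold standard.

```python
from typing import Set


def precision(prioritized: Set[str], gold: Set[str]) -> float:
    """Fraction of prioritized genes that are gold-standard genes."""
    if not prioritized:
        raise ValueError("no genes prioritized")
    return len(prioritized & gold) / len(prioritized)


# Hypothetical gene lists from two prioritization methods and a hypothetical gold standard.
method_a = {"G1", "G2", "G3", "G4"}
method_b = {"G1", "G2", "G5", "G6"}
gold = {"G1", "G2", "G5"}

both = method_a & method_b  # genes supported by both methods
for label, genes in [("A", method_a), ("B", method_b), ("A and B", both)]:
    print(label, sorted(genes), precision(genes, gold))
```

In this toy case, requiring agreement raises precision from 0.5 (method A) and 0.75 (method B) to 1.0, but drops G5, a gold-standard gene found only by method B, illustrating the usual precision/recall trade-off.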

Informing disease modelling with brain-relevant functional genomic annotations

Brain

10.1093/brain/awz295

2019

Vol 142 (12)

pp. 3694-3712

Cited By ~ 4

Author(s):

Regina H Reynolds, John Hardy, Mina Ryten, Sarah A Gagliano Taliun

Keyword(s):

Genetic Association, Association Studies, Therapeutic Targets, Genome Wide Association, Disease Modelling, Genome Wide Association Studies, Functional Genomic, Genome Wide, Neuropsychiatric Diseases, Association Data

How can we best translate the success of genome-wide association studies for neurological and neuropsychiatric diseases into therapeutic targets? Reynolds et al. critically assess existing brain-relevant functional genomic annotations and the tools available for integrating such annotations with summary-level genetic association data.

On Combining Data From Genome-Wide Association Studies to Discover Disease-Associated SNPs

Statistical Science

10.1214/09-sts286

2009

Vol 24 (4)

pp. 547-560

Cited By ~ 15

Author(s):

Ruth M. Pfeiffer, Mitchell H. Gail, David Pee

Keyword(s):

Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Combining Data

TWAS pathway method greatly enhances the number of leads for uncovering the molecular underpinnings of psychiatric disorders

10.1101/373050

2018

Author(s):

Chris Chatzinakos, Donghyung Lee, Na Cai, Vladimir I. Vladimirov, Anna Docherty, ...

Keyword(s):

Psychiatric Disorders, Association Studies, Genome Wide Association Studies, Computational Burden, Gene Sets, Genome Wide, Genetic Signal, Meta Analyses, Combine Information, Or Genes

Genetic signal detection in genome-wide association studies (GWAS) is enhanced by pooling small signals from multiple single nucleotide polymorphisms (SNPs), e.g., across genes and pathways. Because genes are believed to influence traits via gene expression, it is of interest to combine information from expression quantitative trait loci (eQTLs) in a gene or in genes of the same pathway. Such methods, widely referred to as transcriptome-wide association analysis (TWAS), already exist for gene-level analysis. Because most of the confounding effect of linkage disequilibrium (LD) can potentially be eliminated from TWAS gene statistics, pathway TWAS methods would be very useful in uncovering the true molecular bases of psychiatric disorders. However, such methods are not yet available for arbitrarily large pathways/gene sets. This is possibly due to the quadratic (in the number of SNPs) computational burden of computing LD across large regions. To overcome this obstacle, we propose JEPEGMIX2-P, a novel TWAS pathway method that i) has a linear computational burden, ii) uses a large and diverse reference panel (33K subjects), iii) is competitive (adjusts for background enrichment in gene TWAS statistics), and iv) is applicable as-is to ethnically mixed cohorts. To underline its potential for increasing the power to uncover genetic signals over state-of-the-art and commonly used non-transcriptomic methods, e.g., MAGMA, we applied JEPEGMIX2-P to summary statistics from most of the large meta-analyses of the Psychiatric Genomics Consortium (PGC). While our work is only a first step toward clinical translation for psychiatric disorders, the PGC anorexia results suggest a possible avenue for treatment.
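A "competitive" gene-set test, in the sense used above, asks whether genes in a pathway have stronger statistics than other genes, not merely whether they are associated at all. A minimal sketch of that idea compares the set's mean gene statistic with the means of randomly drawn gene sets of the same size. This generic permutation test is swapped in for illustration only: it is not JEPEGMIX2-P's statistic, it ignores correlation between neighboring genes, and the function and variable names are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, Set


def competitive_pvalue(gene_stats: Dict[str, float], gene_set: Set[str],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation p-value: is the mean statistic of `gene_set`
    larger than that of random same-size sets drawn from all tested genes?"""
    members = [g for g in gene_set if g in gene_stats]
    if not members:
        raise ValueError("none of the set's genes has a statistic")
    observed = mean(gene_stats[g] for g in members)
    genes = list(gene_stats)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(genes, len(members))
        if mean(gene_stats[g] for g in draw) >= observed:
            hits += 1
    # The +1 terms count the observed set itself, so p is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy gene-level statistics (chi-square-like: larger means stronger signal).
gene_stats = {f"G{i}": (10.0 if i < 5 else 0.0) for i in range(100)}
p_strong = competitive_pvalue(gene_stats, {"G0", "G1", "G2", "G3", "G4"}, n_perm=2000)
p_null = competitive_pvalue(gene_stats, {"G50", "G51", "G52"}, n_perm=2000)
print(p_strong, p_null)
```

Because the background is the set of all tested genes, a pathway whose genes are only as associated as everything else (here, the all-zero set) gets p = 1, even if every gene in the genome carried some signal.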

Cascading epigenomic analysis for identifying disease genes from the regulatory landscape of GWAS variants

PLoS Genetics

10.1371/journal.pgen.1009918

2021

Vol 17 (11)

pp. e1009918

Author(s):

Bernard Ng, William Casazza, Nam Hee Kim, Chendi Wang, Farnush Farhadi, ...

Keyword(s):

Genetic Variants, Target Genes, Association Studies, Disease Genes, Genome Wide Association Studies, Cascading Effects, Risk Alleles, Gene Sets, Genome Wide, Regulatory Landscape

The majority of genetic variants detected in genome-wide association studies (GWAS) exert their effects on phenotypes through gene regulation. Motivated by this observation, we propose a multi-omic integration method that models the cascading effects of genetic variants from epigenome to transcriptome and eventually to phenome in order to identify target genes influenced by risk alleles. This cascading epigenomic analysis for GWAS, which we refer to as CEWAS, comprises two types of models: one linking cis genetic effects to epigenomic variation and another linking cis epigenomic variation to gene expression. Applying these models in cascade to GWAS summary statistics generates gene-level statistics that reflect genetically driven epigenomic effects. We show on sixteen brain-related GWAS that CEWAS provides a higher gene detection rate than related methods and finds disease-relevant genes and gene sets that point toward less explored biological processes. CEWAS thus presents a novel means of exploring the regulatory landscape of GWAS variants to uncover disease mechanisms.
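Applying the two CEWAS model types "in cascade" can be pictured as follows, under the assumption (ours, not stated in the abstract) that both models are linear: multiplying SNP-to-epigenome weights by epigenome-to-expression weights gives one SNP-level weight vector w per gene, which can then be combined with GWAS z-scores using the familiar summary-statistic TWAS form z_gene = wᵀz / √(wᵀRw), where R is the SNP LD correlation matrix. All names are hypothetical, and the sketch is not the published CEWAS implementation.

```python
from math import sqrt
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cascade_gene_z(snp_to_epi: Matrix, epi_to_expr: Vector,
                   gwas_z: Vector, ld: Matrix) -> float:
    """
    snp_to_epi[s][m]: hypothetical cis weight of SNP s on epigenomic feature m
    epi_to_expr[m]:   hypothetical weight of feature m on the gene's expression
    gwas_z[s]:        GWAS z-score of SNP s; ld: SNP-SNP correlation matrix
    """
    # Two linear models in cascade collapse to one SNP-level weight vector.
    w = matvec(snp_to_epi, epi_to_expr)
    # TWAS-style statistic: w'z / sqrt(w' R w).
    var = dot(w, matvec(ld, w))
    if var <= 0:
        raise ValueError("composite weights have zero predicted variance")
    return dot(w, gwas_z) / sqrt(var)


# Toy example: 3 SNPs, 2 epigenomic features, independent SNPs (identity LD).
snp_to_epi = [[0.5, 0.0], [0.0, 1.0], [0.0, 0.0]]
epi_to_expr = [2.0, 0.0]
z = cascade_gene_z(snp_to_epi, epi_to_expr, [3.0, -4.0, 1.0],
                   [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(z)
```

In the toy example, SNP 2 has the largest GWAS z-score but acts only on a feature with zero weight on expression, so it contributes nothing and the gene-level statistic equals 3.0.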

Abstract 236: Identification of novel cancer target genes by combining data from the cancer genome-wide association studies (GWAS), regulatory DNA elements and The Cancer Genome Atlas (TCGA)

10.1158/1538-7445.am2018-236

2018

Author(s):

Diptee A. Kulkarni, Karl Guo, Junping Jing, Mugdha Khaladkar, Kijoung Song, ...

Keyword(s):

Target Genes, Association Studies, Cancer Genome, The Cancer Genome Atlas, Genome Wide Association Studies, Genome Wide, Combining Data, DNA Elements, Cancer Genome Atlas, Regulatory DNA

Pathway Analysis of Genes Identified through Post-GWAS to Underpin Prostate Cancer Aetiology

Genes

10.3390/genes11050526

2020

Vol 11 (5)

pp. 526

Author(s):

Samaneh Farashi, Thomas Kryza, Jyotsna Batra

Keyword(s):

Prostate Cancer, Gene Networks, Association Studies, Genome Wide Association Studies, Functional Variants, Pathway Gene, Gene Sets, Genome Wide, Prostate Cancer Development, Canonical Pathways

Research into the functional role of risk regions identified by genome-wide association studies (GWAS) has made considerable recent progress, in what is referred to as the post-GWAS era. Assigning functional variants to their target genes, whether in cis or in trans, and understanding the pathway and gene network enrichments of those genes is expected to yield rich dividends by elucidating the mechanisms underlying prostate cancer. To this end, we compiled and analysed currently available post-GWAS data in prostate cancer that have been validated through further studies, to investigate molecular biological pathways enriched for the assigned functional genes. In total, about 100 canonical pathways were significantly enriched (false discovery rate (FDR) < 0.05) in assigned genes using different algorithms. The results highlighted some well-known cancer signalling pathways, antigen presentation processes, and enrichment in cell growth and development gene networks, suggesting that risk loci may exert their functional effect on prostate cancer by acting through multiple gene sets and pathways. Additional upstream analysis of the involved genes identified critical transcription factors such as HDAC1 and STAT5A. We also investigated the genes shared between the post-GWAS data and three well-annotated gene expression datasets in an effort to uncover the main genes involved in prostate cancer development/progression. Post-GWAS knowledge of gene networks and pathways, although continuously evolving, will have an important impact on clinical management of the disease if analysed further and targeted appropriately.
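Pathway enrichment at an FDR threshold, as summarized above, is commonly computed with a one-sided hypergeometric (over-representation) test per pathway followed by Benjamini-Hochberg adjustment. The abstract does not specify which tests the different algorithms used, so this is an illustrative sketch with hypothetical pathways, genes, and function names, not a reconstruction of the authors' analysis.

```python
from math import comb
from typing import Dict, List, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, K of them in the pathway."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity (and a cap at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_pathways(hits: Set[str], pathways: Dict[str, Set[str]],
                      universe: Set[str], fdr: float = 0.05) -> List[str]:
    """Names of pathways over-represented among `hits` at the given BH FDR."""
    names = list(pathways)
    N = len(universe)
    n = len(hits & universe)
    pvals = []
    for name in names:
        members = pathways[name] & universe
        k = len(members & hits)
        pvals.append(hypergeom_sf(k, N, len(members), n))
    qvals = benjamini_hochberg(pvals)
    return [name for name, q in zip(names, qvals) if q < fdr]


# Hypothetical example: 100 tested genes, 10 prioritized ("assigned") genes.
universe = {f"G{i}" for i in range(100)}
hits = {f"G{i}" for i in range(10)}
pathways = {"P1": {f"G{i}" for i in range(10)},
            "P2": {f"G{i}" for i in range(50, 60)}}
print(enriched_pathways(hits, pathways, universe))  # ['P1']
```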
