Benchmarker: an unbiased, association-data-driven strategy to evaluate gene prioritization algorithms

2018
Author(s): Rebecca S. Fine, Tune H. Pers, Tiffany Amariuta, Soumya Raychaudhuri, Joel N. Hirschhorn

Genome-wide association studies (GWAS) are valuable for understanding human biology, but associated loci typically contain multiple associated variants and genes. Thus, algorithms that prioritize likely causal genes and variants for a given phenotype can provide biological interpretations of association data. However, a critical, currently missing capability is to objectively compare performance of such algorithms. Typical comparisons rely on "gold standard" genes harboring causal coding variants, but such gold standards may be biased and incomplete. To address this issue, we developed Benchmarker, an unbiased, data-driven benchmarking method that compares performance of prioritization strategies to each other (and to random chance) by leave-one-chromosome-out cross-validation with stratified linkage disequilibrium (LD) score regression. We first applied Benchmarker to twenty well-powered GWAS and compared gene prioritization based on strategies employing three different data sources, including annotated gene sets and gene expression. No individual strategy clearly outperformed the others, but genes prioritized by multiple strategies had higher per-SNP heritability than those prioritized by one strategy only. We also compared two gene prioritization methods, DEPICT and MAGMA; genes prioritized by both methods strongly outperformed genes prioritized by only one. Our results suggest that combining data sources and algorithms should pinpoint higher quality genes for follow-up. Benchmarker provides an unbiased approach to evaluate any method that provides genome-wide prioritization of gene sets, genes, or variants, and can determine the best such method for any particular GWAS. Our method addresses an important unmet need for rigorous tool assessment and can assist in mapping genetic associations to causal function.
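Benchmarker's cross-validation rests on one rule: genes on a given chromosome are prioritized without using association data from that chromosome, so that the held-out chromosome can then be used for evaluation (with stratified LD score regression in the original method). The Python sketch below illustrates only the splitting and prioritization loop, not the LD score regression step. The record layout, the function names, and the toy mean-cutoff rule are hypothetical stand-ins, not part of Benchmarker.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Set, Tuple

# Hypothetical record: (gene name, chromosome, GWAS-derived gene score)
GeneRecord = Tuple[str, int, float]
Prioritizer = Callable[[List[GeneRecord], List[GeneRecord]], Set[str]]


def loco_splits(records: List[GeneRecord]) -> Iterator[Tuple[int, List[GeneRecord], List[GeneRecord]]]:
    """Yield (held-out chromosome, training records, held-out records)."""
    by_chrom: Dict[int, List[GeneRecord]] = defaultdict(list)
    for rec in records:
        by_chrom[rec[1]].append(rec)
    for chrom in sorted(by_chrom):
        # Training data never includes the held-out chromosome.
        train = [r for r in records if r[1] != chrom]
        yield chrom, train, by_chrom[chrom]


def loco_prioritize(records: List[GeneRecord], prioritize: Prioritizer) -> Dict[int, Set[str]]:
    """Prioritize genes on each chromosome using only data from the other chromosomes."""
    return {chrom: prioritize(train, test) for chrom, train, test in loco_splits(records)}


def mean_cutoff_prioritizer(train: List[GeneRecord], test: List[GeneRecord]) -> Set[str]:
    """Toy rule (hypothetical): flag held-out genes scoring above the training mean."""
    if not train:
        return set()
    cutoff = sum(score for _, _, score in train) / len(train)
    return {gene for gene, _, score in test if score > cutoff}


records = [("GENE_A", 1, 3.0), ("GENE_B", 1, 0.5), ("GENE_C", 2, 2.5),
           ("GENE_D", 2, 1.0), ("GENE_E", 3, 4.0)]
print(loco_prioritize(records, mean_cutoff_prioritizer))
# {1: {'GENE_A'}, 2: set(), 3: {'GENE_E'}}
```

Any prioritization strategy can be plugged in as the `prioritize` callable; the splitting logic stays the same.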

2020
Author(s): Kyoko Watanabe, Philip R. Jansen, Jeanne E. Savage, Priyanka Nandakumar, Xin Wang, ...

Insomnia is a heritable, highly prevalent sleep disorder for which no sufficient treatment currently exists. Previous genome-wide association studies (GWASs) with up to 1.3 million subjects identified over 200 associated loci. This extreme polygenicity suggested that many more loci remained to be discovered. The current study almost doubled the sample size to over 2.3 million individuals, thereby increasing statistical power. We identified 554 risk loci (confirming 190 previously associated loci and detecting 364 novel ones), and, capitalizing on this large number of loci, we propose a novel strategy to prioritize genes using external biological resources and information on functional interactions between genes across risk loci. Of all 3,898 genes naively implicated by the risk loci, we prioritize 289. For these, we find brain-tissue expression specificity and enrichment in specific gene sets related to synaptic signaling and neuronal differentiation. We show that the novel gene prioritization strategy yields specific hypotheses on causal mechanisms underlying insomnia that would not have been fully detected using traditional approaches.
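The cross-locus idea can be illustrated with a toy rule: keep a gene if it has a functional interaction partner located in a different risk locus. One plausible rationale is that independent loci converging on interacting genes are less likely to be coincidental. This is a minimal sketch under stated assumptions (each gene maps to exactly one locus; interactions are given as gene pairs); the gene and locus names are hypothetical, and the study's actual strategy also draws on external biological resources not modeled here.

```python
from typing import Dict, Iterable, Set, Tuple


def cross_locus_genes(loci: Dict[str, Set[str]], interactions: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return genes with at least one interaction partner in a different risk locus.

    Assumes (for simplicity) that each gene belongs to exactly one locus.
    """
    gene_to_locus = {gene: locus for locus, genes in loci.items() for gene in genes}
    prioritized: Set[str] = set()
    for a, b in interactions:
        locus_a, locus_b = gene_to_locus.get(a), gene_to_locus.get(b)
        # Skip pairs where either gene lies outside all loci, or both share a locus.
        if locus_a is None or locus_b is None or locus_a == locus_b:
            continue
        prioritized.update((a, b))
    return prioritized


loci = {"locus1": {"GENE_A", "GENE_B"}, "locus2": {"GENE_C"}, "locus3": {"GENE_D", "GENE_E"}}
interactions = [("GENE_A", "GENE_C"), ("GENE_D", "GENE_E"), ("GENE_B", "GENE_X")]
print(sorted(cross_locus_genes(loci, interactions)))  # ['GENE_A', 'GENE_C']
```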


PLoS Genetics, 2021, Vol 17 (1), pp. e1009315
Author(s): Ardalan Naseri, Junjie Shi, Xihong Lin, Shaojie Zhang, Degui Zhi

Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficient (ϕ) and the genome-wide probability of zero identity-by-descent (IBD) sharing (π0) for all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI first calls IBD segments using the efficient RaPID method and then estimates ϕ and π0 from the detected IBD segments. This inference uses a data-driven approach that adjusts the estimates based on phasing quality and genotyping quality. Using simulations, we show that RAFFI is robust to phasing and genotyping errors, admixture events, and varying marker densities, and that it achieves higher accuracy than KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI to offer fast and accurate relatedness inference for even larger cohorts.
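Under the standard definitions, if k1 and k2 are the fractions of the genome where a pair shares one or two haplotypes IBD, then ϕ = k1/4 + k2/2 and π0 = 1 − k1 − k2. The sketch below computes both quantities from a list of IBD segments. It is a simplified illustration, not RAFFI itself: it assumes segments are already resolved into IBD1/IBD2 states and do not overlap, uses a hypothetical genome length, and omits RAFFI's data-driven corrections for phasing and genotyping quality.

```python
from typing import Iterable, Tuple

# Hypothetical segment record: (length in cM, IBD state), where state is 1
# (one haplotype shared) or 2 (both haplotypes shared). Segments are assumed
# to be non-overlapping.
Segment = Tuple[float, int]


def kinship_and_pi0(segments: Iterable[Segment], genome_cm: float) -> Tuple[float, float]:
    """Return (phi, pi0) from IBD segment lengths, without any error correction."""
    if genome_cm <= 0:
        raise ValueError("genome length must be positive")
    ibd_cm = {1: 0.0, 2: 0.0}
    for length_cm, state in segments:
        if state not in ibd_cm:
            raise ValueError(f"unexpected IBD state: {state}")
        ibd_cm[state] += length_cm
    k1 = ibd_cm[1] / genome_cm  # fraction of genome in IBD1
    k2 = ibd_cm[2] / genome_cm  # fraction of genome in IBD2
    if k1 + k2 > 1:
        raise ValueError("IBD segments exceed genome length")
    phi = k1 / 4 + k2 / 2  # kinship coefficient
    pi0 = 1 - k1 - k2      # probability of sharing zero IBD
    return phi, pi0


# Full siblings on a hypothetical 3,000 cM genome: expected k1 = 0.5, k2 = 0.25.
print(kinship_and_pi0([(1500.0, 1), (750.0, 2)], 3000.0))  # (0.25, 0.25)
```

For a parent-offspring pair (k1 = 1, k2 = 0) the same function gives ϕ = 0.25 and π0 = 0, which is how ϕ and π0 together separate relationships that share the same kinship coefficient.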


Author(s): Elle M Weeks, Jacob C Ulirsch, Nathan Y Cheng, Brian L Trippe, Rebecca S Fine, ...

Genome-wide association studies (GWAS) are a valuable tool for understanding the biology of complex traits, but the associations found rarely point directly to causal genes. Here, we introduce a new method to identify the causal genes by integrating GWAS summary statistics with gene expression, biological pathway, and predicted protein-protein interaction data. We further propose an approach that effectively leverages both polygenic and locus-specific genetic signals by combining results across multiple gene prioritization methods, increasing confidence in prioritized genes. Using a large set of gold standard genes to evaluate our approach, we prioritize 8,402 unique gene-trait pairs with greater than 75% estimated precision across 113 complex traits and diseases, including known genes such as SORT1 for LDL cholesterol, SMIM1 for red blood cell count, and DRD2 for schizophrenia, as well as novel genes such as TTC39B for cholelithiasis. Our results demonstrate that a polygenic approach is a powerful tool for gene prioritization and, in combination with locus-specific signal, improves upon existing methods.
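The benefit of agreement across methods (also reported in the Benchmarker abstract for genes prioritized by both DEPICT and MAGMA) can be illustrated with a simple vote count: keep genes flagged by at least a minimum number of methods. This shows only the set logic; the method labels and gene lists are hypothetical, and the abstract does not specify how the authors weigh polygenic against locus-specific evidence.

```python
from collections import Counter
from typing import Dict, Set


def consensus_genes(method_hits: Dict[str, Set[str]], min_methods: int = 2) -> Set[str]:
    """Return genes prioritized by at least `min_methods` of the given methods."""
    counts = Counter(gene for hits in method_hits.values() for gene in hits)
    return {gene for gene, n in counts.items() if n >= min_methods}


# Hypothetical outputs of a polygenic and a locus-specific prioritization method.
hits = {
    "polygenic_method": {"GENE_A", "GENE_B", "GENE_C"},
    "locus_method": {"GENE_A", "GENE_C", "GENE_D"},
}
print(sorted(consensus_genes(hits)))  # ['GENE_A', 'GENE_C']
```

Raising `min_methods` trades the number of prioritized genes for agreement among more methods.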


Brain, 2019, Vol 142 (12), pp. 3694-3712
Author(s): Regina H Reynolds, John Hardy, Mina Ryten, Sarah A Gagliano Taliun

How can we best translate the success of genome-wide association studies for neurological and neuropsychiatric diseases into therapeutic targets? Reynolds et al. critically assess existing brain-relevant functional genomic annotations and the tools available for integrating such annotations with summary-level genetic association data.


2018
Author(s): Chris Chatzinakos, Donghyung Lee, Na Cai, Vladimir I. Vladimirov, Anna Docherty, ...

Genetic signal detection in genome-wide association studies (GWAS) is enhanced by pooling small signals from multiple single nucleotide polymorphisms (SNPs), e.g. across genes and pathways. Because genes are believed to influence traits via gene expression, it is of interest to combine information from expression quantitative trait loci (eQTLs) in a gene or in genes of the same pathway. Such methods, widely referred to as transcriptome-wide association studies (TWAS), already exist for gene-level analysis. Because most of the confounding effect of linkage disequilibrium (LD) can potentially be eliminated from TWAS gene statistics, pathway TWAS methods would be very useful in uncovering the true molecular bases of psychiatric disorders. However, such methods are not yet available for arbitrarily large pathways or gene sets, possibly because of the quadratic (in the number of SNPs) computational burden of computing LD across large regions. To overcome this obstacle, we propose JEPEGMIX2-P, a novel TWAS pathway method that i) has a linear computational burden, ii) uses a large and diverse reference panel (33K subjects), iii) is competitive (adjusts for background enrichment in gene TWAS statistics) and iv) is applicable as-is to ethnically mixed cohorts. To demonstrate its potential for increasing the power to uncover genetic signals over state-of-the-art and commonly used non-transcriptomic methods, e.g. MAGMA, we applied JEPEGMIX2-P to summary statistics from most of the large meta-analyses of the Psychiatric Genomics Consortium (PGC). While our work is only a first step toward clinical translation for psychiatric disorders, the PGC anorexia results suggest a possible avenue for treatment.
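A competitive test asks whether genes in a pathway have stronger statistics than genes in general, rather than whether they carry any signal at all. A minimal version compares the mean gene z-score in the pathway with the mean over all genes. The sketch below is an assumption-laden illustration, not JEPEGMIX2-P: it treats gene statistics as independent, whereas real gene statistics are correlated through LD and shared eQTLs, which a practical method must account for. Gene names and values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Dict, Set, Tuple


def competitive_z(gene_z: Dict[str, float], pathway: Set[str]) -> Tuple[float, float]:
    """Naive competitive test: pathway mean z-score versus the mean over all genes.

    Assumes independent gene statistics (ignores LD and shared eQTLs).
    Returns (test statistic, one-sided p-value for enrichment).
    """
    in_pathway = [z for gene, z in gene_z.items() if gene in pathway]
    if not in_pathway or len(gene_z) < 2:
        raise ValueError("need at least one pathway gene and two genes overall")
    all_z = list(gene_z.values())
    k = len(in_pathway)
    # Standard error of a mean of k genes drawn from the background distribution.
    t = (mean(in_pathway) - mean(all_z)) / (stdev(all_z) / math.sqrt(k))
    p = 1.0 - NormalDist().cdf(t)
    return t, p


gene_z = {"G1": 2.0, "G2": 2.0, "G3": 0.0, "G4": 0.0, "G5": 1.0, "G6": 1.0}
t, p = competitive_z(gene_z, {"G1", "G2"})
print(round(t, 3))  # 1.581
```

Subtracting the background mean is what makes the test competitive: a pathway does not look enriched merely because the whole genome carries polygenic signal.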


PLoS Genetics, 2021, Vol 17 (11), pp. e1009918
Author(s): Bernard Ng, William Casazza, Nam Hee Kim, Chendi Wang, Farnush Farhadi, ...

The majority of genetic variants detected in genome-wide association studies (GWAS) exert their effects on phenotypes through gene regulation. Motivated by this observation, we propose a multi-omic integration method that models the cascading effects of genetic variants from epigenome to transcriptome and eventually to phenome in order to identify target genes influenced by risk alleles. This cascading epigenomic analysis for GWAS, which we refer to as CEWAS, comprises two types of models: one linking cis genetic effects to epigenomic variation and another linking cis epigenomic variation to gene expression. Applying these models in cascade to GWAS summary statistics generates gene-level statistics that reflect genetically driven epigenomic effects. We show on sixteen brain-related GWAS that CEWAS provides a higher gene detection rate than related methods and finds disease-relevant genes and gene sets that point toward less explored biological processes. CEWAS thus presents a novel means for exploring the regulatory landscape of GWAS variants in uncovering disease mechanisms.
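If both CEWAS model types are taken to be linear (an assumption; the abstract does not specify the model form), the cascade collapses to a single set of SNP weights: with A the SNP-to-mark weight matrix and b the mark-to-expression weights, w = A b. Those weights can then be combined with GWAS z-scores using an S-PrediXcan-style statistic, w·z / sqrt(w^T R w), where R is the SNP LD matrix and weights are assumed to be on standardized genotypes. All function names and numbers below are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def composite_snp_weights(snp_to_mark: Matrix, mark_to_expr: Vector) -> Vector:
    """w = A b: each SNP's effect on expression routed through the epigenomic marks."""
    return [sum(a * b for a, b in zip(row, mark_to_expr)) for row in snp_to_mark]


def cascade_gene_z(weights: Vector, snp_z: Vector, ld: Matrix) -> float:
    """S-PrediXcan-style gene statistic w.z / sqrt(w^T R w) (standardized genotypes assumed)."""
    numerator = sum(w * z for w, z in zip(weights, snp_z))
    n = len(weights)
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)


A = [[0.5, 0.0], [0.0, 1.0]]     # 2 SNPs x 2 epigenomic marks (hypothetical weights)
b = [2.0, 1.0]                   # 2 marks -> expression (hypothetical weights)
w = composite_snp_weights(A, b)  # [1.0, 1.0]
R = [[1.0, 0.5], [0.5, 1.0]]     # SNP LD (correlation) matrix
print(round(cascade_gene_z(w, [3.0, 1.0], R), 3))  # 2.309
```

The denominator grows with LD between weighted SNPs, so correlated SNPs are not counted as independent evidence.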


Genes, 2020, Vol 11 (5), pp. 526
Author(s): Samaneh Farashi, Thomas Kryza, Jyotsna Batra

Research into the functional role of risk regions identified by genome-wide association studies (GWAS), a phase referred to as the post-GWAS era, has made considerable recent progress. Annotating functional variants to their target genes, whether acting in cis or in trans, and understanding their enrichment in biological pathways and gene networks is expected to pay rich dividends by elucidating the mechanisms underlying prostate cancer. To this end, we compiled and analysed currently available post-GWAS data that have been validated through further studies in prostate cancer, in order to investigate the molecular biological pathways enriched for assigned functional genes. In total, about 100 canonical pathways were significantly enriched (false discovery rate (FDR) < 0.05) in assigned genes using different algorithms. The results highlighted some well-known cancer signalling pathways, antigen presentation processes and enrichment in cell growth and development gene networks, suggesting that risk loci may exert their functional effect on prostate cancer by acting through multiple gene sets and pathways. Additional upstream analysis of the involved genes identified critical transcription factors such as HDAC1 and STAT5A. We also investigated the genes shared between the post-GWAS data and three well-annotated gene expression datasets in an effort to uncover the main genes involved in prostate cancer development and progression. Post-GWAS knowledge of gene networks and pathways, although continuously evolving, will have an important impact on the clinical management of the disease if analysed further and targeted appropriately.
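The abstract reports pathways significant at FDR < 0.05 but does not name the correction used by each enrichment tool. The Benjamini-Hochberg step-up procedure is a widely used way to control the FDR, so the sketch below assumes it: sort the m p-values, find the largest rank k with p(k) ≤ k·q/m, and reject the k smallest. The p-values are hypothetical.

```python
from typing import List, Set


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> Set[int]:
    """Return indices of hypotheses rejected at FDR level q (Benjamini-Hochberg step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return set(order[:cutoff_rank])


print(sorted(benjamini_hochberg([0.01, 0.04, 0.03, 0.2])))  # [0]
```

Because the procedure steps up from the largest passing rank, a p-value that fails its own threshold can still be rejected when a larger p-value further down the sorted list passes.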

