emeraLD: Rapid Linkage Disequilibrium Estimation with Massive Data Sets

2018 ◽  
Author(s):  
Corbin Quick ◽  
Christian Fuchsberger ◽  
Daniel Taliun ◽  
Gonçalo Abecasis ◽  
Michael Boehnke ◽  
...  

Abstract Summary Estimating linkage disequilibrium (LD) is essential for a wide range of summary statistics-based association methods for genome-wide association studies (GWAS). Large genetic data sets, e.g. the TOPMed WGS project and UK Biobank, enable more accurate and comprehensive LD estimates, but increase the computational burden of LD estimation. Here, we describe emeraLD (Efficient Methods for Estimation and Random Access of LD), a computational tool that leverages sparsity and haplotype structure to estimate LD orders of magnitude faster than existing tools. Availability and Implementation emeraLD is implemented in C++, and is open source under GPLv3. Source code, documentation, an R interface, and utilities for analysis of summary statistics are freely available at http://github.com/statgen/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.
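To illustrate why sparsity helps, here is a minimal Python sketch (not emeraLD's C++ implementation) of the squared correlation r^2 between two biallelic variants when each variant is stored as the set of haplotypes that carry its minor allele. The function name `ld_r2` and the set-based representation are assumptions made for illustration only.

```python
def ld_r2(carriers_a, carriers_b, n_haplotypes):
    """Squared correlation (r^2) between two biallelic variants.

    carriers_a and carriers_b are sets of haplotype indices carrying the
    minor allele, i.e. a sparse encoding of 0/1 haplotype vectors.
    """
    p_a = len(carriers_a) / n_haplotypes
    p_b = len(carriers_b) / n_haplotypes
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("monomorphic variant: r^2 is undefined")
    # Only haplotypes carrying both minor alleles contribute to p_ab,
    # so the work depends on the carrier counts, not on n_haplotypes.
    p_ab = len(carriers_a & carriers_b) / n_haplotypes
    d = p_ab - p_a * p_b  # the classical LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Two rare variants among 8 haplotypes, first identical, then disjoint.
same = ld_r2({0, 1}, {0, 1}, 8)
disjoint = ld_r2({0, 1}, {2, 3}, 8)
```

For rare variants the carrier sets are small, so the intersection is cheap even when the panel contains a very large number of haplotypes.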

2018 ◽  
Author(s):  
Holly Trochet ◽  
Matti Pirinen ◽  
Gavin Band ◽  
Luke Jostins ◽  
Gilean McVean ◽  
...  

Abstract Genome-wide association studies (GWAS) are a powerful tool for understanding the genetic basis of diseases and traits, but most studies have been conducted in isolation, with a focus on either a single phenotype or a set of closely related phenotypes. We describe MetABF, a simple Bayesian framework for performing integrative meta-analysis across multiple GWAS using summary statistics. The approach is applicable across a wide range of study designs and can increase the power by 50% compared to standard frequentist tests when only a subset of studies have a true effect. We demonstrate its utility in a meta-analysis of 20 diverse GWAS which were part of the Wellcome Trust Case-Control Consortium 2. The novelty of the approach is its ability to explore, and assess the evidence for, a range of possible true patterns of association across studies in a computationally efficient framework.
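The idea of scoring "patterns of association across studies" from summary statistics can be sketched as follows. This is not the MetABF code. It assumes Wakefield-style approximate Bayes factors, independent studies, and a prior in which effects are independent across studies, so the Bayes factor for "only the studies in subset S have an effect" is the product of the per-study factors in S. The function names and the prior standard deviation of 0.2 are hypothetical.

```python
import math
from itertools import combinations


def wakefield_log_abf(beta, se, prior_sd):
    """Log approximate Bayes factor (alternative vs. null) for one study."""
    v = se * se              # sampling variance of the effect estimate
    w = prior_sd * prior_sd  # prior variance of the true effect
    z = beta / se
    return 0.5 * math.log(v / (v + w)) + 0.5 * z * z * w / (v + w)


def subset_log_abfs(betas, ses, prior_sd=0.2):
    """Log ABF for every non-empty subset of studies having a true effect.

    Assumes independent effects across studies, so the log ABF of a subset
    is the sum of the per-study log ABFs of its members.
    """
    per_study = [wakefield_log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    n = len(per_study)
    result = {}
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            result[subset] = sum(per_study[i] for i in subset)
    return result


# Study 0 shows a strong signal, study 1 shows none.
scores = subset_log_abfs([0.5, 0.0], [0.05, 0.05])
best_pattern = max(scores, key=scores.get)
```

Enumerating all subsets grows exponentially with the number of studies, so a sketch like this is only practical for a handful of studies or a restricted set of candidate patterns.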


2020 ◽  
Author(s):  
Jiangming Sun ◽  
Yunpeng Wang

Abstract Summary Post-GWAS studies using the results from large consortium meta-analyses often need to handle sample overlap correctly. The gold-standard approach for resolving this issue is to rerun the GWAS or meta-analysis excluding the overlapping participants. However, such an approach is time-consuming and sometimes restricted by the available data. deMeta provides a user-friendly and computationally efficient command-line implementation for removing the effect of a contributing sub-study to a consortium from the meta-analysis results. Only the summary statistics of the meta-analysis and of the sub-study to be removed are required. In addition, deMeta can generate contrasting Manhattan and quantile-quantile plots for users to visualize the impact of the sub-study on the meta-analysis results. Availability and Implementation The python source code, examples and documentation of deMeta are publicly available at https://github.com/Computational-NeuroGenetics/[email protected] (J. Sun); [email protected] (Y. Wang) Supplementary information None.
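As a rough illustration of how a sub-study can be subtracted from meta-analysis results, the sketch below assumes a fixed-effect, inverse-variance-weighted meta-analysis and a sub-study that is independent of the other contributors. It is not the deMeta implementation, and the function names are hypothetical.

```python
import math


def ivw_meta(estimates):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    estimates is a list of (beta, se) pairs, one per study.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return beta, math.sqrt(1.0 / total)


def remove_substudy(beta_meta, se_meta, beta_sub, se_sub):
    """Recover the meta-analysis estimate with one sub-study taken out."""
    w_meta = 1.0 / se_meta ** 2
    w_sub = 1.0 / se_sub ** 2
    w_rest = w_meta - w_sub  # total weight of the remaining studies
    if w_rest <= 0:
        raise ValueError("sub-study is at least as precise as the meta-analysis")
    beta_rest = (w_meta * beta_meta - w_sub * beta_sub) / w_rest
    return beta_rest, math.sqrt(1.0 / w_rest)


# Round trip: combine two studies, then subtract the second one again.
beta_all, se_all = ivw_meta([(0.10, 0.02), (0.30, 0.04)])
beta_back, se_back = remove_substudy(beta_all, se_all, 0.30, 0.04)
```

In the round trip, `beta_back` and `se_back` reproduce the first study's estimate and standard error up to floating-point error, which is a convenient sanity check on the algebra.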


2019 ◽  
Vol 35 (22) ◽  
pp. 4837-4839 ◽  
Author(s):  
Hanna Julienne ◽  
Huwenbo Shi ◽  
Bogdan Pasaniuc ◽  
Hugues Aschard

Abstract Motivation Multi-trait analyses using public summary statistics from genome-wide association studies (GWASs) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. Although methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect size. This is benign for univariate analyses where only variants with large effect size are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves on the existing imputation methods and reaches a precision suitable for multi-trait analyses. Results We fine-tuned parameters to obtain very high accuracy imputation from summary statistics. We demonstrate this accuracy for variants of all effect sizes on real data of 28 GWAS. We implemented the resulting methodology in a python package specially designed to efficiently impute multiple GWAS in parallel. Availability and implementation The python package is available at https://gitlab.pasteur.fr/statistical-genetics/raiss; its accompanying documentation is accessible at http://statistical-genetics.pages.pasteur.fr/raiss/. Supplementary information Supplementary data are available at Bioinformatics online.
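The abstract does not state the imputation formula. A common formulation, assumed here, imputes the z-score of an untyped variant as the Gaussian conditional expectation given typed variants, with a ridge term added to the LD matrix for numerical stability. The sketch below is plain Python and does not use the package's API; `impute_z`, `solve`, and the `ridge` parameter are hypothetical names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_z(z_typed, ld_typed, ld_untyped_typed, ridge=0.1):
    """Impute one untyped variant's z-score from typed neighbours.

    ld_typed is the LD (correlation) matrix among typed variants and
    ld_untyped_typed holds the correlations between the untyped variant
    and each typed variant. Returns (imputed z, conditional variance).
    """
    n = len(z_typed)
    regularised = [
        [ld_typed[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # The regularised matrix is symmetric, so solving against the
    # cross-correlations yields the imputation weights directly.
    weights = solve(regularised, ld_untyped_typed)
    z_hat = sum(w * z for w, z in zip(weights, z_typed))
    variance = 1.0 - sum(w * c for w, c in zip(weights, ld_untyped_typed))
    return z_hat, variance


z_hat, variance = impute_z([5.0], [[1.0]], [0.8], ridge=0.0)
```

A conditional variance close to 1 signals that the typed neighbours carry little information about the untyped variant, so a threshold on it is a natural quality filter for imputed values.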


2018 ◽  
Vol 35 (14) ◽  
pp. 2495-2497 ◽  
Author(s):  
Gregory McInnes ◽  
Yosuke Tanigawa ◽  
Chris DeBoever ◽  
Adam Lavertu ◽  
Julia Eve Olivieri ◽  
...  

Abstract Summary Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. Availability and implementation GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.


2017 ◽  
Author(s):  
Chris Chatzinakos ◽  
Donghyung Lee ◽  
Bradley T Webb ◽  
Vladimir I Vladimirov ◽  
Kenneth S Kendler ◽  
...  

Abstract Motivation To increase detection power, researchers use gene-level analysis methods to aggregate weak marker signals. Because gene expression controls biological processes, researchers proposed aggregating signals for expression Quantitative Trait Loci (eQTL). Most gene-level eQTL methods make statistical inferences based on i) summary statistics from genome-wide association studies (GWAS) and ii) linkage disequilibrium (LD) patterns from a relevant reference panel. While most such tools assume homogeneous cohorts, our Gene-level Joint Analysis of functional SNPs in Cosmopolitan Cohorts (JEPEGMIX) method accommodates cosmopolitan cohorts by using heterogeneous panels. However, JEPEGMIX relies on brain eQTLs from older gene expression studies and does not adjust for background enrichment in GWAS signals. Results We propose JEPEGMIX2, an extension of JEPEGMIX. Compared with JEPEGMIX, it uses i) cis-eQTL SNPs from the latest expression studies and ii) brain-specific (sub)tissues as well as tissues other than brain. JEPEGMIX2 also i) avoids accumulating averagely enriched polygenic information by adjusting for background enrichment and ii) to avoid an increase in false positive rates for studies with numerous highly enriched (above the background) genes, outputs gene q-values based on Holm adjustment of [email protected] Supplementary information Supplementary material is available at Bioinformatics online.
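The Holm adjustment named above is the standard step-down multiple-testing correction. A minimal sketch follows, assuming its inputs are per-gene p-values; it is not taken from the JEPEGMIX2 code, and `holm_adjust` is a hypothetical name.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


adjusted = holm_adjust([0.01, 0.04, 0.03])
```

Holm controls the family-wise error rate and is uniformly at least as powerful as the plain Bonferroni correction.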


2016 ◽  
Author(s):  
Xiang Zhu ◽  
Matthew Stephens

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.


2018 ◽  
Author(s):  
Ruth Johnson ◽  
Huwenbo Shi ◽  
Bogdan Pasaniuc ◽  
Sriram Sankararaman

Abstract Motivation A large proportion of risk regions identified by genome-wide association studies (GWAS) are shared across multiple diseases and traits. Understanding whether this clustering is due to sharing of causal variants or chance colocalization can provide insights into shared etiology of complex traits and diseases. Results In this work, we propose a flexible, unifying framework to quantify the overlap between a pair of traits called UNITY (Unifying Non-Infinitesimal Trait analYsis). We formulate a Bayesian generative model that relates the overlap between pairs of traits to GWAS summary statistic data under a non-infinitesimal genetic architecture underlying each trait. We propose a Metropolis-Hastings sampler to compute the posterior density of the genetic overlap parameters in this model. We validate our method through comprehensive simulations and analyze summary statistics from height and BMI GWAS to show that it produces estimates consistent with the known genetic makeup of both traits. Availability The UNITY software is made freely available to the research community at: https://github.com/bogdanlab/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Gregory McInnes ◽  
Yosuke Tanigawa ◽  
Chris DeBoever ◽  
Adam Lavertu ◽  
Julia Eve Olivieri ◽  
...  

Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here we present Global Biobank Engine (GBE), a web-based tool that enables the exploration of the relationship between genotype and phenotype in large biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests, and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Seema Yadav ◽  
Elizabeth M. Ross ◽  
Karen S. Aitken ◽  
Lee T. Hickey ◽  
Owen Powell ◽  
...  

Abstract Background High-density SNP arrays are now available for a wide range of crop species. Despite the development of many tools for generating genetic maps, the genome position of many SNPs from these arrays is unknown. Here we propose a linkage disequilibrium (LD)-based algorithm to allocate unassigned SNPs to chromosome regions from sparse genetic maps. This algorithm was tested on sugarcane, wheat, and barley data sets. We calculated the algorithm’s efficiency by masking SNPs with known locations, then assigning their position to the map with the algorithm, and finally comparing the assigned and true positions. Results In the 20-fold cross-validation, the mean proportion of masked mapped SNPs that were placed by the algorithm to a chromosome was 89.53, 94.25, and 97.23% for sugarcane, wheat, and barley, respectively. Of the markers that were placed in the genome, 98.73, 96.45 and 98.53% of the SNPs were positioned on the correct chromosome. The mean correlations between known and new estimated SNP positions were 0.97, 0.98, and 0.97 for sugarcane, wheat, and barley. The LD-based algorithm was used to assign 5920 out of 21,251 unpositioned markers to the current Q208 sugarcane genetic map, representing the highest density genetic map for this species to date. Conclusions Our LD-based approach can be used to accurately assign unpositioned SNPs to existing genetic maps, improving genome-wide association studies and genomic prediction in crop species with fragmented and incomplete genome assemblies. This approach will facilitate genomic-assisted breeding for many orphan crops that lack genetic and genomic resources.
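The abstract does not give the placement rule, so the following is only a simplified sketch of LD-based assignment. It assumes genotypes coded as 0/1/2 allele counts, and it places an unpositioned SNP at the location of the mapped SNP with which it has the highest r^2, leaving it unplaced when no mapped SNP exceeds a threshold. The function names, the `min_r2` threshold of 0.5, and the toy data are hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


def place_snp(query, mapped, min_r2=0.5):
    """Assign an unpositioned SNP to the location of its best LD partner.

    mapped is a list of (chromosome, position, genotype_vector) tuples.
    Returns (chromosome, position, r2), or None if LD is too weak.
    """
    scored = [(pearson_r2(query, geno), chrom, pos) for chrom, pos, geno in mapped]
    best_r2, chrom, pos = max(scored, key=lambda item: item[0])
    if best_r2 < min_r2:
        return None
    return chrom, pos, best_r2


# Toy map with two mapped markers genotyped in six individuals.
genetic_map = [
    ("1A", 10.0, [0, 1, 2, 1, 0, 2]),
    ("2B", 55.0, [2, 2, 0, 0, 1, 1]),
]
placed = place_snp([0, 1, 2, 1, 0, 2], genetic_map)
unplaced = place_snp([1, 1, 1, 1, 1, 1], genetic_map)
```

Masking markers with known positions and running them through a function like `place_snp` mirrors the cross-validation described in the abstract: the assigned chromosome can then be compared with the true one.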


Author(s):  
Zachary F Gerring ◽  
Angela Mina-Vargas ◽  
Eric R Gamazon ◽  
Eske M Derks

Abstract Motivation Genome-wide association studies have successfully identified multiple independent genetic loci that harbour variants associated with human traits and diseases, but the exact causal genes are largely unknown. Common genetic risk variants are enriched in non-protein-coding regions of the genome and often affect gene expression (expression quantitative trait loci, eQTL) in a tissue-specific manner. To address this challenge, we developed a methodological framework, E-MAGMA, which converts genome-wide association summary statistics into gene-level statistics by assigning risk variants to their putative genes based on tissue-specific eQTL information. Results We compared E-MAGMA to three eQTL informed gene-based approaches using simulated phenotype data. Phenotypes were simulated based on eQTL reference data using GCTA for all genes with at least one eQTL at chromosome 1. We performed 10 simulations per gene. The eQTL-h2 (i.e., the proportion of variation explained by the eQTLs) was set at 1%, 2%, and 5%. We found E-MAGMA outperforms other gene-based approaches across a range of simulated parameters (e.g. the number of identified causal genes). When applied to genome-wide association summary statistics for five neuropsychiatric disorders, E-MAGMA identified more putative candidate causal genes compared to other eQTL-based approaches. By integrating tissue-specific eQTL information, these results show E-MAGMA will help to identify novel candidate causal genes from genome-wide association summary statistics and thereby improve the understanding of the biological basis of complex disorders. Availability A tutorial and input files are made available in a github repository: https://github.com/eskederks/eMAGMA-tutorial. Supplementary information Supplementary data are available at Bioinformatics online.

