Beyond power: Multivariate discovery, replication, and interpretation of pleiotropic loci using summary association statistics

2015
Author(s):
Zheng Ning
Yakov A. Tsepilov
Sodbo Zh. Sharapov
Alexander K. Grishenko
Xiao Feng
...  

Abstract: The ever-growing genome-wide association studies (GWAS) have revealed widespread pleiotropy. To exploit this, various methods that consider variant association with multiple traits jointly have been developed. However, most effort has gone into improving discovery power; how to replicate and interpret these discovered pleiotropic loci using multivariate methods has yet to be discussed fully. Using only multiple publicly available single-trait GWAS summary statistics, we develop a fast and flexible multi-trait framework that contains modules for (i) multi-trait genetic discovery, (ii) replication of locus pleiotropic profile, and (iii) multi-trait conditional analysis. The procedure is able to handle any level of sample overlap. As an empirical example, we discovered and replicated 23 novel pleiotropic loci for human anthropometry and evaluated their pleiotropic effects on other traits. By applying conditional multivariate analysis on the 23 loci, we discovered and replicated two additional multi-trait associated SNPs. Our results provide empirical evidence that multi-trait analysis allows detection of additional, replicable, highly pleiotropic genetic associations without genotyping additional individuals. The methods are implemented in a free and open source R package MultiABEL.
Author summary: By analyzing large-scale genomic data, geneticists have revealed widespread pleiotropy, i.e. a single genetic variant can affect a wide range of complex traits. Methods have been developed to discover such genetic variants. However, we still lack insights into the relevant genetic architecture: what more can we learn from knowing the effects of these genetic variants? Here, we develop a fast and flexible statistical analysis procedure that includes discovery, replication, and interpretation of pleiotropic effects. The whole analysis pipeline only requires established genetic association study results. We also provide the mathematical theory behind the pleiotropic genetic effects testing. Most importantly, we show how a replication study can be essential to reveal new biology rather than solely increasing sample size in current genomic studies. For instance, we show that, using our proposed replication strategy, we can detect the difference in genetic effects between studies of different geographical origins. We applied the method to the GIANT consortium anthropometric traits to discover new genetic associations, replicated in the UK Biobank, and provided important new insights into growth and obesity. Our pipeline is implemented in an open-source R package MultiABEL, which is efficient enough for researchers to run it on a personal computer in minutes.
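As a rough illustration of how a multi-trait test can be built from single-trait summary statistics alone, the sketch below computes a generic two-trait omnibus statistic from two z-scores. It is not taken from MultiABEL: the function name `two_trait_omnibus` is hypothetical, and the null correlation `r` between the z-scores (which would absorb phenotypic correlation and sample overlap) is assumed to be known in advance.

```python
import math


def two_trait_omnibus(z1, z2, r):
    """Joint chi-square test (2 df) for one variant and two traits.

    z1, z2: single-trait z-scores from GWAS summary statistics.
    r: correlation of the two z-scores under the null. This is an
       assumption of the sketch; in practice it has to be estimated.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Quadratic form z' R^{-1} z written out for a 2x2 correlation matrix.
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    # The survival function of a chi-square with 2 df is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical z-scores: opposite-direction effects on positively
# correlated traits give a larger joint statistic than z1^2 + z2^2.
stat, p_value = two_trait_omnibus(3.0, -3.0, 0.5)
print(stat, p_value)
```

With more than two traits the same quadratic form needs a matrix inverse and a general chi-square tail function, which is why this sketch stays with the two-trait case that the standard library can handle exactly.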

2021
Vol 12
Author(s):
Zheng Ning
Yakov A. Tsepilov
Sodbo Zh. Sharapov
Zhipeng Wang
Alexander K. Grishenko
...  

The ever-growing genome-wide association studies (GWAS) have revealed widespread pleiotropy. To exploit this, various methods that jointly consider associations of a genetic variant with multiple traits have been developed. Most efforts have focused on improving GWAS discovery power. However, how to replicate these discovered pleiotropic loci has yet to be discussed thoroughly. Unlike a single-trait scenario, multi-trait replication is not trivial considering the underlying genotype-multi-phenotype map of the associations. Here, we evaluate four methods for replicating multi-trait associations, corresponding to four levels of replication strength. Weak replication cannot justify pleiotropic genetic effects, whereas strong replication using our developed correlation methods can inform consistent pleiotropic genetic effects across the discovery and replication samples. We provide a protocol for replicating multi-trait genetic associations in practice. The described methods are implemented in the free and open-source R package MultiABEL.


F1000Research
2017
Vol 6
pp. 97
Author(s):
Ilya Y. Zhbannikov
Konstantin Arbeev
Anatoliy I. Yashin

There exists a set of web-based tools for integrating and exploring information linked to annotated genetic variants. We developed haploR, an R package for querying such web-based genome annotation tools (currently supporting HaploReg and RegulomeDB) and gathering information in a format suitable for downstream bioinformatic analyses. This will facilitate streamlined post-genome-wide association study (post-GWAS) analysis for rapid discovery and interpretation of genetic associations.


2021
Vol 11 (1)
Author(s):
Nadav Brandes
Nathan Linial
Michal Linial

Abstract: The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Studies of genetic cancer predisposition typically identify significant genomic regions based on family-based cohorts or genome-wide association studies (GWAS). However, the results of such studies rarely provide biological insight or functional interpretation. In this study, we conducted a comprehensive analysis of cancer predisposition in the UK Biobank cohort using a new gene-based method for detecting protein-coding genes that are functionally interpretable. Specifically, we conducted proteome-wide association studies (PWAS) to identify genetic associations mediated by alterations to protein function. With PWAS, we identified 110 significant gene-cancer associations in 70 unique genomic regions across nine cancer types and pan-cancer. In 48 of the 110 PWAS associations (44%), estimated gene damage is associated with reduced rather than elevated cancer risk, suggesting a protective effect. Together with standard GWAS, we implicated 145 unique genomic loci with cancer risk. While most of these genomic regions are supported by external evidence, our results also highlight many novel loci. Based on the capacity of PWAS to detect non-additive genetic effects, we found that 46% of the PWAS-significant cancer regions exhibited exclusive recessive inheritance. These results highlight the importance of recessive genetic effects, without relying on familial studies. Finally, we show that many of the detected genes exert substantial cancer risk in the studied cohort determined by a quantitative functional description, suggesting their relevance for diagnosis and genetic consulting.


2020
Vol 10 (1)
Author(s):
Matthias Munz
Inken Wohlers
Eric Simon
Tobias Reinberger
Hauke Busch
...  

Abstract: Exploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of variants in human with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implicating a substantial amount of wrongly and yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R-package (https://doi.org/10.18129/B9.bioc.Qtlizer).


2016
Author(s):
Ekaterina A Khramtsova
Barbara E. Stranger

Abstract. Summary: Over the last decade, genome-wide association studies (GWAS) have generated vast amounts of analysis results, requiring development of novel tools for data visualization. Quantile-quantile plots and Manhattan plots are classical tools which have been utilized to visually summarize GWAS results and identify genetic variants significantly associated with traits of interest. However, static visualizations are limiting in the information that can be shown. Here we present Assocplots, a Python package for viewing and exploring GWAS results not only using classic static Manhattan and quantile-quantile plots, but also through a dynamic extension that allows data to be visualized interactively and the relationships between GWAS results from multiple cohorts or studies to be explored. Availability: The Assocplots package is open source and distributed under the MIT license via GitHub (https://github.com/khramts/assocplots) along with examples, documentation and installation instructions.
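For readers unfamiliar with the quantile-quantile plots mentioned in this abstract, the following minimal sketch shows how the plotted coordinates are usually derived from a list of p-values. It uses only the standard library and does not use or reproduce the Assocplots API; the function name `qq_points` and the `(i - 0.5) / n` plotting positions are choices made for this illustration.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Assumes every p-value is strictly between 0 and 1. Under the null,
    p-values are uniform, so the i-th smallest of n is expected near
    (i - 0.5) / n.
    """
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for i, p in enumerate(observed, start=1):
        expected_p = (i - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Hypothetical p-values; points far above the diagonal suggest signal
# (or inflation) in the upper tail.
for expected, observed in qq_points([0.05, 0.5, 0.001]):
    print(round(expected, 3), round(observed, 3))
```

Plotting the pairs against the diagonal is then a matter of passing them to whichever plotting library is in use.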


Genes
2020
Vol 11 (12)
pp. 1514
Author(s):
Wei-Min Ho
Yah-Yuan Wu
Yi-Chun Chen

Cardiovascular diseases (CVDs) and dementia are the leading causes of disability and mortality. Genetic connections between cardiovascular risk factors and dementia have not been elucidated. We conducted a scoping review and pathway analysis to reveal the genetic associations underlying both CVDs and dementia. In the PubMed database, literature was searched using keywords associated with diabetes mellitus, hypertension, dyslipidemia, white matter hyperintensities, cerebral microbleeds, and covert infarctions. Gene lists were extracted from these publications to identify shared genes and pathways for each group. This included high penetrance genes and single nucleotide polymorphisms (SNPs) identified through genome-wide association studies. Most risk SNPs for both diabetes and dementia participate in the phospholipase C enzyme system and the downstream inositol 1,4,5-trisphosphate and diacylglycerol activities. Interestingly, the AP-2 (TFAP2) transcription factor family and metabolism of vitamins and cofactors were associated with genetic variants that were shared by white matter hyperintensities and dementia, and by microbleeds and dementia. Variants shared by covert infarctions and dementia were related to VEGF ligand–receptor interactions and anti-inflammatory cytokine pathways. Our review sheds light on future investigations into the causative relationships behind CVDs and dementia, and can serve as a paradigm for the identification of dementia treatments.


2021
Vol 11 (1)
Author(s):
Georgina Donati
Iroise Dumontheil
Oliver Pain
Kathryn Asbury
Emma L. Meaburn

Abstract: How well one does at school is predictive of a wide range of important cognitive, socioeconomic, and health outcomes. The last few years have shown marked advancement in our understanding of the genetic contributions to, and correlations with, academic attainment. However, there exists a gap in our understanding of the specificity of genetic associations with performance in academic subjects during adolescence, a critical developmental period. To address this, the Avon Longitudinal Study of Parents and Children was used to conduct genome-wide association studies of standardised national English (N = 5983), maths (N = 6017) and science (N = 6089) tests. High SNP-based heritabilities (h²SNP) for all subjects were found (41–53%). Further, h²SNP for maths and science remained after removing shared variance between subjects or IQ (N = 3197–5895). One genome-wide significant single nucleotide polymorphism (rs952964, p = 4.86 × 10⁻⁸) and four gene-level associations with science attainment (MEF2C, BRINP1, S100A1 and S100A13) were identified. Rs952964 remained significant after removing the variance shared between academic subjects. The findings highlight the benefits of using environmentally homogeneous samples for genetic analyses and indicate that finer-grained phenotyping will help build more specific biological models of variance in learning processes and abilities.


Genetics
2020
Vol 215 (1)
pp. 267-284
Author(s):
Alice H. MacQueen
Jeffrey W. White
Rian Lee
Juan M. Osorno
Jeremy Schmutz
...  

Multienvironment trials (METs) are widely used to assess the performance of promising crop germplasm. Though seldom designed to elucidate genetic mechanisms, MET data sets are often much larger than could be duplicated for genetic research and, given proper interpretation, may offer valuable insights into the genetics of adaptation across time and space. The Cooperative Dry Bean Nursery (CDBN) is a MET for common bean (Phaseolus vulgaris) grown for > 70 years in the United States and Canada, consisting of 20–50 entries each year at 10–20 locations. The CDBN provides a rich source of phenotypic data across entries, years, and locations that is amenable to genetic analysis. To study stable genetic effects segregating in this MET, we conducted genome-wide association studies (GWAS) using best linear unbiased predictions derived across years and locations for 21 CDBN phenotypes and genotypic data (1.2 million SNPs) for 327 CDBN genotypes. The value of this approach was confirmed by the discovery of three candidate genes and genomic regions previously identified in balanced GWAS. Multivariate adaptive shrinkage (mash) analysis, which increased our power to detect significant correlated effects, found significant effects for all phenotypes. Mash found two large genomic regions with effects on multiple phenotypes, supporting a hypothesis of pleiotropic or linked effects that were likely selected on in pursuit of a crop ideotype. Overall, our results demonstrate that statistical genomics approaches can be used on MET phenotypic data to discover significant genetic effects and to define genomic regions associated with crop improvement.


2015
Vol 282 (1821)
pp. 20151684
Author(s):
Alkes L. Price
Chris C. A. Spencer
Peter Donnelly

Susceptibility to common human diseases is influenced by both genetic and environmental factors. The explosive growth of genetic data, and the knowledge that it is generating, are transforming our biological understanding of these diseases. In this review, we describe the technological and analytical advances that have enabled genome-wide association studies to be successful in identifying a large number of genetic variants robustly associated with common disease. We examine the biological insights that these genetic associations are beginning to produce, from functional mechanisms involving individual genes to biological pathways linking associated genes, and the identification of functional annotations, some of which are cell-type-specific, enriched in disease associations. Although most efforts have focused on identifying and interpreting genetic variants that are irrefutably associated with disease, it is increasingly clear that—even at large sample sizes—these represent only the tip of the iceberg of genetic signal, motivating polygenic analyses that consider the effects of genetic variants throughout the genome, including modest effects that are not individually statistically significant. As data from an increasingly large number of diseases and traits are analysed, pleiotropic effects (defined as genetic loci affecting multiple phenotypes) can help integrate our biological understanding. Looking forward, the next generation of population-scale data resources, linking genomic information with health outcomes, will lead to another step-change in our ability to understand, and treat, common diseases.


2021
Author(s):
Konrad Karczewski
Matthew Solomonson
Katherine R Chao
Julia K Goodrich
Grace Tiao
...  

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variation in human disease has not been explored at scale. Exome sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variation across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 3,700 phenotypes using single-variant and gene tests of 281,850 individuals in the UK Biobank with exome sequence data. We find that the discovery of genetic associations is tightly linked to frequency as well as correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside a browser framework for rapidly exploring rare variant association results.

