Imputation of canine genotype array data using 365 whole-genome sequences improves power of genome-wide association studies

Mapping Intimacies ◽

10.1101/540559 ◽

2019 ◽

Author(s):

Jessica J. Hayward ◽

Michelle E. White ◽

Michael Boyle ◽

Laura M. Shannon ◽

Margret L. Casal ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Association Studies ◽

Complex Trait ◽

Reference Panel ◽

Genome Wide Association ◽

Domestic Dog ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Trait Mapping ◽

Genome Wide

Abstract Genomic resources for the domestic dog have improved with the widespread adoption of a 173k SNP array platform and updated reference genome. SNP arrays of this density are sufficient for detecting genetic associations within breeds but are underpowered for finding associations across multiple breeds or in mixed-breed dogs, where linkage disequilibrium rapidly decays between markers, even though such studies would hold particular promise for mapping complex diseases and traits. Here we introduce an imputation reference panel, consisting of 365 diverse, whole-genome sequenced dogs and wolves, which increases the number of markers that can be queried in genome-wide association studies approximately 130-fold. Using previously genotyped dogs, we show the utility of this reference panel in identifying novel associations and fine-mapping for canine body size and blood phenotypes, even when causal loci are not in strong linkage disequilibrium with any single array marker. This reference panel resource will improve future genome-wide association studies for canine complex diseases and other phenotypes.

Author Summary Complex traits are controlled by more than one gene and as such are difficult to map. For complex trait mapping in the domestic dog, researchers use the current array of 173,000 variants, with only minimal success. Here, we use a method called imputation to increase the number of variants – from 173,000 to 24 million – that can be queried in canine association studies. We use sequence data from the whole genomes of 365 dogs and wolves to accurately predict variants, in a separate cohort of dogs, that are not present on the array. Using dog body size, we show that the increase in variants results in an increase in mapping power, through the identification of new associations and the narrowing of regions of interest. This imputation panel is particularly important because of its usefulness in improving complex trait mapping in the dog, which has significant implications for discovery of variants in humans with similar diseases.
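The abstract's argument rests on how well an array marker tags an untyped variant. As a rough illustration only, the sketch below computes the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of an allele), a common genotype-based approximation of linkage disequilibrium r². The function name and the toy genotype vectors are hypothetical and are not taken from the study.

```python
def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two genotype dosage vectors.

    Each vector holds 0/1/2 allele counts for the same individuals.
    This is a genotype-based approximation of LD r^2, not a phased estimate.
    """
    if len(dosages_a) != len(dosages_b) or not dosages_a:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return cov * cov / (var_a * var_b)


# Hypothetical toy data: one array marker and two untyped variants.
array_marker = [0, 1, 2, 1, 0, 2]
well_tagged = [0, 1, 2, 1, 0, 2]    # identical pattern, r^2 = 1
poorly_tagged = [1, 1, 0, 2, 1, 1]  # weak correlation with the array marker

print(r_squared(array_marker, well_tagged))
print(r_squared(array_marker, poorly_tagged))
```

A variant like the second one would be hard to detect from the array marker alone, which is the situation imputation from a sequenced reference panel is meant to address.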


The construction of a haplotype reference panel using extremely low coverage whole genome sequences and its application in genome-wide association studies and genomic prediction in Duroc pigs

Genomics ◽

10.1016/j.ygeno.2021.12.016 ◽

2021 ◽

Author(s):

Zhe Zhang ◽

Peipei Ma ◽

Zhenyang Zhang ◽

Zhen Wang ◽

Qishan Wang ◽

...

Keyword(s):

Genomic Prediction ◽

Association Studies ◽

Reference Panel ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Sequences ◽

Genome Wide ◽

Low Coverage


CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research ◽

10.1093/nar/gkz1026 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jianhua Wang ◽

Dandan Huang ◽

Yao Zhou ◽

Hongcheng Yao ◽

Huanhuan Liu ◽

...

Keyword(s):

Fine Mapping ◽

Genetic Variants ◽

Association Studies ◽

Complex Trait ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Credible Sets ◽

Causal Variants

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
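To make the notion of a credible set concrete, here is a minimal sketch of one common construction: rank variants in a locus by posterior causal probability and keep adding them until the cumulative probability reaches a target coverage. It assumes a single causal variant per locus and posteriors that can be normalised to sum to one. The function name, variant IDs and probabilities are hypothetical, and this is not a description of the CAUSALdb pipeline or of the fine-mapping tools it uses.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of top-ranked variants whose normalised posteriors reach `coverage`.

    `posteriors` maps variant ID -> posterior causal probability within one locus.
    Assumes a single causal variant per locus (an assumption of this sketch).
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posteriors for five variants in one GWAS locus.
pips = {"rs1": 0.62, "rs2": 0.25, "rs3": 0.09, "rs4": 0.03, "rs5": 0.01}
print(credible_set(pips))        # 95% credible set
print(credible_set(pips, 0.5))   # 50% credible set
```

A smaller credible set at a given coverage means the locus is better resolved.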


The Impact of Incomplete Linkage Disequilibrium and Genetic Model Choice on the Analysis and Interpretation of Genome-wide Association Studies

Annals of Human Genetics ◽

10.1111/j.1469-1809.2010.00579.x ◽

2010 ◽

Vol 74 (4) ◽

pp. 375-379 ◽

Cited By ~ 6

Author(s):

Mark M. Iles

Keyword(s):

Linkage Disequilibrium ◽

Genetic Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Model Choice ◽

Genome Wide ◽

The Impact


Integration of genome wide association studies and whole genome sequencing provides novel insights into fat deposition in chicken

Scientific Reports ◽

10.1038/s41598-018-34364-0 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 8

Author(s):

Gabriel Costa Monteiro Moreira ◽

Clarissa Boschiero ◽

Aline Silva Mello Cesar ◽

James M. Reecy ◽

Thaís Fernanda Godoy ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Fat Deposition ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Wide


Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa068 ◽

2020 ◽

Vol 27 (9) ◽

pp. 1425-1430

Author(s):

Inès Krissaane ◽

Carlos De Niz ◽

Alba Gutiérrez-Sacristán ◽

Gabor Korodi ◽

Nneka Ede ◽

...

Keyword(s):

Web Services ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Cloud Platform ◽

Human Genomics ◽

Genome Wide ◽

Innovative Methodology ◽

Amazon Web Services

Abstract Objective: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. Methods: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680), for analysis and exploration of genomic variant datasets. Results: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. Conclusions: We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?
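One simple way to reason about a speed versus cost trade-off is to compute the cost of each cluster configuration as nodes × price per node-hour × runtime, then keep only configurations that no other configuration beats on both axes (the Pareto front). The sketch below illustrates that idea under stated assumptions: the class, the configuration names, the prices and the runtimes are all hypothetical and are not taken from the paper or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmarked cluster configuration (all values hypothetical)."""
    name: str
    nodes: int
    price_per_node_hour: float  # USD, illustrative only
    hours: float                # wall-clock runtime of the GWAS job

    @property
    def cost(self):
        return self.nodes * self.price_per_node_hour * self.hours


def pareto_front(runs):
    """Return runs not dominated in both runtime and cost, fastest first."""
    front = []
    for run in runs:
        dominated = any(
            other.hours <= run.hours
            and other.cost <= run.cost
            and (other.hours < run.hours or other.cost < run.cost)
            for other in runs
        )
        if not dominated:
            front.append(run)
    return sorted(front, key=lambda run: run.hours)


runs = [
    Run("small", nodes=4, price_per_node_hour=0.5, hours=10.0),
    Run("medium", nodes=16, price_per_node_hour=0.5, hours=3.0),
    Run("large", nodes=64, price_per_node_hour=0.5, hours=1.5),
    Run("wasteful", nodes=64, price_per_node_hour=0.5, hours=3.5),
]

for run in pareto_front(runs):
    print(f"{run.name}: {run.hours} h, ${run.cost:.2f}")
```

Every configuration left on the front is a defensible choice; which one to pick depends on how much a shorter runtime is worth to the analyst.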


Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data

Genome Biology ◽

10.1186/s13059-017-1216-0 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 46

Author(s):

Yang Wu ◽

Zhili Zheng ◽

Peter M. Visscher ◽

Jian Yang

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Genome Wide Association ◽

Whole Genome Sequencing Data ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide


A New Diversity Panel for Winter Rapeseed (Brassica napus L.) Genome-Wide Association Studies

Agronomy ◽

10.3390/agronomy10122006 ◽

2020 ◽

Vol 10 (12) ◽

pp. 2006

Author(s):

David P. Horvath ◽

Michael Stamm ◽

Zahirul I. Talukder ◽

Jason Fiedler ◽

Aidan P. Horvath ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Brassica Napus ◽

Association Studies ◽

Decay Rates ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

High Quality ◽

Brassica Napus L ◽

Genome Wide ◽

Quality Markers

A diverse population (429 members) of canola (Brassica napus L.) consisting primarily of winter biotypes was assembled and used in genome-wide association studies. Genotype by sequencing analysis of the population identified and mapped 290,972 high-quality markers, with 18.5 to 82.4% of markers missing per line (average 36.8%). After interpolation, 251,575 high-quality markers remained. After filtering out markers with low minor allele counts (retaining those with count > 5), we were left with 190,375 markers. The average distance between these markers is 4463 bases, with a median of 69 and a range from 1 to 281,248 bases. The heterozygosity among the imputed population ranges from 0.9 to 11.0%, with an average of 5.4%. The filtered and imputed dataset was used to determine population structure and kinship, which indicated that the population had minimal structure with the best K value of 2–3. These results also indicated that the majority of the population has substantial sequence from a single population with sub-clusters of, and admixtures with, a very small number of other populations. Chromosomal linkage disequilibrium decay ranged from ~7 kb for chromosome A01 to ~68 kb for chromosome C01. Local linkage decay rates determined for all 500 kb windows with a 10 kb sliding step showed a wide range of linkage disequilibrium decay rates, indicating numerous crossover hotspots within this population. They also provide a resource for determining the likely limits of linkage disequilibrium around any given marker, within which to identify candidate genes. This population and the resources provided here should serve as helpful tools for investigating genetics in winter canola.
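Two of the steps described above, the minor allele count filter and the marker spacing summary, can be sketched in a few lines. The example assumes genotypes coded as 0/1/2 copies of the alternate allele with None for missing calls, and it reads "count > 5" as the retention threshold; both are assumptions of this sketch, as are the function names and toy data. It is not the authors' pipeline.

```python
from statistics import mean, median


def minor_allele_count(genotypes):
    """Count copies of the rarer allele among called diploid genotypes (0/1/2)."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    ref = 2 * len(called) - alt
    return min(alt, ref)


def filter_markers(markers, min_count=5):
    """Keep markers whose minor allele count is strictly greater than `min_count`."""
    return {
        name: genotypes
        for name, genotypes in markers.items()
        if minor_allele_count(genotypes) > min_count
    }


def spacing_summary(positions):
    """Mean and median distance between adjacent markers on one chromosome."""
    ordered = sorted(positions)
    gaps = [right - left for left, right in zip(ordered, ordered[1:])]
    return mean(gaps), median(gaps)


# Hypothetical toy data: two markers genotyped in ten lines.
markers = {
    "m1": [0, 0, 1, 2, None, 1, 0, 2, 1, 0],  # minor allele count 7, kept
    "m2": [0, 0, 0, 0, 1, 0, 0, 0, None, 0],  # minor allele count 1, dropped
}
print(list(filter_markers(markers)))

# Clustered markers give a mean gap far above the median gap.
print(spacing_summary([100, 105, 1105, 1110]))
```

A mean spacing much larger than the median, as in the reported 4463 versus 69 bases, is what one would expect when markers fall in tight clusters separated by long gaps.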


A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies

BMC Bioinformatics ◽

10.1186/1471-2105-12-16 ◽

2011 ◽

Vol 12 (1) ◽

Cited By ~ 26

Author(s):

Raphaël Mourad ◽

Christine Sinoquet ◽

Philippe Leray

Keyword(s):

Linkage Disequilibrium ◽

Dimensionality Reduction ◽

Bayesian Network ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Hierarchical Bayesian ◽

Network Approach ◽

Genome Wide ◽

Data Dimensionality Reduction


On the Threshold from Genome-Wide Association Studies to Whole-Genome Sequencing. Looking for Signal in All the Right Places

American Journal of Respiratory and Critical Care Medicine ◽

10.1164/rccm.201401-0048ed ◽

2014 ◽

Vol 189 (4) ◽

pp. 381-383 ◽

Cited By ~ 1

Author(s):

Nadia N. Hansel ◽

Rasika A. Mathias

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Genome Wide ◽

The Right


Beyond SNP Heritability: Polygenicity and Discoverability of Phenotypes Estimated with a Univariate Gaussian Mixture Model

10.1101/133132 ◽

2017 ◽

Cited By ~ 8

Author(s):

Dominic Holland ◽

Oleksandr Frei ◽

Rahul Desikan ◽

Chun-Chieh Fan ◽

Alexey A. Shadrin ◽

...

Keyword(s):

Association Studies ◽

Causal Snps ◽

Reference Panel ◽

Causal Effects ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Common Variants ◽

Genome Wide ◽

Causal Variants

Abstract Estimating the polygenicity (proportion of causally associated single nucleotide polymorphisms (SNPs)) and discoverability (effect size variance) of causal SNPs for human traits is currently of considerable interest. SNP-heritability is proportional to the product of these quantities. We present a basic model, using detailed linkage disequilibrium structure from an extensive reference panel, to estimate these quantities from genome-wide association studies (GWAS) summary statistics. We apply the model to diverse phenotypes and validate the implementation with simulations. We find model polygenicities ranging from ≃ 2 × 10⁻⁵ to ≃ 4 × 10⁻³, with discoverabilities similarly ranging over two orders of magnitude. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs reaching genome-wide significance at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability. The model also allows for estimating residual inflation (or deflation from over-correcting of z-scores), and assessing compatibility of replication and discovery GWAS summary statistics.

Author Summary There are ~10 million common variants in the genome of humans with European ancestry. For any particular phenotype a number of these variants will have some causal effect. It is of great interest to be able to quantify the number of these causal variants and the strength of their effect on the phenotype. Genome-wide association studies (GWAS) produce very noisy summary statistics for the association between subsets of common variants and phenotypes. For any phenotype, these statistics collectively are difficult to interpret, but buried within them is the true landscape of causal effects. In this work, we posit a probability distribution for the causal effects, and assess its validity using simulations. Using a detailed reference panel of ~11 million common variants – among which only a small fraction are likely to be causal, but allowing for non-causal variants to show an association with the phenotype due to correlation with causal variants – we implement an exact procedure for estimating the number of causal variants and their mean strength of association with the phenotype. We find that, across different phenotypes, both these quantities – whose product allows for lower bound estimates of heritability – vary by orders of magnitude.
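The statement that SNP-heritability is proportional to the product of polygenicity and discoverability can be checked with a small simulation. The sketch below assumes a two-component mixture in which each SNP is causal with probability equal to the polygenicity and causal effects are drawn from a zero-mean Gaussian whose variance is the discoverability. The proportionality constant used here (number of SNPs times a single fixed heterozygosity) is an assumption of this sketch, since the abstract states only proportionality, and all parameter values are hypothetical.

```python
import random


def expected_heritability(n_snps, polygenicity, discoverability, heterozygosity):
    """Product form: causal SNP count x effect size variance x heterozygosity."""
    return n_snps * polygenicity * discoverability * heterozygosity


def simulate_heritability(n_snps, polygenicity, discoverability, heterozygosity, seed=0):
    """Draw causal effects from the mixture and sum their variance contributions."""
    rng = random.Random(seed)
    effect_sd = discoverability ** 0.5
    total = 0.0
    for _ in range(n_snps):
        if rng.random() < polygenicity:        # this SNP is causal
            beta = rng.gauss(0.0, effect_sd)   # causal effect size
            total += heterozygosity * beta * beta
    return total


# Hypothetical parameters chosen only to keep the example fast.
params = dict(n_snps=200_000, polygenicity=0.01, discoverability=1e-4, heterozygosity=0.3)

expected = expected_heritability(**params)
simulated = simulate_heritability(**params)
print(f"expected  h2 = {expected:.4f}")
print(f"simulated h2 = {simulated:.4f}")
```

Doubling either the polygenicity or the discoverability doubles the expected value, which is why neither quantity can be recovered from a heritability estimate alone.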
