Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits

Mapping Intimacies

10.1101/115527

2017

Cited By ~ 10

Author(s):

Luke M. Evans

Rasool Tahmasbi

Scott I. Vrieze

Gonçalo R. Abecasis

Sayantan Das

...

Keyword(s):

Complex Traits

SNP Array

Causal Variant

Whole Genome Sequence

Whole Genome

Narrow Sense Heritability

Frequency Spectra

Genome Wide

Variant Frequency

Causal Variants

Heritability, h², is a foundational concept in genetics and is critical to understanding the genetic basis of complex traits. Recently developed methods that estimate heritability from genotyped SNPs (h²SNP) explain substantially more genetic variance than genome-wide significant loci, but less than classical estimates from twins and families. However, h²SNP estimates have yet to be comprehensively compared under a range of genetic architectures, which makes it difficult to draw conclusions from sometimes conflicting published estimates. Here, we used thousands of real whole genome sequences to simulate realistic phenotypes under a variety of genetic architectures, including architectures driven by very rare causal variants. We compared the performance of ten methods across different types of genotypic data (commercial SNP array positions, whole genome sequence variants, and imputed variants) and under differing causal variant frequencies, levels of stratification, and relatedness thresholds. These results provide guidance for interpreting past results and choosing optimal approaches for future studies. We then applied the two methods that best estimated overall h²SNP and the causal variant frequency spectra (GREML-MS and GREML-LDMS) to six phenotypes in the UK Biobank, using imputed genome-wide variants. Our results suggest that as imputation reference panels become larger and more diverse, estimates of the frequency distribution of causal variants will become increasingly unbiased and the vast majority of trait narrow-sense heritability will be accounted for.
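
The sketch below makes the idea of estimating h²SNP from genotype data concrete. It uses Haseman-Elston (cross-product) regression, a simpler method-of-moments estimator chosen here for illustration. It is not GREML-MS or GREML-LDMS, which fit the model by REML and partition SNPs by frequency and LD. The simulation settings (sample size, SNP count, allele-frequency range, h² = 0.5) and all function names are assumptions made for the example.

```python
import numpy as np


def standardize_genotypes(G):
    """Column-standardize a 0/1/2 genotype matrix, dropping monomorphic SNPs."""
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0  # sample allele frequency of each SNP
    keep = (p > 0) & (p < 1)
    G, p = G[:, keep], p[keep]
    return (G - 2 * p) / np.sqrt(2 * p * (1 - p))


def grm(Z):
    """Genetic relationship matrix A = Z Z^T / m from standardized genotypes."""
    return Z @ Z.T / Z.shape[1]


def he_regression_h2(A, y):
    """Haseman-Elston cross-product estimate of SNP heritability.

    For a standardized phenotype, E[y_i * y_j] = h2 * A_ij for i != j, so the
    slope of y_i * y_j regressed on A_ij over all pairs i < j estimates h2.
    """
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)
    x, prod = A[i, j], y[i] * y[j]
    return float(np.cov(x, prod, bias=True)[0, 1] / x.var())


def simulate(n=1000, m=2000, h2=0.5, seed=1):
    """Simulate unrelated individuals and an additive phenotype with SNP heritability h2."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=m)
    Z = standardize_genotypes(rng.binomial(2, freqs, size=(n, m)))
    beta = rng.normal(0.0, np.sqrt(h2 / Z.shape[1]), size=Z.shape[1])
    y = Z @ beta + rng.normal(0.0, np.sqrt(1 - h2), size=n)
    return Z, y


Z, y = simulate()
print(f"HE-regression h2_SNP estimate: {he_regression_h2(grm(Z), y):.2f} (simulated 0.5)")
```

Only 1,000 unrelated individuals are simulated, so the estimate scatters around the simulated value. Larger samples shrink that noise, which is one reason h²SNP studies rely on large cohorts such as the UK Biobank.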

Predicting causal variants affecting expression using whole genome sequence and RNA-seq from multiple human tissues

10.1101/088872

2016

Cited By ~ 2

Author(s):

Andrew Anand Brown

Ana Viñuela

Olivier Delaneau

Tim Spector

Kerrin Small

...

Keyword(s):

Genome Sequence

Complex Traits

Causal Variant

Whole Genome Sequence

Open Chromatin

Whole Genome

RNA-Seq

Derived Properties

Causal Variants

Genomic Regions

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying the causal variants themselves remains difficult. Complete knowledge of all genetic variants, as provided by whole genome sequence (WGS), will help, but it is currently financially prohibitive for well-powered GWAS. To explore the advantages of WGS in a well-powered setting, we performed eQTL mapping using WGS and RNA-seq and showed that the lead eQTL variants called using WGS are more likely to be causal. We derived properties of the causal variant from simulation studies and used these to propose a method for implicating likely causal SNPs. This method predicts that 25% to 70% of the causal variants lie in open chromatin regions, depending on tissue and experiment. Finally, we identified a set of high-confidence causal variants and showed that they are more enriched in GWAS associations than other eQTL. Among these, we found 65 associations with GWAS traits and show examples where the gene implicated by expression has been functionally validated as relevant for complex traits.

Investigating the Effect of Imputed Structural Variants from Whole-Genome Sequence on Genome-Wide Association and Genomic Prediction in Dairy Cattle

Animals

10.3390/ani11020541

2021

Vol 11 (2)

pp. 541

Author(s):

Long Chen

Jennie E. Pryce

Ben J. Hayes

Hans D. Daetwyler

Keyword(s):

Dairy Cattle

Genomic Prediction

Complex Traits

Prediction Accuracy

Association Studies

Genome Wide Association

Whole Genome Sequence

Genome Wide Association Studies

Whole Genome

Genome Wide

Structural variations (SVs) are large DNA segments affected by deletions, duplications, copy number variations, inversions and translocations in a re-sequenced genome relative to a reference genome. They have been found to be associated with several complex traits in dairy cattle and could potentially help to improve the genomic prediction accuracy of dairy traits. SVs can be imputed into individuals genotyped with single-nucleotide polymorphism (SNP) panels, avoiding the expense of sequencing them. In this study, we generated 24,908 high-quality SVs in a total of 478 whole-genome sequenced Holstein and Jersey cattle. Using two pipelines, FImpute and Eagle2.3-Minimac3, we imputed 4489 SVs with R² > 0.5 into 35,568 Holstein and Jersey dairy cattle genotyped for 578,999 SNPs. Genome-wide association studies for production, fertility and overall type with these 4489 SVs revealed four significant SVs, two of which were highly linked to significant SNPs. We also estimated the variance components for SNP and SV models for these traits using genomic best linear unbiased prediction (GBLUP). Furthermore, we assessed the effect of adding SVs to GBLUP models on genomic prediction accuracy. The estimated percentage of genetic variance captured by SVs for production traits was up to 4.57% for milk yield in bulls and 3.53% for protein yield in cows. Finally, no consistent increase in genomic prediction accuracy was observed when SVs were included in GBLUP.

Genome-wide profiling of microRNAs and prediction of mRNA targets in 17 bovine tissues

10.21203/rs.2.9876/v1

2019

Author(s):

Min Wang

Amanda J Chamberlain

Claire P Prowse-Wilkins

Christy J Vander Jagt

Timothy P Hancock

...

Keyword(s):

Complex Traits

Whole Genome Sequence

Messenger RNAs

mRNA Targets

Genome Wide

Target Sites

Causal Variants

Mature MicroRNAs

Temporal and Spatial

Animal Genomes

Background: MicroRNAs regulate many eukaryotic biological processes in a temporally and spatially specific manner. Yet in cattle it is not fully known which microRNAs are expressed in each tissue, which genes they regulate, or which sites a given microRNA binds to within messenger RNAs (mRNAs). An improved annotation of tissue-specific microRNA networks may in the future assist with the identification of causal variants affecting complex traits. Results: We report findings from analysing short RNA sequence from 17 tissues from a single lactating dairy cow. Using miRDeep2, we identified 699 expressed mature microRNAs. Using TargetScan, known (60%) and novel (40%) microRNAs were predicted to interact with 780,481 sites in bovine mRNAs homologous with human. Putative interactions between microRNA families and targets were significantly enriched for interactions previously identified experimentally or computationally. Characterizing features of microRNAs and targets, we showed that (1) mature microRNAs derived from different arms of the same precursor targeted different genes in different tissues; (2) miRNA target sites preferentially occurred within gene regions undergoing active histone modification; (3) variants within microRNAs and targets had lower allele frequencies than variants across the genome, as identified from 65 million whole genome sequence variants; (4) no significant correlation was found between the abundance of microRNAs and mRNAs differentially expressed in the same tissue; and (5) microRNAs and target sites were not significantly associated with allelic imbalance of gene targets. Conclusion: This study contributes to the goals of the Functional Annotation of Animal Genomes (FAANG) consortium to improve the annotation of genomes of domestic animals.
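
TargetScan's predictions start from seed pairing between a microRNA and its target mRNA. The sketch below is a much-simplified illustration, not TargetScan itself, which also handles other site types and scores context and conservation. It finds only canonical 7mer-m8 sites, which are exact reverse complements of microRNA nucleotides 2-8, in a 3′UTR sequence. The function names and the toy UTR are hypothetical. The microRNA is the let-7a mature sequence.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Upper-case a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (A<->U, C<->G)."""
    return to_rna(rna).translate(COMPLEMENT)[::-1]


def seed_sites_7mer_m8(mirna, utr):
    """Return 0-based start positions of 7mer-m8 seed matches in a UTR.

    A 7mer-m8 site is the exact reverse complement of miRNA nucleotides 2-8
    (1-based), i.e. the Python slice mirna[1:8].
    """
    site = reverse_complement(mirna[1:8])
    utr = to_rna(utr)
    hits, start = [], utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits


LET_7A = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAAGGCUACCUCA"  # hypothetical 3'UTR fragment
print(seed_sites_7mer_m8(LET_7A, toy_utr))  # [3, 14]
```

Genome-scale tools apply this kind of search across all expressed microRNAs and annotated UTRs. Each candidate site is then ranked, which is why raw seed matching alone overpredicts targets.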

Assessing Runs of Homozygosity: A comparison of SNP Array and Whole Genome Sequence low coverage data

10.1101/160705

2017

Author(s):

Francisco C. Ceballos

Scott Hazelhurst

Michèle Ramsay

Keyword(s):

Genome Sequence

Complex Traits

Genetic Basis

SNP Array

Population History

Whole Genome Sequence

Whole Genome

Runs of Homozygosity

Technological Advances

Low Coverage

Runs of homozygosity (ROH) are sequences that arise when identical haplotypes are inherited from each parent. Since technological advances first made their detection possible in the late 1990s, ROH have been shedding light on human population history and helping to decipher the genetic basis of monogenic and complex traits and diseases. ROH studies have predominantly exploited SNP array data but are gradually moving to whole genome sequence (WGS) data as it becomes available. WGS data cover more genetic variability and can add value to ROH studies, but they require additional considerations during analysis. Using SNP array and low-coverage WGS data from 1885 individuals from 20 world populations, we aimed to compare ROH from the two datasets and to establish software conditions that give comparable results, thus providing guidelines for combining disparate datasets in joint ROH analyses. Using the PLINK homozygosity functions, we found that allowing 3 heterozygous SNPs per window in low-coverage WGS data makes it possible to establish meaningful comparisons between data from the two technologies.
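
A simplified scanning-window ROH caller shows the effect of tolerating a few heterozygous calls per window. It is loosely modelled on the window-based idea behind PLINK's homozygosity functions but is not a reimplementation of PLINK. The parameter names (window, max_het, hit_threshold, min_snps) and their values are assumptions made for the example. The one exception is max_het=3, which mirrors the setting reported above. Genotypes are coded 0/1/2, with 1 meaning heterozygous.

```python
def find_roh(genotypes, window=50, max_het=3, hit_threshold=0.05, min_snps=100):
    """Return (start, end) SNP index pairs, end exclusive, of runs of homozygosity.

    genotypes: 0/1/2 calls ordered by position, where 1 is heterozygous.
    A window of `window` consecutive SNPs counts as homozygous if it holds at
    most `max_het` heterozygous calls. A SNP is flagged when at least
    `hit_threshold` of the windows overlapping it are homozygous, and runs of
    at least `min_snps` consecutive flagged SNPs are reported.
    """
    n = len(genotypes)
    if n < window:
        return []
    het = [1 if g == 1 else 0 for g in genotypes]
    covered = [0] * n   # number of windows overlapping each SNP
    hom_hits = [0] * n  # number of homozygous windows overlapping each SNP
    het_count = sum(het[:window])
    for start in range(n - window + 1):
        if start > 0:  # slide the window one SNP to the right
            het_count += het[start + window - 1] - het[start - 1]
        is_hom = int(het_count <= max_het)
        for k in range(start, start + window):
            covered[k] += 1
            hom_hits[k] += is_hom
    flagged = [hom_hits[k] / covered[k] >= hit_threshold for k in range(n)]
    runs, run_start = [], None
    for k, f in enumerate(flagged + [False]):  # sentinel closes a trailing run
        if f and run_start is None:
            run_start = k
        elif not f and run_start is not None:
            if k - run_start >= min_snps:
                runs.append((run_start, k))
            run_start = None
    return runs


# Hypothetical calls: a heterozygosity-rich flank, 150 homozygous SNPs, another flank.
calls = [1, 0] * 50 + [0] * 150 + [1, 2] * 25
print(find_roh(calls))             # [(95, 254)]
print(find_roh(calls, max_het=0))  # [(101, 248)]
```

With max_het=3 the reported run spans the 150-SNP homozygous stretch and extends a few SNPs into each heterozygous flank. Setting max_het=0 trims those edges. Low-coverage WGS data need a more permissive het allowance than SNP array data for the same reason: sporadic miscalled heterozygotes would otherwise break true runs.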

Genome-wide profiling of microRNAs and prediction of mRNA targets in 17 bovine tissues

10.1101/574954

2019

Author(s):

Min Wang

Amanda J Chamberlain

Claire P Prowse-Wilkins

Christy J Vander Jagt

Timothy P Hancock

...

Keyword(s):

Complex Traits

Whole Genome Sequence

Messenger RNAs

mRNA Targets

Genome Wide

Mature MicroRNA

Target Sites

Causal Variants

Mature MicroRNAs

Animal Genomes

MicroRNAs regulate many eukaryotic biological processes in a temporally and spatially specific manner. Yet in cattle it is not fully known which microRNAs are expressed in each tissue, which genes they regulate, or which sites a given microRNA binds to within messenger RNAs. An improved annotation of tissue-specific microRNA networks may in the future assist with the identification of causal variants affecting complex traits. Here, we report findings from analysing short RNA sequence from 17 tissues from a single lactating dairy cow. Using miRDeep2, we identified 699 expressed mature microRNA sequences. Using TargetScan, known (60%) and novel (40%) microRNAs were predicted to interact with 780,481 sites in bovine messenger RNAs homologous with human. Putative interactions between microRNA families and targets were significantly enriched for interactions previously identified experimentally or computationally. Characterizing features of microRNAs and targets, we showed that (1) mature microRNAs derived from different arms of the same precursor targeted different genes in different tissues; (2) miRNA target sites preferentially occurred within gene regions marked with active histone modification; (3) variants within microRNAs and targets had lower allele frequencies than variants across the genome, as identified from 65 million whole genome sequence variants; (4) no significant correlation was found between the abundance of microRNAs and messenger RNAs differentially expressed in the same tissue; and (5) microRNAs and target sites were not significantly associated with allelic imbalance of gene targets. This study contributes to the goals of the Functional Annotation of Animal Genomes (FAANG) consortium to improve the annotation of genomes of domestic animals.

Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits

Scientific Reports

10.1038/s41598-021-86871-2

2021

Vol 11 (1)

Author(s):

Chao-Yu Guo

Reng-Hong Wang

Hsin-Chou Yang

Keyword(s):

Complex Traits

Association Studies

Association Test

Whole Genome Sequence

Environment Interaction

Genome Wide Association Studies

Whole Genome

Sequence Kernel Association Test

Gene Environment

Family Based

Since the genome-wide association study (GWAS) era, whole-genome sequencing has been increasingly used to identify associations between complex traits and rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve genetic and environmental factors as well as complex interactions between genes and the environment. In this research, we therefore propose a novel method to detect gene and gene-environment interactions in a complex family-based association study with various correlation structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered both concordant and discordant regions compared with methods that do not consider gene-by-environment interactions.
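
The kernel score statistic mentioned above can be illustrated with the basic SKAT statistic for unrelated individuals, Q = (y − μ̂)ᵀ G W Gᵀ (y − μ̂), where μ̂ comes from a covariates-only null model. This sketch is not FGE-SKAT. It ignores family correlation and gene-environment terms, and it omits the p-value (SKAT compares Q with a mixture of chi-square distributions). The Beta(1, 25) MAF weights follow the usual SKAT default. The function name, the covariate and the simulated data are illustrative assumptions.

```python
import numpy as np


def skat_q(y, G, covariates=None):
    """SKAT-style variance-component score statistic for a continuous trait.

    y: (n,) phenotype; G: (n, p) genotype dosages (0/1/2) for one region;
    covariates: optional (n,) or (n, k) array adjusted for under the null model.
    Returns Q = sum_j w_j * S_j**2 with S_j = g_j' (y - mu_hat), which equals
    (y - mu_hat)' G W G' (y - mu_hat) for W = diag(w).
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    n = len(y)
    X = np.ones((n, 1)) if covariates is None else np.column_stack([np.ones(n), covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # null model: y ~ covariates
    resid = y - X @ coef
    maf = G.mean(axis=0) / 2.0
    maf = np.minimum(maf, 1.0 - maf)
    w = (25.0 * (1.0 - maf) ** 24) ** 2  # squared Beta(MAF; 1, 25) density
    scores = G.T @ resid                  # per-variant score S_j
    return float(np.sum(w * scores ** 2))


rng = np.random.default_rng(0)
G = rng.binomial(2, 0.02, size=(500, 20))  # hypothetical rare-variant region
age = rng.normal(50, 10, size=500)          # hypothetical covariate
y = 0.02 * age + 0.8 * G[:, 0] + rng.normal(size=500)
print(skat_q(y, G, covariates=age))
```

Summing weighted squared per-variant scores lets variants with opposite effect directions add to Q rather than cancel. This is the main reason variance-component tests suit rare-variant regions.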

Genome-wide association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants

BMC Genetics

10.1186/s12863-016-0363-8

2016

Vol 17 (1)

Cited By ~ 42

Author(s):

T. Iso-Touru

G. Sahana

B. Guldbrandtsen

M. S. Lund

J. Vilkki

Keyword(s):

Association Analysis

Milk Yield

Genome Sequence

Genome Wide Association

Whole Genome Sequence

Sequence Variants

Whole Genome

Yield Traits

Genome Wide Association Analysis

Genome Wide

Whole genome sequence analysis of rice genotypes with contrasting response to salinity stress

Scientific Reports

10.1038/s41598-020-78256-8

2020

Vol 10 (1)

Author(s):

Prasanta K. Subudhi

Rama Shankar

Mukesh Jain

Keyword(s):

Salt Tolerance

DNA Polymorphisms

Whole Genome Sequence

Whole Genome

Salt Tolerant

Tolerance Mechanisms

Rice Varieties

Genome Wide

Rice Genotypes

Genome Level

Salinity is a major abiotic constraint on rice farming. Abundant natural variability for salt tolerance traits exists in rice germplasm. Because few studies have focused on genome-level variation in rice genotypes with contrasting responses to salt stress, genomic resequencing of diverse genetic materials is needed to elucidate the molecular basis of salt tolerance mechanisms. We analyzed the whole genome sequences of two salt-tolerant (Pokkali and Nona Bokra) and three salt-sensitive (Bengal, Cocodrie, and IR64) rice genotypes. A total of 413 million reads were generated, with a mean genome coverage of 93% and a mean sequencing depth of 18X. Analysis of the DNA polymorphisms revealed 2347 nonsynonymous SNPs and 51 frameshift mutations that could differentiate the salt-tolerant from the salt-sensitive genotypes. Integrating genome-wide polymorphism information with QTL mapping and expression profiling data led to the identification of 396 differentially expressed genes with large-effect variants in their coding regions. These genes are involved in multiple salt tolerance mechanisms, such as ion transport, oxidative stress tolerance, signal transduction, and transcriptional regulation. The genome-wide DNA polymorphisms and the promising candidate genes identified in this study represent a valuable resource for molecular breeding of salt-tolerant rice varieties.

Towards Response Prediction Using Integrated Genomics in Chronic Lymphocytic Leukaemia: Results on 250 First-Line FCR Treated Patients from UK Clinical Trials

Blood

10.1182/blood.v124.21.1942.1942

2014

Vol 124 (21)

pp. 1942-1942

Author(s):

Ruth M Clifford

Pauline Robbe

Susanne Weller

Adele T Timbs

Michalis Titsias

...

Keyword(s):

Outcome Measure

Lymphocytic Leukaemia

SNP Array

Whole Genome

Line Treatment

First Line

Genome Wide

Recurrent Mutations

Integrated Genomics

First Line Treatment

Background: Major progress has been made in understanding disease biology and therapeutic options for patients with chronic lymphocytic leukaemia (CLL). Recurrent mutations have been discovered using next-generation sequencing, but with the exception of TP53 disruption, their potential impact on response to treatment is unknown. To address this question, we characterised the genomic landscape of 250 CLL patients treated with first-line chemo-immunotherapy within UK clinical trials, using targeted resequencing and whole-genome SNP arrays. Methods: We studied patients from two UK-based Phase II randomised controlled trials (AdMIRe and ARCTIC) who received FCR-based treatment in a first-line setting. A TruSeq Custom Amplicon panel (TSCA, Illumina) was designed to target 10 genes recurrently mutated in CLL, based on recent publications. Average sequencing depth was 2260X. The cumulative length of targets sequenced was 7.87 kb from 330 amplicons covering 160 exons. Alignment and variant calling combined three pipelines to confidently detect SNVs, indels and low-frequency mutations. SNP array testing was performed using HumanOmni2.5-8 BeadChips (Illumina), and data were analysed using Nexus 6.1 Discovery Edition (Biodiscovery). To confirm somatic mutations, we performed targeted resequencing and genome-wide SNP arrays on germline material from selected samples (n=40). Univariate and multivariate analyses using minimal residual disease (MRD) as the outcome measure were performed for 220 of the 250 patients. Results: Pathogenic mutations were identified in 165 (66%) patients, totalling 268 mutations in 10 genes. ATM was the most frequently mutated gene, affecting 67 patients (29%), followed by SF3B1 (n=56, 24%), NOTCH1 (n=32, 14%), TP53 (n=21, 9%), BIRC3 (n=17, 7%) and XPO1 (n=14, 6%). Less frequent recurrent mutations were seen in SAMHD1 (n=8, 3%), MYD88 (n=4, 2%), MED12 (n=7, 3%) and ZFPM2 (n=5, 2%).
Integrating sequencing and array results increased the proportion of patients with one or more CLL driver mutations from 66% to 94%. As previously reported, del17p and TP53 mutations co-occurred and were associated with MRD positivity in all cases (n=15, p=0.0002). We report minor TP53 subclones in 11 patients (VAF 1-5%); in the 8 of these patients with MRD data available, the subclones were also associated with MRD positivity. Deletions of 11q were present in 44 patients. These lesions always included ATM but not always BIRC3. Biallelic disruption was present in ATM for 27 patients (significantly associated with MRD positivity) and in BIRC3 for 4 patients. Rather surprisingly, trisomy 12 (n=33) and NOTCH1 mutations (n=28) were associated with MRD negativity (p=0.006 and 0.097, respectively). Analysing clonal and subclonal mutations per gene revealed that the majority of mutations in SF3B1 and BIRC3 were subclonal (65% and 87%, respectively). In contrast, almost all SAMHD1 and MYD88 mutations were clonal. Subclonal NOTCH1 mutations were associated with MRD negativity compared with clonal NOTCH1 mutations, but this difference was not seen in the remaining mutated genes. In our copy number data, the presence of subclones was associated with MRD positivity (p=0.05). When important lesions were combined in a multiple logistic regression analysis to predict MRD positivity, biallelic ATM disruption and TP53 disruption were the strongest predictors, followed by SAMHD1, whereas monoallelic BIRC3 mutations were a medium-strength predictor of MRD negativity. Conclusion: This is the first integrated genome-wide analysis of the distribution and associations of CLL drivers using targeted deep resequencing and whole-genome SNP arrays in an FCR-based first-line treatment setting. We have shown subclonal and clonal mutation profiles in all patients. For patients with two or more CLL-associated mutations, we have begun to unravel clonal hierarchies.
We have developed a comprehensive model using MRD as an outcome measure and found that biallelic ATM mutations and SAMHD1 disruption strongly predict MRD positivity. Using MRD status as a robust proxy for progression-free survival (PFS) not only enables us to confirm results of previous studies but also considerably shortens the time needed to obtain results. Indeed, we suggest that MRD status should be assessed routinely in future studies to complement modern integrated genomics approaches. Disclosures: Hillmen: Pharmacyclics, Janssen, Gilead, Roche: Honoraria, Research Funding.