Quantification of GC-biased gene conversion in the human genome

2014 ◽  
Author(s):  
Sylvain Glemin ◽  
Peter F Arndt ◽  
Philipp W Messer ◽  
Dmitri Petrov ◽  
Nicolas Galtier ◽  
...  

Many lines of evidence indicate that GC-biased gene conversion (gBGC) has a major impact on the evolution of mammalian genomes. However, up to now, this process had not been properly quantified. In principle, the strength of gBGC can be measured from the analysis of derived allele frequency (DAF) spectra. However, this approach is sensitive to a number of confounding factors. In particular, we show by simulations that the inference is pervasively affected by polymorphism polarization errors, especially at hypermutable sites, and by spatial heterogeneity in gBGC strength. Here we propose a new method to quantify gBGC from DAF spectra, incorporating polarization errors and taking spatial heterogeneity into account. This method is very general in that it does not require any prior knowledge about the source of polarization errors and also provides information about mutation patterns. We apply this approach to human polymorphism data from the 1000 Genomes Project. We show that the strength of gBGC does not differ between hypermutable CpG sites and non-CpG sites, suggesting that in humans gBGC is not caused by the base-excision repair machinery. We further find that the impact of gBGC is concentrated primarily within recombination hotspots: genome-wide, the strength of gBGC is in the nearly neutral area, but 2% of the human genome is subject to strong gBGC, with population-scaled gBGC coefficients above 5. Given that the location of recombination hotspots evolves very rapidly, our analysis predicts that in the long term, a large fraction of the genome is affected by short episodes of strong gBGC.
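To make the role of polarization errors concrete, here is a minimal Python sketch. It is not the authors' method. It assumes the textbook diffusion result for a directional bias with population-scaled coefficient B, in which the density of derived alleles at frequency x is proportional to (1 - e^{-B(1-x)}) / ((1 - e^{-B}) x (1-x)), and it assumes the simplest error model, in which a mispolarized SNP with j derived copies out of n is recorded as the opposite mutation type with n - j copies. The function names, the `theta` scaling, and the error rates are hypothetical.

```python
import math

def expected_daf_spectrum(n, B, theta=1.0, steps=2000):
    """Expected number of SNPs with i = 1..n-1 derived copies in a sample of n.

    B is the population-scaled bias in favour of the derived allele
    (B > 0 for AT->GC mutations under gBGC, B < 0 for GC->AT mutations).
    """
    spectrum = []
    for i in range(1, n):
        coeff = math.comb(n, i)
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) / steps  # midpoint rule, never evaluates x = 0 or x = 1
            if abs(B) < 1e-12:
                bias = 1.0 - x  # neutral limit of the bias term
            else:
                bias = math.expm1(-B * (1.0 - x)) / math.expm1(-B)
            # binomial sampling of i derived copies times the 1 / (x (1 - x)) density
            total += coeff * x ** (i - 1) * (1.0 - x) ** (n - i - 1) * bias
        spectrum.append(theta * total / steps)
    return spectrum

def observed_ws_spectrum(ws, sw, err_ws, err_sw):
    """Observed AT->GC spectrum when a fraction of SNPs is mispolarized.

    A mispolarized GC->AT SNP with j derived copies is counted as an
    AT->GC SNP with n - j copies, hence the reversed GC->AT spectrum.
    """
    flipped_sw = sw[::-1]
    return [(1.0 - err_ws) * a + err_sw * b for a, b in zip(ws, flipped_sw)]

if __name__ == "__main__":
    ws = expected_daf_spectrum(10, 5.0)    # favoured AT->GC mutations
    sw = expected_daf_spectrum(10, -5.0)   # disfavoured GC->AT mutations
    print(observed_ws_spectrum(ws, sw, 0.05, 0.05))
```

Under these assumptions, even a small error rate inflates the high-frequency classes of the AT->GC spectrum, because the abundant low-frequency GC->AT variants are flipped into them. This mimics a stronger bias than is actually present, which is why the error rate has to be modelled alongside B.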

2011 ◽  
Vol 29 (3) ◽  
pp. 1047-1057 ◽  
Author(s):  
D. Kostka ◽  
M. J. Hubisz ◽  
A. Siepel ◽  
K. S. Pollard

2018 ◽  
Author(s):  
Toni I. Gossmann ◽  
Mathias Bockwoldt ◽  
Lilith Diringer ◽  
Friedrich Schwarz ◽  
Vic-Fabienne Schumann

It is well established that GC content varies across the genome in many species and that GC biased gene conversion, one form of meiotic recombination, is likely to contribute to this heterogeneity. Bird genomes provide an extraordinary system to study the impact of GC biased gene conversion owing to their specific genomic features. They are characterised by a high karyotype conservation with substantial heterogeneity in chromosome sizes, with up to a dozen large macrochromosomes and many smaller microchromosomes common across all bird species. This heterogeneity in chromosome morphology is also reflected by other genomic features, such as smaller chromosomes being gene denser, more compact and more GC rich relative to their macrochromosomal counterparts, illustrating that the intensity of GC biased gene conversion varies across the genome. Here we study whether it is possible to infer heterogeneity in GC biased gene conversion rates across the genome using a recently published method that accounts for GC biased gene conversion when estimating branch lengths in a phylogenetic context. To infer the strength of GC biased gene conversion we contrast branch length estimates across the genome both taking and not taking non-stationary GC composition into account. Using simulations we show that this approach works well when GC fixation bias is strong and note that the number of substitutions along a branch is consistently overestimated when GC biased gene conversion is not accounted for. We use this predictable feature to infer the strength of GC dynamics across the great tit genome by applying our new test statistic to data at 4-fold degenerate sites from three bird species (great tit, zebra finch and chicken) that are among the best annotated bird genomes to date. We show that using a simple one-dimensional binning we fail to capture a signal of fixation bias as observed in our simulations. However, using a multidimensional binning strategy, we find evidence for heterogeneity in the strength of fixation bias, including AT fixation bias. This highlights the difficulties when combining sequence data across different regions in the genome.


2018 ◽  
Author(s):  
David Castellano ◽  
Adam Eyre-Walker ◽  
Kasper Munch

DNA diversity varies across the genome of many species. Variation in diversity across a genome might arise for one of three reasons: regional variation in the mutation rate, selection and biased gene conversion. We show that both non-coding and non-synonymous diversity are correlated with a measure of the mutation rate, the recombination rate and the density of conserved sequences in 50 kb windows across the genomes of humans and non-human homininae. We show these patterns persist even when we restrict our analysis to GC-conservative mutations, demonstrating that the patterns are not driven by biased gene conversion. The positive correlation between diversity and our measure of the mutation rate seems to be largely a direct consequence of regions with higher mutation rates having more diversity. However, the positive correlation with recombination rate and the negative correlation with the density of conserved sequences suggest that selection at linked sites affects levels of diversity. This is supported by the observation that the ratio of the number of non-synonymous to non-coding polymorphisms is negatively correlated with a measure of the effective population size across the genome. Furthermore, we find evidence that these genomic variables are better predictors of non-coding diversity in large homininae populations than in small populations, after accounting for statistical power. This is consistent with genetic drift decreasing the impact of selection at linked sites in small populations. In conclusion, our comparative analyses describe for the first time how recombination rate, gene density, mutation rate and genetic drift interact to produce the patterns of DNA diversity that we observe along and between homininae genomes.


2016 ◽  
Vol 113 (38) ◽  
pp. 10607-10612 ◽  
Author(s):  
Priya Moorjani ◽  
Carlos Eduardo G. Amorim ◽  
Peter F. Arndt ◽  
Molly Przeworski

Events in primate evolution are often dated by assuming a constant rate of substitution per unit time, but the validity of this assumption remains unclear. Among mammals, it is well known that there exists substantial variation in yearly substitution rates. Such variation is to be expected from differences in life history traits, suggesting it should also be found among primates. Motivated by these considerations, we analyze whole genomes from 10 primate species, including Old World Monkeys (OWMs), New World Monkeys (NWMs), and apes, focusing on putatively neutral autosomal sites and controlling for possible effects of biased gene conversion and methylation at CpG sites. We find that substitution rates are up to 64% higher in lineages leading from the hominoid–NWM ancestor to NWMs than to apes. Within apes, rates are ∼2% higher in chimpanzees and ∼7% higher in the gorilla than in humans. Substitution types subject to biased gene conversion show no more variation among species than those not subject to it. Not all mutation types behave similarly, however; in particular, transitions at CpG sites exhibit a more clocklike behavior than do other types, presumably because of their nonreplicative origin. Thus, not only the total rate, but also the mutational spectrum, varies among primates. This finding suggests that events in primate evolution are most reliably dated using CpG transitions. Taking this approach, we estimate the human and chimpanzee divergence time is 12.1 million years, and the human and gorilla divergence time is 15.1 million years.


2015 ◽  
Vol 5 (3) ◽  
pp. 441-447 ◽  
Author(s):  
Carina F Mugal ◽  
Peter F Arndt ◽  
Lena Holm ◽  
Hans Ellegren

The genomes of many vertebrates show a characteristic variation in GC content. To explain its origin and evolution, three main mechanisms have been proposed: selection for GC content, mutation bias, and GC-biased gene conversion. At present, the mechanism of GC-biased gene conversion, i.e., short-scale, unidirectional exchanges between homologous chromosomes in the neighborhood of recombination-initiating double-strand breaks in favor of GC nucleotides, is the most widely accepted hypothesis. We here suggest that DNA methylation also plays an important role in the evolution of GC content in vertebrate genomes. To test this hypothesis, we investigated one mammalian (human) and one avian (chicken) genome. We used bisulfite sequencing to generate a whole-genome methylation map of chicken sperm and made use of a publicly available whole-genome methylation map of human sperm. Inclusion of these methylation maps into a model of GC content evolution provided significant support for the impact of DNA methylation on the local equilibrium GC content. Moreover, two different estimates of equilibrium GC content, one that neglects and one that incorporates the impact of DNA methylation and the concomitant CpG hypermutability, give estimates that differ by approximately 15% in both genomes, arguing for a strong impact of DNA methylation on the evolution of GC content. Thus, our results put forward that previous estimates of equilibrium GC content, which neglect the hypermutability of CpG dinucleotides, need to be reevaluated.
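For readers unfamiliar with the term, "equilibrium GC content" can be illustrated with the simplest two-state model, sketched below in Python. This is not the model used in the study. It assumes every site is either AT or GC, with a per-generation AT->GC rate `u` and GC->AT rate `v` (both hypothetical parameters), so the stationary GC fraction is u / (u + v). It ignores neighbor-dependent effects such as CpG hypermutability, which is exactly the simplification the abstract argues against.

```python
def equilibrium_gc(u, v):
    """Stationary GC fraction of a two-state model (AT->GC rate u, GC->AT rate v)."""
    return u / (u + v)

def evolve_gc(gc0, u, v, generations):
    """Iterate the expected GC fraction forward from a starting value gc0."""
    gc = gc0
    for _ in range(generations):
        # AT sites gain GC at rate u, GC sites are lost at rate v
        gc = gc + u * (1.0 - gc) - v * gc
    return gc

if __name__ == "__main__":
    u, v = 0.001, 0.002  # illustrative rates, not estimates from any genome
    print(equilibrium_gc(u, v))
    print(evolve_gc(0.5, u, v, 20000))
```

With these illustrative rates, the iterated value converges to the closed-form equilibrium regardless of the starting GC fraction. Anything that shifts the effective GC->AT rate therefore shifts the predicted equilibrium.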


2015 ◽  
Vol 112 (7) ◽  
pp. 2109-2114 ◽  
Author(s):  
Barbara Arbeithuber ◽  
Andrea J. Betancourt ◽  
Thomas Ebner ◽  
Irene Tiemann-Boege

Meiosis is a potentially important source of germline mutations, as sites of meiotic recombination experience recurrent double-strand breaks (DSBs). However, evidence for a local mutagenic effect of recombination from population sequence data has been equivocal, likely because mutation is only one of several forces shaping sequence variation. By sequencing large numbers of single crossover molecules obtained from human sperm for two recombination hotspots, we find direct evidence that recombination is mutagenic: Crossovers carry more de novo mutations than nonrecombinant DNA molecules analyzed for the same donors and hotspots. The observed mutations were primarily CG to TA transitions, with a higher frequency of transitions at CpG than non-CpG sites. This enrichment of mutations at CpG sites at hotspots could predominate in methylated regions involving frequent single-stranded DNA processing as part of DSB repair. In addition, our data set provides evidence that GC alleles are preferentially transmitted during crossing over, opposing mutation, and shows that GC-biased gene conversion (gBGC) predominates over mutation in the sequence evolution of hotspots. These findings are consistent with the idea that gBGC could be an adaptation to counteract the mutational load of recombination.


2016 ◽  
Author(s):  
Priya Moorjani ◽  
Carlos Eduardo G. Amorim ◽  
Peter F. Arndt ◽  
Molly Przeworski

Events in primate evolution are often dated by assuming a "molecular clock", i.e., a constant rate of substitution per unit time, but the validity of this assumption remains unclear. Among mammals, it is well known that there exists substantial variation in yearly substitution rates. Such variation is to be expected from differences in life-history traits, suggesting that it should also be found among primates. Motivated by these considerations, we analyze whole genomes from ten primate species, including Old World Monkeys (OWMs), New World Monkeys (NWMs) and apes, focusing on putatively neutral autosomal sites and controlling for possible effects of biased gene conversion and methylation at CpG sites. We find that substitution rates are ~65% higher in lineages leading from the hominoid-NWM ancestor to NWMs than to apes. Within apes, rates are ~2% higher in chimpanzees and ~7% higher in the gorilla than in humans. Substitution types subject to biased gene conversion show no more variation among species than those not subject to it. Not all mutation types behave similarly, however: in particular, transitions at CpG sites exhibit a more clock-like behavior than do other types, presumably due to their non-replicative origin. Thus, not only the total rate, but also the mutational spectrum varies among primates. This finding suggests that events in primate evolution are most reliably dated using CpG transitions. Taking this approach, we estimate that the average time to the most recent common ancestor of human and chimpanzee is 12.1 million years and their split time 7.9 million years.
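The strict molecular-clock arithmetic that the abstract questions can be sketched as follows. This is an illustrative Python example, not the authors' pipeline. It assumes that pairwise divergence `d` (substitutions per site) accumulates along both lineages at one constant yearly rate, so that T = d / (2 * rate). The function names and all numbers are hypothetical.

```python
def divergence_time(d, rate_per_year):
    """Time since two lineages split under a strict clock: d = 2 * rate * T."""
    return d / (2.0 * rate_per_year)

def relative_rate_excess(branch_a, branch_b):
    """Fractional excess of substitutions on branch A over branch B.

    Both branch lengths are measured from the same ancestor, so the elapsed
    time is identical and any excess reflects a difference in yearly rate.
    """
    return branch_a / branch_b - 1.0

if __name__ == "__main__":
    # Illustrative values only: 1% divergence, 1e-9 substitutions per site per year
    print(divergence_time(0.01, 1e-9))
    # Two branches from a shared ancestor with unequal lengths
    print(relative_rate_excess(0.064, 0.040))
```

If the second function returns a value far from zero for two sister lineages, the single-rate assumption behind the first function does not hold for that pair.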

