Variance in Estimated Pairwise Genetic Distance Under High versus Low Coverage Sequencing: the Contribution of Linkage Disequilibrium

2017 ◽  
Author(s):  
Max Shpak ◽  
Yang Ni ◽  
Jie Lu ◽  
Peter Müller

Abstract The mean pairwise genetic distance among haplotypes is an estimator of the population mutation rate θ and a standard measure of variation in a population. With the advent of next-generation sequencing (NGS) methods, this and other population parameters can be estimated under different modes of sampling. One approach is to sequence individual genomes with high coverage, and to calculate genetic distance over all sample pairs. The second approach, typically used for microbial samples or for tumor cells, is sequencing a large number of pooled genomes with very low individual coverage. With low coverage, pairwise genetic distances are calculated across independently sampled sites rather than across individual genomes. In this study, we show that the variance in genetic distance estimates is reduced with low coverage sampling if the mean pairwise linkage disequilibrium weighted by allele frequencies is positive. Practically, this means that if on average the most frequent alleles over pairs of loci are in positive linkage disequilibrium, low coverage sequencing results in improved estimates of θ, assuming similar per-site read depths. We show that this result holds under the expected distribution of allele frequencies and linkage disequilibria for an infinite sites model at mutation-drift equilibrium. From simulations, we find that the conditions for reduced variance only fail to hold in cases where variant alleles are few and at very low frequency. These results are applied to haplotype frequencies from a lung cancer tumor to compute the weighted linkage disequilibria and the expected error in estimated genetic distance using high versus low coverage.
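As a rough illustration of the statistic this abstract refers to, the sketch below computes the mean pairwise distance two ways: over all pairs of haplotypes, and site by site from allele frequencies. The function names and the toy sample are hypothetical and are not taken from the paper. The sketch works on one fully observed sample, so the two values agree exactly; it does not simulate low-coverage sampling, where each site would be drawn from an independent set of reads.

```python
from collections import Counter
from itertools import combinations


def mean_pairwise_distance(haplotypes):
    """Average number of differing sites over all unordered haplotype pairs."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def sitewise_distance(haplotypes):
    """Same quantity, accumulated one site at a time from allele frequencies."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    total = 0.0
    for column in zip(*haplotypes):
        counts = Counter(column)
        # Per-site heterozygosity with the n / (n - 1) small-sample correction.
        het = 1.0 - sum((c / n) ** 2 for c in counts.values())
        total += het * n / (n - 1)
    return total


# Toy sample (hypothetical data): four haplotypes at five biallelic sites.
sample = ["00000", "00110", "10110", "00001"]
pi_pairs = mean_pairwise_distance(sample)
pi_sites = sitewise_distance(sample)
```

For this toy sample both functions return 14/6, about 2.33 differences per pair.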

Genetics ◽  
2002 ◽  
Vol 161 (3) ◽  
pp. 1269-1278 ◽  
Author(s):  
Bernhard Haubold ◽  
Jürgen Kroymann ◽  
Andreas Ratzka ◽  
Thomas Mitchell-Olds ◽  
Thomas Wiehe

Abstract Arabidopsis thaliana is a highly selfing plant that nevertheless appears to undergo substantial recombination. To reconcile its selfing habit with the observations of recombination, we have sampled the genetic diversity of A. thaliana at 14 loci of ~500 bp each, spread across 170 kb of genomic sequence centered on a QTL for resistance to herbivory. A total of 170 of the 6321 nucleotides surveyed were polymorphic, with 169 being biallelic. The mean silent genetic diversity (πs) varied between 0.001 and 0.03. Pairwise linkage disequilibria between the polymorphisms were negatively correlated with distance, although this effect vanished when only pairs of polymorphisms with four haplotypes were included in the analysis. The absence of a consistent negative correlation between distance and linkage disequilibrium indicated that gene conversion might have played an important role in distributing genetic diversity throughout the region. We tested this by coalescent simulations and estimate that up to 90% of recombination is due to gene conversion.
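The restriction to "pairs of polymorphisms with four haplotypes" corresponds to what is commonly called the four-gamete condition. A minimal sketch of that check follows, assuming biallelic sites encoded as equal-length strings with one character per sampled chromosome; the function name and the encoding are illustrative choices, not the authors' code.

```python
def has_four_gametes(site_a, site_b):
    """Return True if all four two-site haplotypes occur in the sample.

    site_a and site_b hold the allele carried by each sampled chromosome
    at two biallelic sites, in the same chromosome order.
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same chromosomes")
    return len(set(zip(site_a, site_b))) == 4


# Hypothetical data: four chromosomes scored at two sites.
all_four = has_four_gametes("0011", "0101")   # haplotypes 00, 01, 10, 11
only_two = has_four_gametes("0011", "0011")   # haplotypes 00 and 11 only
```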


2018 ◽  
Author(s):  
Susanne Tilk ◽  
Alan Bergland ◽  
Aaron Goodman ◽  
Paul Schmidt ◽  
Dmitri Petrov ◽  
...  

Abstract Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally-pooled and simulated samples of Drosophila melanogaster, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby, increased power to detect adaptive alleles.


Genetics ◽  
1991 ◽  
Vol 129 (3) ◽  
pp. 925-930 ◽  
Author(s):  
W P Robinson ◽  
M A Asmussen ◽  
G Thomson

Abstract Combinations of allele frequencies and pairwise linkage disequilibrium terms, each of which is permissible at the two-locus level, may not always be permissible at the three-locus level. These additional constraints on the possible maximum and minimum values for the pairwise disequilibrium terms are formally determined and numerically analyzed. In some cases, the three-locus constraints on a pairwise disequilibrium (D) may be equivalent to the usual two-locus constraints, while in others, the positive or negative range may be restricted. This can result in situations where the allowable values of D are limited to only positive or only negative values up to the extreme case where there is only a single admissible value. No additional restrictions are placed on pairwise disequilibrium values when four loci are considered, other than those imposed by the three-way combinations containing the two loci of interest. A new measure of normalized pairwise linkage disequilibrium, allowing for the three-locus constraints, is defined and illustrated by an application to data from the human histocompatibility antigen (HLA) system. An analogous normalized three-way disequilibrium measure is also formulated.
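For reference, the "usual two-locus constraints" mentioned here follow from requiring all four haplotype frequencies to be non-negative. The sketch below computes only those two-locus bounds on D = p_AB - p_A p_B. It does not implement the three-locus constraints or the new normalized measure derived in the paper, and the function name is hypothetical.

```python
def two_locus_d_bounds(p_a, p_b):
    """Admissible range of D given allele frequencies p_a and p_b.

    The bounds come from requiring the four haplotype frequencies
    (AB, Ab, aB, ab) to stay non-negative.
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    d_min = max(-p_a * p_b, -(1 - p_a) * (1 - p_b))
    d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return d_min, d_max


# Hypothetical allele frequencies at two loci.
d_min, d_max = two_locus_d_bounds(0.3, 0.6)
```

With p_a = 0.3 and p_b = 0.6 the range is approximately -0.18 to 0.12, which matches the haplotype frequency p_AB being confined to the interval from 0 to 0.3.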


Author(s):  
Bruce Walsh ◽  
Michael Lynch

Selection changes the additive-genetic variance (and hence the response in the mean) both by changing allele frequencies and by generating correlations among alleles at different loci (linkage disequilibrium). Such selection-induced correlations can be generated even between unlinked loci, and (generally) are negative, such that alleles increasing trait values tend to become increasingly negatively correlated under directional or stabilizing selection, and positively correlated under disruptive selection. Such changes in the additive-genetic variance from disequilibrium are called the Bulmer effect. For a large number of loci, the amount of change can be predicted from the Bulmer equation, the analog of the breeder's equation, but now for the change in the variance. Upon cessation of selection, any disequilibrium decays away, and the variances revert to their additive-genic values (the additive variance in the absence of disequilibrium). Assortative mating also generates such disequilibrium.


2014 ◽  
Author(s):  
Lin Huang ◽  
Bo Wang ◽  
Ruitang Chen ◽  
Sivan Bercovici ◽  
Serafim Batzoglou

Population low-coverage whole-genome sequencing is rapidly emerging as a prominent approach for discovering genomic variation and genotyping a cohort. This approach combines substantially lower cost than full-coverage sequencing with whole-genome discovery of low-allele-frequency variants, to an extent that is not possible with array genotyping or exome sequencing. However, a challenging computational problem arises when attempting to discover variants and genotype the entire cohort. Variant discovery and genotyping are relatively straightforward on a single individual that has been sequenced at high coverage, because the inference decomposes into the independent genotyping of each genomic position for which a sufficient number of confidently mapped reads are available. However, in cases where low-coverage population data are given, the joint inference requires leveraging the complex linkage disequilibrium patterns in the cohort to compensate for sparse and missing data in each individual. The potentially massive computation time for such inference, as well as the missing data that confound low-frequency allele discovery, need to be overcome for this approach to become practical. Here, we present Reveel, a novel method for single nucleotide variant calling and genotyping of large cohorts that have been sequenced at low coverage. Reveel introduces a novel technique for leveraging linkage disequilibrium that deviates from previous Markov-based models. We evaluate Reveel's performance through extensive simulations as well as real data from the 1000 Genomes Project, and show that it achieves higher accuracy in low-frequency allele discovery and substantially lower computation cost than previous state-of-the-art methods.


2017 ◽  
Author(s):  
Timothy P. Bilton ◽  
John C. McEwan ◽  
Shannon M. Clarke ◽  
Rudiger Brauning ◽  
Tracey C. van Stijn ◽  
...  

Abstract High-throughput sequencing methods that multiplex a large number of individuals have provided a cost-effective approach for discovering genome-wide genetic variation in large populations. These sequencing methods are increasingly being utilized in population genetic studies across a diverse range of species. One side-effect of these methods, however, is that one or more alleles at a particular locus may not be sequenced, particularly when the sequencing depth is low, resulting in some heterozygous genotypes being called as homozygous. Under-called heterozygous genotypes have a profound effect on the estimation of linkage disequilibrium and, if not taken into account, lead to inaccurate estimates. We developed a new likelihood method, GUS-LD, to estimate pairwise linkage disequilibrium using low coverage sequencing data that accounts for under-called heterozygous genotypes. Our findings show that accurate estimates were obtained using GUS-LD on low coverage sequencing data, whereas underestimation of linkage disequilibrium results if no adjustment is made for under-called heterozygotes.
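To give a sense of how often heterozygotes are under-called at low depth, the sketch below uses a simple textbook model. It assumes each read samples one of the two alleles of a diploid heterozygote with probability 1/2, independently, with no sequencing error. This is an illustrative assumption and is not the GUS-LD likelihood itself; the function name is hypothetical.

```python
def prob_het_called_hom(depth):
    """Probability that all reads at a heterozygous site show the same allele.

    Assumes a diploid heterozygote, independent reads, equal sampling of the
    two alleles, and no sequencing error.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1 read")
    # Either all reads carry allele A or all carry allele B: 2 * (1/2)**depth.
    return 0.5 ** (depth - 1)


# Under-call probability at a few read depths.
undercall = {k: prob_het_called_hom(k) for k in (1, 2, 5, 10)}
```

Under these assumptions a heterozygote is always miscalled at depth 1, half the time at depth 2, and about 6% of the time at depth 5.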


Genetics ◽  
1985 ◽  
Vol 109 (3) ◽  
pp. 611-631
Author(s):  
Richard R Hudson

ABSTRACT The sampling distributions of several statistics that measure the association of alleles on gametes (linkage disequilibrium) are estimated under a two-locus neutral infinite allele model using an efficient Monte Carlo method. An often used approximation for the mean squared linkage disequilibrium is shown to be inaccurate unless the proper statistical conditioning is used. The joint distribution of linkage disequilibrium and the allele frequencies in the sample is studied. This estimated joint distribution is sufficient for obtaining an approximate maximum likelihood estimate of C = 4Nc, where N is the population size and c is the recombination rate. It has been suggested that observations of high linkage disequilibrium might be a good basis for rejecting a neutral model in favor of a model in which natural selection maintains genetic variation. It is found that a single sample of chromosomes, examined at two loci cannot provide sufficient information for such a test if C < 10, because with C this small, very high levels of linkage disequilibrium are not unexpected under the neutral model. In samples of size 50, it is found that, even when C is as large as 50, the distribution of linkage disequilibrium conditional on the allele frequencies is substantially different from the distribution when there is no linkage between the loci. When conditioned on the number of alleles at each locus in the sample, all of the sample statistics examined are nearly independent of λ = 4Nμ, where μ is the neutral mutation rate.


Genetics ◽  
1999 ◽  
Vol 151 (3) ◽  
pp. 1053-1063 ◽  
Author(s):  
Ilik J Saccheri ◽  
Ian J Wilson ◽  
Richard A Nichols ◽  
Michael W Bruford ◽  
Paul M Brakefield

Abstract Polymorphic enzyme and minisatellite loci were used to estimate the degree of inbreeding in experimentally bottlenecked populations of the butterfly, Bicyclus anynana (Satyridae), three generations after founding events of 2, 6, 20, or 300 individuals, each bottleneck size being replicated at least four times. Heterozygosity fell more than expected, though not significantly so, but this traditional measure of the degree of inbreeding did not make full use of the information from genetic markers. It proved more informative to estimate directly the probability distribution of a measure of inbreeding, σ², the variance in the number of descendants left per gene. In all bottlenecked lines, σ² was significantly larger than in control lines (300 founders). We demonstrate that this excess inbreeding was brought about both by an increase in the variance of reproductive success of individuals and by another process. We argue that in bottlenecked lines linkage disequilibrium generated by the small number of haplotypes passing through the bottleneck resulted in hitchhiking of particular marker alleles with those haplotypes favored by selection. In control lines, linkage disequilibrium was minimal. Our result, indicating more inbreeding than expected from demographic parameters, contrasts with the findings of previous (Drosophila) experiments in which the decline in observed heterozygosity was slower than expected and attributed to associative overdominance. The different outcomes may both be explained as a consequence of linkage disequilibrium under different regimes of inbreeding. The likelihood-based method to estimate inbreeding should be of wide applicability. It was, for example, able to resolve small differences in σ² among replicate lines within bottleneck-size treatments, which could be related to the observed variation in reproductive viability.


2021 ◽  
Vol 25 ◽  
pp. 233121652110101
Author(s):  
Dmitry I. Nechaev ◽  
Olga N. Milekhina ◽  
Marina S. Tomozova ◽  
Alexander Y. Supin

The goal of the study was to investigate the role of combination products in the higher ripple-density resolution estimates obtained by discrimination between a spectrally rippled and a nonrippled noise signal than that obtained by discrimination between two rippled signals. To attain this goal, a noise band was used to mask the frequency band of expected low-frequency combination products. A three-alternative forced-choice procedure with adaptive ripple-density variation was used. The mean background (unmasked) ripple-density resolution was 9.8 ripples/oct for rippled reference signals and 21.8 ripples/oct for nonrippled reference signals. Low-frequency maskers reduced the ripple-density resolution. For masker levels from −10 to 10 dB re. signal, the ripple-density resolution for nonrippled reference signals was approximately twice as high as that for rippled reference signals. At a masker level as high as 20 dB re. signal, the ripple-density resolution decreased in both discrimination tasks. This result leads to the conclusion that low-frequency combination products are not responsible for the task-dependent difference in ripple-density resolution estimates.

