The nonlinear structure of linkage disequilibrium

Mapping Intimacies ◽

10.1101/566208 ◽

2019 ◽

Author(s):

Reginald D. Smith

Keyword(s):

Linkage Disequilibrium ◽

Frequency Dependence ◽

Allele Frequency ◽

Allele Frequencies ◽

Nonlinear Dependence ◽

Mathematical Framework ◽

Linkage Disequilibrium Structure ◽

Nonlinear Structure ◽

Linear And Nonlinear ◽

Fundamental Shift

Abstract The allele frequency dependence of the ranges of all measures of linkage disequilibrium is well-known. The maximum values of commonly used parameters such as r² and D vary depending on the allele frequencies at each locus. However, though this phenomenon is recognized and accounted for in many studies, the comprehensive mathematical framework underlying the limits of linkage disequilibrium measures at various frequency combinations is often heuristic or empirical. Here, it is demonstrated that underlying this behavior is the fundamental shift between linear and nonlinear dependence in the linkage disequilibrium structure between loci. The proportion of linear and nonlinear dependence can be estimated and it demonstrates how even the same values of r² can have different implications for the nature of the overall dependence. One result of this is the value of D′, when defined as only a positive number, has a minimum value of |r|. Understanding this dependence is crucial to making correct inferences about the relationships between two loci in linkage disequilibrium.
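The frequency dependence described in this abstract can be illustrated with the standard textbook definitions of D, r, and D′ for two biallelic loci. The sketch below is only an illustration under those definitions, not the paper's framework; the function name `ld_measures` and the example frequencies are hypothetical.

```python
from math import sqrt

def ld_measures(p_ab, p_a, p_b):
    """Return (D, r, D') for two biallelic loci.

    p_ab: frequency of the haplotype carrying alleles A and B
    p_a, p_b: frequencies of allele A and allele B (strictly between 0 and 1)
    Standard textbook definitions are assumed here.
    """
    d = p_ab - p_a * p_b
    r = d / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    # The largest attainable |D| depends on the sign of D and on the
    # allele frequencies at both loci.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return d, r, d_prime

# Hypothetical example: every B haplotype also carries A.
d, r, d_prime = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(f"D={d:.3f}  r^2={r * r:.3f}  D'={d_prime:.3f}")
```

With these hypothetical frequencies D′ equals 1 while r² is only about 0.11, and D′ is never smaller than |r|, consistent with the bound stated in the abstract.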


Exact Distribution of Linkage Disequilibrium in the Presence of Mutation, Selection or Minor Allele Frequency Filtering

10.1101/794347 ◽

2019 ◽

Author(s):

Jiayi Qu ◽

Stephen D Kachman ◽

Dorian Garrick ◽

Rohan L Fernando ◽

Hao Cheng

Keyword(s):

Linkage Disequilibrium ◽

Allele Frequency ◽

Minor Allele Frequency ◽

Finite Population ◽

Exact Distribution ◽

Minor Allele ◽

Allele Frequencies ◽

Frequency Filtering ◽

Genetics And Genomics ◽

Deterministic Formula

Abstract Linkage disequilibrium (LD), often expressed in terms of the squared correlation (r²) between allelic values at two loci, is an important concept in many branches of genetics and genomics. Genetic drift and recombination have opposite effects on LD, and thus r² will keep changing until the effects of these two forces are counterbalanced. Several approximations have been used to determine the expected value of r² at equilibrium in the presence or absence of mutation. In this paper, we propose a probability-based approach to compute the exact distribution of allele frequencies at two loci in a finite population at any generation t conditional on the distribution at generation t − 1. As r² is a function of this distribution of allele frequencies, this approach can be used to examine the distribution of r² over generations as it approaches equilibrium. The exact distribution of LD from our method is used to describe, quantify and compare LD at different equilibria, including equilibrium in the absence or presence of mutation, selection, and filtering by minor allele frequency. We also propose a deterministic formula for expected LD in the presence of mutation at equilibrium based on the exact distribution of LD.
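The idea of propagating an exact distribution from generation t − 1 to generation t can be sketched with a textbook two-locus Wright-Fisher model (recombination followed by multinomial sampling of gametes). This is an assumption made for illustration, not necessarily the authors' formulation: mutation, selection, and frequency filtering are omitted, and all function names and parameter values are hypothetical.

```python
from math import factorial

def states(n):
    """All haplotype count vectors (AB, Ab, aB, ab) that sum to n."""
    return [(i, j, k, n - i - j - k)
            for i in range(n + 1)
            for j in range(n + 1 - i)
            for k in range(n + 1 - i - j)]

def gamete_freqs(counts, c):
    """Haplotype frequencies after one round of recombination at rate c."""
    n = sum(counts)
    x1, x2, x3, x4 = (m / n for m in counts)
    d = x1 * x4 - x2 * x3
    return (x1 - c * d, x2 + c * d, x3 + c * d, x4 - c * d)

def multinomial_pmf(counts, probs):
    """Probability of drawing `counts` from a multinomial with `probs`."""
    coef = factorial(sum(counts))
    for m in counts:
        coef //= factorial(m)
    p = float(coef)
    for m, q in zip(counts, probs):
        p *= q ** m
    return p

def step(dist, n, c):
    """Exact distribution at generation t given the one at generation t - 1."""
    new = {s: 0.0 for s in states(n)}
    for s, w in dist.items():
        if w == 0.0:
            continue
        g = gamete_freqs(s, c)
        for t in new:
            new[t] += w * multinomial_pmf(t, g)
    return new

def r_squared(counts):
    """r^2 for one state, or None if either locus is fixed."""
    n = sum(counts)
    x1, x2, x3, x4 = (m / n for m in counts)
    p_a, p_b = x1 + x2, x1 + x3
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = x1 * x4 - x2 * x3
    return d * d / denom

def mean_r_squared(dist):
    """Mean r^2 conditional on both loci still segregating."""
    num = den = 0.0
    for s, w in dist.items():
        r2 = r_squared(s)
        if r2 is not None:
            num += w * r2
            den += w
    return num / den if den > 0 else None

# Hypothetical run: 4 haplotypes, complete LD at the start, c = 0.1.
dist = {(2, 0, 0, 2): 1.0}
for generation in range(1, 4):
    dist = step(dist, 4, 0.1)
    print(generation, mean_r_squared(dist))
```

The state space grows quickly with the number of haplotypes, so this brute-force enumeration is only practical for very small populations.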


Inbreeding of Bottlenecked Butterfly Populations: Estimation Using the Likelihood of Changes in Marker Allele Frequencies

Genetics ◽

10.1093/genetics/151.3.1053 ◽

1999 ◽

Vol 151 (3) ◽

pp. 1053-1063 ◽

Cited By ~ 3

Author(s):

Ilik J Saccheri ◽

Ian J Wilson ◽

Richard A Nichols ◽

Michael W Bruford ◽

Paul M Brakefield

Keyword(s):

Linkage Disequilibrium ◽

Probability Distribution ◽

Reproductive Success ◽

Genetic Markers ◽

Allele Frequencies ◽

Demographic Parameters ◽

Bicyclus Anynana ◽

Marker Allele ◽

Wide Applicability ◽

Per Gene

Abstract Polymorphic enzyme and minisatellite loci were used to estimate the degree of inbreeding in experimentally bottlenecked populations of the butterfly, Bicyclus anynana (Satyridae), three generations after founding events of 2, 6, 20, or 300 individuals, each bottleneck size being replicated at least four times. Heterozygosity fell more than expected, though not significantly so, but this traditional measure of the degree of inbreeding did not make full use of the information from genetic markers. It proved more informative to estimate directly the probability distribution of a measure of inbreeding, σ², the variance in the number of descendants left per gene. In all bottlenecked lines, σ² was significantly larger than in control lines (300 founders). We demonstrate that this excess inbreeding was brought about both by an increase in the variance of reproductive success of individuals, but also by another process. We argue that in bottlenecked lines linkage disequilibrium generated by the small number of haplotypes passing through the bottleneck resulted in hitchhiking of particular marker alleles with those haplotypes favored by selection. In control lines, linkage disequilibrium was minimal. Our result, indicating more inbreeding than expected from demographic parameters, contrasts with the findings of previous (Drosophila) experiments in which the decline in observed heterozygosity was slower than expected and attributed to associative overdominance. The different outcomes may both be explained as a consequence of linkage disequilibrium under different regimes of inbreeding. The likelihood-based method to estimate inbreeding should be of wide applicability. It was, for example, able to resolve small differences in σ² among replicate lines within bottleneck-size treatments, which could be related to the observed variation in reproductive viability.


Inferring Linkage Disequilibrium Between a Polymorphic Marker Locus and a Trait Locus in Natural Populations

Genetics ◽

10.1093/genetics/156.1.457 ◽

2000 ◽

Vol 156 (1) ◽

pp. 457-467 ◽

Cited By ~ 1

Author(s):

Z W Luo ◽

S H Tao ◽

Z-B Zeng

Keyword(s):

Linkage Disequilibrium ◽

Allele Frequency ◽

Random Mating ◽

Natural Populations ◽

Polymorphic Marker ◽

Marker Locus ◽

Model Parameters ◽

Phenotypic Variance ◽

Wide Range ◽

Trait Locus

Abstract Three approaches are proposed in this study for detecting or estimating linkage disequilibrium between a polymorphic marker locus and a locus affecting quantitative genetic variation using the sample from random mating populations. It is shown that the disequilibrium over a wide range of circumstances may be detected with a power of 80% by using phenotypic records and marker genotypes of a few hundred individuals. Comparison of ANOVA and regression methods in this article to the transmission disequilibrium test (TDT) shows that, given the genetic variance explained by the trait locus, the power of TDT depends on the trait allele frequency, whereas the power of ANOVA and regression analyses is relatively independent from the allelic frequency. The TDT method is more powerful when the trait allele frequency is low, but much less powerful when it is high. The likelihood analysis provides reliable estimation of the model parameters when the QTL variance is at least 10% of the phenotypic variance and the sample size of a few hundred is used. Potential use of these estimates in mapping the trait locus is also discussed.


Estimation of Cry3Bb1 resistance allele frequency in field populations of western corn rootworm using a genetic marker

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkaa013 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Alan Willse ◽

Lex Flagel ◽

Graham Head

Keyword(s):

Allele Frequency ◽

Western Corn Rootworm ◽

Allele Frequencies ◽

Chromosome 8 ◽

Corn Belt ◽

Corn Rootworm ◽

Resistance Allele ◽

Response To Selection ◽

The Us ◽

Causal Allele

Abstract Following the discovery of western corn rootworm (WCR; Diabrotica virgifera virgifera) populations resistant to the Bacillus thuringiensis (Bt) protein Cry3Bb1, resistance was genetically mapped to a single locus on WCR chromosome 8 and linked SNP markers were shown to correlate with the frequency of resistance among field-collected populations from the US Corn Belt. The purpose of this paper is to further investigate the relationship between one of these resistance-linked markers and the causal resistance locus. Using data from laboratory bioassays and field experiments, we show that one allele of the resistance-linked marker increased in frequency in response to selection, but was not perfectly linked to the causal resistance allele. By coupling the response to selection data with a genetic model of the linkage between the marker and the causal allele, we developed a model that allowed marker allele frequencies to be mapped to causal allele frequencies. We then used this model to estimate the resistance allele frequency distribution in the US Corn Belt based on collections from 40 populations. These estimates suggest that chromosome 8 Cry3Bb1 resistance allele frequency was generally low (<10%) for 65% of the landscape, though an estimated 13% of landscape has relatively high (>25%) resistance allele frequency.


A simple method to estimate the in-house limit of detection for genetic mutations with low allele frequencies in whole-exome sequencing analysis by next-generation sequencing

BMC Genomic Data ◽

10.1186/s12863-020-00956-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Takumi Miura ◽

Satoshi Yasuda ◽

Yoji Sato

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Somatic Mutations ◽

Limit Of Detection ◽

Allele Frequencies ◽

Genetic Mutations ◽

Sequencing Data ◽

Simple Method ◽

Whole Exome ◽

Generation Sequencing

Abstract Background: Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency. Results: Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 G base pair (Gbp) sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, whose p-value decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with the sequencing data of 15 Gbp or more was estimated to be between 5 and 10%. Conclusions: For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold of low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
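The LOD definition in this abstract (the allele frequency at which the RSD reaches 30%, read from a moving-average curve) can be sketched as follows. The window size, the linear interpolation between smoothed points, the function names, and the replicate values are all assumptions made for illustration; the abstract does not specify these details.

```python
from statistics import mean, stdev

def relative_sd(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)

def estimate_lod(replicates, threshold=30.0, window=3):
    """Estimate the allele frequency at which the smoothed RSD hits `threshold`.

    replicates: one list of repeated allele-frequency measurements per mutation.
    Returns None if the smoothed curve never crosses the threshold.
    """
    # One (mean allele frequency, RSD) point per mutation, sorted by frequency.
    points = sorted((mean(v), relative_sd(v)) for v in replicates)
    # Simple moving average over `window` neighbouring points (assumed scheme).
    smoothed = []
    for i in range(len(points) - window + 1):
        chunk = points[i:i + window]
        smoothed.append((mean([af for af, _ in chunk]),
                         mean([rsd for _, rsd in chunk])))
    # Find where the curve drops below the threshold and interpolate linearly.
    for (af_lo, rsd_lo), (af_hi, rsd_hi) in zip(smoothed, smoothed[1:]):
        if rsd_lo >= threshold > rsd_hi:
            frac = (rsd_lo - threshold) / (rsd_lo - rsd_hi)
            return af_lo + frac * (af_hi - af_lo)
    return None

# Hypothetical replicate measurements (allele frequency in %), not study data.
runs = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0],
        [9.0, 10.0, 11.0], [19.0, 20.0, 21.0]]
print(estimate_lod(runs, threshold=30.0, window=3))
```

A wider window gives a smoother curve but needs more mutations spread across the frequency range to locate the crossing point.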


Selection, Subdivision and Extinction and Recolonization

Genetics ◽

10.1093/genetics/166.2.1105 ◽

2004 ◽

Vol 166 (2) ◽

pp. 1105-1114 ◽

Cited By ~ 1

Author(s):

Joshua L Cherry

Keyword(s):

Frequency Dependence ◽

Allele Frequency ◽

Deleterious Alleles ◽

Subdivided Population ◽

Theoretical Predictions ◽

Subdivided Populations ◽

Fixation Probabilities ◽

Selection Parameters ◽

To Come ◽

Do So

Abstract In a subdivided population, the interaction between natural selection and stochastic change in allele frequency is affected by the occurrence of local extinction and subsequent recolonization. The relative importance of selection can be diminished by this additional source of stochastic change in allele frequency. Results are presented for subdivided populations with extinction and recolonization where there is more than one founding allele after extinction, where these may tend to come from the same source deme, where the number of founding alleles is variable or the founders make unequal contributions, and where there is dominance for fitness or local frequency dependence. The behavior of a selected allele in a subdivided population is in all these situations approximately the same as that of an allele with different selection parameters in an unstructured population with a different size. The magnitude of the quantity Nₑsₑ, which determines fixation probability in the case of genic selection, is always decreased by extinction and recolonization, so that deleterious alleles are more likely to fix and advantageous alleles less likely to do so. The importance of dominance or frequency dependence is also altered by extinction and recolonization. Computer simulations confirm that the theoretical predictions of both fixation probabilities and mean times to fixation are good approximations.
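The effect of a reduced Nₑsₑ on fixation can be illustrated with Kimura's classical diffusion approximation for genic selection in an unstructured population, u(p) = (1 − e^(−4·Nₑs·p)) / (1 − e^(−4·Nₑs)). Using this formula with effective parameters is an assumption made here for illustration; the scaling convention for s, the function name, and the numerical values are hypothetical and not taken from the paper.

```python
from math import expm1

def fixation_probability(ne_s, p0):
    """Kimura's diffusion approximation for genic selection.

    ne_s: product of effective population size and selection coefficient
    p0:   initial frequency of the allele
    """
    if ne_s == 0:
        return p0  # neutral limit
    # (1 - exp(-4*ne_s*p0)) / (1 - exp(-4*ne_s)), written with expm1
    # for numerical accuracy when the exponents are small.
    return expm1(-4.0 * ne_s * p0) / expm1(-4.0 * ne_s)

p0 = 0.01  # hypothetical initial frequency
for ne_s in (1.0, 0.5, -0.5, -1.0):
    print(ne_s, fixation_probability(ne_s, p0))
```

Halving the magnitude of the product lowers the fixation probability of the advantageous allele and raises that of the deleterious one, which is the direction of the effect described in the abstract.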


The Dynamics of Gynodioecy in Plantago lanceolata L. II. Mode of Action and Frequencies of Restorer Alleles

Genetics ◽

10.1093/genetics/147.3.1317 ◽

1997 ◽

Vol 147 (3) ◽

pp. 1317-1328

Author(s):

Anita A de Haan ◽

Hans P Koelewijn ◽

Maria P J Hundscheid ◽

Jos M M Van Damme

Keyword(s):

Cytoplasmic Male Sterility ◽

Male Sterility ◽

Allele Frequency ◽

Mode Of Action ◽

Male Fertility ◽

Fertility Restoration ◽

Plantago Lanceolata ◽

Allele Frequencies ◽

Male Sterile ◽

Male Fertility Restoration

Male fertility in Plantago lanceolata is controlled by the interaction of cytoplasmic and nuclear genes. Different cytoplasmic male sterility (CMS) types can be either male sterile or hermaphrodite, depending on the presence of nuclear restorer alleles. In three CMS types of P. lanceolata (CMSI, CMSIIa, and CMSIIb) the number of loci involved in male fertility restoration was determined. In each CMS type, male fertility was restored by multiple genes with either dominant or recessive action and capable either of restoring male fertility independently or in interaction with each other (epistasis). Restorer allele frequencies for CMSI, CMSIIa and CMSIIb were determined by crossing hermaphrodites with “standard” male steriles. Segregation of male steriles vs. non-male steriles was used to estimate overall restorer allele frequency. The frequency of restorer alleles was different for the CMS types: restorer alleles for CMSI were less frequent than for CMSIIa and CMSIIb. On the basis of the frequencies of male steriles and the CMS types an “expected” restorer allele frequency could be calculated. The correlation between estimated and expected restorer allele frequency was significant.


Allele frequency divergence reveals ubiquitous influence of positive selection in Drosophila

10.1101/2021.03.15.435474 ◽

2021 ◽

Author(s):

Jason Bertram

Keyword(s):

Allele Frequency ◽

Evolutionary Biology ◽

Polymorphic Locus ◽

Purifying Selection ◽

Allele Frequencies ◽

Intermediate Allele ◽

Genome Wide ◽

Signature Of Selection ◽

Basic Objective ◽

Genomic Regions

Resolving the role of natural selection is a basic objective of evolutionary biology. It is generally difficult to detect the influence of selection because ubiquitous non-selective stochastic change in allele frequencies (genetic drift) degrades evidence of selection. As a result, selection scans typically only identify genomic regions that have undergone episodes of intense selection. Yet it seems likely such episodes are the exception; the norm is more likely to involve subtle, concurrent selective changes at a large number of loci. We develop a new theoretical approach that uncovers a previously undocumented genome-wide signature of selection in the collective divergence of allele frequencies over time. Applying our approach to temporally-resolved allele frequency measurements from laboratory and wild Drosophila populations, we quantify the selective contribution to allele frequency divergence and find that selection has substantial effects on much of the genome. We further quantify the magnitude of the total selection coefficient (a measure of the combined effects of direct and linked selection) at a typical polymorphic locus, and find this to be large (of order 1%) even though most mutations are not directly under selection. We find that selective allele frequency divergence is substantial at intermediate allele frequencies, which we argue is most parsimoniously explained by positive, not purifying, selection. Thus, in these populations most mutations are far from evolving neutrally in the short term (tens of generations), including mutations with neutral fitness effects, and the result cannot be explained simply as a purging of deleterious mutations.


Accurate allele frequencies from ultra-low coverage pool-seq samples in evolve-and-resequence experiments

10.1101/244004 ◽

2018 ◽

Author(s):

Susanne Tilk ◽

Alan Bergland ◽

Aaron Goodman ◽

Paul Schmidt ◽

Dmitri Petrov ◽

...

Keyword(s):

Allele Frequency ◽

Model Organism ◽

Software Tool ◽

Allele Frequencies ◽

Model Organisms ◽

Sequencing Data ◽

High Coverage ◽

Next Generation Sequencing Technology ◽

Low Coverage ◽

Pooled Samples

Abstract Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally-pooled and simulated samples of Drosophila melanogaster, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby, increased power to detect adaptive alleles.
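The core idea of a haplotype-derived allele frequency can be sketched in a few lines: once founder haplotype frequencies in a genomic window have been inferred, the frequency of an allele at a SNP is the summed frequency of the founders that carry it. The sketch below assumes homozygous founders with known genotypes; it is not the HAF-pipe implementation, and the function name and numbers are hypothetical.

```python
def haplotype_derived_af(founder_freqs, founder_alleles):
    """Allele frequency at one bi-allelic SNP from founder haplotype frequencies.

    founder_freqs:   inferred frequency of each founder haplotype in the window
    founder_alleles: 1 if that (homozygous) founder carries the allele, else 0
    """
    if len(founder_freqs) != len(founder_alleles):
        raise ValueError("one allele state is needed per founder")
    return sum(f * a for f, a in zip(founder_freqs, founder_alleles))

# Hypothetical window with three founders; founders 1 and 3 carry the allele.
print(haplotype_derived_af([0.5, 0.3, 0.2], [1, 0, 1]))
```

Because the estimate pools information across all reads in the window that inform the haplotype frequencies, it does not rely on read depth at the SNP itself.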


Mutation Rate Model Used in the DNA VIEW Program

Applied Sciences ◽

10.3390/app10103585 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3585 ◽

Cited By ~ 1

Author(s):

Tomasz Krajka

Keyword(s):

Computer Simulation ◽

Allele Frequency ◽

Mutation Rate ◽

Allele Frequencies ◽

Rate Model ◽

Dna Database ◽

Mutation Model ◽

Population Allele Frequency ◽

Frequency Changes

The first problem considered in this paper is the problem of correctness of a mutation model used in the DNA VIEW program. To this end, we theoretically predict population allele frequency changes in time according to this and similar models (we determine the limit frequencies of alleles—they are uniformly distributed). Furthermore, we evaluate the speed of the above changes using computer simulation applied to our DNA database. Comparing uniformly distributed allele frequencies with these existing in the population (for example, using entropy), we conclude that this mutation model is not correct. The evolution does not follow this direction (direction of uniformly distributed frequencies). The second problem relates to the determination of the extent to which an incorrect mutation model can disturb DNA VIEW program results. We show that in typical computations (simple paternity testing without maternal mutation) this influence is negligible, but in the case of maternal mutation, this should be taken into account. Furthermore, we show that this model is inconsistent from a theoretical viewpoint. Equivalent methods result in different error levels.
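The entropy comparison mentioned in this abstract can be sketched as follows: a uniform distribution over k alleles has the maximum Shannon entropy, log2(k), so a population whose entropy falls well below that value is far from the uniform limit. The allele frequencies and function names below are hypothetical, not values from the paper's DNA database, and the mutation model itself is not reproduced here.

```python
from math import log2

def shannon_entropy(freqs):
    """Shannon entropy in bits of an allele frequency distribution."""
    return -sum(p * log2(p) for p in freqs if p > 0)

def normalized_entropy(freqs):
    """Entropy relative to the uniform maximum log2(k); 1.0 means uniform."""
    k = len(freqs)
    return shannon_entropy(freqs) / log2(k) if k > 1 else 1.0

# Hypothetical allele frequencies at one locus.
observed = [0.55, 0.25, 0.15, 0.05]
uniform = [0.25] * 4
print(shannon_entropy(observed), shannon_entropy(uniform))
print(normalized_entropy(observed))
```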
