On the prospect of identifying adaptive loci in recently bottlenecked populations

Mapping Intimacies ◽

10.1101/009456 ◽

2014 ◽

Author(s):

Yu-Ping Poh ◽

Vera S Domingues ◽

Hopi Hoekstra ◽

Jeffrey Jensen

Keyword(s):

Frequency Spectrum ◽

Population Genetic ◽

Null Model ◽

Large Parameter ◽

Independent Evidence ◽

Standing Variation ◽

In The Wild ◽

Site Frequency Spectrum ◽

Genomic Regions ◽

Large Parameter Space

Identifying adaptively important loci in recently bottlenecked populations, be it natural selection acting on a population following the colonization of novel habitats in the wild or artificial selection during the domestication of a breed, remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.

polyDFEv2.0: Testing for invariance of the distribution of fitness effects within and across species

10.1101/363887 ◽

2018 ◽

Author(s):

Paula Tataru ◽

Thomas Bataillon

Keyword(s):

Likelihood Ratio ◽

Frequency Spectrum ◽

Likelihood Ratio Tests ◽

Fitness Effects ◽

Site Frequency Spectrum ◽

Genomic Regions

Distributions of fitness effects (DFE) of mutations can be inferred from site frequency spectrum (SFS) data. There is mounting interest in determining whether distinct genomic regions and/or species share a common DFE, or whether evidence exists for differences among them. polyDFEv2.0 fits multiple SFS datasets at once and provides likelihood ratio tests for DFE invariance across datasets. Simulations show that testing for DFE invariance across genomic regions within a species requires models accounting for heterogeneous genealogical histories underlying SFS data in these regions. Not accounting for these heterogeneities will result in the spurious detection of DFE differences.

Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests

10.1101/042044 ◽

2016 ◽

Cited By ~ 1

Author(s):

Luca Ferretti ◽

Alice Ledda ◽

Thomas Wiehe ◽

Guillaume Achaz ◽

Sebastian E. Ramos-Onsins

Keyword(s):

Frequency Spectrum ◽

Topological Structure ◽

Population Genetic ◽

Extreme Values ◽

Waiting Times ◽

Tree Topology ◽

Neutrality Tests ◽

Expected Values ◽

Site Frequency Spectrum ◽

The Impact

We investigate the dependence of the site frequency spectrum (SFS) on the topological structure of genealogical trees. We show that basic population genetic statistics – for instance estimators of θ or neutrality tests such as Tajima’s D – can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima’s D and Fay and Wu’s H depend in a direct way on a peculiar measure of tree balance which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu’s H and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulae for these extreme values as a function of sample size and number of segregating sites.
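
For orientation, the sketch below computes Tajima's D directly from an unfolded SFS using the standard textbook constants. It is not the decomposition into waiting times and tree topology described in the abstract. The function name `tajimas_d` and the input convention (a list whose entry i-1 counts sites with i derived copies) are assumptions made for illustration.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS [xi_1, ..., xi_{n-1}] (hypothetical helper).

    Returns NaN when the statistic is undefined (fewer than 4 sequences
    or no segregating sites).
    """
    n = len(sfs) + 1          # sample size implied by the SFS length
    s = sum(sfs)              # number of segregating sites
    if n < 4 or s == 0:
        return float("nan")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Mean pairwise differences: a site with i derived copies contributes
    # i * (n - i) differing pairs out of n choose 2.
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / (n * (n - 1) / 2.0)
    theta_w = s / a1          # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# An SFS proportional to 1/i (the neutral expectation) gives D close to 0;
# an excess of singletons gives D < 0.
print(tajimas_d([12, 6, 4, 3]))
print(tajimas_d([10, 0, 0, 0]))
```

Because D depends only on the SFS, any process that reshapes the spectrum (demography as well as selection) moves it away from zero.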

The site-frequency spectrum associated with Xi-coalescents

10.1101/025684 ◽

2015 ◽

Cited By ~ 1

Author(s):

Jochen Blath ◽

Mathias C Cronjager ◽

Bjarki Eldon ◽

Matthias Hammer

Keyword(s):

Frequency Spectrum ◽

Population Genetic ◽

Atlantic Cod ◽

Genetic Data ◽

Population Models ◽

Parameter Estimates ◽

Frequency Spectra ◽

Population Genetic Data ◽

Site Frequency Spectrum

We give recursions for the expected site-frequency spectrum associated with so-called Xi-coalescents, that is exchangeable coalescents which admit simultaneous multiple mergers of ancestral lineages. Xi-coalescents arise, for example, in association with population models of skewed offspring distributions with diploidy, recurrent advantageous mutations, or strong bottlenecks. In contrast, the simpler Lambda-coalescents admit multiple mergers of lineages, but at most one such merger each time. Xi-coalescents, as well as Lambda-coalescents, can predict an excess of singletons, compared to the Kingman coalescent. We compare estimates of coalescent parameters when Xi-coalescents are applied to data generated by Lambda-coalescents, and vice versa. In general, Xi-coalescents predict fewer singletons than corresponding Lambda-coalescents, but a higher count of mutations of size larger than singletons. We fit examples of Xi-coalescents to unfolded site-frequency spectra obtained for autosomal loci of the diploid Atlantic cod, and obtain different coalescent parameter estimates than obtained with corresponding Lambda-coalescents. Our results provide new inference tools, and suggest that for autosomal population genetic data from diploid or polyploid highly fecund populations that may have skewed offspring distributions, one should not apply Lambda-coalescents, but Xi-coalescents.

Signatures of introgression across the allele frequency spectrum

10.1101/2020.07.06.189043 ◽

2020 ◽

Author(s):

Simon H. Martin ◽

William Amos

Keyword(s):

Frequency Spectrum ◽

Empirical Data ◽

Low Frequency ◽

Genomic Data ◽

Simple Extension ◽

Allele Sharing ◽

Large Parameter ◽

Low Frequencies ◽

Allele Frequency Spectrum ◽

Large Parameter Space

The detection of introgression from genomic data is transforming our view of species and the origins of adaptive variation. Among the most widely used approaches to detect introgression is the so-called ABBA BABA test or D statistic, which identifies excess allele sharing between non-sister taxa. Part of the appeal of D is its simplicity, but this also limits its informativeness, particularly about the timing and direction of introgression. Here we present a simple extension, D frequency spectrum or DFS, in which D is partitioned according to the frequencies of derived alleles. We use simulations over a large parameter space to show how DFS carries information about various factors. In particular, recent introgression reliably leads to a peak in DFS among low-frequency derived alleles, whereas violation of model assumptions can lead to a lack of signal at low frequencies. We also reanalyse published empirical data from six different animal and plant taxa, and interpret the results in the light of our simulations, showing how DFS provides novel insights. We currently see DFS as a descriptive tool that will augment both simple and sophisticated tests for introgression, but in the future it may be usefully incorporated into probabilistic inference frameworks.

Inferring the model and onset of natural selection under varying population size from the site frequency spectrum and haplotype structure

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2018.2541 ◽

2019 ◽

Vol 286 (1896) ◽

pp. 20182541 ◽

Cited By ~ 3

Author(s):

Shigeki Nakagome ◽

Richard R. Hudson ◽

Anna Di Rienzo

Keyword(s):

Natural Selection ◽

Population Size ◽

Allele Frequency ◽

Frequency Spectrum ◽

Goodness Of Fit ◽

Haplotype Structure ◽

New Mutation ◽

Standing Variation ◽

Site Frequency Spectrum ◽

Varying Population

A fundamental question about adaptation in a population is the time of onset of the selective pressure acting on beneficial alleles. Inferring this time, in turn, depends on the selection model. We develop a framework of approximate Bayesian computation (ABC) that enables the use of the full site frequency spectrum and haplotype structure to test the goodness-of-fit of selection models and estimate the timing of selection under varying population size scenarios. We show that our method has sufficient power to distinguish natural selection from neutrality even if relatively old selection increased the frequency of a pre-existing allele from 20% to 50% or from 40% to 80%. Our ABC can accurately estimate the time of onset of selection on a new mutation. However, estimates are prone to bias under the standing variation model, possibly due to the uncertainty in the allele frequency at the onset of selection. We further extend our approach to take advantage of ancient DNA data that provides information on the allele frequency path of the beneficial allele. Applying our ABC, including both modern and ancient human DNA data, to four pigmentation alleles in Europeans, we detected selection on standing variants that occurred after the dispersal from Africa even though models of selection on a new mutation were initially supported for two of these alleles without the ancient data.

Fast and accurate approximation of the joint site frequency spectrum of multiple populations

10.1101/2020.05.01.073213 ◽

2020 ◽

Author(s):

Ethan M. Jewett

Keyword(s):

Genetic Variation ◽

Sample Size ◽

Frequency Spectrum ◽

Dna Sequences ◽

Population Genetic ◽

Accurate Approximation ◽

Computationally Efficient ◽

Multiple Populations ◽

Site Frequency Spectrum ◽

Approximate Formulas

The site frequency spectrum (SFS) is a statistic that summarizes the distribution of derived allele frequencies in a sample of DNA sequences. The SFS provides useful information about genetic variation within and among populations and it can be used to make population genetic inferences. Methods for computing the SFS based on the diffusion approximation are computationally efficient when computing all terms of the SFS simultaneously and they can handle complicated demographic scenarios. However, in practice it is sometimes only necessary to compute a subset of terms of the SFS, in which case coalescent-based methods can achieve greater computational efficiency. Here, we present simple and accurate approximate formulas for the expected joint SFS for multiple populations connected by migration. Compared with existing exact approaches, our approximate formulas greatly reduce the complexity of computing each entry of the SFS and have simple forms. The computational complexity of our method depends on the index of the entry to be computed, rather than on the sample size, and the accuracy of our approximation improves as the sample size increases.
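
As a concrete illustration of the definition in the first sentence, the following minimal sketch tallies an observed single-population unfolded SFS from per-site derived allele counts. It does not implement the approximate joint-SFS formulas the abstract describes; the function name `unfolded_sfs` and the input layout are assumptions chosen for the example.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Return [xi_1, ..., xi_{n-1}] (hypothetical helper).

    xi_i is the number of sites at which the derived allele is carried
    by exactly i of the n sampled sequences.
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    tally = Counter(derived_counts)
    # Sites with 0 or n derived copies are not segregating, so they are skipped.
    return [tally.get(i, 0) for i in range(1, n)]


# Six sites in a sample of four sequences; the sites with 0 and 4 derived
# copies are monomorphic in the sample and do not enter the spectrum.
counts = [1, 1, 2, 3, 0, 4]
print(unfolded_sfs(counts, n=4))  # [2, 1, 1]
```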

Signatures of Introgression across the Allele Frequency Spectrum

Molecular Biology and Evolution ◽

10.1093/molbev/msaa239 ◽

2020 ◽

Author(s):

Simon H Martin ◽

William Amos

Keyword(s):

Frequency Spectrum ◽

Empirical Data ◽

Low Frequency ◽

Genomic Data ◽

Simple Extension ◽

Allele Sharing ◽

Large Parameter ◽

Low Frequencies ◽

Allele Frequency Spectrum ◽

Large Parameter Space

The detection of introgression from genomic data is transforming our view of species and the origins of adaptive variation. Among the most widely used approaches to detect introgression is the so-called ABBA–BABA test or D-statistic, which identifies excess allele sharing between nonsister taxa. Part of the appeal of D is its simplicity, but this also limits its informativeness, particularly about the timing and direction of introgression. Here we present a simple extension, D frequency spectrum or DFS, in which D is partitioned according to the frequencies of derived alleles. We use simulations over a large parameter space to show how DFS carries information about various factors. In particular, recent introgression reliably leads to a peak in DFS among low-frequency derived alleles, whereas violation of model assumptions can lead to a lack of signal at low frequencies. We also reanalyze published empirical data from six different animal and plant taxa, and interpret the results in the light of our simulations, showing how DFS provides novel insights. We currently see DFS as a descriptive tool that will augment both simple and sophisticated tests for introgression, but in the future it may be usefully incorporated into probabilistic inference frameworks.
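
The genome-wide D statistic that DFS extends can be sketched as follows, assuming per-site derived allele frequencies in three populations P1, P2 and P3 and an outgroup fixed for the ancestral allele. This shows only the unpartitioned D; the frequency-binned DFS itself is not reproduced here, and the function name `patterson_d` is an assumption for illustration.

```python
def patterson_d(p1, p2, p3):
    """ABBA-BABA D from per-site derived allele frequencies (hypothetical helper).

    Assumes the outgroup carries the ancestral allele at every site.
    Returns NaN when no site is informative.
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3 in zip(p1, p2, p3):
        abba += (1 - f1) * f2 * f3   # derived allele shared by P2 and P3
        baba += f1 * (1 - f2) * f3   # derived allele shared by P1 and P3
    total = abba + baba
    if total == 0:
        return float("nan")
    return (abba - baba) / total


# Excess sharing between P2 and P3 gives D > 0; balanced sharing gives D = 0.
print(patterson_d([0.0, 0.0], [1.0, 1.0], [1.0, 1.0]))  # 1.0
print(patterson_d([0.0, 1.0], [1.0, 0.0], [1.0, 1.0]))  # 0.0
```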

Recurrent bottlenecks in the malaria life cycle obscure signals of positive selection

Parasitology ◽

10.1017/s0031182014000067 ◽

2014 ◽

Vol 142 (S1) ◽

pp. S98-S107 ◽

Cited By ~ 9

Author(s):

Hsiao-Han Chang ◽

Daniel L. Hartl

Keyword(s):

Life Cycle ◽

Positive Selection ◽

Frequency Spectrum ◽

Population Genetic ◽

Vaccine Development ◽

Selective Advantage ◽

Life Cycles ◽

Malaria Parasites ◽

Complex Life Cycles ◽

Site Frequency Spectrum

Detecting signals of selection in the genome of malaria parasites is key to identifying targets for drug and vaccine development. Malaria parasites have a unique life cycle alternating between vector and host organism with a population bottleneck at each transition. These recurrent bottlenecks could influence the patterns of genetic diversity and the power of existing population genetic tools to identify sites under positive selection. We therefore simulated the site-frequency spectrum of a beneficial mutant allele through time under the malaria life cycle. We investigated the power of current population genetic methods to detect positive selection based on the site-frequency spectrum as well as temporal changes in allele frequency. We found that a within-host selective advantage is difficult to detect using these methods. Although a between-host transmission advantage could be detected, the power is decreased when compared with the classical Wright–Fisher (WF) population model. Using an adjusted null site-frequency spectrum that takes the malaria life cycle into account, the power of tests based on the site-frequency spectrum to detect positive selection is greatly improved. Our study demonstrates the importance of considering the life cycle in genetic analysis, especially in parasites with complex life cycles.

Faculty Opinions recommendation of The site frequency spectrum for general coalescents.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726151440.793522507 ◽

2016 ◽

Author(s):

Alon Keinan

Keyword(s):

Frequency Spectrum ◽

Site Frequency Spectrum

Effects of Selection at Linked Sites on Patterns of Genetic Variability

Annual Review of Ecology Evolution and Systematics ◽

10.1146/annurev-ecolsys-010621-044528 ◽

2021 ◽

Vol 52 (1) ◽

pp. 177-197

Author(s):

Brian Charlesworth ◽

Jeffrey D. Jensen

Keyword(s):

Genetic Variability ◽

Population Genetic ◽

Selective Sweeps ◽

Recombination Rates ◽

Frequency Distributions ◽

A Genome ◽

Demographic Processes ◽

Dna Sequence Variants ◽

Genomic Regions ◽

Functional Components

Patterns of variation and evolution at a given site in a genome can be strongly influenced by the effects of selection at genetically linked sites. In particular, the recombination rates of genomic regions correlate with their amount of within-population genetic variability, the degree to which the frequency distributions of DNA sequence variants differ from their neutral expectations, and the levels of adaptation of their functional components. We review the major population genetic processes that are thought to lead to these patterns, focusing on their effects on patterns of variability: selective sweeps, background selection, associative overdominance, and Hill–Robertson interference among deleterious mutations. We emphasize the difficulties in distinguishing among the footprints of these processes and disentangling them from the effects of purely demographic factors such as population size changes. We also discuss how interactions between selective and demographic processes can significantly affect patterns of variability within genomes.