The hitchhiking effect on the site frequency spectrum of DNA polymorphisms.

Genetics ◽  
1995 ◽  
Vol 140 (2) ◽  
pp. 783-796 ◽  
Author(s):  
J M Braverman ◽  
R R Hudson ◽  
N L Kaplan ◽  
C H Langley ◽  
W Stephan

Abstract The level of DNA sequence variation is reduced in regions of the Drosophila melanogaster genome where the rate of crossing over per physical distance is also reduced. This observation has been interpreted as support for the simple model of genetic hitchhiking, in which directional selection on rare variants, e.g., newly arising advantageous mutants, sweeps linked neutral alleles to fixation, thus eliminating polymorphisms near the selected site. However, the frequency spectra of segregating sites of several loci from some populations exhibiting reduced levels of nucleotide diversity and reduced numbers of segregating sites did not appear different from what would be expected under a neutral equilibrium model. Specifically, a skew toward an excess of rare sites was not observed in these samples, as measured by Tajima's D. Because this skew was predicted by a simple hitchhiking model, yet it had never been expressed quantitatively and compared directly to DNA polymorphism data, this paper investigates the hitchhiking effect on the site frequency spectrum, as measured by Tajima's D and several other statistics, using a computer simulation model based on the coalescent process and recurrent hitchhiking events. The results presented here demonstrate that under the simple hitchhiking model (1) the expected value of Tajima's D is large and negative (indicating a skew toward rare variants), (2) that Tajima's test has reasonable power to detect a skew in the frequency spectrum for parameters comparable to those from actual data sets, and (3) that the Tajima's Ds observed in several data sets are very unlikely to have been the result of simple hitchhiking. Consequently, the simple hitchhiking model is not a sufficient explanation for the DNA polymorphism at those loci exhibiting a decreased number of segregating sites yet not exhibiting a skew in the frequency spectrum.
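As a minimal sketch of the statistic discussed above, the following Python function computes Tajima's D from an unfolded site frequency spectrum, using the standard constants from Tajima's definition of the test. The function name `tajimas_d` and the list-based input layout are illustrative assumptions; this is not the simulation code used in the paper.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele appears in exactly i of the n sampled sequences (i = 1..n-1).
    This input layout is an assumption made for the sketch.
    """
    n = len(sfs) + 1
    if n < 4:
        # For n = 2 or 3 the variance constants are zero, so D is undefined.
        raise ValueError("need at least 4 sequences")
    seg_sites = sum(sfs)
    if seg_sites == 0:
        return float("nan")  # D is undefined without segregating sites

    # Average number of pairwise differences, computed from the spectrum.
    n_pairs = n * (n - 1) / 2
    pi = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / n_pairs

    # Constants from Tajima's variance approximation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy spectra for n = 10 sequences (made-up counts, not data from the paper).
rare_excess = [10, 0, 0, 0, 0, 0, 0, 0, 0]          # singletons only
intermediate_excess = [0, 0, 0, 0, 10, 0, 0, 0, 0]  # mid-frequency sites only
print(tajimas_d(rare_excess), tajimas_d(intermediate_excess))
```

A spectrum dominated by rare variants gives a negative D, which is the skew the simple hitchhiking model is expected to produce; a spectrum dominated by intermediate-frequency variants gives a positive D.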

2018 ◽  
Author(s):  
Daniel P. Rice ◽  
John Novembre ◽  
Michael M. Desai

Abstract Demographic inference methods in population genetics typically assume that the ancestry of a sample can be modeled by the Kingman coalescent. A defining feature of this stochastic process is that it generates genealogies that are binary trees: no more than two ancestral lineages may coalesce at the same time. However, this assumption breaks down under several scenarios. For example, pervasive natural selection and extreme variation in offspring number can both generate genealogies with “multiple-merger” events in which more than two lineages coalesce instantaneously. Therefore, detecting multiple mergers is important both for understanding which forces have shaped the diversity of a population and for avoiding fitting misspecified models to data. Current methods to detect multiple mergers in genomic data rely on the site frequency spectrum (SFS). However, the signatures of multiple mergers in the SFS are also consistent with a Kingman coalescent with a time-varying population size. Here, we present a new method for detecting multiple mergers based on the pointwise mutual information of the two-site frequency spectrum for pairs of linked sites. Unlike the SFS, the pointwise mutual information depends mostly on the topologies of genealogies rather than on their branch lengths and is therefore largely insensitive to population size change. This statistic is global in the sense that it can detect when the genome-wide genetic diversity is inconsistent with the Kingman coalescent, rather than detecting outlier regions, as in selection scan methods. Finally, we demonstrate a graphical model-checking procedure based on the pointwise mutual information using genomic diversity data from Drosophila melanogaster.


2015 ◽  
Author(s):  
Jochen Blath ◽  
Mathias C Cronjager ◽  
Bjarki Eldon ◽  
Matthias Hammer

We give recursions for the expected site-frequency spectrum associated with so-called Xi-coalescents, that is, exchangeable coalescents which admit simultaneous multiple mergers of ancestral lineages. Xi-coalescents arise, for example, in association with population models of skewed offspring distributions with diploidy, recurrent advantageous mutations, or strong bottlenecks. In contrast, the simpler Lambda-coalescents admit multiple mergers of lineages, but at most one such merger each time. Xi-coalescents, as well as Lambda-coalescents, can predict an excess of singletons, compared to the Kingman coalescent. We compare estimates of coalescent parameters when Xi-coalescents are applied to data generated by Lambda-coalescents, and vice versa. In general, Xi-coalescents predict fewer singletons than corresponding Lambda-coalescents, but a higher count of mutations of size larger than singletons. We fit examples of Xi-coalescents to unfolded site-frequency spectra obtained for autosomal loci of the diploid Atlantic cod, and obtain different coalescent parameter estimates than obtained with corresponding Lambda-coalescents. Our results provide new inference tools, and suggest that for autosomal population genetic data from diploid or polyploid highly fecund populations who may have skewed offspring distributions, one should not apply Lambda-coalescents, but Xi-coalescents.


2017 ◽  
Author(s):  
A. Klassmann ◽  
L. Ferretti

Abstract The analysis of patterns of segregating (i.e. polymorphic) sites in aligned sequences is routine in population genetics. Quantities of interest include the total number of segregating sites and the number of sites with mutations of different frequencies, the so-called site frequency spectrum. For neutrally evolving sequences, some classical results are available, including the expected value and variance of the spectrum in the Kingman coalescent model without recombination as calculated by Fu (1995). In this work, we use similar techniques to compute the third moments of the site frequency spectrum without recombination. We also account for the linkage pattern of mutations, yielding the full haplotype spectrum of three polymorphic sites. Based on these results, we derive analytical results for the bias of Tajima’s D and other neutrality tests. As an application, we obtain the second moments of the spectrum of linked sites, which is related to the neutral spectrum of chromosomal inversions and other structural variants. These moments can be used for the normalisation of new neutrality tests relying on these spectra.
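For orientation, the classical first-moment result mentioned above is usually written E[ξ_i] = θ/i for i = 1, ..., n-1, where ξ_i is the number of sites at which the derived allele appears i times in a sample of n sequences and θ is the population-scaled mutation rate. The sketch below evaluates only that expectation; the function names and the value of θ are illustrative assumptions, and the higher moments derived in the paper are not reproduced here.

```python
def expected_sfs(n, theta):
    """Expected unfolded SFS under the standard neutral Kingman coalescent.

    Returns [E[xi_1], ..., E[xi_{n-1}]] with E[xi_i] = theta / i.
    """
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return [theta / i for i in range(1, n)]


def expected_segregating_sites(n, theta):
    """Expected number of segregating sites: theta * (1 + 1/2 + ... + 1/(n-1))."""
    return sum(expected_sfs(n, theta))


# Illustrative values only: n = 5 sequences, theta = 2.0.
print(expected_sfs(5, 2.0))
print(expected_segregating_sites(5, 2.0))
```

Summing the expected spectrum recovers the familiar expectation for the total number of segregating sites, which is why a neutrality test can compare estimators of θ built from different parts of the spectrum.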


Genetics ◽  
1989 ◽  
Vol 123 (3) ◽  
pp. 597-601 ◽  
Author(s):  
F Tajima

Abstract The expected number of segregating sites and the expectation of the average number of nucleotide differences among DNA sequences randomly sampled from a population, which is not in equilibrium, have been developed. The results obtained indicate that, in the case where the population size has changed drastically, the number of segregating sites is influenced by the size of the current population more strongly than is the average number of nucleotide differences, while the average number of nucleotide differences is affected by the size of the original population more severely than is the number of segregating sites. The results also indicate that the average number of nucleotide differences is affected by a population bottleneck more strongly than is the number of segregating sites.
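The two summary statistics compared above can be computed directly from an alignment. The following is a small sketch under stated assumptions: the function name and the toy sequences are hypothetical, the sequences are taken to be aligned and of equal length, and the code computes only the observed statistics, not the nonequilibrium expectations developed in the paper.

```python
from itertools import combinations


def segregating_sites_and_pairwise_diff(seqs):
    """Return (S, k) for a list of aligned, equal-length sequences.

    S is the number of segregating (polymorphic) sites and k is the
    average number of nucleotide differences over all pairs of sequences.
    """
    if len(seqs) < 2:
        raise ValueError("need at least 2 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")

    # A site is segregating if more than one state is present in its column.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # Average pairwise differences across all unordered pairs.
    pairs = list(combinations(seqs, 2))
    total_diff = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return seg_sites, total_diff / len(pairs)


# Toy alignment, not real data.
toy = ["AAAA", "AAAT", "AATT"]
print(segregating_sites_and_pairwise_diff(toy))
```

Because S counts every polymorphic site equally while k weights sites by allele frequency, the two statistics respond differently to changes in population size, which is the contrast the abstract describes.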


2004 ◽  
Vol 64 (2) ◽  
pp. 201-209 ◽  
Author(s):  
M. A. Batalha ◽  
F. R. Martins

We used Raunkiaer's system to classify in life-forms the vascular plants present in 12 random 25 m² quadrats of a cerrado site. The study area is covered by cerrado sensu stricto and is located in the Valério fragment, at about 22°13'S and 47°51'W, 760 m above sea level, in the Itirapina Ecological and Experimental Station, São Paulo State, southeastern Brazil. The floristic spectrum considers the life-form of each species, while in the frequency spectrum, each species is weighted by its frequency. The vegetation spectrum does not consider the species at all, but only the individuals in each life-form class. In the floristic spectrum, the most represented life-forms were the phanerophytes and the hemicryptophytes, as in other cerrado sites. This spectrum differed significantly from Raunkiaer's normal spectrum, mainly due to under-representation of therophytes and over-representation of phanerophytes. The floristic and frequency spectra were similar, but both differed from the vegetation spectrum. We recommend the floristic spectrum when working at larger scales and a description of the phytoclimate is wanted. The vegetation spectrum is preferable when working at smaller scales and wanting a quantitative description of the physiognomy. The frequency spectrum is not recommended at all.


2017 ◽  
Author(s):  
David Curtis

Summary Polygenic risk scores obtained as a weighted sum of associated variants can be used to explore association in additional data sets and to assign risk scores to individuals. The methods used to derive polygenic risk scores from common SNPs are not suitable for variants detected in whole exome sequencing studies. Rare variants which may have major effects are seen too infrequently to judge whether they are associated and may not be shared between training and test subjects. A method is proposed whereby variants are weighted according to their frequency, their annotations and to the genes they affect. A weighted sum across all variants provides an individual risk score. Scores constructed in this way are used in a weighted burden test and are shown to be significantly different between schizophrenia cases and controls using a five-way cross validation procedure. This approach represents a first attempt to summarise exome sequence variation into a summary risk score, which could be combined with risk scores from common variants and from environmental factors. It is hoped that the method could be developed further.


2017 ◽  
Author(s):  
Berit Lindum Waltoft ◽  
Asger Hobolth

Abstract The variability in population size is a key quantity for understanding the evolutionary history of a species. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from the site frequency spectrum. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the variability in population size from an observed SFS. Our solution is based on a cubic spline for the population size. The cubic spline is determined by minimizing the weighted average of two terms, namely (i) the goodness of fit to the SFS, and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied on data from nine different human populations.


2018 ◽  
Author(s):  
Christelle Fraïsse ◽  
Camille Roux ◽  
Pierre-Alexandre Gagnaire ◽  
Jonathan Romiguier ◽  
Nicolas Faivre ◽  
...  

Abstract Genome-scale diversity data are increasingly available in a variety of biological systems, and can be used to reconstruct the past evolutionary history of species divergence. However, extracting the full demographic information from these data is not trivial, and requires inferential methods that account for the diversity of coalescent histories throughout the genome. Here, we evaluate the potential and limitations of one such approach. We reexamine a well-known system of mussel sister species, using the joint site frequency spectrum (jSFS) of synonymous mutations computed either from exome capture or RNA-seq, in an Approximate Bayesian Computation (ABC) framework. We first assess the best sampling strategy (number of: individuals, loci, and bins in the jSFS), and show that model selection is robust to variation in the number of individuals and loci. In contrast, different binning choices when summarizing the joint site frequency spectrum strongly affect the results: including classes of low and high frequency shared polymorphisms can more effectively reveal recent migration events. We then take advantage of the flexibility of ABC to compare more realistic models of speciation, including variation in migration rates through time (i.e. periodic connectivity) and across genes (i.e. genome-wide heterogeneity in migration rates). We show that these models were consistently selected as the most probable, suggesting that mussels have experienced a complex history of gene flow during divergence and that the species boundary is semi-permeable. Our work provides a comprehensive evaluation of ABC demographic inference in mussels based on the coding site frequency spectrum, and supplies guidelines for employing different sequencing techniques and sampling strategies. We emphasize, perhaps surprisingly, that inferences are less limited by the volume of data than by the way in which they are analyzed.


2020 ◽  
Author(s):  
David Curtis

Rare genetic variants in LDLR, APOB and PCSK9 are known causes of familial hypercholesterolaemia and it is expected that rare variants in other genes will also have effects on hyperlipidaemia risk although such genes remain to be identified. The UK Biobank consists of a sample of 500,000 volunteers and exome sequence data is available for 50,000 of them. 11,490 of these were classified as hyperlipidaemia cases on the basis of having a relevant diagnosis recorded and/or taking lipid-lowering medication while the remaining 38,463 were treated as controls. Variants in each gene were assigned weights according to rarity and predicted impact and overall weighted burden scores were compared between cases and controls, including population principal components as covariates. One biologically plausible gene, HUWE1, produced statistically significant evidence for association after correction for testing 22,028 genes with a signed log10 p value (SLP) of -6.15, suggesting a protective effect of variants in this gene. Other genes with uncorrected p<0.001 are arguably also of interest, including LDLR (SLP=3.67), RBP2 (SLP=3.14), NPFFR1 (SLP=3.02) and ACOT9 (SLP=-3.19). Gene set analysis indicated that rare variants in genes involved in metabolism and energy can influence hyperlipidaemia risk. Overall, the results provide some leads which might be followed up with functional studies and which could be tested in additional data sets as these become available. This research has been conducted using the UK Biobank Resource.
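To make the reported numbers easier to read, the sketch below shows one way a signed log10 p value (SLP) and a multiple-testing threshold could be computed. Two assumptions are made loudly here: the SLP is taken to be -log10(p) with a positive sign when the burden is higher in cases and a negative sign when it is higher in controls (consistent with the abstract's reading of a negative SLP as protective), and the correction is taken to be a Bonferroni correction at a significance level of 0.05, which the abstract does not state. The function names are hypothetical.

```python
import math


def signed_log_p(p_value, higher_in_cases):
    """Signed log10 p value: -log10(p), negated when the burden is higher in controls.

    The sign convention is an assumption inferred from the abstract.
    """
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if higher_in_cases else -magnitude


def bonferroni_slp_threshold(n_tests, alpha=0.05):
    """Absolute SLP needed for significance under an assumed Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return -math.log10(alpha / n_tests)


# 22,028 genes were tested according to the abstract.
threshold = bonferroni_slp_threshold(22028)
print(round(threshold, 2))
print(abs(-6.15) > threshold)  # HUWE1, SLP = -6.15
print(abs(3.67) > threshold)   # LDLR, SLP = 3.67
```

Under these assumptions, an absolute SLP above roughly 5.64 would survive correction, which is consistent with HUWE1 being reported as significant while the genes with an absolute SLP near 3 are described only as being of interest.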

