scholarly journals Inference of super-exponential human population growth via efficient computation of the site frequency spectrum for generalized models

2015 ◽  
Author(s):  
Feng Gao ◽  
Alon Keinan

The site frequency spectrum (SFS) and other genetic summary statistics are at the heart of many population genetics studies. Previous studies have shown that human populations had undergone a recent epoch of fast growth in effective population size. These studies assumed that growth is exponential, and the ensuing models leave unexplained excess amount of extremely rare variants. This suggests that human populations might have experienced a recent growth with speed faster than exponential. Recent studies have introduced a generalized growth model where the growth speed can be faster or slower than exponential. However, only simulation approaches were available for obtaining summary statistics under such models. In this study, we provide expressions to accurately and efficiently evaluate the SFS and other summary statistics under generalized models, which we further implement in a publicly available software. Investigating the power to infer deviation of growth from being exponential, we observed that decent sample sizes facilitate accurate inference, e.g. a sample of 3000 individuals with the amount of data expected from exome sequencing allows observing and accurately estimating growth with speed deviating by 10% or more from that of exponential. Applying our inference framework to data from the NHLBI Exome Sequencing Project, we found that a model with a generalized growth epoch fits the observed SFS significantly better than the equivalent model with exponential growth (p-value = 3.85 × 10-6). The estimated growth speed significantly deviates from exponential (p-value << 10-12), with the best-fit estimate being of growth speed 12% faster than exponential.

2017 ◽  
Author(s):  
Berit Lindum Waltoft ◽  
Asger Hobolth

AbstractThe variability in population size is a key quantity for understanding the evolutionary history of a species. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from the site frequency spectrum. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the variability in population size from an observed SFS. Our solution is based on a cubic spline for the population size. The cubic spline is determined by minimizing the weighted average of two terms, namely (i) the goodness of fit to the SFS, and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied on data from nine different human populations.


Author(s):  
Daniel L. Hartl

Chapter 7 is an introduction to molecular population genetics that includes the principal concepts of nucleotide polymorphism and divergence, the site frequency spectrum, and tests of selection and their limitations. Highlighted are rates of nucleotide substitution in coding and noncoding DNA, nucleotide and amino acid divergence between species, corrections for multiple substitutions, and the molecular clock. Discussion of the folded and unfolded site frequency spectrum includes the strengths and limitations of Tajima’s D, Fay and Wu’s H, and other measures. The chapter also discusses an emerging consensus to resolve the celebrated selection–neutrality controversy. It also includes examination of demographic history through the use of ancient DNA with special emphasis on the surprising findings in regard to the ancestral makeup of contemporary human populations. Also discussed are the population dynamics of transposable elements in prokaryotes and eukaryotes.


Author(s):  
Berit Lindum Waltoft ◽  
Asger Hobolth

Abstract Changes in population size is a useful quantity for understanding the evolutionary history of a species. Genetic variation within a species can be summarized by the site frequency spectrum (SFS). For a sample of size n, the SFS is a vector of length n − 1 where entry i is the number of sites where the mutant base appears i times and the ancestral base appears n − i times. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from an observed SFS. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the changes in population size from an observed SFS. Our solution is based on a cubic spline for the population size. The cubic spline is determined by minimizing the weighted average of two terms, namely (i) the goodness of fit to the observed SFS, and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied on unfolded and folded SFS from 26 different human populations from the 1000 Genomes Project.


2019 ◽  
Author(s):  
Ethan Linck ◽  
C.J. Battey

AbstractCommon models of speciation with gene flow consider constant migration or admixture on secondary contact, but earth’s recent climatic history suggests many populations have experienced cycles of isolation and contact over the last million years. How does this process impact the rate of speciation, and how much can we learn about its dynamics by analyzing the genomes of modern populations? Here we develop a simple model of speciation through Bateson-Dobzhansky-Muller incompatibilities in the face of periodic gene flow and validate our model with forward time simulations. We then use empirical atmospheric CO2 concentration data from the Vostok Ice Cores to simulate cycles of isolation and secondary contact in a tropical montane landscape, and ask whether they can be distinguished from a standard isolation-with-migration model by summary statistics or joint site frequency spectrum-based demographic inference. We find speciation occurs much faster under periodic than constant gene flow with equivalent effective migration rates (Nm). These processes can be distinguished through combinations of summary statistics or demographic inference from the site frequency spectrum, but parameter estimates appear to have little resolution beyond the most recent cycle of isolation and migration. Our results suggest speciation with periodic gene flow is a common force in generating species diversity through Pleistocene climate cycles, and highlight the limits of current inference techniques for demographic models mimicking the complexity of earth’s recent climatic history.


Genetics ◽  
1997 ◽  
Vol 147 (4) ◽  
pp. 1977-1982
Author(s):  
Stephen T Sherry ◽  
Henry C Harpending ◽  
Mark A Batzer ◽  
Mark Stoneking

Abstract There are estimated to be ~1000 members of the Ya5 Alu subfamily of retroposons in humans. This Subfamily has a distribution restricted to humans, with a few copies in gorillas and chimpanzees. Fifty-seven Ya5 elements were previously cloned from a HeLaderived randomly sheared total genomic library, sequenced, and screened for polymorphism in a panel of 120 unrelated humans. Forty-four of the 57 cloned Alu repeats were monomorphic in the sample and 13 Alu repeats were dimorphic for insertion presence/absence. The observed distribution of sample frequencies of the 13 dimorphic elements is consistent with the theoretical expectation for elements ascertained in a single diploid cell line. Coalescence theory is used to compute expected total pedigree branch lengths for monomorphic and dimorphic elements, leading to an estimate of human effective population size of ~18,000 during the last one to two million years.


2001 ◽  
Vol 77 (2) ◽  
pp. 153-166 ◽  
Author(s):  
BRIAN CHARLESWORTH

Formulae for the effective population sizes of autosomal, X-linked, Y-linked and maternally transmitted loci in age-structured populations are developed. The approximations used here predict both asymptotic rates of increase in probabilities of identity, and equilibrium levels of neutral nucleotide site diversity under the infinite-sites model. The applications of the results to the interpretation of data on DNA sequence variation in Drosophila, plant, and human populations are discussed. It is concluded that sex differences in demographic parameters such as adult mortality rates generally have small effects on the relative effective population sizes of loci with different modes of inheritance, whereas differences between the sexes in variance in reproductive success can have major effects, either increasing or reducing the effective population size for X-linked loci relative to autosomal or Y-linked loci. These effects need to be accounted for when trying to understand data on patterns of sequence variation for genes with different transmission modes.


2020 ◽  
Author(s):  
Patrick Sin-Chan ◽  
Nehal Gosalia ◽  
Chuan Gao ◽  
Cristopher V. Van Hout ◽  
Bin Ye ◽  
...  

SUMMARYAging is characterized by degeneration in cellular and organismal functions leading to increased disease susceptibility and death. Although our understanding of aging biology in model systems has increased dramatically, large-scale sequencing studies to understand human aging are now just beginning. We applied exome sequencing and association analyses (ExWAS) to identify age-related variants on 58,470 participants of the DiscovEHR cohort. Linear Mixed Model regression analyses of age at last encounter revealed variants in genes known to be linked with clonal hematopoiesis of indeterminate potential, which are associated with myelodysplastic syndromes, as top signals in our analysis, suggestive of age-related somatic mutation accumulation in hematopoietic cells despite patients lacking clinical diagnoses. In addition to APOE, we identified rare DISP2 rs183775254 (p = 7.40×10−10) and ZYG11A rs74227999 (p = 2.50×10−08) variants that were negatively associated with age in either both sexes combined and females, respectively, which were replicated with directional consistency in two independent cohorts. Epigenetic mapping showed these variants are located within cell-type-specific enhancers, suggestive of important transcriptional regulatory functions. To discover variants associated with extreme age, we performed exome-sequencing on persons of Ashkenazi Jewish descent ascertained for extensive lifespans. Case-Control analyses in 525 Ashkenazi Jews cases (Males ≥ 92 years, Females ≥ 95years) were compared to 482 controls. Our results showed variants in APOE (rs429358, rs6857), and TMTC2 (rs7976168) passed Bonferroni-adjusted p-value, as well as several nominally-associated population-specific variants. Collectively, our Age-ExWAS, the largest performed to date, confirmed and identified previously unreported candidate variants associated with human age.


2018 ◽  
Author(s):  
Christelle Fraïsse ◽  
Camille Roux ◽  
Pierre-Alexandre Gagnaire ◽  
Jonathan Romiguier ◽  
Nicolas Faivre ◽  
...  

AbstractGenome-scale diversity data are increasingly available in a variety of biological systems, and can be used to reconstruct the past evolutionary history of species divergence. However, extracting the full demographic information from these data is not trivial, and requires inferential methods that account for the diversity of coalescent histories throughout the genome. Here, we evaluate the potential and limitations of one such approach. We reexamine a well-known system of mussel sister species, using the joint site frequency spectrum (jSFS) of synonymous mutations computed either from exome capture or RNA-seq, in an Approximate Bayesian Computation (ABC) framework. We first assess the best sampling strategy (number of: individuals, loci, and bins in the jSFS), and show that model selection is robust to variation in the number of individuals and loci. In contrast, different binning choices when summarizing the joint site frequency spectrum, strongly affect the results: including classes of low and high frequency shared polymorphisms can more effectively reveal recent migration events. We then take advantage of the flexibility of ABC to compare more realistic models of speciation, including variation in migration rates through time (i.e. periodic connectivity) and across genes (i.e. genome-wide heterogeneity in migration rates). We show that these models were consistently selected as the most probable, suggesting that mussels have experienced a complex history of gene flow during divergence and that the species boundary is semi-permeable. Our work provides a comprehensive evaluation of ABC demographic inference in mussels based on the coding site frequency spectrum, and supplies guidelines for employing different sequencing techniques and sampling strategies. We emphasize, perhaps surprisingly, that inferences are less limited by the volume of data, than by the way in which they are analyzed.


Sign in / Sign up

Export Citation Format

Share Document