Inferring the Demographic History of Inbred Species from Genome-Wide SNP Frequency Data

Paul D Blischak; Michael S Barker; Ryan N Gutenkunst

doi:10.1093/molbev/msaa042

Molecular Biology and Evolution ◽

10.1093/molbev/msaa042 ◽

2020 ◽

Vol 37 (7) ◽

pp. 2124-2136

Author(s):

Paul D Blischak ◽

Michael S Barker ◽

Ryan N Gutenkunst

Keyword(s):

Puma Concolor ◽

Demographic History ◽

Random Mating ◽

Simulated Data ◽

Conservation Agriculture ◽

Model Fit ◽

Parameter Estimates ◽

History Of ◽

Demographic Inference ◽

Inbred Populations

Abstract Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model’s ability to coestimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately coestimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.
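The convolution step described in the abstract can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation. It assumes diploid individuals and writes the genotype probabilities in the classical inbreeding form (heterozygosity reduced by a factor 1 - F), which is what a beta-binomial over two allele copies gives under the assumed parameterization alpha = p(1 - F)/F, beta = (1 - p)(1 - F)/F. The function names `genotype_probs`, `convolve`, and `allele_count_dist` are hypothetical.

```python
def genotype_probs(p, F):
    """P(0, 1, 2 derived-allele copies) for one diploid individual.

    Assumption: classical inbreeding genotype frequencies, with p the
    population frequency of the derived allele and F the inbreeding coefficient.
    """
    q = 1.0 - p
    return [q * q + F * p * q, 2.0 * p * q * (1.0 - F), p * p + F * p * q]


def convolve(a, b):
    """Discrete convolution of two probability vectors."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def allele_count_dist(p, F, n_individuals):
    """Distribution of the derived-allele count among 2 * n_individuals
    chromosomes, built by convolving the per-individual genotype probabilities."""
    dist = [1.0]
    g = genotype_probs(p, F)
    for _ in range(n_individuals):
        dist = convolve(dist, g)
    return dist
```

With F = 0 the result reduces to a binomial over 2n chromosomes, and with F = 1 every individual is homozygous, so only even allele counts have nonzero probability. An expected SFS would additionally require integrating these sampling probabilities over the population distribution of p, which this sketch leaves out.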

Inferring the Demographic History of Inbred Species From Genome-Wide SNP Frequency Data

10.1101/2019.12.20.881474 ◽

2019 ◽

Author(s):

Paul D. Blischak ◽

Michael S. Barker ◽

Ryan N. Gutenkunst

Keyword(s):

Puma Concolor ◽

Demographic History ◽

Random Mating ◽

Simulated Data ◽

Conservation Agriculture ◽

Model Fit ◽

Parameter Estimates ◽

History Of ◽

Demographic Inference ◽

Inbred Populations

Abstract Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model’s ability to co-estimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately co-estimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.

Inferring the Joint Demographic History of Multiple Populations: Beyond the Diffusion Approximation

10.1101/103275 ◽

2017 ◽

Cited By ~ 2

Author(s):

Julien Jouganous ◽

Will Long ◽

Simon Gravel

Keyword(s):

Diffusion Approximation ◽

Sequence Data ◽

Demographic History ◽

Allele Frequencies ◽

Human Sequence ◽

Medical Study ◽

Joint Frequency ◽

Classical Models ◽

History Of ◽

Demographic Inference

Abstract Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the transferability of demographic histories across populations.

Blockwise Site Frequency Spectra for Inferring Complex Population Histories and Recombination

10.1101/077958 ◽

2016 ◽

Cited By ~ 3

Author(s):

Champak R. Beeravolu ◽

Michael J. Hickerson ◽

Laurent A.F. Frantz ◽

Konrad Lohse

Keyword(s):

Demographic History ◽

Composite Likelihood ◽

Model Organisms ◽

Secondary Contact ◽

Parameter Estimates ◽

Genome Sequences ◽

Frequency Spectra ◽

Genome Wide ◽

Complex Population ◽

History Of

Abstract We introduce ABLE (Approximate Blockwise Likelihood Estimation), a novel composite likelihood framework based on a recently introduced summary of sequence variation: the blockwise site frequency spectrum (bSFS). This simulation-based framework uses the frequencies of bSFS configurations to jointly model demographic history and recombination and is explicitly designed to make inference using multiple whole genomes or genome-wide multi-locus data (e.g. RADSeq), catering to the needs of researchers studying model or non-model organisms respectively. The flexible nature of our method further allows for arbitrarily complex population histories using unphased and unpolarized whole genome sequences. In silico experiments demonstrate accurate parameter estimates across a range of divergence models with increasing complexity, and as a proof of principle, we infer the demographic history of the two species of orangutan from multiple genome sequences (over 160 Mbp in length) from each species. Our results indicate that the two orangutan species split approximately 650-950 thousand years ago but experienced a pulse of secondary contact much more recently, most likely during a period of low sea level in South East Asia (∼300,000 years ago). Unlike previous analyses, we can reject a history of continuous gene flow and co-estimate genome-wide recombination. ABLE is available for download at https://github.com/champost/ABLE.
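To make the idea of "frequencies of bSFS configurations" concrete, here is a minimal sketch of how such a summary could be tabulated. It is not ABLE's code or input format. It assumes each block is given as a list of minor-allele counts, one per variable site (a folded spectrum, as suits unpolarized data), and that `n_samples` is the number of sampled haploid sequences. The names `block_config` and `bsfs` are hypothetical.

```python
from collections import Counter


def block_config(minor_counts, n_samples):
    """Folded SFS of one block as a tuple: entry k-1 is the number of
    variable sites in the block whose minor-allele count is k."""
    max_class = n_samples // 2
    config = [0] * max_class
    for k in minor_counts:
        if not 1 <= k <= max_class:
            raise ValueError(f"minor-allele count {k} outside 1..{max_class}")
        config[k - 1] += 1
    return tuple(config)


def bsfs(blocks, n_samples):
    """Relative frequency of each blockwise SFS configuration across blocks."""
    counts = Counter(block_config(b, n_samples) for b in blocks)
    total = sum(counts.values())
    return {cfg: c / total for cfg, c in counts.items()}
```

For example, with four sampled sequences, a block containing two singleton sites and one doubleton site maps to the configuration (2, 1), and an invariant block maps to (0, 0). A composite likelihood would then compare these observed configuration frequencies with frequencies simulated under a candidate model.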

Inferring number of populations and changes in connectivity under the n-island model

Heredity ◽

10.1038/s41437-021-00426-9 ◽

2021 ◽

Author(s):

Armando Arredondo ◽

Beatriz Mourato ◽

Khoa Nguyen ◽

Simon Boitard ◽

Willy Rodríguez ◽

...

Keyword(s):

Population Structure ◽

Gene Flow ◽

Population Size ◽

Demographic History ◽

Simulated Data ◽

Fixed Number ◽

Ancestral Population ◽

Demographic Model ◽

History Of ◽

Time Periods

Abstract Inferring the demographic history of species is one of the greatest challenges in population genetics. This history is often represented as a history of size changes, ignoring population structure. Alternatively, when structure is assumed, it is defined a priori as a population tree and not inferred. Here we propose a framework based on the IICR (Inverse Instantaneous Coalescence Rate). The IICR can be estimated for a single diploid individual using the PSMC method of Li and Durbin (2011). For an isolated panmictic population, the IICR matches the population size history, and this is how the PSMC outputs are generally interpreted. However, it is increasingly acknowledged that the IICR is a function of the demographic model and sampling scheme with limited connection to population size changes. Our method fits observed IICR curves of diploid individuals with IICR curves obtained under piecewise stationary symmetrical island models. In our models we assume a fixed number of time periods during which gene flow is constant, but gene flow is allowed to change between time periods. We infer the number of islands, their sizes, the periods at which connectivity changes and the corresponding rates of connectivity. Validation with simulated data showed that the method can accurately recover most of the scenario parameters. Our application to a set of five human PSMCs yielded demographic histories that are in agreement with previous studies using similar methods and with recent research suggesting ancient human structure. They are in contrast with the view of human evolution consisting of one ancestral population branching into three large continental and panmictic populations with varying degrees of connectivity and no population structure within each continent.

A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots

10.1101/066431 ◽

2016 ◽

Cited By ~ 26

Author(s):

Daniel J Lawson ◽

Lucy van Dorp ◽

Daniel Falush

Keyword(s):

Goodness Of Fit ◽

Demographic History ◽

Clustering Algorithms ◽

Simulated Data ◽

Genetic Data ◽

Inference Model ◽

Robust Analysis ◽

Genetic History ◽

History Of ◽

Genetic Clustering

Abstract Genetic clustering algorithms, implemented in popular programs such as STRUCTURE and ADMIXTURE, have been used extensively in the characterisation of individuals and populations based on genetic data. A successful example is the reconstruction of the genetic history of African Americans who are a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups which do not have admixture in their recent history, where recent genetic drift is strong or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. We have implemented an approach (badMIXTURE, available at github.com/danjlawson/badMIXTURE) to assess the goodness of fit of the model using the ancestry “palettes” estimated by CHROMOPAINTER and apply it to both simulated data and real case studies. Combining these complementary analyses with additional methods that are designed to test specific hypotheses allows a richer and more robust analysis of recent demographic history based on genetic data.

Phylogeographic Inference Using Approximate Likelihoods

10.1101/025353 ◽

2015 ◽

Cited By ~ 3

Author(s):

Brian C O'Meara ◽

Nathan D Jackson ◽

Ariadna E Morales-Garcia ◽

Bryan C Carstens

Keyword(s):

Gene Flow ◽

Demographic History ◽

A Priori ◽

Evolutionary Process ◽

Population Divergence ◽

Parameter Estimates ◽

Single Model ◽

History Of ◽

Tree Methods

The demographic history of most species is complex, with multiple evolutionary processes combining to shape the observed patterns of genetic diversity. To infer this history, the discipline of phylogeography has (to date) used models that simplify the historical demography of the focal organism, for example by assuming or ignoring ongoing gene flow between populations or by requiring a priori specification of divergence history. Since no single model incorporates every possible evolutionary process, researchers rely on intuition to choose the models that they use to analyze their data. Here, we develop an approach to circumvent this reliance on intuition. PHRAPL allows users to calculate the probability of a large number of demographic histories given their data, enabling them to identify the optimal model and produce accurate parameter estimates for a given system. Using PHRAPL, we reanalyze data from 19 recent phylogeographic investigations. Results indicate that the optimal models for most datasets parameterize both gene flow and population divergence, and suggest that species tree methods (which do not consider gene flow) are overly simplistic for most phylogeographic systems. These results highlight the importance of phylogeographic model selection, and reinforce the role of phylogeography as a bridge between population genetics and phylogenetics.

New Interval-Specific Phylodynamic Models Improve Inference of the Geographic History of Disease Outbreaks

10.1101/2021.12.02.21267221 ◽

2021 ◽

Author(s):

Jiansi Gao ◽

Michael R. May ◽

Bruce Rannala ◽

Brian R. Moore

Keyword(s):

Temporal Dynamics ◽

Disease Outbreaks ◽

Simulated Data ◽

Study Data ◽

Parameter Estimates ◽

Intervention Measures ◽

History Of ◽

Specific Variation ◽

Key Aspects ◽

Dispersal Events

Phylodynamic methods reveal the spatial and temporal dynamics of viral geographic spread, and have featured prominently in studies of the COVID-19 pandemic. Virtually all previous studies are based on phylodynamic models that assume—despite direct and compelling evidence to the contrary—that rates of viral geographic dispersal are constant through time. Here, we: (1) extend phylodynamic models to allow both the average and relative rates of viral dispersal to vary independently between pre-specified time intervals; (2) implement methods to infer the number and timing of viral dispersal events between areas; and (3) develop statistics to assess the absolute fit of phylodynamic models to empirical datasets. We first validate our new methods using analyses of simulated data, and then apply them to a SARS-CoV-2 dataset from the early phase of the COVID-19 pandemic. We show that: (1) under simulation, failure to accommodate interval-specific variation in the study data will severely bias parameter estimates; (2) in practice, our interval-specific phylodynamic models can significantly improve the relative and absolute fit to empirical data; and (3) the increased realism of our interval-specific phylodynamic models provides qualitatively different inferences regarding key aspects of the COVID-19 pandemic—revealing significant temporal variation in global viral dispersal rates, viral dispersal routes, and number of viral dispersal events between areas—and alters interpretations regarding the efficacy of intervention measures to mitigate the pandemic.

Inferring number of populations and changes in connectivity under the n-island model

10.1101/2020.09.03.282251 ◽

2020 ◽

Author(s):

Armando Arredondo ◽

Beatriz Mourato ◽

Khoa Nguyen ◽

Simon Boitard ◽

Willy Rodríguez ◽

...

Keyword(s):

Population Structure ◽

Demographic History ◽

A Priori ◽

Simulated Data ◽

Fixed Number ◽

Ancestral Population ◽

Demographic Model ◽

Automated Method ◽

History Of ◽

Diploid Individual

Abstract Inferring the demographic history of species is one of the greatest challenges in population genetics. This history is often represented as a history of size changes, thus ignoring population structure. Alternatively, structure is defined a priori as a population tree and not inferred. Here we propose a framework based on the IICR (Inverse Instantaneous Coalescence Rate), which can be estimated using the PSMC method of Li and Durbin (2011) for a single diploid individual. For an isolated population, the IICR matches the population size history, which is how the PSMC outputs are generally interpreted. However, it is increasingly acknowledged that the IICR is a function of the demographic model and sampling scheme. Our automated method fits observed IICR curves of diploid individuals with IICR curves obtained under piecewise-stationary symmetrical island models, in which we assume a fixed number of time periods during which gene flow is constant. We infer the number of islands, their sizes, the periods at which connectivity changes and the corresponding rates of connectivity. Validation with simulated data showed that the method can accurately recover most of the scenario parameters. Our application to a set of five human PSMCs yielded demographic histories that are in agreement with previous studies using similar methods and with recent research suggesting ancient human structure. They are in contrast with the widely accepted view of human evolution consisting of one ancestral population branching into three large continental and panmictic populations with varying degrees of connectivity and no population structure within each continent.

The demographic history of African Drosophila melanogaster

10.1101/340406 ◽

2018 ◽

Cited By ~ 1

Author(s):

Adamandia Kapopoulou ◽

Susanne P. Pfeifer ◽

Jeffrey D. Jensen ◽

Stefan Laurent

Keyword(s):

Drosophila Melanogaster ◽

Demographic History ◽

Research Question ◽

Model Organism ◽

Quantitative Model ◽

Population History ◽

Parameter Estimates ◽

Demographic Model ◽

African Populations ◽

History Of

Abstract As Drosophila melanogaster is one of the most commonly utilized organisms in the study of local adaptation, an accurate characterization of its demographic history remains an important research question. This owes both to the inherent interest in characterizing the population history of this model organism, as well as to the well-established importance of an accurate null demographic model for increasing power and decreasing false positive rates in genomic scans for positive selection. While considerable attention has been afforded to this issue in non-African populations, less is known about the demographic history of African populations, including from the ancestral range of the species. While qualitative predictions and hypotheses have previously been forwarded, we here present a quantitative model fitting of the population history characterizing both the ancestral Zambian population range as well as the subsequently colonized west African populations, which themselves served as the source of multiple non-African colonization events. These parameter estimates thus represent an important null model for future investigations into African and non-African D. melanogaster populations alike.

Discovery of runs-of-homozygosity diplotype clusters and their associations with diseases in UK Biobank

10.1101/2020.10.26.20220004 ◽

2020 ◽

Author(s):

Ardalan Naseri ◽

Degui Zhi ◽

Shaojie Zhang

Keyword(s):

Demographic History ◽

Hfe Gene ◽

Uk Biobank ◽

Runs Of Homozygosity ◽

Genome Wide ◽

A Genome ◽

Large Populations ◽

History Of ◽

Outbred Populations ◽

Inbred Populations

Abstract Runs of homozygosity (ROH) segments, contiguous homozygous regions in a genome, were traditionally linked to families and inbred populations. However, a growing literature suggests that ROHs are ubiquitous in outbred populations. Still, most existing genetic studies of ROH in populations are limited to aggregated ROH content across the genome, which does not offer the resolution for mapping causal loci. This limitation is mainly due to a lack of methods for efficient identification of shared ROH diplotypes. Here, we present a new method, ROH-DICE, to find large ROH diplotype clusters, sufficiently long ROHs shared by a sufficient number of individuals, in large cohorts. ROH-DICE identified over 1 million ROH diplotypes that span over 100 SNPs and are shared by more than 100 UK Biobank participants. Moreover, we found significant associations of clustered ROH diplotypes across the genome with various self-reported diseases, with the strongest associations found between the extended HLA region and autoimmune disorders. We found an association between a diplotype covering the HFE gene and haemochromatosis, even though the well-known causal SNP was neither directly genotyped nor imputed. Using a genome-wide scan, we identified a putative association between carriers of an ROH diplotype in chromosome 4 and an increase in mortality among COVID-19 patients. In summary, our ROH-DICE method, by calling out large ROH diplotypes in a large outbred population, enables further population-genetic investigation of the demographic history of large populations. More importantly, our method enables a new genome-wide mapping approach for finding disease-causing loci with multi-marker recessive effects at population scale.
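The basic notion of an ROH, a contiguous homozygous stretch in one individual, can be illustrated with a simple scan. This sketch is not ROH-DICE, which targets ROH diplotypes shared across many individuals. It assumes genotypes coded as the number of alternate-allele copies (0, 1, or 2) at consecutive SNPs, treats any other value as a break, and uses a hypothetical function name `find_roh` with a threshold `min_len` counted in SNPs.

```python
def find_roh(genotypes, min_len):
    """Return (start, end) index pairs, end exclusive, for every maximal run
    of homozygous genotypes (0 or 2) spanning at least min_len SNPs."""
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i  # a new homozygous run begins here
        else:
            # a heterozygous or missing call ends the current run
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes)))
    return runs
```

Practical ROH callers typically also tolerate occasional heterozygous calls from genotyping error and measure length in physical or genetic distance. This sketch omits both for brevity.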
