A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots

2016 ◽  
Author(s):  
Daniel J Lawson ◽  
Lucy van Dorp ◽  
Daniel Falush

Abstract Genetic clustering algorithms, implemented in popular programs such as STRUCTURE and ADMIXTURE, have been used extensively in the characterisation of individuals and populations based on genetic data. A successful example is the reconstruction of the genetic history of African Americans who are a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups which do not have admixture in their recent history, where recent genetic drift is strong or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. We have implemented an approach (badMIXTURE, available at github.com/danjlawson/badMIXTURE) to assess the goodness of fit of the model using the ancestry “palettes” estimated by CHROMOPAINTER and apply it to both simulated data and real case studies. Combining these complementary analyses with additional methods that are designed to test specific hypotheses allows a richer and more robust analysis of recent demographic history based on genetic data.

2020 ◽  
Vol 37 (7) ◽  
pp. 2124-2136
Author(s):  
Paul D Blischak ◽  
Michael S Barker ◽  
Ryan N Gutenkunst

Abstract Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model’s ability to coestimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately coestimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.
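The genotype-and-convolution step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it conditions on a single fixed allele frequency `p` instead of integrating over a demographic model, and it assumes the beta-binomial is parameterized so that, for a diploid individual with inbreeding coefficient `F`, the genotype probabilities reduce to the standard inbreeding-adjusted Hardy-Weinberg proportions. The function names are hypothetical.

```python
from math import comb


def genotype_probs(p, F):
    """P(0, 1, or 2 derived alleles) for one diploid individual.

    Assumes the standard inbreeding-adjusted proportions, which are what a
    beta-binomial on two draws gives when its mean is p and its
    intra-individual correlation is F (an assumption of this sketch).
    """
    shared = F * p * (1.0 - p)
    return [(1.0 - p) ** 2 + shared,
            2.0 * p * (1.0 - p) * (1.0 - F),
            p ** 2 + shared]


def convolve(a, b):
    """Discrete convolution: distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sample_count_distribution(p, F, n_individuals):
    """Distribution of the derived-allele count in a sample of diploids.

    Built by convolving the per-individual genotype distribution
    n_individuals times; the result has 2 * n_individuals + 1 entries.
    """
    dist = [1.0]
    g = genotype_probs(p, F)
    for _ in range(n_individuals):
        dist = convolve(dist, g)
    return dist


# Sanity check: with F = 0 the result should match a binomial over 2n chromosomes.
dist = sample_count_distribution(p=0.3, F=0.0, n_individuals=3)
binom = [comb(6, k) * 0.3 ** k * 0.7 ** (6 - k) for k in range(7)]
print(max(abs(a - b) for a, b in zip(dist, binom)))
```

Raising `F` moves probability mass from odd counts to even counts, because heterozygotes become rarer. This is the kind of distortion that a random-mating model cannot reproduce.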


Heredity ◽  
2021 ◽  
Author(s):  
Armando Arredondo ◽  
Beatriz Mourato ◽  
Khoa Nguyen ◽  
Simon Boitard ◽  
Willy Rodríguez ◽  
...  

Abstract Inferring the demographic history of species is one of the greatest challenges in population genetics. This history is often represented as a history of size changes, ignoring population structure. Alternatively, when structure is assumed, it is defined a priori as a population tree and not inferred. Here we propose a framework based on the IICR (Inverse Instantaneous Coalescence Rate). The IICR can be estimated for a single diploid individual using the PSMC method of Li and Durbin (2011). For an isolated panmictic population, the IICR matches the population size history, and this is how the PSMC outputs are generally interpreted. However, it is increasingly acknowledged that the IICR is a function of the demographic model and sampling scheme with limited connection to population size changes. Our method fits observed IICR curves of diploid individuals with IICR curves obtained under piecewise stationary symmetrical island models. In our models we assume a fixed number of time periods during which gene flow is constant, but gene flow is allowed to change between time periods. We infer the number of islands, their sizes, the periods at which connectivity changes and the corresponding rates of connectivity. Validation with simulated data showed that the method can accurately recover most of the scenario parameters. Our application to a set of five human PSMCs yielded demographic histories that are in agreement with previous studies using similar methods and with recent research suggesting ancient human structure. They are in contrast with the view of human evolution consisting of one ancestral population branching into three large continental and panmictic populations with varying degrees of connectivity and no population structure within each continent.
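To make the IICR concrete, here is a minimal sketch for a single stationary period of a symmetric island model, with two lineages sampled from the same island. It is not the authors' fitting method. The parameterization is an assumption of this sketch: time is measured in coalescent units of one island, `M` is the scaled migration rate at which a lineage pair in the same island separates, and `M / (n_islands - 1)` is the rate at which separated lineages return to a common island. The function name is hypothetical. The IICR is computed as P(T2 > t) divided by the density of T2 at t, where T2 is the coalescence time of the pair.

```python
import numpy as np


def iicr_symmetric_island(t, n_islands, M):
    """IICR at times t for two lineages sampled in the same island.

    Transient states of the pair: 0 = same island, 1 = different islands.
    Coalescence happens at rate 1 from state 0 only.
    """
    # Generator restricted to the two transient states (assumed scaling).
    A = np.array([[-(1.0 + M), M],
                  [M / (n_islands - 1), -M / (n_islands - 1)]])
    eigvals, V = np.linalg.eig(A)
    Vinv = np.linalg.inv(V)

    times = np.atleast_1d(np.asarray(t, dtype=float))
    start = np.array([1.0, 0.0])  # both lineages start in the same island
    out = np.empty_like(times)
    for k, tk in enumerate(times):
        # State probabilities at time tk, given no coalescence yet.
        p = np.real(start @ (V @ np.diag(np.exp(eigvals * tk)) @ Vinv))
        survival = p.sum()   # P(T2 > tk)
        density = p[0]       # coalescence rate 1 times P(same island at tk)
        out[k] = survival / density
    return out


# Example: 10 islands with moderate gene flow.
print(iicr_symmetric_island(np.linspace(0.0, 10.0, 6), n_islands=10, M=1.0))
```

Under these assumptions the curve starts at 1, the size of the sampled island, and rises to a plateau further back in time, even though no island ever changes size. This illustrates why an IICR curve need not reflect population size changes.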


2019 ◽  
Author(s):  
Paul D. Blischak ◽  
Michael S. Barker ◽  
Ryan N. Gutenkunst

Abstract Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model’s ability to co-estimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately co-estimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.


2020 ◽  
Author(s):  
Armando Arredondo ◽  
Beatriz Mourato ◽  
Khoa Nguyen ◽  
Simon Boitard ◽  
Willy Rodríguez ◽  
...  

Abstract Inferring the demographic history of species is one of the greatest challenges in population genetics. This history is often represented as a history of size changes, thus ignoring population structure. Alternatively, structure is defined a priori as a population tree and not inferred. Here we propose a framework based on the IICR (Inverse Instantaneous Coalescence Rate), which can be estimated using the PSMC method of Li and Durbin (2011) for a single diploid individual. For an isolated population, the IICR matches the population size history, which is how the PSMC outputs are generally interpreted. However, it is increasingly acknowledged that the IICR is a function of the demographic model and sampling scheme. Our automated method fits observed IICR curves of diploid individuals with IICR curves obtained under piecewise-stationary symmetrical island models, in which we assume a fixed number of time periods during which gene flow is constant. We infer the number of islands, their sizes, the periods at which connectivity changes and the corresponding rates of connectivity. Validation with simulated data showed that the method can accurately recover most of the scenario parameters. Our application to a set of five human PSMCs yielded demographic histories that are in agreement with previous studies using similar methods and with recent research suggesting ancient human structure. They are in contrast with the widely accepted view of human evolution consisting of one ancestral population branching into three large continental and panmictic populations with varying degrees of connectivity and no population structure within each continent.


Genetics ◽  
2003 ◽  
Vol 164 (4) ◽  
pp. 1511-1518 ◽  
Author(s):  
Ning Yu ◽  
Michael I Jensen-Seaman ◽  
Leona Chemnick ◽  
Judith R Kidd ◽  
Amos S Deinard ◽  
...  

Abstract Comparison of the levels of nucleotide diversity in humans and apes may provide much insight into the mechanisms of maintenance of DNA polymorphism and the demographic history of these organisms. In the past, abundant mitochondrial DNA (mtDNA) polymorphism data indicated that nucleotide diversity (π) is more than threefold higher in chimpanzees than in humans. Furthermore, it has recently been claimed, on the basis of limited data, that this is also true for nuclear DNA. In this study we sequenced 50 noncoding, nonrepetitive DNA segments randomly chosen from the nuclear genome in 9 bonobos and 17 chimpanzees. Surprisingly, the π value for bonobos is only 0.078%, even somewhat lower than that (0.088%) for humans for the same 50 segments. The π values are 0.092, 0.130, and 0.082% for East, Central, and West African chimpanzees, respectively, and 0.132% for all chimpanzees. These values are similar to or at most only 1.5 times higher than that for humans. The much larger difference in mtDNA diversity than in nuclear DNA diversity between humans and chimpanzees is puzzling. We speculate that it is due mainly to a reduction in effective population size (Ne) in the human lineage after the human-chimpanzee divergence, because a reduction in Ne has a stronger effect on mtDNA diversity than on nuclear DNA diversity.


2021 ◽  
Vol 13 (3) ◽  
Author(s):  
J Roman Arguello ◽  
Stefan Laurent ◽  
Andrew G Clark

Author(s):  
Phillip A. Morin ◽  
Frederick I. Archer ◽  
Catherine D. Avila ◽  
Jennifer R. Balacco ◽  
Yury V. Bukhman ◽  
...  

2008 ◽  
Vol 4 (6) ◽  
pp. 752-754 ◽  
Author(s):  
Emma Svensson ◽  
Anders Götherström

Phylogeography has recently become more abundant in studies of demographic history of both wild and domestic species. A single nucleotide polymorphism (SNP) in the intron of the Y-chromosomal gene UTY19 displays a north–south gradient in modern cattle. Support for this geographical distribution of haplogroups has previously also been seen in ancient cattle from Germany. However, when analysing 38 historic remains of domestic bulls and three aurochs from northern Europe for this SNP we found no such association. Instead, we noted extensive amounts of temporal variation that can be attributed to transportation of cattle and late breed formation.

