On the importance of being structured: instantaneous coalescence rates and a re-evaluation of human evolution

Mapping Intimacies ◽

10.1101/031062 ◽

2015 ◽

Cited By ~ 2

Author(s):

Olivier Mazet ◽

Willy Rodríguez ◽

Simona Grusea ◽

Simon Boitard ◽

Lounès Chikhi

Keyword(s):

Population Structure ◽

Gene Flow ◽

Population Size ◽

Genomic Data ◽

Middle Pleistocene ◽

Demographic Model ◽

Size Change ◽

Population Genetic Inference ◽

Demographic Inference ◽

Inference Methods

Most species are structured and influenced by processes that either increased or reduced gene flow between populations. However, most population genetic inference methods ignore population structure and reconstruct a history characterized by population size changes under the assumption that species behave as panmictic units. This is potentially problematic since population structure can generate spurious signals of population size change. Moreover, when the model assumed for demographic inference is misspecified, genomic data will likely increase the precision of misleading if not meaningless parameters. In a context of model uncertainty (panmixia versus structure) genomic data may thus not necessarily lead to improved statistical inference. We consider two haploid genomes and develop a theory which explains why any demographic model (with or without population size changes) will necessarily be interpreted as a series of changes in population size by inference methods ignoring structure. We introduce a new parameter, the IICR (inverse instantaneous coalescence rate), and show that it is equivalent to a population size only in panmictic models, and mostly misleading for structured models. We argue that this general issue affects all population genetics methods ignoring population structure. We take the PSMC method as an example and show that it infers population size changes that never took place. We apply our approach to human genomic data and find a reduction in gene flow at the start of the Pleistocene, a major increase throughout the Middle-Pleistocene, and an abrupt disconnection preceding the emergence of modern humans.


The IICR and the non-stationary structured coalescent: demographic inference with arbitrary changes in population structure

10.1101/341750 ◽

2018 ◽

Author(s):

Willy Rodríguez ◽

Olivier Mazet ◽

Simona Grusea ◽

Simon Boitard ◽

Lounès Chikhi

Keyword(s):

Population Structure ◽

Population Size ◽

Environmental Changes ◽

Genomic Data ◽

Size Change ◽

Structured Model ◽

Coalescence Rate ◽

Wide Range ◽

Structured Coalescent ◽

Demographic Inference

Abstract: In recent years, a wide range of methods for reconstructing past population size changes from genome-wide data have been developed. At the same time, there has been an increasing recognition that population structure can generate genetic data similar to those produced under models of population size change. Recently, Mazet et al. (2016) showed that, for any model of population structure, it is always possible to find a panmictic model with a particular function of population size changes that has exactly the same distribution of T2 (the coalescence time for a sample of size two) as the structured model. They called this function the IICR (Inverse Instantaneous Coalescence Rate) and showed that it does not necessarily correspond to population size changes under non-panmictic models. Besides, most of the methods used to analyse data under models of population structure tend to arbitrarily fix that structure and to minimise or neglect population size changes. Here we extend the seminal work of Herbots (1994) on the structured coalescent and propose a new framework, the Non-Stationary Structured Coalescent (NSSC), that incorporates demographic events (changes in gene flow and/or deme sizes) into models of nearly any complexity. We show how to compute the IICR under a wide family of stationary and non-stationary models. As an example, we address the question of human and Neanderthal evolution and discuss how the NSSC framework allows genomic data to be interpreted under this new perspective.

Author summary: Genomic data are becoming available for a rapidly increasing number of species, and contain information about their recent evolutionary history. If we wish to understand how they expanded, contracted or admixed as a consequence of recent and ancient environmental changes, we need to develop general inferential methods. Currently, demographic inference is either done assuming that a species is a single panmictic population or using arbitrary structured models. We use the concept of IICR (Inverse of the Instantaneous Coalescence Rate) together with Markov chain theory to develop a general inferential framework which we call the Non-Stationary Structured Coalescent and apply it to explain human and Neanderthal genomic data in a single structured model.
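The relationship between the distribution of T2 and the IICR described in this abstract can be written out in a few lines. The notation below (the density f, the scaled size function λ, and time measured in units of a reference population size) is chosen here for illustration and is not quoted from the paper.

```latex
% Definition of the IICR from the distribution of T_2
% (f_{T_2} is the density of T_2; notation assumed for this sketch).
\[
  \mathrm{IICR}(t) \;=\; \frac{P(T_2 > t)}{f_{T_2}(t)} .
\]
% In a panmictic population whose size at scaled time t is \lambda(t)
% times the reference size, two lineages coalesce at rate 1/\lambda(t), so
\[
  P(T_2 > t) \;=\; \exp\!\left(-\int_0^t \frac{du}{\lambda(u)}\right),
  \qquad
  f_{T_2}(t) \;=\; \frac{1}{\lambda(t)}\, P(T_2 > t),
\]
% and the ratio recovers the size function exactly:
\[
  \mathrm{IICR}(t) \;=\; \lambda(t).
\]
```

Under a structured model the same ratio is still well defined, but nothing forces it to equal any real population size, which is the point the abstract makes.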


Inferring number of populations and changes in connectivity under the n-island model

Heredity ◽

10.1038/s41437-021-00426-9 ◽

2021 ◽

Author(s):

Armando Arredondo ◽

Beatriz Mourato ◽

Khoa Nguyen ◽

Simon Boitard ◽

Willy Rodríguez ◽

...

Keyword(s):

Population Structure ◽

Gene Flow ◽

Population Size ◽

Demographic History ◽

Simulated Data ◽

Fixed Number ◽

Ancestral Population ◽

Demographic Model ◽

History Of ◽

Time Periods

Abstract: Inferring the demographic history of species is one of the greatest challenges in population genetics. This history is often represented as a history of size changes, ignoring population structure. Alternatively, when structure is assumed, it is defined a priori as a population tree and not inferred. Here we propose a framework based on the IICR (Inverse Instantaneous Coalescence Rate). The IICR can be estimated for a single diploid individual using the PSMC method of Li and Durbin (2011). For an isolated panmictic population, the IICR matches the population size history, and this is how the PSMC outputs are generally interpreted. However, it is increasingly acknowledged that the IICR is a function of the demographic model and sampling scheme with limited connection to population size changes. Our method fits observed IICR curves of diploid individuals with IICR curves obtained under piecewise stationary symmetrical island models. In our models we assume a fixed number of time periods during which gene flow is constant, but gene flow is allowed to change between time periods. We infer the number of islands, their sizes, the periods at which connectivity changes and the corresponding rates of connectivity. Validation with simulated data showed that the method can accurately recover most of the scenario parameters. Our application to a set of five human PSMCs yielded demographic histories that are in agreement with previous studies using similar methods and with recent research suggesting ancient human structure. They are in contrast with the view of human evolution consisting of one ancestral population branching into three large continental and panmictic populations with varying degrees of connectivity and no population structure within each continent.
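As an illustration of what an IICR curve under a stationary symmetrical island model looks like, here is a minimal Python sketch. It is not the authors' implementation. It assumes two lineages sampled in the same deme, time measured in coalescent units of a single deme, and a scaled migration rate `M` such that each lineage leaves its deme at rate `M/2`; the function name `iicr_n_island` and both parameter names are hypothetical. The two lineages are either in the same deme (where they coalesce at rate 1) or in different demes, and the IICR is the probability of not having coalesced divided by the probability of being in the same deme.

```python
import math


def iicr_n_island(t, n, M):
    """IICR at scaled time t for two lineages sampled in the same deme
    of a symmetric n-island model (illustrative sketch, assumed scaling).

    Transient states: 0 = same deme, 1 = different demes.
    Rates: same -> coalesced 1, same -> different M, different -> same M/(n-1).
    """
    if n < 2 or M <= 0:
        raise ValueError("need n >= 2 islands and M > 0")
    a = M / (n - 1)                 # rate of returning to the same deme
    s = 1.0 + M + a                 # minus the trace of the 2x2 rate block
    root = math.sqrt(s * s - 4.0 * a)
    lam1 = (-s + root) / 2.0        # slow (less negative) eigenvalue
    lam2 = (-s - root) / 2.0        # fast (more negative) eigenvalue
    # Factor exp(lam1 * t) out of both probabilities so that the ratio
    # stays finite even for very large t.
    d = math.exp((lam2 - lam1) * t)
    a00 = -(1.0 + M)
    p_same = (a00 - lam2) - d * (a00 - lam1)   # proportional to P(same deme at t)
    p_diff = M * (1.0 - d)                     # proportional to P(different demes at t)
    # P(T2 > t) = p_same + p_diff, and the density of T2 is p_same * 1.
    return (p_same + p_diff) / p_same


if __name__ == "__main__":
    for t in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(t, iicr_n_island(t, n=10, M=1.0))
```

The curve starts at 1 (the size of the sampled deme) and rises to a plateau even though no deme ever changes size, which is the kind of signal a panmictic interpretation would read as a past population expansion followed by decline towards the present.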


A community-maintained standard library of population genetic models

10.1101/2019.12.20.885129 ◽

2019 ◽

Cited By ~ 7

Author(s):

Jeffrey R. Adrion ◽

Christopher B. Cole ◽

Noah Dukler ◽

Jared G. Galloway ◽

Ariella L. Gladstein ◽

...

Keyword(s):

Population Genetic ◽

Genomic Data ◽

Simulation Models ◽

Easy Access ◽

Major Barrier ◽

Simulation Engine ◽

Complex Models ◽

Demographic Inference ◽

Genetic Simulation ◽

Inference Methods

Abstract: The explosion in population genomic data demands ever more complex modes of analysis, and increasingly these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.


Evaluation of the Romosinuano cattle population structure in Mexico using pedigree analysis

Revista Colombiana de Ciencias Pecuarias ◽

10.17533/udea.rccp.v32n4a05 ◽

2020 ◽

Vol 33 (1) ◽

pp. 44-59

Author(s):

Rafael Núñez-Domínguez ◽

Ricardo E Martínez-Rocha ◽

Jorge A Hidalgo-Moreno ◽

Rodolfo Ramírez-Valverde ◽

José G García-Muñiz

Keyword(s):

Genetic Diversity ◽

Population Structure ◽

Gene Flow ◽

Population Size ◽

Effective Population Size ◽

Genetic Drift ◽

Pedigree Analysis ◽

Effective Population ◽

Genetic Management ◽

Generation Interval

Background: The Romosinuano cattle breed in Mexico has endured isolation, and it needs to be characterized in order to facilitate sustainable genetic management. Objective: To assess the evolution of the structure and genetic diversity of the Romosinuano breed in Mexico through pedigree analysis. Methods: Pedigree data were obtained from the Asociación Mexicana de Criadores de Ganado Romosinuano y Lechero Tropical (AMCROLET). The ENDOG program (version 4.8) was used to analyze two datasets, one that includes upgrading from F1 animals (UP) and the other with only straight-bred cattle (SP). For both datasets, three reference populations were defined: 1998-2003 (RP1), 2004-2009 (RP2), and 2010-2017 (RP3). The pedigree included 3,432 animals in UP and 1,518 in SP. Demographic parameters were: generation interval (GI), equivalent number of generations (EG), pedigree completeness index (PCI), and gene flow among herds. Genetic parameters were: inbreeding (F) and average relatedness (AR) coefficients, effective population size (Nec), effective number of founders and ancestors, and number of founder genome equivalents. Results: The GI varied from 6.10 to 6.54 yr for UP, and from 6.47 to 7.16 yr for SP. The EG of the UP and SP improved >63% from RP1 to RP3. The PCI increased over time, more so for the SP than for the UP. No nucleus or isolated herds were found. For RP3, F and AR reached 2.08 and 5.12% in the UP, and 2.55 and 5.94% in the SP. For RP3, Nec was 57 in the UP and 45 in the SP. Genetic diversity losses were attributed mainly (>66%) to genetic drift, except for RP3 in the SP (44%). Conclusions: A reduction of genetic diversity has been occurring since the Romosinuano breed association was established in Mexico, and this is mainly due to random loss of genes.
Keywords: effective population size; gene flow; genetic diversity; genetic drift; generation interval; inbreeding; pedigree; population structure; probability of gene origin; Romosinuano cattle.


Population structure, gene flow, and historical demography of a small coastal shark (Carcharhinus isodon) in US waters of the Western Atlantic Ocean

ICES Journal of Marine Science ◽

10.1093/icesjms/fsw098 ◽

2016 ◽

Vol 73 (9) ◽

pp. 2322-2332 ◽

Cited By ~ 12

Author(s):

David S. Portnoy ◽

Christopher M. Hollenbeck ◽

Dana M. Bethea ◽

Bryan S. Frazier ◽

Jim Gelsleichter ◽

...

Keyword(s):

Population Structure ◽

Gene Flow ◽

Significant Heterogeneity ◽

Demographic Model ◽

Western Atlantic ◽

Nursery Areas ◽

Seasonal Movement ◽

Southeastern US ◽

mtDNA Haplotypes ◽

Approximate Bayesian

Abstract: Patterns of population structure, genetic demographics, and gene flow in the small coastal shark Carcharhinus isodon (finetooth shark) sampled from two discrete nurseries along the southeastern US coast (Atlantic) and three nurseries in the northern Gulf of Mexico (Gulf), were assessed using 16 nuclear-encoded microsatellites and 1077 base pairs of the mitochondrial DNA (mtDNA) control region. Significant heterogeneity in microsatellite allele distributions was detected among all localities except between the two in the Atlantic. Significant heterogeneity in mtDNA haplotypes was not detected, a result likely due to extremely low mtDNA diversity. The genetic discontinuities combined with seasonal movement patterns, a patchy distribution of appropriate nursery habitat, the apparent absence of sex-biased gene flow, and the occurrence of mating in the vicinity of nursery areas, suggest that both male and female finetooth sharks display regional philopatry to discrete nursery areas. Global and local tests of neutrality, using mtDNA haplotypes, and demographic model testing, using Approximate Bayesian Computation of microsatellite alleles, supported a range-wide expansion of finetooth sharks into US waters occurring less than ∼9000 years ago. These findings add to the growing number of studies in a variety of coastally distributed marine fishes documenting significant barriers to gene flow around peninsular Florida and in the eastern Gulf. The findings also provide further evidence that the traditional model of behavioural ecology, based on large coastal sharks, may not be appropriate for understanding and conserving small coastal sharks.


A community-maintained standard library of population genetic models

eLife ◽

10.7554/elife.54967 ◽

2020 ◽

Vol 9 ◽

Cited By ~ 8

Author(s):

Jeffrey R Adrion ◽

Christopher B Cole ◽

Noah Dukler ◽

Jared G Galloway ◽

Ariella L Gladstein ◽

...

Keyword(s):

Population Genetic ◽

Genomic Data ◽

Simulation Models ◽

Easy Access ◽

Major Barrier ◽

Simulation Engine ◽

Complex Models ◽

Demographic Inference ◽

Genetic Simulation ◽

Inference Methods

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.


Phylogeography, Population Structure, and Species Delimitation in Rockhopper Penguins (Eudyptes chrysocome and Eudyptes moseleyi)

Journal of Heredity ◽

10.1093/jhered/esz051 ◽

2019 ◽

Author(s):

Herman L Mays ◽

David A Oehler ◽

Kyle W Morrison ◽

Ariadna E Morales ◽

Alyssa Lycans ◽

...

Keyword(s):

Population Structure ◽

Gene Flow ◽

Population Size ◽

Effective Population Size ◽

Species Delimitation ◽

Diversity Index ◽

Divergence Time ◽

Distinct Species ◽

Effective Population ◽

Heuristic Approaches

Abstract: Rockhopper penguins are delimited as 2 species, the northern rockhopper (Eudyptes moseleyi) and the southern rockhopper (Eudyptes chrysocome), with the latter comprising 2 subspecies, the western rockhopper (Eudyptes chrysocome chrysocome) and the eastern rockhopper (Eudyptes chrysocome filholi). We conducted a phylogeographic study using multilocus data from 114 individuals sampled across 12 colonies from the entire range of the northern/southern rockhopper complex to assess potential population structure, gene flow, and species limits. Bayesian and likelihood methods with nuclear and mitochondrial DNA, including model testing and heuristic approaches, support E. moseleyi and E. chrysocome as distinct species lineages with a divergence time of 0.97 Ma. However, these analyses also indicated the presence of gene flow between these species. Among southern rockhopper subspecies, we found evidence of significant gene flow, and heuristic approaches to species delimitation based on the genealogical diversity index failed to delimit them as species. The best-supported population models for the southern rockhoppers were those where E. c. chrysocome and E. c. filholi were combined into a single lineage or 2 lineages with bidirectional gene flow. Additionally, we found that E. c. filholi has the highest effective population size, while E. c. chrysocome showed an effective population size similar to that of the endangered E. moseleyi. We suggest that the current taxonomic definitions within rockhopper penguins be upheld and that E. chrysocome populations, all found south of the subtropical front, should be treated as a single taxon with distinct management units for E. c. chrysocome and E. c. filholi.


Population structure and gene flow reversals in Atlantic salmon (Salmo salar) over contemporary and long-term temporal scales: effects of population size and life history

Molecular Ecology ◽

10.1111/j.1365-294x.2007.03541.x ◽

2007 ◽

Vol 16 (21) ◽

pp. 4504-4522 ◽

Cited By ~ 83

Author(s):

Friso P. Palstra ◽

Michael F. O’Connell ◽

Daniel E. Ruzzante

Keyword(s):

Population Structure ◽

Life History ◽

Gene Flow ◽

Atlantic Salmon ◽

Population Size ◽

Salmo Salar ◽

Temporal Scales ◽

Atlantic Salmon Salmo Salar ◽

Flow Reversals


A spatial genomic approach identifies time lags and historic barriers to gene flow in a rapidly fragmenting Appalachian landscape

10.1101/777920 ◽

2019 ◽

Author(s):

Thomas A. Maigret ◽

John J. Cox ◽

David W. Weisrock

Keyword(s):

Population Structure ◽

Gene Flow ◽

Genomic Data ◽

Data Sets ◽

Time Lags ◽

Genetic Structuring ◽

Data Set ◽

Landscape Modification ◽

SNP Data ◽

Genetic Patterns

Abstract: The resolution offered by genomic data sets, coupled with recently developed spatially informed analyses, is allowing researchers to quantify population structure at increasingly fine temporal and spatial scales. However, uncertainties regarding data set size and quality thresholds and the time scale at which barriers to gene flow become detectable have limited both empirical research and conservation measures. Here, we used restriction site associated DNA sequencing to generate a large SNP data set for the copperhead snake (Agkistrodon contortrix) and address the population genomic impacts of recent and widespread landscape modification across an approximately 1000 km² region of eastern Kentucky. Nonspatial population-based assignment and clustering methods supported little to no population structure. However, using individual-based spatial autocorrelation approaches we found evidence for genetic structuring which closely follows the path of a historic highway which experienced high traffic volumes from ca. 1920 to 1970. We found no similar spatial genomic signatures associated with more recently constructed highways or surface mining activity, though a time lag effect may be responsible for the lack of any emergent spatial genetic patterns. Subsampling of our SNP data set suggested that similar results could be obtained with as few as 250 SNPs, and thresholds for missing data exhibited limited impacts on the spatial patterns we detected outside of very strict or permissive extremes. Our findings highlight the importance of temporal factors in landscape genetics approaches, and suggest the potential advantages of large genomic data sets and fine-scale, spatially-informed approaches for quantifying subtle genetic patterns in temporally complex landscapes.


Demographic inference using genetic data from a single individual: separating population size variation from population structure

10.1101/011866 ◽

2014 ◽

Cited By ~ 1

Author(s):

Olivier Mazet ◽

Willy Rodríguez ◽

Lounès Chikhi

Keyword(s):

Population Size ◽

Genomic Data ◽

Genetic Data ◽

Structured Populations ◽

Parameter Estimates ◽

Single Individual ◽

Model Choice ◽

Size Change ◽

History Of ◽

Population Size Change

The rapid development of sequencing technologies represents new opportunities for population genetics research. It is expected that genomic data will increase our ability to reconstruct the history of populations. While this increase in genetic information will likely help biologists and anthropologists to reconstruct the demographic history of populations, it also represents new challenges. Recent work has shown that structured populations generate signals of population size change. As a consequence it is often difficult to determine whether demographic events such as expansions or contractions (bottlenecks) inferred from genetic data are real or due to the fact that populations are structured in nature. Given that few inferential methods allow us to account for that structure, and that genomic data will necessarily increase the precision of parameter estimates, it is important to develop new approaches. In the present study we analyse two demographic models. The first is a model of instantaneous population size change whereas the second is the classical symmetric island model. We (i) re-derive the distribution of coalescence times under the two models for a sample of size two, (ii) use a maximum likelihood approach to estimate the parameters of these models, (iii) validate this estimation procedure under a wide array of parameter combinations, and (iv) implement and validate a model choice procedure by using a Kolmogorov-Smirnov test. Altogether we show that it is possible to estimate parameters under several models and perform efficient model choice using genetic data from a single diploid individual.
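The idea of comparing observed coalescence times against a candidate model with a Kolmogorov-Smirnov statistic can be sketched as follows. This is an illustrative toy and not the authors' procedure. It assumes a single instantaneous size change: time is in coalescent units of the present-day population, and at time `T` in the past the size becomes `alpha` times the present size. The names `cdf_size_change`, `sample_t2`, `ks_statistic`, `alpha` and `T` are all hypothetical.

```python
import math
import random


def cdf_size_change(t, alpha, T):
    """CDF of T2 when the population size is 1 for t < T and alpha for t >= T."""
    if t <= 0.0:
        return 0.0
    if t < T:
        return 1.0 - math.exp(-t)
    return 1.0 - math.exp(-T - (t - T) / alpha)


def sample_t2(rng, alpha, T):
    """Draw one T2 by inverting the cumulative coalescence hazard."""
    e = -math.log(1.0 - rng.random())   # Exp(1) draw of the cumulative hazard
    return e if e < T else T + alpha * (e - T)


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of samples and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Check the gap just after and just before the empirical CDF jumps at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    rng = random.Random(1)
    data = [sample_t2(rng, alpha=10.0, T=0.5) for _ in range(2000)]
    # Distance to the generating model versus a constant-size model (alpha = 1).
    print(ks_statistic(data, lambda t: cdf_size_change(t, 10.0, 0.5)))
    print(ks_statistic(data, lambda t: cdf_size_change(t, 1.0, 0.5)))
```

A small statistic means the candidate model's T2 distribution is close to the data, so in a model-choice setting the model with the smaller distance (after fitting its parameters) would be preferred. The same function could be pointed at the T2 distribution of a symmetric island model to compare the two explanations.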
