Insights from a general, full‐likelihood Bayesian approach to inferring shared evolutionary events from genomic data: Inferring shared demographic events is challenging

Evolution ◽  
2020 ◽  
Vol 74 (10) ◽  
pp. 2184-2206 ◽  
Author(s):  
Jamie R. Oaks ◽  
Nadia L'Bahy ◽  
Kerry A. Cobb


2021 ◽  
Author(s):  
Ziheng Yang ◽  
Thomas Flouris

The multispecies coalescent with introgression (MSci) model accommodates both the coalescent process and cross-species introgression/hybridization events, two major processes that create genealogical fluctuations across the genome and gene-tree-species-tree discordance. Full likelihood implementations of the MSci model take such fluctuations as a major source of information about the history of species divergence and gene flow, and provide a powerful tool for estimating the direction, timing and strength of cross-species introgression using multilocus sequence data. However, introgression models, in particular those that accommodate bidirectional introgression (BDI), are known to cause unidentifiability issues of the label-switching type, whereby different models or parameters make the same predictions about the genomic data and thus cannot be distinguished by the data. Nevertheless, there has been no systematic study of unidentifiability when full likelihood methods are applied. Here we characterize the unidentifiability of arbitrary BDI models and derive simple rules for its identification. In general, an MSci model with k BDI events has 2^k unidentifiable towers in the posterior, with each BDI event between sister species creating within-model unidentifiability and each BDI event between non-sister species creating cross-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo (MCMC) samples to remove label switching and implement them in the BPP program. We analyze genomic sequence data from Heliconius butterflies as well as synthetic data to illustrate the utility of the BDI models and the new algorithms.
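The 2^k count stated above can be illustrated with a minimal sketch. It assumes that each BDI event independently admits two equivalent labelings, called "original" (0) and "mirrored" (1) here. These labels and the function name `enumerate_towers` are hypothetical illustrations and are not part of BPP or of the authors' algorithms.

```python
from itertools import product


def enumerate_towers(k):
    """List every combination of labelings for k BDI events.

    Assumption (not from BPP): each event is either in its
    'original' labeling (0) or its 'mirrored' labeling (1),
    and the choices are independent across events.
    """
    return list(product((0, 1), repeat=k))


# With k = 3 BDI events there are 2**3 = 8 combinations.
towers = enumerate_towers(3)
print(len(towers))
for tower in towers:
    print(tower)
```

Each tuple stands for one of the equivalent towers in the posterior. Removing label switching would amount to mapping every MCMC sample to a single chosen tuple, which this sketch does not attempt.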


2018 ◽  
Author(s):  
Jamie R. Oaks

A challenge to understanding biological diversification is accounting for community-scale processes that cause multiple, co-distributed lineages to co-speciate. Such processes predict non-independent, temporally clustered divergences across taxa. Approximate-likelihood Bayesian computation (ABC) approaches to inferring such patterns from comparative genetic data are very sensitive to prior assumptions and often biased toward estimating shared divergences. We introduce a full-likelihood Bayesian approach, ecoevolity, which takes full advantage of information in genomic data. By analytically integrating over gene trees, we are able to directly calculate the likelihood of the population history from genomic data, and efficiently sample the model-averaged posterior via Markov chain Monte Carlo algorithms. Using simulations, we find that the new method is much more accurate and precise at estimating the number and timing of divergence events across pairs of populations than existing approximate-likelihood methods. Our full Bayesian approach also requires several orders of magnitude less computational time than existing ABC approaches. We find that despite assuming unlinked characters (e.g., unlinked single-nucleotide polymorphisms), the new method performs better if this assumption is violated in order to retain the constant characters of whole linked loci. In fact, retaining constant characters allows the new method to robustly estimate the correct number of divergence events with high posterior probability in the face of character-acquisition biases, which commonly plague loci assembled from reduced-representation genomic libraries. We apply our method to genomic data from four pairs of insular populations of Gekko lizards from the Philippines that are not expected to have co-diverged. Despite all four pairs diverging very recently, our method strongly supports that they diverged independently, and these results are robust to very disparate prior assumptions.
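As a rough illustration of how a posterior probability for the number of divergence events might be summarized from MCMC output, the sketch below tabulates how often each number of distinct divergence times appears across samples. It assumes that pairs sharing an event carry exactly the same time value within a sample. The sample values are made up, and `posterior_num_events` is a hypothetical helper, not part of ecoevolity.

```python
from collections import Counter


def posterior_num_events(samples):
    """Estimate P(number of divergence events) from MCMC samples.

    Each sample is a tuple of divergence times, one per population
    pair. Assumption: pairs that share an event have identical
    times, so the number of events is the number of distinct values.
    """
    counts = Counter(len(set(times)) for times in samples)
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


# Made-up toy samples for four population pairs (not real results).
toy_samples = [
    (0.010, 0.020, 0.030, 0.040),  # four independent divergences
    (0.010, 0.020, 0.030, 0.040),
    (0.010, 0.010, 0.030, 0.040),  # first two pairs share an event
    (0.012, 0.021, 0.029, 0.041),
]
print(posterior_num_events(toy_samples))
```

In this toy input, three of the four samples have four distinct times, so most of the estimated probability falls on four independent divergence events.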


2019 ◽  
Author(s):  
Jamie R. Oaks ◽  
Nadia L’Bahy ◽  
Kerry A. Cobb

Factors that influence the distribution, abundance, and diversification of species can simultaneously affect multiple evolutionary lineages within or across communities. These include changes to the environment or inter-specific ecological interactions that cause ranges of multiple species to contract, expand, or fragment. Such processes predict temporally clustered evolutionary events across species, such as synchronous population divergences and/or changes in population size. A number of methods have been developed to infer shared divergences or shared changes in population size, but not both, and methods for the latter have been limited to approximate approaches. We introduce a full-likelihood Bayesian method that uses genomic data to estimate temporal clustering of an arbitrary mix of population divergences and population-size changes across taxa. Using simulated data, we find that estimating the timing and sharing of demographic changes tends to be inaccurate and sensitive to prior assumptions, which is in contrast to accurate, precise, and robust estimates of shared divergence times. We also show that previous estimates of co-expansion among five Alaskan populations of threespine sticklebacks (Gasterosteus aculeatus) were likely driven by prior assumptions and by ignoring invariant characters. We conclude by discussing potential avenues to improve the estimation of synchronous demographic changes across populations.


2020 ◽  
Author(s):  
Laetitia Zmuda ◽  
Charlotte Baey ◽  
Paolo Mairano ◽  
Anahita Basirat

It is well known that individuals can identify novel words in a stream of an artificial language using statistical dependencies. While the underlying computations are thought to be similar from one stream to another (e.g. transitional probabilities between syllables), performance is not. According to the “linguistic entrenchment” hypothesis, this is because individuals have prior knowledge of co-occurrences of elements in speech, and this knowledge intervenes during verbal statistical learning. Previous studies focused on task performance. The goal of the current study is to examine the extent to which prior knowledge impacts metacognition (i.e. the ability to evaluate one’s own cognitive processes). Participants were exposed to two different artificial languages. Using a fully Bayesian approach, we estimated an unbiased measure of metacognitive efficiency and compared the two languages in terms of task performance and metacognition. While task performance was higher in one of the languages, metacognitive efficiency was similar in both languages. In addition, a model assuming no correlation between the two languages accounted for our results better than a model in which correlations were introduced. We discuss the implications of our findings regarding the computations which underlie the interaction between input and prior knowledge during verbal statistical learning.

