Bayesian phylodynamic inference with complex models

Mapping Intimacies

10.1101/268052

2018

Cited By ~ 2

Author(s): Erik M. Volz, Igor Siveroni

Keyword(s): Population Genetic, Influenza A, Genetic Model, Sequence Data, Genetic Data, Simultaneous Estimation, Model Parameters, Epidemiological Models, Structured Coalescent, Speed Accuracy

Population genetic modeling can enhance Bayesian phylogenetic inference by providing a realistic prior on the distribution of branch lengths and times of common ancestry. The parameters of a population genetic model may also have intrinsic importance, and simultaneous estimation of a phylogeny and model parameters has enabled phylodynamic inference of population growth rates, reproduction numbers, and effective population size through time. Phylodynamic inference based on pathogen genetic sequence data has emerged as a useful supplement to epidemic surveillance. However, the mechanistic models commonly fitted to non-genetic surveillance data are rarely fitted to pathogen genetic data, owing to a dearth of software tools and because the theory required for such inference has been developed only recently. We present a framework for coalescent-based phylogenetic and phylodynamic inference that enables highly flexible modeling of demographic and epidemiological processes. This approach builds upon previous structured coalescent approaches and includes enhancements for computational speed, accuracy, and stability. A flexible markup language is described for translating parametric demographic or epidemiological models into a structured coalescent model, enabling simultaneous estimation of demographic or epidemiological parameters and time-scaled phylogenies. We demonstrate the utility of these approaches by fitting compartmental epidemiological models to Ebola virus and Influenza A virus sequence data, showing how important features of these epidemics, such as the reproduction number and epidemic curves, can be gleaned from genetic data. These approaches are provided as an open-source package, PhyDyn, for the BEAST phylogenetics platform.
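
The abstract's first sentence says that a population genetic model supplies a prior on branch lengths and times of common ancestry. The simplest such model illustrates the idea. The following is a minimal sketch, not PhyDyn code. It assumes a constant-size Kingman coalescent with all tips sampled at the same time, so there is no population structure and no epidemiological dynamics. The function names are hypothetical. It computes the log density of a tree's inter-coalescent intervals and the closed-form maximum-likelihood effective population size, showing how a model parameter can be estimated from a genealogy.

```python
import math
from typing import Sequence


def kingman_log_density(intervals: Sequence[float], n_tips: int, ne: float) -> float:
    """Log density of inter-coalescent intervals under a constant-size Kingman
    coalescent with all tips sampled at the same time (assumption).

    intervals[i] is the waiting time while n_tips - i lineages remain, and the
    pairwise coalescence rate is assumed to be 1 / ne per unit time.
    """
    if n_tips < 2 or len(intervals) != n_tips - 1:
        raise ValueError("need n_tips >= 2 and exactly n_tips - 1 intervals")
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    log_p = 0.0
    for i, t in enumerate(intervals):
        k = n_tips - i                       # lineages present in this interval
        rate = k * (k - 1) / 2.0 / ne        # total coalescence rate among k lineages
        log_p += math.log(rate) - rate * t   # exponential waiting-time density
    return log_p


def mle_ne(intervals: Sequence[float], n_tips: int) -> float:
    """Closed-form maximum-likelihood ne: sum of C(k, 2) * t_k divided by (n_tips - 1)."""
    if n_tips < 2 or len(intervals) != n_tips - 1:
        raise ValueError("need n_tips >= 2 and exactly n_tips - 1 intervals")
    weighted = sum((n_tips - i) * (n_tips - i - 1) / 2.0 * t
                   for i, t in enumerate(intervals))
    return weighted / (n_tips - 1)
```

For example, `kingman_log_density([1.0], 2, 1.0)` returns -1.0. With two lineages and `ne = 1` the coalescence rate is 1, so the density of a waiting time of 1 is exp(-1). Structured coalescent models of the kind described in the abstract replace the single constant rate with rates that depend on time and on the deme of each lineage.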

The coalescent process in models with selection, recombination and geographic subdivision

Genetics Research

10.1017/s0016672300029074

1991

Vol 57 (1)

pp. 83-91

Cited By ~ 40

Author(s): Norman Kaplan, Richard R. Hudson, Masaru Iizuka

Keyword(s): Genetic Variation, Population Genetic, Genetic Model, Sequence Data, Balancing Selection, Similar Model, Proposed Model, Coalescent Approach, Neutral Mutations, Better Than

A population genetic model with a single locus at which balancing selection acts and many linked loci at which neutral mutations can occur is analysed using the coalescent approach. The model incorporates geographic subdivision with migration, as well as mutation, recombination, and genetic drift of neutral variation. It is found that geographic subdivision can affect genetic variation even with high rates of migration, provided that selection is strong enough to maintain different allele frequencies at the selected locus. Published sequence data from the alcohol dehydrogenase locus of Drosophila melanogaster are found to fit the proposed model slightly better than a similar model without subdivision.

A two-tiered model for simulating the ecological and evolutionary dynamics of rapidly evolving viruses, with an application to influenza

Journal of The Royal Society Interface

10.1098/rsif.2010.0007

2010

Vol 7 (50)

pp. 1257-1274

Cited By ~ 39

Author(s): Katia Koelle, Priya Khatri, Meredith Kamradt, Thomas B. Kepler

Keyword(s): Influenza A, Evolutionary Dynamics, De Novo, Sequence Data, Nucleotide Composition, Sequence Length, Influenza B, Model Parameters, De Novo Mutations, Estimation Of Model Parameters

Understanding the epidemiological and evolutionary dynamics of rapidly evolving pathogens is one of the most challenging problems facing disease ecologists today. To date, many mathematical and individual-based models have provided key insights into the factors that may regulate these dynamics. However, in many of these models, abstractions have been made to the simulated sequences that limit an effective interface with empirical data. This is especially the case for rapidly evolving viruses in which de novo mutations result in antigenically novel variants. With this focus, we present a simple two-tiered ‘phylodynamic’ model whose purpose is to simulate, along with case data, sequence data that will allow for a more quantitative interface with observed sequence data. The model differs from previous approaches in that it separates the simulation of the epidemiological dynamics (tier 1) from the molecular evolution of the virus's dominant antigenic protein (tier 2). This separation of phenotypic dynamics from genetic dynamics results in a modular model that is computationally simpler and allows sequences to be simulated with specifications such as sequence length, nucleotide composition and molecular constraints. To illustrate its use, we apply the model to influenza A (H3N2) dynamics in humans, influenza B dynamics in humans and influenza A (H3N8) dynamics in equine hosts. In all three of these illustrative examples, we show that the model can simulate sequences that are quantitatively similar in pattern to those empirically observed. Future work should focus on statistical estimation of model parameters for these examples as well as the possibility of applying this model, or variants thereof, to other host–virus systems.

Joint inference of migration and reassortment patterns for viruses with segmented genomes

10.1101/2021.05.15.442587

2021

Author(s): Ugnė Stolz, Nicola Felix Müller, Tanja Stadler, Timothy Glenn Vaughan

Keyword(s): Influenza A, Genetic Recombination, Sequence Data, Avian Host, Sequencing Data, Effective Population, New Model, Joint Inference, Structured Coalescent, And Migration

The structured coalescent allows migration patterns between viral sub-populations to be inferred from genetic sequence data. However, these analyses typically assume that no genetic recombination process has affected the sequence evolution of pathogens. For segmented viruses that can undergo reassortment, such as influenza, this assumption is violated. Reassortment reshuffles the segments of different parent lineages upon a coinfection event, which means that the shared history of viruses has to be represented by a network instead of a tree. Therefore, full-genome analyses of such viruses are complex or even impossible. This problem has been addressed for unstructured populations. However, it has not yet been possible to account for population structure, such as that induced by different host populations, while also accounting for reassortment. We address this by extending the structured coalescent to account for reassortment. We also present a framework for investigating possible ties between reassortment and migration (host jump) events. This method can accurately estimate sub-population-dependent effective population sizes, reassortment rates, and migration rates from simulated data. Additionally, we apply the new model to avian influenza A/H5N1 sequences sampled from two avian host types, Anseriformes and Galliformes. We contrast our results with a structured coalescent without reassortment inference, which assumes independently evolving segments. This comparison reveals that two choices change the estimates of effective population sizes, migration rates, and clock rates: accounting for segment reassortment, and using sequencing data from several viral segments for joint phylodynamic inference. This new model is implemented as the Structured Coalescent with Reassortment (SCoRe) package for BEAST 2.5 and is available at https://github.com/jugne/SCORE.

Phylodynamic inference for emerging viruses using segregating sites

10.1101/2021.07.07.451508

2021

Author(s): Yeongseon Park, Michael A. Martin, Katia Koelle

Keyword(s): Sequential Monte Carlo, Sequence Data, Epidemiological Surveillance, Viral Population, Reproductive Number, Model Parameters, Disease Dynamics, Epidemiological Models, Epidemiological Parameters, Segregating Sites

Epidemiological models are commonly fit to case data to estimate model parameters and to infer unobserved disease dynamics. More recently, epidemiological models have also been fit to viral sequence data using phylodynamic inference approaches that generally rely on the reconstruction of viral phylogenies. However, especially early on in an expanding viral population, phylogenetic uncertainty can be substantial, and methods that require integration over this uncertainty can be computationally intensive. Here, we present an alternative approach to phylodynamic inference that circumvents the need for phylogenetic tree reconstruction. Our "tree-free" approach instead quantifies the number of segregating sites observed in sets of sequences over time. It then uses this trajectory of segregating sites to infer epidemiological parameters within a Sequential Monte Carlo (SMC) framework. Using forward simulations, we first show that epidemiological parameters and processes leave characteristic signatures in segregating site trajectories, demonstrating that these trajectories have the potential to be used for phylodynamic inference. We then show using mock data that our proposed approach accurately recovers key epidemiological quantities such as the basic reproduction number and the timing of the index case. Finally, we apply our approach to SARS-CoV-2 sequence data from France, estimating a reproductive number of approximately 2.2 and an introduction time of mid-January 2020, consistent with estimates from epidemiological surveillance data. Our findings indicate that "tree-free" phylodynamic inference approaches that rely on simple population genetic summary statistics can play an important role in estimating epidemiological parameters and reconstructing infectious disease dynamics, especially early on in an epidemic.
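
The "tree-free" summary statistic is simple to compute. The sketch below counts segregating sites, meaning alignment columns that show more than one nucleotide, within each sampling-time window. This gives a trajectory of the kind the abstract describes. It is a hypothetical illustration, not the authors' code. The fixed-width windowing scheme, the function names, and the choice to ignore gaps and ambiguity codes are assumptions. The SMC inference step that consumes the trajectory is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

NUCLEOTIDES = frozenset("ACGT")


def segregating_sites(alignment: Sequence[str]) -> int:
    """Number of alignment columns showing more than one nucleotide.

    Gaps and ambiguity codes (anything other than A, C, G, T) are ignored.
    """
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for col in range(length):
        states = {seq[col].upper() for seq in alignment} & NUCLEOTIDES
        if len(states) > 1:
            count += 1
    return count


def segregating_site_trajectory(samples: Sequence[Tuple[float, str]],
                                window: float) -> Dict[int, int]:
    """Segregating sites per sampling-time window (hypothetical windowing).

    samples holds (sampling time, aligned sequence) pairs; window index w
    covers sampling times in [w * window, (w + 1) * window).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    groups: Dict[int, List[str]] = defaultdict(list)
    for time, seq in samples:
        groups[int(time // window)].append(seq)
    return {w: segregating_sites(seqs) for w, seqs in sorted(groups.items())}
```

With hypothetical sampling times in days and 7-day windows, `segregating_site_trajectory([(0.5, "ACGT"), (1.0, "ACGA"), (7.2, "ACGA"), (8.0, "TCGA"), (9.5, "TCGT")], 7.0)` returns `{0: 1, 1: 2}`.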

Inferring time-dependent migration and coalescence patterns from genetic sequence and predictor data in structured populations

Virus Evolution

10.1093/ve/vez030

2019

Vol 5 (2)

Cited By ~ 4

Author(s): Nicola F Müller, Gytis Dudas, Tanja Stadler

Keyword(s): Population Dynamics, Phylogenetic Trees, Sequence Data, Structured Populations, Model Parameters, Effective Population, Genetic Sequence, Migration Rates, Population Sizes, Structured Coalescent

Population dynamics can be inferred from genetic sequence data using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or, in structured populations, assume migration rates and effective population sizes to be constant through time. When rates are allowed to vary through time in structured populations, the number of parameters to infer increases rapidly, and the available data might not be sufficient to inform them. Additionally, it is often of greater interest to know what predicts these parameters than to know the parameters themselves. Here, we introduce a method to infer the predictors of time-varying migration rates and effective population sizes using a generalized linear model (GLM) approach under the marginal approximation of the structured coalescent. Using simulations, we show that our approach reliably infers the model parameters and their predictors from phylogenetic trees. Furthermore, when trees are simulated under the structured coalescent, our new approach outperforms the discrete trait GLM model. We then apply our framework to a previously described Ebola virus dataset, inferring the parameters and their predictors from genome sequences while accounting for phylogenetic uncertainty. We infer weekly cases to be the strongest predictor of effective population size and geographic distance to be the strongest predictor of migration. This approach is implemented as part of the BEAST2 package MASCOT. MASCOT allows us to jointly infer three things: population dynamics within structured populations (i.e. the parameters and predictors), the phylogenetic tree, and evolutionary parameters.
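
A GLM links each rate to its predictors through a log-linear model. The sketch below shows that general idea for one rate across a sequence of time intervals. The log of the rate is a weighted sum of predictor values, and inclusion indicators switch individual predictors on or off. The scaler-times-exponential form, the indicators, the function names, and the predictor values are all assumptions for illustration. MASCOT's exact parameterization may differ, for example in whether predictors are log-transformed or standardized.

```python
import math
from typing import List, Sequence


def glm_rate(predictors: Sequence[float], coefficients: Sequence[float],
             indicators: Sequence[bool], scaler: float = 1.0) -> float:
    """Log-linear GLM: rate = scaler * exp(sum of coefficient * predictor),
    summing only over predictors whose inclusion indicator is True."""
    if not (len(predictors) == len(coefficients) == len(indicators)):
        raise ValueError("predictors, coefficients and indicators must align")
    if scaler <= 0:
        raise ValueError("scaler must be positive")
    linear = sum(b * x for x, b, on in zip(predictors, coefficients, indicators) if on)
    return scaler * math.exp(linear)


def time_varying_rates(predictor_table: Sequence[Sequence[float]],
                       coefficients: Sequence[float],
                       indicators: Sequence[bool],
                       scaler: float = 1.0) -> List[float]:
    """One rate per time interval; predictor_table[t] holds interval t's predictors."""
    return [glm_rate(row, coefficients, indicators, scaler) for row in predictor_table]


# Hypothetical standardized predictors for three weekly intervals:
# [a time-varying predictor, a constant predictor].
weekly = [[-1.0, 0.5], [0.0, 0.5], [1.0, 0.5]]
rates = time_varying_rates(weekly, coefficients=[0.8, -1.2],
                           indicators=[True, True], scaler=0.1)
```

Because the coefficients are shared across intervals, a handful of GLM parameters determines a rate for every interval. This is how the approach keeps the parameter count from growing with the number of intervals.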

Legofit: estimating population history from genetic data

BMC Bioinformatics

10.1186/s12859-019-3154-1

2019

Vol 20 (1)

Cited By ~ 1

Author(s): Alan R. Rogers

Keyword(s): Model Selection, Statistical Methods, Sequence Data, Model Averaging, Genetic Data, Population History, Simultaneous Estimation, Estimation Of Parameters, History Of, Parameter Values

Background: Our current understanding of archaic admixture in humans relies on statistical methods with large biases, whose magnitudes depend on the sizes and separation times of ancestral populations. To avoid these biases, it is necessary to estimate these parameters simultaneously with those describing admixture. Genetic estimates of population histories also confront problems of statistical identifiability: different models or different combinations of parameter values may fit the data equally well. To deal with this problem, we need methods of model selection and model averaging, which are lacking from most existing software.

Results: The Legofit software package allows simultaneous estimation of parameters describing admixture and the sizes and separation times of ancestral populations. It includes facilities for data manipulation, estimation, analysis of residuals, model selection, and model averaging.

Conclusions: Legofit uses genetic data to study the history of a subdivided population. It is unaffected by recent history and can therefore focus on the deep history of population size, subdivision, and admixture. It outperforms several statistical methods that have been widely used to study population history and should be useful in any species for which DNA sequence data are available from several populations.

Faculty Opinions recommendation of Evolution of functionally conserved enhancers can be accelerated in large populations: a population-genetic model.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.1010968.175358

2002

Author(s): Patricia Simpson

Keyword(s): Population Genetic, Genetic Model, Population Genetic Model, Large Populations

Population genetic data and forensic parameters of the 27 Y-STR panel Yfiler® Plus in Russian population

International Journal of Legal Medicine

10.1007/s00414-021-02599-8

2021

Author(s): Andrei Semikhodskii, Yevgeniy Krassotkin, Tatiana Makarova, Vladislav Zavarin, Viktoria Ilina, ...

Keyword(s): Population Genetic, Genetic Data, Russian Population, Population Genetic Data, Forensic Parameters

Qualitative speed-accuracy tradeoff effects can be explained by a diffusion/fast-guess mixture model

Scientific Reports

10.1038/s41598-021-94451-7

2021

Vol 11 (1)

Author(s): Roger Ratcliff, Inhan Kang

Keyword(s): Mixture Model, Diffusion Model, Response Times, Model Fit, Orientation Discrimination, Model Parameters, Stimulus Contrast, Accuracy Stress, Speed Accuracy, Selective Influence

Rafiei and Rahnev (2021) presented an analysis of an experiment in which they manipulated speed-accuracy stress and stimulus contrast in an orientation discrimination task. They argued that the standard diffusion model could not account for the patterns of data their experiment produced. However, their experiment encouraged and produced fast guesses in the higher speed-stress conditions. These fast guesses are responses with chance accuracy and response times (RTs) of less than 300 ms. We developed a simple mixture model in which fast guesses were represented by a normal distribution with fixed mean and standard deviation, and other responses by the standard diffusion process. The model fit the whole pattern of accuracy and RTs as a function of speed/accuracy stress and stimulus contrast, including the sometimes bimodal shapes of RT distributions. In the model, speed-accuracy stress affected some model parameters while stimulus contrast affected a different one, showing selective influence. Rafiei and Rahnev's failure to fit the diffusion model was the result of driving subjects to fast guess in their experiment.
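
The mixture described above can be illustrated by simulation. In the sketch below, each trial is a fast guess with probability `p_guess`. A fast guess has chance accuracy and an RT drawn from a normal distribution. Every other trial is a two-boundary diffusion process simulated with small Euler steps. This is a generative illustration, not the authors' fitting code. The following are all assumptions: the parameter names and values, the Euler discretization, the noise scale of 0.1, truncating guess RTs at zero, and scoring the upper boundary as correct. Fitting the model to data, as the paper does, is not shown.

```python
import math
import random
from typing import List, Tuple

Trial = Tuple[bool, float]  # (correct, response time in seconds)


def diffusion_trial(rng: random.Random, drift: float, boundary: float,
                    start_frac: float, non_decision: float,
                    noise: float = 0.1, dt: float = 0.001,
                    max_time: float = 5.0) -> Trial:
    """Simulate one two-boundary Wiener diffusion trial with Euler steps.

    Evidence starts at start_frac * boundary; reaching the upper boundary
    counts as correct. Trials that hit max_time are scored as errors.
    """
    x = start_frac * boundary
    t = 0.0
    step_sd = noise * math.sqrt(dt)
    while 0.0 < x < boundary and t < max_time:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return x >= boundary, t + non_decision


def simulate_mixture(n_trials: int, p_guess: float, guess_mean: float,
                     guess_sd: float, drift: float, boundary: float,
                     start_frac: float = 0.5, non_decision: float = 0.3,
                     seed: int = 0) -> List[Trial]:
    """Draw trials from a fast-guess / diffusion mixture."""
    rng = random.Random(seed)
    trials: List[Trial] = []
    for _ in range(n_trials):
        if rng.random() < p_guess:
            # Fast guess: chance accuracy, normally distributed RT truncated at 0.
            rt = max(0.0, rng.gauss(guess_mean, guess_sd))
            trials.append((rng.random() < 0.5, rt))
        else:
            trials.append(diffusion_trial(rng, drift, boundary,
                                          start_frac, non_decision))
    return trials


def accuracy(trials: List[Trial]) -> float:
    """Proportion of correct responses."""
    return sum(1 for correct, _ in trials if correct) / len(trials)
```

In a hypothetical speed-stress condition such as `simulate_mixture(1000, p_guess=0.3, guess_mean=0.25, guess_sd=0.03, drift=0.2, boundary=0.08)`, pooling the RTs can produce two modes. Fast guesses cluster below 300 ms, and diffusion responses fall later because of the non-decision time plus the decision time.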

Population genetic data for 15 X chromosomal short tandem repeat markers in three U.S. populations

Forensic Science International: Genetics

10.1016/j.fsigen.2013.07.008

2014

Vol 8 (1)

pp. 64-67

Cited By ~ 5

Author(s): Toni M. Diegoli, Adrian Linacre, Michael D. Coble

Keyword(s): Tandem Repeat, Short Tandem Repeat, Population Genetic, Genetic Data, Population Genetic Data, Short Tandem Repeat Markers, Short Tandem
