The choice of tree prior and molecular clock does not substantially affect phylogenetic inferences of diversification rates

2018 ◽  
Author(s):  
Brice A. J. Sarver ◽  
Matthew W. Pennell ◽  
Joseph W. Brown ◽  
Sara Keeble ◽  
Kayla M. Hardwick ◽  
...  

Abstract. Comparative methods allow researchers to make inferences about evolutionary processes and patterns from phylogenetic trees. In Bayesian phylogenetics, estimating a phylogeny requires specifying priors on parameters characterizing the branching process and rates of substitution among lineages, in addition to others. However, the effect that the selection of these priors has on the inference of comparative parameters has not been thoroughly investigated. Such uncertainty may systematically bias phylogenetic reconstruction and, subsequently, parameter estimation. Here, we focus on the impact of priors in Bayesian phylogenetic inference and evaluate how they affect the estimation of parameters in macroevolutionary models of lineage diversification. Specifically, we use BEAST to simulate trees under combinations of tree priors and molecular clocks, simulate sequence data, estimate trees, and estimate diversification parameters (e.g., speciation rates and extinction rates) from these trees. When substitution rate heterogeneity is large, parameter estimates deviate substantially from those estimated under the simulation conditions when not captured by an appropriate choice of relaxed molecular clock. However, in general, we find that the choice of tree prior and molecular clock has relatively little impact on the estimation of diversification rates insofar as the sequence data are sufficiently informative and substitution rate heterogeneity among lineages is low-to-moderate.

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6334 ◽  
Author(s):  
Brice A.J. Sarver ◽  
Matthew W. Pennell ◽  
Joseph W. Brown ◽  
Sara Keeble ◽  
Kayla M. Hardwick ◽  
...  

Comparative methods allow researchers to make inferences about evolutionary processes and patterns from phylogenetic trees. In Bayesian phylogenetics, estimating a phylogeny requires specifying priors on parameters characterizing the branching process and rates of substitution among lineages, in addition to others. Accordingly, characterizing the effect of prior selection on phylogenies is an active area of research. The choice of priors may systematically bias phylogenetic reconstruction and, subsequently, affect conclusions drawn from the resulting phylogeny. Here, we focus on the impact of priors in Bayesian phylogenetic inference and evaluate how they affect the estimation of parameters in macroevolutionary models of lineage diversification. Specifically, we simulate trees under combinations of tree priors and molecular clocks, simulate sequence data, estimate trees, and estimate diversification parameters (e.g., speciation and extinction rates) from these trees. When substitution rate heterogeneity is large, diversification rate estimates deviate substantially from those estimated under the simulation conditions when not captured by an appropriate choice of relaxed molecular clock. However, in general, we find that the choice of tree prior and molecular clock has relatively little impact on the estimation of diversification rates insofar as the sequence data are sufficiently informative and substitution rate heterogeneity among lineages is low-to-moderate.
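
The final step described above, estimating a diversification parameter from a tree, can be illustrated with a toy example. The sketch below is not the authors' pipeline: it assumes a pure-birth (Yule) process with no extinction, no sequence data, and no molecular clock, and it works directly on simulated waiting times between speciation events rather than on trees estimated with BEAST. The function names and the chosen rate, tree size, and replicate count are hypothetical and serve only as illustration.

```python
import random


def simulate_yule_waiting_times(n_tips, birth_rate, rng):
    """Simulate waiting times between speciation events under a Yule process.

    The process starts with 2 lineages (the crown node) and stops the moment
    the n_tips-th lineage arises. With k lineages alive, the time to the next
    speciation event is exponential with rate k * birth_rate.
    """
    return [rng.expovariate(k * birth_rate) for k in range(2, n_tips)]


def estimate_birth_rate(waiting_times):
    """Maximum-likelihood speciation rate: events / total branch length."""
    # The i-th interval (starting at k = 2 lineages) contributes k * t to the
    # summed branch length of the tree.
    total_branch_length = sum(
        k * t for k, t in enumerate(waiting_times, start=2)
    )
    return len(waiting_times) / total_branch_length


# Hypothetical settings chosen only for this illustration.
rng = random.Random(1)
true_rate = 0.5
estimates = [
    estimate_birth_rate(simulate_yule_waiting_times(200, true_rate, rng))
    for _ in range(200)
]
mean_estimate = sum(estimates) / len(estimates)
print(f"true rate: {true_rate}, mean estimate: {mean_estimate:.3f}")
```

Comparing `mean_estimate` with `true_rate` mirrors, in a much reduced form, the study's comparison of estimated rates against the simulation conditions. A fuller treatment would add extinction and would estimate the trees from simulated sequence data before fitting the diversification model.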


2016 ◽  
Vol 371 (1699) ◽  
pp. 20160098 ◽  
Author(s):  
Kenneth De Baets ◽  
Alexandre Antonelli ◽  
Philip C. J. Donoghue

Evolutionary timescales have mainly used fossils for calibrating molecular clocks, though fossils only really provide minimum clade age constraints. In their place, phylogenetic trees can be calibrated by precisely dated geological events that have shaped biogeography. However, tectonic episodes are protracted, their role in vicariance is rarely justified, the biogeography of living clades and their antecedents may differ, and the impact of such events is contingent on ecology. Biogeographic calibrations are no panacea for the shortcomings of fossil calibrations, but their associated uncertainties can be accommodated. We provide examples of how biogeographic calibrations based on geological data can be established for the fragmentation of the Pangaean supercontinent: (i) for the uplift of the Isthmus of Panama, (ii) the separation of New Zealand from Gondwana, and (iii) for the opening of the Atlantic Ocean. Biogeographic and fossil calibrations are complementary, not competing, approaches to constraining molecular clock analyses, providing alternative constraints on the age of clades that are vital to avoiding circularity in investigating the role of biogeographic mechanisms in shaping modern biodiversity. This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.


2013 ◽  
Vol 368 (1614) ◽  
pp. 20120198 ◽  
Author(s):  
Tanja Stadler ◽  
Sebastian Bonhoeffer

Host population structure has a major influence on epidemiological dynamics. However, in particular for sexually transmitted diseases, quantitative data on population contact structure are hard to obtain. Here, we introduce a new method that quantifies host population structure based on phylogenetic trees, which are obtained from pathogen genetic sequence data. Our method is based on a maximum-likelihood framework and uses a multi-type branching process, under which each host is assigned to a type (subpopulation). In a simulation study, we show that our method produces accurate parameter estimates for phylogenetic trees in which each tip is assigned to a type, as well as for phylogenetic trees in which the type of the tip is unknown. We apply the method to a Latvian HIV-1 dataset, quantifying the impact of the intravenous drug user epidemic on the heterosexual epidemic (known tip states), and identifying superspreader dynamics within the men-having-sex-with-men epidemic (unknown tip states).


2021 ◽  
Author(s):  
Jeremy M Beaulieu ◽  
Brian C O'Meara

There is a prevailing view that the inclusion of fossil data could remedy identifiability issues related to models of diversification, by drastically reducing the number of congruent models. The fossilized birth-death (FBD) model is an appealing way of directly incorporating fossil information when estimating diversification rates. Here we explore the benefits of including fossils by implementing and then testing two types of FBD models in more complex likelihood-based models that assume multiple rate classes across the tree. We also assess the impact of severely undersampling, and even not including, fossils that represent samples of lineages that also had sampled descendants (i.e., k-type fossils), as well as converting a fossil set to represent stratigraphic ranges. Under various simulation scenarios, including a scenario that exists far outside the set of models we evaluated, including fossils rarely outperforms analyses that exclude them altogether. At best, the inclusion of fossils improves precision but does not influence bias. We also found that severely undercounting the number of k-type fossils produces highly inflated rates of turnover and extinction fraction. Similarly, we found that converting the fossil set to stratigraphic ranges results in turnover rates and extinction fraction estimates that are generally underestimated. While fossils remain essential for understanding diversification through time, in the specific case of understanding diversification given an existing, largely modern tree, they are not especially beneficial.


2015 ◽  
Author(s):  
Juan Alfredo Holley ◽  
Néstor Guillermo Basso ◽  
Juliana Sterli

Background. The clade Chelidae (Testudines, Pleurodira) is a group of freshwater turtles with representatives in Australasia and South America. Its diversity of extant and fossil species is characterized by two recognized morphotypes: the long-necked and the short-necked chelids. So far, the phylogenies constructed for Chelidae differ depending on the information source. While morphology recovers one monophyletic group of long-necked chelids (with South American and Australasian species), the molecular data split the group into South American and Australasian chelids, both as monophyletic sister groups and both containing long-necked species. This conflict implies that long-necked chelids emerged either (i) once, before the final breakup of Southern Gondwana (≅ 35 Mya), or (ii) independently, after this event. Methods. Using BEAST, a set of molecular clock analyses was performed. Seven of these analyses correspond to the molecular hypothesis and thirteen to the morphological hypothesis. Ten fossils were used as calibration points in different combinations for each hypothesis. The results were compared statistically using ANOVA, and the global similarity was inspected by a hierarchical cluster analysis (HCA). Results. Molecular hypothesis: all the analyses produced an age for the origin of Chelidae, and for the origin of the long neck, older than 35 Mya. Divergence times in the South American clade were generally older than those observed in the Australasian clade. In the HCA, analyses 2, 4 and 5 form one group and analyses 3, 6 and 7 form another; analysis 1 is closely related to the latter group. Morphological hypothesis: the origin of the clade of long-necked chelids predated 35 Mya in all the analyses except one; however, the Chelodina group was younger than this age in all the analyses. The HCA yielded two main groups of molecular clock analyses (1, 3, 7, 8, 9, 13 and 2, 4, 6, 10, 11, 12) and one analysis (5) clearly separated from these two. The ANOVA showed significant differences for all estimated nodes in both phylogenetic hypotheses. Discussion. Our set of molecular clock analyses suggests an early diversification of the chelid turtles and the rise of the long-necked chelids before the final breakup of Southern Gondwana. However, whether this trait appeared once or through evolutionary convergence still depends on which phylogenetic scenario is taken into account. Furthermore, our results indicate that increasing the number of calibration points does not necessarily improve the precision of estimated nodes. Instead, the “quality” of the fossils used as calibrations and their positions in the phylogeny have an appreciable impact not only on this parameter, but also on the global evolutionary rate along the tree.


Author(s):  
Hesam Montazeri ◽  
Susan Little ◽  
Mozhgan Mozaffarilegha ◽  
Niko Beerenwinkel ◽  
Victor DeGruttola

Abstract. Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods do not generally reliably reconstruct transmission trees because genetic sequence data or inferred phylogenetic trees from such data contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstruction of transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that directly draws samples from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed by a phylogenetic model by treating its internal nodes as transmission events. By a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show the superiority of the proposed method to two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and average number of outbound edges or inbound edges, signifying possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak and investigate the impact of biological, behavioral, and demographic factors.


Methodology ◽  
2015 ◽  
Vol 11 (3) ◽  
pp. 89-99 ◽  
Author(s):  
Leslie Rutkowski ◽  
Yan Zhou

Abstract. Given a consistent interest in comparing achievement across sub-populations in international assessments such as TIMSS, PIRLS, and PISA, it is critical that sub-population achievement is estimated reliably and with sufficient precision. As such, we systematically examine the limitations of current estimation methods used by these programs. Using a simulation study along with empirical results from the 2007 cycle of TIMSS, we show that a combination of missing and misclassified data in the conditioning model induces biases in sub-population achievement estimates, the magnitude and degree of which can be readily explained by data quality. Importantly, estimated biases in sub-population achievement are limited to the conditioning variable with poor-quality data, while other sub-population achievement estimates are unaffected. Findings are generally in line with theory on missing and error-prone covariates. The current research adds to a small body of literature that has noted some of the limitations of sub-population estimation.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 477-477
Author(s):  
Leah K Treffer ◽  
Edward S Rice ◽  
Anna M Fuller ◽  
Samuel Cutler ◽  
Jessica L Petersen

Abstract Domestic yak (Bos grunniens) are bovids native to the Asian Qinghai-Tibetan Plateau. Studies of Asian yak have revealed that introgression with domestic cattle has contributed to the evolution of the species. When imported to North America (NA), some hybridization with B. taurus did occur. The objective of this study was to use mitochondrial (mt) DNA sequence data to better understand the mtDNA origin of NA yak and their relationship to Asian yak and related species. The complete mtDNA sequence of 14 individuals (12 NA yak, 1 Tibetan yak, 1 Tibetan B. indicus) was generated and compared with sequences of similar species from GenBank (B. indicus, B. grunniens (Chinese), B. taurus, B. gaurus, B. primigenius, B. frontalis, Bison bison, and Ovis aries). Individuals were aligned to the B. grunniens reference genome (ARS_UNL_BGru_maternal_1.0), which was also included in the analyses. The mtDNA genes were annotated using the ARS-UCD1.2 cattle sequence as a reference. Ten unique NA yak haplotypes were identified, which a haplotype network separated into two clusters. Variation among the NA haplotypes included 93 nonsynonymous single nucleotide polymorphisms. A maximum likelihood tree including all taxa was made using IQ-TREE after the data were partitioned into twenty-two subgroups using PartitionFinder2. Notably, six NA yak haplotypes formed a clade with B. indicus; the other four haplotypes grouped with B. grunniens and fell as a sister clade to bison, gaur and gayal. These data demonstrate two mitochondrial origins of NA yak with genetic variation in protein-coding genes. Although these data suggest yak introgression with B. indicus, it appears to date prior to importation into NA. In addition to contributing to our understanding of the species history, these results suggest the two major mtDNA haplotypes in NA yak may functionally differ. Characterization of the impact of these differences on cellular function is currently underway.


2021 ◽  
Vol 45 (3) ◽  
pp. 159-177
Author(s):  
Chen-Wei Liu

Missing not at random (MNAR) modeling for non-ignorable missing responses usually assumes that the latent variable distribution is a bivariate normal distribution. Such an assumption is rarely verified and is often employed as a standard in practice. Recent studies of “complete” item responses (i.e., no missing data) have shown that ignoring the nonnormal distribution of a unidimensional latent variable, especially a skewed or bimodal one, can yield biased estimates and misleading conclusions. However, how to deal with a bivariate nonnormal latent variable distribution in the presence of MNAR data has not been investigated. This article proposes to extend unidimensional empirical histogram and Davidian curve methods to simultaneously deal with a nonnormal latent variable distribution and MNAR data. A simulation study is carried out to demonstrate the consequences of ignoring a bivariate nonnormal distribution for parameter estimates, followed by an empirical analysis of “don’t know” item responses. The results presented in this article show that examining the assumption of bivariate nonnormal latent variable distribution should be considered routine for MNAR data to minimize the impact of nonnormality on parameter estimates.

