The Influence of Taxon Sampling and Tree Shape on Molecular Dating: An Empirical Example from Mammalian Mitochondrial Genomes

2012 ◽  
Vol 6 ◽  
pp. BBI.S9677 ◽  
Author(s):  
André E.R. Soares ◽  
Carlos G. Schrago

Over the last decade, molecular dating methods have been among the most studied subjects in statistical phylogenetics. Although the evolutionary modelling of substitution rates and the handling of calibration information are the primary focus of species divergence time research, parameters that influence topological estimation, such as taxon sampling and tree shape, also have the potential to influence evolutionary age estimates. However, the impact of topological parameters on chronological estimates is rarely considered. In this study, we use mitochondrial genomes to evaluate the influence of tree shape and taxon sampling on the divergence times of selected nodes of the mammalian tree. Our results show that taxon sampling affects divergence time estimates; the credibility intervals for age estimates decrease as taxonomic sampling increases (i.e., estimates become more precise). The influence of taxonomic sampling was not observed on nodes that lay deep in the mammalian phylogeny, although the means of the posterior distributions tend to converge with increased taxon sampling, an effect that is independent of the location of the node. In the majority of cases, the effect of tree shape was negligible.

2013 ◽  
Vol 2013 ◽  
pp. 1-12 ◽  
Author(s):  
James A. Schulte

Methods for estimating divergence times from molecular data have improved dramatically over the past decade, yet there are few studies examining alternative taxon sampling effects on node age estimates. Here, I investigate the effect of undersampling species diversity on node ages of the South American lizard clade Liolaemini using several alternative subsampling strategies for both time calibrations and taxa numbers. Penalized likelihood (PL) and Bayesian molecular dating analyses were conducted on a densely sampled (202 taxa) mtDNA-based phylogenetic hypothesis of Iguanidae, including 92 Liolaemini species. Using all calibrations and penalized likelihood, clades with very low taxon sampling had node age estimates younger than clades with more complete taxon sampling. The effect of Bayesian and PL methods differed when either one or two calibrations only were used with dense taxon sampling. Bayesian node ages were always older when fewer calibrations were used, whereas PL node ages were always younger. This work reinforces two important points: (1) whenever possible, authors should strongly consider adding as many taxa as possible, including numerous outgroups, prior to node age estimation to avoid considerable node age underestimation and (2) using more, critically assessed, and accurate fossil calibrations should yield improved divergence time estimates.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i884-i894
Author(s):  
Jose Barba-Montoya ◽  
Qiqing Tao ◽  
Sudhir Kumar

Abstract. Motivation: As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently, the same General Time Reversible (GTR) model across lineages, along with gamma-distributed (+Γ) rates across sites, is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates. Results: We quantified the bias on time estimates that resulted from using the GTR + Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR + Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR + Γ model to analyze datasets in which the time-reversibility and stationarity assumptions are violated is likely not large and can be reduced by applying multiple calibrations. Availability and implementation: All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.
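The "difference from the expected times" reported above can be illustrated with a minimal sketch. It assumes the difference is summarized as a mean absolute percent difference across nodes; the abstract does not state the exact formula, so the function name, the formula, and the toy node ages below are all hypothetical and are not taken from the study.

```python
def mean_percent_difference(estimated, expected):
    """Mean absolute percent difference between estimated and expected node times.

    Assumption: this summary statistic is illustrative only; the abstract
    does not specify how the average difference was computed.
    """
    if len(estimated) != len(expected):
        raise ValueError("estimated and expected must have the same length")
    if not expected:
        raise ValueError("at least one node time is required")
    # Per-node relative error, averaged over nodes and expressed in percent.
    relative_errors = [abs(e - t) / t for e, t in zip(estimated, expected)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical toy node ages (not data from the study).
expected_times = [10.0, 20.0, 50.0]
estimated_times = [11.0, 19.0, 50.0]

bias = mean_percent_difference(estimated_times, expected_times)
print(f"mean absolute percent difference: {bias:.1f}%")
```

With these toy values the per-node errors are 10%, 5%, and 0%, so the summary is roughly 5%.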


2020 ◽  
Author(s):  
Jose Barba-Montoya ◽  
Qiqing Tao ◽  
Sudhir Kumar

Abstract. Motivation: As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently, the same General Time Reversible (GTR) model across lineages, along with gamma-distributed (+Γ) rates across sites, is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates. Results: We quantified the bias on time estimates that resulted from using the GTR+Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR+Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR+Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR+Γ model to analyze datasets in which the time-reversibility and stationarity assumptions are violated is likely not large and can be reduced by applying multiple calibrations. Availability: All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.


2019 ◽  
Author(s):  
Tamara Spasojevic ◽  
Gavin R. Broad ◽  
Ilari E. Sääksjärvi ◽  
Martin Schwarz ◽  
Masato Ito ◽  
...  

Abstract. Taxon sampling is a central aspect of phylogenetic study design, but it has received limited attention in the context of molecular dating and especially in the framework of total-evidence dating, a widely used dating approach that directly integrates molecular and morphological information from extant and fossil taxa. Here, we assess the impact of different outgroup sampling schemes on age estimates in a total-evidence dating analysis under the uniform tree prior. Our study group is Pimpliformes, a highly diverse, rapidly radiating group of parasitoid wasps of the family Ichneumonidae. We cover 201 extant and 79 fossil taxa, including the oldest fossils of the family from the Early Cretaceous and the first unequivocal representatives of extant subfamilies from the mid-Paleogene. Based on newly compiled molecular data from ten nuclear genes and a morphological matrix that includes 222 characters, we show that age estimates become both older and less precise with the inclusion of more distant and more poorly sampled outgroups. In addition, we discover an artefact that might be detrimental to total-evidence dating: “bare-branch attraction”, namely high attachment probabilities of fossils, especially older ones, to terminal branches for which morphological data are missing. After restricting outgroup sampling and adding morphological data for the previously attracting bare branches, we recover a Middle and Early Jurassic origin for Pimpliformes and Ichneumonidae, respectively. This first age estimate for the group suggests not only an older origin than previously thought, but also that diversification of the crown group happened before the Cretaceous-Paleogene boundary. Our case study demonstrates that, in order to obtain robust age estimates, total-evidence dating studies need to be based on a thorough and balanced sampling of both extant and fossil taxa, with the aim of minimizing evolutionary rate heterogeneity and missing morphological information.


2019 ◽  
Author(s):  
Qiqing Tao ◽  
Koichiro Tamura ◽  
Beatriz Mello ◽  
Sudhir Kumar

Abstract. Confidence intervals (CIs) depict the statistical uncertainty surrounding evolutionary divergence time estimates. They capture variance contributed by the finite number of sequences and sites used in the alignment, deviations of evolutionary rates from a strict molecular clock in a phylogeny, and uncertainty associated with clock calibrations. Reliable tests of biological hypotheses demand reliable CIs. However, current non-Bayesian methods may produce unreliable CIs because they do not properly incorporate rate variation among lineages and interactions among clock calibrations. Here, we present a new analytical method to calculate CIs of divergence times estimated using the RelTime method, along with an approach to utilize multiple calibration uncertainty densities in these analyses. Empirical data analyses showed that the new methods produce CIs that overlap with Bayesian highest posterior density (HPD) intervals. In the analysis of computer-simulated data, we found that RelTime CIs show excellent average coverage probabilities, i.e., the true time is contained within the CIs with a 95% probability. These developments will encourage broader use of the computationally efficient RelTime approach in molecular dating analyses and biological hypothesis testing.


2013 ◽  
Vol 299 (3) ◽  
pp. 585-601 ◽  
Author(s):  
Kathrin Feldberg ◽  
Jochen Heinrichs ◽  
Alexander R. Schmidt ◽  
Jiří Váňa ◽  
Harald Schneider

Fossil Record ◽  
2017 ◽  
Vol 20 (2) ◽  
pp. 201-213 ◽  
Author(s):  
Julia Bechteler ◽  
Alexander R. Schmidt ◽  
Matthew A. M. Renner ◽  
Bo Wang ◽  
Oscar Alejandro Pérez-Escobar ◽  
...  

Abstract. DNA-based divergence time estimates suggested major changes in the composition of epiphyte lineages of liverworts during the Cretaceous; however, evidence from the fossil record is scarce. We present the first Cretaceous fossil of the predominantly epiphytic leafy liverwort genus Radula in ca. 100 Myr old Burmese amber. The fossil's exquisite preservation allows first insights into the morphology of early crown group representatives of Radula occurring in gymnosperm-dominated forests. Ancestral character state reconstruction aligns the fossil with the crown group of Radula subg. Odontoradula; however, corresponding divergence time estimates using the software BEAST lead to unrealistically old age estimates. Alternatively, assignment of the fossil to the stem of subg. Odontoradula results in a stem age estimate of Radula of 227.8 Ma (95 % highest posterior density (HPD): 165.7–306.7) and a crown group estimate of 176.3 Ma (135.1–227.4), in agreement with analyses employing standard substitution rates (stem age 235.6 Ma (142.9–368.5), crown group age 183.8 Ma (109.9–289.1)). The fossil likely belongs to the stem lineage of Radula subg. Odontoradula. The fossil's modern morphology suggests that switches from gymnosperm to angiosperm phorophytes occurred without changes in plant body plans in epiphytic liverworts. The fossil provides evidence for striking morphological homoplasy in time. Even conservative node assignments of the fossil support older rather than younger age estimates of the Radula crown group, involving origins for most extant subgenera by the end of the Cretaceous and diversification of their crown groups in the Cenozoic.


2019 ◽  
Vol 69 (4) ◽  
pp. 660-670 ◽  
Author(s):  
Tom Carruthers ◽  
Michael J Sanderson ◽  
Robert W Scotland

Abstract. Rate variation adds considerable complexity to divergence time estimation in molecular phylogenies. Here, we evaluate the impact of lineage-specific rates, which we define as among-branch rate variation that acts consistently across the entire genome. We compare its impact to that of residual rates, defined as among-branch rate variation that shows a different pattern at each sampled locus, and gene-specific rates, defined as variation in the average rate across all branches at each sampled locus. We show that lineage-specific rates lead to erroneous divergence time estimates, regardless of how many loci are sampled. Further, we show that stronger lineage-specific rates lead to increasing error. This contrasts with residual rates and gene-specific rates, where sampling more loci significantly reduces error. If divergence times are inferred in a Bayesian framework, we highlight that error caused by lineage-specific rates significantly reduces the probability that the 95% highest posterior density includes the correct value, and leads to sensitivity to the prior. Use of a more complex rate prior, which has recently been proposed to model rate variation more accurately, does not affect these conclusions. Finally, we show that the scale of lineage-specific rates used in our simulation experiments is comparable to that of an empirical data set for the angiosperm genus Ipomoea. Taken together, our findings demonstrate that lineage-specific rates cause error in divergence time estimates, and that this error is not overcome by analyzing genomic-scale multilocus data sets. [Divergence time estimation; error; rate variation.]


2019 ◽  
Vol 69 (1) ◽  
pp. 1-16 ◽  
Author(s):  
Yuan Nie ◽  
Charles S P Foster ◽  
Tianqi Zhu ◽  
Ru Yao ◽  
David A Duchêne ◽  
...  

Abstract Establishing an accurate evolutionary timescale for green plants (Viridiplantae) is essential to understanding their interaction and coevolution with the Earth’s climate and the many organisms that rely on green plants. Despite being the focus of numerous studies, the timing of the origin of green plants and the divergence of major clades within this group remain highly controversial. Here, we infer the evolutionary timescale of green plants by analyzing 81 protein-coding genes from 99 chloroplast genomes, using a core set of 21 fossil calibrations. We test the sensitivity of our divergence-time estimates to various components of Bayesian molecular dating, including the tree topology, clock models, clock-partitioning schemes, rate priors, and fossil calibrations. We find that the choice of clock model affects date estimation and that the independent-rates model provides a better fit to the data than the autocorrelated-rates model. Varying the rate prior and tree topology had little impact on age estimates, with far greater differences observed among calibration choices and clock-partitioning schemes. Our analyses yield date estimates ranging from the Paleoproterozoic to Mesoproterozoic for crown-group green plants, and from the Ediacaran to Middle Ordovician for crown-group land plants. We present divergence-time estimates of the major groups of green plants that take into account various sources of uncertainty. Our proposed timeline lays the foundation for further investigations into how green plants shaped the global climate and ecosystems, and how embryophytes became dominant in terrestrial environments.


2019 ◽  
Vol 37 (1) ◽  
pp. 280-290 ◽  
Author(s):  
Qiqing Tao ◽  
Koichiro Tamura ◽  
Beatriz Mello ◽  
Sudhir Kumar

Abstract. Confidence intervals (CIs) depict the statistical uncertainty surrounding evolutionary divergence time estimates. They capture variance contributed by the finite number of sequences and sites used in the alignment, deviations of evolutionary rates from a strict molecular clock in a phylogeny, and uncertainty associated with clock calibrations. Reliable tests of biological hypotheses demand reliable CIs. However, current non-Bayesian methods may produce unreliable CIs because they do not incorporate rate variation among lineages and interactions among clock calibrations properly. Here, we present a new analytical method to calculate CIs of divergence times estimated using the RelTime method, along with an approach to utilize multiple calibration uncertainty densities in dating analyses. Empirical data analyses showed that the new methods produce CIs that overlap with Bayesian highest posterior density intervals. In the analysis of computer-simulated data, we found that RelTime CIs show excellent average coverage probabilities, that is, the actual time is contained within the CIs with a 94% probability. These developments will encourage broader use of computationally efficient RelTime approaches in molecular dating analyses and biological hypothesis testing.
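The coverage probability mentioned above is, in general terms, the fraction of simulated cases in which the interval contains the true divergence time. The following is a minimal sketch of that calculation only; it does not reproduce the RelTime analytical method, and the function name and toy intervals are hypothetical rather than values from the study.

```python
def coverage_probability(true_times, intervals):
    """Fraction of (lower, upper) intervals that contain the matching true time.

    Assumption: intervals are closed, so a true time equal to a bound counts
    as covered. This is a generic definition, not the authors' implementation.
    """
    if len(true_times) != len(intervals):
        raise ValueError("true_times and intervals must have the same length")
    if not true_times:
        raise ValueError("at least one node is required")
    hits = sum(
        1 for t, (lower, upper) in zip(true_times, intervals) if lower <= t <= upper
    )
    return hits / len(true_times)


# Hypothetical toy values (not data from the study): four simulated node
# ages and the interval estimated for each.
true_times = [10.0, 25.0, 40.0, 60.0]
intervals = [(8.0, 12.5), (26.0, 30.0), (35.0, 47.0), (52.0, 66.0)]

coverage = coverage_probability(true_times, intervals)
print(f"coverage: {coverage:.2f}")  # three of four intervals contain the true time
```

An interval method with nominal 95% confidence would be expected to give a value near 0.95 when this fraction is computed over many simulated nodes.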

