Coalescent-based phylogenetic inference from genes with unequivocal historical signal suggests a polytomy at the root of the placental mammal tree of life

2018 ◽  
Author(s):  
Filipe R. R. Moreira ◽  
Carlos G. Schrago

Abstract Finding the correct place of the root of the Placentalia tree of life is an unsolved problem in mammalian systematics. Three major competing hypotheses have been proposed, which alternately place the supraordinal taxa Afrotheria, Xenarthra, or Atlantogenata as the sister lineage to the remaining placentals. While all three phylogenetic associations were inferred by studies conducted so far, recent assessments applying heterogeneous models and coalescent-based phylogenetic methods found robust support for the Atlantogenata hypothesis. Current developments in theoretical phylogenetics, such as the demonstration that concatenating genes may bias the likelihood function, and that summary coalescent-based phylogenetic methods are sensitive to errors in gene tree estimation, call for a reevaluation of the early placental split problem. We investigated the phylogenetic relationships between placental superorders by restricting the analysis to subsets of genes with unequivocal phylogenetic signal. In contrast to previous works, we show that the hypothesis of a polytomy at the placental root could not be rejected under the multispecies coalescent model. This result endorses conclusions from analyses of retroposon insertion patterns. We provide an analytical framework to assess empirical polytomies employing summary coalescent methods and topological tests, helping the investigation of poorly resolved nodes in the tree of life.
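
The idea of testing a polytomy with gene tree frequencies can be illustrated with a minimal sketch. Under the multispecies coalescent, a true polytomy (an internal branch of length zero) makes the three possible resolutions around that branch equally probable, so one simple check is a chi-square goodness-of-fit test against equal frequencies. This is an assumption-laden illustration, not the authors' pipeline: the function name `polytomy_chi2_test` is hypothetical, loci are treated as independent, and gene trees are taken as estimated without error.

```python
import math


def polytomy_chi2_test(counts):
    """Chi-square test of equal frequencies for three alternative topologies.

    counts: numbers of loci supporting each of the three resolutions
            around one internal branch (hypothetical input format).
    Returns (statistic, p_value) with 2 degrees of freedom.
    """
    if len(counts) != 3:
        raise ValueError("expected exactly three topology counts")
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one locus")
    expected = total / 3.0
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    # For 2 degrees of freedom the chi-square survival function
    # has the closed form exp(-x / 2), so no external library is needed.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


if __name__ == "__main__":
    # Made-up counts for the three placental root hypotheses.
    stat, p = polytomy_chi2_test([110, 98, 92])
    print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

A large p-value means the equal-frequency expectation of a polytomy cannot be rejected; it does not by itself demonstrate that the node is a hard polytomy.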

Author(s):  
Tianqi Zhu ◽  
Ziheng Yang

Abstract The multispecies coalescent (MSC) model provides a natural framework for species tree estimation accounting for gene-tree conflicts. While a number of species tree methods under the MSC have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood (ISML) and maximum likelihood (ML). We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes while these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.
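
The relationship between estimation error and the number of loci can be explored numerically with a small sketch. It assumes the standard three-species MSC result that a gene tree matches the species tree with probability 1 - (2/3)e^(-T), where T is the internal branch length in coalescent units, and it stands in for a two-step method with a simple majority vote over true gene tree topologies. The function names are hypothetical and the code does not reproduce the paper's approximations.

```python
import math
from statistics import NormalDist


def gene_tree_probs(t):
    """Probabilities of the matching and two mismatching gene tree
    topologies for three species; t is the internal branch length
    in coalescent units (t >= 0)."""
    mismatch = math.exp(-t) / 3.0
    return (1.0 - 2.0 * mismatch, mismatch, mismatch)


def majority_vote_error(t, n_loci):
    """Exact probability that the most common gene tree topology is not
    the species tree topology, with ties broken uniformly at random."""
    log_p = [math.log(x) for x in gene_tree_probs(t)]
    log_n_fact = math.lgamma(n_loci + 1)
    p_correct = 0.0
    for k1 in range(n_loci + 1):
        for k2 in range(n_loci - k1 + 1):
            k3 = n_loci - k1 - k2
            top = max(k1, k2, k3)
            if k1 < top:
                continue  # the matching topology is not among the winners
            ties = (k1, k2, k3).count(top)
            # Multinomial probability of the counts, computed in log space.
            log_prob = (log_n_fact
                        - math.lgamma(k1 + 1)
                        - math.lgamma(k2 + 1)
                        - math.lgamma(k3 + 1)
                        + k1 * log_p[0] + k2 * log_p[1] + k3 * log_p[2])
            p_correct += math.exp(log_prob) / ties
    return 1.0 - p_correct


if __name__ == "__main__":
    probit = NormalDist().inv_cdf
    for n in (4, 16, 36, 64, 100):
        err = majority_vote_error(0.1, n)
        print(f"sqrt(n) = {math.sqrt(n):4.1f}  error = {err:.4f}  "
              f"probit(error) = {probit(err):.3f}")
```

Plotting the printed probit values against sqrt(n) gives a quick visual check of how close to linear the trend is for a chosen T.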


2020 ◽  
Vol 36 (18) ◽  
pp. 4819-4821
Author(s):  
Anastasiia Kim ◽  
James H Degnan

Abstract Summary: PRANC computes the Probabilities of RANked gene tree topologies under the multispecies coalescent. A ranked gene tree is a gene tree accounting for the temporal ordering of internal nodes. PRANC can also estimate the maximum likelihood (ML) species tree from a sample of ranked or unranked gene tree topologies. It estimates the ML tree with estimated branch lengths in coalescent units. Availability and implementation: PRANC is written in C++ and freely available at github.com/anastasiiakim/PRANC. Supplementary information: Supplementary data are available at Bioinformatics online.
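
To make the distinction between ranked and unranked topologies concrete, the following sketch counts both for a given number of taxa using the standard closed forms: (2n-3)!! rooted binary labeled topologies and n!(n-1)!/2^(n-1) ranked topologies (labeled histories). The function names are illustrative and are not part of PRANC.

```python
import math


def count_unranked_topologies(n):
    """Number of rooted binary labeled tree topologies on n taxa: (2n-3)!!"""
    if n < 2:
        raise ValueError("need at least two taxa")
    result = 1
    for k in range(3, 2 * n - 2, 2):
        result *= k
    return result


def count_ranked_topologies(n):
    """Number of ranked topologies (labeled histories) on n taxa:
    n! * (n-1)! / 2^(n-1)."""
    if n < 2:
        raise ValueError("need at least two taxa")
    return math.factorial(n) * math.factorial(n - 1) // 2 ** (n - 1)


if __name__ == "__main__":
    for n in range(3, 9):
        print(n, count_unranked_topologies(n), count_ranked_topologies(n))
```

With four taxa there are 15 unranked but 18 ranked topologies, because each of the three balanced topologies such as ((A,B),(C,D)) admits two orderings of its cherries.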


2020 ◽  
Vol 70 (1) ◽  
pp. 49-66 ◽  
Author(s):  
Paul M Hime ◽  
Alan R Lemmon ◽  
Emily C Moriarty Lemmon ◽  
Elizabeth Prendini ◽  
Jeremy M Brown ◽  
...  

Abstract Molecular phylogenies have yielded strong support for many parts of the amphibian Tree of Life, but poor support for the resolution of deeper nodes, including relationships among families and orders. To clarify these relationships, we provide a phylogenomic perspective on amphibian relationships by developing a taxon-specific Anchored Hybrid Enrichment protocol targeting hundreds of conserved exons which are effective across the class. After obtaining data from 220 loci for 286 species (representing 94% of the families and 44% of the genera), we estimate a phylogeny for extant amphibians and identify gene tree–species tree conflict across the deepest branches of the amphibian phylogeny. We perform locus-by-locus genealogical interrogation of alternative topological hypotheses for amphibian monophyly, focusing on interordinal relationships. We find that phylogenetic signal deep in the amphibian phylogeny varies greatly across loci in a manner that is consistent with incomplete lineage sorting in the ancestral lineage of extant amphibians. Our results overwhelmingly support amphibian monophyly and a sister relationship between frogs and salamanders, consistent with the Batrachia hypothesis. Species tree analyses converge on a small set of topological hypotheses for the relationships among extant amphibian families. These results clarify several contentious portions of the amphibian Tree of Life, which in conjunction with a set of vetted fossil calibrations, support a surprisingly younger timescale for crown and ordinal amphibian diversification than previously reported. More broadly, our study provides insight into the sources, magnitudes, and heterogeneity of support across loci in phylogenomic data sets. [AIC; Amphibia; Batrachia; Phylogeny; gene tree–species tree discordance; genomics; information theory.]


2020 ◽  
Vol 69 (5) ◽  
pp. 830-847 ◽  
Author(s):  
Xiyun Jiao ◽  
Tomáš Flouri ◽  
Bruce Rannala ◽  
Ziheng Yang

Abstract Recent analyses of genomic sequence data suggest cross-species gene flow is common in both plants and animals, posing challenges to species tree estimation. We examine the levels of gene flow needed to mislead species tree estimation with three species and either episodic introgressive hybridization or continuous migration between an outgroup and one ingroup species. Several species tree estimation methods are examined, including the majority-vote method based on the most common gene tree topology (with either the true or reconstructed gene trees used), the UPGMA method based on the average sequence distances (or average coalescent times) between species, and the full-likelihood method based on multilocus sequence data. Our results suggest that the majority-vote method based on gene tree topologies is more robust to gene flow than the UPGMA method based on coalescent times and both are more robust than likelihood assuming a multispecies coalescent (MSC) model with no cross-species gene flow. Comparison of the continuous migration model with the episodic introgression model suggests that a small amount of gene flow per generation can cause drastic changes to the genetic history of the species and mislead species tree methods, especially if the species diverged through radiative speciation events. Estimates of parameters under the MSC with gene flow suggest that African mosquito species in the Anopheles gambiae species complex constitute such an example of extreme impact of gene flow on species phylogeny. [IM; introgression; migration; MSci; multispecies coalescent; species tree.]
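
For three species, the distance-based approach described above reduces to joining the pair with the smallest average distance. The sketch below illustrates that rule only; the input format (one dictionary of pairwise distances per locus) and the function names are assumptions made for illustration, and the code does not model gene flow itself.

```python
def average_distances(per_locus):
    """Average each pairwise distance across loci.

    per_locus: list of dicts mapping frozenset({a, b}) -> distance
               (hypothetical input format, same keys in every dict).
    """
    if not per_locus:
        raise ValueError("need at least one locus")
    pairs = per_locus[0].keys()
    return {pair: sum(locus[pair] for locus in per_locus) / len(per_locus)
            for pair in pairs}


def upgma_three_species(avg_dist):
    """Return a Newick-like string joining the closest pair first."""
    if len(avg_dist) != 3:
        raise ValueError("expected three pairwise distances")
    closest = min(avg_dist, key=avg_dist.get)
    species = set().union(*avg_dist)
    (outgroup,) = species - closest
    a, b = sorted(closest)
    return f"(({a},{b}),{outgroup})"


if __name__ == "__main__":
    # Made-up distances for two loci and three species A, B, C.
    loci = [
        {frozenset("AB"): 0.02, frozenset("AC"): 0.05, frozenset("BC"): 0.06},
        {frozenset("AB"): 0.03, frozenset("AC"): 0.04, frozenset("BC"): 0.05},
    ]
    print(upgma_three_species(average_distances(loci)))
```

Because the estimate depends only on averages, gene flow that shortens the average distance between an ingroup species and the outgroup can change which pair is joined, which is one way to read the sensitivity reported in the abstract.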


2021 ◽  
Author(s):  
Tauana Junqueira Cunha ◽  
James Davis Reimer ◽  
Gonzalo Giribet

Abstract Phylogenetic analyses may suffer from multiple sources of error leading to conflict between genes and methods of inference. The evolutionary history of the mollusc clade Vetigastropoda makes them susceptible to these conflicts, their higher level phylogeny remaining largely unresolved. Originating over 350 million years ago, vetigastropods were the dominant marine snails in the Paleozoic. Multiple extinction events and new radiations have resulted in both very long and very short branches and a large extant diversity of over 4000 species. This is the perfect setting of a hard phylogenetic question in which sources of conflict can be explored. We present 41 new transcriptomes across the diversity of vetigastropods (62 terminals total), and provide the first genomic-scale phylogeny for the group. We find that deep divergences differ from previous studies in which long branch attraction was likely pervasive. Robust results leading to changes in taxonomy include the paraphyly of the order Lepetellida and the family Tegulidae. Tectinae subfam. nov. is designated for the clade comprising Tectus, Cittarium and Rochia. For two early divergences, topologies disagreed between concatenated analyses using site heterogeneous models vs. concatenated partitioned analyses and summary coalescent methods. We investigated rate and composition heterogeneity among genes, as well as missing data by locus and by taxon, none of which had an impact on the inferred topologies. We also found no evidence for ancient introgression throughout the phylogeny. We further tested whether uninformative genes and over-partitioning were responsible for this discordance by evaluating the phylogenetic signal of individual genes using likelihood mapping, and by analyzing the most informative genes with a full multispecies coalescent model. We find that most genes are not informative at the two conflicting nodes, but neither this nor gene-wise partitioning is the cause of discordant results. New method implementations that simultaneously integrate amino acid profile mixture models and the multispecies coalescent might be necessary to resolve these and other recalcitrant nodes in the Tree of Life.


2018 ◽  
Author(s):  
Fábio K. Mendes ◽  
Jesualdo A. Fuentes-González ◽  
Joshua G. Schraiber ◽  
Matthew W. Hahn

Abstract Modern phylogenetic methods used to study how traits evolve often require a single species tree as input, and do not take underlying gene tree discordance into account. Such approaches may lead to errors in phylogenetic inference because of hemiplasy, the process by which single changes on discordant trees appear to be homoplastic when analyzed on a fixed species tree. Hemiplasy has been shown to affect inferences about discrete traits, but it is still unclear whether complications arise when quantitative traits are analyzed. In order to address this question and to characterize the effect of hemiplasy on traits controlled by a large number of loci, we present a multispecies coalescent model for quantitative traits evolving along a species tree. We demonstrate theoretically and through simulations that hemiplasy decreases the expected covariances in trait values between more closely related species relative to the covariances between more distantly related species. This effect leads to an overestimation of a trait’s evolutionary rate parameter, to a decrease of the trait’s phylogenetic signal, and to increased false positive rates in comparative methods such as the phylogenetic ANOVA. We also show that hemiplasy affects discrete, threshold traits that have an underlying continuous liability, leading to false inferences of convergent evolution. The number of loci controlling a quantitative trait appears to be irrelevant to the trends reported, for all analyses. Our results demonstrate that gene tree discordance and hemiplasy are a problem for all types of traits, across a wide range of methods. Our analyses also point to the conditions under which hemiplasy is most likely to be a factor, and suggest future approaches that may mitigate its effects.


2018 ◽  
Author(s):  
Nicolás Mongiardino Koch ◽  
Simon E. Coppard ◽  
Harilaos A. Lessios ◽  
Derek E. G. Briggs ◽  
Rich Mooi ◽  
...  

Abstract Background: Echinoidea is a clade of marine animals including sea urchins, heart urchins, sand dollars and sea biscuits. Found in benthic habitats across all latitudes, echinoids are key components of marine communities such as coral reefs and kelp forests. A little over 1,000 species inhabit the oceans today, a diversity that traces its roots back at least to the Permian. Although much effort has been devoted to elucidating the echinoid tree of life using a variety of morphological data, molecular attempts have relied on only a handful of genes. Both of these approaches have had limited success at resolving the deepest nodes of the tree, and their disagreement over the positions of a number of clades remains unresolved. Results: We performed de novo sequencing and assembly of 17 transcriptomes to complement available genomic resources of sea urchins and produce the first phylogenomic analysis of the clade. Multiple methods of probabilistic inference recovered identical topologies, with virtually all nodes showing maximum support. In contrast, the coalescent-based method ASTRAL-II resolved one node differently, a result apparently driven by gene tree error induced by evolutionary rate heterogeneity. Regardless of the method employed, our phylogenetic structure deviates from the currently accepted classification of echinoids, with neither Acroechinoidea (all euechinoids except echinothurioids), nor Clypeasteroida (sand dollars and sea biscuits) being monophyletic as currently defined. We demonstrate the strength and distribution of phylogenetic signal throughout the genome for novel resolutions of these lineages and rule out systematic biases as possible explanations. Conclusions: Our investigation substantially augments the molecular resources available for sea urchins, providing the first transcriptomes for many of its main lineages. Using this expanded genomic dataset, we resolve the position of several clades in agreement with early molecular analyses but in disagreement with morphological data. Our efforts settle multiple phylogenetic uncertainties, including the position of the enigmatic deep-sea echinothurioids and the identity of the sister clade to sand dollars. We offer a detailed assessment of evolutionary scenarios that could reconcile our findings with morphological evidence, opening up new lines of research into the development and evolutionary history of this ancient clade.


2016 ◽  
Author(s):  
Mozes P.K. Blom ◽  
Jason G. Bragg ◽  
Sally Potter ◽  
Craig Moritz

Abstract Accurate gene tree inference is an important aspect of species tree estimation in a summary-coalescent framework. Yet, in empirical studies, inferred gene trees differ in accuracy due to stochastic variation in phylogenetic signal between targeted loci. Empiricists should therefore examine the consistency of species tree inference, while accounting for the observed heterogeneity in gene tree resolution of phylogenomic datasets. Here, we assess the impact of gene tree estimation error on summary-coalescent species tree inference by screening ~2000 exonic loci based on gene tree resolution prior to phylogenetic inference. We focus on a phylogenetically challenging radiation of Australian lizards (genus Cryptoblepharus, Scincidae) and explore effects on topology and support. We identify a well-supported topology based on all loci and find that a relatively small number of high-resolution gene trees can be sufficient to converge on the same topology. Adding gene trees with decreasing resolution produced a generally consistent topology, and increased confidence for specific bipartitions that were poorly supported when using a small number of informative loci. This corroborates coalescent-based simulation studies that have highlighted the need for a large number of loci to confidently resolve challenging relationships and refutes the notion that low-resolution gene trees introduce phylogenetic noise. Further, our study also highlights the value of quantifying changes in nodal support across locus subsets of increasing size (but decreasing gene tree resolution). Such detailed analyses can reveal anomalous fluctuations in support at some nodes, suggesting the possibility of model violation. By characterizing the heterogeneity in phylogenetic signal among loci, we can account for uncertainty in gene tree estimation and assess its effect on the consistency of the species tree estimate. We suggest that the evaluation of gene tree resolution should be incorporated in the analysis of empirical phylogenomic datasets. This will ultimately increase our confidence in species tree estimation using summary-coalescent methods and enable us to exploit genomic data for phylogenetic inference.
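
The screening strategy described above, in which species trees are estimated from locus subsets of increasing size but decreasing gene tree resolution, can be sketched as follows. The resolution measure used here (the fraction of internal branches with support at or above a threshold) and all names are assumptions for illustration; the abstract does not specify how resolution was scored.

```python
def resolution_score(branch_supports, threshold=70.0):
    """Fraction of internal branches whose support meets the threshold.

    branch_supports: support values (e.g. bootstrap percentages) for the
                     internal branches of one gene tree. The threshold of
                     70 is an arbitrary illustrative default.
    """
    if not branch_supports:
        return 0.0
    resolved = sum(1 for s in branch_supports if s >= threshold)
    return resolved / len(branch_supports)


def nested_locus_subsets(resolution, sizes):
    """Rank loci from most to least resolved and return nested subsets.

    resolution: dict mapping locus name -> resolution score
    sizes:      subset sizes to extract, e.g. [100, 500, 2000]
    Ties are broken by locus name so the ranking is deterministic.
    """
    ranked = sorted(resolution, key=lambda name: (-resolution[name], name))
    return [ranked[:size] for size in sizes]


if __name__ == "__main__":
    supports = {
        "locus1": [95, 88, 72, 40],
        "locus2": [30, 45, 20, 10],
        "locus3": [99, 91, 85, 77],
    }
    scores = {name: resolution_score(vals) for name, vals in supports.items()}
    for subset in nested_locus_subsets(scores, [1, 2, 3]):
        print(len(subset), subset)
```

Each subset would then be passed to a summary-coalescent method, and nodal support compared across subsets to see where it stabilizes or fluctuates.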


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6399 ◽  
Author(s):  
Gustavo A. Bravo ◽  
Alexandre Antonelli ◽  
Christine D. Bacon ◽  
Krzysztof Bartoszek ◽  
Mozes P. K. Blom ◽  
...  

Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which phylogeny building increasingly relies; we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.

