scholarly journals A matter of phylogenetic scale: Distinguishing incomplete lineage sorting from lateral gene transfer as the cause of gene tree discord in recent versus deep diversification histories

2018 ◽  
Vol 105 (3) ◽  
pp. 376-384 ◽  
Author(s):  
L. Lacey Knowles ◽  
Huateng Huang ◽  
Jeet Sukumaran ◽  
Stephen A. Smith
Author(s):  
Huateng Huang ◽  
Jeet Sukumaran ◽  
Stephen A Smith ◽  
L.Lacey Knowles

Despite recent efforts that have produced data sets with hundreds and thousands of gene regions to resolve regions of the tree of life, recalcitrant nodes persist and disagreement among genes as well as disagreement between individual gene trees and species trees are common. There are a number of evolutionary processes that contribute to these conflicts between gene trees and species trees, including deep coalescence (lineage sorting), horizontal gene transfer or hybridization, etc. While for some of these processes, we have very powerful and sophisticated models that uses the conflict in the gene trees as information that contributes materially to correctly inferring the species tree, such as the multispecies coalescent (MSC). However, usage of these models require a priori recognition of relevant processes, which is often unknown for empirical dataset. Here we propose a new perspective to not only identify the cause of discord among gene trees, but also use it to classify loci by the underlying cause of discord to identify subsets of loci for analysis with the goal of improving phylogenetic accuracy. This approach differs fundamentally from all other criteria used for making decisions about which loci to include in a phylogenetic analysis. In particular, the choice of loci in this framework is based on identifying those that reflect descent from a common ancestor (as opposed to other processes), and thereby can minimize problems with model misspecification. We present preliminary results that demonstrate the potential of this framework in distinguishing the lateral gene transfer (LGT) from incomplete lineage sorting (ILS) process, as implemented in a new software package CLASSIPHY, while also highlighting areas for further development and testing. We discussed why such methods (i) are critical to improving phylogenetic accuracy with the increased complexity of genomic/transcriptomic datasets, and that (ii) characterizing patterns of discordance and the contribution of different processes to this discordance is itself of interest for generating hypotheses about the role of lateral gene transfer, gene duplication, and incomplete lineage sorting during the divergence of different taxa.


2017 ◽  
Author(s):  
Huateng Huang ◽  
Jeet Sukumaran ◽  
Stephen A Smith ◽  
L.Lacey Knowles

Despite recent efforts that have produced data sets with hundreds and thousands of gene regions to resolve regions of the tree of life, recalcitrant nodes persist and disagreement among genes as well as disagreement between individual gene trees and species trees are common. There are a number of evolutionary processes that contribute to these conflicts between gene trees and species trees, including deep coalescence (lineage sorting), horizontal gene transfer or hybridization, etc. While for some of these processes, we have very powerful and sophisticated models that uses the conflict in the gene trees as information that contributes materially to correctly inferring the species tree, such as the multispecies coalescent (MSC). However, usage of these models require a priori recognition of relevant processes, which is often unknown for empirical dataset. Here we propose a new perspective to not only identify the cause of discord among gene trees, but also use it to classify loci by the underlying cause of discord to identify subsets of loci for analysis with the goal of improving phylogenetic accuracy. This approach differs fundamentally from all other criteria used for making decisions about which loci to include in a phylogenetic analysis. In particular, the choice of loci in this framework is based on identifying those that reflect descent from a common ancestor (as opposed to other processes), and thereby can minimize problems with model misspecification. We present preliminary results that demonstrate the potential of this framework in distinguishing the lateral gene transfer (LGT) from incomplete lineage sorting (ILS) process, as implemented in a new software package CLASSIPHY, while also highlighting areas for further development and testing. We discussed why such methods (i) are critical to improving phylogenetic accuracy with the increased complexity of genomic/transcriptomic datasets, and that (ii) characterizing patterns of discordance and the contribution of different processes to this discordance is itself of interest for generating hypotheses about the role of lateral gene transfer, gene duplication, and incomplete lineage sorting during the divergence of different taxa.


2015 ◽  
Author(s):  
Ruth Davidson ◽  
Pranjal Vachaspati ◽  
Siavash Mirarab ◽  
Tandy Warnow

Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. Conclusion: Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS. Keywords: phylogenomics; HGT; ILS; summary methods; concatenation


2020 ◽  
Author(s):  
Liming Cai ◽  
Zhenxiang Xi ◽  
Emily Moriarty Lemmon ◽  
Alan R Lemmon ◽  
Austin Mast ◽  
...  

Abstract The genomic revolution offers renewed hope of resolving rapid radiations in the Tree of Life. The development of the multispecies coalescent (MSC) model and improved gene tree estimation methods can better accommodate gene tree heterogeneity caused by incomplete lineage sorting (ILS) and gene tree estimation error stemming from the short internal branches. However, the relative influence of these factors in species tree inference is not well understood. Using anchored hybrid enrichment, we generated a data set including 423 single-copy loci from 64 taxa representing 39 families to infer the species tree of the flowering plant order Malpighiales. This order includes nine of the top ten most unstable nodes in angiosperms, which have been hypothesized to arise from the rapid radiation during the Cretaceous. Here, we show that coalescent-based methods do not resolve the backbone of Malpighiales and concatenation methods yield inconsistent estimations, providing evidence that gene tree heterogeneity is high in this clade. Despite high levels of ILS and gene tree estimation error, our simulations demonstrate that these two factors alone are insufficient to explain the lack of resolution in this order. To explore this further, we examined triplet frequencies among empirical gene trees and discovered some of them deviated significantly from those attributed to ILS and estimation error, suggesting gene flow as an additional and previously unappreciated phenomenon promoting gene tree variation in Malpighiales. Finally, we applied a novel method to quantify the relative contribution of these three primary sources of gene tree heterogeneity and demonstrated that ILS, gene tree estimation error, and gene flow contributed to 10.0%, 34.8%, and 21.4% of the variation, respectively. Together, our results suggest that a perfect storm of factors likely influence this lack of resolution, and further indicate that recalcitrant phylogenetic relationships like the backbone of Malpighiales may be better represented as phylogenetic networks. Thus, reducing such groups solely to existing models that adhere strictly to bifurcating trees greatly oversimplifies reality, and obscures our ability to more clearly discern the process of evolution.


2020 ◽  
Author(s):  
Fernando Lopes ◽  
Larissa R Oliveira ◽  
Amanda Kessler ◽  
Yago Beux ◽  
Enrique Crespo ◽  
...  

Abstract The phylogeny and systematics of fur seals and sea lions (Otariidae) have long been studied with diverse data types, including an increasing amount of molecular data. However, only a few phylogenetic relationships have reached acceptance because of strong gene-tree species tree discordance. Divergence times estimates in the group also vary largely between studies. These uncertainties impeded the understanding of the biogeographical history of the group, such as when and how trans-equatorial dispersal and subsequent speciation events occurred. Here we used high-coverage genome-wide sequencing for 14 of the 15 species of Otariidae to elucidate the phylogeny of the family and its bearing on the taxonomy and biogeographical history. Despite extreme topological discordance among gene trees, we found a fully supported species tree that agrees with the few well-accepted relationships and establishes monophyly of the genus Arctocephalus. Our data support a relatively recent trans-hemispheric dispersal at the base of a southern clade, which rapidly diversified into six major lineages between 3 to 2.5 Ma. Otaria diverged first, followed by Phocarctos and then four major lineages within Arctocephalus. However, we found Zalophus to be non-monophyletic, with California (Z. californianus) and Steller sea lions (Eumetopias jubatus) grouping closer than the Galapagos sea lion (Z. wollebaeki) with evidence for introgression between the two genera. Overall, the high degree of genealogical discordance was best explained by incomplete lineage sorting resulting from quasi-simultaneous speciation within the southern clade with introgresssion playing a subordinate role in explaining the incongruence among and within prior phylogenetic studies of the family.


2022 ◽  
Vol 12 ◽  
Author(s):  
Martha Kandziora ◽  
Petr Sklenář ◽  
Filip Kolář ◽  
Roswitha Schmickl

A major challenge in phylogenetics and -genomics is to resolve young rapidly radiating groups. The fast succession of species increases the probability of incomplete lineage sorting (ILS), and different topologies of the gene trees are expected, leading to gene tree discordance, i.e., not all gene trees represent the species tree. Phylogenetic discordance is common in phylogenomic datasets, and apart from ILS, additional sources include hybridization, whole-genome duplication, and methodological artifacts. Despite a high degree of gene tree discordance, species trees are often well supported and the sources of discordance are not further addressed in phylogenomic studies, which can eventually lead to incorrect phylogenetic hypotheses, especially in rapidly radiating groups. We chose the high-Andean Asteraceae genus Loricaria to shed light on the potential sources of phylogenetic discordance and generated a phylogenetic hypothesis. By accounting for paralogy during gene tree inference, we generated a species tree based on hundreds of nuclear loci, using Hyb-Seq, and a plastome phylogeny obtained from off-target reads during target enrichment. We observed a high degree of gene tree discordance, which we found implausible at first sight, because the genus did not show evidence of hybridization in previous studies. We used various phylogenomic analyses (trees and networks) as well as the D-statistics to test for ILS and hybridization, which we developed into a workflow on how to tackle phylogenetic discordance in recent radiations. We found strong evidence for ILS and hybridization within the genus Loricaria. Low genetic differentiation was evident between species located in different Andean cordilleras, which could be indicative of substantial introgression between populations, promoted during Pleistocene glaciations, when alpine habitats shifted creating opportunities for secondary contact and hybridization.


2017 ◽  
Author(s):  
Meng Wu ◽  
Jamie L. Kostyun ◽  
Matthew W. Hahn ◽  
Leonie Moyle

ABSTRACTPhylogenetic analyses of trait evolution can provide insight into the evolutionary processes that initiate and drive phenotypic diversification. However, recent phylogenomic studies have revealed extensive gene tree-species tree discordance, which can lead to incorrect inferences of trait evolution if only a single species tree is used for analysis. This phenomenon—dubbed “hemiplasy”—is particularly important to consider during analyses of character evolution in rapidly radiating groups, where discordance is widespread. Here we generate whole-transcriptome data for a phylogenetic analysis of 14 species in the plant genus Jaltomata (the sister clade to Solanum), which has experienced rapid, recent trait evolution, including in fruit and nectar color, and flower size and shape. Consistent with other radiations, we find evidence for rampant gene tree discordance due to incomplete lineage sorting (ILS) and several introgression events among the well-supported subclades. Since both ILS and introgression increase the probability of hemiplasy, we perform several analyses that take discordance into account while identifying genes that might contribute to phenotypic evolution. Despite discordance, the history of fruit color evolution in Jaltomata can be inferred with high confidence, and we find evidence of de novo adaptive evolution at individual genes associated with fruit color variation. In contrast, hemiplasy appears to strongly affect inferences about floral character transitions in Jaltomata, and we identify candidate loci that could arise either from multiple lineage-specific substitutions or standing ancestral polymorphisms. Our analysis provides a generalizable example of how to manage discordance when identifying loci associated with trait evolution in a radiating lineage.


AoB Plants ◽  
2020 ◽  
Vol 12 (3) ◽  
Author(s):  
Nannie L Persson ◽  
Ingrid Toresen ◽  
Heidi Lie Andersen ◽  
Jenny E E Smedmark ◽  
Torsten Eriksson

Abstract The genus Potentilla (Rosaceae) has been subjected to several phylogenetic studies, but resolving its evolutionary history has proven challenging. Previous analyses recovered six, informally named, groups: the Argentea, Ivesioid, Fragarioides, Reptans, Alba and Anserina clades, but the relationships among some of these clades differ between data sets. The Reptans clade, which includes the type species of Potentilla, has been noticed to shift position between plastid and nuclear ribosomal data sets. We studied this incongruence by analysing four low-copy nuclear markers, in addition to chloroplast and nuclear ribosomal data, with a set of Bayesian phylogenetic and Multispecies Coalescent (MSC) analyses. A selective taxon removal strategy demonstrated that the included representatives from the Fragarioides clade, P. dickinsii and P. fragarioides, were the main sources of the instability seen in the trees. The Fragarioides species showed different relationships in each gene tree, and were only supported as a monophyletic group in a single marker when the Reptans clade was excluded from the analysis. The incongruences could not be explained by allopolyploidy, but rather by homoploid hybridization, incomplete lineage sorting or taxon sampling effects. When P. dickinsii and P. fragarioides were removed from the data set, a fully resolved, supported backbone phylogeny of Potentilla was obtained in the MSC analysis. Additionally, indications of autopolyploid origins of the Reptans and Ivesioid clades were discovered in the low-copy gene trees.


Author(s):  
Diego F Morales-Briones ◽  
Gudrun Kadereit ◽  
Delphine T Tefarikis ◽  
Michael J Moore ◽  
Stephen A Smith ◽  
...  

Abstract Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations. [Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.]


Sign in / Sign up

Export Citation Format

Share Document