From population genetics to phylogeny: Uses and limits of mitochondrial DNA

1990 ◽  
Vol 3 (1) ◽  
pp. 111 ◽  
Author(s):  
RH Crozier

Mitochondrial DNA (mtDNA) is clonally and maternally inherited in all animals and in most plants. Mitochondrial gene content is similar, although not identical, in all eukaryotes. Because of these characteristics, mtDNA has a number of features useful to systematists at all levels of evolutionary divergence. Clonal inheritance leads to unusual confidence in constructing gene trees, which are useful in population-level studies such as the detection of population subdivision. Maternal inheritance presents the opportunity to distinguish paternal from maternal gene flow. The clonal, or single-gene, nature of mtDNA inheritance raises the question of how closely gene trees are expected to converge on species trees. For closely related populations or species, it is desirable to use several genes to be sure that the correct species tree is discovered; this means that, although mtDNA will be the most precise guide to the species tree because of its lower effective population size, nuclear genes should also be used in such studies. Although restriction fragment length polymorphisms (RFLPs) dominated the field until recently, sequencing following DNA amplification by the polymerase chain reaction (PCR) is now easier and opens up the use of preserved specimens to molecular systematists. Because mitochondrial genes evolve at different rates, one of appropriate rate can be selected for almost any phylogenetic problem.
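The claim that mtDNA tracks the species tree more closely because of its lower effective population size can be made concrete with a standard three-taxon coalescent result. The sketch below is illustrative and not taken from the paper. It assumes a neutral, constant-size diploid population of N individuals with an equal sex ratio. Under that assumption mtDNA lineages are haploid and passed on only by the N/2 females, so they coalesce four times faster than nuclear ones. It uses P = 1 - (2/3)exp(-tau) for the probability that a rooted gene tree matches the species tree, where tau is the internal branch length in coalescent units. The function names and example numbers are hypothetical.

```python
import math


def coalescent_units(generations: float, n_diploid: float, marker: str) -> float:
    """Convert an internal branch length in generations to coalescent units.

    Assumes a constant-size diploid population with an equal sex ratio.
    """
    if marker == "nuclear":
        # Two nuclear gene copies coalesce at rate 1/(2N) per generation.
        return generations / (2 * n_diploid)
    if marker == "mtDNA":
        # Haploid and maternally inherited: N/2 females carry one copy each,
        # so two lineages coalesce at rate 1/(N/2), four times faster.
        return generations / (n_diploid / 2)
    raise ValueError(f"unknown marker type: {marker}")


def p_match(tau: float) -> float:
    """P(rooted gene tree topology equals the species tree), three taxa."""
    return 1.0 - (2.0 / 3.0) * math.exp(-tau)


if __name__ == "__main__":
    # Hypothetical values: 10,000 generations between splits, N = 20,000.
    for marker in ("nuclear", "mtDNA"):
        tau = coalescent_units(10_000, 20_000, marker)
        print(f"{marker}: tau = {tau:.2f}, P(match) = {p_match(tau):.3f}")
```

With these hypothetical values, tau is 0.25 for a nuclear locus and 1.0 for mtDNA. The corresponding match probabilities are roughly 0.48 and 0.75. Skewed sex ratios change the fourfold factor, so treat it as a rule of thumb.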

2020 ◽  
Author(s):  
Matthew H Van Dam ◽  
James B Henderson ◽  
Lauren Esposito ◽  
Michelle Trautwein

Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity; it only selects loci that are highly conserved, single copy, free of repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies, covering birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of the genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets, an improvement apparently driven by the increase in locus length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated from unmerged UCEs. As locus length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses. [Anchored hybrid enrichment; ants; ASTRAL; bait capture; carangimorph; Coleoptera; conserved nonexonic elements; exon capture; gene tree; Hymenoptera; mammal; phylogenomic markers; songbird; species tree; ultraconserved elements; weevils.]
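The merging step described above amounts to two operations. Loci are grouped by the gene they fall in, and then their alignments are concatenated. A minimal sketch follows. Several details are assumptions for illustration and do not come from the authors' pipeline: the input format, the gap-padding of missing taxa, and ordering members by start coordinate. The gene annotation itself is taken as given.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: locus name -> (gene or None, start coordinate,
# alignment as {taxon: aligned sequence}).
Alignment = Dict[str, str]
Locus = Tuple[Optional[str], int, Alignment]


def merge_cogenic(loci: Dict[str, Locus]) -> Dict[str, Alignment]:
    """Concatenate UCE alignments that fall in the same annotated gene.

    Unannotated (intronic/intergenic) loci are kept under their own name.
    Members of a gene are joined in order of start coordinate, and taxa
    missing from a member are padded with gaps so each result is rectangular.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, (gene, _start, _aln) in loci.items():
        groups[gene if gene is not None else name].append(name)

    merged: Dict[str, Alignment] = {}
    for key, members in groups.items():
        members.sort(key=lambda m: loci[m][1])  # genomic order within the gene
        taxa = sorted({taxon for m in members for taxon in loci[m][2]})
        parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
        for m in members:
            aln = loci[m][2]
            width = len(next(iter(aln.values())))  # alignment length of this UCE
            for taxon in taxa:
                parts[taxon].append(aln.get(taxon, "-" * width))
        merged[key] = {taxon: "".join(seqs) for taxon, seqs in parts.items()}
    return merged


if __name__ == "__main__":
    example = {
        "uce-1": ("geneX", 500, {"A": "ACGT", "B": "ACGA"}),
        "uce-2": ("geneX", 100, {"A": "TT", "C": "TA"}),
        "uce-3": (None, 0, {"A": "GG", "B": "GC"}),
    }
    for locus, aln in merge_cogenic(example).items():
        print(locus, aln)
```

In the example, the two UCEs in the hypothetical `geneX` become one longer locus. The unannotated `uce-3` passes through unchanged.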


2019 ◽  
Author(s):  
Matthew H. Van Dam ◽  
James B. Henderson ◽  
Lauren Esposito ◽  
Michelle Trautwein

Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes is agnostic to genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here we characterized UCEs from 12 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated 4 different sets of UCE markers by genomic category from 5 different studies, including birds, mammals, fish, Hymenoptera (ants, wasps and bees) and Coleoptera (beetles). Of the genes captured by UCEs, we find that many are represented by 2 or more UCEs, corresponding to non-overlapping segments of a single gene. We considered these UCEs to be non-independent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the effect of merging co-genic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all datasets. Increased locus length appears to drive this increase in bootstrap support. Additionally, based on our simulation study, gene trees generated from merged UCEs were more accurate than those generated from unmerged and randomly merged UCEs. This modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses.
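This version of the study also compares against randomly merged UCEs. The paper's exact randomization is not described here, so the following setup is an assumption. The loci are shuffled and cut into groups with the same sizes as the gene-based groups. Merged-locus counts and lengths then stay comparable, while gene identity is ignored. The function name and arguments are hypothetical.

```python
import random
from typing import List, Sequence


def random_merge_plan(locus_names: Sequence[str], group_sizes: Sequence[int],
                      seed: int = 0) -> List[List[str]]:
    """Partition loci at random into groups whose sizes mirror gene-based merging."""
    if sum(group_sizes) != len(locus_names):
        raise ValueError("group sizes must cover every locus exactly once")
    shuffled = list(locus_names)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    plan: List[List[str]] = []
    start = 0
    for size in group_sizes:
        plan.append(shuffled[start:start + size])
        start += size
    return plan


if __name__ == "__main__":
    # Hypothetical: gene-based merging produced groups of sizes 3, 2 and 1.
    print(random_merge_plan(["u1", "u2", "u3", "u4", "u5", "u6"], [3, 2, 1], seed=42))
```

Each random group would then be concatenated exactly as a gene-based group would. Any accuracy difference between the two is then not attributable to locus length alone.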


2020 ◽  
Vol 15 (1) ◽  
Author(s):  
Sarah Christensen ◽  
Erin K. Molloy ◽  
Pranjal Vachaspati ◽  
Ananya Yammanuru ◽  
Tandy Warnow

Motivation: Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single-gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree by using computational techniques along with auxiliary information, such as a reference species tree or sequencing data. However, gene trees and species trees can differ as a result of gene duplication and loss (GDL), incomplete lineage sorting (ILS), and other biological processes. Thus, gene tree correction methods need to take estimation error as well as gene tree heterogeneity into account. Many prior gene tree correction methods have been developed for the case where GDL is present.
Results: Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to ILS and/or horizontal gene transfer (HGT). We introduce TRACTION, a simple polynomial-time method that provably finds an optimal solution to the RF-optimal tree refinement and completion (RF-OTRC) problem. RF-OTRC seeks a refinement and completion of a singly-labeled gene tree with respect to a given singly-labeled species tree so as to minimize the Robinson-Foulds (RF) distance. Our extensive simulation study on 68,000 estimated gene trees shows that TRACTION matches or improves on the accuracy of well-established methods from the GDL literature when HGT and ILS are both present, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. We also show that a naive generalization of the RF-OTRC problem to multi-labeled trees is possible, but can produce misleading results where gene tree heterogeneity is due to GDL.
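TRACTION's objective is stated in terms of the Robinson-Foulds distance. The following is a minimal, hypothetical sketch and not TRACTION itself. It computes the RF distance between two singly-labeled trees on the same taxa by comparing their nontrivial bipartitions. Encoding trees as nested Python tuples is an assumption made for brevity. Some tools report half of this count, or a normalized value, instead.

```python
from typing import FrozenSet, Set, Union

# A tree is a taxon name (leaf) or a tuple of subtrees; the rooting is ignored.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial splits, each stored as the side lacking a reference taxon."""
    taxa = leaf_set(tree)
    ref = min(taxa)
    splits: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if ref not in clade else taxa - clade
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial and empty splits
            splits.add(side)
        return clade

    walk(tree)
    return splits


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("both trees must be on the same taxon set")
    return len(bipartitions(t1) ^ bipartitions(t2))


if __name__ == "__main__":
    gene_tree = (("A", "B"), "C", "D", "E")  # unresolved: only AB|CDE
    species_tree = ((("A", "B"), "C"), ("D", "E"))
    print(rf_distance(gene_tree, species_tree))  # 1
```

Suppose the polytomy in the gene tree is refined into ((A,B),C,(D,E)). The distance then drops to 0, which is the kind of refinement the RF-OTRC problem asks for. Completion, meaning adding missing taxa, is not covered by this sketch.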


2022 ◽  
Author(s):  
XiaoXu Pang ◽  
Da-Yong Zhang

The species studied in any evolutionary investigation generally constitute a very small proportion of all the species currently existing or that have gone extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimates of species trees has rarely been studied and is thus poorly understood. In this study, we use mathematical analysis and simulations to examine the robustness of species tree methods based on a multispecies coalescent model under gene flow from an extant or ghost lineage. We found that very low levels of extant or ghost introgression can result in anomalous gene trees (AGTs) on three-taxon rooted trees if accompanied by strong incomplete lineage sorting (ILS). In contrast, even massive introgression, with more than half of the recipient genome descending from the donor lineage, does not necessarily lead to AGTs. When an ingroup lineage acted as the donor of introgression, the time of root divergence among the investigated species was either underestimated or unaffected. An ingroup lineage is defined here as one that diverged no earlier than the most basal species under investigation. When outgroup ghost lineages acted as donors, the divergence time was generally overestimated. Under many conditions of ingroup introgression, the stronger the ILS, the more accurate the estimate of the root divergence time, although the topology of the species tree is more prone to bias from introgression.
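An anomalous gene tree (AGT) is a gene-tree topology that differs from the species tree yet is more probable than the species-tree topology. A deliberately simplified model conveys the two findings above. It is assumed here and is not the authors' formulation. The species tree is ((A,B),C) with internal branch `tau_species`. A fraction `gamma` of B's loci instead follow ((B,C),A) with internal branch `tau_introgressed`. Each component uses the standard three-taxon coalescent probabilities. The donor here is the extant taxon C. Ghost donors and divergence-time estimation are not modeled, and all names are hypothetical.

```python
import math
from typing import Dict

PAIRS = ("AB", "AC", "BC")  # a rooted three-taxon topology, named by its cherry


def three_taxon_probs(tau: float, species_pair: str) -> Dict[str, float]:
    """Multispecies-coalescent topology probabilities for a three-taxon species tree.

    species_pair names the species-tree cherry; tau is the internal branch
    length in coalescent units.
    """
    match = 1.0 - (2.0 / 3.0) * math.exp(-tau)
    other = (1.0 - match) / 2.0
    return {pair: match if pair == species_pair else other for pair in PAIRS}


def introgression_mixture(gamma: float, tau_species: float,
                          tau_introgressed: float) -> Dict[str, float]:
    """Gene-tree probabilities when a fraction gamma of B's loci follow ((B,C),A)."""
    base = three_taxon_probs(tau_species, "AB")
    intro = three_taxon_probs(tau_introgressed, "BC")
    return {pair: (1.0 - gamma) * base[pair] + gamma * intro[pair] for pair in PAIRS}


def has_anomalous_gene_tree(probs: Dict[str, float], species_pair: str = "AB") -> bool:
    """True if some non-species-tree topology is more probable than the species tree."""
    return any(p > probs[species_pair] for pair, p in probs.items() if pair != species_pair)


if __name__ == "__main__":
    # Strong ILS (tiny tau_species) plus 1% introgression.
    print(has_anomalous_gene_tree(introgression_mixture(0.01, 0.001, 2.0)))  # True
    # Weak ILS plus 90% introgression whose own history sorts poorly.
    print(has_anomalous_gene_tree(introgression_mixture(0.9, 3.0, 0.001)))  # False
```

In the first case, ILS alone leaves the species-tree topology barely favored, so 1% introgression tips the balance. In the second case, even 90% introgression leaves ((A,B),C) the most probable topology. This is because the introgressed loci are themselves spread almost evenly across the three topologies.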


2022 ◽  
Vol 12 ◽  
Author(s):  
Martha Kandziora ◽  
Petr Sklenář ◽  
Filip Kolář ◽  
Roswitha Schmickl

A major challenge in phylogenetics and phylogenomics is to resolve young, rapidly radiating groups. The fast succession of speciation events increases the probability of incomplete lineage sorting (ILS), so gene trees with different topologies are expected, leading to gene tree discordance; that is, not all gene trees represent the species tree. Phylogenetic discordance is common in phylogenomic datasets, and apart from ILS, its sources include hybridization, whole-genome duplication, and methodological artifacts. Despite a high degree of gene tree discordance, species trees are often well supported, and the sources of discordance are often not further addressed in phylogenomic studies. This can lead to incorrect phylogenetic hypotheses, especially in rapidly radiating groups. We chose the high-Andean Asteraceae genus Loricaria to shed light on the potential sources of phylogenetic discordance and to generate a phylogenetic hypothesis. Accounting for paralogy during gene tree inference, we used Hyb-Seq to generate a species tree based on hundreds of nuclear loci. We also obtained a plastome phylogeny from off-target reads produced during target enrichment. We observed a high degree of gene tree discordance, which at first sight seemed implausible because the genus had shown no evidence of hybridization in previous studies. We used various phylogenomic analyses (trees and networks) as well as D-statistics to test for ILS and hybridization, and developed these into a workflow for tackling phylogenetic discordance in recent radiations. We found strong evidence for ILS and hybridization within the genus Loricaria. Low genetic differentiation was evident between species located in different Andean cordilleras. This could indicate substantial introgression between populations during the Pleistocene glaciations, when alpine habitats shifted, creating opportunities for secondary contact and hybridization.
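D-statistics (the ABBA-BABA test) contrast two discordant site patterns for the four-taxon arrangement (((P1,P2),P3),O). Under ILS alone, the two patterns are expected in equal numbers. An excess of ABBA (D > 0) suggests gene flow between P2 and P3, and an excess of BABA (D < 0) suggests gene flow between P1 and P3. The sketch below is a generic, hypothetical illustration with one aligned sequence per taxon. It is not the study's workflow. Significance is usually assessed with a block jackknife, which is not shown.

```python
import math


def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """ABBA-BABA statistic D = (ABBA - BABA) / (ABBA + BABA) from aligned sequences.

    Only sites that are biallelic across the four sequences are counted; the
    outgroup allele is treated as ancestral. Sites with gaps or Ns are skipped.
    Returns NaN when no ABBA or BABA sites are found.
    """
    if not len(p1) == len(p2) == len(p3) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for column in zip(p1, p2, p3, outgroup):
        if any(base in "-N" for base in column) or len(set(column)) != 2:
            continue
        ancestral = column[3]
        derived = tuple(base != ancestral for base in column[:3])
        if derived == (False, True, True):    # P2 and P3 share the derived allele
            abba += 1
        elif derived == (True, False, True):  # P1 and P3 share the derived allele
            baba += 1
    total = abba + baba
    return (abba - baba) / total if total else math.nan


if __name__ == "__main__":
    print(patterson_d("AAAA", "GGGA", "GGGG", "AAAA"))
```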


2020 ◽  
Author(s):  
Michael J. Sanderson ◽  
Michelle M. McMahon ◽  
Mike Steel

Terraces in phylogenetic tree space are sets of trees with identical optimality scores for a given data set, arising from missing data. These were first described for multilocus phylogenetic data sets in the context of maximum parsimony inference and maximum likelihood inference under certain model assumptions. Here we show how the mathematical properties that lead to terraces extend to gene tree/species tree problems in which the gene trees are incomplete. Inference of species trees from either sets of gene family trees subject to duplication and loss, or allele trees subject to incomplete lineage sorting, can exhibit terraces in their solution space. First, we show conditions that lead to a new kind of terrace, which stems from subtree operations that appear in reconciliation problems for incomplete trees. Then we characterize when terraces of both types can occur when the optimality criterion for tree search is based on duplication, loss or deep coalescence scores. Finally, we examine the impact of assumptions about the causes of losses: whether they are due to imperfect sampling or true evolutionary deletion.
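The mechanism behind terraces can be illustrated with a single incomplete gene tree. Suppose a criterion scores a species tree only through its restriction to the taxa present in that gene tree. Then species trees that differ only in where an absent taxon attaches receive the same score. The sketch below checks when two species trees have identical restrictions. Its helper names are hypothetical, and it encodes rooted trees as nested tuples. Whether duplication, loss or deep-coalescence scores depend only on this restriction is exactly what the paper analyzes. Treat the example as an illustration of the idea, not of the paper's results.

```python
from typing import Collection, Optional, Union

# A rooted tree is a taxon name (leaf) or a tuple of child subtrees.
Tree = Union[str, tuple]


def restrict(tree: Tree, keep: Collection[str]) -> Optional[Tree]:
    """Prune taxa not in `keep` and suppress the resulting single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [sub for sub in (restrict(child, keep) for child in tree) if sub is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def canonical(tree: Optional[Tree]):
    """Child-order-independent form, so equal topologies compare equal."""
    if tree is None or isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def same_restriction(t1: Tree, t2: Tree, taxa: Collection[str]) -> bool:
    return canonical(restrict(t1, taxa)) == canonical(restrict(t2, taxa))


if __name__ == "__main__":
    s1 = (((("A", "B"), "C"), "E"), "D")  # E attached above (A,B,C)
    s2 = (((("A", "B"), "E"), "C"), "D")  # E attached next to (A,B)
    gene_taxa = {"A", "B", "C", "D"}      # the gene tree lacks E
    print(same_restriction(s1, s2, gene_taxa))          # True
    print(same_restriction(s1, s2, gene_taxa | {"E"}))  # False
```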


2020 ◽  
Author(s):  
Ishrat Tanzila Farah ◽  
Md Muktadirul Islam ◽  
Kazi Tasnim Zinat ◽  
Atif Hasan Rahman ◽  
Md Shamsuzzoha Bayzid

Species tree estimation from multi-locus datasets is extremely challenging, especially in the presence of gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Summary methods have been developed that first estimate gene trees and then combine them into a species tree by optimizing various criteria. In this study, we have formalized the concept of “phylogenomic terraces” in species tree space, where multiple species trees with distinct topologies may have exactly the same optimization score (quartet score, extra lineage score, etc.) with respect to a collection of gene trees. We investigated the presence and implications of terraces in species tree estimation from multi-locus data, taking ILS into account. We analyzed two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a consistent criterion for species tree estimation. Our experiments used a collection of datasets simulated under ILS. They indicate that MDC-based methods may achieve quartet consistency scores competitive with or identical to those of MQC, yet can be significantly less accurate than MQC in terms of tree topology. This demonstrates the presence and effect of phylogenomic terraces. This is the first known study that formalizes the concept of phylogenomic terraces in the context of species tree estimation from multi-locus data and reports the presence and implications of terraces in species tree estimation under ILS.
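For small examples, the quartet score optimized by MQC-based methods can be computed by brute force. For every gene tree and every set of four taxa, check whether the species tree induces the same quartet topology. The sketch below is a hypothetical illustration with O(n^4) cost per gene tree, not how practical methods compute the score. It assumes all gene trees share the species tree's taxa. Its example shows the simplest possible terrace: with two gene trees as input, either one used as the species tree gets the same score.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Tuple, Union

# Trees are nested tuples of taxon names; the root position is ignored.
Tree = Union[str, tuple]
Quartet = FrozenSet[FrozenSet[str]]


def clades(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the full taxon set and the leaf sets of all non-root internal nodes."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        found.append(clade)
        return clade

    taxa = walk(tree)
    return taxa, [c for c in found if c != taxa]


def induced_quartet(splits, taxa, q) -> Optional[Quartet]:
    """Topology xy|zw displayed on quartet q, or None if it is unresolved."""
    a, b, c, d = q
    for x, y, z, w in ((a, b, c, d), (a, c, b, d), (a, d, b, c)):
        pair, rest = {x, y}, {z, w}
        for side in splits:
            other = taxa - side
            if (pair <= side and rest <= other) or (rest <= side and pair <= other):
                return frozenset([frozenset(pair), frozenset(rest)])
    return None


def quartet_score(species_tree: Tree, gene_trees: Sequence[Tree]) -> int:
    """Count (gene tree, quartet) pairs on which the species tree agrees."""
    taxa, s_splits = clades(species_tree)
    score = 0
    for gene_tree in gene_trees:
        g_taxa, g_splits = clades(gene_tree)
        if g_taxa != taxa:
            raise ValueError("gene tree taxa differ from species tree taxa")
        for q in combinations(sorted(taxa), 4):
            g_q = induced_quartet(g_splits, taxa, q)
            if g_q is not None and g_q == induced_quartet(s_splits, taxa, q):
                score += 1
    return score


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    genes = [t1, t2]
    # Two different species trees with identical scores: a (tiny) terrace.
    print(quartet_score(t1, genes), quartet_score(t2, genes))  # 8 8
```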


2020 ◽  
Author(s):  
Rebecca S. Taylor ◽  
Ashley C. Bramwell ◽  
Rute Clemente-Carvalho ◽  
Nicholas A. Cairns ◽  
Frances Bonier ◽  
...  

The golden-crowned (Zonotrichia atricapilla) and white-crowned (Z. leucophrys) sparrows have been presented as a compelling case for rapid speciation. They display divergence in song and plumage, and their breeding ranges overlap, implying reproductive isolation, but they have almost identical mitochondrial genomes. Previous research proposed hybridization and subsequent mitochondrial introgression as an alternative explanation, but lacked robust nuclear gene trees to distinguish between introgression and incomplete lineage sorting. We test for signatures of these processes between Z. atricapilla and Z. leucophrys, and investigate the relationships among Z. leucophrys subspecies, using mitochondrial sequencing and a reduced-representation nuclear genomic dataset. Contrary to the paraphyly evident in mitochondrial gene trees, we confirmed the reciprocal monophyly of Z. atricapilla and Z. leucophrys using large panels of single nucleotide polymorphisms (SNPs). The pattern of cytonuclear discordance is consistent with limited, historical hybridization and mitochondrial introgression, rather than a recent origin and incomplete lineage sorting between recent sister species. We found evidence of nuclear phylogeographic structure within Z. leucophrys, with two distinct clades. Altogether, our results support the true species status of Z. atricapilla and Z. leucophrys, and indicate deeper divergences between the two species than inferred using mitochondrial markers. Our results demonstrate the limitations of relying solely on mitochondrial DNA for taxonomy, and raise questions about the possibility of selection on the mitochondrial genome during temperature oscillations (e.g., during the Pleistocene). Historical mitochondrial introgression facilitated by past environmental changes could cause erroneous dating of lineage splitting in other taxa when based on mitochondrial DNA alone.
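The contrast between paraphyletic mitochondrial trees and reciprocally monophyletic nuclear (SNP-based) trees comes down to a simple check on a rooted tree: does some clade contain exactly the samples of each species? A toy sketch follows. Its sample labels and trees are hypothetical and are not data from the study. A real analysis would also weigh node support and the placement of the root.

```python
from typing import FrozenSet, Iterable, List, Union

# A rooted tree is a sample label (leaf) or a tuple of child subtrees.
Tree = Union[str, tuple]


def clade_leaf_sets(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every clade in the rooted tree, including leaves and the root."""
    sets: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        sets.append(leaves)
        return leaves

    walk(tree)
    return sets


def is_monophyletic(tree: Tree, group: Iterable[str]) -> bool:
    """True if some clade of the rooted tree contains exactly the samples in group."""
    return frozenset(group) in clade_leaf_sets(tree)


def reciprocally_monophyletic(tree: Tree, group1: Iterable[str],
                              group2: Iterable[str]) -> bool:
    return is_monophyletic(tree, group1) and is_monophyletic(tree, group2)


if __name__ == "__main__":
    atr = {"atr1", "atr2"}          # hypothetical Z. atricapilla samples
    leu = {"leu1", "leu2", "leu3"}  # hypothetical Z. leucophrys samples
    nuclear = (("atr1", "atr2"), ("leu1", ("leu2", "leu3")))
    mito = (("atr1", "leu1"), ("atr2", ("leu2", "leu3")))
    print(reciprocally_monophyletic(nuclear, atr, leu))  # True
    print(reciprocally_monophyletic(mito, atr, leu))     # False
```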


2016 ◽  
Author(s):  
W. Bryan Jennings

Studies using multi-locus coalescent methods to infer species trees or historical demographic parameters usually require the assumption that the gene tree for each locus (or SNP) is genealogically independent of the gene trees of other sampled loci. In practice, however, researchers have used two different criteria to delimit independent loci in phylogenomic studies. The first criterion directly addresses the condition of genealogical independence of sampled loci. It considers the long-term effects of homologous recombination and effective population size on linkage between two loci. The second criterion considers only the single-generation effects of recombination in the meioses of individuals. It identifies sampled loci as independent of each other if they undergo Mendelian independent assortment. This study reviews methods that use these criteria to estimate the number of independent loci per genome. It also reviews intra-chromosomal “distance thresholds” that can be used to delimit independent loci in phylogenomic datasets. To compare the efficacy of each criterion, they are applied to two species (an invertebrate and a vertebrate) for which relevant genetic and genomic data are available. Although the independent assortment criterion is relatively easy to apply, the results of this study show that it is overly conservative and that its use would therefore unnecessarily restrict the sizes of phylogenomic datasets. It is therefore recommended that researchers refer only to genealogically independent loci when discussing the independent loci assumption in phylogenomics, and avoid terms that may conflate this assumption with independent assortment. Moreover, whenever feasible, researchers should use methods for delimiting putatively independent loci that take into account both homologous recombination and effective population size (i.e., long-term effective recombination).
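A back-of-the-envelope calculation illustrates why the independent assortment criterion is so much more conservative. This is an assumption and not necessarily one of the methods reviewed in the paper. For long-term linkage, the sketch uses Sved's approximation E[r^2] ≈ 1/(1 + 4·Ne·c) for linkage disequilibrium between loci separated by recombination fraction c. For independent assortment, it uses Haldane's map function r = 0.5·(1 - exp(-2d)), with d in Morgans. The cutoffs (r^2 ≤ 0.1, r ≥ 0.49) and the parameter values in the example are hypothetical.

```python
import math


def ld_threshold_bp(ne: float, morgans_per_bp: float, r2_max: float) -> float:
    """Distance (bp) beyond which expected r^2 = 1/(1 + 4*Ne*c) falls below r2_max.

    Assumes c = morgans_per_bp * distance, which is only reasonable while c is small.
    """
    if not 0.0 < r2_max < 1.0:
        raise ValueError("r2_max must lie strictly between 0 and 1")
    return (1.0 / r2_max - 1.0) / (4.0 * ne * morgans_per_bp)


def assortment_threshold_bp(morgans_per_bp: float, r_target: float = 0.49) -> float:
    """Distance (bp) at which Haldane's map function reaches recombination fraction r_target."""
    if not 0.0 < r_target < 0.5:
        raise ValueError("r_target must lie strictly between 0 and 0.5")
    morgans = -0.5 * math.log(1.0 - 2.0 * r_target)  # inverse of Haldane's map function
    return morgans / morgans_per_bp


if __name__ == "__main__":
    ne, rate = 10_000, 1e-8  # hypothetical: Ne = 10,000 and 1 cM/Mb
    print(f"long-term LD criterion: {ld_threshold_bp(ne, rate, 0.1):,.0f} bp")
    print(f"independent assortment: {assortment_threshold_bp(rate):,.0f} bp")
```

With these hypothetical values, the long-term criterion gives a spacing of about 22.5 kb. Near-free assortment instead requires roughly 196 Mb, which in practice amounts to placing loci on different chromosomes.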


2017 ◽  
Author(s):  
Damien M. de Vienne ◽  
Fran Supek ◽  
Toni Gabaldon

Background: Overtraining occurs when an optimization process is applied for too many steps, leading to a model that describes noise in addition to the signal present in the data. This effect may affect typical approaches to species tree reconstruction that apply maximum likelihood optimization to a small sample of concatenated genes. In this context, overtraining may result in trees that better describe the specific evolutionary history of the sampled genes than the sought evolutionary relationships among the species.
Results: Using a cross-validation-like approach on real and simulated datasets, we showed that overtraining occurs in a significant fraction of cases. It leads to species trees that are more distant from a gold-standard reference tree than a solution previously considered (and rejected) in the optimization process. However, we show that the shape of the likelihood curve is informative of the optimal stopping point. As expected, overtraining is aggravated in smaller gene samples and in datasets with increased levels of topological variation among gene trees. It also occurs in controlled, simulated scenarios where a common underlying topology is enforced.
Conclusions: Overtraining is frequent in species tree reconstruction. It leads to a final tree that describes the evolutionary relationships of the species under study less well than an earlier (and rejected) solution encountered during the likelihood optimization process. This result should help the development of specific methods for species tree reconstruction in the future, and may improve our understanding of the complexity of tree likelihood landscapes.
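The cross-validation-like idea can be sketched as follows. For the tree accepted at each step of the likelihood search, record its log-likelihood on genes held out from the optimization. Flag overtraining when an earlier, rejected tree scores better on held-out data than the final one. This is a hypothetical illustration that assumes such held-out scores are already available. Gene partitioning and tree optimization are not shown, and may differ from the authors' procedure.

```python
from typing import Sequence


def best_step(heldout_loglik: Sequence[float]) -> int:
    """Index of the first optimization step whose tree fits the held-out genes best."""
    if not heldout_loglik:
        raise ValueError("need at least one optimization step")
    return max(range(len(heldout_loglik)), key=lambda i: heldout_loglik[i])


def is_overtrained(heldout_loglik: Sequence[float]) -> bool:
    """True if some earlier tree strictly beats the final tree on held-out genes."""
    return len(heldout_loglik) > 1 and max(heldout_loglik[:-1]) > heldout_loglik[-1]


if __name__ == "__main__":
    # Hypothetical trace: training log-likelihood rises at every step...
    train = [-1000.0, -950.0, -940.0, -938.0]
    # ...but the held-out fit peaks at step 1 and then declines.
    heldout = [-1200.0, -1150.0, -1160.0, -1165.0]
    print(best_step(heldout), is_overtrained(heldout))  # 1 True
```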

