Does more sequence data improve estimates of galliform phylogeny? Analyses of a rapid radiation using a complete data matrix

2013 ◽  
Author(s):  
Rebecca T Kimball ◽  
Edward L Braun

The resolution of rapid evolutionary radiations or “bushes” in the tree of life has been one of the most difficult and interesting problems in phylogenetics. The avian order Galliformes appears to have undergone several rapid radiations that have limited the resolution of prior studies and obscured the position of taxa important both agriculturally and as model systems (chicken, turkey, Japanese quail). Here we present analyses of a multi-locus data matrix comprising over 15,000 sites, primarily from nuclear introns but also including three mitochondrial regions, from 46 galliform taxa with all gene regions sampled for all taxa. The increased sampling of unlinked nuclear genes provided strong bootstrap support for all but a small number of relationships. Coalescent-based methods to combine individual gene trees and analyses of datasets independent of published data indicated that this well-supported topology is likely to reflect the galliform species tree. Some of the key findings include support for a second major clade within the core phasianids that includes the chicken and Japanese quail and clarification of the phylogenetic relationships of turkey. Jackknifed datasets suggested that there is an advantage to sampling many independent regions across the genome rather than obtaining long sequences for a small number of loci, possibly reflecting the differences among gene trees that differ due to incomplete lineage sorting. Despite the novel insights we obtained using this increased sampling of gene regions, some nodes remain unresolved, likely due to periods of rapid diversification. Resolving these remaining groups will likely require sequencing a very large number of gene regions, but our analyses now appear to support a robust backbone for this order.
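
The locus jackknife mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the resampling design, so the function name `jackknife_loci`, the subset size, and the number of replicates are hypothetical choices for illustration only. The sketch draws subsets of loci without replacement; each subset would then be concatenated and analysed separately.

```python
import random

def jackknife_loci(locus_names, keep, replicates, seed=0):
    """Return `replicates` random subsets of `keep` loci, drawn without replacement.

    Hypothetical helper: the study's actual jackknife design is not given
    in the abstract.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(locus_names, keep)) for _ in range(replicates)]

# Hypothetical locus labels; the real matrix uses nuclear introns and mitochondrial regions.
example_loci = [f"locus_{i:02d}" for i in range(1, 11)]
subsets = jackknife_loci(example_loci, keep=5, replicates=3)
```

Comparing trees estimated from many short loci against trees from a few long loci of equal total length would be one way to probe the sampling question the abstract raises.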


Zootaxa ◽  
2020 ◽  
Vol 4750 (3) ◽  
pp. 328-348 ◽  
Author(s):  
DAVID A. GRAY ◽  
DAVID B. WEISSMAN ◽  
JEFFREY A. COLE ◽  
EMILY MORIARTY LEMMON

We present the first comprehensive molecular phylogeny of Gryllus field cricket species found in the United States and Canada, select additional named Gryllus species found in Mexico and the Bahamas, plus the European field cricket G. campestris Linnaeus and the Afro-Eurasian cricket G. bimaculatus De Geer. Acheta, Teleogryllus, and Nigrogryllus were used as outgroups. Anchored hybrid enrichment was used to generate 492,531 base pairs of DNA sequence from 563 loci. RAxML analysis of concatenated sequence data and Astral analysis of gene trees gave broadly congruent results, especially for older branches and overall tree structure. The North American Gryllus are monophyletic with respect to the two Old World taxa; certain sub-groups show rapid recent divergence. This is the first Anchored Hybrid Enrichment study of an insect group done for closely related species within a single genus, and the results illustrate the challenges of reconstructing the evolutionary history of young rapidly diverged taxa when both incomplete lineage sorting and probable hybridization are at play. Because Gryllus field crickets have been used extensively as a model system in evolutionary ecology, behavior, neuro-physiology, speciation, and life-history and life-cycle evolution, these results will help inform, interpret, and guide future research in these areas. 


2019 ◽  
Vol 68 (6) ◽  
pp. 937-955 ◽  
Author(s):  
Alison Cloutier ◽  
Timothy B Sackton ◽  
Phil Grayson ◽  
Michele Clamp ◽  
Allan J Baker ◽  
...  

Abstract Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totaling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4274 CR1 retroelements identified from multispecies whole-genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other nonostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. 
Coalescent simulations and topology tests indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.


Author(s):  
Paul Zaharias ◽  
Tandy Warnow

With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the last few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g., incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements.


2019 ◽  
Vol 69 (3) ◽  
pp. 431-444 ◽  
Author(s):  
Emily J Roycroft ◽  
Adnan Moussalli ◽  
Kevin C Rowe

Abstract The estimation of robust and accurate measures of branch support has proven challenging in the era of phylogenomics. In data sets of potentially millions of sites, bootstrap support for bifurcating relationships around very short internal branches can be inappropriately inflated. Such overestimation of branch support may be particularly problematic in rapid radiations, where phylogenetic signal is low and incomplete lineage sorting severe. Here, we explore this issue by comparing various branch support estimates under both concatenated and coalescent frameworks, in the recent radiation of Australo-Papuan murine rodents (Muridae: Hydromyini). Using nucleotide sequence data from 1245 independent loci and several phylogenomic inference methods, we unequivocally resolve the majority of genus-level relationships within Hydromyini. However, at four nodes we recover inconsistency in branch support estimates both within and among concatenated and coalescent approaches. In most cases, concatenated likelihood approaches using standard fast bootstrap algorithms did not detect any uncertainty at these four nodes, regardless of partitioning strategy. However, we found this could be overcome with two-stage resampling, that is, across genes and sites within genes (using -bsam GENESITE in IQ-TREE). In addition, low confidence at recalcitrant nodes was recovered using UFBoot2, a recent revision to the bootstrap protocol in IQ-TREE, but this depended on partitioning strategy. Summary coalescent approaches also failed to detect uncertainty under some circumstances. For each of four recalcitrant nodes, an equivalent (or close to equivalent) number of genes were in strong support (>75% bootstrap) of both the primary and at least one alternative topological hypothesis, suggesting notable phylogenetic conflict among loci not detected using some standard branch support metrics. 
Recent debate has focused on the appropriateness of concatenated versus multigenealogical approaches to resolving species relationships, but less so on accurately estimating uncertainty in large data sets. Our results demonstrate the importance of employing multiple approaches when assessing confidence and highlight the need for greater attention to the development of robust measures of uncertainty in the era of phylogenomics.
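
The two-stage resampling described above (across genes, then across sites within genes) can be sketched as follows. This is an illustrative sketch of the general idea, not IQ-TREE's implementation; the function name `gene_site_bootstrap` and the toy alignments are assumptions made for the example.

```python
import random

def gene_site_bootstrap(alignments, rng):
    """Build one two-stage bootstrap replicate.

    alignments: dict mapping locus name -> dict mapping taxon -> aligned sequence
    (all sequences within a locus have equal length).
    Returns a list of (locus name, resampled alignment) pairs.
    """
    loci = list(alignments)
    # Stage 1: resample loci with replacement.
    sampled_loci = [rng.choice(loci) for _ in loci]
    replicate = []
    for locus in sampled_loci:
        aln = alignments[locus]
        length = len(next(iter(aln.values())))
        # Stage 2: resample alignment columns with replacement within the locus.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {taxon: "".join(seq[c] for c in cols) for taxon, seq in aln.items()}
        replicate.append((locus, resampled))
    return replicate

# Toy data (hypothetical): two loci, two taxa.
toy = {
    "locusA": {"t1": "ACGT", "t2": "ACGA"},
    "locusB": {"t1": "GG", "t2": "GC"},
}
replicate = gene_site_bootstrap(toy, random.Random(42))
```

Because whole loci are drawn in the first stage, loci that disagree about a node can be over- or under-represented in a replicate, which is one reason this scheme can expose uncertainty that site-only resampling of a long concatenated matrix tends to hide.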


The Auk ◽  
2003 ◽  
Vol 120 (3) ◽  
pp. 889-907
Author(s):  
Kim T. Scribner ◽  
Sandra L. Talbot ◽  
John M. Pearce ◽  
Barbara J. Pierson ◽  
Karen S. Bollinger ◽  
...  

Abstract Using molecular genetic markers that differ in mode of inheritance and rate of evolution, we examined levels and partitioning of genetic variation for seven nominal subspecies (11 breeding populations) of Canada Geese (Branta canadensis) in western North America. Gene trees constructed from mtDNA control region sequence data show that subspecies of Canada Geese do not have distinct mtDNA. Large- and small-bodied forms of Canada Geese were highly diverged (0.077 average sequence divergence) and represent monophyletic groups. A majority (65%) of 20 haplotypes resolved were observed in single breeding locales. However, within both large- and small-bodied forms certain haplotypes occurred across multiple subspecies. Population trees for both nuclear (microsatellites) and mitochondrial markers were generally concordant and provide resolution of population and subspecific relationships indicating incomplete lineage sorting. All populations and subspecies were genetically diverged, but to varying degrees. Analyses of molecular variance, nested-clade analyses, and coalescence-based analyses of mtDNA suggest that both historical (past fragmentation) and contemporary forces have been important in shaping current spatial genetic distributions. Gene flow appears to be ongoing though at different rates, even among currently recognized subspecies. The efficacy of current subspecific taxonomy is discussed in light of hypothesized historical vicariance and current demographic trends of management and conservation concern.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Shahed Uddin Ahmed Shazib ◽  
Peter Vďačný ◽  
Marek Slovák ◽  
Eleni Gentekaki ◽  
Mann Kyoon Shin

Abstract The ciliate genus Spirostomum comprises eight morphospecies, inhabiting diverse aquatic environments worldwide, where they can be used as water quality indicators. Although Spirostomum species are relatively easily identified using morphological methods, the previous nuclear rDNA-based phylogenies indicated several conflicts in morphospecies delineation. Moreover, the single locus phylogenies and previous analytical approaches could not unambiguously resolve phylogenetic relationships among Spirostomum morphospecies. Here, we attempt to investigate species boundaries and evolutionary history of Spirostomum taxa, using 166 new sequences from multiple populations employing one mitochondrial locus (CO1 gene) and two nuclear loci (rRNA operon and alpha-tubulin gene). In accordance with previous studies, relationships among the eight Spirostomum morphospecies were poorly supported statistically in individual gene trees. To overcome this problem, we utilised for the first time in ciliates the Bayesian coalescent approach, which accounts for ancestral polymorphisms, incomplete lineage sorting, and recombination. This strategy enabled us to robustly resolve deep relationships between Spirostomum species and to support the hypothesis that taxa with compact macronucleus and taxa with moniliform macronucleus each form a distinct lineage. Bayesian coalescent-based delimitation analyses strongly statistically supported the traditional morphospecies concept but also indicated that there are two S. minus-like cryptic species and S. teres is non-monophyletic. Spirostomum teres was very likely defined by a set of ancestral features of lineages that also gave rise to S. yagiui and S. dharwarensis. However, molecular data from type populations of the morphospecies S. minus and S. teres are required to unambiguously resolve the taxonomic problems.


2018 ◽  
Author(s):  
Alison Cloutier ◽  
Timothy B. Sackton ◽  
Phil Grayson ◽  
Michele Clamp ◽  
Allan J. Baker ◽  
...  

Abstract Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totalling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4,274 CR1 retroelements identified from multi-species whole genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other non-ostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. Coalescent simulations indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. 
Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.


Author(s):  
Huateng Huang ◽  
Jeet Sukumaran ◽  
Stephen A Smith ◽  
L. Lacey Knowles

Despite recent efforts that have produced data sets with hundreds and thousands of gene regions to resolve regions of the tree of life, recalcitrant nodes persist, and disagreement among genes and between individual gene trees and species trees is common. A number of evolutionary processes contribute to these conflicts between gene trees and species trees, including deep coalescence (lineage sorting), horizontal gene transfer, and hybridization. For some of these processes, we have very powerful and sophisticated models, such as the multispecies coalescent (MSC), that use the conflict in the gene trees as information that contributes materially to correctly inferring the species tree. However, use of these models requires a priori recognition of the relevant processes, which are often unknown for empirical datasets. Here we propose a new perspective to not only identify the cause of discord among gene trees, but also use it to classify loci by the underlying cause of discord to identify subsets of loci for analysis with the goal of improving phylogenetic accuracy. This approach differs fundamentally from all other criteria used for making decisions about which loci to include in a phylogenetic analysis. In particular, the choice of loci in this framework is based on identifying those that reflect descent from a common ancestor (as opposed to other processes), and thereby can minimize problems with model misspecification. We present preliminary results that demonstrate the potential of this framework in distinguishing the lateral gene transfer (LGT) process from the incomplete lineage sorting (ILS) process, as implemented in a new software package CLASSIPHY, while also highlighting areas for further development and testing. 
We discuss (i) why such methods are critical to improving phylogenetic accuracy given the increased complexity of genomic/transcriptomic datasets, and (ii) why characterizing patterns of discordance and the contribution of different processes to this discordance is itself of interest for generating hypotheses about the role of lateral gene transfer, gene duplication, and incomplete lineage sorting during the divergence of different taxa.

