Targeted NGS for species level phylogenomics: “made to measure” or “one size fits all”?

PeerJ ◽

10.7717/peerj.3569 ◽

2017 ◽

Vol 5 ◽

pp. e3569 ◽

Cited By ~ 22

Author(s):

Malvina Kadlec ◽

Dirk U. Bellstedt ◽

Nicholas C. Le Maitre ◽

Michael D. Pirie

Keyword(s):

High Throughput Sequencing ◽

Gene Tree ◽

Flowering Plant ◽

Homologous Sequence ◽

Gene Trees ◽

Species Complexes ◽

Custom Made ◽

Targeted NGS ◽

Promising Source ◽

Rapid Radiations

Targeted high-throughput sequencing using hybrid-enrichment offers a promising source of data for inferring multiple, meaningfully resolved, independent gene trees suitable to address challenging phylogenetic problems in species complexes and rapid radiations. The targets in question can either be adopted directly from more or less universal tools, or custom made for particular clades at considerably greater effort. We applied custom made scripts to select sets of homologous sequence markers from transcriptome and WGS data for use in the flowering plant genus Erica (Ericaceae). We compared the resulting targets to those that would be selected both using different available tools (Hyb-Seq; MarkerMiner), and when optimising for broader clades of more distantly related taxa (Ericales; eudicots). Approaches comparing more divergent genomes (including MarkerMiner, irrespective of input data) delivered fewer and shorter potential markers than those targeted for Erica. The latter may nevertheless be effective for sequence capture across the wider family Ericaceae. We tested the targets delivered by our scripts by obtaining an empirical dataset. The resulting sequence variation was lower than that of standard nuclear ribosomal markers (that in Erica fail to deliver a well resolved gene tree), confirming the importance of maximising the lengths of individual markers. We conclude that rather than searching for “one size fits all” universal markers, we should improve and make more accessible the tools necessary for developing “made to measure” ones.

Targeted NGS for species level phylogenomics: “made to measure” or “one size fits all”?

10.7287/peerj.preprints.2763v2 ◽

2017 ◽

Author(s):

Malvina Kadlec ◽

Dirk U Bellstedt ◽

Nicholas C Le Maitre ◽

Michael D Pirie

Keyword(s):

High Throughput Sequencing ◽

Gene Tree ◽

Flowering Plant ◽

Homologous Sequence ◽

Gene Trees ◽

Species Complexes ◽

Custom Made ◽

Targeted NGS ◽

Promising Source ◽

Rapid Radiations

Targeted NGS for species level phylogenomics: “made to measure” or “one size fits all”?

10.7287/peerj.preprints.2763v1 ◽

2017 ◽

Author(s):

Malvina Kadlec ◽

Dirk U Bellstedt ◽

Nicholas C Le Maitre ◽

Michael D Pirie

Keyword(s):

High Throughput Sequencing ◽

Gene Tree ◽

Flowering Plant ◽

Homologous Sequence ◽

Gene Trees ◽

Species Complexes ◽

Custom Made ◽

Targeted NGS ◽

Promising Source ◽

Rapid Radiations

Targeted NGS for species level phylogenomics: “made to measure” or “one size fits all”?

10.7287/peerj.preprints.2763v3 ◽

2017 ◽

Author(s):

Malvina Kadlec ◽

Dirk U Bellstedt ◽

Nicholas C Le Maitre ◽

Michael D Pirie

Keyword(s):

High Throughput Sequencing ◽

Gene Tree ◽

Flowering Plant ◽

Homologous Sequence ◽

Gene Trees ◽

Species Complexes ◽

Custom Made ◽

Targeted NGS ◽

Promising Source ◽

Rapid Radiations

Target enrichment improves phylogenetic resolution in the genus Zanthoxylum (Rutaceae) and indicates both incomplete lineage sorting and hybridization events

10.1101/2021.04.12.439519 ◽

2021 ◽

Author(s):

Niklas Reichelt ◽

Jun Wen ◽

Claudia Paetzold ◽

Marc Appelhans

Keyword(s):

High Throughput Sequencing ◽

Incomplete Lineage Sorting ◽

Phylogenetic Analyses ◽

Gene Tree ◽

Single Copy ◽

Reticulate Evolution ◽

Gene Trees ◽

Target Enrichment ◽

Lineage Sorting ◽

Phylogenetic Resolution

Background and aims: Zanthoxylum L. is the only pantropical genus within Rutaceae, with a few species native to temperate eastern Asia and North America. Efforts using Sanger sequencing failed to resolve the backbone phylogeny of Zanthoxylum. In this study, we employed target enrichment high-throughput sequencing to improve resolution. Gene trees were examined for concordance and sectional classifications of Zanthoxylum were evaluated. Off-target reads were investigated to identify putative single-copy markers for bait refinement, and low-copy markers for evidence of putative hybridization events. Methods: We developed a custom bait set for target enrichment of 745 exons in Zanthoxylum and applied it to 45 Zanthoxylum species and one Tetradium species as the outgroup. Illumina reads were processed via the HybPhyloMaker pipeline. Phylogenetic inferences were conducted using coalescent and concatenated methods. Concordance was assessed using quartet sampling. Off-target reads were assembled and putative single- and low-copy genes were extracted. Additional phylogenetic analyses were performed based on these alignments. Key results: Four major clades are supported within Zanthoxylum: the African clade, the Z. asiaticum clade, the Asian-Pacific-Australian clade, and the American-eastern Asian clade. While overall support has improved, regions of conflict are similar to those previously observed. Gene tree discordances indicate a hybridization event in the ancestor of the Hawaiian lineage, and incomplete lineage sorting for the American backbone. Off-target putative single-copy genes largely confirm on-target results, and putative low-copy genes provide additional evidence for hybridization in the Hawaiian lineage. Only two of the five sections of Zanthoxylum are resolved as monophyletic. Conclusion: Target enrichment is suitable to assess phylogenetic relationships in Zanthoxylum. Our phylogenetic analyses reveal that current sectional classifications need revision. 
Quartet tree concordance indicates several instances of reticulate evolution. Off-target reads are proven useful to identify additional phylogenetically informative regions for bait refinement or gene tree based approaches.

Resolving Rapid Radiations Within Angiosperm Families Using Anchored Phylogenomics

10.1101/110296 ◽

2017 ◽

Cited By ~ 2

Author(s):

Étienne Léveillé-Bourret ◽

Julian R. Starr ◽

Bruce A. Ford ◽

Emily Moriarty Lemmon ◽

Alan R. Lemmon

Keyword(s):

Estimation Error ◽

Gene Tree ◽

Nuclear Gene ◽

Nuclear Data ◽

PCR Primers ◽

Nuclear Genes ◽

Flowering Plant ◽

Nuclear Ribosomal DNA ◽

Anchored Phylogenomics ◽

Rapid Radiations

Abstract Despite the promise that molecular data would provide a seemingly unlimited source of independent characters, many plant phylogenetic studies are based on only two regions, the plastid genome and nuclear ribosomal DNA (nrDNA). Their popularity can be explained by high copy numbers and universal PCR primers that make their sequences easily amplified and converted into parallel datasets. Unfortunately, their utility is limited by linked loci and limited characters resulting in low confidence in the accuracy of phylogenetic estimates, especially when rapid radiations occur. In another contribution on anchored phylogenomics in angiosperms, we presented flowering plant-specific anchored enrichment probes for hundreds of conserved nuclear genes and demonstrated their use at the level of all angiosperms. In this contribution, we focus on a common problem in phylogenetic reconstructions below the family level: weak or unresolved backbone due to rapid radiations (≤ 10 million years) followed by long divergence, using the Cariceae-Dulichieae-Scirpeae clade (CDS, Cyperaceae) as a test case. By comparing our nuclear matrix of 461 genes to a typical Sanger-sequence dataset consisting of a few plastid genes (matK, ndhF) and an nrDNA marker (ETS), we demonstrate that our nuclear data is fully compatible with the Sanger dataset and resolves short backbone internodes with high support in both concatenated and coalescence-based analyses. In addition, we show that nuclear gene tree incongruence is inversely proportional to phylogenetic information content, indicating that incongruence is mostly due to gene tree estimation error. This suggests that large numbers of conserved nuclear loci could produce more accurate trees than sampling rapidly evolving regions prone to saturation and long-branch attraction. 
The robust phylogenetic estimates obtained here, and high congruence with previous morphological and molecular analyses, are strong evidence for a complete tribal revision of CDS. The anchored hybrid enrichment probes used in this study should be similarly effective in other flowering plant groups. [Carex, coalescent based species tree, flowering plants, low-copy nuclear genes, low-level phylogenetics, universal hybrid enrichment probes]

Phylogenomic data reveal hard polytomies across the backbone of the large genus Solanum (Solanaceae)

10.1101/2021.03.25.436973 ◽

2021 ◽

Author(s):

Edeline Gagnon ◽

Rebecca Hilgenhof ◽

Andrés Orejuela ◽

Angela McDonnell ◽

Gaurav Sablok ◽

...

Keyword(s):

Phylogenetic Trees ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Flowering Plant ◽

Gene Trees ◽

Species Trees ◽

Nuclear Target ◽

Lineage Sorting ◽

Target Capture ◽

Gene Tree Discordance

Increased volumes of phylogenomic data have revealed incongruent topologies in gene trees, both between and within genomes across many organisms. Some of these incongruences indicate polytomies that may remain impossible to resolve. Here, widespread gene-tree discordance is uncovered along the backbone of Solanum, one of the largest flowering plant genera that includes the cultivated potato, tomato, and eggplant, as well as 24 minor crop plants. First, a densely sampled species-level phylogeny of Solanum is built using unpublished and publicly available Sanger sequences comprising 60% of all accepted species (742 spp.) and nine regions (ITS, waxy, and seven plastid markers). The robustness of the Sanger-based topology is tested by examining a plastome dataset with 140 species and a nuclear target-capture dataset with 39 species of Solanum. Clear incongruences between species trees generated from the supermatrix, plastome, and nuclear target-capture datasets are revealed. Discordance within the plastome and target-capture dataset are found at different evolutionary depths in three different areas along the backbone of these phylogenetic trees, with polytomy tests suggesting that most of these nodes have short branches and should be collapsed. We argue that incomplete lineage sorting due to rapid diversification is the most likely cause behind these polytomies, and that embracing the uncertainty that underlies them is crucial to depict the evolution of large and rapidly radiating lineages.

Phylogenomic Analysis of Concatenated Ultraconserved Elements Reveals the Recent Evolutionary Radiation of the Fairy Wrasses (Teleostei: Labridae: Cirrhilabrus)

Systematic Biology ◽

10.1093/sysbio/syab012 ◽

2021 ◽

Author(s):

Yi-Kai Tea ◽

Xin Xu ◽

Joseph D DiBattista ◽

Nathan Lo ◽

Peter F Cowman ◽

...

Keyword(s):

Phylogenetic Signal ◽

Estimation Error ◽

Gene Tree ◽

Molecular Dating ◽

Morphological Data ◽

Phylogenomic Analysis ◽

Gene Trees ◽

Data Set ◽

Ultraconserved Elements ◽

Species Complexes

Abstract The fairy wrasses (genus Cirrhilabrus) are among the most successful of the extant wrasse lineages (Teleostei: Labridae), with their 61 species accounting for nearly 10% of the family. Although species complexes within the genus have been diagnosed on the basis of coloration patterns and synapomorphies, attempts to resolve evolutionary relationships among these groups using molecular and morphological data have largely been unsuccessful. Here, we use a phylogenomic approach with a data set comprising 991 ultraconserved elements (UCEs) and mitochondrial COI to uncover the evolutionary history and patterns of temporal and spatial diversification of the fairy wrasses. Our analyses of phylogenetic signal suggest that most gene-tree incongruence is caused by estimation error, leading to poor resolution in a summary-coalescent analysis of the data. In contrast, analyses of concatenated sequences are able to resolve the major relationships of Cirrhilabrus. We determine the placements of species that were previously regarded as incertae sedis and find evidence for the nesting of Conniella, an unusual, monotypic genus, within Cirrhilabrus. Our relaxed-clock dating analysis indicates that the major divergences within the genus occurred around the Miocene–Pliocene boundary, followed by extensive cladogenesis of species complexes in the Pliocene–Pleistocene. Biogeographic reconstruction suggests that the fairy wrasses emerged within the Coral Triangle, with episodic fluctuations of sea levels during glacial cycles coinciding with shallow divergence events but providing few opportunities for more widespread dispersal. Our study demonstrates both the resolving power and limitations of UCEs across shallow timescales where there is substantial estimation error in individual gene trees. [Biogeography; concatenation; gene genealogy interrogation; gene trees; molecular dating; summary coalescent; UCEs.]

The Perfect Storm: Gene Tree Estimation Error, Incomplete Lineage Sorting, and Ancient Gene Flow Explain the Most Recalcitrant Ancient Angiosperm Clade, Malpighiales

Systematic Biology ◽

10.1093/sysbio/syaa083 ◽

2020 ◽

Author(s):

Liming Cai ◽

Zhenxiang Xi ◽

Emily Moriarty Lemmon ◽

Alan R Lemmon ◽

Austin Mast ◽

...

Keyword(s):

Gene Flow ◽

Incomplete Lineage Sorting ◽

Estimation Error ◽

Gene Tree ◽

Species Tree ◽

Flowering Plant ◽

Estimation Methods ◽

Lineage Sorting ◽

Tree Estimation ◽

Perfect Storm

Abstract The genomic revolution offers renewed hope of resolving rapid radiations in the Tree of Life. The development of the multispecies coalescent (MSC) model and improved gene tree estimation methods can better accommodate gene tree heterogeneity caused by incomplete lineage sorting (ILS) and gene tree estimation error stemming from the short internal branches. However, the relative influence of these factors in species tree inference is not well understood. Using anchored hybrid enrichment, we generated a data set including 423 single-copy loci from 64 taxa representing 39 families to infer the species tree of the flowering plant order Malpighiales. This order includes nine of the top ten most unstable nodes in angiosperms, which have been hypothesized to arise from the rapid radiation during the Cretaceous. Here, we show that coalescent-based methods do not resolve the backbone of Malpighiales and concatenation methods yield inconsistent estimations, providing evidence that gene tree heterogeneity is high in this clade. Despite high levels of ILS and gene tree estimation error, our simulations demonstrate that these two factors alone are insufficient to explain the lack of resolution in this order. To explore this further, we examined triplet frequencies among empirical gene trees and discovered some of them deviated significantly from those attributed to ILS and estimation error, suggesting gene flow as an additional and previously unappreciated phenomenon promoting gene tree variation in Malpighiales. Finally, we applied a novel method to quantify the relative contribution of these three primary sources of gene tree heterogeneity and demonstrated that ILS, gene tree estimation error, and gene flow contributed to 10.0%, 34.8%, and 21.4% of the variation, respectively. 
Together, our results suggest that a perfect storm of factors likely influence this lack of resolution, and further indicate that recalcitrant phylogenetic relationships like the backbone of Malpighiales may be better represented as phylogenetic networks. Thus, reducing such groups solely to existing models that adhere strictly to bifurcating trees greatly oversimplifies reality, and obscures our ability to more clearly discern the process of evolution.

Bayes Estimation of Species Divergence Times and Ancestral Population Sizes Using DNA Sequences From Multiple Loci

Genetics ◽

10.1093/genetics/164.4.1645 ◽

2003 ◽

Vol 164 (4) ◽

pp. 1645-1656 ◽

Cited By ~ 3

Author(s):

Bruce Rannala ◽

Ziheng Yang

Keyword(s):

DNA Sequences ◽

Gene Tree ◽

Species Tree ◽

Bayes Estimation ◽

Ancestral Population ◽

Divergence Times ◽

Gene Trees ◽

Species Divergence ◽

Multiple Loci ◽

Population Sizes

Abstract The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be ∼20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.
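
As an illustration of the mismatch-counting approach that the abstract above describes as highly biased (not of the Bayes MCMC method itself), the following is a minimal sketch for three species. It assumes the standard coalescent result that a gene tree disagrees with the species tree with probability P = (2/3)·exp(-T/(2N)), where T is the internode length in generations and N is the ancestral effective population size. The function names and the numbers in the usage example are hypothetical and are not taken from the paper.

```python
import math


def mismatch_probability(internode_generations, ancestral_ne):
    """P(gene tree topology differs from the species tree), three species.

    Assumes the standard coalescent expectation (2/3) * exp(-T / (2N)).
    """
    return (2.0 / 3.0) * math.exp(-internode_generations / (2.0 * ancestral_ne))


def ne_from_mismatch(internode_generations, mismatch_fraction):
    """Invert the formula above: N = -T / (2 * ln(3P / 2)).

    Only defined for 0 < P < 2/3; the observed fraction is treated as exact,
    which is precisely the simplification the abstract criticises.
    """
    if not 0.0 < mismatch_fraction < 2.0 / 3.0:
        raise ValueError("mismatch fraction must lie strictly between 0 and 2/3")
    return -internode_generations / (2.0 * math.log(1.5 * mismatch_fraction))


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    t_generations = 100_000
    true_ne = 20_000
    p = mismatch_probability(t_generations, true_ne)
    print(f"expected mismatch fraction: {p:.4f}")
    print(f"recovered N: {ne_from_mismatch(t_generations, p):.1f}")
    # A reconstruction error counted as a real mismatch raises P and the estimate.
    print(f"N with inflated P: {ne_from_mismatch(t_generations, p + 0.05):.1f}")
```

Because the estimate increases with P, any gene tree reconstruction error that is counted as a genuine mismatch pushes the estimated N upward under this formula, which is one way to see why integrating over uncertain gene trees is preferable.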
