Dissecting phylogenetic signal and accounting for bias in whole-genome data sets: a case study of the Metazoa

2015 ◽  
Author(s):  
Marek L Borowiec ◽  
Ernest K Lee ◽  
Joanna C Chiu ◽  
David C Plachetzki

Transcriptome-enabled phylogenetic analyses have dramatically improved our understanding of metazoan phylogeny in recent years, although several important questions remain. The branching order near the base of the tree is one such outstanding issue. To address this question we assemble a novel data set comprised of 1,080 orthologous loci derived from 36 publicly available genomes and dissect the phylogenetic signal present in each individual partition. The size of this data set allows for a closer look at the potential biases and sources of non-phylogenetic signal. We assessed a range of measures for each data partition including information content, saturation, rate of evolution, long-branch score, and taxon occupancy and explored how each of these characteristics impacts phylogeny estimation. We then used these data to prepare a reduced set of partitions that fit an optimal set of criteria and are amenable to the most appropriate and computationally intensive analyses using site-heterogeneous models of sequence evolution. We also employed several strategies to examine the potential for long-branch attraction to bias our inferences. All of our analyses support Ctenophora as the sister lineage to other Metazoa, although support for this relationship varies among analyses. We find no support for the traditional view uniting the ctenophores and Cnidaria (jellies, anemones, corals, and kin). We also examine phylogenetic placement of myriapods (centipedes and millipedes) and find it more sensitive to the type of analysis and data used. Our study provides a workflow for minimizing systematic bias in whole genome-based phylogenetic analyses.
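One of the per-partition measures named above, taxon occupancy, can be illustrated with a minimal sketch. The data layout (a dict of taxon name to aligned sequence per locus), the set of characters treated as missing, and the 0.75 cutoff are all assumptions made for illustration; they are not the criteria or code used in the study.

```python
from typing import Dict, List

# Characters treated as missing data (an assumption for this sketch).
MISSING_CHARS = set("-?X")


def taxon_occupancy(alignment: Dict[str, str], all_taxa: List[str]) -> float:
    """Fraction of taxa that have at least one non-missing residue in a locus."""
    present = 0
    for taxon in all_taxa:
        seq = alignment.get(taxon, "")  # taxon absent from the locus -> empty
        if any(ch.upper() not in MISSING_CHARS for ch in seq):
            present += 1
    return present / len(all_taxa)


def filter_loci(
    loci: Dict[str, Dict[str, str]],
    all_taxa: List[str],
    min_occupancy: float = 0.75,  # hypothetical cutoff
) -> List[str]:
    """Names of loci whose taxon occupancy meets the cutoff, in input order."""
    return [
        name
        for name, alignment in loci.items()
        if taxon_occupancy(alignment, all_taxa) >= min_occupancy
    ]


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    loci = {
        "locus1": {"A": "MKV", "B": "MKI", "C": "M-V", "D": "---"},
        "locus2": {"A": "MK", "B": "--"},
    }
    for name, aln in loci.items():
        print(name, taxon_occupancy(aln, taxa))
    print(filter_loci(loci, taxa))
```

The other measures listed in the abstract (saturation, rate of evolution, long-branch score) would need tree or distance estimates and are not sketched here.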

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Tao Zhao ◽  
Arthur Zwaenepoel ◽  
Jia-Yu Xue ◽  
Shu-Min Kao ◽  
Zhen Li ◽  
...  

Abstract Plant genomes vary greatly in size, organization, and architecture. Such structural differences may be highly relevant for inference of genome evolution dynamics and phylogeny. Indeed, microsynteny, the conservation of local gene content and order, is recognized as a valuable source of phylogenetic information, but its use for the inference of large phylogenies has been limited. Here, by combining synteny network analysis, matrix representation, and maximum likelihood phylogenetic inference, we provide a way to reconstruct phylogenies based on microsynteny information. Both simulations and use of empirical data sets show our method to be accurate, consistent, and widely applicable. As an example, we focus on the analysis of a large-scale whole-genome data set for angiosperms, including more than 120 available high-quality genomes, representing more than 50 different plant families and 30 orders. Our ‘microsynteny-based’ tree is largely congruent with phylogenies proposed based on more traditional sequence alignment-based methods and current phylogenetic classifications but differs for some long-contested and controversial relationships. For instance, our synteny-based tree finds Vitales as early diverging eudicots, Saxifragales within superasterids, and magnoliids as sister to monocots. We discuss how synteny-based phylogenetic inference can complement traditional methods and could provide additional insights into some long-standing controversial phylogenetic relationships.
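The "matrix representation" step can be pictured as coding each synteny cluster as one character per genome. The sketch below assumes a binary presence/absence coding and a relaxed PHYLIP-style text output; the function names, the cluster format (a list of sets of genome names), and the coding itself are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List, Set


def synteny_matrix(clusters: List[Set[str]], genomes: List[str]) -> Dict[str, str]:
    """One row per genome, one column per cluster: '1' if the genome has a
    member in that synteny cluster, otherwise '0'."""
    return {
        genome: "".join("1" if genome in members else "0" for members in clusters)
        for genome in genomes
    }


def to_phylip(rows: Dict[str, str]) -> str:
    """Serialize the matrix as relaxed PHYLIP text (name, space, characters)."""
    n_chars = len(next(iter(rows.values())))
    lines = [f"{len(rows)} {n_chars}"]
    lines += [f"{name} {chars}" for name, chars in rows.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy clusters: each set lists the genomes represented in one cluster.
    clusters = [
        {"Vitis", "Arabidopsis"},
        {"Oryza"},
        {"Vitis", "Oryza", "Arabidopsis"},
    ]
    genomes = ["Arabidopsis", "Oryza", "Vitis"]
    print(to_phylip(synteny_matrix(clusters, genomes)))
```

A matrix of this kind could then be passed to a maximum likelihood program that supports binary character models; which program and model to use is left open here.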


Genetics ◽  
1996 ◽  
Vol 144 (4) ◽  
pp. 1817-1833 ◽  
Author(s):  
Michel C Milinkovitch ◽  
Richard G LeDuc ◽  
Jun Adachi ◽  
Frederic Farnir ◽  
Michel Georges ◽  
...  

Different phylogenetic analyses of the same genetic data set can yield conflicting results, depending on the choice of parameter settings and included taxa. This is particularly true in studies involving data sets where levels of homoplasy are high and likely to obscure the phylogenetic signal. Filtering of this phylogenetic noise can be attempted, with varying degrees of success, by using different weighting schemes and ingroup/outgroup choices, but it can be difficult to decide objectively which approach is best. Using a cytochrome b data set from cetaceans and artiodactyls, we examined the effects of a suite of parameter settings on the outcome of phylogenetic analyses. We tested 2968 combinations among the seven parameters that most often vary among phylogenetic studies. It is our contention that this sensitivity analysis identifies portions of the multidimensional parameter space where phylogenetic signal is most reliably recovered, and simple rules are given to guide the choice of settings. Portions of this data set have been used in previous studies with conflicting results, namely the monophyly vs. paraphyly of one of the two major recognized cetacean suborders, the toothed whales. This analysis strongly supports the sister relationship between sperm whales and baleen whales.


2017 ◽  
Author(s):  
Marek L. Borowiec

Abstract The evolution of the suite of morphological and behavioral adaptations underlying the ecological success of army ants has been the subject of considerable debate. This “army ant syndrome” has been argued to have arisen once or multiple times within the ant subfamily Dorylinae. To address this question I generated data from 2,166 loci and a comprehensive taxon sampling for a phylogenetic investigation. Most analyses show strong support for convergent evolution of the army ant syndrome in the Old and New World but certain relationships are sensitive to analytics. I examine the signal present in this data set and find that conflict is diminished when only loci less likely to violate common phylogenetic model assumptions are considered. I also provide a temporal and spatial context for doryline evolution with time-calibrated, biogeographic, and diversification rate shift analyses. This study underscores the need for cautious analysis of phylogenomic data and calls for more efficient algorithms employing better-fitting models of molecular evolution. Significance Recent interpretation of army ant evolution holds that army ant behavior and morphology originated only once within the subfamily Dorylinae. An inspection of phylogenetic signal in a large new data set shows that support for this hypothesis may be driven by bias present in the data. Convergent evolution of the army ant syndrome is consistently supported when sequences violating assumptions of a commonly used model of sequence evolution are excluded from the analysis. This hypothesis also fits with a simple scenario of doryline biogeography. These results highlight the importance of careful evaluation of signal and conflict within phylogenomic data sets, even when taxon sampling is comprehensive.


2020 ◽  
Vol 94 (11) ◽  
Author(s):  
Shengzhong Xu ◽  
Liang Zhou ◽  
Xiaosha Liang ◽  
Yifan Zhou ◽  
Hao Chen ◽  
...  

ABSTRACT Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses infecting unicellular eukaryotes. Except for a few isolated virophages characterized by parasitization mechanisms, features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, have been identified based on environmental metagenomic investigation. Both genomic and phylogenetic analyses indicate that Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and Dishui Lake large DNA viruses are affiliated with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) and protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with closer homology to that of large alga viruses than to that of giant protozoan viruses. Furthermore, the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1, based on codon usage as well as oligonucleotide frequency and correlation analyses. Surprisingly, a nonhomologous CRISPR-Cas like system is found in DSLLAV1, which appears to protect DSLLAV1 from the parasitization of DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems of green algae, large green alga virus (Phycodnaviridae- and Mimiviridae-related), and virophage exist in Dishui Lake, which will contribute to further deep investigations of the evolutionary interaction of virophages and large alga viruses as well as of the essential roles that the CVv plays in the ecology of algae. IMPORTANCE Virophages are small parasitizing viruses of large/giant viruses. 
To our knowledge, the few isolated virophages all parasitize giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with hosts, here named the cell-virus-virophage (CVv) system. However, the CVv system remains largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China. Consequently, four novel large alga viruses and seven virophages were discovered to coexist in Dishui Lake. Surprisingly, a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages was identified based on genetic link, genomic signature, and CRISPR system analyses. Meanwhile, a nonhomologous CRISPR-like system was found in Dishui Lake large alga viruses, which appears to protect the virus host from the infection of Dishui Lake virophages (DSLVs). These findings are critical to give insight into the potential significance of CVv in global evolution and ecology.


IMA Fungus ◽  
2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Felix Grewe ◽  
Claudio Ametrano ◽  
Todd J. Widhelm ◽  
Steven Leavitt ◽  
Isabel Distefano ◽  
...  

Abstract Parmeliaceae is the largest family of lichen-forming fungi with a worldwide distribution. We used a target enrichment data set and a qualitative selection method for 250 out of 350 genes to infer the phylogeny of the major clades in this family including 81 taxa, with both subfamilies and all seven major clades previously recognized in the subfamily Parmelioideae. The reduced genome-scale data set was analyzed using concatenated-based Bayesian inference and two different Maximum Likelihood analyses, and a coalescent-based species tree method. The resulting topology was strongly supported with the majority of nodes being fully supported in all three concatenated-based analyses. The two subfamilies and each of the seven major clades in Parmelioideae were strongly supported as monophyletic. In addition, most backbone relationships in the topology were recovered with high nodal support. The genus Parmotrema was found to be polyphyletic and consequently, it is suggested to accept the genus Crespoa to accommodate the species previously placed in Parmotrema subgen. Crespoa. This study demonstrates the power of reduced genome-scale data sets to resolve phylogenetic relationships with high support. Due to lower costs, target enrichment methods provide a promising avenue for phylogenetic studies including larger taxonomic/specimen sampling than whole genome data would allow.


2000 ◽  
Vol 355 (1398) ◽  
pp. 769-793 ◽  
Author(s):  
Karen Sue Renzaglia ◽  
R. Joel Duff ◽  
Daniel L. Nickrent ◽  
David J. Garbary

As the oldest extant lineages of land plants, bryophytes provide a living laboratory in which to evaluate morphological adaptations associated with early land existence. In this paper we examine reproductive and structural innovations in the gametophyte and sporophyte generations of hornworts, liverworts, mosses and basal pteridophytes. Reproductive features relating to spermatogenesis and the architecture of motile male gametes are overviewed and evaluated from an evolutionary perspective. Phylogenetic analyses of a data set derived from spermatogenesis and one derived from comprehensive morphogenetic data are compared with a molecular analysis of nuclear and mitochondrial small subunit rDNA sequences. Although relatively small because of a reliance on water for sexual reproduction, gametophytes of bryophytes are the most elaborate of those produced by any land plant. Phenotypic variability in gametophytic habit ranges from leafy to thalloid forms with the greatest diversity exhibited by hepatics. Appendages, including leaves, slime papillae and hairs, predominate in liverworts and mosses, while hornwort gametophytes are strictly thalloid with no organized external structures. Internalization of reproductive and vegetative structures within mucilage–filled spaces is an adaptive strategy exhibited by hornworts. The formative stages of gametangial development are similar in the three bryophyte groups, with the exception that in mosses apical growth is intercalated into early organogenesis, a feature echoed in moss sporophyte ontogeny. A monosporangiate, unbranched sporophyte typifies bryophytes, but developmental and structural innovations suggest the three bryophyte groups diverged prior to elaboration of this generation. Sporophyte morphogenesis in hornworts involves non–synchronized sporogenesis and the continued elongation of the single sporangium, features unique among archegoniates. 
In hepatics, elongation of the sporophyte seta and archegoniophore is rapid and requires instantaneous wall expandability and hydrostatic support. Unicellular, spiralled elaters and capsule dehiscence through the formation of four regular valves are autapomorphies of liverworts. Sporophytic sophistications in the moss clade include conducting tissue, stomata, an assimilative layer and an elaborate peristome for extended spore dispersal. Characters such as stomata and conducting cells that are shared among sporophytes of mosses, hornworts and pteridophytes are interpreted as parallelisms and not homologies. Our phylogenetic analysis of three different data sets is the most comprehensive to date and points to a single phylogenetic solution for the evolution of basal embryophytes. Hornworts are supported as the earliest divergent embryophyte clade with a moss/liverwort clade sister to tracheophytes. Among pteridophytes, lycophytes are monophyletic and an assemblage containing ferns, Equisetum and psilophytes is sister to seed plants. Congruence between morphological and molecular hypotheses indicates that these data sets are tracking the same phylogenetic signal and reinforces our phylogenetic conclusions. It appears that total evidence approaches are valuable in resolving ancient radiations such as those characterizing the evolution of early embryophytes. More information on land plant phylogeny can be found at: http://www.science.siu.edu/landplants/index.html.


2020 ◽  
Vol 37 (11) ◽  
pp. 3380-3388
Author(s):  
Stephen A Smith ◽  
Nathanael Walker-Hale ◽  
Joseph F Walker

Abstract Most phylogenetic analyses assume that a single evolutionary history underlies one gene. However, both biological processes and errors can cause intragenic conflict. The extent to which this conflict is present in empirical data sets is not well documented, but if common, could have far-reaching implications for phylogenetic analyses. We examined several large phylogenomic data sets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between data sets, from 1% to >92% of genes investigated. We analyzed four exemplar genes in detail and analyzed simulated data under several scenarios. Our results suggest that alignment error may be one major source of conflict, but other conflicts remain unexplained and may represent biological signal or other errors. Whether as part of data analysis pipelines or to explore biological processes, analyses of within-gene phylogenetic signal should become common.


2021 ◽  
Vol 7 (12) ◽  
Author(s):  
Arthur K. Turner ◽  
Muhammad Yasir ◽  
Sarah Bastkowski ◽  
Andrea Telatin ◽  
Andrew Page ◽  
...  

Trimethoprim and sulfamethoxazole are used commonly together as cotrimoxazole for the treatment of urinary tract and other infections. The evolution of resistance to these and other antibacterials threatens therapeutic options for clinicians. We generated and analysed a chemical-biology-whole-genome data set to predict new targets for antibacterial combinations with trimethoprim and sulfamethoxazole. For this we used a large transposon mutant library in Escherichia coli BW25113 where an outward-transcribing inducible promoter was engineered into one end of the transposon. This approach allows regulated expression of adjacent genes in addition to gene inactivation at transposon insertion sites, a methodology that has been called TraDIS-Xpress. These chemical genomic data sets identified mechanisms for both reduced and increased susceptibility to trimethoprim and sulfamethoxazole. The data identified that over-expression of FolA reduced trimethoprim susceptibility, a known mechanism for reduced susceptibility. In addition, transposon insertions into the genes tdk, deoR, ybbC, hha, ldcA, wbbK and waaS increased susceptibility to trimethoprim and likewise for rsmH, fadR, ddlB, nlpI and prc with sulfamethoxazole, while insertions in ispD, uspC, minC, minD, yebK, truD and umpG increased susceptibility to both these antibiotics. Two of these genes’ products, Tdk and IspD, are inhibited by AZT and fosmidomycin respectively, antibiotics that are known to synergise with trimethoprim. Thus, the data identified two known targets and several new target candidates for the development of co-drugs that synergise with trimethoprim, sulfamethoxazole or cotrimoxazole. We demonstrate that the TraDIS-Xpress technology can be used to generate information-rich chemical-genomic data sets that can be used for antibacterial development.


2019 ◽  
Author(s):  
Sankar Subramanian ◽  
Umayal Ramasamy ◽  
David Chen

In the past decades, a number of software programs have been developed to deduce the phylogenetic relationship between populations. However, these programs are not suited for large-scale whole genome data. Recently, a few standalone or web applications have been developed to handle genome-wide data, but they were either computationally intensive, dependent on third party software or required significant time and resources of a web server. In the post-genomic era, researchers are able to obtain bioinformatically processed high-quality publication-ready whole genome data for many individuals in a population from next generation sequencing companies due to the reduction in the cost of sequencing and analysis. Such genotype data is typically presented in the Variant Call Format (VCF) and there is no simple software available that uses this data to construct the phylogeny of populations in a short time. To address this limitation, we have developed a one-click user-friendly software, VCF2PopTree, that uses genome-wide SNPs to construct and display phylogenetic trees in seconds to minutes. For example, it reads a 1 GB VCF file and draws a tree in less than 5 minutes. VCF2PopTree accepts genotype data from a local machine, constructs a tree using UPGMA and Neighbour-Joining algorithms and displays it on a web-browser. It also produces a pairwise-diversity matrix in MEGA and PHYLIP file formats as well as trees in the Newick format which could be directly used by other popular phylogenetic software programs. The software including the source code, a test VCF input file and short documentation are available at: https://github.com/sansubs/vcf2pop.
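The general idea of building a UPGMA tree from genotype data and writing it in Newick format can be sketched as follows. This is not VCF2PopTree's code or its distance formula: VCF parsing is omitted, genotypes are assumed to be already decoded into alternate-allele counts (0, 1 or 2) per site, and the distance used is a simple allele-sharing distance chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_distance(g1: List[int], g2: List[int]) -> float:
    """Mean per-site difference in alternate-allele count, scaled to 0..1."""
    diffs = [abs(a - b) / 2 for a, b in zip(g1, g2)]
    return sum(diffs) / len(diffs)


def upgma(names: List[str], dist: Dict[Tuple[str, str], float]) -> str:
    """Cluster by average linkage (UPGMA) and return a Newick string."""
    # Each cluster key maps to (newick text, number of leaves, node height).
    clusters = {name: (name, 1, 0.0) for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    next_id = 0
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        new = ("internal", next_id)  # tuple key cannot clash with leaf names
        next_id += 1
        # Distance to every remaining cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((new, c))] = (
                sa * d[frozenset((a, c))] + sb * d[frozenset((b, c))]
            ) / (sa + sb)
        newick = f"({na}:{height - ha:.4f},{nb}:{height - hb:.4f})"
        clusters[new] = (newick, sa + sb, height)
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    # Toy genotypes: alternate-allele counts at four sites per sample.
    genos = {"A": [0, 0, 1, 2], "B": [0, 0, 1, 1], "C": [2, 2, 0, 0]}
    dist = {
        (x, y): pairwise_distance(genos[x], genos[y])
        for x, y in combinations(genos, 2)
    }
    print(upgma(list(genos), dist))
```

UPGMA assumes a roughly constant rate across lineages; Neighbour-Joining, the other algorithm named in the abstract, drops that assumption and is not shown here.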

