Comparing phylogenetic trees according to tip label categories

2018 ◽  
Author(s):  
Michelle Kendall ◽  
Vegard Eldholm ◽  
Caroline Colijn

Abstract: Trees that illustrate patterns of ancestry and evolution are a central tool in many areas of biology. Comparing evolutionary trees to each other has widespread applications in comparing the evolutionary stories told by different sources of data, assessing the quality of inference methods, and highlighting areas where patterns of ancestry are uncertain. While these tasks are complicated by the fact that trees are high-dimensional structures encoding a large amount of information, there are a number of metrics suitable for comparing evolutionary trees whose tips have the same set of unique labels. There are also metrics for comparing trees where there is no relationship between their labels: in ‘unlabelled’ tree metrics the tree shapes are compared without reference to the tip labels. In many interesting applications, however, the taxa present in two or more trees are related but not identical, and it is informative to compare the trees whilst retaining information about their tips’ relationships. We present methods for comparing trees whose labels belong to a pre-defined set of categories. The methods include a measure of distance between two such trees, and a measure of concordance between one such tree and a hierarchical classification tree of the unique categories. We demonstrate the intuition of our methods with some toy examples before presenting an analysis of Mycobacterium tuberculosis trees, in which we use our methods to quantify the differences between trees built from typing versus sequence data.
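For orientation, the abstract's reference to metrics for trees whose tips share the same set of unique labels can be illustrated with the rooted Robinson-Foulds (clade-set) distance. The sketch below is not the category-based method the authors present, which the abstract does not specify in detail. The nested-tuple tree representation and the names `clades` and `rf_distance` are assumptions made for this example.

```python
def clades(tree):
    """Return (tips under this node, set of clades) for a nested-tuple rooted tree.

    A tip is a string; an internal node is a tuple of child subtrees.
    Both conventions are assumptions made for this illustration.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    tips = frozenset()
    found = set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found |= child_clades
    found.add(tips)  # the clade defined by this internal node
    return tips, found


def rf_distance(tree_a, tree_b):
    """Count the clades present in exactly one of two trees on the same tips."""
    tips_a, clades_a = clades(tree_a)
    tips_b, clades_b = clades(tree_b)
    if tips_a != tips_b:
        raise ValueError("trees must share the same set of tip labels")
    return len(clades_a ^ clades_b)


t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(rf_distance(t1, t2))  # clades {A,B} and {A,C} differ
```

The `ValueError` branch marks the limitation the abstract points to: a distance of this kind is undefined once the two trees carry related but non-identical labels.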

2004 ◽  
Vol 10 (2) ◽  
pp. 157-166 ◽  
Author(s):  
George I. Hagstrom ◽  
Dehua H. Hang ◽  
Charles Ofria ◽  
Eric Torng

Phylogenetic trees group organisms by their ancestral relationships. There are a number of distinct algorithms used to reconstruct these trees from molecular sequence data, but different methods sometimes give conflicting results. Since there are few precisely known phylogenies, simulations are typically used to test the quality of reconstruction algorithms. These simulations randomly evolve strings of symbols to produce a tree, and then the algorithms are run with the tree leaves as inputs. Here we use Avida to test two widely used reconstruction methods, which gives us the chance to observe the effect of natural selection on tree reconstruction. We find that if the organisms undergo natural selection between branch points, the methods will be successful even on very large time scales. However, these algorithms often falter when selection is absent.


Author(s):  
Hesam Montazeri ◽  
Susan Little ◽  
Mozhgan Mozaffarilegha ◽  
Niko Beerenwinkel ◽  
Victor DeGruttola

Abstract: Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods generally do not reconstruct transmission trees reliably because genetic sequence data, or phylogenetic trees inferred from such data, contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstruction of transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that directly draws samples from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed by a phylogenetic model by treating its internal nodes as transmission events. By a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show the superiority of the proposed method to two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and the average number of outbound or inbound edges, signifying possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak and investigate the impact of biological, behavioral, and demographic factors.


2018 ◽  
Vol 59 ◽  
pp. 00025
Author(s):  
Agnieszka Szuster-Janiaczyk ◽  
Rafał Brodziak ◽  
Jędrzej Bylka

One of the processes that significantly determines the quality of water delivered to consumers is the mixing of water from different sources in the water mains. Introducing two or more chemically and biologically stable waters into the network may produce water that lacks these properties. This article presents the German guidelines for analysing water quality when waters from different sources are mixed in various proportions. The utility of mathematical models, including quality criteria, for network control is then analysed. An IT tool has been developed to manage selected water quality processes using mathematical modeling. The tool is based on a network model created in EPANET and integrated with MATLAB.


2000 ◽  
Vol 57 (8) ◽  
pp. 1701-1717 ◽  
Author(s):  
Carol A Stepien ◽  
Alison K Dillon ◽  
Amy K Patterson

Population genetic, phylogeographic, and systematic relationships are elucidated among the three species comprising the thornyhead rockfish genus Sebastolobus (Teleostei: Scorpaenidae). Genetic variation among sampling sites representing their extensive ranges along the deep continental slopes of the northern Pacific Ocean is compared using sequence data from the left domain of the mtDNA control region. Comparisons are made among the shortspine thornyhead (S. alascanus) (from seven locations), the longspine thornyhead (S. altivelis) (from five sites), which are sympatric in the northeast, and the broadbanded thornyhead (S. macrochir) (a single site) from the northwest. Phylogenetic trees rooted to Sebastes show that S. macrochir is the sister taxon of S. alascanus and S. altivelis. Intraspecific genetic variability is appreciable, with most individuals having unique haplotypes. Gene flow is substantial among some locations, whereas others have diverged significantly. Genetic divergences among sampling sites for S. alascanus indicate an isolation by geographic distance pattern. Genetic divergences for S. altivelis are unrelated to the hypothesis of isolation by geographic distance and appear to be more consistent with the hypothesis of larval retention in currents and gyres. Differences in geographic genetic patterns between the species are attributed to life history differences in their relative mobilities as juveniles and adults.


2014 ◽  
Vol 95 (11) ◽  
pp. 2372-2376 ◽  
Author(s):  
Andi Krumbholz ◽  
Jeannette Lange ◽  
Andreas Sauerbrei ◽  
Marco Groth ◽  
Matthias Platzer ◽  
...  

The avian-like swine influenza viruses emerged in 1979 in Belgium and Germany. Thereafter, they spread through many European swine-producing countries, replaced the circulating classical swine H1N1 influenza viruses, and became endemic. Serological and subsequent molecular data indicated an avian source, but details remained obscure due to a lack of relevant avian influenza virus sequence data. Here, the origin of the European avian-like swine influenza viruses was analysed using a collection of 16 European swine H1N1 influenza viruses sampled in 1979–1981 in Germany, the Netherlands, Belgium, Italy and France, as well as several contemporaneous avian influenza viruses of various serotypes. The phylogenetic trees suggested a triple reassortant with a unique genotype constellation. Time-resolved maximum clade credibility trees indicated times to the most recent common ancestors of 34–46 years (before 2008) depending on the RNA segment and the method of tree inference.


1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
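To make the notion of a "shortest" or "minimal" tree concrete, the sketch below scores a fixed rooted binary tree by the minimum number of character changes it requires, using Fitch's small-parsimony algorithm. This is a swapped-in standard technique, not the taxon-by-taxon construction the abstract describes. The nested-tuple trees, the toy alignment, and the function names are assumptions made for illustration.

```python
def fitch(tree, states):
    """Return (candidate state set, change count) for one site.

    `tree` is a string (tip) or a 2-tuple of subtrees; `states` maps each
    tip name to its character at this site. Both are illustrative assumptions.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch change counts over an alignment of equal-length sequences."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {tip: seq[i] for tip, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "CCT"}  # toy data
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, alignment), parsimony_score(tree2, alignment))
```

Under this criterion the tree with the lower score is the shorter of the two for the toy data; finding the shortest tree over all topologies is the harder search problem that the abstract addresses.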


2016 ◽  
Vol 1 ◽  
pp. 4 ◽  
Author(s):  
Sarah Auburn ◽  
Ulrike Böhme ◽  
Sascha Steinbiss ◽  
Hidayat Trimarsanto ◽  
Jessica Hostetler ◽  
...  

Plasmodium vivax is now the predominant cause of malaria in the Asia-Pacific, South America and Horn of Africa. Laboratory studies of this species are constrained by the inability to maintain the parasite in continuous ex vivo culture, but genomic approaches provide an alternative and complementary avenue to investigate the parasite’s biology and epidemiology. To date, molecular studies of P. vivax have relied on the Salvador-I reference genome sequence, derived from a monkey-adapted strain from South America. However, the Salvador-I reference remains highly fragmented with over 2500 unassembled scaffolds. Using high-depth Illumina sequence data, we assembled and annotated a new reference sequence, PvP01, sourced directly from a patient from Papua Indonesia. Draft assemblies of isolates from China (PvC01) and Thailand (PvT01) were also prepared for comparative purposes. The quality of the PvP01 assembly is improved greatly over Salvador-I, with fragmentation reduced to 226 scaffolds. Detailed manual curation has ensured highly comprehensive annotation, with functions attributed to 58% of core genes in PvP01 versus 38% in Salvador-I. The assemblies of PvP01, PvC01 and PvT01 are larger than that of Salvador-I (28-30 versus 27 Mb), owing to improved assembly of the subtelomeres. An extensive repertoire of over 1200 Plasmodium interspersed repeat (pir) genes was identified in PvP01 compared to 346 in Salvador-I, suggesting a vital role in parasite survival or development. The manually curated PvP01 reference and PvC01 and PvT01 draft assemblies are important new resources to study vivax malaria. PvP01 is maintained at GeneDB and ongoing curation will ensure continual improvements in assembly and annotation quality.


2020 ◽  
Vol 0 (0) ◽  
pp. 0-0
Author(s):  
Ramy Amer ◽  
Nazmy Abdelghany ◽  
Laila Haggag ◽  
Noha Mansour ◽  
Abdallah Korayem

2021 ◽  
Author(s):  
Nicola De Maio ◽  
Lukas Weilguny ◽  
Conor R. Walker ◽  
Yatish Turakhia ◽  
Russell Corbett-Detig ◽  
...  

Abstract: Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, as well as being part of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. > 100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software is available from https://github.com/NicolaDM/phastSim and allows easy integration with other Python packages as well as a variety of evolutionary models, including new ones that we developed to more realistically model SARS-CoV-2 genome evolution.
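The Gillespie approach named in the abstract can be sketched for a single branch as follows. This is a minimal illustration under stated assumptions, not the phastSim implementation: it assumes a Jukes-Cantor-style model in which every site mutates at the same rate to any other base, so the mutating site can be picked uniformly. The function name `evolve_branch` and its parameters are hypothetical.

```python
import random

BASES = "ACGT"


def evolve_branch(seq, branch_length, rng, rate=1.0):
    """Simulate substitutions along one branch with a Gillespie loop.

    Assumes equal per-site rates (a simplification chosen for this sketch),
    with `branch_length` in expected substitutions per site when rate=1.
    Returns the evolved sequence and the number of substitution events.
    """
    sites = list(seq)
    total_rate = rate * len(sites)
    events = 0
    if total_rate == 0:
        return "".join(sites), events
    # Waiting time to the next event anywhere in the sequence is exponential.
    t = rng.expovariate(total_rate)
    while t < branch_length:
        site = rng.randrange(len(sites))
        sites[site] = rng.choice([b for b in BASES if b != sites[site]])
        events += 1
        t += rng.expovariate(total_rate)
    return "".join(sites), events


rng = random.Random(0)
parent = "A" * 1000
child, n_events = evolve_branch(parent, 0.01, rng)
n_diffs = sum(p != c for p, c in zip(parent, child))
print(n_events, n_diffs)
```

The cost of this loop scales with the number of events rather than the sequence length, which is why the approach suits short branches. With unequal site rates the uniform `randrange` draw no longer applies, and choosing the mutating site efficiently is the job the abstract assigns to its multi-layered search tree.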


2021 ◽  
Vol 8 ◽  
Author(s):  
Filomena Fortinguerra ◽  
Serena Perna ◽  
Roberto Marini ◽  
Alessandra Dell'Utri ◽  
Maurizio Trapanese ◽  
...  

Objectives: Starting from April 2017, the Italian Medicines Agency (AIFA) has approved new criteria for defining any new medicinal product with an innovative indication. The purpose of the study is to analyze the activity of innovativeness evaluation according to the new approach, to estimate the weight of each criterion considered for innovativeness definition, and to evaluate how the new approach works in terms of consistency and reproducibility. Methods: A retrospective analysis was performed on the final reports evaluating the drug innovativeness assessment published on the AIFA's website between April 2017 and January 2021. Descriptive statistics and the chi-square test (where its conditions were met) or Fisher's exact test were used to explore the association between characteristics of drugs and the innovativeness status and the association between the three criteria. Profiles of the decision process and their relationship with innovativeness response were described. In order to evaluate the weight of each criterion in predicting the innovativeness status, a Classification Tree (CT) algorithm was applied. Results: Overall, of the 109 published drug reports, 37 (33.9%) were recognized as fully innovative, 29 (26.6%) were considered conditionally innovative, while for 43 (39.4%) reports innovativeness was not recognized. Considering the three criteria of the decision process, the added therapeutic value was the only criterion statistically associated with a drug's degree of innovation (p < 0.001). The therapeutic need and the quality of clinical evidence were statistically associated (p = 0.008) even if only a mild association was observed. The added therapeutic value was the most important variable in predicting the innovativeness status according to the classification tree (CT) model applied, achieving an accuracy of 89.4%. No difference was found between orphan and non-orphan drugs or oncological and non-oncological drugs. Discussion: The added therapeutic value is the most important criterion of the multidimensional approach for the innovativeness status definition of a new medical product. A mild association was found between the therapeutic need and the quality of evidence. Overall, similar decision profiles bring the same evaluation of innovativeness status, indicating a good consistency and reproducibility between decisions.

