A New Phylogenetic Inference Based on Genetic Attribute Reduction for Morphological Data

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the final supertree. We find these modifications to increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.
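The true positive and false positive clade counts mentioned in this abstract can be illustrated with a small sketch. It assumes rooted trees over the same taxon set, encoded as nested tuples of leaf names; the encoding and the names `clades` and `clade_counts` are illustrative assumptions, not taken from the paper or its software.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]
Clade = FrozenSet[str]


def clades(tree: Tree) -> Set[Clade]:
    """Return the leaf sets of all internal nodes except the root."""
    found: Set[Clade] = set()

    def leaves(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        leaf_set = frozenset().union(*(leaves(child) for child in node))
        found.add(leaf_set)
        return leaf_set

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade contains every taxon and is trivial
    return found


def clade_counts(estimated: Tree, reference: Tree) -> Tuple[int, int]:
    """Count clades of `estimated` that are in `reference` (true positives)
    and clades of `estimated` that are not (false positives)."""
    est, ref = clades(estimated), clades(reference)
    return len(est & ref), len(est - ref)


reference = ((("A", "B"), "C"), ("D", "E"))
estimated = (("A", "B"), ("C", ("D", "E")))
true_pos, false_pos = clade_counts(estimated, reference)
print(true_pos, false_pos)
```

Here the estimated tree shares the clades {A, B} and {D, E} with the reference and contains one clade, {C, D, E}, that the reference lacks.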

Collecting reliable clades using the Greedy Strict Consensus Merger

PeerJ ◽

10.7717/peerj.2172 ◽

2016 ◽

Vol 4 ◽

pp. e2172 ◽

Cited By ~ 5

Author(s):

Markus Fleischauer ◽

Sebastian Böcker

Keyword(s):

Computational Complexity ◽

Phylogenetic Trees ◽

Optimization Problems ◽

Matrix Representation ◽

Phylogenetic Inference ◽

Scoring Functions ◽

True Positive ◽

Worst Case ◽

Inference Methods ◽

Supertree Methods

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the GSCM supertree. We find these modifications to increase the number of true positive clades by 18% compared to the currently used Overlap scoring.

Collecting reliable clades using the Greedy Strict Consensus Merger

10.7287/peerj.preprints.1297 ◽

2015 ◽

Author(s):

Markus Fleischauer ◽

Sebastian Böcker

Keyword(s):

Computational Complexity ◽

Phylogenetic Trees ◽

Optimization Problems ◽

Matrix Representation ◽

Phylogenetic Inference ◽

Scoring Functions ◽

True Positive ◽

Worst Case ◽

Inference Methods ◽

Supertree Methods

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the final supertree. We find these modifications to increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.

Collecting reliable clades using the Greedy Strict Consensus Merger

10.7287/peerj.preprints.1297v1 ◽

2015 ◽

Author(s):

Markus Fleischauer ◽

Sebastian Böcker

Keyword(s):

Computational Complexity ◽

Phylogenetic Trees ◽

Optimization Problems ◽

Matrix Representation ◽

Phylogenetic Inference ◽

Scoring Functions ◽

True Positive ◽

Worst Case ◽

Inference Methods ◽

Supertree Methods

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the final supertree. We find these modifications to increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.

Breaking bud: probing the scalability limits of phylogenetic network inference methods

10.1101/056572 ◽

2016 ◽

Author(s):

Hussein A Hejase ◽

Kevin J Liu

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Network Inference ◽

State Of The Art ◽

Probabilistic Inference ◽

Phylogenetic Network ◽

Main Memory ◽

Tree Inference ◽

Dataset Size ◽

Inference Methods

Background: Branching events in phylogenetic trees reflect strictly bifurcating and/or multifurcating speciation and splitting events. In the presence of gene flow, a phylogeny cannot be described by a tree but is instead a directed acyclic graph known as a phylogenetic network. Both phylogenetic trees and networks are typically reconstructed using computational analysis of multi-locus sequence data. The advent of high-throughput sequencing technologies has brought about two main scalability challenges: (1) dataset size in terms of the number of taxa and (2) the evolutionary divergence of the taxa in a study. The impact of both dimensions of scale on phylogenetic tree inference has been well characterized by recent studies; in contrast, the scalability limits of phylogenetic network inference methods are largely unknown. In this study, we quantify the performance of state-of-the-art phylogenetic network inference methods on large-scale datasets using empirical data sampled from natural mouse populations and synthetic data capturing a wide range of evolutionary scenarios. Results: We find that, as in the case of phylogenetic tree inference, the performance of leading network inference methods is negatively impacted by both dimensions of dataset scale. In general, we found that topological accuracy degrades as the number of taxa increases; a similar effect was observed with increased sequence mutation rate. The most accurate methods were probabilistic inference methods which maximize either likelihood under coalescent-based models or pseudo-likelihood approximations to the model likelihood. Furthermore, probabilistic inference methods with optimization criteria which did not make use of gene tree root and/or branch length information performed best, a result that runs contrary to widely held assumptions in the literature. The improved accuracy obtained with probabilistic inference methods comes at a computational cost in terms of runtime and main memory usage, which quickly become prohibitive as dataset size grows past thirty taxa. Conclusions: We conclude that the state of the art of phylogenetic network inference lags well behind the scope of current phylogenomic studies. New algorithmic development is critically needed to address this methodological gap.

TreeCluster: Massively scalable transmission clustering using phylogenetic trees

10.1101/261354 ◽

2018 ◽

Cited By ~ 3

Author(s):

Niema Moshiri

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Control Strategies ◽

Molecular Data ◽

Viral Control ◽

Standard Methods ◽

Cross Platform ◽

Clustering Optimization ◽

Inference Methods ◽

Viral Sequencing

Background: The ability to infer transmission clusters from molecular data is critical to designing and evaluating viral control strategies. Viral sequencing datasets are growing rapidly, but standard methods of transmission cluster inference do not scale well beyond thousands of sequences. Results: I present TreeCluster, a cross-platform tool that performs transmission cluster inference on a given phylogenetic tree orders of magnitude faster than existing inference methods and supports multiple clustering optimization functions. Conclusions: TreeCluster is a freely available cross-platform open source Python 3 tool for inferring transmission clusters from phylogenetic trees. Code, usage information, and in-depth descriptions of the implemented clustering modes are available publicly at the following repository: https://github.com/niemasd/TreeCluster

Efficient Bayesian inference of phylogenetic trees from large scale, low-depth genome-wide single-cell data

10.1101/2020.05.06.058180 ◽

2020 ◽

Cited By ~ 1

Author(s):

Fatemeh Dorri ◽

Sohrab Salehi ◽

Kevin Chern ◽

Tyler Funnell ◽

Marc Williams ◽

...

Keyword(s):

Bayesian Inference ◽

High Resolution ◽

Phylogenetic Tree ◽

Single Cell ◽

Copy Number ◽

Phylogenetic Trees ◽

Evolutionary Dynamics ◽

Phylogenetic Inference ◽

Copy Number Data ◽

Polynomial Size

A new generation of scalable single cell whole genome sequencing (scWGS) methods allows unprecedented high resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing mutational processes. The ability to sequence tens of thousands of single genomes at high resolution per experiment is challenging the assumptions and scalability of existing phylogenetic tree building methods and calls for tailored phylogenetic models and scalable inference algorithms. We propose a phylogenetic model and associated Bayesian inference procedure which exploits the specifics of scWGS data. A first highlight of our approach is a novel phylogenetic encoding of copy-number data providing an attractive statistical-computational trade-off by simplifying the site dependencies induced by rearrangements while still forming a sound foundation for phylogenetic inference. A second highlight is an innovative phylogenetic tree exploration move which makes the cost of MCMC iterations bounded by O(|C| + |L|), where |C| is the number of cells and |L| is the number of loci. In contrast, existing off-the-shelf likelihood-based methods incur an iteration cost of O(|C| |L|). Moreover, the novel move considers an exponential number of neighbouring trees whereas off-the-shelf moves consider a polynomial size set of neighbours. The third highlight is a novel mutation calling method that incorporates the copy-number data and the underlying phylogenetic tree to overcome the missing data issue. This framework allows us to realistically consider routine Bayesian phylogenetic inference at the scale of scWGS data.

Collecting reliable clades using the Greedy Strict Consensus Merger

10.7287/peerj.preprints.1297v3 ◽

2015 ◽

Author(s):

Markus Fleischauer ◽

Sebastian Böcker

Keyword(s):

Computational Complexity ◽

Phylogenetic Trees ◽

Optimization Problems ◽

Matrix Representation ◽

Phylogenetic Inference ◽

Scoring Functions ◽

True Positive ◽

Worst Case ◽

Inference Methods ◽

Supertree Methods

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the final supertree. We find these modifications to increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.

Computing nearest neighbour interchange distances between ranked phylogenetic trees

Journal of Mathematical Biology ◽

10.1007/s00285-021-01567-5 ◽

2021 ◽

Vol 82 (1-2) ◽

Author(s):

Lena Collienne ◽

Alex Gavryushkin

Keyword(s):

Cancer Research ◽

Computational Complexity ◽

Phylogenetic Tree ◽

Shortest Path ◽

Phylogenetic Trees ◽

Shortest Paths ◽

Nearest Neighbour ◽

Tree Inference ◽

Subtree Prune And Regraft ◽

Comparison Algorithms

Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is $\mathbf{NP}$-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to $\mathbf{NP}$-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).
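The search graph described in this abstract, with trees as vertices and single moves as edges, can be made concrete with a minimal sketch that enumerates the neighbours of a tree. It covers only the classical (unranked) nearest neighbour interchange on rooted binary trees, not the ranked variant the paper studies; the nested-tuple encoding and the name `nni_neighbours` are illustrative assumptions.

```python
from typing import Iterator, Tuple, Union

# Assumed encoding: a leaf is a string, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def nni_neighbours(tree: Tree) -> Iterator[Tree]:
    """Yield every rooted binary tree one classical NNI move away from `tree`."""
    if isinstance(tree, str):
        return
    left, right = tree
    # Moves across the edge between this node and an internal child:
    # swap the child's sibling with one of the child's two subtrees.
    for child, sibling in ((left, right), (right, left)):
        if not isinstance(child, str):
            a, b = child
            yield ((a, sibling), b)
            yield ((sibling, b), a)
    # Moves across edges deeper in the tree.
    for new_left in nni_neighbours(left):
        yield (new_left, right)
    for new_right in nni_neighbours(right):
        yield (left, new_right)


start = (("A", "B"), ("C", "D"))
for neighbour in nni_neighbours(start):
    print(neighbour)
```

Each internal edge contributes two neighbours, so a rooted binary tree with n leaves has 2(n - 2) neighbours in this sketch. A distance between two trees is then the length of a shortest path in the graph these neighbourhoods define.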

Wild Termitomyces Species Collected from Ondo and Ekiti States Are More Related to African Species as Revealed by ITS Region of rDNA

The Scientific World JOURNAL ◽

10.1100/2012/689296 ◽

2012 ◽

Vol 2012 ◽

pp. 1-5 ◽

Cited By ~ 2

Author(s):

Victor Olusegun Oyetayo

Keyword(s):

Sequence Analysis ◽

Phylogenetic Tree ◽

Molecular Identification ◽

Internal Transcribed Spacer ◽

Its Region ◽

Its Sequence ◽

African Countries ◽

African Species ◽

Its Sequence Analysis ◽

Degree Of Similarity

Molecular identification of eighteen Termitomyces species collected from two states, Ondo and Ekiti in Nigeria, was carried out using the internal transcribed spacer (ITS) region. The amplicons obtained from rDNA of Termitomyces species were compared with existing sequences in the NCBI GenBank. The results of the ITS sequence analysis discriminated between all the Termitomyces species (obtained from Ondo and Ekiti States) and Termitomyces sp. sequences obtained from NCBI GenBank. The degree of similarity of T1 to T18 to gene of Termitomyces sp. obtained from NCBI ranges between 82 and 99 percent. Termitomyces species from Gabon with accession number AF321374 was the closest relative of T1 to T18 except T12, which has T. eurhizus and T. striatus as the closest relatives. Phylogenetic tree generated with ITS sequences obtained from NCBI GenBank data revealed that T1 to T18 are more related to Termitomyces species indigenous to African countries such as Senegal, Congo, and Gabon.
