Distance Measures for Tumor Evolutionary Trees

2019
Author(s): Zach DiNardo, Kiran Tomlinson, Anna Ritz, Layla Oesper

Abstract In recent years, there has been increased interest in studying cancer by using algorithmic methods to infer the evolutionary tree underlying a tumor’s developmental history. Quantitative measures that compare such trees are then vital to benchmarking these algorithmic tree inference methods, understanding the structure of the space of possible trees for a given dataset, and clustering together similar trees in order to evaluate inheritance patterns. However, few appropriate distance measures exist, and those that do exist have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and how the mutations that label that topology are inherited. Here we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to two simulated and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. Implementations of CASet and DISC are available at: https://bitbucket.org/oesperlab/stereodist.

2019
Vol 36 (7)
pp. 2090-2097
Author(s): Zach DiNardo, Kiran Tomlinson, Anna Ritz, Layla Oesper

Abstract Motivation: There has been recent increased interest in using algorithmic methods to infer the evolutionary tree underlying the developmental history of a tumor. Quantitative measures that compare such trees are vital to a number of different applications including benchmarking tree inference methods and evaluating common inheritance patterns across patients. However, few appropriate distance measures exist, and those that do have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and the inheritance of the mutations labeling that topology. Results: Here, we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to multiple simulated datasets and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. Availability and implementation: Implementations of CASet and DISC are freely available at: https://bitbucket.org/oesperlab/stereodist. Supplementary information: Supplementary data are available at Bioinformatics online.
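
The abstract names the two measures but does not spell out how they are computed. As a rough illustration of the general idea of comparing mutation-labeled trees through sets of shared ancestors, the Python sketch below averages a Jaccard distance over all mutation pairs. It is a simplification under stated assumptions, not the authors' implementation: each node is assumed to carry exactly one mutation, both trees are assumed to contain the same mutations, a mutation is assumed to count as its own ancestor, and the `parent` dictionary encoding and all function names are hypothetical.

```python
from itertools import combinations


def ancestors(parent, node):
    """Return `node` together with every node on its path to the root.

    `parent` maps each mutation to its parent mutation (None for the root).
    Counting a mutation as its own ancestor is an assumption of this sketch.
    """
    result = {node}
    while parent.get(node) is not None:
        node = parent[node]
        result.add(node)
    return result


def jaccard_distance(a, b):
    """Jaccard distance between two sets; two empty sets are at distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def common_ancestor_distance(parent_1, parent_2):
    """Average, over all mutation pairs, of the Jaccard distance between
    the pair's common-ancestor set in tree 1 and in tree 2."""
    if set(parent_1) != set(parent_2):
        raise ValueError("this sketch assumes both trees share the same mutations")
    pairs = list(combinations(sorted(parent_1), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        common_1 = ancestors(parent_1, i) & ancestors(parent_1, j)
        common_2 = ancestors(parent_2, i) & ancestors(parent_2, j)
        total += jaccard_distance(common_1, common_2)
    return total / len(pairs)


# Hypothetical toy trees over mutations A, B, C.
chain = {"A": None, "B": "A", "C": "B"}    # A -> B -> C
branch = {"A": None, "B": "A", "C": "A"}   # A -> B and A -> C
print(common_ancestor_distance(chain, branch))
```

In the toy example only the pair (B, C) differs between the two trees: its common ancestors are {A, B} in the chain and {A} in the branching tree, so that pair alone contributes a nonzero term to the average.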


IMA Fungus
2019
Vol 10 (1)
Author(s): Claudio G. Ametrano, Felix Grewe, Pedro W. Crous, Stephen B. Goodwin, Chen Liang, ...

Abstract Dothideomycetes is the most diverse fungal class in Ascomycota and includes species with a wide range of lifestyles. Previous multilocus studies have investigated the taxonomic and evolutionary relationships of these taxa but often failed to resolve early diverging nodes and frequently generated inconsistent placements of some clades. Here, we use a phylogenomic approach to resolve relationships in Dothideomycetes, focusing on two genera of melanized, extremotolerant rock-inhabiting fungi, Lichenothelia and Saxomyces, that have been suggested to be early diverging lineages. We assembled phylogenomic datasets from newly sequenced (4) and previously available genomes (238) of 242 taxa. We explored the influence of tree inference methods, supermatrix vs. coalescent-based species tree, and the impact of varying amounts of genomic data. Overall, our phylogenetic reconstructions provide consistent and well-supported topologies for Dothideomycetes, recovering Lichenothelia and Saxomyces among the earliest diverging lineages in the class. In addition, many of the major lineages within Dothideomycetes are recovered as monophyletic, and the phylogenomic approach implemented strongly supports their relationships. Ancestral character state reconstruction suggests that the rock-inhabiting lifestyle is ancestral within the class.


2016
Vol 4 (4)
Author(s): Victoria M. Gershuni, Rexford S. Ahima, Julia Tchou

2013
Vol 112 (7)
pp. 1263-1278
Author(s): Dayana E. Salas-Leiva, Alan W. Meerow, Michael Calonje, M. Patrick Griffith, Javier Francisco-Ortega, ...

2018
Author(s): Stephen A. Smith, Nathanael Walker-Hale, Joseph F. Walker, Joseph W. Brown

Abstract Studies have demonstrated that pervasive gene tree conflict underlies several important phylogenetic relationships where different species tree methods produce conflicting results. Here, we present a means of dissecting the phylogenetic signal for alternative resolutions within a dataset in order to resolve recalcitrant relationships and, importantly, identify what the dataset is unable to resolve. These procedures extend methods for isolating conflict and concordance involving specific candidate relationships and can be used to identify systematic error and disambiguate sources of conflict among species tree inference methods. We demonstrate these on a large phylogenomic plant dataset. Our results support the placement of Amborella as sister to the remaining extant angiosperms, Gnetales as sister to pines, and the monophyly of extant gymnosperms. Several other contentious relationships, including the resolution of relationships within the bryophytes and the eudicots, remain uncertain given the low number of supporting gene trees. To address whether concatenation of filtered genes amplified phylogenetic signal for relationships, we implemented a combinatorial heuristic to test combinability of genes. We found that nested conflicts limited the ability of data filtering methods to fully ameliorate conflicting signal amongst gene trees. These analyses confirmed that the underlying conflicting signal does not support broad concatenation of genes. Our approach provides a means of dissecting a specific dataset to address deep phylogenetic relationships while also identifying the inferential boundaries of the dataset.


2018
Author(s): Zhi Yan, Peng Du, Matthew W. Hahn, Luay Nakhleh

Abstract The multispecies coalescent (MSC) has emerged as a powerful and desirable framework for species tree inference in phylogenomic studies. Under this framework, the data for each locus is assumed to consist of orthologous, single-copy genes, and heterogeneity across loci is assumed to be due to incomplete lineage sorting (ILS). These assumptions have led biologists who use ILS-aware inference methods, whether based directly on the MSC or proven to be statistically consistent under it (collectively referred to here as MSC-based methods), to exclude all loci that are present in more than a single copy in any of the studied genomes. Furthermore, such analyses entail orthology assignment to avoid the potential of hidden paralogy in the data. The question we seek to answer in this study is: What happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two biological data sets, we show that running such methods on data with paralogs provides very accurate results, either by treating all gene copies within a family as alleles from multiple individuals or by randomly selecting one copy per species. Our results have significant implications for the use of MSC-based phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci, thus greatly increasing the amount of data that can be used. [Multispecies coalescent; incomplete lineage sorting; gene duplication and loss; orthology; paralogy.]


2020
Author(s): D.M. Emms, S. Kelly

Abstract Orthobench is the standard benchmark to assess the accuracy of orthogroup inference methods. It contains 70 expert-curated reference orthogroups (RefOGs) that span the Bilateria and cover a range of different challenges for orthogroup inference. Here we leveraged improvements in tree inference algorithms and computational resources to re-interrogate these RefOGs and carry out an extensive phylogenetic delineation of their composition. This phylogenetic revision altered the membership of 31 of the 70 RefOGs, with 24 subject to extensive revision and a further 7 that required minor changes. We further used these revised and updated RefOGs to provide an assessment of the orthogroup inference accuracy of widely used orthogroup inference methods. Finally, we provide an open-source benchmarking suite to support the future development and use of the Orthobench benchmark. Significance statement: Orthogroup inference forms the foundation of comparative genomic analysis. Benchmarks to evaluate performance are essential to enable these methods to be compared and stimulate further method development. Here we present an update to the Orthobench benchmark database and provide a comparative performance evaluation of commonly used orthogroup inference methods.


2021
Author(s): Megan L Smith, Dan Vanderpool, Matthew W. Hahn

Traditionally, single-copy orthologs have been the gold standard in phylogenomics. Most phylogenomic studies identify putative single-copy orthologs by using clustering approaches and retaining families with a single sequence from each species. However, this approach can severely limit the amount of data available by excluding larger families. Recent methodological advances have suggested several ways to include data from larger families. For instance, tree-based decomposition methods facilitate the extraction of orthologs from large families. Additionally, several popular methods for species tree inference appear to be robust to the inclusion of paralogs, and hence could use all of the data from larger families. Here, we explore the effects of using all families for phylogenetic inference using genomes from 26 primate species. We compare single-copy families, orthologs extracted using tree-based decomposition approaches, and all families with all data (i.e., including orthologs and paralogs). We explore several species tree inference methods, finding that across all nodes of the tree except one, identical trees are returned across nearly all datasets and methods. As in previous studies, the relationships among Platyrrhini remain contentious; however, the tree inference methods matter more than the dataset used. We also assess the effects of each dataset on branch length estimates, measures of phylogenetic uncertainty and concordance, and the detection of introgression. Our results demonstrate that using data from larger gene families drastically increases the number of genes available for phylogenetic inference and leads to consistent estimates of branch lengths, nodal certainty and concordance, and inferences of introgression.


2018
Author(s): Michelle Kendall, Vegard Eldholm, Caroline Colijn

Abstract Trees that illustrate patterns of ancestry and evolution are a central tool in many areas of biology. Comparing evolutionary trees to each other has widespread applications in comparing the evolutionary stories told by different sources of data, assessing the quality of inference methods, and highlighting areas where patterns of ancestry are uncertain. While these tasks are complicated by the fact that trees are high-dimensional structures encoding a large amount of information, there are a number of metrics suitable for comparing evolutionary trees whose tips have the same set of unique labels. There are also metrics for comparing trees where there is no relationship between their labels: in ‘unlabelled’ tree metrics the tree shapes are compared without reference to the tip labels. In many interesting applications, however, the taxa present in two or more trees are related but not identical, and it is informative to compare the trees whilst retaining information about their tips’ relationships. We present methods for comparing trees whose labels belong to a pre-defined set of categories. The methods include a measure of distance between two such trees, and a measure of concordance between one such tree and a hierarchical classification tree of the unique categories. We demonstrate the intuition of our methods with some toy examples before presenting an analysis of Mycobacterium tuberculosis trees, in which we use our methods to quantify the differences between trees built from typing versus sequence data.


2016
Vol 3 (5)
pp. 160016
Author(s): Raazesh Sainudiin, Amandine Véber

In this article, we construct a generalization of the Blum–François Beta-splitting model for evolutionary trees, which was itself inspired by Aldous' Beta-splitting model on cladograms. The novelty of our approach allows for asymmetric shares of diversification rates (or diversification ‘potential’) between two sister species in an evolutionarily interpretable manner, as well as the addition of extinction to the model in a natural way. We describe the incremental evolutionary construction of a tree with n leaves by splitting or freezing extant lineages through the generating, organizing and deleting processes. We then give the probability of any (binary rooted) tree under this model with no extinction, at several resolutions: ranked planar trees giving asymmetric roles to the first and second offspring species of a given species and keeping track of the order of the speciation events occurring during the creation of the tree, unranked planar trees, ranked non-planar trees and finally (unranked non-planar) trees. We also describe a continuous-time equivalent of the generating, organizing and deleting processes where tree topology and branch lengths are jointly modelled and provide code in SageMath/Python for these algorithms.

