Distance Measures for Tumor Evolutionary Trees

10.1101/591107 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zach DiNardo ◽

Kiran Tomlinson ◽

Anna Ritz ◽

Layla Oesper

Keyword(s):

Breast Cancer ◽

Tree Topology ◽

Distance Measures ◽

Evolutionary Trees ◽

Complex Relationship ◽

Developmental History ◽

Inheritance Patterns ◽

Tree Inference ◽

Subclonal Mutation ◽

Inference Methods

Abstract: In recent years, there has been increased interest in studying cancer by using algorithmic methods to infer the evolutionary tree underlying a tumor’s developmental history. Quantitative measures that compare such trees are then vital to benchmarking these algorithmic tree inference methods, understanding the structure of the space of possible trees for a given dataset, and clustering together similar trees in order to evaluate inheritance patterns. However, few appropriate distance measures exist, and those that do exist have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and how the mutations that label that topology are inherited. Here we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to two simulated and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. Implementations of CASet and DISC are available at: https://bitbucket.org/oesperlab/stereodist.

Distance measures for tumor evolutionary trees

Bioinformatics ◽

10.1093/bioinformatics/btz869 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2090-2097

Author(s):

Zach DiNardo ◽

Kiran Tomlinson ◽

Anna Ritz ◽

Layla Oesper

Keyword(s):

Common Ancestor ◽

Distance Measures ◽

Supplementary Information ◽

Evolutionary Trees ◽

Developmental History ◽

Inheritance Patterns ◽

Tree Inference ◽

History Of ◽

Subclonal Mutation ◽

Inference Methods

Abstract: Motivation: There has been recent increased interest in using algorithmic methods to infer the evolutionary tree underlying the developmental history of a tumor. Quantitative measures that compare such trees are vital to a number of different applications including benchmarking tree inference methods and evaluating common inheritance patterns across patients. However, few appropriate distance measures exist, and those that do have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and the inheritance of the mutations labeling that topology. Results: Here, we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to multiple simulated datasets and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. Availability and implementation: Implementations of CASet and DISC are freely available at: https://bitbucket.org/oesperlab/stereodist. Supplementary information: Supplementary data are available at Bioinformatics online.
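To make the idea of comparing trees through shared ancestry more concrete, the following is a minimal Python sketch of a common-ancestor-set style distance. It is an illustration only, not the authors' implementation (which is at the Bitbucket link above). The abstract does not spell out the formula, so several choices here are assumptions of this sketch: each tree is given as a hypothetical `parent` dictionary with one mutation per node, a mutation counts as its own ancestor, and the per-pair dissimilarity is a Jaccard distance averaged over all unordered mutation pairs. The function names are likewise made up for this example.

```python
from itertools import combinations


def ancestor_sets(parent):
    """Map each mutation to the set of mutations on its path to the root.

    Assumption of this sketch: a mutation is counted as its own ancestor.
    `parent` maps each mutation label to its parent label (None for the root).
    """
    sets = {}
    for node in parent:
        path = set()
        current = node
        while current is not None:
            path.add(current)
            current = parent[current]
        sets[node] = path
    return sets


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, treating two empty sets as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def caset_like_distance(parent_1, parent_2):
    """Average Jaccard distance between common ancestor sets of mutation pairs.

    Assumption of this sketch: both trees are labeled by the same mutations.
    """
    if set(parent_1) != set(parent_2):
        raise ValueError("both trees must contain the same mutations")
    anc_1 = ancestor_sets(parent_1)
    anc_2 = ancestor_sets(parent_2)
    pairs = list(combinations(sorted(parent_1), 2))
    if not pairs:
        return 0.0
    total = sum(
        jaccard_distance(anc_1[i] & anc_1[j], anc_2[i] & anc_2[j])
        for i, j in pairs
    )
    return total / len(pairs)


# Toy example: a linear tree (a -> b -> c) versus a branching tree (a -> b, a -> c).
linear = {"a": None, "b": "a", "c": "b"}
branching = {"a": None, "b": "a", "c": "a"}
distance = caset_like_distance(linear, branching)
```

In this toy case only the pair (b, c) differs between the two trees: its common ancestors are {a, b} in the linear tree and {a} in the branching tree, so the pair contributes 0.5 and the average over the three pairs is 1/6.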

Genome-scale data resolve ancestral rock-inhabiting lifestyle in Dothideomycetes (Ascomycota)

IMA Fungus ◽

10.1186/s43008-019-0018-2 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Claudio G. Ametrano ◽

Felix Grewe ◽

Pedro W. Crous ◽

Stephen B. Goodwin ◽

Chen Liang ◽

...

Keyword(s):

Character State ◽

Ancestral Character State ◽

Wide Range ◽

Tree Inference ◽

Ancestral Character State Reconstruction ◽

Inference Methods ◽

Genome Scale ◽

The Impact ◽

Rock Inhabiting Fungi ◽

Scale Data

Abstract: Dothideomycetes is the most diverse fungal class in Ascomycota and includes species with a wide range of lifestyles. Previous multilocus studies have investigated the taxonomic and evolutionary relationships of these taxa but often failed to resolve early diverging nodes and frequently generated inconsistent placements of some clades. Here, we use a phylogenomic approach to resolve relationships in Dothideomycetes, focusing on two genera of melanized, extremotolerant rock-inhabiting fungi, Lichenothelia and Saxomyces, that have been suggested to be early diverging lineages. We assembled phylogenomic datasets from newly sequenced (4) and previously available genomes (238) of 242 taxa. We explored the influence of tree inference methods, supermatrix vs. coalescent-based species tree, and the impact of varying amounts of genomic data. Overall, our phylogenetic reconstructions provide consistent and well-supported topologies for Dothideomycetes, recovering Lichenothelia and Saxomyces among the earliest diverging lineages in the class. In addition, many of the major lineages within Dothideomycetes are recovered as monophyletic, and the phylogenomic approach implemented strongly supports their relationships. Ancestral character state reconstruction suggests that the rock-inhabiting lifestyle is ancestral within the class.

Obesity and Breast Cancer: A Complex Relationship

Current Surgery Reports ◽

10.1007/s40137-016-0134-5 ◽

2016 ◽

Vol 4 (4) ◽

Cited By ~ 4

Author(s):

Victoria M. Gershuni ◽

Rexford S. Ahima ◽

Julia Tchou

Keyword(s):

Breast Cancer ◽

Complex Relationship

Phylogeny of the cycads based on multiple single-copy nuclear genes: congruence of concatenated parsimony, likelihood and species tree inference methods

Annals of Botany ◽

10.1093/aob/mct192 ◽

2013 ◽

Vol 112 (7) ◽

pp. 1263-1278 ◽

Cited By ~ 60

Author(s):

Dayana E. Salas-Leiva ◽

Alan W. Meerow ◽

Michael Calonje ◽

M. Patrick Griffith ◽

Javier Francisco-Ortega ◽

...

Keyword(s):

Single Copy ◽

Species Tree ◽

Nuclear Genes ◽

Tree Inference ◽

Inference Methods ◽

Species Tree Inference

Phylogenetic conflicts, combinability, and deep phylogenomics in plants

10.1101/371930 ◽

2018 ◽

Cited By ~ 1

Author(s):

Stephen A. Smith ◽

Nathanael Walker-Hale ◽

Joseph F. Walker ◽

Joseph W. Brown

Keyword(s):

Phylogenetic Relationships ◽

Phylogenetic Signal ◽

Gene Tree ◽

Species Tree ◽

Gene Trees ◽

Data Filtering ◽

Tree Inference ◽

Tree Methods ◽

Inference Methods ◽

Species Tree Inference

Abstract: Studies have demonstrated that pervasive gene tree conflict underlies several important phylogenetic relationships where different species tree methods produce conflicting results. Here, we present a means of dissecting the phylogenetic signal for alternative resolutions within a dataset in order to resolve recalcitrant relationships and, importantly, identify what the dataset is unable to resolve. These procedures extend upon methods for isolating conflict and concordance involving specific candidate relationships and can be used to identify systematic error and disambiguate sources of conflict among species tree inference methods. We demonstrate these on a large phylogenomic plant dataset. Our results support the placement of Amborella as sister to the remaining extant angiosperms, Gnetales as sister to pines, and the monophyly of extant gymnosperms. Several other contentious relationships, including the resolution of relationships within the bryophytes and the eudicots, remain uncertain given the low number of supporting gene trees. To address whether concatenation of filtered genes amplified phylogenetic signal for relationships, we implemented a combinatorial heuristic to test combinability of genes. We found that nested conflicts limited the ability of data filtering methods to fully ameliorate conflicting signal amongst gene trees. These analyses confirmed that the underlying conflicting signal does not support broad concatenation of genes. Our approach provides a means of dissecting a specific dataset to address deep phylogenetic relationships while also identifying the inferential boundaries of the dataset.

Species Tree Inference under the Multispecies Coalescent on Data with Paralogs is Accurate

10.1101/498378 ◽

2018 ◽

Cited By ~ 10

Author(s):

Zhi Yan ◽

Peng Du ◽

Matthew W. Hahn ◽

Luay Nakhleh

Keyword(s):

Incomplete Lineage Sorting ◽

Single Copy ◽

Species Tree ◽

Biological Data ◽

Lineage Sorting ◽

Multispecies Coalescent ◽

Gene Copies ◽

Tree Inference ◽

Inference Methods ◽

Species Tree Inference

Abstract: The multispecies coalescent (MSC) has emerged as a powerful and desirable framework for species tree inference in phylogenomic studies. Under this framework, the data for each locus is assumed to consist of orthologous, single-copy genes, and heterogeneity across loci is assumed to be due to incomplete lineage sorting (ILS). These assumptions have led biologists who use ILS-aware inference methods, whether based directly on the MSC or proven to be statistically consistent under it (collectively referred to here as MSC-based methods), to exclude all loci that are present in more than a single copy in any of the studied genomes. Furthermore, such analyses entail orthology assignment to avoid the potential of hidden paralogy in the data. The question we seek to answer in this study is: What happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two biological data sets, we show that running such methods on data with paralogs provides very accurate results, either by treating all gene copies within a family as alleles from multiple individuals or by randomly selecting one copy per species. Our results have significant implications for the use of MSC-based phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci, thus greatly increasing the amount of data that can be used. [Multispecies coalescent; incomplete lineage sorting; gene duplication and loss; orthology; paralogy.]

Benchmarking Orthogroup Inference Accuracy: Revisiting Orthobench

10.1101/2020.07.09.195586 ◽

2020 ◽

Author(s):

D.M. Emms ◽

S. Kelly

Keyword(s):

Method Development ◽

Genomic Analysis ◽

Comparative Genomic ◽

Comparative Performance ◽

Inference Algorithms ◽

Benchmark Database ◽

Tree Inference ◽

Computational Resources ◽

Inference Methods ◽

Phylogenetic Revision

Abstract: Orthobench is the standard benchmark to assess the accuracy of orthogroup inference methods. It contains 70 expert-curated reference orthogroups (RefOGs) that span the Bilateria and cover a range of different challenges for orthogroup inference. Here we leveraged improvements in tree inference algorithms and computational resources to re-interrogate these RefOGs and carry out an extensive phylogenetic delineation of their composition. This phylogenetic revision altered the membership of 31 of the 70 RefOGs, with 24 subject to extensive revision and a further 7 that required minor changes. We further used these revised and updated RefOGs to provide an assessment of the orthogroup inference accuracy of widely used orthogroup inference methods. Finally, we provide an open-source benchmarking suite to support the future development and use of the Orthobench benchmark. Significance statement: Orthogroup inference forms the foundation of comparative genomic analysis. Benchmarks to evaluate performance are essential to enable these methods to be compared and stimulate further method development. Here we present an update to the Orthobench benchmark database and provide a comparative performance evaluation of commonly used orthogroup inference methods.

Using all gene families vastly expands data available for phylogenomic inference in primates

10.1101/2021.09.22.461252 ◽

2021 ◽

Author(s):

Megan L Smith ◽

Dan Vanderpool ◽

Matthew W. Hahn

Keyword(s):

Branch Length ◽

Gene Families ◽

Phylogenetic Inference ◽

Single Copy ◽

Decomposition Methods ◽

Species Tree ◽

Primate Species ◽

Tree Inference ◽

Inference Methods ◽

Species Tree Inference

Traditionally, single-copy orthologs have been the gold standard in phylogenomics. Most phylogenomic studies identify putative single-copy orthologs by using clustering approaches and retaining families with a single sequence from each species. However, this approach can severely limit the amount of data available by excluding larger families. Recent methodological advances have suggested several ways to include data from larger families. For instance, tree-based decomposition methods facilitate the extraction of orthologs from large families. Additionally, several popular methods for species tree inference appear to be robust to the inclusion of paralogs, and hence could use all of the data from larger families. Here, we explore the effects of using all families for phylogenetic inference using genomes from 26 primate species. We compare single-copy families, orthologs extracted using tree-based decomposition approaches, and all families with all data (i.e., including orthologs and paralogs). We explore several species tree inference methods, finding that across all nodes of the tree except one, identical trees are returned across nearly all datasets and methods. As in previous studies, the relationships among Platyrrhini remain contentious; however, the tree inference methods matter more than the dataset used. We also assess the effects of each dataset on branch length estimates, measures of phylogenetic uncertainty and concordance, and the detection of introgression. Our results demonstrate that using data from larger gene families drastically increases the number of genes available for phylogenetic inference and leads to consistent estimates of branch lengths, nodal certainty and concordance, and inferences of introgression.

Comparing phylogenetic trees according to tip label categories

10.1101/251710 ◽

2018 ◽

Cited By ~ 5

Author(s):

Michelle Kendall ◽

Vegard Eldholm ◽

Caroline Colijn

Keyword(s):

Phylogenetic Trees ◽

Sequence Data ◽

Classification Tree ◽

Hierarchical Classification ◽

Evolutionary Trees ◽

Tree Metrics ◽

Inference Methods ◽

Different Sources ◽

Central Tool

Abstract: Trees that illustrate patterns of ancestry and evolution are a central tool in many areas of biology. Comparing evolutionary trees to each other has widespread applications in comparing the evolutionary stories told by different sources of data, assessing the quality of inference methods, and highlighting areas where patterns of ancestry are uncertain. While these tasks are complicated by the fact that trees are high-dimensional structures encoding a large amount of information, there are a number of metrics suitable for comparing evolutionary trees whose tips have the same set of unique labels. There are also metrics for comparing trees where there is no relationship between their labels: in ‘unlabelled’ tree metrics the tree shapes are compared without reference to the tip labels. In many interesting applications, however, the taxa present in two or more trees are related but not identical, and it is informative to compare the trees whilst retaining information about their tips’ relationships. We present methods for comparing trees whose labels belong to a pre-defined set of categories. The methods include a measure of distance between two such trees, and a measure of concordance between one such tree and a hierarchical classification tree of the unique categories. We demonstrate the intuition of our methods with some toy examples before presenting an analysis of Mycobacterium tuberculosis trees, in which we use our methods to quantify the differences between trees built from typing versus sequence data.

A Beta-splitting model for evolutionary trees

Royal Society Open Science ◽

10.1098/rsos.160016 ◽

2016 ◽

Vol 3 (5) ◽

pp. 160016 ◽

Cited By ~ 8

Author(s):

Raazesh Sainudiin ◽

Amandine Véber

Keyword(s):

Continuous Time ◽

Rooted Tree ◽

Tree Topology ◽

Sister Species ◽

Evolutionary Trees ◽

Diversification Rates ◽

Branch Lengths ◽

The Creation ◽

Natural Way

In this article, we construct a generalization of the Blum–François Beta-splitting model for evolutionary trees, which was itself inspired by Aldous' Beta-splitting model on cladograms. The novelty of our approach allows for asymmetric shares of diversification rates (or diversification ‘potential’) between two sister species in an evolutionarily interpretable manner, as well as the addition of extinction to the model in a natural way. We describe the incremental evolutionary construction of a tree with n leaves by splitting or freezing extant lineages through the generating, organizing and deleting processes. We then give the probability of any (binary rooted) tree under this model with no extinction, at several resolutions: ranked planar trees giving asymmetric roles to the first and second offspring species of a given species and keeping track of the order of the speciation events occurring during the creation of the tree, unranked planar trees, ranked non-planar trees and finally (unranked non-planar) trees. We also describe a continuous-time equivalent of the generating, organizing and deleting processes where tree topology and branch lengths are jointly modelled and provide code in SageMath/Python for these algorithms.
