A deep learning approach for building multiple trees classification

2019 ◽  
Author(s):  
Nadia Tahiri

Each gene has its own evolutionary history, which can substantially differ from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or hybridization events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree, which may display different evolutionary patterns from the species tree, or Tree of Life, that represents the main patterns of vertical descent. Here, we present a new efficient method for inferring single or multiple consensus trees and supertrees for a given set of phylogenetic trees (i.e. additive trees or X-trees). The output of the traditional tree consensus methods is a unique consensus tree or supertree. Here, we show how Machine Learning (ML) models, based on some interesting properties of the Robinson and Foulds topological distance, can be used to partition a given set of trees into one (when the data are homogeneous) or multiple (when the data are heterogeneous) cluster(s) of trees. We adapt the popular Accuracy, Precision, Sensitivity, and F1 scores to tree clustering. Special attention is paid to the relevant, but very challenging, problem of inferring alternative supertrees that are built from phylogenies defined on different, but mutually overlapping, sets of species. The use of an approximate objective function in clustering makes the new method faster than the existing tree clustering techniques and thus suitable for the analysis of large genomic datasets.

2021 ◽  
Author(s):  
Nadia Tahiri ◽  
Bernard Fichet ◽  
Vladimir Makarenkov

Abstract Each gene has its own evolutionary history, which can substantially differ from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer and recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree, which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. The output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. Here, we describe a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of phylogenetic trees (i.e. additive trees or X-trees). We show how a specific version of the popular k-means clustering algorithm, based on some interesting properties of the Robinson and Foulds topological distance, can be used to partition a given set of trees into one (when the data are homogeneous) or multiple (when the data are heterogeneous) cluster(s) of trees. We adapt the popular Caliński-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is paid to the relevant but very challenging problem of inferring alternative supertrees, built from phylogenies constructed for different, but mutually overlapping, sets of taxa. The use of the Euclidean approximation in the objective function of the method makes it faster than the existing tree clustering techniques, and thus well suited to the analysis of large genomic datasets. In this study, we apply it to discover alternative supertrees characterizing the main patterns of evolution of SARS-CoV-2 and the related betacoronaviruses.
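The Robinson and Foulds distance mentioned in the abstract counts the bipartitions (splits) present in one tree but not in the other. The following is a minimal sketch of that distance only, not of the authors' k-means procedure or their Euclidean approximation. It assumes both trees are defined on the same leaf set and are written as nested Python tuples of leaf names; that representation and the function names `leaves`, `splits`, and `rf_distance` are illustrative choices, not taken from the paper.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed representation: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always gets the same representation.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Number of splits found in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("this sketch requires identical leaf sets")
    return len(splits(t1) ^ splits(t2))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(t1, t2))
```

A pairwise matrix of such distances is the kind of input a distance-based clustering step could work from. The supertree case described in the abstract, where trees have different but overlapping taxa, is not covered by this sketch.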


2006 ◽  
Vol 04 (01) ◽  
pp. 59-74 ◽  
Author(s):  
YING-JUN HE ◽  
TRINH N. D. HUYNH ◽  
JESPER JANSSON ◽  
WING-KIN SUNG

To construct a phylogenetic tree or phylogenetic network for describing the evolutionary history of a set of species is a well-studied problem in computational biology. One previously proposed method to infer a phylogenetic tree/network for a large set of species is by merging a collection of known smaller phylogenetic trees on overlapping sets of species so that no (or as little as possible) branching information is lost. However, little work has been done so far on inferring a phylogenetic tree/network from a specified set of trees when in addition, certain evolutionary relationships among the species are known to be highly unlikely. In this paper, we consider the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set [Formula: see text] and none of the rooted triplets in another given set [Formula: see text]. Although NP-hard in the general case, we provide some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.
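To make the consistency condition concrete: a rooted tree displays the rooted triplet ab|c when some clade contains a and b but not c. The sketch below checks a candidate tree against a set of required triplets and a set of forbidden ones. It covers only the verification step for trees, not networks, and not the construction algorithms of the paper. The nested-tuple tree representation and the names `required`, `forbidden`, `displays`, and `satisfies` are hypothetical, introduced here for illustration.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# Assumed representation: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]
Triplet = Tuple[str, str, str]  # (a, b, c) stands for the rooted triplet ab|c


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set below every node of the rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(visit(child) for child in node))
        found.add(clade)
        return clade

    visit(tree)
    return found


def displays(tree: Tree, triplet: Triplet) -> bool:
    """True if some clade contains a and b but excludes c."""
    a, b, c = triplet
    return any(a in cl and b in cl and c not in cl for cl in clades(tree))


def satisfies(tree: Tree, required: Iterable[Triplet],
              forbidden: Iterable[Triplet]) -> bool:
    """True if the tree displays every required triplet and no forbidden one."""
    return (all(displays(tree, t) for t in required)
            and not any(displays(tree, t) for t in forbidden))


tree = (("a", "b"), ("c", "d"))
print(satisfies(tree, required=[("a", "b", "c")], forbidden=[("a", "c", "d")]))
```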


GigaScience ◽  
2021 ◽  
Vol 10 (5) ◽  
Author(s):  
Mengni Liu ◽  
Jianyu Chen ◽  
Xin Wang ◽  
Chengwei Wang ◽  
Xiaolong Zhang ◽  
...  

Abstract. Background: Multi-region sequencing (MRS) has been widely used to analyze intra-tumor heterogeneity (ITH) and cancer evolution. However, comprehensive analysis of mutational data from MRS is still challenging, necessitating complicated integration of a plethora of computational and statistical approaches. Findings: Here, we present MesKit, an R/Bioconductor package that can assist in characterizing genetic ITH and tracing the evolutionary history of tumors based on somatic alterations detected by MRS. MesKit provides a wide range of analysis and visualization modules, including ITH evaluation, metastatic route inference, and mutational signature identification. In addition, MesKit implements an auto-layout algorithm to generate phylogenetic trees based on somatic mutations. The application of MesKit for 2 reported MRS datasets of hepatocellular carcinoma and colorectal cancer identified known heterogeneous features and evolutionary patterns, together with potential driver events during cancer evolution. Conclusions: In summary, MesKit is useful for interpreting ITH and tracing evolutionary trajectory based on MRS data. MesKit is implemented in R and available at https://bioconductor.org/packages/MesKit under the GPL v3 license.


1993 ◽  
Vol 67 (4) ◽  
pp. 549-570 ◽  
Author(s):  
Bruce S. Lieberman

Phylogenetic parsimony analysis was used to classify the Siegenian–Eifelian “Metacryphaeus group” of the family Calmoniidae. Thirty-eight exoskeletal characters for 16 taxa produced a shortest-length cladogram with a consistency index of 0.49. A classification based on retrieving the structure of this cladogram recognizes nine genera: Typhloniscus Salter, Plesioconvexa n. gen., Punillaspis Baldis and Longobucco, Eldredgeia n. gen., Clarkeaspis n. gen., Malvinocooperella n. gen., Wolfartaspis Cooper, Plesiomalvinella Lieberman, Edgecombe, and Eldredge (used to represent the malvinellid clade), and Metacryphaeus Reed. The malvinellid clade is most closely related to a revised monophyletic Metacryphaeus. Typhloniscus is the basal member of the “Metacryphaeus group,” and the monotypic Wolfartaspis is sister to the clade containing the malvinellids and Metacryphaeus. Six new species are diagnosed: Punillaspis n. sp. A, “Clarkeaspis” gouldi, Clarkeaspis padillaensis, Malvinocooperella pregiganteus, Metacryphaeus curvigena, and Metacryphaeus branisai. Primitively, this group has South African and Andean affinities, and its evolutionary history suggests rapid diversification. In addition, evolutionary patterns in this group, and the distribution of character reversals, call into question certain notions about the nature of adaptive radiations. The distributions of taxa may answer questions about the number of marine transgressive/regressive cycles in the Emsian–Eifelian of the Malvinokaffric Realm.


2009 ◽  
Vol 75 (16) ◽  
pp. 5410-5416 ◽  
Author(s):  
Gabriele Margos ◽  
Stephanie A. Vollmer ◽  
Muriel Cornet ◽  
Martine Garnier ◽  
Volker Fingerle ◽  
...  

ABSTRACT Analysis of Lyme borreliosis (LB) spirochetes, using a novel multilocus sequence analysis scheme, revealed that OspA serotype 4 strains (a rodent-associated ecotype) of Borrelia garinii were sufficiently genetically distinct from bird-associated B. garinii strains to deserve species status. We suggest that OspA serotype 4 strains be raised to species status and named Borrelia bavariensis sp. nov. The rooted phylogenetic trees provide novel insights into the evolutionary history of LB spirochetes.


2020 ◽  
Author(s):  
Yuji Matsuo ◽  
Akinao Nose ◽  
Hiroshi Kohsaka

Abstract Speed and trajectory of locomotion are characteristic traits of individual species. During evolution, locomotion kinematics is likely to have been tuned for survival in the habitats of each species. Although kinematics of locomotion is thought to be influenced by habitats, the quantitative relation between the kinematics and environmental factors has not been fully revealed. Here, we performed comparative analyses of larval locomotion in 11 Drosophila species. We found that larval locomotion kinematics are divergent among the species. The diversity is not correlated to the body length but is correlated instead to the minimum habitat temperature of the species. Phylogenetic analyses using Bayesian inference suggest that the evolutionary rate of the kinematics is diverse among phylogenetic trees. The results of this study imply that the kinematics of larval locomotion has diverged in the evolutionary history of the genus Drosophila and evolved under the effects of the minimum ambient temperature of habitats.


2020 ◽  
Author(s):  
Sumanth Kumar Mutte ◽  
Dolf Weijers

ABSTRACT Protein oligomerization is a fundamental process to build complex functional modules. Domains that facilitate the oligomerization process are diverse and widespread in nature across all kingdoms of life. One such domain is the Phox and Bem1 (PB1) domain, which is functionally (relatively) well understood in the animal kingdom. However, beyond animals, neither the origin nor the evolutionary patterns of PB1-containing proteins are understood. While PB1 domain proteins have been found in other kingdoms, including plants, it is unclear how these relate to animal PB1 proteins. To address this question, we utilized large transcriptome datasets along with the proteomes of a broad range of species. We discovered eight PB1 domain-containing protein families in plants, along with three each in Protozoa and Chromista and four families in Fungi. Studying the deep evolutionary history of PB1 domains throughout eukaryotes revealed the presence of at least two, but likely three, ancestral PB1 copies in the Last Eukaryotic Common Ancestor (LECA). These three ancestral copies gave rise to multiple orthologues later in evolution. Tertiary structural models of these plant PB1 families, combined with Random Forest based classification, indicated family-specific differences attributed to the length of PB1 domain and the proportion of β-sheets. This study identifies novel PB1 families and reveals considerable complexity in the protein oligomerization potential at the origin of eukaryotes. The newly identified relationships provide an evolutionary basis to understand the diverse functional interactions of key regulatory proteins carrying PB1 domains across eukaryotic life.


Algorithms ◽  
2020 ◽  
Vol 13 (9) ◽  
pp. 225
Author(s):  
Broňa Brejová ◽  
Rastislav Královič

In the reconciliation problem, we are given two phylogenetic trees. A species tree represents the evolutionary history of a group of species, and a gene tree represents the history of a family of related genes within these species. A reconciliation maps nodes of the gene tree to the corresponding points of the species tree, and thus helps to interpret the gene family history. In this paper, we study the case when both trees are unrooted and their edge lengths are known exactly. The goal is to root them and to find a reconciliation that agrees with the edge lengths. We show a linear-time algorithm for finding the set of all possible root locations, which is a significant improvement compared to the previous O(N^3 log N) algorithm.

