MetaPhOrs 2.0: integrative, phylogeny-based inference of orthology and paralogy across the tree of life

2020 ◽  
Vol 48 (W1) ◽  
pp. W553-W557 ◽  
Author(s):  
Uciel Chorostecki ◽  
Manuel Molina ◽  
Leszek P Pryszcz ◽  
Toni Gabaldón

Abstract Inferring homology relationships across genes in different species is a central task in comparative genomics. Therefore, a large number of resources and methods have been developed over the years. Some public databases include phylogenetic trees of homologous gene families, which can be used to further differentiate homology relationships into orthology and paralogy. MetaPhOrs is a web server that integrates phylogenetic information from different sources to provide orthology and paralogy relationships based on a common phylogeny-based predictive algorithm and associated with a consistency-based confidence score. Here we describe the latest version of the web server, which includes major new implementations and provides orthology and paralogy relationships derived from ∼8.2 million gene family trees from 13 different source repositories across ∼4000 species with sequenced genomes. The MetaPhOrs server is freely available, without registration, at http://orthology.phylomedb.org/
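The idea of a consistency-based confidence score can be illustrated with a minimal sketch. It assumes, as a simplification, that each source tree containing a gene pair casts one vote ("orthology" or "paralogy") and that the score is the fraction of votes supporting orthology. The function name, the input layout, and the gene labels are hypothetical and are not taken from the MetaPhOrs code base.

```python
from collections import defaultdict


def consistency_scores(calls):
    """Return {gene pair: fraction of trees that call the pair orthologous}.

    `calls` is an iterable of (gene_a, gene_b, relation) tuples, one per
    tree in which both genes occur; relation is "orthology" or "paralogy".
    This input layout is an assumption made for illustration.
    """
    votes = defaultdict(lambda: [0, 0])  # pair -> [orthology votes, total votes]
    for gene_a, gene_b, relation in calls:
        pair = tuple(sorted((gene_a, gene_b)))  # treat (a, b) and (b, a) alike
        votes[pair][1] += 1
        if relation == "orthology":
            votes[pair][0] += 1
    return {pair: ortho / total for pair, (ortho, total) in votes.items()}


# Hypothetical per-tree calls for two gene pairs.
example_calls = [
    ("geneA", "geneB", "orthology"),
    ("geneB", "geneA", "orthology"),
    ("geneA", "geneB", "paralogy"),
    ("geneA", "geneC", "paralogy"),
]
scores = consistency_scores(example_calls)
```

A pair supported by most trees from independent repositories would receive a score close to 1, while conflicting trees pull the score toward 0.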

2019 ◽  
Author(s):  
Benoit Morel ◽  
Alexey M. Kozlov ◽  
Alexandros Stamatakis ◽  
Gergely J. Szöllősi

Abstract Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species tree-aware methods also leverage information from a putative species tree. However, only a few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data pre-processing (e.g., computing bootstrap trees), and rely on approximations and heuristics that limit the degree of tree space exploration. Here we present GeneRax, the first maximum likelihood species tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared to competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical datasets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1099 Cyanobacteria families in eight minutes on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax.
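The relative Robinson-Foulds distance used as the accuracy measure above can be sketched as follows. This is an illustrative implementation, not code from GeneRax: trees are represented as nested tuples of leaf names (an assumed toy format), compared as unrooted topologies through their non-trivial bipartitions, and the distance is normalized by the total number of non-trivial bipartitions in both trees, which equals 2(n - 3) for two fully resolved trees on n leaves.

```python
def leaf_set(tree):
    """Collect the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def relative_rf(tree_a, tree_b):
    """Relative Robinson-Foulds distance in [0, 1] between two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    splits_a, splits_b = splits(tree_a), splits(tree_b)
    total = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / total if total else 0.0


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
distance = relative_rf(true_tree, inferred)
```

A value of 0 means the two topologies agree on every bipartition, and a value of 1 means they share none.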


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 49 ◽  
Author(s):  
Fabian Schreiber

Summary: Phylogenetic trees are widely used to represent the evolution of gene families. As the history of gene families can be complex (including many gene duplications), its visualisation can become a difficult task. An accurate visualisation of phylogenetic trees, especially on the web, makes trees easier to understand and interpret and helps to reveal the mechanisms that shape the evolution of a specific set of genes or species. Here, I present treeWidget, a modular BioJS component to visualise phylogenetic trees on the web. Through its modularity, treeWidget can be easily customized to allow the display of sequence information, e.g. protein domains and alignment conservation patterns. Availability: http://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.7707


2020 ◽  
Vol 37 (9) ◽  
pp. 2763-2774 ◽  
Author(s):  
Benoit Morel ◽  
Alexey M Kozlov ◽  
Alexandros Stamatakis ◽  
Gergely J Szöllősi

Abstract Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only a few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).


2015 ◽  
Author(s):  
Pere Puigbo ◽  
Jacqueline M Major

GPT (Global Positioning Trees) is a web-server that maps phylogenetic trees on a virtual globe. The minimum requirements are a phylogenetic tree and geographical coordinates of leaves to generate a Keyhole Markup Language (KML) file that can be viewed on Google Earth. An advantage of GPT is the results may be pre-visualized directly on the web. This web-server also implements several tools to display geolocation and geotrack data. GPT has been designed to be an easy-to-use tool to track evolutionary processes and will be useful for phylogeographical and spatial epidemiological studies. It covers a wide-range of visualizations divided in three components increasingly complex: geolocation, geotrack and GPT. This web-server is freely available at http://ppuigbo.me/programs/GPT and only requires Internet access, a web browser, and an earth browser able to read KML files. Several examples and a tutorial are accessible from the web-server’s home page.
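The step from leaf coordinates to a KML file can be sketched with the Python standard library. This is only an illustration of the file format, not the GPT server's own code: the function name and the input dictionary are hypothetical, and only leaf placemarks are written (no tree branches). KML stores each point as longitude,latitude,altitude, so the order is swapped relative to the usual (latitude, longitude) pairs.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def leaves_to_kml(leaf_coords):
    """Build a KML document with one Placemark per tree leaf.

    `leaf_coords` maps a leaf name to a (latitude, longitude) pair;
    this input layout is an assumption made for the example.
    """
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, (lat, lon) in leaf_coords.items():
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude,altitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical leaf with made-up coordinates.
kml_text = leaves_to_kml({"taxon_A": (41.39, 2.17)})
```

Saving the returned string to a file with a .kml extension should give something that an earth browser able to read KML can open.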


2016 ◽  
Vol 1 (1) ◽  
pp. 001
Author(s):  
Harry Setya Hadi

String searching is a common operation in computing because text is the main form of data storage. The Boyer-Moore algorithm, which matches the pattern from right to left, is considered one of the most efficient methods in practice, while algorithms that match in a fixed, specified direction achieve the best theoretical results. A system connected to a computer network typically hosts a web server that is accessed by many users in different places, with both good and bad intentions. Every activity performed by a user is stored in the web server logs. The log reports kept on the web server can help the administrator search for web request errors. A web server log is a record of the activity on a web site and contains data such as the IP address, time of access, pages opened, activities, and access methods. The large volume of data in the resulting logs can thus yield useful information.
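Right-to-left matching on log lines can be sketched as below. For brevity the example uses the Horspool simplification of Boyer-Moore (bad-character shifts only, without the good-suffix rule), so it should not be read as the exact procedure of the paper. The log lines and the searched status code are made up for illustration.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Boyer-Moore-Horspool: compare the pattern right to left, then shift
    by a precomputed distance based on the text character aligned with
    the last pattern position.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift for each character of the pattern except the last one;
    # characters absent from the table shift by the full pattern length.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1  # compare from right to left
        if j < 0:
            return pos
        pos += shift.get(text[pos + m - 1], m)
    return -1


# Hypothetical access-log lines; keep those that contain a 404 status.
log_lines = [
    '10.0.0.1 - - [01/Jan/2016:10:00:00] "GET /index.html" 200',
    '10.0.0.2 - - [01/Jan/2016:10:00:05] "GET /missing.html" 404',
]
error_lines = [line for line in log_lines if horspool_find(line, '" 404') != -1]
```

Because mismatches on characters that do not occur in the pattern allow a jump of the full pattern length, long patterns tend to be scanned faster than with a character-by-character search.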


1993 ◽  
Vol 13 (3) ◽  
pp. 1708-1718 ◽  
Author(s):  
M Schäfer ◽  
D Börsch ◽  
A Hülster ◽  
U Schäfer

We have analyzed a locus of Drosophila melanogaster located at 98C on chromosome 3, which contains two tandemly arranged genes, named Mst98Ca and Mst98Cb. They are two additional members of the Mst(3)CGP gene family by three criteria. (i) Both genes are exclusively transcribed in the male germ line. (ii) Both transcripts encode a protein with a high proportion of the repetitive motif Cys-Gly-Pro. (iii) Their expression is translationally controlled; while transcripts can be detected in diploid stages of spermatogenesis, association with polysomes can be shown only in haploid stages of sperm development. The genes differ markedly from the other members of the gene family in structure; they do not contain introns, they are of much larger size, and they have the Cys-Gly-Pro motifs clustered at the carboxy-terminal end of the encoded proteins. An antibody generated against the Mst98Ca protein recognizes both Mst98C proteins in D. melanogaster. In a male-sterile mutation in which spermiogenesis is blocked before individualization of sperm, both of these proteins are no longer synthesized. This finding provides proof of late translation for the Mst98C proteins and thereby independent proof of translational control of expression. Northern (RNA) and Western immunoblot analyses indicate the presence of homologous gene families in many other Drosophila species. The Mst98C proteins share sequence homology with proteins of the outer dense fibers in mammalian spermatozoa and can be localized to the sperm tail by immunofluorescence with an anti-Mst98Ca antibody.


2009 ◽  
Vol 43 (1) ◽  
pp. 203-205 ◽  
Author(s):  
Chetan Kumar ◽  
K. Sekar

The identification of sequence (amino acids or nucleotides) motifs in a particular order in biological sequences has proved to be of interest. This paper describes a computing server, SSMBS, which can locate and display the occurrences of user-defined biologically important sequence motifs (a maximum of five) present in a specific order in protein and nucleotide sequences. While the server can efficiently locate motifs specified using regular expressions, it can also find occurrences of long and complex motifs. The computation is carried out by an algorithm developed using the concepts of quantifiers in regular expressions. The web server is available to users around the clock at http://dicsoft1.physics.iisc.ernet.in/ssmbs/.
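Locating motifs in a specified order can be illustrated with Python's `re` module. The sketch is a simplification and not the SSMBS algorithm: it scans left to right, takes the first match of each motif after the end of the previous one, and reports a single ordered occurrence rather than all of them. The function name and the example sequence are hypothetical; the five-motif limit follows the description above.

```python
import re


def find_ordered_motifs(sequence, motifs):
    """Find one occurrence of each motif, in the given order.

    `motifs` is a list of one to five regular expressions. Returns a list
    of (motif, start, end) tuples, or None if the motifs do not occur in
    that order under this greedy left-to-right scan.
    """
    if not 1 <= len(motifs) <= 5:
        raise ValueError("between one and five motifs are expected")
    hits = []
    start = 0
    for motif in motifs:
        # Search only the part of the sequence after the previous hit.
        match = re.compile(motif).search(sequence, start)
        if match is None:
            return None
        hits.append((motif, match.start(), match.end()))
        start = match.end()
    return hits


# Made-up protein fragment; "A{2}" uses a quantifier to require two alanines.
hits = find_ordered_motifs("MKCGPAACGPLL", ["CGP", "A{2}", "CGP"])
```

Since each motif is matched greedily, a motif with an open-ended quantifier such as `.*` can consume the rest of the sequence and hide later motifs, so an exhaustive search would need backtracking over alternative matches.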

