MetaPhOrs 2.0: integrative, phylogeny-based inference of orthology and paralogy across the tree of life

Uciel Chorostecki; Manuel Molina; Leszek P Pryszcz; Toni Gabaldón

doi:10.1093/nar/gkaa282

GeneRax: A tool for species tree-aware maximum likelihood based gene family tree inference under gene duplication, transfer, and loss

Benoit Morel; Alexey M. Kozlov; Alexandros Stamatakis; Gergely J. Szöllősi

2019

doi:10.1101/779066

Cited By ~ 3

Keyword(s): Maximum Likelihood; Phylogenetic Trees; Large Scale; Simulated Data; Gene Families; Species Tree; Homologous Gene; Sequence Alignments; Full Likelihood; True Tree

Abstract: Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species tree-aware methods also leverage information from a putative species tree. However, only a few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data pre-processing (e.g., computing bootstrap trees), and rely on approximations and heuristics that limit the degree of tree space exploration. Here we present GeneRax, the first maximum likelihood species tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared to competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical datasets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1099 Cyanobacteria families in eight minutes on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax.
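The relative Robinson-Foulds distance named in the abstract can be illustrated with a minimal sketch. This is not GeneRax code: the nested-tuple tree encoding and the function names `leaf_set`, `splits`, and `relative_rf` are assumptions made for illustration, and the normalisation by 2(n - 3) assumes both trees are binary and are compared as unrooted trees over the same n leaves.

```python
def leaf_set(tree):
    """Return the set of leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) induced by internal branches."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # fixed leaf used to pick a canonical side
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if len(below) > 1 and len(other) > 1:
            found.add(below if reference in below else other)
        return below

    visit(tree)
    return found


def relative_rf(tree_a, tree_b):
    """Relative RF distance: differing splits divided by the maximum, 2 * (n - 3)."""
    leaves_a, leaves_b = leaf_set(tree_a), leaf_set(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same leaf set")
    n = len(leaves_a)
    if n < 4:
        raise ValueError("need at least four leaves")
    differing = splits(tree_a) ^ splits(tree_b)
    return len(differing) / (2 * (n - 3))


# Hypothetical five-leaf trees that disagree on one grouping.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
print(relative_rf(true_tree, inferred))
```

A value of 0 means the two topologies are identical, and 1 means they share no non-trivial split.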

treeWidget: a BioJS component to visualise phylogenetic trees

Fabian Schreiber

F1000Research, 2014, Vol 3, pp. 49

doi:10.12688/f1000research.3-49.v1

Cited By ~ 1

Keyword(s): Phylogenetic Trees; Protein Domains; Gene Families; Sequence Information; Gene Duplications; Link Type; History Of; Conservation Patterns; Evolution Of Gene Families; The Web

Summary: Phylogenetic trees are widely used to represent the evolution of gene families. As the history of gene families can be complex (including many gene duplications), visualising it can become a difficult task. A good, accurate visualisation of phylogenetic trees, especially on the web, makes trees easier to understand and interpret and helps to reveal the mechanisms that shape the evolution of a specific set of genes or species. Here, I present treeWidget, a modular BioJS component to visualise phylogenetic trees on the web. Through its modularity, treeWidget can be easily customized to allow the display of sequence information, e.g. protein domains and alignment conservation patterns. Availability: http://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.7707

GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss

Benoit Morel; Alexey M Kozlov; Alexandros Stamatakis; Gergely J Szöllősi

Molecular Biology and Evolution, 2020, Vol 37 (9), pp. 2763-2774

doi:10.1093/molbev/msaa141

Cited By ~ 5

Keyword(s): Maximum Likelihood; Phylogenetic Trees; Large Scale; Simulated Data; Gene Families; Species Tree; Homologous Gene; Sequence Alignments; Full Likelihood; True Tree

Abstract: Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only a few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).

GPT: a web-server to map phylogenetic trees on a virtual globe

Pere Puigbo; Jacqueline M Major

2015

doi:10.7287/peerj.preprints.840v1

Keyword(s): Phylogenetic Trees; Web Server; Epidemiological Studies; Google Earth; Web Browser; Virtual Globe; Wide Range; Global Positioning; Minimum Requirements; The Web

GPT (Global Positioning Trees) is a web-server that maps phylogenetic trees on a virtual globe. The minimum requirements are a phylogenetic tree and the geographical coordinates of its leaves, from which GPT generates a Keyhole Markup Language (KML) file that can be viewed on Google Earth. An advantage of GPT is that the results may be pre-visualized directly on the web. This web-server also implements several tools to display geolocation and geotrack data. GPT has been designed to be an easy-to-use tool to track evolutionary processes and will be useful for phylogeographical and spatial epidemiological studies. It covers a wide range of visualizations divided into three increasingly complex components: geolocation, geotrack and GPT. This web-server is freely available at http://ppuigbo.me/programs/GPT and only requires Internet access, a web browser, and an earth browser able to read KML files. Several examples and a tutorial are accessible from the web-server’s home page.
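As a rough illustration of the simplest step described above, turning leaf coordinates into a KML file, the following sketch writes one KML Placemark per leaf. It is not GPT's implementation: the function name `leaves_to_kml`, the input dictionary, and the sample taxa are hypothetical, and GPT's real output (which also has to represent the tree itself) is not described in the abstract. The only format facts relied on are the KML 2.2 namespace and the `longitude,latitude,altitude` ordering of `<coordinates>`.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def leaves_to_kml(leaf_coords):
    """Build a KML document with one Placemark per leaf.

    leaf_coords maps a leaf name to a (latitude, longitude) pair in degrees.
    """
    ET.register_namespace("", KML_NS)  # serialise with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, (lat, lon) in sorted(leaf_coords.items()):
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude, then altitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical leaves with made-up sampling locations.
kml_text = leaves_to_kml({"taxon_A": (41.39, 2.17), "taxon_B": (-33.87, 151.21)})
print(kml_text)
```

The resulting string can be saved with a `.kml` extension and opened in an earth browser.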

APLIKASI PENCARI INFORMASI PERMINTAAN LAYANAN WEB YANG ERROR MENGGUNAKAN ALGORITMA BOYER-MOORE [An application for finding failed web service requests using the Boyer-Moore algorithm]

Harry Setya Hadi

Unes journal of Information System, 2016, Vol 1 (1), pp. 001

doi:10.31933/ujis.1.1.001-010.2016

Keyword(s): Data Storage; Computer Network; Web Server; Access Methods; Ip Address; String Searching; Common Process; Web Server Logs; Different Parts; The Web

String searching is a common computing task because text is the main form of data storage. The Boyer-Moore algorithm, which compares the pattern from right to left, is considered the most efficient string-matching method in practice and also performs well in theory. A web server connected to a computer network is accessed by many users in different places, with both good and bad intentions. Every activity a user performs is stored in the web server logs. The log reports kept by the web server can help an administrator search for failed web requests. A web server log is a record of a website's activity that contains data such as the IP address, time of access, page opened, activities, and access methods. The large amount of data in these logs can yield useful information.
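The right-to-left matching idea can be sketched as follows. This is not the paper's application: it uses Horspool's simplification of Boyer-Moore (the bad-character shift only, without the good-suffix rule), and the function names, the log line layout, and the `'" 404 '` marker are assumptions chosen for illustration.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Simplified Boyer-Moore (Horspool variant): the pattern is compared
    right to left, and on a mismatch the window jumps according to the
    text character aligned with the pattern's last position.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift for each character of the pattern except the last one.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos  # every character matched
        pos += shift.get(text[pos + m - 1], m)
    return -1


def find_error_requests(log_lines, marker='" 404 '):
    """Return the log lines that contain the given status marker."""
    return [line for line in log_lines if horspool_search(line, marker) != -1]


# Hypothetical log lines in a common access-log layout.
sample_log = [
    '192.0.2.1 - - [10/Oct/2016:13:55:36 +0700] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.7 - - [10/Oct/2016:13:56:01 +0700] "GET /missing.html HTTP/1.1" 404 209',
]
for line in find_error_requests(sample_log):
    print(line)
```

Matching on the closing quote plus status code keeps the search from hitting a `404` that merely appears in a URL or byte count.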

Expression of a gene duplication encoding conserved sperm tail proteins is translationally regulated in Drosophila melanogaster.

M Schäfer; D Börsch; A Hülster; U Schäfer

Molecular and Cellular Biology, 1993, Vol 13 (3), pp. 1708-1718

doi:10.1128/mcb.13.3.1708

Cited By ~ 36

Keyword(s): Drosophila Melanogaster; Gene Family; Translational Control; Germ Line; Gene Families; Homologous Gene; Male Sterile; Sperm Tail; Repetitive Motif; Carboxy Terminal

We have analyzed a locus of Drosophila melanogaster located at 98C on chromosome 3, which contains two tandemly arranged genes, named Mst98Ca and Mst98Cb. They are two additional members of the Mst(3)CGP gene family by three criteria. (i) Both genes are exclusively transcribed in the male germ line. (ii) Both transcripts encode a protein with a high proportion of the repetitive motif Cys-Gly-Pro. (iii) Their expression is translationally controlled; while transcripts can be detected in diploid stages of spermatogenesis, association with polysomes can be shown only in haploid stages of sperm development. The genes differ markedly from the other members of the gene family in structure; they do not contain introns, they are of much larger size, and they have the Cys-Gly-Pro motifs clustered at the carboxy-terminal end of the encoded proteins. An antibody generated against the Mst98Ca protein recognizes both Mst98C proteins in D. melanogaster. In a male-sterile mutation in which spermiogenesis is blocked before individualization of sperm, both of these proteins are no longer synthesized. This finding provides proof of late translation for the Mst98C proteins and thereby independent proof of translational control of expression. Northern (RNA) and Western immunoblot analyses indicate the presence of homologous gene families in many other Drosophila species. The Mst98C proteins share sequence homology with proteins of the outer dense fibers in mammalian spermatozoa and can be localized to the sperm tail by immunofluorescence with an anti-Mst98Ca antibody.

SSMBS: a web server to locate sequentially separated motifs in biological sequences

Chetan Kumar; K. Sekar

Journal of Applied Crystallography, 2009, Vol 43 (1), pp. 203-205

doi:10.1107/s0021889809047050

Cited By ~ 1

Keyword(s): Amino Acids; Web Server; Nucleotide Sequences; Regular Expressions; Biological Sequences; Sequence Motifs; Specific Order; The Web

The identification of sequence (amino acid or nucleotide) motifs that occur in a particular order in biological sequences has proved to be of interest. This paper describes a computing server, SSMBS, which can locate and display the occurrences of user-defined biologically important sequence motifs (a maximum of five) present in a specific order in protein and nucleotide sequences. While the server can efficiently locate motifs specified using regular expressions, it can also find occurrences of long and complex motifs. The computation is carried out by an algorithm developed using the concepts of quantifiers in regular expressions. The web server is available to users around the clock at http://dicsoft1.physics.iisc.ernet.in/ssmbs/.
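One way to picture an ordered, gapped motif search with regular-expression quantifiers is sketched below. It is not the SSMBS algorithm, which the abstract does not specify: the function name `find_ordered_motifs`, the use of a lazy `.*?` gap between motifs, and the sample sequence are all assumptions. Only the limit of five motifs comes from the abstract.

```python
import re


def find_ordered_motifs(sequence, motifs, max_motifs=5):
    """Find the first place where all motifs occur in the given order.

    Each motif is a regular expression. Motifs are joined with a lazy
    gap (.*?) so that any number of residues may separate them.
    Returns a list of (matched_text, start_index) pairs, or None.
    """
    if not 1 <= len(motifs) <= max_motifs:
        raise ValueError(f"expected between 1 and {max_motifs} motifs")
    # Named groups keep lookups correct even if a motif has its own groups.
    parts = [f"(?P<m{i}>{motif})" for i, motif in enumerate(motifs)]
    match = re.search(".*?".join(parts), sequence)
    if match is None:
        return None
    return [(match.group(f"m{i}"), match.start(f"m{i}")) for i in range(len(motifs))]


# Hypothetical protein sequence and motifs.
hits = find_ordered_motifs("MKTCGPAAHHHCGPWW", ["CGP", "H{2,3}", "W+"])
print(hits)
```

The quantifiers `{2,3}` and `+` let a single motif cover runs of variable length, while the lazy gap prefers the nearest downstream occurrence of the next motif.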
