Optimal Network Alignment with Graphlet Degree Vectors

Cancer Informatics ◽

10.4137/cin.s4744 ◽

2010 ◽

Vol 9 ◽

pp. CIN.S4744 ◽

Cited By ~ 95

Author(s):

Tijana Milenković ◽

Weng Leong Ng ◽

Wayne Hayes ◽

Nataša Pržulj

Keyword(s):

Cost Function ◽

Network Topology ◽

Biological Networks ◽

Phylogenetic Trees ◽

Biological Function ◽

Network Alignment ◽

New Method ◽

Biological Information ◽

Global Alignment ◽

Sequence Alignments

Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce an optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict the function of as yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independently of protein sequence or any other information external to network topology.
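As a rough illustration of the idea in this abstract (an optimal one-to-one node assignment under an arbitrary cost function), here is a minimal pure-Python sketch of the Hungarian algorithm. It is not the authors' implementation. The function names `hungarian` and `degree_cost` are hypothetical, and the degree-difference cost is only a stand-in for the graphlet-degree-based cost the paper describes.

```python
def hungarian(cost):
    """Minimise sum(cost[i][assignment[i]]) over one-to-one assignments.

    Standard O(n^2 * m) potentials-based formulation; needs rows <= columns.
    Returns (total_cost, assignment) with assignment[i] = column for row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-based)
    v = [0.0] * (m + 1)          # column potentials (1-based)
    match = [0] * (m + 1)        # match[j] = row assigned to column j, 0 = none
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = match[j0]
            delta, j1 = inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:   # reached a free column: path is complete
                break
        while j0:                # flip the matching along the path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if match[j]:
            assignment[match[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


def degree_cost(adj_a, adj_b):
    """Toy topological cost (an assumption): absolute difference in node degree."""
    rows, cols = sorted(adj_a), sorted(adj_b)
    return [[abs(len(adj_a[a]) - len(adj_b[b])) for b in cols] for a in rows]


# Two 3-node paths whose centre nodes carry different labels.
path_a = {0: [1], 1: [0, 2], 2: [1]}
path_b = {0: [1, 2], 1: [0], 2: [0]}
total, assignment = hungarian(degree_cost(path_a, path_b))
```

Because the solver only sees a cost matrix, any node-to-node cost function can be swapped in for `degree_cost` without touching `hungarian`.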


Topological network alignment uncovers biological function and phylogeny

Journal of The Royal Society Interface ◽

10.1098/rsif.2010.0063 ◽

2010 ◽

Vol 7 (50) ◽

pp. 1341-1354 ◽

Cited By ~ 181

Author(s):

Oleksii Kuchaiev ◽

Tijana Milenković ◽

Vesna Memišević ◽

Wayne Hayes ◽

Nataša Pržulj

Keyword(s):

Network Topology ◽

Biological Networks ◽

Biological Function ◽

Protein Interaction Networks ◽

Phylogenetic Information ◽

Protein Protein Interaction ◽

Species Phylogeny ◽

Topological Network ◽

Protein Protein Interaction Networks ◽

Life On Earth

Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology and disease. Comparison and alignment of biological networks will probably have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein–protein interaction networks of two very different species, yeast and human, indicates that even distant species share a surprising amount of network topology, suggesting broad similarities in internal cellular wiring across all life on Earth.


Boosting alignment accuracy through adaptive local realignment

10.1101/063131 ◽

2016 ◽

Cited By ~ 1

Author(s):

Dan DeBlasio ◽

John Kececioglu

Keyword(s):

New Method ◽

Alignment Accuracy ◽

Global Alignment ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Local Realignment ◽

First Time ◽

Final Alignment

Motivation: While mutation rates can vary across the residues of a protein, when computing alignments of protein sequences the same setting of values for substitution score and gap penalty parameters is typically used across their entire length. We provide for the first time a new method called adaptive local realignment that automatically uses diverse parameter settings in different regions of the input sequences when computing multiple sequence alignments. This allows parameter settings to adapt to more closely match the local mutation rate across a protein. Method: Our method builds on our prior work on global alignment parameter advising with the Facet alignment accuracy estimator. Given a computed alignment, in each region that has low estimated accuracy, a collection of candidate realignments is generated using a precomputed set of alternate parameter settings. If one of these alternate realignments has higher estimated accuracy than the original subalignment, the region is replaced with the new realignment, and the concatenation of these realigned regions forms the final alignment that is output. Results: Adaptive local realignment significantly improves the quality of alignments over using the single best default parameter setting. In particular, this new method of local advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align benchmarks (and by 6.4% over using global advising alone). Availability: A new version of the Opal multiple sequence aligner that incorporates adaptive local realignment using Facet for parameter advising, is available free for non-commercial use at http://[email protected]
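The replace-if-better loop described in the Method part of this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not Opal or Facet: regions are pairs of gapped strings, `identity_fraction` is a stand-in for an accuracy estimator, and `pad_right` and `pad_left` are hypothetical "realigners" standing in for alternate parameter settings.

```python
def identity_fraction(region):
    """Stand-in accuracy estimate: share of columns with identical residues."""
    top, bottom = region
    same = sum(x == y and x != "-" for x, y in zip(top, bottom))
    return same / len(top)


def _ungapped(region):
    return tuple(row.replace("-", "") for row in region)


def pad_right(region):
    """Hypothetical realigner: left-justify both rows, gaps at the end."""
    top, bottom = _ungapped(region)
    width = max(len(top), len(bottom))
    return top.ljust(width, "-"), bottom.ljust(width, "-")


def pad_left(region):
    """Hypothetical realigner: right-justify both rows, gaps at the start."""
    top, bottom = _ungapped(region)
    width = max(len(top), len(bottom))
    return top.rjust(width, "-"), bottom.rjust(width, "-")


def adaptive_local_realign(regions, estimate, realigners, threshold):
    """Replace each low-scoring region by its best-scoring candidate, if any."""
    result = []
    for region in regions:
        best = region
        if estimate(region) < threshold:
            for realign in realigners:
                candidate = realign(region)
                if estimate(candidate) > estimate(best):
                    best = candidate
        result.append(best)
    # Concatenate the (possibly replaced) regions row by row.
    rows = ["".join(parts) for parts in zip(*result)]
    return result, rows


regions = [("ACGT", "ACGT"), ("-GAT", "GA--")]
chosen, final_rows = adaptive_local_realign(
    regions, identity_fraction, [pad_right, pad_left], threshold=0.5
)
```

In this toy run the first region scores above the threshold and is kept, while the second is replaced by the `pad_right` candidate because that candidate scores higher than the original.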


Faculty Opinions recommendation of Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1163036.623685 ◽

2009 ◽

Author(s):

Oliver Pybus

Keyword(s):

Phylogenetic Trees ◽

Large Scale ◽

Sequence Alignments


A new method for reference network considering network topology optimization

2017 Chinese Automation Congress (CAC) ◽

10.1109/cac.2017.8242961 ◽

2017 ◽

Author(s):

Zhongfu Jiang ◽

Donglei Sun ◽

Long Zhao ◽

Xiaoming Liu ◽

Shan Li ◽

...

Keyword(s):

Topology Optimization ◽

Network Topology ◽

New Method ◽

Reference Network ◽

Network Topology Optimization


GeneRax: A tool for species tree-aware maximum likelihood based gene family tree inference under gene duplication, transfer, and loss

10.1101/779066 ◽

2019 ◽

Cited By ~ 3

Author(s):

Benoit Morel ◽

Alexey M. Kozlov ◽

Alexandros Stamatakis ◽

Gergely J. Szöllősi

Keyword(s):

Maximum Likelihood ◽

Phylogenetic Trees ◽

Large Scale ◽

Simulated Data ◽

Gene Families ◽

Species Tree ◽

Homologous Gene ◽

Sequence Alignments ◽

Full Likelihood ◽

True Tree

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges species tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data pre-processing (e.g., computing bootstrap trees), and rely on approximations and heuristics that limit the degree of tree space exploration. Here we present GeneRax, the first maximum likelihood species tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared to competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical datasets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1099 Cyanobacteria families in eight minutes on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax.
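The accuracy metric named in this abstract, the relative Robinson-Foulds distance, compares which groupings of leaves two trees disagree on. A minimal sketch follows, assuming a clade-based variant for rooted trees written as nested tuples and normalised by the total number of non-trivial clades in both trees. The abstract does not say which variant or normalisation was used, and the names `clades` and `relative_rf` are hypothetical.

```python
def clades(tree):
    """Leaf sets under every internal node of a nested-tuple tree, root excluded."""
    found = set()

    def walk(node):
        if isinstance(node, str):            # a leaf is just its label
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)  # the root clade is shared by all trees on these leaves
    return found


def relative_rf(tree_a, tree_b):
    """Share of non-trivial clades present in exactly one of the two trees."""
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    total = len(clades_a) + len(clades_b)
    return len(clades_a ^ clades_b) / total if total else 0.0


true_tree = ((("A", "B"), "C"), "D")
inferred = ((("A", "C"), "B"), "D")
distance = relative_rf(true_tree, inferred)   # the trees disagree on one clade each
```

A value of 0 means the two trees contain the same clades, and 1 means they share none beyond the root.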


Is cytochrome c oxidase subunit I (COI) the right DNA barcoding marker for the Chaetopteryx villosa group?

ARPHA Conference Abstracts ◽

10.3897/aca.4.e64707 ◽

2021 ◽

Vol 4 ◽

Author(s):

Dalila Destanović ◽

Lejla Ušanović ◽

Lejla Lasić ◽

Jasna Hanjalić ◽

Belma Kalamujić Stroil

Keyword(s):

Phylogenetic Trees ◽

Pairwise Distance ◽

Zoological Museum ◽

Species Determination ◽

Sequence Alignments ◽

Multiple Sequence ◽

Software Analysis ◽

Group Data ◽

Species Specific ◽

The Right

Chaetopteryx villosa (Fabricius, 1798) is a caddisfly species distributed throughout Europe, except in the Balkan and Apennine Peninsulas. However, phylogenetically close species belonging to the C. villosa group are widespread throughout Europe. Species of this group (C. villosa, C. gessneri, C. fusca, C. sahlbergi, C. atlantica, C. bosniaca, C. vulture, and C. trinacriae) have distinct distributions with some overlaps. Adult forms of these species are morphologically similar, whereas larval morphology is only known for some species. There are also indications of species hybridization (e.g., C. villosa x fusca). Presumably, the molecular approach for the species determination of this group would be highly beneficial. In the BOLD database, there are 154 specimens with COI-5P barcodes of C. villosa species. Out of the remaining species, C. sahlbergi has 27 specimens with a barcode, C. fusca 20, C. gessneri 5, C. bosniaca 5, and C. atlantica 1, whereas sequences from the species C. vulture and C. trinacriae are missing. Therefore, we tested the power of discrimination of the COI-5P marker, the most common barcoding marker for species identification in animals, in the C. villosa group. Only sequences from public records originating from experienced research groups or taxonomists and containing a specimen photograph were taken as input. A total of 75 sequences from the BOLD database were obtained. Out of these sequences, 11 belonged to C. fusca, 5 to C. gessneri, 52 to C. villosa, 5 to C. bosniaca, and 2 to C. sahlbergi. For the generation of overview trees, COI-5P barcodes of Rhyacophila fasciata and Rh. nubila were used as outgroups. All sequences were trimmed at 5’ and 3’ ends, resulting in a final alignment length of 516 base pairs. Multiple sequence alignments and editing were done in the MEGA-X software. Analysis of nucleotide polymorphism was done in DNASP6 software. MEGA-X was used to calculate the pairwise distance and overall mean p-distance, and to construct the overview trees. Analysis of DNA polymorphism revealed 14 haplotypes of C. villosa, 3 haplotypes of C. fusca, 2 haplotypes of C. gessneri, and one each for C. bosniaca and C. sahlbergi. There were no significant interspecific and intraspecific differences among haplotypes based on pairwise distances. The p-distance between one of the haplotypes of C. fusca and C. villosa was 0.000, whereas the p-distance among haplotypes of C. villosa varied from 0.001 to about 0.055. The mean overall p-distance among haplotypes of all species equaled 0.03. No species-specific clusters were observed when phylogenetic trees were constructed except for C. gessneri, regardless of the method used (i.e., NJ, UPGMA, ML, ME, or MP). To minimize the possibility of species misidentification, we used only records submitted by NTNU-Norwegian University of Science and Technology (Norway), SNSB-Zoologische Staatssammlung Muenchen (Germany), Zoologisches Forschungsmuseum Alexander Koenig (Germany), University of Oulu, Zoological Museum (Finland), Prof. Hans Malicky, and Prof. Mladen Kučinić. No records identified as hybrids were included in the analyses. With the exception of C. gessneri, the COI-5P marker failed to separate the species of the C. villosa group. However, it is highly unlikely that poor species determination was the basis for such a result. To enable the comprehensive and unbiased evaluation of the relationships within this group, data coverage in the BOLD database for most of the studied species should be enhanced, encompassing different geographical distribution of samples. Further studies are needed to detect the array of molecular markers suitable for the species delineation in a complex group such as C. villosa.
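The p-distance reported in this abstract is the proportion of aligned sites at which two sequences differ. The sketch below shows the arithmetic on made-up sequences; it is not MEGA-X. Skipping any column that contains a gap or ambiguous base (pairwise deletion) is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: only unambiguous bases are compared (pairwise deletion).
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_p_distance(sequences):
    """Overall mean p-distance: average over all unordered sequence pairs."""
    pairs = list(combinations(sequences, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Made-up haplotypes, for illustration only.
haplotypes = ["ACGT", "ACGA", "TCGA"]
one_pair = p_distance("ACGTACGT", "ACGTACGA")   # 1 difference in 8 sites
overall = mean_p_distance(haplotypes)
```

A p-distance of 0.000 between haplotypes of two species, as reported for C. fusca and C. villosa, means the compared sequences are identical at every compared site.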


A review of protein–protein interaction network alignment: From pathway comparison to global alignment

Computational and Structural Biotechnology Journal ◽

10.1016/j.csbj.2020.09.011 ◽

2020 ◽

Vol 18 ◽

pp. 2647-2656

Author(s):

Cheng-Yu Ma ◽

Chung-Shou Liao

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Interaction Network ◽

Network Alignment ◽

Global Alignment ◽

Protein Protein Interaction ◽

Protein Protein Interaction Network


An Improved Method for Completely Uncertain Biological Network Alignment

BioMed Research International ◽

10.1155/2015/253854 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Bin Shen ◽

Muwei Zhao ◽

Wei Zhong ◽

Jieyue He

Keyword(s):

Biological Networks ◽

Biological Network ◽

Evaluation Criteria ◽

Network Alignment ◽

Global Network ◽

Uncertain Information ◽

Improved Method ◽

Network Comparison ◽

Probabilistic Network ◽

Alignment Problem

With the continuous development of biological experimental technology, more and more data on uncertain biological networks need to be analyzed. However, most current alignment methods are designed for deterministic biological networks, and only a few can solve the probabilistic network alignment problem. Moreover, these approaches use only part of the probabilistic data in the original networks, allowing only one of the two networks to be probabilistic. To overcome this weakness, an improved method called completely probabilistic biological network comparison alignment (C_PBNA) is proposed in this paper. This new method is designed for completely probabilistic biological network alignment and builds on probabilistic biological network alignment (PBNA) in order to take full advantage of the uncertain information in biological networks. The degree of consistency (agreement) indicates that C_PBNA can find results neglected by the PBNA algorithm. Furthermore, the GO consistency (GOC) and global network alignment score (GNAS) were selected as evaluation criteria, and both indicate that C_PBNA obtains more biologically significant results than the PBNA algorithm.


Biological function through network topology: a survey of the human diseasome

Briefings in Functional Genomics ◽

10.1093/bfgp/els037 ◽

2012 ◽

Vol 11 (6) ◽

pp. 522-532 ◽

Cited By ~ 29

Author(s):

V. Janjic ◽

N. Przulj

Keyword(s):

Network Topology ◽

Biological Function


GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss

Molecular Biology and Evolution ◽

10.1093/molbev/msaa141 ◽

2020 ◽

Vol 37 (9) ◽

pp. 2763-2774 ◽

Cited By ~ 5

Author(s):

Benoit Morel ◽

Alexey M Kozlov ◽

Alexandros Stamatakis ◽

Gergely J Szöllősi

Keyword(s):

Maximum Likelihood ◽

Phylogenetic Trees ◽

Large Scale ◽

Simulated Data ◽

Gene Families ◽

Species Tree ◽

Homologous Gene ◽

Sequence Alignments ◽

Full Likelihood ◽

True Tree

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
