Deciphering interaction fingerprints from protein molecular surfaces

deepNF: Deep network fusion for protein function prediction

10.1101/223339 ◽

2017 ◽

Cited By ~ 2

Author(s):

Vladimir Gligorijević ◽

Meet Barot ◽

Richard Bonneau

Keyword(s):

Protein Function ◽

Large Scale ◽

Protein Function Prediction ◽

Predictive Performance ◽

Substantial Improvement ◽

Function Prediction ◽

Interaction Networks ◽

Highly Nonlinear ◽

High Level ◽

String Networks

Abstract: The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most existing approaches for network integration use shallow models that cannot capture complex and highly nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. Availability: deepNF is freely available at https://github.com/VGligorijevic/deepNF
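The layer layout described in the abstract (one encoder per network type, merged into a single bottleneck from which features are read) can be illustrated with a minimal, untrained forward pass. This is only a sketch under stated assumptions: the class name `MultimodalAutoencoder`, the layer sizes, the sigmoid activation, and the random weights are all hypothetical and are not taken from the deepNF code base.

```python
import math
import random


def dense(x, weights, biases):
    """Apply one fully connected layer followed by a sigmoid activation."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Return small random weights and zero biases (assumed initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultimodalAutoencoder:
    """Hypothetical layout: per-network encoders -> shared bottleneck -> per-network decoders."""

    def __init__(self, input_dims, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        n = len(input_dims)
        self.hidden_dim = hidden_dim
        self.encoders = [init_layer(d, hidden_dim, rng) for d in input_dims]
        self.bottleneck = init_layer(hidden_dim * n, bottleneck_dim, rng)
        self.expand = init_layer(bottleneck_dim, hidden_dim * n, rng)
        self.decoders = [init_layer(hidden_dim, d, rng) for d in input_dims]

    def encode(self, inputs):
        """Encode each network separately, concatenate, then compress."""
        hidden = []
        for x, (w, b) in zip(inputs, self.encoders):
            hidden.extend(dense(x, w, b))
        return dense(hidden, *self.bottleneck)

    def decode(self, code):
        """Expand the bottleneck and reconstruct one vector per network."""
        joint = dense(code, *self.expand)
        outputs = []
        for i, (w, b) in enumerate(self.decoders):
            chunk = joint[i * self.hidden_dim:(i + 1) * self.hidden_dim]
            outputs.append(dense(chunk, w, b))
        return outputs
```

In a trained model of this shape, the vector returned by `encode` would play the role of the low-dimensional protein representation passed to a downstream function classifier; training (minimising reconstruction error) is omitted here.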

Simplified geometric representations of protein structures identify complementary interaction interfaces

10.1101/2019.12.18.880575 ◽

2019 ◽

Cited By ~ 1

Author(s):

Caitlyn L. McCafferty ◽

Edward M. Marcotte ◽

David W. Taylor

Keyword(s):

Conformational Changes ◽

Protein Interaction ◽

Protein Function ◽

Large Scale ◽

Predictive Accuracy ◽

Protein Complexes ◽

Protein Structures ◽

Molecular Properties ◽

3D Structures ◽

Geometric Representations

Abstract: Protein-protein interactions are critical to protein function, but three-dimensional (3D) arrangements of interacting proteins have proven hard to predict, even given the identities and 3D structures of the interacting partners. Specifically, identifying the relevant pairwise interaction surfaces remains difficult, often relying on shape complementarity with molecular docking while accounting for molecular motions to optimize rigid 3D translations and rotations. However, such approaches can be computationally expensive, and faster, less accurate approximations may prove useful for large-scale prediction and assembly of 3D structures of multi-protein complexes. We asked if a reduced representation of protein geometry retains enough information about molecular properties to predict pairwise protein interaction interfaces that are tolerant of limited structural rearrangements. Here, we describe a cuboid transformation of 3D protein accessible surfaces on which molecular properties such as charge, hydrophobicity, and mutation rate can be easily mapped, implemented in the MorphProt package. Pairs of surfaces are compared to rapidly assess partner-specific potential surface complementarity. On two available benchmarks comprising 85 known protein complexes overall, we observed F1 scores (a weighted combination of precision and recall) of 19-34% at correctly identifying protein interaction surfaces, comparable to more computationally intensive 3D docking methods in the annual Critical Assessment of PRedicted Interactions. Furthermore, we examined the effect of molecular motion through normal mode simulation on a benchmark receptor-ligand pair and observed no marked loss of predictive accuracy for distortions of up to 6 Å RMSD. Thus, a cuboid transformation of protein surfaces retains considerable information about surface complementarity, offers enhanced speed of comparison relative to more complex geometric representations, and exhibits tolerance to conformational changes.
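For readers unfamiliar with the F1 score quoted above, the sketch below computes it in its standard form, the harmonic mean of precision and recall, for a predicted set of interface residues against a known set. The function name `interface_f1` and the use of residue numbers as set members are illustrative assumptions; how MorphProt itself scores predictions is not described in the abstract.

```python
def interface_f1(predicted, actual):
    """F1 score (harmonic mean of precision and recall) for two residue sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical residue numbers: 2 of 4 predictions are correct, 2 of 6 true residues found.
print(interface_f1({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}))
```

Here precision is 0.5 and recall is 1/3, so the printed score is approximately 0.4, which would correspond to 40% on the scale used in the abstract.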

ReSAPP: Predicting overlapping protein complexes by merging multiple-sampled partitions of proteins

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720014420049 ◽

2014 ◽

Vol 12 (06) ◽

pp. 1442004 ◽

Cited By ~ 5

Author(s):

So Kobiki ◽

Osamu Maruyama

Keyword(s):

Simulated Annealing ◽

Large Scale ◽

Protein Complexes ◽

Performance Comparison ◽

Challenging Problem ◽

Protein Protein Interaction ◽

Second Stage ◽

Novel Method ◽

Previous Algorithm ◽

F Measure

Many proteins are known to perform their functions only when they form particular groups of proteins, called protein complexes. With the advent of large-scale protein–protein interaction (PPI) studies, predicting protein complexes from PPIs has become a challenging problem in systems biology. In this paper, we propose a novel method, called Repeated Simulated Annealing of Partitions of Proteins (ReSAPP), which predicts protein complexes from weighted PPIs. In the first stage, ReSAPP generates multiple (possibly different) partitions of all proteins of the given PPIs by repeatedly applying a simulated annealing based optimization algorithm to the PPIs. In the second stage, all distinct clusters of size two or more in those multiple partitions are merged into a single collection, which is output as the set of predicted protein complexes. In a performance comparison of ReSAPP with our previous algorithm, PPSampler2, as well as various other tools (MCL, MCODE, DPClus, CMC, COACH, RRW, NWE, and PPSampler1), ReSAPP is shown to outperform the other methods. Furthermore, the F-measure of ReSAPP is higher than that of the variant of ReSAPP without merging partitions. Thus, we empirically conclude that the combination of sampling multiple partitions and merging them is effective for predicting protein complexes.
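The two stages described above can be sketched as follows. This is a toy illustration, not the authors' algorithm: the scoring function (intra-cluster edge weight minus a fixed penalty per co-clustered pair), the move set (reassigning one protein to a random cluster label), the cooling schedule, and all function names are assumptions made for the example.

```python
import math
import random
from itertools import combinations


def score(partition, weights, penalty=0.5):
    """Hypothetical objective: reward intra-cluster edge weight, penalise each co-clustered pair."""
    clusters = {}
    for protein, label in partition.items():
        clusters.setdefault(label, []).append(protein)
    total = 0.0
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            total += weights.get((a, b), 0.0) - penalty
    return total


def anneal(proteins, weights, rng, steps=2000, t0=1.0, cooling=0.995):
    """Stage 1: one simulated-annealing run that returns a partition (protein -> label)."""
    partition = {p: i for i, p in enumerate(proteins)}  # start from singletons
    current = score(partition, weights)
    temp = t0
    for _ in range(steps):
        protein = rng.choice(proteins)
        old_label = partition[protein]
        partition[protein] = rng.randrange(len(proteins))  # propose a new label
        candidate = score(partition, weights)
        accept = candidate >= current or rng.random() < math.exp((candidate - current) / temp)
        if accept:
            current = candidate
        else:
            partition[protein] = old_label  # reject: undo the move
        temp *= cooling
    return partition


def predict_complexes(proteins, weights, runs=5, seed=0):
    """Stage 2: merge all distinct clusters of size >= 2 from repeated runs."""
    rng = random.Random(seed)
    complexes = set()
    for _ in range(runs):
        clusters = {}
        for protein, label in anneal(proteins, weights, rng).items():
            clusters.setdefault(label, set()).add(protein)
        complexes.update(frozenset(c) for c in clusters.values() if len(c) >= 2)
    return complexes
```

Edge weights are keyed by alphabetically ordered protein pairs, for example `{("A", "B"): 1.0}`. Because clusters from different runs may share proteins, the merged collection can contain overlapping complexes even though each individual partition does not.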

Protein complex similarity based on Weisfeiler-Lehman labeling

10.7287/peerj.preprints.26612 ◽

2018 ◽

Author(s):

Bianca K Stöcker ◽

Till Schäfer ◽

Petra Mutzel ◽

Johannes Köster ◽

Nils Kriege ◽

...

Keyword(s):

Protein Complex ◽

Large Scale ◽

Protein Complexes ◽

Similarity Measures ◽

Interaction Networks ◽

Protein Protein Interaction ◽

Protein Complex Prediction ◽

Protein Protein Interaction Networks ◽

Human Integrin ◽

Good Agreement

Being able to quantify the similarity between two protein complexes is essential for numerous applications. Prominent examples are database searches for known complexes with a given query complex, comparison of the output of different protein complex prediction algorithms, or summarizing and clustering protein complexes, e.g., for visualization. While the corresponding problems have received much attention for single proteins and protein families, the question of how to model and compute similarity between protein complexes has not yet been systematically studied. Because protein complexes can be naturally modeled as graphs, general graph similarity measures may in principle be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. Here we propose a parametric family of similarity measures based on Weisfeiler-Lehman labeling. We evaluate it on simulated complexes of the extended human integrin adhesome network. Because the connectivity (graph topology) of real complexes is often unknown and hard to obtain experimentally, we use both known protein-protein interaction networks and known interdependencies (constraints) between interactions to simulate more realistic complexes than from interaction networks alone. We empirically show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, but can be computed much more efficiently. It can therefore be used in large-scale studies and simulations and serve as a basis for further refinements of modeling protein complex similarity.
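To make the idea of Weisfeiler-Lehman labeling concrete, the sketch below refines node labels by repeatedly combining each node's label with the sorted labels of its neighbours, then compares two complexes by the overlap of their label multisets. The multiset Jaccard index used here is an assumption chosen for simplicity; it is not claimed to be a member of the parametric family proposed in the paper, and the function names are hypothetical.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Collect the labels from every WL refinement round as a multiset."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = (own label, sorted tuple of neighbour labels).
        current = {
            node: (current[node], tuple(sorted(current[n] for n in adjacency[node])))
            for node in adjacency
        }
        features.update(current.values())
    return features


def wl_similarity(graph_a, graph_b, iterations=2):
    """Multiset Jaccard index of WL features (assumed similarity, in [0, 1])."""
    fa = wl_features(*graph_a, iterations=iterations)
    fb = wl_features(*graph_b, iterations=iterations)
    union = sum((fa | fb).values())
    return sum((fa & fb).values()) / union if union else 1.0


# Two hypothetical complexes over the same proteins: a path and a triangle.
labels = {1: "a", 2: "b", 3: "c"}
path = ({1: [2], 2: [1, 3], 3: [2]}, labels)
triangle = ({1: [2, 3], 2: [1, 3], 3: [1, 2]}, labels)
print(wl_similarity(path, triangle, iterations=1))
```

With one refinement round, the two complexes share the three initial protein labels and the refined label of protein "b", giving 4 shared features out of 8 in the union, so the printed similarity is 0.5. Each round costs time roughly proportional to the number of edges, which is the kind of efficiency advantage over graph edit distance that the abstract refers to.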

The role of 3′UTR-protein complexes in the regulation of protein multifunctionality and subcellular localization

Nucleic Acids Research ◽

10.1093/nar/gkaa462 ◽

2020 ◽

Vol 48 (12) ◽

pp. 6491-6502

Author(s):

Diogo M Ribeiro ◽

Alexis Prod’homme ◽

Adrien Teixeira ◽

Andreas Zanzoni ◽

Christine Brun

Keyword(s):

Protein Trafficking ◽

Cellular Localization ◽

Large Scale ◽

Protein Complexes ◽

Interaction Network ◽

Multifunctional Proteins ◽

Protein Protein Interaction ◽

Moonlighting Proteins ◽

Rna Interaction

Abstract: Multifunctional proteins often perform their different functions when localized in different subcellular compartments. However, the mechanisms leading to their localization are largely unknown. Recently, 3′UTRs were found to regulate the cellular localization of newly synthesized proteins through the formation of 3′UTR-protein complexes. Here, we investigate the formation of 3′UTR-protein complexes involving multifunctional proteins by exploiting large-scale protein-protein and protein-RNA interaction networks. Focusing on 238 human ‘extreme multifunctional’ (EMF) proteins, we predicted 1411 3′UTR-protein complexes involving 54% of those proteins and evaluated their role in regulating protein cellular localization and multifunctionality. We find that EMF proteins lacking localization addressing signals, yet present at both the nucleus and cell surface, often form 3′UTR-protein complexes, and that the formation of these complexes could provide EMF proteins with the diversity of interaction partners necessary for their multifunctionality. Our findings are reinforced by archetypal moonlighting proteins predicted to form 3′UTR-protein complexes. Finally, the formation of 3′UTR-protein complexes, which involves up to 17% of the proteins in the human protein-protein interaction network, may be a common and yet underestimated protein trafficking mechanism, particularly suited to regulating the localization of multifunctional proteins.

A scalable strategy for high-throughput GFP tagging of endogenous human proteins

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1606731113 ◽

2016 ◽

Vol 113 (25) ◽

pp. E3501-E3508 ◽

Cited By ~ 116

Author(s):

Manuel D. Leonetti ◽

Sayaka Sekine ◽

Daichi Kamiyama ◽

Jonathan S. Weissman ◽

Bo Huang

Keyword(s):

Cell Lines ◽

Human Cell ◽

Protein Function ◽

Large Scale ◽

Protein Complexes ◽

Protein Localization ◽

Human Cell Lines ◽

Guide Rna ◽

Human Genes ◽

Endogenous Loci

A central challenge of the postgenomic era is to comprehensively characterize the cellular role of the ∼20,000 proteins encoded in the human genome. To systematically study protein function in a native cellular background, libraries of human cell lines expressing proteins tagged with a functional sequence at their endogenous loci would be very valuable. Here, using electroporation of Cas9 nuclease/single-guide RNA ribonucleoproteins and taking advantage of a split-GFP system, we describe a scalable method for the robust, scarless, and specific tagging of endogenous human genes with GFP. Our approach requires no molecular cloning and allows a large number of cell lines to be processed in parallel. We demonstrate the scalability of our method by targeting 48 human genes and show that the resulting GFP fluorescence correlates with protein expression levels. We next present how our protocols can be easily adapted for the tagging of a given target with GFP repeats, critically enabling the study of low-abundance proteins. Finally, we show that our GFP tagging approach allows the biochemical isolation of native protein complexes for proteomic studies. Taken together, our results pave the way for the large-scale generation of endogenously tagged human cell lines for the proteome-wide analysis of protein localization and interaction networks in a native cellular context.

A compendium of co-regulated protein complexes in breast cancer reveals collateral loss events

10.1101/155333 ◽

2017 ◽

Author(s):

Colm J. Ryan ◽

Susan Kennedy ◽

Ilirjana Bajrami ◽

David Matallanas ◽

Christopher J. Lord

Keyword(s):

Breast Cancer ◽

Large Scale ◽

Transcriptional Control ◽

Protein Complexes ◽

Adherens Junction ◽

Interaction Network ◽

Cancer Subtypes ◽

Protein Protein Interaction ◽

Highly Correlated ◽

E Cadherin

Summary: Protein complexes are responsible for the bulk of activities within the cell, but how their behavior and composition vary across tumors remains poorly understood. By combining proteomic profiles of breast tumors with a large-scale protein-protein interaction network, we have identified a set of 258 high-confidence protein complexes whose subunits have highly correlated protein abundance across tumor samples. We used this set to identify complexes that are reproducibly under- or over-expressed in specific breast cancer subtypes. We found that mutation or deletion of one subunit of a complex was often associated with a collateral reduction in protein expression of additional complex members. This collateral loss phenomenon was evident from proteomic, but not transcriptomic, profiles, suggesting post-transcriptional control. Mutation of the tumor suppressor E-cadherin (CDH1) was associated with a collateral loss of members of the adherens junction complex, an effect we validated using an engineered model of E-cadherin loss.
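One simple way to quantify whether the subunits of a candidate complex have "highly correlated protein abundance across tumor samples" is the mean pairwise Pearson correlation of their abundance profiles. The sketch below illustrates that idea only; the statistic, the function names, and the toy data are assumptions, and the procedure the authors actually used to define their 258 complexes is not given in the summary.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_subunit_correlation(abundance, subunits):
    """Average correlation over all subunit pairs (needs at least two subunits)."""
    pairs = list(combinations(subunits, 2))
    return sum(pearson(abundance[a], abundance[b]) for a, b in pairs) / len(pairs)


# Hypothetical abundances of three proteins across four tumor samples.
abundance = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
}
print(mean_subunit_correlation(abundance, ["P1", "P2"]))
```

In this toy data P1 and P2 rise together across samples, so their correlation is at the maximum of 1, while adding the anti-correlated P3 lowers the mean to about -1/3. A threshold on such a score would be one plausible way to flag co-regulated complexes.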

A comprehensive set of protein complexes in yeast: mining large scale protein-protein interaction screens

Bioinformatics ◽

10.1093/bioinformatics/btg344 ◽

2003 ◽

Vol 19 (15) ◽

pp. 1901-1908 ◽

Cited By ~ 22

Author(s):

R. Krause ◽

C. von Mering ◽

P. Bork

Keyword(s):

Protein Interaction ◽

Large Scale ◽

Protein Complexes ◽

Protein Protein Interaction

Experimental and bioinformatic approaches for interrogating protein–protein interactions to determine protein function

Journal of Molecular Endocrinology ◽

10.1677/jme.1.01693 ◽

2005 ◽

Vol 34 (2) ◽

pp. 263-280 ◽

Cited By ~ 35

Author(s):

Arnaud Droit ◽

Guy G Poirier ◽

Joanna M Hunter

Keyword(s):

Protein Interactions ◽

Protein Function ◽

Large Scale ◽

Protein Protein Interactions ◽

Biologically Relevant ◽

Protein Protein Interaction ◽

Bioinformatic Approaches ◽

Disparate Data ◽

Protein Protein Interaction Networks ◽

Novel Protein

An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. One strategy to determine a protein's function is to identify its protein–protein interactions. The increasing use of high-throughput and large-scale bioinformatics-based studies has generated a massive amount of data stored in a number of different databases. A challenge for bioinformatics is to explore these disparate data and to uncover biologically relevant interactions and pathways. In parallel, there is clearly a need for the development of approaches that can predict novel protein–protein interaction networks in silico. Here, we present an overview of different experimental and bioinformatic methods to elucidate protein–protein interactions.
