Deciphering interaction fingerprints from protein molecular surfaces

2019 ◽  
Author(s):  
P Gainza ◽  
F Sverrisson ◽  
F Monti ◽  
E Rodolà ◽  
MM Bronstein ◽  
...  

Abstract: Predicting interactions between proteins and other biomolecules purely based on structure is an unsolved problem in biology. A high-level description of protein structure, the molecular surface, displays patterns of chemical and geometric features that fingerprint a protein’s modes of interactions with other biomolecules. We hypothesize that proteins performing similar interactions may share common fingerprints, independent of their evolutionary history. Fingerprints may be difficult to grasp by visual analysis but could be learned from large-scale datasets. We present MaSIF, a conceptual framework based on a new geometric deep learning method to capture fingerprints that are important for specific biomolecular interactions. We showcase MaSIF with three prediction challenges: protein pocket-ligand prediction, protein-protein interaction site prediction, and ultrafast scanning of protein surfaces for prediction of protein-protein complexes. We anticipate that our conceptual framework will lead to improvements in our understanding of protein function and design.

2017 ◽  
Author(s):  
Vladimir Gligorijević ◽  
Meet Barot ◽  
Richard Bonneau

Abstract: The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that cannot capture complex and highly nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. Availability: deepNF is freely available at https://github.com/VGligorijevic/deepNF


Author(s):  
Caitlyn L. McCafferty ◽  
Edward M. Marcotte ◽  
David W. Taylor

Abstract: Protein-protein interactions are critical to protein function, but three-dimensional (3D) arrangements of interacting proteins have proven hard to predict, even given the identities and 3D structures of the interacting partners. Specifically, identifying the relevant pairwise interaction surfaces remains difficult, often relying on shape complementarity with molecular docking while accounting for molecular motions to optimize rigid 3D translations and rotations. However, such approaches can be computationally expensive, and faster, less accurate approximations may prove useful for large-scale prediction and assembly of 3D structures of multi-protein complexes. We asked if a reduced representation of protein geometry retains enough information about molecular properties to predict pairwise protein interaction interfaces that are tolerant of limited structural rearrangements. Here, we describe a cuboid transformation of 3D protein accessible surfaces on which molecular properties such as charge, hydrophobicity, and mutation rate can be easily mapped, implemented in the MorphProt package. Pairs of surfaces are compared to rapidly assess partner-specific potential surface complementarity. On two available benchmarks of 85 overall known protein complexes, we observed F1 scores (a weighted combination of precision and recall) of 19-34% at correctly identifying protein interaction surfaces, comparable to more computationally intensive 3D docking methods in the annual Critical Assessment of PRedicted Interactions. Furthermore, we examined the effect of molecular motion through normal mode simulation on a benchmark receptor-ligand pair and observed no marked loss of predictive accuracy for distortions of up to 6 Å RMSD. Thus, a cuboid transformation of protein surfaces retains considerable information about surface complementarity, offers enhanced speed of comparison relative to more complex geometric representations, and exhibits tolerance to conformational changes.
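The F1 score quoted in this abstract is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that calculation for a predicted versus a known set of interface residues. The function name `f1_score` and the residue numbers are hypothetical illustrations, not taken from the MorphProt package or its benchmarks.

```python
def f1_score(predicted, actual):
    """Return (precision, recall, F1) for two collections of residue identifiers."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        # No overlap: all three metrics are zero by convention here
        return 0.0, 0.0, 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical residue numbers, for illustration only
predicted_interface = {10, 11, 12, 40, 41}
true_interface = {11, 12, 13, 14}
precision, recall, f1 = f1_score(predicted_interface, true_interface)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot reach a high F1 by maximizing only precision or only recall.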


2014 ◽  
Vol 12 (06) ◽  
pp. 1442004 ◽  
Author(s):  
So Kobiki ◽  
Osamu Maruyama

Many proteins are known to perform their own functions when they form particular groups of proteins, called protein complexes. With the advent of large-scale protein–protein interaction (PPI) studies, it has been a challenging problem in systems biology to predict protein complexes from PPIs. In this paper, we propose a novel method, called Repeated Simulated Annealing of Partitions of Proteins (ReSAPP), which predicts protein complexes from weighted PPIs. In the first stage, ReSAPP generates multiple (possibly different) partitions of all proteins of given PPIs by repeatedly applying a simulated annealing based optimization algorithm to the PPIs. In the second stage, all different clusters of size two or more in those multiple partitions are merged into a collection of those clusters, which are output as predicted protein complexes. In a performance comparison of ReSAPP with our previous algorithm, PPSampler2, as well as various other tools (MCL, MCODE, DPClus, CMC, COACH, RRW, NWE, and PPSampler1), ReSAPP is shown to outperform the other methods. Furthermore, the F-measure of ReSAPP is higher than that of the variant of ReSAPP without merging partitions. Thus, we empirically conclude that the combination of sampling multiple partitions and merging them is effective for predicting protein complexes.
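The second stage described in this abstract, collecting the distinct clusters of size two or more across several partitions, can be sketched as follows. This is a minimal illustration only: the function name `merge_partitions` and the protein identifiers are hypothetical, and the simulated annealing stage that produces the partitions is not shown because the abstract does not give its scoring function.

```python
def merge_partitions(partitions):
    """Collect the distinct clusters of size >= 2 found across several partitions.

    partitions: iterable of partitions, each an iterable of sets of protein names.
    Returns a set of frozensets, one per distinct predicted complex.
    """
    merged = set()
    for partition in partitions:
        for cluster in partition:
            if len(cluster) >= 2:
                # frozenset makes a cluster hashable, so repeats collapse to one entry
                merged.add(frozenset(cluster))
    return merged


# Two hypothetical partitions of the same five proteins
partitions = [
    [{"P1", "P2"}, {"P3"}, {"P4", "P5"}],
    [{"P1", "P2"}, {"P3", "P4"}, {"P5"}],
]
predicted_complexes = merge_partitions(partitions)
print(len(predicted_complexes))
```

Note that the merged collection may contain overlapping clusters (here `P4` appears in two of them), which a single partition cannot represent.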


2018 ◽  
Author(s):  
Bianca K Stöcker ◽  
Till Schäfer ◽  
Petra Mutzel ◽  
Johannes Köster ◽  
Nils Kriege ◽  
...  

Being able to quantify the similarity between two protein complexes is essential for numerous applications. Prominent examples are database searches for known complexes with a given query complex, comparison of the output of different protein complex prediction algorithms, or summarizing and clustering protein complexes, e.g., for visualization. While the corresponding problems have received much attention on single proteins and protein families, the question about how to model and compute similarity between protein complexes has not yet been systematically studied. Because protein complexes can be naturally modeled as graphs, in principle general graph similarity measures may be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. Here we propose a parametric family of similarity measures based on Weisfeiler-Lehman labeling. We evaluate it on simulated complexes of the extended human integrin adhesome network. Because the connectivity (graph topology) of real complexes is often unknown and hard to obtain experimentally, we use both known protein-protein interaction networks and known interdependencies (constraints) between interactions to simulate more realistic complexes than from interaction networks alone. We empirically show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, but can be much more efficiently computed. It can therefore be used in large-scale studies and simulations and serve as a basis for further refinements of modeling protein complex similarity.
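To make the idea of Weisfeiler-Lehman labeling concrete, here is a minimal sketch of generic WL label refinement on small labeled graphs, followed by a multiset Jaccard comparison of the resulting label counts. This is not the parametric family of measures proposed in the paper. The function names, the choice of Jaccard similarity, and the toy complexes are all assumptions made for illustration.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=1):
    """Count the WL labels seen over `iterations` rounds of refinement.

    adjacency: dict mapping node -> list of neighbouring nodes
    labels:    dict mapping node -> initial label (e.g. a protein name)
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels
            neighbour_labels = tuple(sorted(current[n] for n in neighbours))
            refined[node] = (current[node], neighbour_labels)
        current = refined
        features.update(current.values())
    return features


def multiset_jaccard(a, b):
    """Jaccard similarity of two Counters: sum of min counts over sum of max counts."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


# Two hypothetical complexes on the same proteins: a path A-B-C and a triangle
labels = {"A": "A", "B": "B", "C": "C"}
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

similarity = multiset_jaccard(wl_features(path, labels), wl_features(triangle, labels))
print(similarity)
```

Each refinement round is linear in the number of edges, which gives some intuition for why WL-based measures can be computed much more cheaply than graph edit distance.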


2020 ◽  
Vol 48 (12) ◽  
pp. 6491-6502
Author(s):  
Diogo M Ribeiro ◽  
Alexis Prod’homme ◽  
Adrien Teixeira ◽  
Andreas Zanzoni ◽  
Christine Brun

Abstract: Multifunctional proteins often perform their different functions when localized in different subcellular compartments. However, the mechanisms leading to their localization are largely unknown. Recently, 3′UTRs were found to regulate the cellular localization of newly synthesized proteins through the formation of 3′UTR-protein complexes. Here, we investigate the formation of 3′UTR-protein complexes involving multifunctional proteins by exploiting large-scale protein-protein and protein-RNA interaction networks. Focusing on 238 human ‘extreme multifunctional’ (EMF) proteins, we predicted 1411 3′UTR-protein complexes involving 54% of those proteins and evaluated their role in regulating protein cellular localization and multifunctionality. We find that EMF proteins lacking localization addressing signals, yet present at both the nucleus and cell surface, often form 3′UTR-protein complexes, and that the formation of these complexes could provide EMF proteins with the diversity of interaction partners necessary to their multifunctionality. Our findings are reinforced by archetypal moonlighting proteins predicted to form 3′UTR-protein complexes. Finally, the formation of 3′UTR-protein complexes, which involves up to 17% of the proteins in the human protein-protein interaction network, may be a common and yet underestimated protein trafficking mechanism, particularly suited to regulate the localization of multifunctional proteins.


2016 ◽  
Vol 113 (25) ◽  
pp. E3501-E3508 ◽  
Author(s):  
Manuel D. Leonetti ◽  
Sayaka Sekine ◽  
Daichi Kamiyama ◽  
Jonathan S. Weissman ◽  
Bo Huang

A central challenge of the postgenomic era is to comprehensively characterize the cellular role of the ∼20,000 proteins encoded in the human genome. To systematically study protein function in a native cellular background, libraries of human cell lines expressing proteins tagged with a functional sequence at their endogenous loci would be very valuable. Here, using electroporation of Cas9 nuclease/single-guide RNA ribonucleoproteins and taking advantage of a split-GFP system, we describe a scalable method for the robust, scarless, and specific tagging of endogenous human genes with GFP. Our approach requires no molecular cloning and allows a large number of cell lines to be processed in parallel. We demonstrate the scalability of our method by targeting 48 human genes and show that the resulting GFP fluorescence correlates with protein expression levels. We next present how our protocols can be easily adapted for the tagging of a given target with GFP repeats, critically enabling the study of low-abundance proteins. Finally, we show that our GFP tagging approach allows the biochemical isolation of native protein complexes for proteomic studies. Taken together, our results pave the way for the large-scale generation of endogenously tagged human cell lines for the proteome-wide analysis of protein localization and interaction networks in a native cellular context.


2017 ◽  
Author(s):  
Colm J. Ryan ◽  
Susan Kennedy ◽  
Ilirjana Bajrami ◽  
David Matallanas ◽  
Christopher J. Lord

Summary: Protein complexes are responsible for the bulk of activities within the cell, but how their behavior and composition vary across tumors remains poorly understood. By combining proteomic profiles of breast tumors with a large-scale protein-protein interaction network, we have identified a set of 258 high-confidence protein complexes whose subunits have highly correlated protein abundance across tumor samples. We used this set to identify complexes that are reproducibly under- or over-expressed in specific breast cancer subtypes. We found that mutation or deletion of one subunit of a complex was often associated with a collateral reduction in protein expression of additional complex members. This collateral loss phenomenon was evident from proteomic, but not transcriptomic, profiles, suggesting post-transcriptional control. Mutation of the tumor suppressor E-cadherin (CDH1) was associated with a collateral loss of members of the adherens junction complex, an effect we validated using an engineered model of E-cadherin loss.


2005 ◽  
Vol 34 (2) ◽  
pp. 263-280 ◽  
Author(s):  
Arnaud Droit ◽  
Guy G Poirier ◽  
Joanna M Hunter

An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. One strategy to determine protein function is to identify the protein–protein interactions. The increasing use of high-throughput and large-scale bioinformatics-based studies has generated a massive amount of data stored in a number of different databases. A challenge for bioinformatics is to explore this disparate data and to uncover biologically relevant interactions and pathways. In parallel, there is clearly a need for the development of approaches that can predict novel protein–protein interaction networks in silico. Here, we present an overview of different experimental and bioinformatic methods to elucidate protein–protein interactions.

