Protein-protein docking using learned three-dimensional representations

Mapping Intimacies ◽

10.1101/738690 ◽

2019 ◽

Cited By ~ 1

Author(s):

Georgy Derevyanko ◽

Guillaume Lamoureux

Keyword(s):

Protein Interactions ◽

Network Architecture ◽

Protein Complexes ◽

Three Dimensional ◽

Spatial Arrangement ◽

Protein Docking ◽

Protein Protein Interactions ◽

Translational Invariance ◽

Shape Complementarity ◽

Spatial Features

AbstractProtein-protein interactions are determined by a number of hard-to-capture features related to shape complementarity, electrostatics, and hydrophobicity. These features may be intrinsic to the protein or induced by the presence of a partner. A conventional approach to protein-protein docking consists in engineering a small number of spatial features for each protein, and in minimizing the sum of their correlations with respect to the spatial arrangement of the two proteins. To generalize this approach, we introduce a deep neural network architecture that transforms the raw atomic densities of each protein into complex three-dimensional representations. Each point in the volume containing the protein is described by 48 learned features, which are correlated and combined with the features of a second protein to produce a score dependent on the relative position and orientation of the two proteins. The architecture is based on multiple layers of SE(3)-equivariant convolutional neural networks, which provide built-in rotational and translational invariance of the score with respect to the structure of the complex. The model is trained end-to-end on a set of decoy conformations generated from 851 nonredundant protein-protein complexes and is tested on data from the Protein-Protein Docking Benchmark Version 4.0.

Download Full-text

Mass spectrometry-based cross-linking study shows that the Psb28 protein binds to cytochrome b559 in Photosystem II

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1620360114 ◽

2017 ◽

Vol 114 (9) ◽

pp. 2224-2229 ◽

Cited By ~ 18

Author(s):

Daniel A. Weisz ◽

Haijun Liu ◽

Hao Zhang ◽

Sundarapandian Thangapandian ◽

Emad Tajkhorshid ◽

...

Keyword(s):

Mass Spectrometry ◽

Photosystem Ii ◽

Protein Interactions ◽

Protein Complexes ◽

Protein Docking ◽

Cross Linking ◽

Protein Protein Interactions ◽

Cytochrome B559 ◽

Reaction Center Complex ◽

Assembly Intermediate

Photosystem II (PSII), a large pigment protein complex, undergoes rapid turnover under natural conditions. During assembly of PSII, oxidative damage to vulnerable assembly intermediate complexes must be prevented. Psb28, the only cytoplasmic extrinsic protein in PSII, protects the RC47 assembly intermediate of PSII and assists its efficient conversion into functional PSII. Its role is particularly important under stress conditions when PSII damage occurs frequently. Psb28 is not found, however, in any PSII crystal structure, and its structural location has remained unknown. In this study, we used chemical cross-linking combined with mass spectrometry to capture the transient interaction of Psb28 with PSII. We detected three cross-links between Psb28 and the α- and β-subunits of cytochrome b559, an essential component of the PSII reaction-center complex. These distance restraints enable us to position Psb28 on the cytosolic surface of PSII directly above cytochrome b559, in close proximity to the QB site. Protein–protein docking results also support Psb28 binding in this region. Determination of the Psb28 binding site and other biochemical evidence allow us to propose a mechanism by which Psb28 exerts its protective effect on the RC47 intermediate. This study also shows that isotope-encoded cross-linking with the “mass tags” selection criteria allows confident identification of more cross-linked peptides in PSII than has been previously reported. This approach thus holds promise to identify other transient protein–protein interactions in membrane protein complexes.

Download Full-text

Pushing the accuracy limit of shape complementarity for protein-protein docking

BMC Bioinformatics ◽

10.1186/s12859-019-3270-y ◽

2019 ◽

Vol 20 (S25) ◽

Cited By ~ 8

Author(s):

Yumeng Yan ◽

Sheng-You Huang

Keyword(s):

Success Rate ◽

Protein Interactions ◽

Shape Representation ◽

Scoring Function ◽

Protein Docking ◽

Protein Protein Interactions ◽

Second Best ◽

Shape Complementarity ◽

Docking Program ◽

Docking Approach

Abstract Background Protein-protein docking is a valuable computational approach for investigating protein-protein interactions. Shape complementarity is the most basic component of a scoring function and plays an important role in protein-protein docking. Despite significant progresses, shape representation remains an open question in the development of protein-protein docking algorithms, especially for grid-based docking approaches. Results We have proposed a new pairwise shape-based scoring function (LSC) for protein-protein docking which adopts an exponential form to take into account long-range interactions between protein atoms. The LSC scoring function was incorporated into our FFT-based docking program and evaluated for both bound and unbound docking on the protein docking benchmark 4.0. It was shown that our LSC achieved a significantly better performance than four other similar docking methods, ZDOCK 2.1, MolFit/G, GRAMM, and FTDock/G, in both success rate and number of hits. When considering the top 10 predictions, LSC obtained a success rate of 51.71% and 6.82% for bound and unbound docking, respectively, compared to 42.61% and 4.55% for the second-best program ZDOCK 2.1. LSC also yielded an average of 8.38 and 3.94 hits per complex in the top 1000 predictions for bound and unbound docking, respectively, followed by 6.38 and 2.96 hits for the second-best ZDOCK 2.1. Conclusions The present LSC method will not only provide an initial-stage docking approach for post-docking processes but also have a general implementation for accurate representation of other energy terms on grids in protein-protein docking. The software has been implemented in our HDOCK web server at http://hdock.phys.hust.edu.cn/.

Download Full-text

Inferring interaction partners from protein sequences

10.1101/050732 ◽

2016 ◽

Cited By ~ 2

Author(s):

Anne-Florence Bitbol ◽

Robert S. Dwyer ◽

Lucy J. Colwell ◽

Ned S. Wingreen

Keyword(s):

Protein Interactions ◽

Sequence Data ◽

Protein Complexes ◽

A Priori ◽

Specific Interaction ◽

Three Dimensional ◽

Specific Protein ◽

Protein Protein Interactions ◽

Entropy Model ◽

Interaction Partners

Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a priori knowledge of interaction partners, yielding a striking 0.93 true positive fraction on our complete dataset, and we uncover the origin of this surprising success. Finally, we discuss how our method could be used to predict novel protein-protein interactions.

Download Full-text

Subcellular micropatterning for visual immunoprecipitation reveals differences in cytosolic protein complexes downstream the EGFR

10.1101/2021.05.25.445547 ◽

2021 ◽

Author(s):

Roland Hager ◽

Ulrike Mueller ◽

Nicole Ollinger ◽

Julian Weghuber ◽

Peter Lanzerstorfer

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Growth Factor Receptor ◽

Spatial Arrangement ◽

Spatiotemporal Analysis ◽

Protein Domain ◽

Protein Protein Interactions ◽

Cytosolic Protein ◽

Bait Construct ◽

Biochemical Analyses

Analysis of protein-protein interactions in living cells by protein micropatterning is currently limited to the spatial arrangement of transmembrane proteins and their corresponding downstream molecules. Here we present a robust method for visual immunoprecipitation of cytosolic protein complexes by use of an artificial transmembrane bait construct in combination with micropatterned antibody arrays on cyclic olefin polymer (COP) substrates. The method was used to characterize Grb2-mediated signalling pathways downstream the epidermal growth factor receptor (EGFR). Ternary protein complexes (Shc1:Grb2:SOS1 and Grb2:Gab1:PI3K) were identified and we found that EGFR downstream signalling is based on constitutively bound (Grb2:SOS1 and Grb2:Gab1) as well as on agonist-dependent protein associations with transient interaction properties (Grb2:Shc1 and Grb2:PI3K). Spatiotemporal analysis further revealed significant differences in stability and exchange kinetics of protein interactions. Furthermore, we could show that this approach is well suited to study the efficacy and specificity of SH2 and SH3 protein domain inhibitors in a live cell context. Altogether, this method represents a significant enhancement of quantitative subcellular micropatterning approaches as an alternative to standard biochemical analyses.

Download Full-text

Inferring interaction partners from protein sequences using mutual information

10.1101/378042 ◽

2018 ◽

Author(s):

Anne-Florence Bitbol

Keyword(s):

Mutual Information ◽

Protein Interactions ◽

Sequence Data ◽

Protein Complexes ◽

Three Dimensional ◽

Specific Protein ◽

Amino Acid Residues ◽

Protein Protein Interactions ◽

Cellular Processes ◽

Interaction Partners

AbstractSpecific protein-protein interactions are crucial in most cellular processes. They enable multiprotein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are specific interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. This stands in contrast with structure prediction of proteins and of multiprotein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.Author summarySpecific protein-protein interactions are at the heart of most intra-cellular processes. Mapping these interactions is thus crucial to a systems-level understanding of cells, and has broad applications to areas such as drug targeting. Systematic experimental identification of protein interaction partners is still challenging. However, a large and rapidly growing amount of sequence data is now available. Recently, algorithms have been proposed to identify which proteins interact from their sequences alone, thanks to the co-variation of the sequences of interacting proteins. These algorithms build upon inference methods that have been used with success to predict the three-dimensional structures of proteins and multi-protein complexes, and their focus is on the amino-acid residues that are in direct contact. Here, we propose a simpler method to identify which proteins interact among the paralogous proteins of two families, starting from their sequences alone. Our method relies on an approximate maximization of mutual information between the sequences of the two families, without specifically emphasizing the contacting residue pairs. We demonstrate that this method slightly outperforms the earlier one. This result highlights that partner prediction does not only rely on the identities and interactions of directly contacting amino-acids.

Download Full-text

Protein Interaction Z Score Assessment (PIZSA): an empirical scoring scheme for evaluation of protein–protein interactions

Nucleic Acids Research ◽

10.1093/nar/gkz368 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W331-W337 ◽

Cited By ~ 3

Author(s):

Ankit A Roy ◽

Abhilesh S Dhawanjewar ◽

Parichit Sharma ◽

Gulzar Singh ◽

M S Madhusudhan

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Three Dimensional ◽

Optimal Number ◽

Dimensional Structure ◽

Side Chain ◽

Interface Residue ◽

Z Score ◽

Protein Protein Interactions ◽

Residue Contacts

Abstract Our web server, PIZSA (http://cospi.iiserpune.ac.in/pizsa), assesses the likelihood of protein–protein interactions by assigning a Z Score computed from interface residue contacts. Our score takes into account the optimal number of atoms that mediate the interaction between pairs of residues and whether these contacts emanate from the main chain or side chain. We tested the score on 174 native interactions for which 100 decoys each were constructed using ZDOCK. The native structure scored better than any of the decoys in 146 cases and was able to rank within the 95th percentile in 162 cases. This easily outperforms a competing method, CIPS. We also benchmarked our scoring scheme on 15 targets from the CAPRI dataset and found that our method had results comparable to that of CIPS. Further, our method is able to analyse higher order protein complexes without the need to explicitly identify chains as receptors or ligands. The PIZSA server is easy to use and could be used to score any input three-dimensional structure and provide a residue pair-wise break up of the results. Attractively, our server offers a platform for users to upload their own potentials and could serve as an ideal testing ground for this class of scoring schemes.

Download Full-text

InterPep2: global peptide–protein docking using interaction surface templates

Bioinformatics ◽

10.1093/bioinformatics/btaa005 ◽

2020 ◽

Vol 36 (8) ◽

pp. 2458-2465 ◽

Cited By ~ 2

Author(s):

Isak Johansson-Åkhe ◽

Claudio Mirabello ◽

Björn Wallner

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Structural Features ◽

Protein Docking ◽

Supplementary Information ◽

Peptide Ligand ◽

Protein Protein Interactions ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Improved Performance

Abstract Motivation Interactions between proteins and peptides or peptide-like intrinsically disordered regions are involved in many important biological processes, such as gene expression and cell life-cycle regulation. Experimentally determining the structure of such interactions is time-consuming and difficult because of the inherent flexibility of the peptide ligand. Although several prediction-methods exist, most are limited in performance or availability. Results InterPep2 is a freely available method for predicting the structure of peptide–protein interactions. Improved performance is obtained by using templates from both peptide–protein and regular protein–protein interactions, and by a random forest trained to predict the DockQ-score for a given template using sequence and structural features. When tested on 252 bound peptide–protein complexes from structures deposited after the complexes used in the construction of the training and templates sets of InterPep2, InterPep2-Refined correctly positioned 67 peptides within 4.0 Å LRMSD among top10, similar to another state-of-the-art template-based method which positioned 54 peptides correctly. However, InterPep2 displays a superior ability to evaluate the quality of its own predictions. On a previously established set of 27 non-redundant unbound-to-bound peptide–protein complexes, InterPep2 performs on-par with leading methods. The extended InterPep2-Refined protocol managed to correctly model 15 of these complexes within 4.0 Å LRMSD among top10, without using templates from homologs. In addition, combining the template-based predictions from InterPep2 with ab initio predictions from PIPER-FlexPepDock resulted in 22% more near-native predictions compared to the best single method (22 versus 18). Availability and implementation The program is available from: http://wallnerlab.org/InterPep2. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Text mining for modeling of protein complexes enhanced by machine learning

Bioinformatics ◽

10.1093/bioinformatics/btaa823 ◽

2020 ◽

Author(s):

Varsha D Badal ◽

Petras J Kundrotas ◽

Ilya A Vakser

Keyword(s):

Machine Learning ◽

Text Mining ◽

Protein Interactions ◽

Full Text ◽

Protein Complexes ◽

Protein Docking ◽

Supplementary Information ◽

Support Vector ◽

Learning Approaches ◽

Protein Protein Interactions

Abstract Motivation Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, absence of post-processing of the spotted residues reduced usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins. Results We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing is performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to utilize computationally demanding DRNN approach, which is computationally expensive especially at the training stage. The reason is that SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full text articles. Availability The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Predicting direct physical interactions in multimeric proteins with deep learning

10.1101/2021.11.09.467949 ◽

2021 ◽

Author(s):

Mu Gao ◽

Davi Nakajima An ◽

Jerry M Parks ◽

Jeffrey Skolnick

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Network Models ◽

Protein Docking ◽

Protein Protein Interactions ◽

Neural Network Models ◽

Sequence Alignments ◽

Multiple Sequence ◽

Atomic Structures ◽

Cytochrome C Biogenesis

Accurate descriptions of protein-protein interactions are essential for understanding biological systems. Very recently, AlphaFold2 has been shown to be remarkably accurate for predicting the atomic structures of individual proteins. Here, we demonstrate that the same neural network models developed for AlphaFold2 can be adapted to predict the structures of multimeric protein complexes without retraining. In contrast to common approaches that require paired multiple sequence alignments, our method, AF2Complex, works without using such paired alignments. It achieves higher accuracy than complex strategies that combine AlphaFold2 and protein-protein docking. New metrics are then introduced for predicting direct protein-protein interactions between arbitrary protein pairs. The approach is successfully validated on some challenging CASP14 multimeric targets, a small but appropriate benchmark set, and the E. coli proteome. Lastly, using the cytochrome c biogenesis system as an example, we present high-confidence models of three sought-after assemblies formed by eight members of this system.

Download Full-text

Hotspot coevolution at protein-protein interfaces is a key identifier of native protein complexes

10.1101/698233 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sambit K. Mishra ◽

Sarah J. Cooper ◽

Jerry M. Parks ◽

Julie C. Mitchell

Keyword(s):

Protein Interactions ◽

Protein Complexes ◽

Fundamental Problem ◽

Scoring Function ◽

Binding Mode ◽

Protein Docking ◽

Evolutionary Information ◽

Protein Protein Interactions ◽

Novel Approach ◽

Protein Interfaces

AbstractProtein-protein interactions play a key role in mediating numerous biological functions, with more than half the proteins in living organisms existing as either homo- or hetero-oligomeric assemblies. Protein subunits that form oligomers minimize the free energy of the complex, but exhaustive computational search-based docking methods have not comprehensively addressed the protein docking challenge of distinguishing a natively bound complex from non-native forms. In this study, we propose a scoring function, KFC-E, that accounts for both conservation and coevolution of putative binding hotspot residues at protein-protein interfaces. For a benchmark set of 53 bound complexes, KFC-E identifies a near-native binding mode as the top-scoring pose in 38% and in the top 5 in 55% of the complexes. For a set of 17 unbound complexes, KFC-E identifies a near-native pose in the top 10 ranked poses in more than 50% of the cases. By contrast, a scoring function that incorporates information on coevolution at predicted non-hotspots performs poorly by comparison. Our study highlights the importance of coevolution at hotspot residues in forming natively bound complexes and suggests a novel approach for coevolutionary scoring in protein docking.Author SummaryA fundamental problem in biology is to distinguish between the native and non-native bound forms of protein-protein complexes. Experimental methods are often used to detect the native bound forms of proteins but, are demanding in terms of time and resources. Computational approaches have proven to be a useful alternative; they sample the different binding configurations for a pair of interacting proteins and then use an heuristic or physical model to score them. In this study we propose a new scoring approach, KFC-E, which focuses on the evolutionary contributions from a subset of key interface residues (hotspots) to identify native bound complexes. KFC-E capitalizes on the wealth of information in protein sequence databases by incorporating residue-level conservation and coevolution of putative binding hotspots. As hotspot residues mediate the binding energetics of protein-protein interactions, we hypothesize that the knowledge of putative hotspots coupled with their evolutionary information should be helpful in the identification of native bound protein-protein complexes.

Download Full-text