Inferring interaction partners from protein sequences

2016 ◽  
Author(s):  
Anne-Florence Bitbol ◽  
Robert S. Dwyer ◽  
Lucy J. Colwell ◽  
Ned S. Wingreen

Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a priori knowledge of interaction partners, yielding a striking 0.93 true positive fraction on our complete dataset, and we uncover the origin of this surprising success. Finally, we discuss how our method could be used to predict novel protein-protein interactions.
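The abstract does not spell out how the fitted model is turned into predicted pairs. The sketch below shows one plausible step and rests on stated assumptions. Each candidate histidine kinase (HK) and response regulator (RR) pair within a species is assumed to already have a pairing energy from the pairwise maximum entropy model, with lower energy meaning more likely partners. The sketch picks the one-to-one HK-RR assignment with the lowest total energy and then scores the predictions by their true positive fraction. The energies are made-up numbers, and the function names are hypothetical, not the authors' code.

```python
from itertools import permutations


def pair_within_species(energy):
    """Return the one-to-one pairing {HK index: RR index} that minimizes the
    summed pairing energy. energy[i][j] is a hypothetical score for pairing
    HK i with RR j (lower = more likely partners)."""
    n = len(energy)
    best_perm, best_total = None, float("inf")
    # Brute force over all one-to-one assignments; fine for a few paralogs.
    for perm in permutations(range(n)):
        total = sum(energy[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return dict(enumerate(best_perm))


def true_positive_fraction(predicted, truth):
    """Fraction of predicted HK-RR pairs that match the known partners."""
    hits = sum(1 for hk, rr in predicted.items() if truth.get(hk) == rr)
    return hits / len(predicted)


# Toy energies for one species with three HK-RR paralog pairs (made up).
energy = [
    [-2.0, 0.5, 0.1],
    [0.3, -1.5, 0.2],
    [0.4, -1.0, -0.8],
]
pred = pair_within_species(energy)  # {0: 0, 1: 1, 2: 2}
print(true_positive_fraction(pred, {0: 0, 1: 1, 2: 2}))  # 1.0
```

In the toy matrix, a greedy row-by-row choice would give both HK rows 1 and 2 the same RR (column 1), which is why the one-to-one constraint matters. Brute force only suits a handful of paralogs per species; `scipy.optimize.linear_sum_assignment` solves the same minimum-cost assignment efficiently for larger cases.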

2018 ◽  
Author(s):  
Anne-Florence Bitbol

Abstract

Specific protein-protein interactions are crucial in most cellular processes. They enable multiprotein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are specific interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. This stands in contrast with structure prediction of proteins and of multiprotein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings imply that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.

Author summary

Specific protein-protein interactions are at the heart of most intra-cellular processes. Mapping these interactions is thus crucial to a systems-level understanding of cells, and has broad applications to areas such as drug targeting. Systematic experimental identification of protein interaction partners is still challenging. However, a large and rapidly growing amount of sequence data is now available. Recently, algorithms have been proposed to identify which proteins interact from their sequences alone, by exploiting the co-variation of the sequences of interacting proteins. These algorithms build upon inference methods that have been used with success to predict the three-dimensional structures of proteins and multi-protein complexes, and they focus on the amino-acid residues that are in direct contact. Here, we propose a simpler method to identify which proteins interact among the paralogous proteins of two families, starting from their sequences alone. Our method relies on an approximate maximization of mutual information between the sequences of the two families, without specifically emphasizing the contacting residue pairs. We demonstrate that this method slightly outperforms the earlier one. This result highlights that partner prediction does not rely only on the identities and interactions of directly contacting amino acids.
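The method above ranks candidate pairings by the mutual information (MI) between the two families' alignments. The sketch below computes one possible version of that objective for a fixed candidate pairing: the sum of MI over all inter-family column pairs. It assumes plug-in frequencies with no pseudocounts or sequence reweighting. The names `column_mi` and `pairing_mi` are hypothetical, and the approximate maximization over pairings is not shown.

```python
import math
from collections import Counter


def column_mi(col_a, col_b):
    """Mutual information (nats) between two aligned columns, using plug-in
    frequencies (no pseudocounts, no sequence reweighting)."""
    n = len(col_a)
    p_a, p_b = Counter(col_a), Counter(col_b)
    p_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in p_ab.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (p_a[x] * p_b[y]))
    return mi


def pairing_mi(seqs_a, seqs_b, pairing):
    """Sum of MI over all (column of A, column of B) pairs when seqs_a[i]
    is paired with seqs_b[pairing[i]]."""
    paired_b = [seqs_b[pairing[i]] for i in range(len(seqs_a))]
    total = 0.0
    for i in range(len(seqs_a[0])):
        col_a = [s[i] for s in seqs_a]
        for j in range(len(paired_b[0])):
            col_b = [s[j] for s in paired_b]
            total += column_mi(col_a, col_b)
    return total


# Toy alignments in which residues covary under the correct pairing.
a = ["AC", "AD", "GC", "GD"]
b = ["KL", "KM", "RL", "RM"]
print(pairing_mi(a, b, [0, 1, 2, 3]))  # ≈ 1.386 (2 ln 2), correct pairing
print(pairing_mi(a, b, [1, 2, 3, 0]))  # ≈ 0.693 (ln 2), shuffled pairing
```

A pairing search would then try to move toward pairings with higher total MI. This toy case shows only that the correct pairing scores higher than a shuffled one.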


2016 ◽  
Vol 113 (43) ◽  
pp. 12180-12185 ◽  
Author(s):  
Anne-Florence Bitbol ◽  
Robert S. Dwyer ◽  
Lucy J. Colwell ◽  
Ned S. Wingreen

Specific protein−protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm’s performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data.


2018 ◽  
Vol 46 (6) ◽  
pp. 1593-1603 ◽  
Author(s):  
Chenkang Zheng ◽  
Patricia C. Dos Santos

Iron–sulfur (Fe–S) clusters are ubiquitous cofactors present in all domains of life. The chemistries catalyzed by these inorganic cofactors are diverse, and their associated enzymes are involved in many cellular processes. Despite the wide range of structures reported for Fe–S clusters inserted into proteins, the biological synthesis of all Fe–S clusters starts with the assembly of simple 2Fe–2S and 4Fe–4S units. Several systems have been associated with the formation of Fe–S clusters in bacteria, with varying phylogenetic origins and numbers of biosynthetic and regulatory components. All systems, however, construct Fe–S clusters through a similar biosynthetic scheme involving three main steps: (1) sulfur activation by a cysteine desulfurase, (2) cluster assembly by a scaffold protein, and (3) guided delivery of Fe–S units to either final acceptors or biosynthetic enzymes involved in the formation of complex metalloclusters. Another unifying feature of the biological formation of Fe–S clusters in bacteria is that these systems are tightly regulated by a network of protein interactions. Thus, the formation of transient protein complexes among biosynthetic components allows for the direct transfer of reactive sulfur and Fe–S intermediates, preventing oxygen damage and reactions with non-physiological targets. Recent studies revealed the importance of reciprocal signature sequence motifs that enable specific protein–protein interactions and consequently guide the transactions between physiological donors and acceptors. Such findings provide insights into strategies used by bacteria to regulate the flow of reactive intermediates and provide protein barcodes to uncover yet-unidentified cellular components involved in Fe–S metabolism.


2019 ◽  
Author(s):  
Georgy Derevyanko ◽  
Guillaume Lamoureux

Abstract

Protein-protein interactions are determined by a number of hard-to-capture features related to shape complementarity, electrostatics, and hydrophobicity. These features may be intrinsic to the protein or induced by the presence of a partner. A conventional approach to protein-protein docking consists of engineering a small number of spatial features for each protein and minimizing the sum of their correlations with respect to the spatial arrangement of the two proteins. To generalize this approach, we introduce a deep neural network architecture that transforms the raw atomic densities of each protein into complex three-dimensional representations. Each point in the volume containing the protein is described by 48 learned features, which are correlated and combined with the features of a second protein to produce a score dependent on the relative position and orientation of the two proteins. The architecture is based on multiple layers of SE(3)-equivariant convolutional neural networks, which provide built-in rotational and translational invariance of the score with respect to the structure of the complex. The model is trained end-to-end on a set of decoy conformations generated from 851 nonredundant protein-protein complexes and is tested on data from the Protein-Protein Docking Benchmark Version 4.0.
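The conventional approach mentioned above scores each relative placement by correlating spatial feature grids of the two proteins. The sketch below covers only the translational part of such a scan and uses the standard FFT cross-correlation trick, S(t) = Σ_x f_A(x) f_B(x − t) summed over feature channels, on periodic grids. It assumes random placeholder features, samples no rotations, and does not implement the authors' SE(3)-equivariant network. A brute-force version is included so the FFT result can be checked.

```python
import numpy as np


def correlation_scores_fft(feat_a, feat_b):
    """Score every cyclic translation t of protein B relative to A:
    S(t) = sum_x feat_a[x] * feat_b[x - t], summed over feature channels.
    feat_a, feat_b: real arrays of shape (channels, nx, ny, nz)."""
    fa = np.fft.fftn(feat_a, axes=(1, 2, 3))
    fb = np.fft.fftn(feat_b, axes=(1, 2, 3))
    # Cross-correlation theorem: S = IFFT(FFT(a) * conj(FFT(b))) per channel
    corr = np.fft.ifftn(fa * np.conj(fb), axes=(1, 2, 3)).real
    return corr.sum(axis=0)


def correlation_scores_direct(feat_a, feat_b):
    """Brute-force reference: shift B explicitly for every translation."""
    _, nx, ny, nz = feat_a.shape
    out = np.zeros((nx, ny, nz))
    for t in np.ndindex(nx, ny, nz):
        # np.roll gives shifted[x] = feat_b[x - t] (periodic boundaries)
        shifted = np.roll(feat_b, shift=t, axis=(1, 2, 3))
        out[t] = np.sum(feat_a * shifted)
    return out


rng = np.random.default_rng(0)
a = rng.normal(size=(3, 5, 5, 5))  # 3 placeholder feature channels
b = rng.normal(size=(3, 5, 5, 5))
print(np.allclose(correlation_scores_fft(a, b), correlation_scores_direct(a, b)))
```

The FFT version costs O(N log N) per orientation instead of O(N²). A full search would repeat it for each sampled rotation of the second protein. Whether the best pose is the maximum or the minimum of S depends only on the sign convention chosen for the features.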


Inorganics ◽  
2019 ◽  
Vol 7 (7) ◽  
pp. 85 ◽  
Author(s):  
Yap Shing Nim ◽  
Kam-Bo Wong

Maturation of urease involves post-translational insertion of nickel ions to form an active site with a carbamylated lysine ligand, and is assisted by the urease accessory proteins UreD, UreE, UreF and UreG. Here, we review our current understanding of how these urease accessory proteins facilitate urease maturation. The urease maturation pathway involves the transfer of Ni²⁺ from UreE → UreG → UreF/UreD → urease. To avoid releasing the toxic metal into the cytoplasm, Ni²⁺ is transferred from one urease accessory protein to another through specific protein–protein interactions. A central theme is the role of guanosine triphosphate (GTP) binding/hydrolysis in regulating the binding/release of nickel ions and the formation of the protein complexes. The urease and [NiFe]-hydrogenase maturation pathways cross-talk with each other, as UreE receives Ni²⁺ from the hydrogenase maturation factor HypA. Finally, the druggability of the urease maturation pathway is reviewed.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 782 ◽  
Author(s):  
Virja Mehta ◽  
Laura Trinkle-Mulcahy

Protein-protein interactions (PPIs) underlie most, if not all, cellular functions. The comprehensive mapping of these complex networks of stable and transient associations thus remains a key goal, both for systems biology-based initiatives (where it can be combined with other ‘omics’ data to gain a better understanding of functional pathways and networks) and for focused biological studies. Despite the significant challenges of such an undertaking, major strides have been made over the past few years. These include improvements in the computational prediction of PPIs and in the literature curation of low-throughput studies of specific protein complexes, as well as an increase in the deposition of high-quality data from unbiased high-throughput experimental PPI mapping strategies into publicly available databases.


2019 ◽  
Author(s):  
Guillaume Marmier ◽  
Martin Weigt ◽  
Anne-Florence Bitbol

Abstract

Determining which proteins interact is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among the paralogs of ubiquitous prokaryotic protein families, starting from sequence data alone. Since DCA can infer the three-dimensional structure of protein complexes, its success in predicting protein-protein interactions could be mainly based on contacting residues coevolving to remain physicochemically complementary. However, interacting proteins often possess similar evolutionary histories, which also gives rise to correlations among their sequences. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that involve only phylogeny, with no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that only share evolutionary history. It performs as well as methods explicitly based on sequence similarity, and even slightly better with large and accurate training sets. We further demonstrate the ability of these various methods to correctly predict pairings among actual paralogous proteins with genome proximity but no known direct physical interaction, which illustrates the importance of phylogenetic correlations in real data. However, for actually interacting and strongly coevolving proteins, DCA and mutual information outperform sequence similarity.

Author summary

Many biologically important protein-protein interactions are conserved over evolutionary time scales. This leads to two different signals that can be used to computationally predict interactions between protein families and to identify specific interaction partners. First, the shared evolutionary history leads to highly similar phylogenetic relationships between interacting proteins of the two families. Second, the need to keep the interaction surfaces of partner proteins biophysically compatible causes a correlated amino-acid usage of interface residues. Employing simulated data, we show that the shared history alone can be used to detect partner proteins. Similar accuracies are achieved by algorithms comparing phylogenetic relationships and by coevolutionary methods based on Direct Coupling Analysis, which are a priori designed to detect the second type of signal. Using real sequence data, we show that in cases with shared evolutionary history but without known physical interactions, both methods work with similar accuracy, while for physically interacting systems, methods based on correlated amino-acid usage outperform purely phylogenetic ones.
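The first signal described above, similar phylogenetic relationships between partners, can be measured without any model of contacts. The sketch below uses one classic way to do this: a mirrortree-style Pearson correlation between the within-family distance matrices, computed under a candidate pairing. It assumes normalized Hamming distances and is not necessarily the sequence-similarity method the authors benchmarked. The function names are hypothetical.

```python
import numpy as np


def hamming_matrix(seqs):
    """Pairwise normalized Hamming distances between equal-length sequences."""
    arr = np.array([list(s) for s in seqs])
    # Broadcast to compare every pair of sequences position by position
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=2)


def mirrortree_r(seqs_a, seqs_b):
    """Pearson r between the upper triangles of the two families' distance
    matrices, assuming seqs_a[i] is paired with seqs_b[i]."""
    da, db = hamming_matrix(seqs_a), hamming_matrix(seqs_b)
    iu = np.triu_indices(len(seqs_a), k=1)
    return np.corrcoef(da[iu], db[iu])[0, 1]


# Toy families sharing one history: B is A with A->C and T->G.
fam_a = ["AAAA", "AAAT", "TTAA", "TTAT"]
fam_b = ["CCCC", "CCCG", "GGCC", "GGCG"]
print(mirrortree_r(fam_a, fam_b))  # ≈ 1.0 for the correct pairing
swapped = [fam_b[i] for i in (1, 0, 2, 3)]
print(mirrortree_r(fam_a, swapped))  # ≈ 0.5 after swapping two partners
```

A higher correlation for a candidate pairing means the two families' phylogenetic relationships are more alike. Such similarity alone can identify partners even in the absence of any physical contact.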


2019 ◽  
Vol 47 (W1) ◽  
pp. W331-W337 ◽  
Author(s):  
Ankit A Roy ◽  
Abhilesh S Dhawanjewar ◽  
Parichit Sharma ◽  
Gulzar Singh ◽  
M S Madhusudhan

Abstract

Our web server, PIZSA (http://cospi.iiserpune.ac.in/pizsa), assesses the likelihood of protein–protein interactions by assigning a Z score computed from interface residue contacts. Our score takes into account the optimal number of atoms that mediate the interaction between pairs of residues and whether these contacts emanate from the main chain or the side chain. We tested the score on 174 native interactions, for each of which 100 decoys were constructed using ZDOCK. The native structure scored better than any of the decoys in 146 cases and ranked within the 95th percentile in 162 cases. This easily outperforms a competing method, CIPS. We also benchmarked our scoring scheme on 15 targets from the CAPRI dataset and found that our method gave results comparable to those of CIPS. Further, our method can analyse higher-order protein complexes without the need to explicitly identify chains as receptors or ligands. The PIZSA server is easy to use, can score any input three-dimensional structure, and provides a residue-pairwise breakdown of the results. Attractively, our server offers a platform for users to upload their own potentials and could serve as an ideal testing ground for this class of scoring schemes.
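The benchmark above reports two counts per target set: natives that beat every decoy, and natives ranked within the 95th percentile. The sketch below shows how such counts could be tallied from a table of scores. It makes two assumptions. First, a `higher_is_better` flag sets the score direction. Second, "within the 95th percentile" is read as at least 95% of decoys scoring strictly below the native. Neither reading is stated in the source. The scores themselves would come from PIZSA's contact-based Z score, which is not reproduced here, and `native_rank_stats` is a hypothetical name.

```python
def native_rank_stats(native_scores, decoy_scores, higher_is_better=True):
    """Return (targets where the native beats every decoy,
    targets where the native ranks at or above the 95th percentile)."""
    beats_all, top_5_percent = 0, 0
    for native, decoys in zip(native_scores, decoy_scores):
        if not higher_is_better:
            # Flip signs so that "larger is better" holds below
            native, decoys = -native, [-d for d in decoys]
        if all(native > d for d in decoys):
            beats_all += 1
        # Fraction of decoys scoring strictly below the native
        frac_below = sum(d < native for d in decoys) / len(decoys)
        if frac_below >= 0.95:
            top_5_percent += 1
    return beats_all, top_5_percent


# Two made-up targets: the first native beats all decoys, the second does not.
print(native_rank_stats([10.0, 5.0], [[1.0, 2.0, 3.0], [4.0, 6.0, 1.0]]))  # (1, 1)
```

With the counts given in the abstract, 146/174 ≈ 84% of natives beat all their decoys, and 162/174 ≈ 93% ranked within the 95th percentile.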


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Anna Vangone ◽  
Alexandre MJJ Bonvin

Almost all critical functions in cells rely on specific protein–protein interactions. Understanding these interactions is therefore crucial in the investigation of biological systems. Despite all past efforts, we still lack a thorough understanding of the energetics of protein association. Here, we introduce a new and simple approach to predict binding affinity based on functional and structural features of the biological system, namely the network of interfacial contacts. We assess its performance against a protein–protein binding affinity benchmark and show that both the experimental methods used for affinity measurements and conformational changes have a strong impact on prediction accuracy. Using a subset of complexes with reliable experimental binding affinities, and combining our model based on contacts and contact types with recent observations on the role of the non-interacting surface in protein–protein interactions, we reach a high prediction accuracy for such a diverse dataset, outperforming all other tested methods.
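The model described above builds on the network of interfacial contacts and on contact types. The sketch below extracts that kind of feature: counts of inter-chain residue contacts grouped by residue-class pair. It relies on several assumptions: one representative point per residue (real implementations typically use atom-level distances), a 5.5 Å cutoff, and a three-class residue scheme. All of these are hypothetical choices, not the authors' exact definitions.

```python
import numpy as np

# Hypothetical three-class residue scheme (one-letter codes); an assumption.
RESIDUE_CLASS = {
    **dict.fromkeys("DEKR", "charged"),
    **dict.fromkeys("STNQHYCG", "polar"),
    **dict.fromkeys("AVLIMFWP", "apolar"),
}


def interface_contacts(coords_a, seq_a, coords_b, seq_b, cutoff=5.5):
    """Count inter-chain residue contacts by class pair.
    coords_*: (n, 3) arrays with one representative point per residue;
    a contact is any cross-chain residue pair closer than `cutoff` (Å)."""
    dists = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=2)
    counts = {}
    for i, j in zip(*np.nonzero(dists < cutoff)):
        # Sort class names so "apolar/charged" and "charged/apolar" merge
        pair = sorted((RESIDUE_CLASS[seq_a[i]], RESIDUE_CLASS[seq_b[j]]))
        key = "/".join(pair)
        counts[key] = counts.get(key, 0) + 1
    return counts


coords_a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
coords_b = np.array([[3.0, 0.0, 0.0], [10.0, 4.0, 0.0]])
print(interface_contacts(coords_a, "DA", coords_b, "KL"))
# {'charged/charged': 1, 'apolar/apolar': 1}
```

In a model of this kind, such counts (possibly together with properties of the non-interacting surface) would serve as inputs to a regression fitted against measured binding affinities. The regression is not shown here.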


2019 ◽  
Author(s):  
Carlos A. Gandarilla-Pérez ◽  
Pierre Mergny ◽  
Martin Weigt ◽  
Anne-Florence Bitbol

Identifying protein-protein interactions is crucial for a systems-level understanding of the cell. Recently, algorithms based on inverse statistical physics, e.g. Direct Coupling Analysis (DCA), have made it possible to use evolutionarily related sequences to address two conceptually related inference tasks: finding pairs of interacting proteins, and identifying pairs of residues that form contacts between interacting proteins. Here we address two underlying questions: How are the performances of both inference tasks related? How does performance depend on dataset size and quality? To this end, we formalize both tasks using Ising models defined over stochastic block models, with individual blocks representing single proteins and inter-block couplings representing protein-protein interactions; controlled synthetic sequence data are generated by Monte Carlo simulations. We show that DCA is able to address both inference tasks accurately when sufficiently large training sets of known interaction partners are available, and that an iterative pairing algorithm (IPA) makes predictions possible even without a training set. Noise in the training data deteriorates performance. In both tasks we find a quadratic scaling relating dataset quality and size that is consistent with noise adding in square-root fashion and signal adding linearly when increasing the dataset. This implies that it is generally beneficial to incorporate more data even if its quality is imperfect, thereby shedding light on the empirically observed performance of DCA applied to natural protein sequences.
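The synthetic setup above pairs an Ising model with couplings drawn from a stochastic block model (SBM) and generates data by Monte Carlo sampling. The sketch below illustrates the general construction. Each pair of spins is coupled with probability `p_in` inside a block and `p_out` across blocks, and configurations are drawn with single-spin Metropolis updates from P(s) ∝ exp(β Σ_{i<j} J_ij s_i s_j). The block sizes, probabilities, uniform coupling strength, and sweep counts are all hypothetical parameters, not the values used in the study.

```python
import numpy as np


def sbm_couplings(block_sizes, p_in, p_out, j_strength, rng):
    """Symmetric coupling matrix with zero diagonal: each spin pair is coupled
    (value j_strength) with probability p_in within a block, p_out across."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    n = labels.size
    same_block = labels[:, None] == labels[None, :]
    prob = np.where(same_block, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    J = j_strength * (upper | upper.T).astype(float)
    return J, labels


def metropolis_samples(J, n_samples, sweeps_between, rng, beta=1.0):
    """Draw spin configurations s in {-1, +1}^n with energy
    E(s) = -sum_{i<j} J_ij s_i s_j, using single-spin Metropolis flips."""
    n = J.shape[0]
    s = rng.choice([-1, 1], size=n)
    samples = np.empty((n_samples, n), dtype=int)
    for k in range(n_samples):
        for _ in range(sweeps_between * n):
            i = rng.integers(n)
            # Energy change if spin i flips: dE = 2 * s_i * sum_j J_ij s_j
            delta = 2.0 * s[i] * (J[i] @ s)
            if delta <= 0 or rng.random() < np.exp(-beta * delta):
                s[i] = -s[i]
        samples[k] = s
    return samples


rng = np.random.default_rng(1)
# Two blocks of 5 spins: block 0 plays "protein A", block 1 "protein B".
J, labels = sbm_couplings([5, 5], p_in=0.8, p_out=0.1, j_strength=0.5, rng=rng)
data = metropolis_samples(J, n_samples=50, sweeps_between=5, rng=rng)
print(data.shape)
```

Each sampled row plays the role of a concatenated pair of "sequences", with block 0 and block 1 standing for the two partners. The inter-block couplings are the signal that an inference method would try to recover. Binary spins are used here for brevity; natural sequences would need a multi-state (Potts) alphabet.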

