A reference map of the human protein interactome

Mapping Intimacies ◽

10.1101/605451 ◽

2019 ◽

Cited By ~ 10

Author(s):

Katja Luck ◽

Dae-Kyum Kim ◽

Luke Lambourne ◽

Kerstin Spirohn ◽

Bridget E. Begg ◽

...

Keyword(s):

Protein Interactions ◽

Protein Function ◽

Molecular Mechanisms ◽

Human Genetics ◽

Genomic Variation ◽

Small Scale ◽

Human Interactome ◽

Tissue Specific ◽

Cellular Context ◽

Reference Map

Abstract Global insights into cellular organization and function require comprehensive understanding of interactome networks. Similar to how a reference genome sequence revolutionized human genetics, a reference map of the human interactome network is critical to fully understand genotype-phenotype relationships. Here we present the first human “all-by-all” binary reference interactome map, or “HuRI”. With ~53,000 high-quality protein-protein interactions (PPIs), HuRI is approximately four times larger than the information curated from small-scale studies available in the literature. Integrating HuRI with genome, transcriptome and proteome data enables the study of cellular function within essentially any physiological or pathological cellular context. We demonstrate the use of HuRI in identifying specific subcellular roles of PPIs and protein function modulation via splicing during brain development. Inferred tissue-specific networks reveal general principles for the formation of cellular context-specific functions and elucidate potential molecular mechanisms underlying tissue-specific phenotypes of Mendelian diseases. HuRI thus represents an unprecedented, systematic reference linking genomic variation to phenotypic outcomes.


An integration of deep learning with feature embedding for protein–protein interaction prediction

PeerJ ◽

10.7717/peerj.7126 ◽

2019 ◽

Vol 7 ◽

pp. e7126 ◽

Cited By ~ 7

Author(s):

Yu Yao ◽

Xiuquan Du ◽

Yanyu Diao ◽

Huaixu Zhu

Keyword(s):

Deep Learning ◽

Drug Discovery ◽

Protein Interactions ◽

Protein Function ◽

Molecular Mechanisms ◽

Protein Protein Interactions ◽

Model Combining ◽

Representation Method ◽

Structure Knowledge ◽

Matthews Correlation Coefficient

Protein–protein interactions are closely relevant to protein function and drug discovery. Hence, accurately identifying protein–protein interactions will help us to understand the underlying molecular mechanisms and significantly facilitate drug discovery. However, the majority of existing computational methods for protein–protein interaction prediction focus on feature extraction and the combination of features, and there have been limited gains from the state-of-the-art models. In this work, a new residue representation method named Res2vec is designed for protein sequence representation. Residue representations obtained by Res2vec describe residue-residue interactions from raw sequence more precisely and supply more effective inputs for the downstream deep learning model. Combining effective feature embedding with powerful deep learning techniques, our method provides a general computational pipeline to infer protein–protein interactions, even when protein structure knowledge is entirely unknown. The proposed method, DeepFE-PPI, is evaluated on the S. cerevisiae and human datasets. The experimental results show that DeepFE-PPI achieves 94.78% (accuracy), 92.99% (recall), 96.45% (precision), 89.62% (Matthews correlation coefficient, MCC) on the S. cerevisiae dataset and 98.71% (accuracy), 98.54% (recall), 98.77% (precision), 97.43% (MCC) on the human dataset. In addition, we also evaluate the performance of DeepFE-PPI on five independent species datasets and all the results are superior to the existing methods. The comparisons show that DeepFE-PPI is capable of predicting protein–protein interactions by a novel residue representation method and a deep learning classification framework at an acceptable level of accuracy. The codes along with instructions to reproduce this work are available from https://github.com/xal2019/DeepFE-PPI.
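
The four metrics reported in this abstract (accuracy, recall, precision, MCC) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the DeepFE-PPI code or data.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, precision and MCC from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (hypothetical inputs).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # MCC denominator is the square root of the product of the four marginals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "mcc": mcc,
    }


# Illustrative counts only, not results from the paper.
metrics = classification_metrics(tp=90, fp=10, tn=90, fn=10)
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix symmetrically, which is presumably why it is reported alongside the other three values.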


A reliable and unbiased human protein network with the disparity filter

10.1101/207761 ◽

2017 ◽

Cited By ~ 1

Author(s):

Gregorio Alanis-Lobato ◽

Miguel A. Andrade-Navarro

Keyword(s):

Experimental Evidence ◽

Protein Interactions ◽

Living Cell ◽

Human Proteome ◽

Human Protein ◽

Protein Network ◽

Small Scale ◽

Cellular Functions ◽

Wiring Diagram ◽

Human Interactome

Abstract The living cell operates thanks to an intricate network of protein interactions. Proteins activate, transport, degrade, stabilise and participate in the production of other proteins. As a result, a reliable and systematically generated protein wiring diagram is crucial for a deeper understanding of cellular functions. Unfortunately, current human protein networks are noisy and incomplete. Also, they suffer from both study and technical biases: heavily studied proteins (e.g. those of pharmaceutical interest) are known to be involved in more interactions than proteins described in only a few publications. Here, we use the experimental evidence supporting the interaction between proteins, in conjunction with the so-called disparity filter, to construct a reliable and unbiased proteome-scale human interactome. The application of a global filter, i.e. only considering interactions with multiple pieces of evidence, would result in an excessively pruned network. In contrast, the disparity filter preserves interactions supported by a statistically significant number of studies and does not overlook small-scale protein associations. The resulting disparity-filtered protein network covers 67% of the human proteome and retains most of the network’s weight and connectivity properties.
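
The abstract does not spell out how the disparity filter is computed. The sketch below assumes the commonly used formulation, in which an edge of weight w at a node with degree k and total strength s gets the p-value (1 - w/s)^(k-1), and an edge is kept if it is significant for at least one of its endpoints. The function name, the edge format, the `alpha` threshold and the treatment of degree-1 nodes are all assumptions for illustration, not the authors' implementation. Here the weight stands for the number of pieces of evidence supporting an interaction.

```python
from collections import defaultdict


def disparity_filter(edges, alpha=0.05):
    """Keep edges that are locally significant for at least one endpoint.

    edges: dict mapping an undirected pair (u, v) to a positive weight,
           e.g. the number of studies supporting the interaction.
    alpha: significance threshold (assumed value, not from the source).
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (u, v), w in edges.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def p_value(node, w):
        k = degree[node]
        if k <= 1:
            # Assumption: a node with a single edge gives no local evidence.
            return 1.0
        return (1.0 - w / strength[node]) ** (k - 1)

    return {
        (u, v)
        for (u, v), w in edges.items()
        if min(p_value(u, w), p_value(v, w)) < alpha
    }


# Toy network: one heavily supported interaction and three weak ones.
toy = {("A", "B"): 50, ("A", "C"): 1, ("A", "D"): 1, ("A", "E"): 1}
print(disparity_filter(toy))
```

Because significance is judged relative to each node's own strength, a weakly studied protein can retain its best-supported interaction even when a global evidence cutoff would remove it, which matches the contrast with a global filter drawn in the abstract.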


PINA 3.0: mining cancer interactome

Nucleic Acids Research ◽

10.1093/nar/gkaa1075 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1351-D1357

Author(s):

Yang Du ◽

Meng Cai ◽

Xiaofang Xing ◽

Jiafu Ji ◽

Ence Yang ◽

...

Keyword(s):

Protein Interactions ◽

Molecular Mechanisms ◽

Interaction Network ◽

Cancer Type ◽

Interacting Proteins ◽

Protein Protein Interactions ◽

Specific Context ◽

Human Interactome ◽

Interactive Network ◽

Cancer Types

Abstract Protein–protein interactions (PPIs) are crucial to mediate biological functions, and understanding PPIs in cancer type-specific context could help decipher the underlying molecular mechanisms of tumorigenesis and identify potential therapeutic options. Therefore, we update the Protein Interaction Network Analysis (PINA) platform to version 3.0, to integrate the unified human interactome with RNA-seq transcriptomes and mass spectrometry-based proteomes across tens of cancer types. A number of new analytical utilities were developed to help characterize the cancer context for a PPI network, which includes inferring proteins with expression specificity and identifying candidate prognosis biomarkers, putative cancer drivers, and therapeutic targets for a specific cancer type; as well as identifying pairs of co-expressing interacting proteins across cancer types. Furthermore, a brand-new web interface has been designed to integrate these new utilities within an interactive network visualization environment, which allows users to quickly and comprehensively investigate the roles of human interacting proteins in a cancer type-specific context. PINA is freely available at https://omics.bjcancer.org/pina/.


Multilayer view of pathogenic SNVs in human interactome through in-silico edgetic profiling

10.1101/299891 ◽

2018 ◽

Author(s):

Hongzhu Cui ◽

Nan Zhao ◽

Dmitry Korkin

Keyword(s):

Protein Interactions ◽

In Silico ◽

Molecular Mechanisms ◽

Synergistic Effects ◽

Patient Specific ◽

Single Nucleotide Variants ◽

Human Interactome ◽

Frameshift Mutations ◽

Next Generation Sequencing Technology ◽

Synonymous Mutations

ABSTRACT Non-synonymous mutations linked to complex diseases often have a global impact on a biological system, affecting large biomolecular networks and pathways. However, the magnitude of the mutation-driven effects on the macromolecular network is yet to be fully explored. In this work, we present a systematic multi-level characterization of human mutations associated with genetic disorders by determining their individual and combined interaction-rewiring, “edgetic”, effects on the human interactome. Our in-silico analysis highlights the intrinsic differences and important similarities between the pathogenic single nucleotide variants (SNVs) and frameshift mutations. We show that pathogenic SNVs are more likely to cause gene pleiotropy than pathogenic frameshift mutations and are enriched on the protein interaction interfaces. Functional profiling of SNVs indicates widespread disruption of the protein-protein interactions and synergistic effects of SNVs. The coverage of our approach is several times greater than that of the recently published experimental study and has minimal overlap with it, while the distributions of determined edgotypes between the two sets of profiled mutations are remarkably similar. Case studies reveal the central role of interaction-disrupting mutations in type 2 diabetes mellitus, and suggest the importance of studying mutations that abnormally strengthen the protein interactions in cancer. With the advancement of next-generation sequencing technology that drives precision medicine, there is an increasing demand for understanding the changes in molecular mechanisms caused by patient-specific genetic variation. The current and future in-silico edgotyping tools present a cheap and fast solution to deal with the rapidly growing datasets of discovered mutations.


Prediction of kinase-specific phosphorylation sites through an integrative model of protein context and sequence

10.1101/043679 ◽

2016 ◽

Author(s):

Ralph Patrick ◽

Coralie Horin ◽

Bostjan Kobe ◽

Kim-Anh Lê Cao ◽

Mikael Bodén

Keyword(s):

Protein Interactions ◽

Binding Sites ◽

Protein Function ◽

Computational Prediction ◽

Alternative Methods ◽

Integrative Model ◽

Model Organisms ◽

Phosphorylation Sites ◽

Nuclear Localisation ◽

Cellular Context

Abstract The identification of kinase substrates and the specific phosphorylation sites they regulate is an important factor in understanding protein function regulation and signalling pathways. Computational prediction of kinase targets – assigning kinases to putative substrates, and selecting from protein sequence the sites that kinases can phosphorylate – requires the consideration of both the cellular context that kinases operate in, as well as their binding affinity. This consideration enables investigation of how phosphorylation influences a range of biological processes. We report here a novel probabilistic model for the classification of kinase-specific phosphorylation sites from sequence across three model organisms: human, mouse and yeast. The model incorporates position-specific amino acid frequencies, and counts of co-occurring amino acids from kinase binding sites in a kinase- and family-specific manner. We show how this model can be seamlessly integrated with protein interactions and cell-cycle abundance profiles. When evaluating the prediction accuracy of our method, PhosphoPICK, on an independent hold-out set of kinase-specific phosphorylation sites, we found it achieved an average specificity of 97% while correctly predicting 32% of true positives. We also compared PhosphoPICK’s ability, through cross-validation, to predict kinase-specific phosphorylation sites with alternative methods, and found that at high levels of specificity PhosphoPICK outperforms alternative methods for most comparisons made. We investigated the relationship between experimentally confirmed phosphorylation sites and predicted nuclear localisation signals by predicting the most likely kinases to be regulating the phosphorylated residues immediately upstream or downstream from the localisation signal. We show that kinases PKA, Akt1 and AurB have an over-representation of predicted binding sites at particular positions downstream from predicted nuclear localisation signals, demonstrating an important role for these kinases in regulating the nuclear import of proteins. PhosphoPICK is freely available online as a web-service at http://bioinf.scmb.uq.edu.au/phosphopick.
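
The evaluation described in this abstract reports sensitivity (the fraction of true positives recovered) at a fixed, high specificity. A minimal sketch of that kind of evaluation is given below; it is a generic procedure, not PhosphoPICK's own code, and the function name, the score lists and the "predict positive when score > threshold" rule are assumptions for illustration.

```python
def sensitivity_at_specificity(pos_scores, neg_scores, target_specificity):
    """Find the lowest threshold reaching the target specificity.

    A site is predicted positive when its score is strictly above the
    threshold. Returns (threshold, specificity, sensitivity).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= target_specificity <= 1.0:
        raise ValueError("target_specificity must be between 0 and 1")
    for threshold in sorted(set(neg_scores)):
        # Specificity: fraction of negatives correctly left unpredicted.
        specificity = sum(s <= threshold for s in neg_scores) / len(neg_scores)
        if specificity >= target_specificity:
            sensitivity = sum(s > threshold for s in pos_scores) / len(pos_scores)
            return threshold, specificity, sensitivity
    # Unreachable for valid targets: the largest negative score always
    # yields a specificity of 1.0.
    raise ValueError("target specificity could not be reached")


# Hypothetical scores, not data from the paper.
negatives = [0.1, 0.2, 0.3, 0.4]
positives = [0.05, 0.35, 0.5, 0.9]
print(sensitivity_at_specificity(positives, negatives, 0.75))
```

Fixing specificity first and then reading off sensitivity makes methods comparable at the same false-positive rate, which is how the comparison "at high levels of specificity" in the abstract can be read.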


Augmenting protein network embeddings with sequence information

10.1101/730481 ◽

2019 ◽

Cited By ~ 2

Author(s):

Hassan Kané ◽

Mohamed Coulibali ◽

Ali Abdalla ◽

Pelkins Ajanoh

Keyword(s):

Protein Interactions ◽

Protein Function ◽

Quaternary Structure ◽

Protein Function Prediction ◽

Representation Learning ◽

Specific Protein ◽

Sequence Information ◽

Protein Protein Interactions ◽

Tissue Specific ◽

Protein Protein Interaction

ABSTRACT Computational methods that infer the function of proteins are key to understanding life at the molecular level. In recent years, representation learning has emerged as a powerful paradigm to discover new patterns among entities as varied as images, words, speech, and molecules. In typical representation learning, there is only one source of data or one level of abstraction at which the learned representation occurs. However, proteins can be described by their primary, secondary, tertiary, and quaternary structure or even as nodes in protein-protein interaction networks. Given that protein function is an emergent property of all these levels of interactions, in this work we learn joint representations from both amino acid sequence and multilayer networks representing tissue-specific protein-protein interactions. We show that simple machine learning models trained on these hybrid representations outperform existing network-based methods on the task of tissue-specific protein function prediction on 13 out of 13 tissues. Furthermore, these representations outperform existing ones by 14% on average.


Assessment of community efforts to advance computational prediction of protein-protein interactions

10.1101/2021.09.22.461292 ◽

2021 ◽

Author(s):

Xu-Wen Wang ◽

Lorenzo Madeddu ◽

Kerstin Spirohn ◽

Leonardo Martini ◽

Adriano Fazzone ◽

...

Keyword(s):

Protein Interactions ◽

Molecular Mechanisms ◽

Computational Prediction ◽

Superior Performance ◽

Systematic Evaluation ◽

Human Interactome ◽

C Elegans ◽

Protein Protein Interaction ◽

Human Ppis ◽

Underlying Network

Abstract A comprehensive view of the human protein-protein interaction (PPI) network, known as the human interactome, can provide important insights into the molecular mechanisms of complex biological processes and diseases. Despite the remarkable experimental efforts undertaken to date to determine the structure of the human interactome, many PPIs remain unmapped. Computational approaches, especially network-based methods, can facilitate the identification of new PPIs. Many such approaches have been proposed. However, a systematic evaluation of existing network-based methods in predicting PPIs is still lacking. Here, we report community efforts initiated by the International Network Medicine Consortium to benchmark the ability of 24 representative network-based methods to predict PPIs across five different interactomes, including a synthetic interactome generated by the duplication-mutation-complementation model, and the interactomes of four different organisms: A. thaliana, C. elegans, S. cerevisiae, and H. sapiens. We selected the top-seven methods through a computational validation on the human interactome. We next experimentally validated their top-500 predicted PPIs (in total 3,276 predicted PPIs) using the yeast two-hybrid assay, finding 1,177 new human PPIs (involving 633 proteins). Our results indicate that task-tailored similarity-based methods, which leverage the underlying network characteristics of PPIs, show superior performance over other general link prediction methods. Through experimental validation, we confirmed that the top-ranking methods show promising performance externally. For example, from the top 500 PPIs predicted by an advanced similarity-based method [MPS(B&T)], 430 were successfully tested by Y2H with 376 testing positive, yielding a precision of 87.4%. These results establish advanced similarity-based methods as powerful tools for the prediction of human PPIs.
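
The precision quoted for MPS(B&T) follows directly from the counts in the abstract: 376 positives among the 430 successfully tested pairs. The short sketch below reproduces that arithmetic; the helper name `precision` is hypothetical, and the choice of denominator (pairs successfully tested rather than all 500 predictions) is taken from the abstract's wording.

```python
def precision(positives, tested):
    """Fraction of successfully tested predictions that were confirmed."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return positives / tested


# Counts as stated in the abstract for MPS(B&T).
predicted, tested, positive = 500, 430, 376
print(f"tested fraction: {tested / predicted:.1%}")
print(f"precision: {precision(positive, tested):.1%}")
```

Note that the precision is computed over the 430 pairs that could be tested, not over the full 500 predictions.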


Important Role for Phylogenetically Invariant PP2Acα Active Site and C-Terminal Residues Revealed by Mutational Analysis in Saccharomyces cerevisiae

Genetics ◽

10.1093/genetics/156.1.21 ◽

2000 ◽

Vol 156 (1) ◽

pp. 21-29 ◽

Cited By ~ 1

Author(s):

David R H Evans ◽

Brian A Hemmings

Keyword(s):

Protein Interactions ◽

Active Site ◽

Protein Function ◽

Mutational Analysis ◽

Yeast Cells ◽

Temperature Sensitive ◽

Active Site Residues ◽

Catalytic Function

Abstract PP2A is a central regulator of eukaryotic signal transduction. The human catalytic subunit PP2Acα functionally replaces the endogenous yeast enzyme, Pph22p, indicating a conservation of function in vivo. Therefore, yeast cells were employed to explore the role of invariant PP2Ac residues. The PP2Acα Y127N substitution abolished essential PP2Ac function in vivo and impaired catalysis severely in vitro, consistent with the prediction from structural studies that Tyr-127 mediates substrate binding and its side chain interacts with the key active site residues His-118 and Asp-88. The V159E substitution similarly impaired PP2Acα catalysis profoundly and may cause global disruption of the active site. Two conditional mutations in the yeast Pph22p protein, F232S and P240H, were found to cause temperature-sensitive impairment of PP2Ac catalytic function in vitro. Thus, the mitotic and cell lysis defects conferred by these mutations result from a loss of PP2Ac enzyme activity. Substitution of the PP2Acα C-terminal Tyr-307 residue by phenylalanine impaired protein function, whereas the Y307D and T304D substitutions abolished essential function in vivo. Nevertheless, Y307D did not reduce PP2Acα catalytic activity significantly in vitro, consistent with an important role for the C terminus in mediating essential protein-protein interactions. Our results identify key residues important for PP2Ac function and characterize new reagents for the study of PP2A in vivo.


Adhesion and Growth of Neuralized Mouse Embryonic Stem Cells on Parylene-C/SiO2 Substrates

Materials ◽

10.3390/ma14123174 ◽

2021 ◽

Vol 14 (12) ◽

pp. 3174

Author(s):

Alan F. Murray ◽

Evangelos Delivopoulos

Keyword(s):

Stem Cells ◽

Embryonic Stem Cells ◽

Protein Interactions ◽

Molecular Mechanisms ◽

Embryonic Stem ◽

Mouse Embryonic Stem Cells ◽

Primary Neurons ◽

Tissue Engineering Scaffolds ◽

Parylene C ◽

Unexpected Outcome

Neuronal patterning on microfabricated architectures has developed rapidly over the past few years, together with the emergence of soft biocompatible materials and tissue engineering scaffolds. Previously, we introduced a patterning technique based on serum and the biopolymer parylene-C, achieving highly compliant growth of primary neurons and astrocytes on different geometries. Here, we expanded this technique and illustrated that neuralized cells derived from mouse embryonic stem cells (mESCs) followed stripes of variable widths with conformity equal to or higher than that of primary neurons and astrocytes. Our results indicate the presence of undifferentiated mESCs, which also conformed to the underlying patterns to a high degree. This is an exciting and unexpected outcome, as molecular mechanisms governing cell and ECM protein interactions are different in stem cells and primary cells. Our study enables further investigations into the development and electrophysiology of differentiating patterned neural stem cells.


Avoided motifs: short amino acid strings missing from protein datasets

Biological Chemistry ◽

10.1515/hsz-2020-0383 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Pablo Mier ◽

Miguel A. Andrade-Navarro

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Protein Function ◽

Large Protein ◽

New Approach ◽

Cellular Context ◽

Human Proteins ◽

Context Specific ◽

Protein Datasets

Abstract According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular contexts, specific short amino acid motifs are missing for unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
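
The core idea of an Avoided Motif, a short amino acid string that never occurs in a dataset, can be illustrated by enumerating every possible string of length k and subtracting those observed. The sketch below is an assumption about how such a search could be done, not the authors' pipeline; the function name, the 20-letter alphabet constant and the toy sequences are hypothetical.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def avoided_motifs(sequences, k, alphabet=AMINO_ACIDS):
    """Return all length-k strings over `alphabet` absent from `sequences`."""
    observed = set()
    for seq in sequences:
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    all_motifs = {"".join(p) for p in product(alphabet, repeat=k)}
    return all_motifs - observed


# Toy example with a two-letter alphabet so the result is easy to check.
print(avoided_motifs(["AAB", "BA"], k=2, alphabet="AB"))
```

With the full alphabet there are 20^3 = 8,000 possible motifs of length three and 20^4 = 160,000 of length four, so exhaustive enumeration is cheap at the motif lengths discussed in the abstract.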
