ComplexBrowser: a tool for identification and quantification of protein complexes in large scale proteomics datasets

MODE-TASK: Large-scale protein motion tools

10.1101/217505 (2017)

Author(s): Caroline Ross, Bilal Nizami, Michael Glenister, Olivier Sheik Amamuddy, Ali Rana Atilgan, ...

Keyword(s): Large Scale, Protein Complexes, Normal Mode Analysis, MD Simulations, Supplementary Information, Mode Analysis, Analysis Tool, Link Type, Supplementary Material, Anisotropic Network

Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling, and t-Distributed Stochastic Neighbor Embedding techniques for analysing molecular dynamics (MD) trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model, providing a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive.
Availability and implementation: MODE-TASK is open source, implemented in Python and C++, and available for download from https://github.com/RUBi-ZA/MODE-TASK.
Supplementary information: Documentation is available at http://mode-task.readthedocs.io.
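
The Anisotropic Network Model mentioned in this abstract treats each residue (usually its C-alpha atom) as a node joined to every neighbor within a cutoff by identical Hookean springs; normal modes are the eigenvectors of the resulting 3N x 3N Hessian. Below is a minimal pure-Python sketch of that Hessian in the standard ANM form. It is not MODE-TASK's code: the function names are hypothetical, and the 15 Å cutoff and unit spring constant `gamma` are assumed defaults. Diagonalizing the matrix with any linear-algebra library would then yield the modes; for a well-connected structure, six near-zero eigenvalues correspond to rigid-body translations and rotations.

```python
def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Return the 3N x 3N ANM Hessian for a list of (x, y, z) node coordinates.

    Hypothetical helper: cutoff (in Å) and gamma (spring constant) are assumed defaults.
    """
    n = len(coords)
    size = 3 * n
    hess = [[0.0] * size for _ in range(size)]
    cutoff_sq = cutoff * cutoff
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            dist_sq = sum(c * c for c in d)
            if dist_sq == 0.0 or dist_sq > cutoff_sq:
                continue  # no spring beyond the cutoff (or between coincident nodes)
            for a in range(3):
                for b in range(3):
                    # Off-diagonal super-element: -gamma * d_a * d_b / |d|^2 (symmetric)
                    val = -gamma * d[a] * d[b] / dist_sq
                    hess[3 * i + a][3 * j + b] = val
                    hess[3 * j + b][3 * i + a] = val
                    # Diagonal super-elements are minus the sum of the off-diagonal ones
                    hess[3 * i + a][3 * i + b] -= val
                    hess[3 * j + a][3 * j + b] -= val
    return hess


def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Usage: a rigid translation of all nodes stretches no spring, so H @ t should vanish.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
H = anm_hessian(coords)
shift_x = [1.0 if k % 3 == 0 else 0.0 for k in range(3 * len(coords))]
print(max(abs(x) for x in mat_vec(H, shift_x)))  # zero up to rounding error
```

Building the dense matrix in plain lists keeps the sketch dependency-free; for real proteins with thousands of residues, a sparse or NumPy representation would be the practical choice.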

Using Coarse-Grained Simulations to Characterize the Mechanisms of Protein–Protein Association

Biomolecules, Vol 10 (7), pp. 1056
10.3390/biom10071056 (2020)
Cited By ~ 1

Author(s): Kalyani Dhusia, Zhaoqian Su, Yinghao Wu

Keyword(s): Protein Complex, Large Scale, Protein Complexes, Kinetic Property, Simulation Method, Coarse Grained, Structural Level, Simulation Approach, Protein Association, Order Of Magnitude

The formation of functionally versatile protein complexes underlies almost every biological process. Estimating how fast these complexes form has broad implications for unravelling the mechanism of biomolecular recognition. This kinetic property is traditionally quantified by association rates, which can be measured through various experimental techniques. To complement these time-consuming and labor-intensive approaches, we developed a coarse-grained simulation approach to study the physical processes of protein–protein association. We systematically calibrated our simulation method against a large-scale benchmark set. By combining a physics-based force field with a statistically derived potential in the simulation, we found that the association rates of more than 80% of protein complexes can be correctly predicted within one order of magnitude of their experimental measurements. We further showed that a mixture of force fields derived from complementary sources was able to describe the process of protein–protein association in mechanistic detail. For instance, we show that the association of a protein complex involves multiple steps in which proteins continuously search their local binding orientations and form non-native-like intermediates through repeated dissociation and re-association. Moreover, based on an ensemble of loosely bound encounter complexes observed around the native conformation, we suggest that the transition states of protein–protein association could be highly diverse at the structural level. Our study also supports the idea that the association of a protein complex is driven by a “funnel-like” energy landscape. In summary, these results shed light on how protein–protein recognition is kinetically modulated, and our coarse-grained simulation approach can serve as a useful addition to existing experimental approaches that measure protein–protein association rates.
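
The headline accuracy in this abstract, that more than 80% of association rates are predicted within one order of magnitude of experiment, can be read as the condition |log10(k_pred / k_exp)| ≤ 1. The following minimal Python sketch computes that fraction under this interpretation; the function name and the rate values are hypothetical and are not data from the study.

```python
import math


def fraction_within_orders(predicted, measured, orders=1.0):
    """Fraction of pairs whose predicted/measured ratio lies within 10**orders of 1.

    Both inputs are positive rates in the same units (e.g. M^-1 s^-1).
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1 for p, m in zip(predicted, measured)
        if abs(math.log10(p / m)) <= orders
    )
    return hits / len(predicted)


# Hypothetical rates: the third prediction is 200-fold off, the others within 10-fold.
pred = [1e6, 5e5, 2e8, 3e4]
meas = [2e6, 1e5, 1e6, 1e5]
print(fraction_within_orders(pred, meas))  # 0.75
```

Working on the log scale treats 10-fold over- and under-prediction symmetrically, which is why a ratio test is used here rather than an absolute difference.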

Towards a systematic characterization of protein complex function: a natural language processing and machine-learning framework

10.1101/2021.02.24.432789 (2021)

Author(s): Varun S. Sharma, Andrea Fossati, Rodolfo Ciuffa, Marija Buljan, Evan G. Williams, ...

Keyword(s): Natural Language, Language Processing, Protein Complex, Large Scale, Protein Complexes, Complex Function, Word Embedding, Biological Processes, GO Terms

Summary: It is a general assumption of molecular biology that the ensemble of expressed molecules, their activities and their interactions determine biological processes, cellular states and phenotypes. The quantitative abundances of transcripts, proteins and metabolites are now routinely measured in considerable depth by an array of “OMICS” technologies, and a number of methods have recently been introduced for the parallel analysis of the abundance, subunit composition and cell-state-specific changes of protein complexes. Compared with measuring the molecular entities in a cell, determining their function remains experimentally challenging and labor-intensive. This is particularly true for protein complexes, which constitute the core functional assemblies of the cell. Consequently, the tremendous progress in multi-layer molecular profiling has been slow to translate into increased functional understanding of biological processes, cellular states and phenotypes. In this study we describe PCfun, a computational framework for the systematic annotation of protein complex function using Gene Ontology (GO) terms. This work builds on word embeddings (natural language text embedded into a continuous vector space that preserves semantic relationships) generated by machine reading of 1 million open access PubMed Central articles. PCfun leverages the embedding for rapid annotation of protein complex function by integrating two approaches: (1) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector, and (2) a supervised approach using Random Forest (RF) models trained specifically to recover the GO terms of protein complex queries described in the CORUM protein complex database.
PCfun consolidates both approaches by performing a statistical test for enrichment of the top NN GO terms within the child terms of the GO terms predicted by the RF models. Thus, PCfun combines information learned from the gold-standard protein complex database, CORUM, with the unbiased predictions obtained directly from the word embedding, enabling it to identify the potential functions of putative protein complexes. Documentation and examples for the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.
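
The unsupervised half of PCfun described in this abstract ranks GO term vectors by their proximity to a protein complex query vector. The sketch below illustrates that nearest-neighbor step with cosine similarity on toy three-dimensional vectors. It is not PCfun's code: forming the query as the mean of subunit vectors, the helper names, and the term labels are all hypothetical, and real word embeddings have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_vector(vectors):
    """Hypothetical query construction: average the subunit vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def nearest_go_terms(query, go_vectors, k=3):
    """Return the k GO term labels whose vectors are most similar to the query."""
    ranked = sorted(go_vectors, key=lambda term: cosine(query, go_vectors[term]), reverse=True)
    return ranked[:k]


# Toy embedding; labels are placeholders, not real GO identifiers.
go_vectors = {
    "term_a": [1.0, 0.0, 0.0],
    "term_b": [0.9, 0.1, 0.0],
    "term_c": [0.0, 0.0, 1.0],
}
query = mean_vector([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])
print(nearest_go_terms(query, go_vectors, k=2))  # ['term_b', 'term_a']
```

Cosine similarity ignores vector length, so frequently mentioned terms with large-norm embeddings do not dominate the ranking purely because of their magnitude.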

Protein complex similarity based on Weisfeiler-Lehman labeling

10.7287/peerj.preprints.26612 (2018)

Author(s): Bianca K Stöcker, Till Schäfer, Petra Mutzel, Johannes Köster, Nils Kriege, ...

Keyword(s): Protein Complex, Large Scale, Protein Complexes, Similarity Measures, Interaction Networks, Protein Protein Interaction, Protein Complex Prediction, Protein Protein Interaction Networks, Human Integrin, Good Agreement

Being able to quantify the similarity between two protein complexes is essential for numerous applications. Prominent examples are database searches for known complexes matching a given query complex, comparison of the output of different protein complex prediction algorithms, and summarizing and clustering protein complexes, e.g., for visualization. While the corresponding problems have received much attention for single proteins and protein families, the question of how to model and compute similarity between protein complexes has not yet been systematically studied. Because protein complexes can naturally be modeled as graphs, general graph similarity measures could in principle be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. Here we propose a parametric family of similarity measures based on Weisfeiler-Lehman labeling. We evaluate it on simulated complexes of the extended human integrin adhesome network. Because the connectivity (graph topology) of real complexes is often unknown and hard to obtain experimentally, we use both known protein-protein interaction networks and known interdependencies (constraints) between interactions to simulate more realistic complexes than interaction networks alone would allow. We empirically show that the defined family of similarity measures agrees well with edit similarity, a similarity measure derived from graph edit distance, but can be computed much more efficiently. It can therefore be used in large-scale studies and simulations and can serve as a basis for further refinements in modeling protein complex similarity.
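
Weisfeiler-Lehman labeling, on which the proposed similarity family is based, repeatedly replaces each node's label with a compressed signature of its own label plus the sorted labels of its neighbors; comparing the multisets of labels collected across iterations yields a fast similarity score. The sketch below implements that generic WL subtree idea with a cosine-normalized comparison of label counts. It is an illustration under assumptions, not the authors' parametric measure: the graph encoding, the `iterations` parameter, the normalization, and the placeholder protein labels X, Y, Z are choices made here.

```python
import math
from collections import Counter


def wl_features(labels, adjacency, iterations=2):
    """Count WL labels over `iterations` refinement rounds.

    labels: dict node -> initial label (e.g. protein name)
    adjacency: dict node -> iterable of neighbor nodes
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = (old label, sorted neighbor labels); the tuples act as compressed
        # labels that are comparable across graphs without a shared lookup table.
        current = {
            node: (current[node], tuple(sorted(current[nb] for nb in adjacency.get(node, ()))))
            for node in labels
        }
        features.update(current.values())
    return features


def wl_similarity(graph_a, graph_b, iterations=2):
    """Cosine similarity of WL label-count vectors; 1.0 for identically labeled graphs."""
    fa = wl_features(*graph_a, iterations=iterations)
    fb = wl_features(*graph_b, iterations=iterations)
    dot = sum(count * fb[label] for label, count in fa.items())
    norm = math.sqrt(sum(c * c for c in fa.values())) * math.sqrt(sum(c * c for c in fb.values()))
    return dot / norm


# Same three placeholder proteins, wired as a triangle versus a path.
triangle = ({"A": "X", "B": "Y", "C": "Z"}, {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]})
path = ({"A": "X", "B": "Y", "C": "Z"}, {"A": ["B"], "B": ["A", "C"], "C": ["B"]})
print(round(wl_similarity(triangle, path), 3))  # 0.444
```

Each refinement round costs time roughly linear in the number of edges, which is the efficiency advantage over graph edit distance that the abstract points to.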

Increasing the precision of orthology-based complex prediction through network alignment

10.7287/peerj.preprints.280v1 (2014)

Author(s): Roland Pache, Patrick Aloy

Keyword(s): Protein Interactions, In Silico, Large Scale, Current Knowledge, Protein Complexes, Functional Divergence, Network Alignment, Model Organisms, Macromolecular Assemblies, Complex Components

Macromolecular assemblies play an important role in almost all cellular processes. However, despite several large-scale studies, our current knowledge about protein complexes is still quite limited, which argues for using in silico predictions to gather information on complex composition in model organisms. Since protein-protein interactions constrain the functional divergence of macromolecular assemblies during evolution, it is possible to predict complexes based on orthology data. Here, we show that incorporating interaction information through network alignment significantly increases the precision of orthology-based complex prediction. Moreover, we performed a large-scale in silico screen for protein complexes in human, yeast and fly by aligning hundreds of known complexes to whole-organism interactomes. Systematic comparison of the resulting network alignments with all complexes currently known in those species revealed many conserved complexes, as well as several novel complex components. In addition to validating our predictions using orthogonal data, we were able to assign specific functional roles to the predicted complexes. In several cases, incorporating interaction data through network alignment allowed us to distinguish real complex components from other orthologous proteins. Our analyses indicate that current knowledge of yeast protein complexes exceeds that in other organisms, and that predicting complexes in fly based on human and yeast data is complementary rather than redundant. Lastly, by assessing the conservation of protein complexes of the human pathogen Mycoplasma pneumoniae, we discovered that its complex repertoire differs from that of eukaryotes, suggesting new points of therapeutic intervention, whereas targeting the pathogen’s restriction enzyme complex might lead to adverse effects due to its similarity to ATP-dependent metalloproteases in the human host.
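
The contrast drawn in this abstract, between predicting complexes from orthology alone and adding interaction evidence, can be illustrated with a deliberately simplified filter: transfer a known complex to a target species through an ortholog map, then keep only candidates that interact with at least one other candidate in the target interactome. This is a toy stand-in for the idea, not Pache and Aloy's network alignment algorithm; the data structures, function names and protein identifiers are hypothetical.

```python
def transfer_complex(members, orthologs):
    """Map every member of a source-species complex to its target-species orthologs."""
    candidates = set()
    for protein in members:
        candidates |= orthologs.get(protein, set())
    return candidates


def interaction_supported(candidates, interactome):
    """Keep candidates that interact with at least one other candidate.

    interactome: set of frozenset({p, q}) undirected edges in the target species.
    """
    return {
        c for c in candidates
        if any(frozenset((c, other)) in interactome for other in candidates if other != c)
    }


def precision(predicted, reference):
    """Share of predicted components that belong to the reference complex."""
    return len(predicted & reference) / len(predicted) if predicted else 0.0


# Hypothetical example: b9 is an ortholog of a2 but is not part of the target complex.
orthologs = {"a1": {"b1"}, "a2": {"b2", "b9"}, "a3": {"b3"}}
interactome = {frozenset(e) for e in [("b1", "b2"), ("b2", "b3"), ("b9", "x7")]}
known_target_complex = {"b1", "b2", "b3"}

orthology_only = transfer_complex({"a1", "a2", "a3"}, orthologs)
with_interactions = interaction_supported(orthology_only, interactome)
print(precision(orthology_only, known_target_complex))     # 0.75
print(precision(with_interactions, known_target_complex))  # 1.0
```

The toy case mirrors the abstract's point that interaction data can separate true complex components from other orthologous proteins; a real alignment would also score topology and sequence similarity rather than applying a single yes/no rule.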

Protein complex similarity based on Weisfeiler-Lehman labeling

10.7287/peerj.preprints.26612v1 (2018)

Author(s): Bianca K Stöcker, Till Schäfer, Petra Mutzel, Johannes Köster, Nils Kriege, ...

Keyword(s): Protein Complex, Large Scale, Protein Complexes, Similarity Measures, Interaction Networks, Protein Protein Interaction, Protein Complex Prediction, Protein Protein Interaction Networks, Human Integrin, Good Agreement

RoDiCE: robust differential protein co-expression analysis for cancer complexome

Bioinformatics
10.1093/bioinformatics/btab612 (2021)

Author(s): Yusuke Matsui, Yuichi Abe, Kohei Uno, Satoru Miyano

Keyword(s): Protein Complex, Drug Targets, Large Scale, Protein Complexes, Specific Protein, Identification Accuracy, Supplementary Information, Full Spectrum, Differential Protein, Proteomic Data

Motivation: The full spectrum of abnormalities in cancer-associated protein complexes remains largely unknown. Comparing the co-expression structure of each protein complex between tumor and healthy cells may provide insights into cancer-specific protein dysfunction. However, the technical limitations of mass spectrometry-based proteomics, including contamination with biological protein variants, cause noise that leads to non-negligible over- (or under-)estimation of co-expression.
Results: We propose a robust algorithm for identifying protein complex aberrations in cancer based on differential protein co-expression testing. Our copula-based method improves identification accuracy on noisy data compared with conventional linear correlation-based approaches. As an application, we use large-scale proteomic data from renal cancer to show that important protein complexes, regulatory signaling pathways and drug targets can be identified. The proposed approach goes beyond traditional linear correlations to provide insights into higher-order differential co-expression structures.
Availability and implementation: https://github.com/ymatts/RoDiCE.
Supplementary information: Supplementary data are available at Bioinformatics online.

The Lair: A resource for exploratory analysis of published RNA-Seq data

10.1101/056200 (2016)

Author(s): Harold Pimentel, Pascal Sturmfels, Nicolas Bray, Páll Melsted, Lior Pachter

Keyword(s): Large Scale, Exploratory Analysis, Technical Expertise, RNA-Seq, Sequencing Data, Short Read, Link Type, Short Read Archive, Published Research

Increased emphasis on the reproducibility of published research in recent years has led to large-scale archiving of sequencing data. While these data can, in theory, be used to reproduce results in papers, they are typically not easily usable in practice. We introduce a series of tools for processing and analyzing RNA-Seq data in the Short Read Archive that together have allowed us to build an easily extendable resource for analyzing the data underlying published papers. Our system makes data exploration accessible to users without technical expertise. Our database and associated tools can be accessed at The Lair: http://pachterlab.github.io/lair

A Co-Fractionation Mass Spectrometry-based Prediction of Protein Complex Assemblies in the Developing Rice Aleurone-subaleurone

10.1101/2021.06.16.448567 (2021)

Author(s): Youngwoo Lee, Thomas W Okita, Daniel B Szymanski

Keyword(s): Mass Spectrometry, Protein Complex, Large Scale, RNA Binding, Seed Quality, Protein Complexes, Early Stage, Practical Significance, Rice Seed, Complex Composition

Multiprotein complexes execute and coordinate diverse cellular processes such as organelle biogenesis, vesicle trafficking, cell signaling, and metabolism. Knowledge about their composition and localization provides useful clues about the mechanisms of cellular homeostasis and systems-level control. This is of great biological importance and practical significance in the heterotrophic rice endosperm and aleurone-subaleurone tissues, which are a primary source of seed vitamins and stored energy. Dozens of protein complexes have been implicated in the synthesis, transport, and storage of seed proteins, lipids, vitamins, and minerals. Mutations in protein complexes that control RNA transport result in aberrant endosperm with shrunken and floury phenotypes, significantly reducing seed yield and quality. The purpose of this research is to broadly predict protein complex composition in the aleurone-subaleurone layers of developing rice seeds using co-fractionation mass spectrometry. Following orthogonal chromatographic separations of biological replicates, thousands of protein elution profiles were subjected to distance-based clustering to enable large-scale determination of multimerization state and complex composition. Predictions included evolutionarily conserved proteins across diverse functional categories, including novel heteromeric RNA binding protein complexes that influence seed quality. This effective and open-ended proteomics pipeline provides useful clues about systems-level controls in the early stages of rice seed development.

Increasing the precision of orthology-based complex prediction through network alignment

10.7287/peerj.preprints.280v2 (2014)

Author(s): Roland Pache, Patrick Aloy

Keyword(s): Protein Interactions, In Silico, Large Scale, Current Knowledge, Protein Complexes, Functional Divergence, Network Alignment, Model Organisms, Macromolecular Assemblies, Complex Components
