ComplexBrowser: a tool for identification and quantification of protein complexes in large scale proteomics datasets

2019 ◽  
Author(s):  
Wojciech Michalak ◽  
Vasileios Tsiamis ◽  
Veit Schwämmle ◽  
Adelina Rogowska-Wrzesińska

Abstract: We have developed ComplexBrowser, an open source, online platform for supervised analysis of quantitative proteomics data that focuses on protein complexes. The software uses information from the CORUM and Complex Portal databases to identify protein complex components. Based on the expression changes of individual complex subunits across the proteomics experiment, it calculates a Complex Fold Change (CFC) factor that characterises the overall protein complex expression trend and the level of subunit co-regulation. Thus up- and down-regulated complexes can be identified. It provides interactive visualisation of protein complex composition and expression for exploratory analysis. It also incorporates a quality control step that includes normalisation and statistical analysis based on the Limma test. ComplexBrowser's performance was tested on two previously published proteomics studies identifying changes in protein expression in human adenocarcinoma tissue and during activation of mouse T-cells. The analysis revealed 1519 and 332 protein complexes, of which 233 and 41 were found to be co-ordinately regulated in the respective studies. The adopted approach provided evidence for a shift to glucose-based metabolism and high proliferation in adenocarcinoma tissues and identified chromatin remodelling complexes involved in mouse T-cell activation. The results correlate with the original interpretation of the experiments and also provide novel biological details about the protein complexes affected.
ComplexBrowser is, to our knowledge, the first tool to automate quantitative protein complex analysis for high-throughput studies, providing insights into protein complex regulation within minutes of analysis.
A fully functional demo version of ComplexBrowser v1.0 is available online via http://computproteomics.bmb.sdu.dk/Apps/ComplexBrowser/
The source code can be downloaded from: https://bitbucket.org/michalakw/complexbrowser
Highlights: Automated analysis of protein complexes in proteomics experiments. Quantitative measure of the coordinated changes in protein complex components. Interactive visualisations for exploratory analysis of proteomics results.
In brief: ComplexBrowser is capable of identifying protein complexes in datasets obtained from large scale quantitative proteomics experiments. It provides, in the form of the CFC factor, a quantitative measure of the coordinated changes in complex components. This facilitates assessing the overall trends in the processes governed by the identified protein complexes, providing a new and complementary way of interpreting proteomics experiments.

2017 ◽  
Author(s):  
Caroline Ross ◽  
Bilal Nizami ◽  
Michael Glenister ◽  
Olivier Sheik Amamuddy ◽  
Ali Rana Atilgan ◽  
...  

Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling, and t-Distributed Stochastic Neighbor Embedding techniques using molecular dynamics trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model so as to provide a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive.
Availability and Implementation: MODE-TASK has been open-sourced, and is available for download from https://github.com/RUBi-ZA/MODE-TASK, implemented in Python and C++.
Supplementary information: Documentation available at http://mode-task.readthedocs.io.


Biomolecules ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 1056 ◽  
Author(s):  
Kalyani Dhusia ◽  
Zhaoqian Su ◽  
Yinghao Wu

The formation of functionally versatile protein complexes underlies almost every biological process. The estimation of how fast these complexes can be formed has broad implications for unravelling the mechanism of biomolecular recognition. This kinetic property is traditionally quantified by association rates, which can be measured through various experimental techniques. To complement these time-consuming and labor-intensive approaches, we developed a coarse-grained simulation approach to study the physical processes of protein–protein association. We systematically calibrated our simulation method against a large-scale benchmark set. By combining a physics-based force field with a statistically-derived potential in the simulation, we found that the association rates of more than 80% of protein complexes can be correctly predicted within one order of magnitude relative to their experimental measurements. We further showed that a mixture of force fields derived from complementary sources was able to describe the process of protein–protein association with mechanistic details. For instance, we show that association of a protein complex contains multiple steps in which proteins continuously search their local binding orientations and form non-native-like intermediates through repeated dissociation and re-association. Moreover, with an ensemble of loosely bound encounter complexes observed around their native conformation, we suggest that the transition states of protein–protein association could be highly diverse on the structural level. Our study also supports the idea in which the association of a protein complex is driven by a “funnel-like” energy landscape. In summary, these results shed light on our understanding of how protein–protein recognition is kinetically modulated, and our coarse-grained simulation approach can serve as a useful addition to the existing experimental approaches that measure protein–protein association rates.
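The accuracy criterion quoted above (a predicted association rate counted as correct when it lies within one order of magnitude of the measurement) can be sketched as a simple log-scale comparison. This is only an illustration: the function names and the rate values below are hypothetical and are not taken from the study or its benchmark set.

```python
import math

def within_one_order(predicted, experimental):
    """Return True if the two rates differ by at most a factor of 10."""
    return abs(math.log10(predicted) - math.log10(experimental)) <= 1.0

def fraction_within_one_order(pairs):
    """Fraction of (predicted, experimental) pairs that meet the criterion."""
    hits = sum(within_one_order(p, e) for p, e in pairs)
    return hits / len(pairs)

# Hypothetical association rates in M^-1 s^-1 (illustrative values only).
pairs = [(1e6, 5e5), (2e4, 9e5), (3e7, 1e7), (1e5, 4e5)]
print(fraction_within_one_order(pairs))
```

Comparing on a log10 scale treats over- and under-prediction symmetrically, which is the usual reading of "within one order of magnitude".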


2021 ◽  
Author(s):  
Varun S. Sharma ◽  
Andrea Fossati ◽  
Rodolfo Ciuffa ◽  
Marija Buljan ◽  
Evan G. Williams ◽  
...  

Summary: It is a general assumption of molecular biology that the ensemble of expressed molecules, their activities and interactions determine biological processes, cellular states and phenotypes. The quantitative abundances of transcripts, proteins and metabolites are now routinely measured with considerable depth via an array of “OMICS” technologies, and recently a number of methods have also been introduced for the parallel analysis of the abundance, subunit composition and cell state specific changes of protein complexes. In comparison to the measurement of the molecular entities in a cell, the determination of their function remains experimentally challenging and labor-intensive. This holds particularly true for determining the function of protein complexes, which constitute the core functional assemblies of the cell. Therefore, the tremendous progress in multi-layer molecular profiling has been slow to translate into increased functional understanding of biological processes, cellular states and phenotypes. In this study we describe PCfun, a computational framework for the systematic annotation of protein complex function using Gene Ontology (GO) terms. This work is built upon the use of word embeddings (natural language text embedded into a continuous vector space that preserves semantic relationships) generated from the machine reading of 1 million open access PubMed Central articles. PCfun leverages the embedding for rapid annotation of protein complex function by integrating two approaches: (1) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector, and (2) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database.
PCfun consolidates both approaches by performing a statistical test for the enrichment of the top NN GO terms among the child terms of the GO terms predicted by the RF models. Thus, PCfun amalgamates information learned from the gold-standard protein-complex database, CORUM, with the unbiased predictions obtained directly from the word embedding, thereby enabling PCfun to identify the potential functions of putative protein complexes. The documentation and examples of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.
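The unsupervised step described above, retrieving the nearest-neighbor GO term vectors for a protein complex query vector, can be illustrated with a minimal sketch. It assumes cosine similarity as the closeness measure, which the abstract does not specify, and it uses made-up three-dimensional vectors. The function names are hypothetical and do not reflect the PCfun package's API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_terms(query, term_vectors, k=2):
    """Return the k term names whose vectors are most similar to the query."""
    ranked = sorted(
        term_vectors,
        key=lambda term: cosine(query, term_vectors[term]),
        reverse=True,
    )
    return ranked[:k]

# Toy embedding: hypothetical vectors, not values from any real model.
go_vectors = {
    "chromatin remodeling": [0.9, 0.1, 0.0],
    "proteolysis": [0.0, 0.2, 0.9],
    "DNA repair": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]  # stand-in for a protein complex query vector
print(nearest_terms(query, go_vectors))
```

A real embedding would have hundreds of dimensions and many thousands of terms, where an indexed nearest-neighbor search would replace the full sort used here.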


2018 ◽  
Author(s):  
Bianca K Stöcker ◽  
Till Schäfer ◽  
Petra Mutzel ◽  
Johannes Köster ◽  
Nils Kriege ◽  
...  

Being able to quantify the similarity between two protein complexes is essential for numerous applications. Prominent examples are database searches for known complexes with a given query complex, comparison of the output of different protein complex prediction algorithms, or summarizing and clustering protein complexes, e.g., for visualization. While the corresponding problems have received much attention on single proteins and protein families, the question about how to model and compute similarity between protein complexes has not yet been systematically studied. Because protein complexes can be naturally modeled as graphs, in principle general graph similarity measures may be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. Here we propose a parametric family of similarity measures based on Weisfeiler-Lehman labeling. We evaluate it on simulated complexes of the extended human integrin adhesome network. Because the connectivity (graph topology) of real complexes is often unknown and hard to obtain experimentally, we use both known protein-protein interaction networks and known interdependencies (constraints) between interactions to simulate more realistic complexes than from interaction networks alone. We empirically show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, but can be much more efficiently computed. It can therefore be used in large-scale studies and simulations and serve as a basis for further refinements of modeling protein complex similarity.
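To make the idea of Weisfeiler-Lehman labeling concrete, the sketch below refines node labels by repeatedly appending the sorted labels of each node's neighbors, then compares two complexes by the overlap of the labels collected. The Jaccard overlap used here is an assumption chosen for illustration and is not the parametric family defined by the authors. The toy complexes and function names are likewise hypothetical.

```python
from collections import Counter

def wl_labels(adj, labels, iterations):
    """Collect all node labels seen over `iterations` rounds of WL refinement."""
    current = dict(labels)
    seen = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbors in adj.items():
            # New label = own label plus the sorted labels of the neighbors.
            neighbor_labels = sorted(current[n] for n in neighbors)
            refined[node] = current[node] + "(" + ",".join(neighbor_labels) + ")"
        current = refined
        seen.update(current.values())
    return seen

def wl_jaccard(g1, g2, iterations=1):
    """Jaccard overlap of the WL label multisets of two graphs (adj, labels)."""
    a = wl_labels(g1[0], g1[1], iterations)
    b = wl_labels(g2[0], g2[1], iterations)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0

# Two toy complexes on the same proteins: a chain A-B-C and a triangle.
names = {"A": "A", "B": "B", "C": "C"}
chain = ({"A": ["B"], "B": ["A", "C"], "C": ["B"]}, names)
triangle = ({"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}, names)
print(wl_jaccard(chain, triangle, iterations=0))  # composition only
print(wl_jaccard(chain, triangle, iterations=1))  # composition plus topology
```

With zero iterations only the protein composition is compared. Each further iteration brings a wider neighborhood of the interaction topology into the labels, so the number of iterations acts as a tunable parameter in this sketch.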


2014 ◽  
Author(s):  
Roland Pache ◽  
Patrick Aloy

Macromolecular assemblies play an important role in almost all cellular processes. However, despite several large-scale studies, our current knowledge about protein complexes is still quite limited, thus advocating the use of in silico predictions to gather information on complex composition in model organisms. Since protein-protein interactions present certain constraints on the functional divergence of macromolecular assemblies during evolution, it is possible to predict complexes based on orthology data. Here, we show that incorporating interaction information through network alignment significantly increases the precision of orthology-based complex prediction. Moreover, we performed a large-scale in silico screen for protein complexes in human, yeast and fly, through the alignment of hundreds of known complexes to whole organism interactomes. Systematic comparison of the resulting network alignments to all complexes currently known in those species revealed many conserved complexes, as well as several novel complex components. In addition to validating our predictions using orthogonal data, we were able to assign specific functional roles to the predicted complexes. In several cases, the incorporation of interaction data through network alignment made it possible to distinguish real complex components from other orthologous proteins. Our analyses indicate that current knowledge of yeast protein complexes exceeds that in other organisms and that predicting complexes in fly based on human and yeast data is complementary rather than redundant. Lastly, assessing the conservation of protein complexes of the human pathogen Mycoplasma pneumoniae, we discovered that its complex repertoire is different from that of eukaryotes, suggesting new points of therapeutic intervention, whereas targeting the pathogen’s Restriction enzyme complex might lead to adverse effects due to its similarity to ATP-dependent metalloproteases in the human host.


Author(s):  
Yusuke Matsui ◽  
Yuichi Abe ◽  
Kohei Uno ◽  
Satoru Miyano

Motivation: The full spectrum of abnormalities in cancer-associated protein complexes remains largely unknown. Comparing the co-expression structure of each protein complex between tumor and healthy cells may provide insights regarding cancer-specific protein dysfunction. However, the technical limitations of mass spectrometry-based proteomics, including contamination with biological protein variants, cause noise that leads to non-negligible over- or under-estimation of co-expression.
Results: We propose a robust algorithm for identifying protein complex aberrations in cancer based on differential protein co-expression testing. Our copula-based method improves identification accuracy on noisy data compared with conventional linear correlation-based approaches. As an application, we use large-scale proteomic data from renal cancer to show that important protein complexes, regulatory signaling pathways and drug targets can be identified. The proposed approach surpasses traditional linear correlations to provide insights into higher-order differential co-expression structures.
Availability and implementation: https://github.com/ymatts/RoDiCE
Supplementary information: Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Harold Pimentel ◽  
Pascal Sturmfels ◽  
Nicolas Bray ◽  
Páll Melsted ◽  
Lior Pachter

Abstract: Increased emphasis on reproducibility of published research in the last few years has led to the large-scale archiving of sequencing data. While this data can, in theory, be used to reproduce results in papers, it is typically not easily usable in practice. We introduce a series of tools for processing and analyzing RNA-Seq data in the Short Read Archive that together have allowed us to build an easily extendable resource for analysis of data underlying published papers. Our system makes the exploration of data easily accessible and usable without technical expertise. Our database and associated tools can be accessed at The Lair: http://pachterlab.github.io/lair


2021 ◽  
Author(s):  
Youngwoo Lee ◽  
Thomas W Okita ◽  
Daniel B Szymanski

Multiprotein complexes execute and coordinate diverse cellular processes such as organelle biogenesis, vesicle trafficking, cell signaling, and metabolism. Knowledge about their composition and localization provides useful clues about the mechanisms of cellular homeostasis and systems-level control. This is of great biological importance and practical significance in heterotrophic rice endosperm and aleurone-subaleurone tissues that are a primary source of seed vitamins and stored energy. Dozens of protein complexes have been implicated in the synthesis, transport, and storage of seed proteins, lipids, vitamins, and minerals. Mutations in protein complexes that control RNA transport result in aberrant endosperm with shrunken and floury phenotypes, significantly reducing seed yield and quality. The purpose of this research is to broadly predict protein complex composition in the aleurone-subaleurone layers of developing rice seeds using co-fractionation mass spectrometry. Following orthogonal chromatographic separations of biological replicates, thousands of protein elution profiles were subjected to distance-based clustering to enable a large-scale determination of multimerization state and complex composition. Predictions included evolutionarily conserved proteins across diverse functional categories, including novel heteromeric RNA binding protein complexes that influence seed quality. This effective and open-ended proteomics pipeline provides useful clues about systems-level controls in the early stage of rice seed development.

