Disorder and interfaces in proteins are two sides of the same coin

DOI 10.1101/863571, 2019

Author(s): Beatriz Seoane, Alessandra Carbone

Keyword(s): Large Scale, Protein Complexes, Data Bank, Structural Disorder, Homologous Sequences, Large Scale Analysis, Crystallographic Structures, Two Sides, Alternative Interfaces, Disordered Regions

The importance of unstructured biology has grown quickly over the last decades, accompanying the explosion in the number of experimentally resolved structures. The idea that structural disorder might be a novel mechanism of protein interaction is widespread in the literature, although the number of statistically significant structural studies supporting this idea is surprisingly low. In this work, through a large-scale analysis of all the crystallographic structures of the Protein Data Bank, averaged over clusters of homologous sequences, we show clear evidence that both the (experimentally verified) interaction interfaces and the disordered regions involve roughly the same amino acids of the protein. Moreover, disordered regions appear to carry information about the location of alternative interfaces when the protein lies within complexes, thus playing an important role in determining the order of assembly of protein complexes.

The complexity of protein interactions unravelled from structural disorder

PLoS Computational Biology, DOI 10.1371/journal.pcbi.1008546, 2021, Vol 17 (1), pp. e1008546

Author(s): Beatriz Seoane, Alessandra Carbone

Keyword(s): Complex Formation, Protein Interactions, Large Scale, Protein Structures, Data Bank, Structural Disorder, Dna Binding Domains, Alternative Structures, Binding Domains, Disordered Regions

The importance of unstructured biology has grown quickly over the last decades, accompanying the explosion in the number of experimentally resolved protein structures. The idea that structural disorder might be a novel mechanism of protein interaction is widespread in the literature, although the number of statistically significant structural studies supporting this idea is surprisingly low. At variance with previous works, our conclusions rely exclusively on a large-scale analysis of all the 134,337 X-ray crystallographic structures of the Protein Data Bank, averaged over clusters of almost identical protein sequences. In this work, we explore the complexity of the organisation of all the interaction interfaces observed when a protein lies in alternative complexes, showing that interfaces progressively add up in a hierarchical way, which is reflected in a logarithmic law for the dependence of the size of the union of the interface regions on the number of distinct interfaces. We further investigate the connection of this complexity with different measures of structural disorder: the standard missing residues and a new definition, called “soft disorder”, that covers all the flexible and structurally amorphous residues of a protein. We show evidence that both the interaction interfaces and the soft disordered regions tend to involve roughly the same amino acids of the protein, and present preliminary results suggesting that soft disorder spots those surface regions where new interfaces are progressively accommodated by complex formation. In fact, our results suggest that structurally disordered regions carry crucial information not only about the location of alternative interfaces within complexes, but also about the order of the assembly. We verify these hypotheses in several examples, such as the DNA binding domains of P53 and P73, the C3 exoenzyme, and two known biological orders of assembly. We finally compare our measures of structural disorder with several bioinformatics disorder predictors, showing that the latter are optimised to predict the residues that are missing in all the alternative structures of a protein and are unable to capture the progressive evolution of the disordered regions upon complex formation. Yet, the predicted residues, when not missing, tend to be characterised as soft disordered regions.
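
The "logarithmic law" mentioned in this abstract can be illustrated with a small fit. The sketch below is not the authors' analysis: it assumes the law takes the form S(n) = a + b ln(n), where S is the size of the union of interface regions and n the number of distinct interfaces, and it uses synthetic numbers. The function name `fit_log_law` and all data values are hypothetical.

```python
import math


def fit_log_law(n_interfaces, union_sizes):
    """Ordinary least-squares fit of union_size = a + b * ln(n).

    The functional form is an assumption made for illustration;
    the abstract only states that the dependence is logarithmic.
    """
    xs = [math.log(n) for n in n_interfaces]
    ys = list(union_sizes)
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope with respect to ln(n)
    a = mean_y - b * mean_x   # intercept: union size for a single interface
    return a, b


# Synthetic, noise-free data generated from a = 20, b = 15 (not real PDB data).
n_values = [1, 2, 4, 8, 16]
sizes = [20 + 15 * math.log(n) for n in n_values]
a, b = fit_log_law(n_values, sizes)
```

On real data one would replace `sizes` with measured union sizes per cluster and check the fit residuals before reading anything into `a` and `b`.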

Scalable Extraction of Big Macromolecular Data in Azure Data Lake Environment

Molecules, DOI 10.3390/molecules24010179, 2019, Vol 24 (1), pp. 179, Cited By ~ 5

Author(s): Dariusz Mrozek, Tomasz Dąbek, Bożena Małysiak-Mrozek

Keyword(s): Nucleic Acids, Large Scale, Protein Complexes, Data Bank, Structural Features, Significant Loss, Processing Efficiency, Acid Protein, The Face, Public Repositories

Calculating structural features of proteins, nucleic acids, and nucleic acid-protein complexes from their geometries, and studying the various interactions within these macromolecules, for which high-resolution structures are stored in the Protein Data Bank (PDB), requires parsing and extracting suitable data stored in text files. To perform these operations at large scale in the face of the growing amount of macromolecular data in public repositories, we propose to perform them in the distributed environment of Azure Data Lake and scale the calculations on the Cloud. In this paper, we present dedicated data extractors for PDB files that can be used in various types of calculations performed over protein and nucleic acid structures in the Azure Data Lake. Results of our tests show that the Cloud storage space occupied by the macromolecular data can be successfully reduced by compressing PDB files without significant loss of data processing efficiency. Moreover, our experiments show that the calculations can be significantly accelerated by using large sequential files for storing macromolecular data and by parallelizing the calculations and the data extractions that precede them. Finally, the paper shows how all the calculations can be performed in a declarative way in U-SQL scripts for Data Lake Analytics.
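
To make the idea of extracting data from compressed PDB text files concrete, here is a minimal Python sketch. It is not the paper's extractor (the abstract describes U-SQL scripts in Azure Data Lake); it only shows reading gzip-compressed PDB text and slicing the fixed-width columns of ATOM/HETATM records. The function names and the single sample record are hypothetical.

```python
import gzip
import io


def parse_atom_line(line):
    """Slice the fixed-width columns of a PDB ATOM/HETATM record."""
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue number, columns 23-26
        "x": float(line[30:38]),          # orthogonal coordinates in angstroms
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def extract_atoms(gz_bytes):
    """Decompress gzip-compressed PDB text and parse its atom records."""
    with gzip.open(io.BytesIO(gz_bytes), "rt") as handle:
        return [
            parse_atom_line(line)
            for line in handle
            if line.startswith(("ATOM  ", "HETATM"))
        ]


# One illustrative record, assembled field by field to keep the columns exact.
sample = ("ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A"
          + "   1" + "    " + "  27.340" + "  24.430" + "   2.614")
compressed = gzip.compress((sample + "\nEND\n").encode("ascii"))
atoms = extract_atoms(compressed)
```

Decompressing on the fly like this trades some CPU time for storage space, which is the trade-off the abstract reports as favourable in the authors' tests.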

Reliable identification of protein-protein interactions by crosslinking mass spectrometry

DOI 10.1101/2020.05.25.114256, 2020

Author(s): Swantje Lenz, Ludwig R. Sinn, Francis J. O’Reilly, Lutz Fischer, Fritz Wegner, ...

Keyword(s): Mass Spectrometry, Protein Interactions, Large Scale, Protein Complexes, Estimation Procedure, Scale Analysis, Protein Protein Interactions, False Discovery, Large Scale Analysis, False Discovery Rate Estimation

Crosslinking mass spectrometry is widening its scope from structural analyses of purified multi-protein complexes towards systems-wide analyses of protein-protein interactions. Assessing the error in these large datasets is currently a challenge. Using a controlled large-scale analysis of Escherichia coli cell lysate, we demonstrate a reliable false-discovery rate estimation procedure for protein-protein interactions identified by crosslinking mass spectrometry.

Large-Scale Analysis of the Evolution of Functions Mediated by Intrinsically Disordered Regions

Biophysical Journal, DOI 10.1016/j.bpj.2017.11.480, 2018, Vol 114 (3), pp. 79a, Cited By ~ 1

Author(s): Mary O.G. Richardson, Alex S. Holehouse, Iris Langstein, Philipp Korber, Rohit V. Pappu

Keyword(s): Large Scale, Scale Analysis, Intrinsically Disordered, Intrinsically Disordered Regions, Evolution Of Functions, Large Scale Analysis, Disordered Regions

Duplication accelerates the evolution of structural complexity in protein quaternary structure

DOI 10.1101/2020.04.22.054783, 2020

Author(s): Alexander S. Leonard, Sebastian E. Ahnert

Keyword(s): Gene Duplication, Lattice Model, Self Assembly, Large Scale, Quaternary Structure, Protein Complexes, Structural Complexity, Data Bank, Coarse Grained, Duplication Events

Gene duplication, from single genes to whole genomes, has been observed in organisms across all taxa. Despite its prevalence, the evolutionary benefits of this mechanism are the subject of ongoing debate. Gene duplication can significantly alter the self-assembly of protein quaternary structures, impacting the dosage or interaction proclivity. Here we use a lattice model of self-assembly as a coarse-grained representation of protein complex assembly, and show that it can be used to examine potential evolutionary advantages of duplication. Duplication provides a unique mechanism for increasing the evolvability of protein complexes by enabling the transformation of symmetric homomeric interactions into heteromeric ones. This transformation is extensively observed in in silico evolutionary simulations of the lattice model, with duplication events significantly accelerating the rate at which structural complexity increases. These coarse-grained simulation results are corroborated with a large-scale analysis of complexes from the Protein Data Bank.

A preliminary study on the cistrome of human postsynaptic density from an evolutionary and network-based perspective

DOI 10.1101/2021.01.25.428072, 2021

Author(s): Zsofia E. Kalman, Zoltán Gáspári

Keyword(s): Transcription Factors, Binding Sites, Large Scale, Protein Complexes, Postsynaptic Density, Large Scale Analysis, Protein Complex Formation, Level Information, Preliminary Study, Using Data

The postsynaptic density (PSD) is a neuronal organelle that consists of thousands of protein complexes and has a role in signal transduction. The emergence of the complexes depends on the presence of proteins provided by gene expression. In this research we used ChIP-seq data supported by protein-level information. We developed a pipeline using data from five neuronal transcription factors, which reduces the false-positive hits among identified binding sites. In addition, we found a correlation between co-regulation and protein complex formation. The developed method paves the way for future large-scale analyses utilizing a more comprehensive set of transcription factors.

Reliable identification of protein-protein interactions by crosslinking mass spectrometry

Nature Communications, DOI 10.1038/s41467-021-23666-z, 2021, Vol 12 (1)

Author(s): Swantje Lenz, Ludwig R. Sinn, Francis J. O’Reilly, Lutz Fischer, Fritz Wegner, ...

Keyword(s): Mass Spectrometry, Protein Interactions, Large Scale, Structural Information, Protein Complexes, Interaction Network, Protein Protein Interactions, False Discovery Rates, Large Scale Analysis, Exit Tunnel

Protein-protein interactions govern most cellular pathways and processes, and multiple technologies have emerged to systematically map them. Assessing the error of interaction networks has been a challenge. Crosslinking mass spectrometry is currently widening its scope from structural analyses of purified multi-protein complexes towards systems-wide analyses of protein-protein interactions (PPIs). Using a carefully controlled large-scale analysis of Escherichia coli cell lysate, we demonstrate that false-discovery rates (FDR) for PPIs identified by crosslinking mass spectrometry can be reliably estimated. We present an interaction network comprising 590 PPIs at 1% decoy-based PPI-FDR. The structural information included in this network localises the binding site of the hitherto uncharacterised protein YacL to near the DNA exit tunnel on the RNA polymerase.
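
For readers unfamiliar with decoy-based FDR, the generic target-decoy idea can be sketched as follows. This is a deliberately simplified illustration, not the procedure of the paper, whose details (for instance how mixed target-decoy pairs are counted at the PPI level) are not given in the abstract. It assumes the FDR at a score cut-off is estimated as the number of decoy hits divided by the number of target hits at or above that cut-off. The function names and scores are hypothetical.

```python
def decoy_fdr(scored_hits, threshold):
    """Estimate FDR as decoys / targets among hits scoring >= threshold.

    scored_hits is a list of (score, is_decoy) pairs. This simple ratio
    is an assumption for illustration, not the paper's PPI-FDR formula.
    """
    targets = sum(1 for s, d in scored_hits if s >= threshold and not d)
    decoys = sum(1 for s, d in scored_hits if s >= threshold and d)
    if targets == 0:
        return float("inf") if decoys else 0.0
    return decoys / targets


def lowest_threshold_at_fdr(scored_hits, max_fdr):
    """Return the lowest score cut-off whose estimated FDR is <= max_fdr."""
    best = None
    for score in sorted({s for s, _ in scored_hits}, reverse=True):
        if decoy_fdr(scored_hits, score) <= max_fdr:
            best = score  # keep lowering the cut-off while it still passes
    return best


# Hypothetical scores: True marks a decoy hit.
hits = [(9.0, False), (8.0, False), (7.0, False),
        (6.0, True), (5.0, False), (4.0, True)]
fdr_at_5 = decoy_fdr(hits, 5.0)                # 1 decoy / 4 targets
cutoff = lowest_threshold_at_fdr(hits, 0.0)    # strictest setting: no decoys
```

With a list of real scored identifications, calling `lowest_threshold_at_fdr(hits, 0.01)` would give the cut-off corresponding to a 1% estimate under this simplified definition.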
