Disorder and interfaces in proteins are two sides of the same coin

2019 ◽  
Author(s):  
Beatriz Seoane ◽  
Alessandra Carbone

The importance of unstructured biology has grown quickly over the last decades, accompanying the explosion in the number of experimentally resolved structures. The idea that structural disorder might be a novel mechanism of protein interaction is widespread in the literature, although the number of statistically significant structural studies supporting this idea is surprisingly low. In this work, through a large-scale analysis of all the crystallographic structures of the Protein Data Bank averaged over clusters of homologous sequences, we show clear evidence that both the (experimentally verified) interaction interfaces and the disordered regions involve roughly the same amino acids of the protein. Beyond that, disordered regions appear to carry information about the location of alternative interfaces when the protein lies within complexes, thus playing an important role in determining the order of assembly of protein complexes.

2021 ◽  
Vol 17 (1) ◽  
pp. e1008546
Author(s):  
Beatriz Seoane ◽  
Alessandra Carbone

The importance of unstructured biology has grown quickly over the last decades, accompanying the explosion in the number of experimentally resolved protein structures. The idea that structural disorder might be a novel mechanism of protein interaction is widespread in the literature, although the number of statistically significant structural studies supporting this idea is surprisingly low. Unlike previous works, our conclusions rely exclusively on a large-scale analysis of all the 134337 X-ray crystallographic structures of the Protein Data Bank averaged over clusters of almost identical protein sequences. In this work, we explore the complexity of the organisation of all the interaction interfaces observed when a protein lies in alternative complexes, showing that interfaces progressively add up in a hierarchical way, which is reflected in a logarithmic law relating the size of the union of the interface regions to the number of distinct interfaces. We further investigate the connection of this complexity with different measures of structural disorder: the standard missing residues and a new definition, called “soft disorder”, that covers all the flexible and structurally amorphous residues of a protein. We show evidence that both the interaction interfaces and the soft disordered regions tend to involve roughly the same amino acids of the protein, and present preliminary results suggesting that soft disorder marks those surface regions where new interfaces are progressively accommodated by complex formation. In fact, our results suggest that structurally disordered regions carry crucial information not only about the location of alternative interfaces within complexes, but also about the order of assembly. We verify these hypotheses in several examples, such as the DNA binding domains of P53 and P73, the C3 exoenzyme, and two known biological orders of assembly.
Finally, we compare our measures of structural disorder with several bioinformatics disorder predictors, showing that the latter are optimised to predict the residues that are missing in all the alternative structures of a protein and are not able to capture the progressive evolution of the disordered regions upon complex formation. Yet the predicted residues, when not missing, tend to be characterised as soft disordered regions.
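As an illustration of the logarithmic law mentioned in the abstract above, the following minimal sketch fits the form U(n) = a + b ln n by ordinary least squares, where n is the number of distinct interfaces and U(n) is the size of their union. The exact functional form, the function name `fit_log_law`, and the numbers used are assumptions made for illustration; they are not taken from the paper, and the data are synthetic.

```python
import math


def fit_log_law(n_interfaces, union_sizes):
    """Least-squares fit of union_size = a + b * ln(n).

    Hypothetical helper: the abstract states only that the law is
    logarithmic, so this two-parameter form is an assumption.
    """
    xs = [math.log(n) for n in n_interfaces]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(union_sizes) / len(union_sizes)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, union_sizes))
    b = sxy / sxx            # slope with respect to ln(n)
    a = mean_y - b * mean_x  # intercept, i.e. the fitted value at n = 1
    return a, b


# Synthetic data generated exactly from the assumed law (not real measurements).
ns = list(range(1, 9))
sizes = [10.0 + 5.0 * math.log(n) for n in ns]
a, b = fit_log_law(ns, sizes)
```

On data generated from the assumed law, the fit recovers the generating parameters up to floating-point error; real interface data would scatter around the fitted curve.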


Molecules ◽  
2019 ◽  
Vol 24 (1) ◽  
pp. 179 ◽  
Author(s):  
Dariusz Mrozek ◽  
Tomasz Dąbek ◽  
Bożena Małysiak-Mrozek

Calculating structural features of proteins, nucleic acids, and nucleic acid-protein complexes from their geometries, and studying the various interactions within these macromolecules, for which high-resolution structures are stored in the Protein Data Bank (PDB), requires parsing and extracting suitable data stored in text files. To perform these operations on a large scale in the face of the growing amount of macromolecular data in public repositories, we propose to perform them in the distributed environment of Azure Data Lake and to scale the calculations in the Cloud. In this paper, we present dedicated data extractors for PDB files that can be used in various types of calculations performed over protein and nucleic acid structures in the Azure Data Lake. Results of our tests show that the Cloud storage space occupied by the macromolecular data can be successfully reduced by compressing PDB files without significant loss of data processing efficiency. Moreover, our experiments show that the calculations can be significantly accelerated by using large sequential files for storing macromolecular data and by parallelizing the calculations and the data extractions that precede them. Finally, the paper shows how all the calculations can be performed in a declarative way in U-SQL scripts for Data Lake Analytics.
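To make the parsing step concrete, here is a minimal sketch of extracting atom coordinates from the fixed-column ATOM/HETATM records of a PDB text file. It is written in plain Python for illustration only and is not the paper's U-SQL extractor for Azure Data Lake; the names `Atom` and `parse_atoms` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> Iterator[Atom]:
    """Yield one Atom per ATOM/HETATM record, skipping all other records."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            # PDB records are fixed-width; the slices below are the
            # 0-based equivalents of the 1-based column ranges.
            yield Atom(
                serial=int(line[6:11]),        # columns 7-11
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain_id=line[21],             # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            )


# Illustrative record laid out in the standard column positions.
sample = [
    "HEADER    EXAMPLE",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
]
atoms = list(parse_atoms(sample))
```

Because the function consumes any iterable of lines, it can be applied lazily to an open file handle, including one opened on a compressed file through `gzip.open(path, "rt")`.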


2020 ◽  
Author(s):  
Swantje Lenz ◽  
Ludwig R. Sinn ◽  
Francis J. O’Reilly ◽  
Lutz Fischer ◽  
Fritz Wegner ◽  
...  

Crosslinking mass spectrometry is widening its scope from structural analyses of purified multi-protein complexes towards systems-wide analyses of protein-protein interactions. Assessing the error in these large datasets is currently a challenge. Using a controlled large-scale analysis of Escherichia coli cell lysate, we demonstrate a reliable false-discovery rate estimation procedure for protein-protein interactions identified by crosslinking mass spectrometry.


2018 ◽  
Vol 114 (3) ◽  
pp. 79a ◽  
Author(s):  
Mary O.G. Richardson ◽  
Alex S. Holehouse ◽  
Iris Langstein ◽  
Philipp Korber ◽  
Rohit V. Pappu

2020 ◽  
Author(s):  
Alexander S. Leonard ◽  
Sebastian E. Ahnert

Gene duplication, from single genes to whole genomes, has been observed in organisms across all taxa. Despite its prevalence, the evolutionary benefits of this mechanism are the subject of ongoing debate. Gene duplication can significantly alter the self-assembly of protein quaternary structures, impacting the dosage or interaction proclivity. Here we use a lattice model of self-assembly as a coarse-grained representation of protein complex assembly, and show that it can be used to examine potential evolutionary advantages of duplication. Duplication provides a unique mechanism for increasing the evolvability of protein complexes by enabling the transformation of symmetric homomeric interactions into heteromeric ones. This transformation is extensively observed in in silico evolutionary simulations of the lattice model, with duplication events significantly accelerating the rate at which structural complexity increases. These coarse-grained simulation results are corroborated with a large-scale analysis of complexes from the Protein Data Bank.


2021 ◽  
Author(s):  
Zsofia E. Kalman ◽  
Zoltán Gáspári

The postsynaptic density (PSD) is a neuronal organelle that consists of thousands of protein complexes and plays a role in signal transduction. The emergence of these complexes depends on the presence of proteins provided by gene expression. In this research we used ChIP-seq data supported by protein-level information. We developed a pipeline using data from five neuronal transcription factors, which reduces the false-positive hits among identified binding sites. In addition, we found a correlation between co-regulation and protein complex formation. The developed method paves the way for future large-scale analyses utilizing a more comprehensive set of transcription factors.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Swantje Lenz ◽  
Ludwig R. Sinn ◽  
Francis J. O’Reilly ◽  
Lutz Fischer ◽  
Fritz Wegner ◽  
...  

Protein-protein interactions govern most cellular pathways and processes, and multiple technologies have emerged to systematically map them. Assessing the error of interaction networks has been a challenge. Crosslinking mass spectrometry is currently widening its scope from structural analyses of purified multi-protein complexes towards systems-wide analyses of protein-protein interactions (PPIs). Using a carefully controlled large-scale analysis of Escherichia coli cell lysate, we demonstrate that false-discovery rates (FDR) for PPIs identified by crosslinking mass spectrometry can be reliably estimated. We present an interaction network comprising 590 PPIs at 1% decoy-based PPI-FDR. The structural information included in this network localises the binding site of the hitherto uncharacterised protein YacL to near the DNA exit tunnel on the RNA polymerase.
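The idea of a decoy-based FDR cutoff can be sketched with the generic target-decoy estimate, in which the FDR among accepted matches is approximated by the ratio of decoy to target matches above a score threshold. This is a simplified illustration and not the procedure of the paper, which has to deal with the specifics of crosslinked peptide pairs; the function name `fdr_threshold` and the toy scores are hypothetical.

```python
def fdr_threshold(matches, max_fdr=0.01):
    """Return (score_cutoff, n_targets) for the most permissive cutoff
    whose estimated FDR (decoys / targets) stays within max_fdr, or None
    if no cutoff qualifies.

    matches: iterable of (score, is_decoy) pairs; higher scores are better.
    Simplification: tied scores are not grouped together.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the lowest-scoring position at which the estimate still holds.
        if targets and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


# Toy scores for illustration only (not data from the study).
toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
strict = fdr_threshold(toy, max_fdr=0.1)
loose = fdr_threshold(toy, max_fdr=0.5)
```

With the toy input, the strict cutoff stops before the first decoy, while the loose cutoff admits all four matches because one decoy among three targets stays below the 50% limit.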

