Direct Coupling Analysis of Epistasis in Allosteric Materials

Mapping Intimacies ◽

10.1101/519116 ◽

2019 ◽

Cited By ~ 1

Author(s):

Barbara Bravi ◽

Riccardo Ravasio ◽

Carolina Brito ◽

Matthieu Wyart

Keyword(s):

Drug Design ◽

Long Range ◽

Active Site ◽

De Novo ◽

Sequence Data ◽

Direct Coupling ◽

Coupling Analysis ◽

Promising Tool ◽

Direct Coupling Analysis ◽

The Impact

Abstract:

In allosteric proteins, the binding of a ligand modifies function at a distant active site. Such allosteric pathways can be used as targets for drug design, generating considerable interest in inferring them from sequence alignment data. Currently, different methods lead to conflicting results, in particular on the existence of long-range evolutionary couplings between distant amino acids mediating allostery. Here we propose a resolution of this conundrum by studying epistasis and its inference in models where an allosteric material is evolved in silico to perform a mechanical task. We find four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range and have a simple mechanical interpretation. We perform a Direct Coupling Analysis (DCA) and find that DCA predicts mutation costs well but is a rather poor generative model. Strikingly, it can predict short-range epistasis but fails to capture long-range epistasis, in agreement with empirical findings. We propose that such failure is generic when function requires subparts to work in concert. We illustrate this idea with a simple model, which suggests that other methods may be better suited to capture long-range effects.

Author summary:

Allostery in proteins is the property of highly specific responses to ligand binding at a distant site. To inform protocols of de novo drug design, it is fundamental to understand the impact of mutations on allosteric regulation and whether it can be predicted from evolutionary correlations. In this work we consider allosteric architectures artificially evolved to optimize the cooperativity of binding at the allosteric and active sites. We first characterize the emergent pattern of epistasis as well as the underlying mechanical phenomena, finding four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range. The numerical evolution of these allosteric architectures allows us to benchmark Direct Coupling Analysis, a method which relies on co-evolution in sequence data to infer direct evolutionary couplings, in connection with allostery. We show that Direct Coupling Analysis quantitatively predicts mutation costs but underestimates strong long-range epistasis. We provide an argument, based on a simplified model, illustrating the reasons for this discrepancy, and we propose neural networks as a more promising tool to measure epistasis.
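To make the link between DCA and epistasis concrete, the following is a minimal sketch, not the authors' code. It assumes the common convention that a DCA model assigns each sequence a pairwise (Potts) energy with fields `h` and couplings `J`, that the cost of a mutation is the resulting energy change, and that epistasis is the double-mutant cost minus the sum of the two single-mutant costs. All function names, array layouts, and the sign convention are illustrative assumptions.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Pairwise-model energy E = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j].

    seq: list of integer states; h: (L, q) fields; J: (L, L, q, q) couplings
    (only the i < j blocks are used). Layout is an assumption of this sketch.
    """
    L = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return energy

def mutation_cost(seq, muts, h, J):
    """Energy change caused by applying muts = {site: new_state} to seq."""
    mutant = list(seq)
    for site, state in muts.items():
        mutant[site] = state
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)

def epistasis(seq, i, a, j, b, h, J):
    """Double-mutant cost minus the sum of the two single-mutant costs."""
    return (mutation_cost(seq, {i: a, j: b}, h, J)
            - mutation_cost(seq, {i: a}, h, J)
            - mutation_cost(seq, {j: b}, h, J))
```

Under these assumptions the fields and all couplings to third sites cancel, so for i < j the epistasis reduces to -(J[i, j, a, b] - J[i, j, a, s_j] - J[i, j, s_i, b] + J[i, j, s_i, s_j]). In other words, within a purely pairwise model the epistasis between two sites is read off from their direct coupling alone.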

FilterDCA: interpretable supervised contact prediction using inter-domain coevolution

10.1101/2019.12.24.887877 ◽

2019 ◽

Cited By ~ 1

Author(s):

Maureen Muscat ◽

Giancarlo Croce ◽

Edoardo Sarti ◽

Martin Weigt

Keyword(s):

Deep Learning ◽

De Novo ◽

Protein Complexes ◽

Protein Structures ◽

Direct Coupling ◽

Sequence Information ◽

Coupling Analysis ◽

Contact Patterns ◽

Direct Coupling Analysis ◽

Training Sets

Abstract:

Predicting three-dimensional protein structure and assembling protein complexes using sequence information are among the most prominent tasks in computational biology. Recently, substantial progress has been made for single proteins using a combination of unsupervised coevolutionary sequence analysis with structurally supervised deep learning. While reaching impressive accuracies in predicting residue-residue contacts, deep learning has a number of disadvantages. The need for large structural training sets limits the applicability to multi-protein complexes, and the deep architecture of convolutional neural networks makes them intrinsically hard to interpret. Here we introduce FilterDCA, a simpler supervised predictor for inter-domain and inter-protein contacts. It is based on the fact that contact maps of proteins show typical contact patterns, which result from secondary structure and are reflected by patterns in coevolutionary analysis. We explicitly integrate averaged contact patterns with coevolutionary scores derived by Direct Coupling Analysis, reaching results comparable to more complex deep-learning approaches while remaining fully transparent and interpretable. The FilterDCA code is available at http://gitlab.lcqb.upmc.fr/muscat/FilterDCA.

Author summary:

The de novo prediction of tertiary and quaternary protein structures has recently seen important advances, by combining unsupervised, purely sequence-based coevolutionary analyses with structure-based supervision using deep learning for contact-map prediction. While showing impressive performance, deep-learning methods require large training sets and pose severe obstacles to interpretability. Here we construct a simple, transparent and therefore fully interpretable inter-domain contact predictor, which uses the results of coevolutionary Direct Coupling Analysis in combination with explicitly constructed filters reflecting typical contact patterns in a training set of known protein structures, and which improves the accuracy of predicted contacts significantly. Our approach thereby sheds light on the question of how contact information is encoded in coevolutionary signals.

Expanding Direct Coupling Analysis to Identify Heterodimeric Interfaces from Limited Protein Sequence Data

The Journal of Physical Chemistry B ◽

10.1021/acs.jpcb.1c07145 ◽

2021 ◽

Author(s):

Kareem M. Mehrabiani ◽

Ryan R. Cheng ◽

José N. Onuchic

Keyword(s):

Protein Sequence ◽

Sequence Data ◽

Direct Coupling ◽

Coupling Analysis ◽

Protein Sequence Data ◽

Direct Coupling Analysis ◽

Limited Protein

pydca v1.0: a comprehensive software for direct coupling analysis of RNA and protein sequences

Bioinformatics ◽

10.1093/bioinformatics/btz892 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2264-2265 ◽

Cited By ~ 3

Author(s):

Mehari B Zerihun ◽

Fabrizio Pucci ◽

Emanuel K Peter ◽

Alexander Schug

Keyword(s):

Structure Prediction ◽

Sequence Data ◽

Direct Coupling ◽

Supplementary Information ◽

Spatial Proximity ◽

Homologous Proteins ◽

Coupling Analysis ◽

Multiple Sequence ◽

Wide Range ◽

Direct Coupling Analysis

Abstract:

Motivation: The ongoing advances in sequencing technologies have provided a massive increase in the availability of sequence data. This has made it possible to study the patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues indicating their spatial proximity, making such information a valuable input for subsequent structure prediction.

Results: Here, we present pydca, a standalone Python-based software package for the DCA of homologous protein and RNA families. It is based on two popular inverse statistical approaches, namely mean-field and pseudo-likelihood maximization, and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds.

Availability and implementation: pydca can be obtained from https://github.com/KIT-MBS/pydca or from the Python Package Index under the MIT License.

Supplementary information: Supplementary data are available at Bioinformatics online.
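As an illustration of the mean-field approach mentioned in the abstract, here is a minimal NumPy sketch. It is not pydca's API or implementation. It assumes an alignment already encoded as integers, uses a uniform pseudocount, and omits sequence reweighting and the average product correction that production tools usually apply. The function name and parameters are hypothetical.

```python
import numpy as np

def mean_field_dca(msa, q, pseudocount=0.5):
    """Mean-field DCA sketch. msa: (N, L) integer array with states 0..q-1.

    Returns an (L, L) symmetric matrix of Frobenius-norm coupling scores.
    """
    N, L = msa.shape
    onehot = np.eye(q)[msa]                                   # (N, L, q)

    # Single-site and pair frequencies, regularized with a uniform pseudocount.
    fi = (1 - pseudocount) * onehot.mean(axis=0) + pseudocount / q
    fij = ((1 - pseudocount)
           * np.einsum('nia,njb->ijab', onehot, onehot) / N
           + pseudocount / q**2)
    for i in range(L):
        fij[i, i] = np.diag(fi[i])                            # f_ii(a,b) = delta_ab f_i(a)

    # Connected correlations; drop the last state to fix the gauge.
    C = fij - fi[:, None, :, None] * fi[None, :, None, :]
    C = C[:, :, :q - 1, :q - 1]
    C = C.transpose(0, 2, 1, 3).reshape(L * (q - 1), L * (q - 1))

    # Mean-field approximation: couplings are minus the inverse correlation matrix.
    J = -np.linalg.inv(C)
    J = J.reshape(L, q - 1, L, q - 1).transpose(0, 2, 1, 3)   # (L, L, q-1, q-1)

    # Pad the dropped state with zeros and shift to the zero-sum gauge.
    Jf = np.zeros((L, L, q, q))
    Jf[:, :, :q - 1, :q - 1] = J
    Jf = (Jf - Jf.mean(axis=2, keepdims=True)
             - Jf.mean(axis=3, keepdims=True)
             + Jf.mean(axis=(2, 3), keepdims=True))

    scores = np.sqrt((Jf ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores
```

Calling `mean_field_dca(msa, q=21)` on an integer-encoded protein alignment (20 amino acids plus gap, an encoding assumed here) returns a score for each pair of columns. Ranking pairs by score gives candidate contacts in the spirit of the abstract.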

pydca v1.0: a comprehensive software for Direct Coupling Analysis of RNA and Protein Sequences

10.1101/805523 ◽

2019 ◽

Cited By ~ 1

Author(s):

Mehari B. Zerihun ◽

Fabrizio Pucci ◽

Emanuel Karl Peter ◽

Alexander Schug

Keyword(s):

Structure Prediction ◽

Sequence Data ◽

Mean Field ◽

Direct Coupling ◽

Spatial Proximity ◽

Homologous Proteins ◽

Coupling Analysis ◽

Multiple Sequence ◽

Wide Range ◽

Direct Coupling Analysis

Abstract:

The ongoing advances in sequencing technologies have provided a massive increase in the availability of sequence data. This has made it possible to study the patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct Coupling Analysis (DCA) infers coevolutionary couplings between pairs of residues indicating their spatial proximity, making such information a valuable input for subsequent structure prediction. Here we present pydca, a standalone Python-based software package for the DCA of homologous protein and RNA families. It is based on two popular inverse statistical approaches, namely mean-field and pseudo-likelihood maximization, and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds.

Availability: https://github.com/KIT-MBS/pydca

Protein structure without structure determination: direct coupling analysis based on in vitro evolution

10.1101/582056 ◽

2019 ◽

Author(s):

Marco Fantini ◽

Simonetta Lisi ◽

Paolo De Los Rios ◽

Antonino Cattaneo ◽

Annalisa Pastore

Keyword(s):

Structural Information ◽

Sequence Data ◽

Protein Structures ◽

Direct Coupling ◽

In Vitro Mutagenesis ◽

Evolutionary Analysis ◽

Protein Families ◽

Coupling Analysis ◽

Direct Coupling Analysis

Abstract:

Direct Coupling Analysis (DCA) is a powerful technique for extracting structural information about proteins belonging to large protein families exclusively by in silico analysis. This method is, however, limited by sequence availability and various biases. Here, we propose a method that exploits molecular evolution to circumvent these limitations: instead of relying on existing protein families, we used in vitro mutagenesis of TEM-1 beta-lactamase combined with in vivo functional selection to generate the sequence data necessary for evolutionary analysis. With this strategy, which we call CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), we could reconstruct the lactamase fold exclusively from sequence data. By generating and sequencing large libraries of variants, we can deal with any protein, ancient or recent, from any species, the only constraint being the need to set up a functional phenotypic selection for the protein. This method allows us to obtain protein structures without solving the structure experimentally.

Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1607570113 ◽

2016 ◽

Vol 113 (43) ◽

pp. 12186-12191 ◽

Cited By ~ 71

Author(s):

Thomas Gueudré ◽

Carlo Baldassi ◽

Marco Zamparo ◽

Martin Weigt ◽

Andrea Pagnani

Keyword(s):

Protein Interactions ◽

Multiple Scales ◽

Sequence Data ◽

Direct Coupling ◽

Physical Contact ◽

Protein Protein Interactions ◽

Homologous Proteins ◽

Coupling Analysis ◽

Large Joint ◽

Direct Coupling Analysis

Abstract:

Understanding protein-protein interactions is central to our understanding of almost all complex biological processes. Computational tools exploiting rapidly growing genomic databases to characterize protein-protein interactions are urgently needed. Such methods should connect multiple scales, from evolutionarily conserved interactions between families of homologous proteins, through the identification of specifically interacting proteins in the case of multiple paralogs inside a species, down to the prediction of residues in physical contact across interaction interfaces. Statistical inference methods detecting residue-residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; they require, however, large joint alignments of homologous protein pairs known to interact. The generation of such alignments is a complex computational task in its own right; application of coevolutionary modeling has, in turn, been restricted to proteins without paralogs, or to bacterial systems in which the corresponding coding genes are colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect the different scales, and simultaneously to match interacting paralogs, to identify interprotein residue-residue contacts and to discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond the cases treatable so far.

Identification of biophysical interaction patterns in direct coupling analysis

Physical Review E ◽

10.1103/physreve.103.042418 ◽

2021 ◽

Vol 103 (4) ◽

Author(s):

Michael Schmidt ◽

Kay Hamacher

Keyword(s):

Direct Coupling ◽

Interaction Patterns ◽

Coupling Analysis ◽

Direct Coupling Analysis

Erratum: Three-body interactions improve contact prediction within direct-coupling analysis [Phys. Rev. E 96, 052405 (2017)]

Physical Review E ◽

10.1103/physreve.104.019902 ◽

2021 ◽

Vol 104 (1) ◽

Author(s):

Michael Schmidt ◽

Kay Hamacher

Keyword(s):

Direct Coupling ◽

Coupling Analysis ◽

Contact Prediction ◽

Direct Coupling Analysis ◽

Three Body

From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

PLoS Computational Biology ◽

10.1371/journal.pcbi.1003176 ◽

2013 ◽

Vol 9 (8) ◽

pp. e1003176 ◽

Cited By ~ 81

Author(s):

Simona Cocco ◽

Remi Monasson ◽

Martin Weigt

Keyword(s):

Structure Prediction ◽

Principal Component ◽

Direct Coupling ◽

Coupling Analysis ◽

Direct Coupling Analysis
