pydca v1.0: a comprehensive software for Direct Coupling Analysis of RNA and Protein Sequences

Mapping Intimacies ◽

10.1101/805523 ◽

2019 ◽

Cited By ~ 1

Author(s):

Mehari B. Zerihun ◽

Fabrizio Pucci ◽

Emanuel Karl Peter ◽

Alexander Schug

Keyword(s):

Structure Prediction ◽

Sequence Data ◽

Mean Field ◽

Direct Coupling ◽

Spatial Proximity ◽

Homologous Proteins ◽

Coupling Analysis ◽

Multiple Sequence ◽

Wide Range ◽

Direct Coupling Analysis

Abstract: The ongoing advances in sequencing technologies have provided a massive increase in the availability of sequence data. This made it possible to study the patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues indicating their spatial proximity, making such information a valuable input for subsequent structure prediction. Here we present pydca, a standalone Python-based software package for the DCA of protein- and RNA-homologous families. It is based on two popular inverse statistical approaches, namely, the mean-field and the pseudo-likelihood maximization, and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds. Availability: https://github.com/KIT-MBS/pydca


pydca v1.0: a comprehensive software for direct coupling analysis of RNA and protein sequences

Bioinformatics ◽

10.1093/bioinformatics/btz892 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2264-2265 ◽

Cited By ~ 3

Author(s):

Mehari B Zerihun ◽

Fabrizio Pucci ◽

Emanuel K Peter ◽

Alexander Schug

Keyword(s):

Structure Prediction ◽

Sequence Data ◽

Direct Coupling ◽

Supplementary Information ◽

Spatial Proximity ◽

Homologous Proteins ◽

Coupling Analysis ◽

Multiple Sequence ◽

Wide Range ◽

Direct Coupling Analysis

Abstract. Motivation: The ongoing advances in sequencing technologies have provided a massive increase in the availability of sequence data. This made it possible to study the patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues indicating their spatial proximity, making such information a valuable input for subsequent structure prediction. Results: Here, we present pydca, a standalone Python-based software package for the DCA of protein- and RNA-homologous families. It is based on two popular inverse statistical approaches, namely, the mean-field and the pseudo-likelihood maximization, and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds. Availability and implementation: pydca can be obtained from https://github.com/KIT-MBS/pydca or from the Python Package Index under the MIT License. Supplementary information: Supplementary data are available at Bioinformatics online.
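
To make the mean-field approach named in the abstract concrete, here is a minimal NumPy sketch of the textbook mean-field DCA recipe: estimate regularised single-site and pair frequencies, invert the connected correlation matrix to approximate the couplings, then score each pair with a Frobenius norm and an average product correction. It is not pydca's code or API. The function name `mean_field_dca_scores`, the integer encoding of the alignment, the pseudocount of 0.5 and the omission of sequence reweighting are all assumptions made for illustration.

```python
import numpy as np

def mean_field_dca_scores(msa, q, pseudocount=0.5):
    """Hypothetical helper: APC-corrected mean-field DCA scores.

    msa: (M, L) integer array, each entry a state in 0..q-1.
    Returns an (L, L) symmetric score matrix with a zero diagonal.
    Sequence reweighting is deliberately omitted in this sketch.
    """
    M, L = msa.shape
    onehot = np.eye(q)[msa]                                   # (M, L, q)

    # Empirical single-site and pair frequencies.
    fi = onehot.mean(axis=0)                                  # (L, q)
    fij = np.einsum("mia,mjb->ijab", onehot, onehot) / M      # (L, L, q, q)

    # Pseudocount regularisation keeps the correlation matrix invertible.
    fi = (1.0 - pseudocount) * fi + pseudocount / q
    fij = (1.0 - pseudocount) * fij + pseudocount / q**2
    for i in range(L):
        fij[i, i] = np.diag(fi[i])                            # same-site block

    # Connected correlations; drop the last state to fix the gauge.
    C = fij - fi[:, None, :, None] * fi[None, :, None, :]
    C = C[:, :, : q - 1, : q - 1]
    C_mat = C.transpose(0, 2, 1, 3).reshape(L * (q - 1), L * (q - 1))

    # Mean-field approximation: couplings are minus the inverse correlation.
    J = -np.linalg.inv(C_mat)
    J = J.reshape(L, q - 1, L, q - 1).transpose(0, 2, 1, 3)

    # Pad the dropped state with zeros and move to the zero-sum gauge.
    Jfull = np.zeros((L, L, q, q))
    Jfull[:, :, : q - 1, : q - 1] = J
    Jfull = (Jfull
             - Jfull.mean(axis=2, keepdims=True)
             - Jfull.mean(axis=3, keepdims=True)
             + Jfull.mean(axis=(2, 3), keepdims=True))

    # Frobenius norm per site pair, then average product correction (APC).
    F = np.sqrt((Jfull**2).sum(axis=(2, 3)))
    np.fill_diagonal(F, 0.0)
    row_mean = F.sum(axis=1) / (L - 1)
    total_mean = F.sum() / (L * (L - 1))
    scores = F - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(scores, 0.0)
    return scores

if __name__ == "__main__":
    # Toy alignment: column 1 is a deterministic function of column 0.
    rng = np.random.default_rng(0)
    toy = rng.integers(0, 4, size=(500, 5))
    toy[:, 1] = (toy[:, 0] + 1) % 4
    print(np.round(mean_field_dca_scores(toy, q=4), 3))
```

In practice q would be 21 for proteins or 5 for RNA (residue alphabet plus gap), and the highest-scoring pairs would be taken as predicted contacts. Real analyses usually also reweight similar sequences, which this sketch leaves out.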


Liquid-theory analogy of direct-coupling analysis of multiple-sequence alignment and its implications for protein structure prediction

Biophysics and Physicobiology ◽

10.2142/biophysico.12.0_117 ◽

2015 ◽

Vol 12 (0) ◽

pp. 117-119 ◽

Cited By ~ 1

Author(s):

Akira R. Kinjo

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Direct Coupling ◽

Coupling Analysis ◽

Multiple Sequence ◽

Liquid Theory ◽

Direct Coupling Analysis


Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1607570113 ◽

2016 ◽

Vol 113 (43) ◽

pp. 12186-12191 ◽

Cited By ~ 71

Author(s):

Thomas Gueudré ◽

Carlo Baldassi ◽

Marco Zamparo ◽

Martin Weigt ◽

Andrea Pagnani

Keyword(s):

Protein Interactions ◽

Multiple Scales ◽

Sequence Data ◽

Direct Coupling ◽

Physical Contact ◽

Protein Protein Interactions ◽

Homologous Proteins ◽

Coupling Analysis ◽

Large Joint ◽

Direct Coupling Analysis

Understanding protein−protein interactions is central to our understanding of almost all complex biological processes. Computational tools exploiting rapidly growing genomic databases to characterize protein−protein interactions are urgently needed. Such methods should connect multiple scales from evolutionary conserved interactions between families of homologous proteins, over the identification of specifically interacting proteins in the case of multiple paralogs inside a species, down to the prediction of residues being in physical contact across interaction interfaces. Statistical inference methods detecting residue−residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; they require, however, large joint alignments of homologous protein pairs known to interact. The generation of such alignments is a complex computational task on its own; application of coevolutionary modeling has, in turn, been restricted to proteins without paralogs, or to bacterial systems with the corresponding coding genes being colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect the different scales, and simultaneously to match interacting paralogs, to identify interprotein residue−residue contacts and to discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond cases treatable so far.


From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

PLoS Computational Biology ◽

10.1371/journal.pcbi.1003176 ◽

2013 ◽

Vol 9 (8) ◽

pp. e1003176 ◽

Cited By ~ 81

Author(s):

Simona Cocco ◽

Remi Monasson ◽

Martin Weigt

Keyword(s):

Structure Prediction ◽

Principal Component ◽

Direct Coupling ◽

Coupling Analysis ◽

Direct Coupling Analysis


Expanding Direct Coupling Analysis to Identify Heterodimeric Interfaces from Limited Protein Sequence Data

The Journal of Physical Chemistry B ◽

10.1021/acs.jpcb.1c07145 ◽

2021 ◽

Author(s):

Kareem M. Mehrabiani ◽

Ryan R. Cheng ◽

José N. Onuchic

Keyword(s):

Protein Sequence ◽

Sequence Data ◽

Direct Coupling ◽

Coupling Analysis ◽

Protein Sequence Data ◽

Direct Coupling Analysis ◽

Limited Protein


Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1615068114 ◽

2017 ◽

Vol 114 (13) ◽

pp. E2662-E2671 ◽

Cited By ~ 58

Author(s):

Guido Uguzzoni ◽

Shalini John Lovis ◽

Francesco Oteri ◽

Alexander Schug ◽

Hendrik Szurmant ◽

...

Keyword(s):

Structure Prediction ◽

Large Scale ◽

Structural Models ◽

Direct Coupling ◽

Pfam Family ◽

Sequence Information ◽

Structural Description ◽

Protein Families ◽

Coupling Analysis ◽

Direct Coupling Analysis

Proteins have evolved to perform diverse cellular functions, from serving as reaction catalysts to coordinating cellular propagation and development. Frequently, proteins do not exert their full potential as monomers but rather undergo concerted interactions as either homo-oligomers or with other proteins as hetero-oligomers. The experimental study of such protein complexes and interactions has been arduous. Theoretical structure prediction methods are an attractive alternative. Here, we investigate homo-oligomeric interfaces by tracing residue coevolution via the global statistical direct coupling analysis (DCA). DCA can accurately infer spatial adjacencies between residues. These adjacencies can be included as constraints in structure prediction techniques to predict high-resolution models. By taking advantage of the ongoing exponential growth of sequence databases, we go significantly beyond anecdotal cases of a few protein families and apply DCA to a systematic large-scale study of nearly 2,000 Pfam protein families with sufficient sequence information and structurally resolved homo-oligomeric interfaces. We find that large interfaces are commonly identified by DCA. We further demonstrate that DCA can differentiate between subfamilies with different binding modes within one large Pfam family. Sequence-derived contact information for the subfamilies proves sufficient to assemble accurate structural models of the diverse protein-oligomers. Thus, we provide an approach to investigate oligomerization for arbitrary protein families leading to structural models complementary to often-difficult experimental methods. Combined with ever more abundant sequential data, we anticipate that this study will be instrumental to allow the structural description of many heteroprotein complexes in the future.
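
One common way to check how well coupling scores recover spatial adjacencies, as the abstract describes, is to compare the top-ranked residue pairs with distances from a known structure. The sketch below is illustrative only and is not the authors' evaluation protocol. The function name `top_k_precision`, the 8 Å contact cutoff and the minimum sequence separation of 4 are assumptions.

```python
import numpy as np

def top_k_precision(scores, coords, k, cutoff=8.0, min_sep=4):
    """Hypothetical helper: fraction of the k best-scoring pairs in contact.

    scores: (L, L) symmetric coupling scores, higher means more likely contact.
    coords: (L, 3) one representative atom position per residue, in angstroms.
    cutoff and min_sep are assumed values, not taken from the paper.
    """
    L = scores.shape[0]
    # All pairwise Euclidean distances between residues.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Only consider pairs at least min_sep apart along the sequence.
    i, j = np.triu_indices(L, k=min_sep)
    # Indices of the k highest-scoring candidate pairs.
    order = np.argsort(scores[i, j])[::-1][:k]
    return float(np.mean(dist[i[order], j[order]] < cutoff))
```

For homo-oligomeric interfaces, the distance matrix would be computed between residues of different chains rather than within one chain; the ranking logic stays the same.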


Improving RNA secondary structure prediction using direct coupling analysis

Chinese Physics B ◽

10.1088/1674-1056/ab889d ◽

2020 ◽

Vol 29 (7) ◽

pp. 078702

Author(s):

Xiaoling He ◽

Jun Wang ◽

Jian Wang ◽

Yi Xiao

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Secondary Structure Prediction ◽

Direct Coupling ◽

Coupling Analysis ◽

RNA Secondary Structure Prediction ◽

Direct Coupling Analysis


Protein structure without structure determination: direct coupling analysis based on in vitro evolution

10.1101/582056 ◽

2019 ◽

Author(s):

Marco Fantini ◽

Simonetta Lisi ◽

Paolo De Los Rios ◽

Antonino Cattaneo ◽

Annalisa Pastore

Keyword(s):

Structural Information ◽

Sequence Data ◽

Protein Structures ◽

Direct Coupling ◽

In Vitro Mutagenesis ◽

Evolutionary Analysis ◽

Protein Families ◽

Coupling Analysis ◽

Direct Coupling Analysis

Abstract: Direct Coupling Analysis (DCA) is a powerful technique that makes it possible to extract structural information about proteins belonging to large protein families exclusively by in silico analysis. This method is, however, limited by sequence availability and various biases. Here, we propose a method that exploits molecular evolution to circumvent these limitations: instead of relying on existing protein families, we used in vitro mutagenesis of TEM-1 beta lactamase combined with in vivo functional selection to generate the sequence data necessary for evolutionary analysis. With this strategy, which we called CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), we could reconstruct the lactamase fold exclusively from sequence data. By generating and sequencing large libraries of variants, we can deal with any protein, ancient or recent, from any species, the only constraint being the need to set up a functional phenotypic selection for the protein. This method allows us to obtain protein structures without solving the structure experimentally.


Deep learning reveals many more inter-protein residue-residue contacts than direct coupling analysis

10.1101/240754 ◽

2017 ◽

Cited By ~ 4

Author(s):

Tian-ming Zhou ◽

Sheng Wang ◽

Jinbo Xu

Keyword(s):

Deep Learning ◽

Protein Interaction ◽

Interaction Network ◽

Protein Docking ◽

Residue Level ◽

Direct Coupling ◽

Coupling Analysis ◽

Multiple Sequence ◽

Contact Prediction ◽

Direct Coupling Analysis

Abstract: Intra-protein residue-level contact prediction has drawn a lot of attention in recent years and made very good progress, but far fewer methods are dedicated to inter-protein contact prediction, which is important for understanding how proteins interact at the structure and residue level. Direct coupling analysis (DCA) is popular for intra-protein contact prediction, but extending it to inter-protein contact prediction is challenging since it requires too many interlogs (i.e., interacting homologs) to be effective, a requirement that cannot easily be fulfilled, especially for a putative interacting protein pair in eukaryotes. We show that deep learning, even when trained only on intra-protein contact maps, works much better than DCA for inter-protein contact prediction. We also show that a phylogeny-based method can generate a better multiple sequence alignment for eukaryotes than existing genome-based methods and thus lead to better inter-protein contact prediction. Our method should be useful for protein docking, protein interaction prediction and protein interaction network construction.


Faculty Opinions recommendation of Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.725820375.793528052 ◽

2017 ◽

Author(s):

Janusz Bujnicki

Keyword(s):

Structure Prediction ◽

Tertiary Structure ◽

Direct Coupling ◽

Coupling Analysis ◽

Tertiary Structure Prediction ◽

Direct Coupling Analysis ◽

Secondary And Tertiary Structure
