Patch-DCA: Improved Protein Interface Prediction by utilizing Structural Information and Clustering DCA scores

2019 ◽  
Author(s):  
Amir Vajdi ◽  
Kourosh Zarringhalam ◽  
Nurit Haspel

Abstract Over the past decade there have been impressive advances in determining the 3D structures of protein complexes. However, there are still many complexes with unknown structures, even when the structures of the individual proteins are known. The advent of protein sequence information provides an opportunity to leverage evolutionary information to enhance the accuracy of protein-protein interface prediction. To this end, several statistical and machine learning methods have been proposed. In particular, direct coupling analysis has recently emerged as a promising approach for identification of protein contact maps from sequential information. However, the ability of these methods to detect protein-protein inter-residue contacts remains relatively limited. In this work, we propose a method to integrate sequential and co-evolution information with structural and functional information to increase the performance of protein-protein interface prediction. Further, we present a post-processing clustering method that improves the average relative F1 score by 70% and 24% and the precision by 80% and 36% in comparison with two state-of-the-art methods, PSICOV and GREMLIN.

Author(s):  
Amir Vajdi ◽  
Kourosh Zarringhalam ◽  
Nurit Haspel

Abstract Motivation Over the past decade, there have been impressive advances in determining the 3D structures of protein complexes. However, there are still many complexes with unknown structures, even when the structures of the individual proteins are known. The advent of protein sequence information provides an opportunity to leverage evolutionary information to enhance the accuracy of protein–protein interface prediction. To this end, several statistical and machine learning methods have been proposed. In particular, direct coupling analysis has recently emerged as a promising approach for identification of protein contact maps from sequential information. However, the ability of these methods to detect protein–protein inter-residue contacts remains relatively limited. Results In this work, we propose a method to integrate sequential and co-evolution information with structural and functional information to increase the performance of protein–protein interface prediction. Further, we present a post-processing clustering method that improves the average relative F1 score by 70% and 24% and the average relative precision by 80% and 36% in comparison with two state-of-the-art methods, PSICOV and GREMLIN. Availability and implementation https://github.com/BioMLBoston/PatchDCA Supplementary information Supplementary data are available at Bioinformatics online.
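The F1 and precision gains quoted above are relative changes against a baseline predictor. The following is a minimal sketch of how such numbers could be computed for a single complex. The abstract does not spell out how the "average relative" scores are aggregated, so the `relative_improvement` formula and the toy contact sets below are illustrative assumptions, not the authors' evaluation code.

```python
def precision_recall_f1(predicted, true):
    """Score a set of predicted inter-residue contacts against the true contacts."""
    predicted, true = set(predicted), set(true)
    tp = len(predicted & true)  # correctly predicted contacts
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true) if true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def relative_improvement(new, baseline):
    """Relative change of a score over a non-zero baseline (assumed definition)."""
    return (new - baseline) / baseline


# Hypothetical contacts as (residue in protein A, residue in protein B) pairs.
true_contacts = {(10, 55), (11, 55), (12, 57), (30, 80)}
baseline_pred = {(10, 55), (40, 90), (41, 91), (42, 92)}
improved_pred = {(10, 55), (11, 55), (12, 57), (40, 90)}

p_base, r_base, f1_base = precision_recall_f1(baseline_pred, true_contacts)
p_new, r_new, f1_new = precision_recall_f1(improved_pred, true_contacts)
f1_gain = relative_improvement(f1_new, f1_base)
```

In a benchmark, one would presumably compute this gain per complex and then average it across complexes, which is one plausible reading of "average relative F1 score".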


Genes ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 394 ◽  
Author(s):  
Xiu-Juan Liu ◽  
Xiu-Jun Gong ◽  
Hua Yu ◽  
Jia-Hui Xu

Nowadays, various machine learning-based approaches using sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most trivial tasks. Disclosing the significance and contributions of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance, but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework by orchestrating multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and autocross-covariance of secondary structures (AC_Struc), which were extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely-coupled classifiers such as SVM and random forest. Then, a logistic regression model is applied to evaluate the contributions of these individual classifiers and to make the final prediction. When evaluated on the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, the method achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework is able to orchestrate various predictive models flexibly with good performance.
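The stacking step described above, where a logistic regression combines the outputs of individual classifiers, can be sketched in a few lines. This is a minimal pure-Python illustration and not the MSFBinder implementation: the two base-model probability columns, the labels, and the training hyperparameters are all hypothetical, and a real pipeline would use out-of-fold predictions from the SVM and random forest models.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_logistic(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner on base-classifier
    probabilities using batch gradient descent on the mean log loss."""
    n_models = len(base_probs[0])
    n = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_models):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Stacked probability that a protein is DNA-binding."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical probabilities from two base models (columns) for six
# proteins (rows); the first model separates the classes more cleanly.
base_probs = [
    [0.9, 0.6], [0.8, 0.4], [0.7, 0.7],
    [0.2, 0.5], [0.3, 0.6], [0.1, 0.3],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_logistic(base_probs, labels)
```

The learned `weights` give a rough indication of how much each base classifier contributes to the final decision, which is the kind of contribution analysis the abstract attributes to the logistic regression layer.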


Biomolecules ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 938
Author(s):  
Kriti Chopra ◽  
Bhawna Burdak ◽  
Kaushal Sharma ◽  
Ajit Kembhavi ◽  
Shekhar C. Mande ◽  
...  

Decrypting the interface residues of protein complexes provides insight into the functions of the proteins and, hence, the overall cellular machinery. Computational methods have been devised in the past to predict the interface residues using amino acid sequence information, but these methods have mostly been applied to prokaryotic protein complexes. Since the composition and rate of evolution of the primary sequence differ between prokaryotes and eukaryotes, it is important to develop a method specifically for eukaryotic complexes. Here, we report a new hybrid pipeline for predicting the protein-protein interaction interfaces in a pairwise manner from the amino acid sequence information of the interacting proteins. The pipeline, named CoRNeA, is based on a framework of Co-evolution, machine learning (Random Forest), and Network Analysis, and is trained specifically on eukaryotic protein complexes. We use Co-evolution, physicochemical properties, and contact potential as the major groups of features to train the Random Forest classifier. We also incorporate the intra-contact information of the individual proteins to eliminate false positives from the predictions, keeping in mind that the amino acid sequence of a protein holds information for its own folding and not only for the interface propensities. Our predictions on example datasets show that CoRNeA not only enhances the prediction of true interface residues but also reduces false positive rates significantly.


2019 ◽  
Author(s):  
Kriti Chopra ◽  
Bhawna Burdak ◽  
Kaushal Sharma ◽  
Ajit Kembhavi ◽  
Shekhar C. Mande ◽  
...  

Abstract Computational methods have been devised in the past to predict the interface residues using amino acid sequence information, but they have mostly been applied to prokaryotic protein complexes. Since the composition and rate of evolution of the primary sequence are different between prokaryotes and eukaryotes, it is important to develop a method specifically for eukaryotic complexes. Here we report a new hybrid pipeline for the prediction of protein-protein interaction interfaces from the amino acid sequence information alone. The pipeline, named CoRNeA, is based on a framework of Co-evolution, machine learning (Random forest) and Network Analysis, and is trained specifically on eukaryotic protein complexes. We incorporate the intra-contact information of the individual proteins to eliminate false positives from the predictions, as the amino acid sequence also holds information for its own folding along with the interface propensities. Our predictions on various case studies show that CoRNeA can successfully identify minimal interacting regions of two partner proteins with higher precision and recall.


2019 ◽  
Author(s):  
Zachary VanAernum ◽  
Florian Busch ◽  
Benjamin J. Jones ◽  
Mengxuan Jia ◽  
Zibo Chen ◽  
...  

It is important to assess the identity and purity of proteins and protein complexes during and after protein purification to ensure that samples are of sufficient quality for further biochemical and structural characterization, as well as for use in consumer products, chemical processes, and therapeutics. Native mass spectrometry (nMS) has become an important tool in protein analysis due to its ability to retain non-covalent interactions during measurements, making it possible to obtain protein structural information with high sensitivity and at high speed. Interferences from the presence of non-volatiles are typically alleviated by offline buffer exchange, which is time-consuming and difficult to automate. We provide a protocol for rapid online buffer exchange (OBE) nMS to directly screen structural features of pre-purified proteins, protein complexes, or clarified cell lysates. Information obtained by OBE nMS can be used for fast (<5 min) quality control and can further guide protein expression and purification optimization.


2020 ◽  
Vol 27 (37) ◽  
pp. 6306-6355 ◽  
Author(s):  
Marian Vincenzi ◽  
Flavia Anna Mercurio ◽  
Marilisa Leone

Background: Many pathways in healthy cells and/or linked to disease onset and progression depend on large assemblies including multi-protein complexes. Protein-protein interactions may occur through a vast array of modules known as protein interaction domains (PIDs). Objective: This review is concerned with PIDs recognizing post-translationally modified peptide sequences and intends to provide the scientific community with state-of-the-art knowledge on their 3D structures, binding topologies and potential applications in the drug discovery field. Method: Several databases, such as Pfam (Protein family), SMART (Simple Modular Architecture Research Tool) and the PDB (Protein Data Bank), were searched to look for different domain families and gain structural information on protein complexes in which particular PIDs are involved. Recent literature on PIDs and related drug discovery campaigns was retrieved through PubMed and analyzed. Results and Conclusion: PIDs are rather versatile in their binding preferences. Many of them specifically recognize only particular amino acid stretches with post-translational modifications; a few others are able to interact with several post-translationally modified sequences or with unmodified ones. Many PIDs can be linked to different diseases including cancer. The tremendous amount of available structural data has led to the structure-based design of several molecules targeting protein-protein interactions mediated by PIDs, including peptides, peptidomimetics and small compounds. More studies are needed to fully identify, among different families, the PIDs that can be considered reliable therapeutic targets; however, attacking PIDs rather than the catalytic domains of a particular protein may represent a route to obtain selective inhibitors.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Farhan Quadir ◽  
Raj S. Roy ◽  
Randal Halfmann ◽  
Jianlin Cheng

Abstract Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. In recognizing that multiple sequence alignments of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using multiple sequence alignment (MSA) and other co-evolutionary features of a single monomer followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions of 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
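The Top-L/10 precision with a two-residue shift tolerance mentioned above can be illustrated with a short sketch. The abstract does not define the shift rule precisely, so this example assumes a prediction counts as a true positive when both residue indices lie within `shift` positions of some true contact. The function name, the toy scores, and the contacts are hypothetical and are not taken from DNCON2_Inter.

```python
def top_l_over_k_precision(scored_pairs, true_contacts, seq_len, k=10, shift=2):
    """Precision of the top L/k scored residue pairs, where L is the
    sequence length and a hit may be off by up to `shift` residues in
    each index (assumed tolerance rule)."""
    n_top = max(1, seq_len // k)
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = 0
    for (i, j), _score in ranked:
        if any(abs(i - ti) <= shift and abs(j - tj) <= shift
               for ti, tj in true_contacts):
            hits += 1
    return hits / len(ranked)


# Hypothetical predictions for a monomer of length 30, so L/10 = 3.
scored_pairs = [
    ((5, 20), 0.9),   # exact match to a true contact
    ((8, 25), 0.8),   # within two residues of (10, 26)
    ((1, 2), 0.7),    # not near any true contact
    ((6, 21), 0.1),   # ranked below the top-3 cutoff
]
true_contacts = {(5, 20), (10, 26)}

relaxed = top_l_over_k_precision(scored_pairs, true_contacts, seq_len=30)
strict = top_l_over_k_precision(scored_pairs, true_contacts, seq_len=30, shift=0)
```

Comparing `relaxed` with `strict` shows how much of the reported precision depends on the shift tolerance rather than on exact contact matches.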


2005 ◽  
Vol 386 (6) ◽  
pp. 523-534 ◽  
Author(s):  
Annette Hillebrand ◽  
Reinhild Wurm ◽  
Artur Menzel ◽  
Rolf Wagner

Abstract Ribosomal RNAs in E. coli are transcribed from seven operons, which are highly conserved in their organization and sequence. However, the upstream regulatory DNA regions differ considerably, suggesting differences in regulation. We have therefore analyzed the conformation of all seven DNA elements located upstream of the major E. coli rRNA P1 promoters. As judged by temperature-dependent gel electrophoresis with isolated DNA fragments comprising the individual P1 promoters and the complete upstream regulatory regions, all seven rRNA upstream sequences are intrinsically curved. The degree of intrinsic curvature was highest for the rrnB and rrnD fragments and less pronounced for the rrnA and rrnE operons. Comparison of the experimentally determined differences in curvature with programs for the prediction of DNA conformation revealed a generally high degree of conformity. Moreover, the analysis showed that the center of curvature is located at about the same position in all fragments. The different upstream regions were analyzed for their capacity to bind the transcription factors FIS and H-NS, which are known as antagonists in the regulation of rRNA synthesis. Gel retardation experiments revealed that both proteins interact with the upstream promoter regions of all seven rDNA fragments, with the affinities of the different DNA fragments for FIS and H-NS and the structure of the resulting complexes deviating considerably. FIS binding was non-cooperative, and at comparable protein concentrations the occupancy of the different DNA fragments varied between two and four binding sites. In contrast, H-NS was shown to bind cooperatively and intermediate states of occupancy could not be resolved for each fragment. The different gel electrophoretic mobilities of the individual DNA/protein complexes indicate variable structures and topologies of the upstream activating sequence regulatory complexes.
Our results are highly suggestive of differential regulation of the individual rRNA operons.

