Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age

Abstract Repeat proteins are an abundant class in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these families, the structure is not known. Recently, it has been shown that the structures of many protein families can be predicted using contacts predicted by direct coupling analysis (DCA) and deep learning. However, the unique sequence features of repeat proteins are a challenge for DCA-based contact prediction methods. Here, we show that the deep learning-based PconsC4 is more effective for predicting both intra- and inter-unit contacts across a comprehensive set of repeat proteins. In a benchmark dataset of 819 repeat proteins, about one third can be correctly modelled, and among 51 PFAM families lacking a protein structure, we produce models of five families with estimated high accuracy. Author Summary Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequences contain repeated segments that give rise to structures with repeated folds or domains. Although the repeated units are easy to recognize in the primary sequence, structural information is often missing. Here we used contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark our method on a dataset comprising all known repeat protein structures. We evaluate the contact predictions and the resulting models for different classes of proteins and different target lengths, and we benchmark the quality assessment of the models on repeat proteins. Finally, we applied the methods to the repeat PFAM families lacking resolved structures; five of them were modelled with high accuracy.
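The distinction between intra-unit and inter-unit contacts can be made concrete with a small sketch. It assumes that repeat-unit boundaries are already known as 1-based inclusive residue ranges and that pairs closer than 6 residues in sequence are ignored; the helper names `unit_index` and `split_contacts`, the 6-residue cutoff, and the toy data are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def unit_index(residue: int, boundaries: Sequence[Tuple[int, int]]) -> int:
    """Return the index of the repeat unit containing `residue`, or -1 if none does."""
    for idx, (start, end) in enumerate(boundaries):
        if start <= residue <= end:
            return idx
    return -1


def split_contacts(
    contacts: Sequence[Pair],
    boundaries: Sequence[Tuple[int, int]],
    min_separation: int = 6,  # assumed cutoff, not from the paper
) -> Tuple[List[Pair], List[Pair]]:
    """Split predicted contacts into intra-unit and inter-unit lists.

    Pairs closer than `min_separation` in sequence, or involving a residue
    outside every annotated unit, are ignored.
    """
    intra: List[Pair] = []
    inter: List[Pair] = []
    for i, j in contacts:
        if abs(i - j) < min_separation:
            continue  # too local to be informative
        ui, uj = unit_index(i, boundaries), unit_index(j, boundaries)
        if ui < 0 or uj < 0:
            continue  # residue lies outside the annotated repeat units
        (intra if ui == uj else inter).append((i, j))
    return intra, inter


# Three hypothetical 40-residue repeat units
units = [(1, 40), (41, 80), (81, 120)]
predicted = [(5, 30), (10, 50), (45, 70), (20, 22), (100, 130)]
intra, inter = split_contacts(predicted, units)
print(intra)  # [(5, 30), (45, 70)]
print(inter)  # [(10, 50)]
```

Inter-unit contacts are the ones that fix the relative packing of consecutive repeats, which is why they are reported separately from contacts inside a single unit.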

PconsC4: fast, free, easy, and accurate contact predictions

10.1101/383133 ◽

2018 ◽

Cited By ~ 2

Author(s):

Mirco Michel ◽

David Menéndez Hurtado ◽

Arne Elofsson

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Prediction Methods ◽

Coupling Analysis ◽

Learning Methods ◽

Contact Prediction ◽

Residue Contact ◽

Direct Coupling Analysis ◽

Computationally Expensive ◽

Contact Predictions

Abstract Motivation Residue contact prediction was revolutionized recently by the introduction of direct coupling analysis (DCA). Further improvements, in particular for small families, have been obtained by combining DCA with deep learning methods. However, existing deep learning contact prediction methods often rely on a number of external programs and are therefore computationally expensive. Results Here, we introduce a novel contact predictor, PconsC4, which performs on par with state-of-the-art methods. PconsC4 is heavily optimized and does not use any external programs; it is therefore significantly faster and easier to use than other methods. Availability PconsC4 is freely available under the GPL license from https://github.com/ElofssonLab/PconsC4. Installation is easy using the pip command and works on any system with Python 3.5 or later and a modern GCC compiler.

A deep dilated convolutional residual network for predicting interchain contacts of protein homodimers

10.1101/2021.09.19.460941 ◽

2021 ◽

Author(s):

Raj Shekhor Roy ◽

Farhan Quadir ◽

Elham Soltanikazemi ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Tertiary Structure ◽

Quaternary Structure ◽

High Accuracy ◽

Residual Network ◽

Sequence Alignments ◽

Learning Methods ◽

Tertiary Structures ◽

Residue Contacts ◽

Contact Predictions

Deep learning has recently revolutionized protein tertiary structure prediction. Cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low due to a lack of advanced deep learning methods in the field. Because interchain residue-residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue-residue contacts in homodimers from residue-residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue-residue contacts of monomers extracted from true/predicted tertiary structures or predicted by deep learning, and other sequence and structural features. Tested on three homodimer test datasets (Homo_std dataset, DeepHomo dataset, and CASP14-CAPRI dataset), the precision of DRCon for top L/5 interchain contact predictions (L: length of monomer in a homodimer) is 43.46%, 47.15%, and 24.81%, respectively, which is substantially better than two existing deep learning interchain contact prediction methods. Moreover, our experiments demonstrate that when predicted tertiary structures or intrachain contacts of monomers in the unbound state are used as input, DRCon still performs reasonably well, even though its accuracy is lower than when true tertiary structures in the bound state are used as input. Finally, our case study shows that good interchain contact predictions can be used to build high-accuracy quaternary structure models of homodimers.
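The "top L/5 precision" figure can be illustrated with a short sketch: rank all predicted interchain pairs by score, keep the best L/5 (L being the monomer length), and report the fraction that are true contacts. This is a generic illustration of the metric under stated assumptions, not DRCon's evaluation code; the function name, the 1-based (chain A residue, chain B residue) pair convention, and the toy scores are hypothetical, and symmetric duplicates such as (i, j) and (j, i) are not merged here.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (residue in chain A, residue in chain B), 1-based


def top_l_over_k_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    k: int = 5,
) -> float:
    """Precision of the top L/k highest-scoring interchain contact predictions."""
    n = max(1, length // k)  # number of top predictions to evaluate
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / n


# Toy example: monomer length L = 20, so the top L/5 = 4 predictions are scored.
scores = {(1, 1): 0.9, (2, 5): 0.8, (3, 7): 0.7, (4, 4): 0.6, (9, 9): 0.1}
true_contacts = {(1, 1), (3, 7), (9, 9)}
print(top_l_over_k_precision(scores, true_contacts, length=20))  # 0.5
```

Only the top-ranked slice counts, so the true contact (9, 9) with a low score does not raise precision; setting `k=2` gives the corresponding top L/2 metric.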

Combination of deep neural network with attention mechanism enhances the explainability of protein contact prediction

10.1101/2020.09.04.283937 ◽

2020 ◽

Author(s):

Chen Chen ◽

Tianqi Wu ◽

Zhiye Guo ◽

Jianlin Cheng

Keyword(s):

Neural Network ◽

Deep Learning ◽

Attention Mechanism ◽

Contact Prediction ◽

Residue Contact ◽

Complementary Effect ◽

Essential Components ◽

Internal Mechanism ◽

Contact Predictions ◽

The Relationship

Abstract Deep learning has emerged as a revolutionary technology for protein residue-residue contact prediction since the 2012 CASP10 competition. Considerable advancements in the predictive power of deep learning-based contact predictions have been achieved since then. However, little effort has been put into interpreting the black-box deep learning methods. Algorithms that can interpret the relationship between predicted contact maps and the internal mechanism of the deep learning architectures are needed to explore the essential components of contact inference and improve their explainability. In this study, we present an attention-based convolutional neural network for protein contact prediction, which consists of two attention mechanism-based modules: sequence attention and regional attention. Our benchmark results on the CASP13 free-modeling (FM) targets demonstrate that the two attention modules added on top of existing typical deep learning models exhibit a complementary effect that contributes to predictive improvements. More importantly, the inclusion of the attention mechanism provides interpretable patterns that contain useful insights into the key fold-determining residues in proteins. We expect the attention-based model can provide a reliable and practically interpretable technique that helps break the current bottlenecks in explaining deep neural networks for contact prediction.
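Why attention weights are interpretable can be shown with a minimal dot-product attention over residues: each residue gets one softmax-normalized weight, and the highest-weighted residues are the ones the model "looks at". This is a generic sketch, assuming fixed toy features and a fixed query vector; it is not the paper's sequence- or regional-attention module, and the function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_attention(
    features: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Dot-product attention over residues.

    Returns one weight per residue and the attention-weighted average of the
    residue feature vectors.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return weights, pooled


def most_attended(weights: Sequence[float], k: int = 3) -> List[int]:
    """Indices (0-based) of the k residues with the largest attention weights."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]


# Four residues with 2-dimensional toy features; in a trained model `query` is learned.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
query = [1.0, 0.0]
weights, pooled = sequence_attention(features, query)
print(most_attended(weights, k=2))  # [2, 0]
```

Because the weights sum to one over the sequence, they can be read directly as a per-residue importance profile, which is the kind of pattern the abstract links to fold-determining residues.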

Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14

10.1101/2021.01.31.428975 ◽

2021 ◽

Author(s):

Xiao Chen ◽

Jian Liu ◽

Zhiye Guo ◽

Tianqi Wu ◽

Jie Hou ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Structural Models ◽

Single Model ◽

Model Accuracy ◽

Model Quality ◽

Residue Contact ◽

Contact Distance ◽

Protein Model ◽

Contact Predictions

Abstract Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). During the 2020 CASP14 experiment, we developed and tested several EMA predictors that used deep learning with new features based on inter-residue distance/contact predictions as well as existing model quality features. The average global distance test (GDT-TS) score loss when ranking CASP14 structural models with three multi-model MULTICOM EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) is 0.073, 0.079, and 0.081, respectively, ranking first, second, and third out of 68 CASP14 EMA predictors. The single-model EMA predictor (MULTICOM-DEEP) ranks 10th among all single-model EMA methods in terms of GDT-TS score loss. The results show that deep learning and contact/distance predictions are useful for ranking and selecting protein structural models.
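The GDT-TS score loss can be sketched as follows, assuming the usual definition: for each target, the loss is the best true GDT-TS among all candidate models minus the true GDT-TS of the model that the EMA predictor ranked first, averaged over targets. The function names and the two toy targets are hypothetical, and GDT-TS is taken on a 0 to 1 scale to match the magnitudes reported above.

```python
from typing import Dict, List, Tuple


def gdt_ts_loss(predicted: Dict[str, float], true_gdt: Dict[str, float]) -> float:
    """Loss for one target: best true GDT-TS minus the true GDT-TS of the model
    the EMA predictor ranked first. Zero means the best model was selected."""
    top_model = max(predicted, key=predicted.get)
    return max(true_gdt.values()) - true_gdt[top_model]


def average_gdt_ts_loss(targets: List[Tuple[Dict[str, float], Dict[str, float]]]) -> float:
    """Mean per-target loss over (predicted scores, true GDT-TS) pairs."""
    losses = [gdt_ts_loss(pred, true) for pred, true in targets]
    return sum(losses) / len(losses)


# Two hypothetical targets with predicted quality scores and true GDT-TS values.
target1 = ({"m1": 0.70, "m2": 0.90, "m3": 0.40}, {"m1": 0.80, "m2": 0.65, "m3": 0.50})
target2 = ({"a": 0.50, "b": 0.20}, {"a": 0.60, "b": 0.55})
print(round(average_gdt_ts_loss([target1, target2]), 3))  # 0.075
```

The loss only depends on which model is ranked first, so a predictor can have imperfect absolute scores and still achieve zero loss on a target, as in `target2`.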

PconsC4: fast, accurate and hassle-free contact predictions

Bioinformatics ◽

10.1093/bioinformatics/bty1036 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2677-2679 ◽

Cited By ~ 15

Author(s):

Mirco Michel ◽

David Menéndez Hurtado ◽

Arne Elofsson

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Supplementary Information ◽

Prediction Methods ◽

Coupling Analysis ◽

Contact Prediction ◽

Residue Contact ◽

Direct Coupling Analysis ◽

Computationally Expensive ◽

Contact Predictions

Abstract Motivation Residue contact prediction was revolutionized recently by the introduction of direct coupling analysis (DCA). Further improvements, in particular for small families, have been obtained by combining DCA with deep learning methods. However, existing deep learning contact prediction methods often rely on a number of external programs and are therefore computationally expensive. Results Here, we introduce a novel contact predictor, PconsC4, which performs on par with state-of-the-art methods. PconsC4 is heavily optimized and does not use any external programs; it is therefore significantly faster and easier to use than other methods. Availability and implementation PconsC4 is freely available under the GPL license from https://github.com/ElofssonLab/PconsC4. Installation is easy using the pip command and works on any system with Python 3.5 or later and a GCC compiler. It requires neither a GPU nor special hardware. Supplementary information Supplementary data are available at Bioinformatics online.

DeepDist: real-value inter-residue distance prediction with deep residual convolutional network

10.1101/2020.03.17.995910 ◽

2020 ◽

Cited By ~ 4

Author(s):

Tianqi Wu ◽

Zhiye Guo ◽

Jie Hou ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Convolutional Network ◽

Regression Problem ◽

Distance Map ◽

The Real ◽

Real Value ◽

Contact Predictions ◽

Distance Prediction ◽

Better Than

Abstract Motivation Driven by deep learning techniques, inter-residue contact/distance prediction has improved significantly and has substantially enhanced ab initio protein structure prediction. Currently, all distance prediction methods classify inter-residue distances into multiple distance intervals (i.e. a multi-class classification problem) instead of directly predicting real-value distances (i.e. a regression problem). The output of the former has to be converted into real-value distances in order to be used in tertiary structure prediction. Results To explore the potential of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. We demonstrate that predicting the real-value distance map and the multi-class distance map at the same time performs better than predicting real-value distances alone, indicating their complementarity. On 43 CASP13 hard domains, the average mean square error (MSE) of DeepDist’s real-value distance predictions is 0.896 Å² when filtering out predicted distances ≥ 16 Å, which is lower than the 1.003 Å² of DeepDist’s multi-class distance predictions. When the predicted real-value distances are converted to binary contact predictions at an 8 Å threshold, the precisions of the top L/5 and L/2 contact predictions are 78.6% and 64.5%, respectively, higher than the best results reported in the CASP13 experiment. These results demonstrate that real-value distance prediction can predict inter-residue distances well and improve binary contact prediction over existing state-of-the-art methods. Moreover, the predicted real-value distances can be used directly to reconstruct protein tertiary structures better than multi-class distance predictions, owing to the lower MSE.
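The filtered MSE quoted above can be sketched like this: flatten the upper triangle of the predicted and true distance maps, drop every pair whose predicted distance is 16 Å or more, and average the squared errors of the rest, which is why the result is in Å². This is an illustrative sketch under that reading of the abstract, not DeepDist's evaluation script; the function names and the 3-residue toy maps are hypothetical.

```python
from typing import List, Sequence


def upper_triangle(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the i < j entries of a symmetric distance map."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def filtered_mse(pred: Sequence[float], true: Sequence[float], cutoff: float = 16.0) -> float:
    """MSE (in squared Å) over pairs whose *predicted* distance is below `cutoff`."""
    kept = [(p, t) for p, t in zip(pred, true) if p < cutoff]
    if not kept:
        raise ValueError("no predicted distances below the cutoff")
    return sum((p - t) ** 2 for p, t in kept) / len(kept)


# 3-residue toy maps (Å); the pair predicted at 20 Å is excluded by the filter.
pred_map = [[0.0, 5.0, 10.0], [5.0, 0.0, 20.0], [10.0, 20.0, 0.0]]
true_map = [[0.0, 6.0, 8.0], [6.0, 0.0, 30.0], [8.0, 30.0, 0.0]]
print(filtered_mse(upper_triangle(pred_map), upper_triangle(true_map)))  # 2.5
```

Filtering on the predicted rather than the true distance means the metric reflects only pairs the model itself claims are close, which are the ones later used as restraints.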

DeepDist: real-value inter-residue distance prediction with deep residual convolutional network

BMC Bioinformatics ◽

10.1186/s12859-021-03960-9 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 1

Author(s):

Tianqi Wu ◽

Zhiye Guo ◽

Jie Hou ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Tertiary Structure ◽

Convolutional Network ◽

Distance Map ◽

Contact Distance ◽

Real Value ◽

Contact Predictions ◽

Distance Prediction ◽

Better Than

Abstract Background Driven by deep learning, inter-residue contact/distance prediction has improved significantly and has substantially enhanced ab initio protein structure prediction. Currently, most distance prediction methods classify inter-residue distances into multiple distance intervals instead of directly predicting real-value distances. The output of the former has to be converted into real-value distances to be used in tertiary structure prediction. Results To explore the potential of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. Tested on 43 CASP13 hard domains, DeepDist achieves comparable performance in real-value distance prediction and multi-class distance prediction. The average mean square error (MSE) of DeepDist’s real-value distance prediction is 0.896 Å² when filtering out predicted distances ≥ 16 Å, which is lower than the 1.003 Å² of DeepDist’s multi-class distance prediction. When distance predictions are converted into contact predictions at an 8 Å threshold (the standard threshold in the field), the precision of the top L/5 and L/2 contact predictions of DeepDist’s multi-class distance prediction is 79.3% and 66.1%, respectively, higher than the 78.6% and 64.5% of its real-value distance prediction and the best results in the CASP13 experiment. Conclusions DeepDist can predict inter-residue distances well and improve binary contact prediction over existing state-of-the-art methods. Moreover, the predicted real-value distances can be used directly to reconstruct protein tertiary structures better than multi-class distance predictions, owing to the lower MSE. Finally, we demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone.
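Converting a multi-class (binned) distance prediction into a real value and a contact probability can be sketched as below: the expected distance is the probability-weighted mean of the bin centres, and the contact probability is the mass in bins whose upper edge is at or below 8 Å. The bin edges, the assumption that all bins are closed intervals, and the function names are hypothetical; DeepDist's actual binning is not described in the abstract.

```python
from typing import Sequence


def expected_distance(probs: Sequence[float], edges: Sequence[float]) -> float:
    """Probability-weighted mean of bin centres (assumes closed bins only)."""
    centres = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(p * c for p, c in zip(probs, centres))


def contact_probability(probs: Sequence[float], edges: Sequence[float], threshold: float = 8.0) -> float:
    """Sum the probability of bins whose upper edge is at or below `threshold`."""
    return sum(p for p, hi in zip(probs, edges[1:]) if hi <= threshold)


# Hypothetical bin edges (Å) and one residue pair's predicted distribution.
edges = [4.0, 6.0, 8.0, 10.0, 12.0, 16.0]
probs = [0.1, 0.5, 0.2, 0.1, 0.1]
print(round(expected_distance(probs, edges), 2))    # 8.3
print(round(contact_probability(probs, edges), 2))  # 0.6
```

Real distance classifiers usually also have open-ended first and last bins, whose centres are undefined; a full conversion would need a convention for those, which this sketch leaves out.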

Protein Model Accuracy Estimation Empowered by Deep Learning and Inter-residue Distance Prediction in CASP14

10.21203/rs.3.rs-228012/v1 ◽

2021 ◽

Author(s):

Xiao Chen ◽

Jian Liu ◽

Zhiye Guo ◽

Tianqi Wu ◽

Jie Hou ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Structural Models ◽

Single Model ◽

Model Accuracy ◽

Model Quality ◽

Residue Contact ◽

Contact Distance ◽

Protein Model ◽

Contact Predictions

Abstract Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). During the 2020 CASP14 experiment, we developed and tested several EMA predictors that used deep learning with new features based on inter-residue distance/contact predictions as well as existing model quality features. The average global distance test (GDT-TS) score loss when ranking CASP14 structural models with three multi-model MULTICOM EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) is 0.073, 0.079, and 0.081, respectively, ranking first, second, and third out of 68 CASP14 EMA predictors. The single-model EMA predictor (MULTICOM-DEEP) ranks 10th among all single-model EMA methods in terms of GDT-TS score loss. The results show that deep learning and contact/distance predictions are useful for ranking and selecting protein structural models.
