ProSPr: Democratized Implementation of Alphafold Protein Distance Prediction Network

The whole is greater than its parts: ensembling improves protein contact prediction

Scientific Reports ◽

10.1038/s41598-021-87524-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Wendy M. Billings ◽

Connor J. Morris ◽

Dennis Della Corte

Keyword(s):

Neural Networks ◽

Structure Prediction ◽

Deep Neural Networks ◽

Protein Structures ◽

Network Models ◽

Critical Assessment ◽

Neural Network Models ◽

Contact Prediction ◽

Homologous Sequences ◽

Contact Predictions

The prediction of amino acid contacts from protein sequence is an important problem, as protein contacts are a vital step towards the prediction of folded protein structures. We propose that a powerful concept from deep learning, called ensembling, can increase the accuracy of protein contact predictions by combining the outputs of different neural network models. We show that ensembling the predictions made by different groups at the recent Critical Assessment of Protein Structure Prediction (CASP13) outperforms all individual groups. Further, we show that contacts derived from the distance predictions of three additional deep neural networks (AlphaFold, trRosetta, and ProSPr) can be substantially improved by ensembling all three networks. We also show that ensembling these recent deep neural networks with the best CASP13 group creates a superior contact prediction tool. Finally, we demonstrate that two ensembled networks can successfully differentiate between the folds of two highly homologous sequences. In order to build further on these findings, we propose the creation of a better protein contact benchmark set and additional open-source contact prediction methods.
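As a rough illustration of the idea in this abstract, the sketch below derives a contact probability from a binned distance prediction and then combines several contact maps. The abstract does not say how the predictions were combined, so the unweighted mean used here is an assumption, as are the function names, the bin edges, and the 8 Å contact cutoff.

```python
from typing import List

Matrix = List[List[float]]


def contact_prob_from_distogram(bin_probs: List[float],
                                bin_upper_edges: List[float],
                                cutoff: float = 8.0) -> float:
    """Sum the probability mass of all distance bins whose upper edge is <= cutoff.

    Assumption: a residue pair counts as 'in contact' below `cutoff` (in Angstrom).
    """
    return sum(p for p, edge in zip(bin_probs, bin_upper_edges) if edge <= cutoff)


def ensemble_contact_maps(maps: List[Matrix]) -> Matrix:
    """Average several L x L contact-probability maps element-wise.

    Assumption: an unweighted mean; the paper may use a different scheme.
    """
    n = len(maps)
    size = len(maps[0])
    return [[sum(m[i][j] for m in maps) / n for j in range(size)]
            for i in range(size)]


# Toy 2-residue example with three hypothetical networks.
net_a = [[0.0, 0.9], [0.9, 0.0]]
net_b = [[0.0, 0.5], [0.5, 0.0]]
net_c = [[0.0, 0.7], [0.7, 0.0]]
ensembled = ensemble_contact_maps([net_a, net_b, net_c])
```

Averaging tends to help when the individual networks make partly uncorrelated errors; a weighted mean or a learned combiner would be natural variations.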

Improved Sampling Strategies for Protein Model Refinement based on Molecular Dynamics Simulation

10.26434/chemrxiv.13299197.v1 ◽

2020 ◽

Author(s):

Lim Heo ◽

Collin Arbour ◽

Michael Feig

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Structure Prediction ◽

Protein Structures ◽

Conformational Space ◽

Dynamics Simulation ◽

Model Refinement ◽

Protein Model ◽

Lower Accuracy ◽

Simulation Based

Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. Those methods utilize protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve the accuracy of the predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in the model quality become possible. Molecular dynamics simulations have been especially successful for improving model quality, but although consistent refinement can be achieved, the improvements are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on an optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore conformational space more broadly. Based on the insight of this analysis, we propose a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.

AttentiveDist: Protein Inter-Residue Distance Prediction Using Deep Learning with Attention on Quadruple Multiple Sequence Alignments

10.1101/2020.11.24.396770 ◽

2020 ◽

Author(s):

Aashish Jain ◽

Genki Terashi ◽

Yuki Kagaya ◽

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Charles Christoffer ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Prediction Models ◽

3D Structure ◽

Evolutionary Information ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Prediction ◽

Multiple Sequence Alignments ◽

Distance Prediction

Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. The model is trained in a multi-task fashion to also predict backbone and orientation angles, further improving the inter-residue distance prediction. We show that AttentiveDist outperforms the top methods for contact prediction in the CASP13 structure prediction competition. To aid in structure modeling we also developed two new deep learning-based sidechain center distance and peptide-bond nitrogen-oxygen distance prediction models. Together these led to a 12% increase in TM-score from the best server method in CASP13 for structure prediction.

Study of Real-Valued Distance Prediction For Protein Structure Prediction with Deep Learning

10.1101/2020.11.26.400523 ◽

2020 ◽

Author(s):

Jin Li ◽

Jinbo Xu

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

3D Structure ◽

Prediction Method ◽

Structure Modeling ◽

Contact Prediction ◽

Real Value ◽

3D Structure Modeling ◽

Distance Prediction

Inter-residue distance prediction by deep ResNet (convolutional residual neural network) has greatly advanced protein structure prediction. Currently the most successful structure prediction methods predict distance by discretizing it into dozens of bins. Here we study how well real-valued distance can be predicted and how useful it is for 3D structure modeling by comparing it with discrete-valued prediction based upon the same deep ResNet. Different from the recent methods that predict only a single real value for the distance of an atom pair, we predict both the mean and standard deviation of a distance and then employ a novel method to fold a protein by the predicted mean and deviation. Our findings include: 1) tested on the CASP13 FM (free-modeling) targets, our real-valued distance prediction obtains 81% precision on top L/5 long-range contact prediction, much better than the best CASP13 results (70%); 2) our real-valued prediction can predict correct folds for the same number of CASP13 FM targets as the best CASP13 group, despite generating only 20 decoys for each target; 3) our method greatly outperforms a very new real-valued prediction method DeepDist in both contact prediction and 3D structure modeling; and 4) when the same deep ResNet is used, our real-valued distance prediction has 1-6% higher contact and distance accuracy than our own discrete-valued prediction, but less accurate 3D structure models.
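One way a predicted mean and standard deviation could be turned into a contact score is sketched below. This is not the authors' method: it assumes the predicted distance is normally distributed and uses a conventional 8 Å contact cutoff, neither of which is stated in the abstract. The function name is hypothetical.

```python
import math


def contact_probability(mean: float, std: float, cutoff: float = 8.0) -> float:
    """Return P(distance < cutoff) under a normal distribution N(mean, std^2).

    Assumptions: Gaussian predictive distribution; distances in Angstrom.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    # Normal CDF expressed through the error function.
    z = (cutoff - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# A confident short distance gives a high score, a confident long one a low score.
near = contact_probability(mean=6.0, std=1.0)
far = contact_probability(mean=12.0, std=1.0)
```

A larger predicted standard deviation pulls the score toward 0.5, so uncertain pairs rank lower than confident ones with the same mean.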

REALDIST: Real-valued protein distance prediction

10.1101/2020.11.28.402214 ◽

2020 ◽

Cited By ~ 1

Author(s):

Badri Adhikari

Keyword(s):

Structure Prediction ◽

3D Structure ◽

Three Dimensional ◽

Prediction Method ◽

3D Models ◽

Classification Problem ◽

Intermediate Step ◽

Protein Distance ◽

Multi Class Classification ◽

Distance Prediction

Protein structure prediction continues to stand as an unsolved problem in bioinformatics and biomedicine. Deep learning algorithms and the availability of metagenomic sequences have led to the development of new approaches to predict inter-residue distances, the key intermediate step. Different from the recently successful methods which frame the problem as a multi-class classification problem, this article introduces a real-valued distance prediction method REALDIST. Using a representative set of 43 thousand protein chains, a variant of deep ResNet is trained to predict real-valued distance maps. The contacts derived from the real-valued distance maps predicted by this method, on the most difficult CASP13 free-modeling protein datasets, demonstrate a long-range top-L precision of 52%, which is 17% higher than the top CASP13 predictor Raptor-X and slightly higher than the more recent trRosetta method. Similar improvements are observed on the CAMEO ‘hard’ and ‘very hard’ datasets. Three-dimensional (3D) structure prediction guided by real-valued distances reveals that for short proteins the mean accuracy of the 3D models is slightly higher than the top human predictor AlphaFold and server predictor Quark in the CASP13 competition.

Developing a Fully-glycosylated Full-length SARS-CoV-2 Spike Protein Model in a Viral Membrane

10.1101/2020.05.20.103325 ◽

2020 ◽

Cited By ~ 2

Author(s):

Hyeonuk Woo ◽

Sang-Jun Park ◽

Yeol Kyo Choi ◽

Taeyong Park ◽

Maham Tanveer ◽

...

Keyword(s):

Modeling And Simulation ◽

Structure Prediction ◽

De Novo ◽

Protein Structures ◽

Full Length ◽

Loop Modeling ◽

S Protein ◽

Viral Membrane ◽

Protein Model ◽

Dynamics Simulations

This technical study describes all-atom modeling and simulation of a fully-glycosylated full-length SARS-CoV-2 spike (S) protein in a viral membrane. First, starting from PDB:6VSB and 6VXX, full-length S protein structures were modeled using template-based modeling, de-novo protein structure prediction, and loop modeling techniques in the GALAXY modeling suite. Then, using the recently-determined most occupied glycoforms, 22 N-glycans and 1 O-glycan of each monomer were modeled using Glycan Reader & Modeler in CHARMM-GUI. These fully-glycosylated full-length S protein model structures were assessed and further refined against the low-resolution data in their respective experimental maps using ISOLDE. We then used CHARMM-GUI Membrane Builder to place the S proteins in a viral membrane and performed all-atom molecular dynamics simulations. All structures are available in the CHARMM-GUI COVID-19 Archive (http://www.charmm-gui.org/docs/archive/covid19), so researchers can use these models to carry out innovative and novel modeling and simulation research for the prevention and treatment of COVID-19.

Evaluation of Deep Neural Network ProSPr for Accurate Protein Distance Predictions on CASP14 Targets

International Journal of Molecular Sciences ◽

10.3390/ijms222312835 ◽

2021 ◽

Vol 22 (23) ◽

pp. 12835

Author(s):

Jacob Stern ◽

Bryce Hedelius ◽

Olivia Fisher ◽

Wendy M. Billings ◽

Dennis Della Corte

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Sequence Length ◽

Multiple Sequence ◽

Current State ◽

Protein Distance ◽

Solid Foundation ◽

Distance Prediction ◽

Further Development

The field of protein structure prediction has recently been revolutionized through the introduction of deep learning. The current state-of-the-art tool AlphaFold2 can predict highly accurate structures; however, it has a prohibitively long inference time for applications that require the folding of hundreds of sequences. The prediction of protein structure annotations, such as amino acid distances, can be achieved at a higher speed with existing tools, such as the ProSPr network. Here, we report on important updates to the ProSPr network, its performance in the recent Critical Assessment of Techniques for Protein Structure Prediction (CASP14) competition, and an evaluation of its accuracy dependency on sequence length and multiple sequence alignment depth. We also provide a detailed description of the architecture and the training process, accompanied by reusable code. This work is anticipated to provide a solid foundation for the further development of protein distance prediction tools.

A fully open-source framework for deep learning protein real-valued distances

10.1101/2020.04.26.061820 ◽

2020 ◽

Author(s):

Badri Adhikari

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Training Data ◽

Learning Models ◽

Web Browser ◽

Fast Development ◽

Protein Distance ◽

Distance Prediction

As deep learning algorithms drive the progress in protein structure prediction, a lot remains to be studied at this emerging crossway of deep learning and protein structure prediction. Recent findings show that inter-residue distance prediction, a more granular version of the well-known contact prediction problem, is key to predicting accurate models. We believe that deep learning methods that predict these distances are still in their infancy. To advance these methods and develop other novel methods, we need a small and representative dataset packaged for fast development and testing. In this work, we introduce Protein Distance Net (PDNET), a dataset that is derived from the widely used DeepCov dataset and consists of 3456 representative protein chains for training and validation. It is packaged with all the scripts that were used to curate the dataset and generate the input features and distance maps, along with scripts containing deep learning models to train, validate, and test. Deep learning models can also be trained and tested in a web browser using free platforms such as Google Colab. We discuss how this dataset can be used to predict contacts, distance intervals, and real-valued distances (in Å) by designing regression models. All scripts, training data, deep learning code for training, validation, and testing, and Python notebooks are available at https://github.com/ba-lab/pdnet/.

Improving deep learning-based protein distance prediction in CASP14

10.1101/2021.02.02.429462 ◽

2021 ◽

Author(s):

Zhiye Guo ◽

Tianqi Wu ◽

Jian Liu ◽

Jie Hou ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Prediction Method ◽

Learning Method ◽

Sequence Alignments ◽

Evolutionary Features ◽

Protein Distance ◽

Distance Prediction

Accurate prediction of residue-residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions (i.e. classifying distances between two residues into two categories: in contact (< 8 Angstrom) and not in contact otherwise) and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the F1 measure. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps.
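The two quantities compared at the end of this abstract, top L/5 long-range precision and the average probability of the predicted contacts, can be sketched as follows. The 8 Å cutoff comes from the abstract; the long-range definition (sequence separation of at least 24 residues) is the usual CASP convention and is assumed here, and the function name and toy data are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def top_long_range_precision(pred_prob: Matrix, true_dist: Matrix,
                             fraction: int = 5, min_sep: int = 24,
                             cutoff: float = 8.0) -> Tuple[float, float]:
    """Return (precision, mean predicted probability) of the top L/fraction pairs.

    pred_prob[i][j]: predicted contact probability; true_dist[i][j]: true distance
    in Angstrom. Only pairs with j - i >= min_sep are considered (assumed convention).
    """
    length = len(pred_prob)
    pairs = [(pred_prob[i][j], i, j)
             for i in range(length) for j in range(i + min_sep, length)]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    pairs.sort(key=lambda t: t[0], reverse=True)   # most confident first
    top = pairs[:max(1, length // fraction)]
    hits = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    mean_prob = sum(p for p, _, _ in top) / len(top)
    return hits / len(top), mean_prob


# Toy example: L = 30, so the top L/5 = 6 long-range pairs are scored.
L = 30
pred = [[0.1] * L for _ in range(L)]
true = [[20.0] * L for _ in range(L)]
for i, j, p in [(0, 24, 0.9), (0, 25, 0.9), (0, 26, 0.9),
                (1, 25, 0.8), (1, 26, 0.8), (2, 26, 0.8)]:
    pred[i][j] = p
for i, j in [(0, 24), (0, 25), (0, 26)]:
    true[i][j] = 5.0   # only these three pairs are real contacts

precision, confidence = top_long_range_precision(pred, true)
```

The precision needs the true structure, whereas the mean probability is available at prediction time, which is why the abstract suggests it as a confidence estimate for selecting distance maps.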
