Distance-based Protein Folding Powered by Deep Learning

10.1101/465955 ◽

2018 ◽

Cited By ~ 9

Author(s):

Jinbo Xu

Keyword(s):

Protein Folding ◽

Deep Learning ◽

Protein Structure ◽

Membrane Proteins ◽

Family Size ◽

Personal Computer ◽

Experimental Validation ◽

Distance Matrix ◽

Geometric Constraints ◽

Folding Simulation

Abstract: Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming folding simulation. We show that we can accurately predict the distance matrix of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without involving any folding simulation. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 hours on a Linux computer with 20 CPUs. In contrast, DCA cannot fold any of these hard targets in the absence of folding simulation, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into complex, fragment-based folding simulation. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted the correct fold for two membrane proteins with new folds while all the other servers failed. These results imply that it is now feasible to predict the correct fold for proteins that lack similar structures in PDB on a personal computer without folding simulation.

Significance: Accurate description of protein structure and function is a fundamental step towards understanding biological life and is highly relevant to the development of therapeutics. Although greatly improved, experimental protein structure determination is still low-throughput and costly, especially for membrane proteins. As such, computational structure prediction is often resorted to. Predicting the structure of a protein with a new fold (i.e., without similar structures in PDB) is very challenging and usually needs a large amount of computing power. This paper shows that by using a powerful deep learning technique, even with only a personal computer, we can predict new folds much more accurately than ever before. This method also works well on membrane protein folding.
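
The abstract above states that 3D models can be built from the geometric constraints in a distance matrix alone. As a rough illustration of why a distance matrix carries enough information for that, the sketch below recovers coordinates from an exact, complete distance matrix using classical multidimensional scaling (MDS). This is a textbook technique swapped in for illustration, not the paper's model-building procedure; the function names and the random toy coordinates are assumptions.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of points (n x n matrix)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def coords_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical MDS: coordinates (up to rigid motion and reflection)
    whose pairwise distances reproduce `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering the squared distances yields the Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]    # indices of the `dim` largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Toy data (hypothetical): 30 random points standing in for residue positions.
rng = np.random.default_rng(0)
true_coords = rng.normal(size=(30, 3)) * 10.0
dist = pairwise_distances(true_coords)

model = coords_from_distances(dist)
max_err = np.abs(pairwise_distances(model) - dist).max()
print(f"largest distance error: {max_err:.2e}")
```

Distances alone cannot distinguish a structure from its mirror image, and predicted distance matrices are noisy and incomplete, so a practical pipeline needs more than this closed-form step.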

Distance-based protein folding powered by deep learning

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1821309116 ◽

2019 ◽

Vol 116 (34) ◽

pp. 16856-16865 ◽

Cited By ~ 72

Author(s):

Jinbo Xu

Keyword(s):

Protein Folding ◽

Deep Learning ◽

Family Size ◽

Experimental Validation ◽

Distance Matrix ◽

Data Bank ◽

3D Models ◽

Geometric Constraints ◽

Central Processing ◽

Direct Coupling Analysis

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without involving extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
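
The "precision on the top L/5 long-range predicted contacts" metric quoted above can be sketched as follows. The abstract does not spell out the definitions, so this sketch assumes the conventional ones: two residues are in contact when their distance is below 8 Å, and a pair is long-range when the residues are at least 24 positions apart in sequence. The function name and the toy matrices are hypothetical.

```python
import numpy as np


def top_l5_long_range_precision(
    pred_prob: np.ndarray,
    true_dist: np.ndarray,
    min_sep: int = 24,     # assumed long-range cutoff in sequence separation
    cutoff: float = 8.0,   # assumed contact cutoff in angstroms
) -> float:
    """Fraction of the L/5 most confident long-range pairs that are true contacts."""
    length = pred_prob.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    order = np.argsort(-pred_prob[i, j], kind="stable")
    top = order[: max(1, length // 5)]
    hits = true_dist[i[top], j[top]] < cutoff
    return float(hits.mean())


# Toy data (hypothetical): L = 50, so the top L/5 is 10 pairs.
L = 50
true_dist = np.full((L, L), 20.0)
pred_prob = np.zeros((L, L))
for rank, a in enumerate(range(10)):
    b = a + 30  # sequence separation 30, i.e. long-range
    pred_prob[a, b] = pred_prob[b, a] = 0.9 - 0.01 * rank
    if rank < 8:  # 8 of the 10 confident predictions are real contacts
        true_dist[a, b] = true_dist[b, a] = 6.0

precision = top_l5_long_range_precision(pred_prob, true_dist)
print(f"top L/5 long-range precision: {precision:.2f}")
```

The toy numbers are chosen only to exercise the function; they are unrelated to the CASP results reported in the abstract.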

PolyFold: An interactive visual simulator for distance-based protein folding

PLoS ONE ◽

10.1371/journal.pone.0243331 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243331

Author(s):

Andrew J. McGehee ◽

Sutanu Bhattacharya ◽

Rahmatullah Roche ◽

Debswapna Bhattacharya

Keyword(s):

Protein Folding ◽

Structure Prediction ◽

Distance Matrix ◽

Geometric Constraints ◽

Visualization System ◽

Folding Process ◽

Graphical Interfaces ◽

Dynamic Perspective ◽

Static View ◽

Stochastic Optimization Algorithms

Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.

Precise estimation of residue relative solvent accessible area from Cα atom distance matrix using a deep learning method

Bioinformatics ◽

10.1093/bioinformatics/btab616 ◽

2021 ◽

Author(s):

Jianzhao Gao ◽

Shuangjia Zheng ◽

Mengting Yao ◽

Peikun Wu

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Function ◽

Pearson Correlation ◽

Correlation Coefficients ◽

Distance Matrix ◽

Supplementary Information ◽

Learning Method ◽

Solvent Accessible Area ◽

Accessible Area

Motivation: The solvent accessible surface is an essential structural property related to protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure of the degree to which a residue is exposed on the protein surface or buried inside the protein. However, this computation fails when residue information is missing.

Results: In this article, we propose a novel method, EAGERER, for estimating RSA from the Cα atom distance matrix using deep learning. EAGERER achieves Pearson correlation coefficients of 0.921–0.928 on two independent test datasets. We empirically demonstrate that EAGERER yields better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER is the first method to estimate the solvent accessible area from limited information with a deep learning model. It could be useful for protein structure and protein function prediction.

Availability and implementation: The method is freely available at https://github.com/cliffgao/EAGERER.

Supplementary information: Supplementary data are available at Bioinformatics online.
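
Of the baseline estimators named above, coordination number is the simplest to compute from a Cα distance matrix: it counts how many other Cα atoms lie within a sphere around each residue, so buried residues score high and exposed residues score low. The sketch below illustrates only this baseline, not EAGERER itself; the 10 Å radius and the straight-chain toy geometry are assumptions.

```python
import numpy as np


def coordination_number(ca_dist: np.ndarray, radius: float = 10.0) -> np.ndarray:
    """Number of other C-alpha atoms within `radius` angstroms of each residue."""
    within = ca_dist < radius
    np.fill_diagonal(within, False)  # a residue is not its own neighbor
    return within.sum(axis=1)


# Toy data (hypothetical): 10 residues on a straight line, 3.8 angstroms apart.
positions = 3.8 * np.arange(10)
ca_dist = np.abs(positions[:, None] - positions[None, :])

cn = coordination_number(ca_dist)
print(cn)  # chain ends have fewer neighbors than the middle
```

A higher count suggests a more buried residue, which is why coordination number is used as a rough stand-in for solvent exposure when only Cα coordinates are known.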

Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins

10.1101/2020.07.05.188466 ◽

2020 ◽

Author(s):

Rahmatullah Roche ◽

Sutanu Bhattacharya ◽

Debswapna Bhattacharya

Keyword(s):

Protein Folding ◽

Protein Structure ◽

Membrane Proteins ◽

Ab Initio ◽

Protein Structure Prediction ◽

Hierarchical Structure ◽

Structure Determination ◽

Structure Prediction ◽

Structure Modeling ◽

Contact Maps

Abstract: Crystallography and NMR system (CNS) is currently the de facto standard for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous and may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales at a much higher degree of folding accuracy than CNS with the increase in contact thresholds, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding difficult free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to several popular CNS- and fragment-based approaches, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach. These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS.

Author summary: Predicting the folded and functional 3-dimensional structure of a protein molecule from its amino acid sequence is of central importance to structural biology. Recently, promising advances have been made in ab initio protein folding due to the reasonably accurate estimation of inter-residue interaction maps at increasingly higher resolutions that range from binary contacts to finer-grained distances. Despite the progress in predicting the interaction maps, approaches for turning the residue-residue interactions projected in these maps into their precise spatial positioning rely heavily on a decade-old experimental structure determination protocol that is not suitable for predictive modeling. This paper presents a new hierarchical structure modeling method, DConStruct, which can better exploit the information encoded in the interaction maps at multiple granularities, from binary contact maps to distance-based hybrid maps at tri-level thresholding, for improved ab initio folding. Multiple large-scale benchmarking experiments show that our proposed method can substantially improve the folding accuracy for both soluble and membrane proteins compared to state-of-the-art approaches. DConStruct is licensed under the GNU General Public License v3 and freely available at https://github.com/Bhattacharya-Lab/DConStruct.

Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008753 ◽

2021 ◽

Vol 17 (2) ◽

pp. e1008753

Author(s):

Rahmatullah Roche ◽

Sutanu Bhattacharya ◽

Debswapna Bhattacharya

Keyword(s):

Protein Folding ◽

Protein Structure ◽

Membrane Proteins ◽

Ab Initio ◽

Protein Structure Prediction ◽

Structure Determination ◽

Structure Prediction ◽

Structure Modeling ◽

Reconstruction Accuracy ◽

Contact Maps

Crystallography and NMR system (CNS) is currently a widely used method for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous and may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales at a much higher degree of folding accuracy than CNS with the increase in contact thresholds, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach. These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS.
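
The idea of contact maps "at varying contact thresholds" can be illustrated by binarizing one distance map at several cutoffs. The abstract does not give the cutoffs used for tri-level thresholding, so the three values below (8, 10 and 12 Å) are placeholders, and the function name and straight-chain toy geometry are hypothetical. The sketch shows only the thresholding step, not DConStruct's modeling.

```python
import numpy as np


def threshold_contact_maps(dist: np.ndarray, thresholds=(8.0, 10.0, 12.0)) -> dict:
    """Binary contact map (1 = in contact) for each distance cutoff in angstroms."""
    maps = {}
    for t in thresholds:
        contact = (dist < t).astype(np.uint8)
        np.fill_diagonal(contact, 0)  # ignore self-pairs
        maps[t] = contact
    return maps


# Toy data (hypothetical): 10 residues on a straight line, 3.8 angstroms apart.
positions = 3.8 * np.arange(10)
dist = np.abs(positions[:, None] - positions[None, :])

maps = threshold_contact_maps(dist)
for t, contact in maps.items():
    print(f"cutoff {t:>4} A: {int(contact.sum()) // 2} contacting pairs")
```

Every contact at a tighter cutoff is also a contact at a looser one, so a stack of such maps acts as a coarse, discretized view of the underlying distances.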

Complexity in Protein Folding: Simulation Meets Experiment

Current Physical Chemistry ◽

10.2174/1877947611202010004 ◽

2012 ◽

Vol 2 (1) ◽

pp. 4-11

Author(s):

Amedeo Caflisch ◽

Peter Hamm

Keyword(s):

Protein Folding ◽

Folding Simulation

Deep learning techniques have significantly impacted protein structure prediction and protein design

Current Opinion in Structural Biology ◽

10.1016/j.sbi.2021.01.007 ◽

2021 ◽

Vol 68 ◽

pp. 194-207

Author(s):

Robin Pearce ◽

Yang Zhang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Protein Design ◽

Structure Prediction ◽

Learning Techniques

Template-based prediction of protein structure with deep learning

BMC Genomics ◽

10.1186/s12864-020-07249-8 ◽

2020 ◽

Vol 21 (S11) ◽

Author(s):

Haicang Zhang ◽

Yufeng Shen

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Query Sequence ◽

Dynamic Programming Algorithm ◽

Tertiary Structure Prediction ◽

Protein Tertiary Structure ◽

Protein Threading ◽

Protein Tertiary Structure Prediction

Background: Accurate prediction of protein structure is fundamentally important to understanding the biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for proteins with only distant homologs available.

Results: We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision and naturally applies a deep residual neural network to the prediction. ThreaderAI first employs deep learning to predict a residue-residue alignment probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated our method both in generating accurate template-query alignments and in protein threading. Experimental results show that ThreaderAI outperforms the currently popular template-based modelling methods HHpred and CNFpred and the latest contact-assisted method CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs at the fold level of similarity from SCOPe data. On CASP13’s TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9 and 8% in terms of TM-score, respectively.

Conclusions: These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
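
The second step described above, building an alignment by dynamic programming over a residue-residue probability matrix, can be sketched with a textbook global alignment (Needleman-Wunsch with a linear gap penalty). The abstract does not specify ThreaderAI's scoring or gap scheme, so the gap penalty, the function name and the toy matrix here are assumptions.

```python
import numpy as np


def align_from_probabilities(prob: np.ndarray, gap_penalty: float = 0.1) -> list:
    """Global alignment maximizing summed pair probabilities minus gap costs.

    prob[i, j] is the probability that query residue i aligns to template
    residue j. Returns the aligned (query, template) index pairs.
    """
    n, m = prob.shape
    score = np.zeros((n + 1, m + 1))
    score[1:, 0] = -gap_penalty * np.arange(1, n + 1)
    score[0, 1:] = -gap_penalty * np.arange(1, m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(
                score[i - 1, j - 1] + prob[i - 1, j - 1],  # align i with j
                score[i - 1, j] - gap_penalty,             # skip query residue
                score[i, j - 1] - gap_penalty,             # skip template residue
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if np.isclose(score[i, j], score[i - 1, j - 1] + prob[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j] - gap_penalty):
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Toy data (hypothetical): 4 query residues, 5 template residues, with the
# query matching the template shifted by one position.
prob = np.full((4, 5), 0.05)
for q in range(4):
    prob[q, q + 1] = 0.9

pairs = align_from_probabilities(prob)
print(pairs)
```

In the toy case the alignment skips the first template residue and then follows the high-probability diagonal, since paying one gap penalty is cheaper than accepting low-probability pairs.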

Nanoscale lipid membrane mimetics in spin-labeling and electron paramagnetic resonance spectroscopy studies of protein structure and function

Nanotechnology Reviews ◽

10.1515/ntrev-2016-0080 ◽

2017 ◽

Vol 6 (1) ◽

pp. 75-92 ◽

Cited By ~ 6

Author(s):

Elka R. Georgieva

Keyword(s):

Electron Paramagnetic Resonance ◽

Protein Structure ◽

Membrane Proteins ◽

Membrane Protein ◽

Spin Labeling ◽

Structure And Function ◽

Protein Structure And Function ◽

Paramagnetic Resonance ◽

And Function ◽

Electron Paramagnetic

Abstract: Cellular membranes and associated proteins play critical physiological roles in organisms from all kingdoms of life. In many cases, malfunction of biological membranes triggered by changes in the lipid bilayer properties or by membrane protein functional abnormalities leads to severe diseases. To understand in detail the processes that govern the life of cells and to control diseases, one of the major tasks in biological sciences is to learn how membrane proteins function. To do so, a variety of biochemical and biophysical approaches have been used in molecular studies of membrane protein structure and function on the nanoscale. This review focuses on electron paramagnetic resonance with site-directed nitroxide spin-labeling (SDSL EPR), which is a rapidly expanding and powerful technique reporting on the local protein/spin-label dynamics and on large functionally important structural rearrangements. In parallel, membrane mimetics suitable for nanoscale studies have been developed and used in conjunction with SDSL EPR. Primarily, these mimetics include various liposomes, bicelles, and nanodiscs. This review provides a basic description of the EPR methods, continuous-wave and pulse, applied to spin-labeled proteins, and highlights several representative applications of EPR to liposome-, bicelle-, or nanodisc-reconstituted membrane proteins.

Improved protein structure prediction by deep learning irrespective of co-evolution information

Nature Machine Intelligence ◽

10.1038/s42256-021-00348-5 ◽

2021 ◽

Author(s):

Jinbo Xu ◽

Matthew McPartlon ◽

Jin Li

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction
