Accurate prediction of protein structures and interactions using a three-track neural network

Accurate prediction of protein structures and interactions using a 3-track network

10.1101/2021.06.14.448402 ◽

2021 ◽

Author(s):

Minkyung Baek ◽

Frank DiMaio ◽

Ivan Anishchenko ◽

Justas Dauparas ◽

Sergey Ovchinnikov ◽

...

Keyword(s):

Protein Complexes ◽

Protein Structures ◽

Biological Research ◽

Sequence Information ◽

Network Architectures ◽

Distance Map ◽

X Ray Crystallography ◽

Unknown Structure ◽

Rapid Generation ◽

Traditional Approaches

DeepMind presented remarkably accurate protein structure predictions at the CASP14 conference. We explored network architectures incorporating related ideas and obtained the best performance with a 3-track network in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The 3-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables rapid solution of challenging X-ray crystallography and cryo-EM structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate models of protein-protein complexes from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
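
To make the relationship between the 3D coordinate level and the 2D distance map level concrete, the following is a minimal sketch of how a distance map can be derived from coordinates. It is not part of the 3-track network described above; the function name `distance_map` and the three C-alpha coordinates are hypothetical and chosen only for illustration.

```python
import math


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dmap[i][j] = dmap[j][i] = d  # the map is symmetric
    return dmap


# Hypothetical C-alpha coordinates (angstroms) for a three-residue fragment.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
dmap = distance_map(ca_coords)
for row in dmap:
    print(["%.2f" % d for d in row])
```

A distance map of this kind does not change when the structure is rotated or translated, which is one reason such 2D representations are convenient as an intermediate between sequence and coordinates.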

Prediction of Structural and Functional Aspects of Protein

Advances in Secure Computing, Internet Services, and Applications - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-4940-8.ch016 ◽

2014 ◽

pp. 317-333

Author(s):

Arun G. Ingale

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Sequence Information ◽

Predict Protein Structure ◽

Basic Ideas

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure, together with a thorough knowledge of the function and structure of proteins, is critical for the advancement of biology and the life sciences as well as for the development of better drugs, higher-yield crops, and even synthetic biofuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and examines the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. It presents a broad examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, and offers insight into future applications of modeled protein structures. Current protein structure prediction methods are reviewed, covering background on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional aspects. The basic ideas and advances of these directions are discussed in detail.

A Basic Molecular Analysis of the Diabetic Antigen GAD by Homology Modelling. Principles of the Method and Understanding of Antigenicity and Binding Sites

Pteridines ◽

10.1515/pteridines.2007.18.1.79 ◽

2007 ◽

Vol 18 (1) ◽

pp. 79-94

Author(s):

Marco Wiltgen ◽

Gernot P. Tilz

Keyword(s):

Active Site ◽

Protein Sequence ◽

Structure Prediction ◽

Structural Information ◽

Homology Modelling ◽

Protein Structures ◽

Dopa Decarboxylase ◽

Gad 65 ◽

Unknown Structure ◽

Introductory Paper

The functional specificity of a protein is linked to its structure. A growing section of bioinformatics deals with the prediction and visualization of protein 3D structures. In homology modelling, a protein sequence with an unknown structure is aligned with sequences of known protein structures. By exploiting structural information from the known configurations, the new structure can be predicted. In this introductory paper, we present the principles of homology modelling and demonstrate the method by determining the structure of the enzyme glutamic acid decarboxylase (GAD 65). This protein is an autoantigen involved in several human autoimmune diseases. We illustrate the different steps in the structure prediction of GAD 65 using two experimentally determined structures of pig kidney DOPA decarboxylase (one structure in complex with the inhibitor carbidopa) as templates. The resulting model of GAD 65 provides detailed information about the active site of the protein and selected epitopes. By analysing the interactions between DOPA decarboxylase and the inhibitor carbidopa, the residues of the GAD 65 active site can be identified via the sequence alignment between DOPA decarboxylase and GAD 65. The locations of known epitopes in the molecule are visualized in special representations, giving insights into mechanisms of antigenicity. Hydrophobicity analysis gives first hints about the ability of GAD 65 to adhere to the cell membrane. Homology modelling is at present one of the most efficient techniques for providing accurate structural models of proteins. It is expected that in a few years, for every newly determined protein sequence, at least one member of the same protein family with a known structure will be available, which will steadily increase the importance and applicability of homology modelling.
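
The step of identifying active-site residues in the target via the sequence alignment with the template can be sketched as follows. This is only an illustration of the general idea, not the procedure used in the paper; the function `map_template_residues` and the toy aligned sequences are hypothetical and are not GAD 65 or DOPA decarboxylase data.

```python
def map_template_residues(template_aln, target_aln, template_positions):
    """Map 1-based template residue numbers onto the target via a pairwise alignment.

    Both arguments are rows of the same alignment, with '-' marking gaps.
    Template residues aligned to a gap in the target are left unmapped.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    t_idx = q_idx = 0  # running residue counters for template and target
    for t_char, q_char in zip(template_aln, target_aln):
        if t_char != "-":
            t_idx += 1
        if q_char != "-":
            q_idx += 1
        if t_char != "-" and q_char != "-" and t_idx in template_positions:
            mapping[t_idx] = (q_idx, q_char)
    return mapping


# Toy alignment (hypothetical sequences, for illustration only).
template_row = "MK-LHG"
target_row = "MRTL-G"
site = map_template_residues(template_row, target_row, {2, 4, 5})
print(site)
```

In this toy case template residue 4 falls opposite a gap in the target, so it has no counterpart and is omitted from the result.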

Determining protein structures using genetics

10.1101/303875 ◽

2018 ◽

Cited By ~ 6

Author(s):

Jörn M. Schmiedel ◽

Ben Lehner

Keyword(s):

Structure Determination ◽

Low Cost ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Biological Research ◽

X Ray Crystallography ◽

Close Relationship ◽

And Function

Summary: Determining the three-dimensional structures of macromolecules is a major goal of biological research because of the close relationship between structure and function. Structure determination usually relies on physical techniques including X-ray crystallography, NMR spectroscopy and cryo-electron microscopy. Here we present a method that allows the high-resolution three-dimensional structure of a biological macromolecule to be determined only from measurements of the activity of mutant variants of the molecule. This genetic approach to structure determination relies on the quantification of genetic interactions (epistasis) between mutations and the discrimination of direct from indirect interactions. This provides a new experimental strategy for structure determination, with the potential to reveal functional and in vivo structural conformations at low cost and high throughput.
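
As a rough illustration of what quantifying a genetic interaction can mean, the sketch below scores epistasis as the deviation of a double mutant from the expectation of two independent single mutants. The abstract does not state which null model the authors use; the multiplicative (log-additive) model, the function name `epistasis`, and the fitness values are assumptions made for this example.

```python
import math


def epistasis(w_a, w_b, w_ab):
    """Epistasis in log-fitness units under an assumed multiplicative null model.

    Fitness values are relative to wild type (wild type = 1.0).
    Zero means the two mutations act independently.
    """
    expected = math.log(w_a) + math.log(w_b)
    return math.log(w_ab) - expected


# Hypothetical relative fitness measurements.
independent = epistasis(0.5, 0.5, 0.25)  # double mutant matches expectation
positive = epistasis(0.5, 0.5, 0.5)      # double mutant fitter than expected
print(independent, positive)
```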

A Comparative Study of Protein Tertiary Structure Prediction Methods

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2014.1168 ◽

2014 ◽

pp. 15-18

Author(s):

Chandrayani N. Rokde ◽

Dr. Manali Kshirsagar

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Sequence Data ◽

Protein Structures ◽

Three Dimensional ◽

Data Bank ◽

Dimensional Structure ◽

X Ray Crystallography ◽

Protein Tertiary Structure Prediction

Protein structure prediction (PSP) from amino acid sequence is one of the high-focus problems in bioinformatics today, because the biological function of a protein is determined by its three-dimensional structure. Understanding protein structures is vital for determining the function of a protein and its interactions with DNA, RNA and enzymes. Thus, protein structure is a fundamental area of computational biology. Its importance is intensified by the large amounts of sequence data coming from the PDB (Protein Data Bank) and by the fact that experimental methods used to determine protein structures, such as X-ray crystallography or nuclear magnetic resonance (NMR), remain very expensive and time-consuming. In this paper, different types of protein structures and methods for their prediction are described.

Prediction of Structural and Functional Aspects of Protein

Pharmaceutical Sciences ◽

10.4018/978-1-5225-1762-7.ch021 ◽

2017 ◽

pp. 551-568

Author(s):

Arun G. Ingale

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Sequence Information ◽

Predict Protein Structure ◽

Basic Ideas

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure, together with a thorough knowledge of the function and structure of proteins, is critical for the advancement of biology and the life sciences as well as for the development of better drugs, higher-yield crops, and even synthetic biofuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and examines the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. It presents a broad examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, and offers insight into future applications of modeled protein structures. Current protein structure prediction methods are reviewed, covering background on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional aspects. The basic ideas and advances of these directions are discussed in detail.

A Multi-Layer LSTM-Time-Density-Softmax (LDS) approach for protein structure prediction using deep learning

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200918124012 ◽

2020 ◽

Vol 13 ◽

Author(s):

Gururaj Tejeshwar ◽

Siddesh Gaddadadevra Mat

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Primary Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Short Term Memory ◽

Protein Structures ◽

Protein Secondary Structure ◽

Sequence Information ◽

Structure Information

Introduction: The primary structure of a protein is a polypeptide chain made up of a sequence of amino acids. Interactions between the atoms of the backbone give rise to locally folded structures within the polypeptide, which constitute the secondary structure. Sequence alignments can be made more accurate by the inclusion of secondary structure information. Objective: It is difficult to identify the sequence information embedded in the secondary structure of a protein. However, deep learning methods can be used to identify the sequence information in protein structures. Methods: The scope of the proposed work is to increase the accuracy of identifying the sequence information in the primary structure and the tertiary structure, thereby increasing the accuracy of the predicted protein secondary structure (PSS). In this work, homology is eliminated by a Recurrent Neural Network (RNN) based network that consists of three layers, namely a bi-directional Long Short-Term Memory (LSTM) layer, a time-distributed layer and a Softmax layer. Results: The proposed LDS model achieves an accuracy of approximately 86% for the prediction of the three-state secondary structure of the protein. Conclusion: The gap between the number of known protein primary structures and known secondary structures is huge and increasing, and machine learning is being used to reduce this gap. In most previous attempts at predicting the secondary structure of proteins, the data are divided according to the homology of the proteins. This limits the efficiency of the predictive model and restricts the inputs that can be given to such models. Hence, in our model, homology has not been considered while collecting the data for training or testing our model. As a result, our model is not affected by the homology of the protein fed to it, which removes that restriction, so any protein can be fed to it.
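
For readers unfamiliar with how a three-state secondary structure accuracy is usually computed, the sketch below shows the common per-residue measure (often called Q3): the fraction of residues whose predicted state (H for helix, E for strand, C for coil) matches the observed state. Whether the paper's reported figure was computed in exactly this way is an assumption; the function name `q3_accuracy` and the example strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical ten-residue example with two mismatched positions.
observed_ss = "HHHHEEECCC"
predicted_ss = "HHHCEEECCH"
score = q3_accuracy(predicted_ss, observed_ss)
print(score)
```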

Sequence Specific Dihedral Angle Distribution: Application in Protein Structure Prediction and Evaluation

Plant Tissue Culture and Biotechnology ◽

10.3329/ptcb.v19i2.5439 ◽

2009 ◽

Vol 19 (2) ◽

pp. 217-226

Author(s):

S. M. Minhaz Ud-Dean ◽

Mahdi Muhammad Moosa

Keyword(s):

Protein Structure ◽

Dihedral Angle ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Protein Structures ◽

Angle Distribution ◽

Ramachandran Plot ◽

Specific Data ◽

Specific Distribution ◽

Structure Evaluation

Protein structure prediction and evaluation is one of the major fields of computational biology. Estimation of dihedral angles can provide information about the acceptability of both theoretically predicted and experimentally determined structures. Here we report on the sequence-specific dihedral angle distribution of high-resolution protein structures available in the PDB and have developed Sasichandran, a tool for sequence-specific dihedral angle prediction and structure evaluation. This tool allows evaluation of a protein structure in PDB format from the sequence-specific distribution of Ramachandran angles. Additionally, it allows retrieval of the most probable Ramachandran angles for a given sequence along with the sequence-specific data.

Key words: Torsion angle, φ-ψ distribution, sequence specific Ramachandran plot, Ramasekharan, protein structure appraisal

D.O.I. 10.3329/ptcb.v19i2.5439 Plant Tissue Cult. & Biotech. 19(2): 217-226, 2009 (December)
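
The dihedral (torsion) angles discussed above are each defined by four consecutive atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of the standard torsion-angle calculation from four points. It is not taken from the tool described in the abstract; the function name `dihedral_degrees` and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: cis arrangement (0 degrees) and a quarter turn.
cis = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
quarter = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(cis, quarter)
```

Collecting φ and ψ for every residue of a structure yields the points of a Ramachandran plot.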

Improved Sampling Strategies for Protein Model Refinement based on Molecular Dynamics Simulation

10.26434/chemrxiv.13299197.v1 ◽

2020 ◽

Author(s):

Lim Heo ◽

Collin Arbour ◽

Michael Feig

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Structure Prediction ◽

Protein Structures ◽

Conformational Space ◽

Dynamics Simulation ◽

Model Refinement ◽

Protein Model ◽

Lower Accuracy ◽

Simulation Based

Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. Those methods utilize protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve model accuracy in the predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in the model quality become possible. Molecular dynamics simulations have been especially successful for improving model qualities but although consistent refinement can be achieved, the improvements in model qualities are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on an optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore conformational space more broadly. Based on the insight of this analysis we are proposing a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.

Expanding our knowledge of the protein universe: Modelling of protein structures

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314095084 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C491-C491

Author(s):

Jürgen Haas ◽

Alessandro Barbato ◽

Tobias Schmidt ◽

Steven Roth ◽

Andrew Waterhouse ◽

...

Keyword(s):

Computational Modeling ◽

Structure Prediction ◽

Structural Information ◽

Protein Structures ◽

Model Organism ◽

Data Bank ◽

Continuous Model ◽

Structure Modeling ◽

Structure Comparison ◽

Modeling And Prediction

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally known structures, today some form of structural information – either experimental or computational – is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure predictions in comparison to experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas that need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and so allow automated continuous structure comparison without human intervention.
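
The reason a score such as lDDT needs no superposition, and is therefore robust against domain movements, is that it compares local inter-atomic distances rather than absolute positions. The sketch below is a heavily simplified, single-atom-per-residue illustration of that idea and not the official lDDT implementation; the function name `simple_lddt`, the 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerance thresholds are stated here as assumptions of the sketch, and the coordinates are hypothetical.

```python
import math


def simple_lddt(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified local distance difference score between 0 and 1.

    For every pair of points closer than `radius` in the reference, count how
    many tolerance thresholds the corresponding model distance satisfies.
    """
    if len(ref) != len(model):
        raise ValueError("reference and model must have the same number of points")
    preserved = 0
    total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # only local contacts are scored
            diff = abs(d_ref - math.dist(model[i], model[j]))
            preserved += sum(diff < t for t in thresholds)
            total += len(thresholds)
    return preserved / total if total else 0.0


# Hypothetical three-point reference, a translated copy, and a distorted copy.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
translated = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in reference]
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.1, 0.0, 0.0)]
print(simple_lddt(reference, translated), simple_lddt(reference, distorted))
```

The translated copy scores the same as the reference itself because all pairwise distances are unchanged, whereas the distorted copy loses credit only for the distances involving the displaced point.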
