Knowledge-based prediction of protein structures and the design of novel molecules

T. L. Blundell; B. L. Sibanda; M. J. E. Sternberg; J. M. Thornton

doi:10.1038/326347a0

Applications of Knowledge Based Mean Fields in the Determination of Protein Structures

NATO ASI Series - Statistical Mechanics, Protein Structure, and Protein Substrate Interactions ◽

10.1007/978-1-4899-1349-4_25 ◽

1994 ◽

pp. 297-315 ◽

Cited By ~ 2

Author(s):

Manfred J. Sippl ◽

Markus Jaritz ◽

Manfred Hendlich ◽

Maria Ortner ◽

Peter Lackner

Keyword(s):

Protein Structures ◽

Knowledge Based ◽

Mean Fields


PB-kPRED: knowledge-based prediction of protein backbone conformation using a structural alphabet

10.1101/127423 ◽

2017 ◽

Author(s):

Iyanar Vetrivel ◽

Swapnil Mahajan ◽

Manoj Tyagi ◽

Lionel Hoffmann ◽

Yves-Henri Sanejouand ◽

...

Keyword(s):

Protein Structures ◽

Scoring Function ◽

Evolutionary Information ◽

Structural Alphabet ◽

Local Structures ◽

Knowledge Based ◽

Scanning Strategies ◽

Accuracy Of Prediction ◽

The Impact ◽

Protein Blocks

Abstract. Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven very useful in many aspects of protein structure analysis and prediction. One such library, Protein Blocks (PBs), is composed of 16 standard structural prototypes, each 5 residues long. Analyzing a protein in this way involves describing its structure as a string of PBs. Thus, predicting the local structure of a protein in terms of PBs is a step towards predicting its 3-D structure. Here a new approach, kPred, is proposed towards this aim that is independent of the evolutionary information available. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a purely knowledge-based algorithm, not relying on secondary structure predictions or sequence alignment profiles, to scan this database and predict the most probable backbone conformations for the protein local structures. Depending on the strategy used for scanning the database, the method achieved mean Q16 accuracies between 40.8% and 66.3% for a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. The impact of these scanning strategies on the prediction was evaluated and is discussed. A scoring function was further developed that estimates the accuracy of prediction very well (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
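The database-scanning idea and the Q16 measure mentioned in this abstract can be illustrated with a toy sketch. This is not the kPred algorithm: the function names, the majority-vote rule, the choice of the central residue as the labeled position, and the "Z" placeholder for unpredicted positions are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_fragment_db(examples):
    """Map each pentapeptide to counts of the PB letter seen at its central residue.

    `examples` is an iterable of (sequence, pb_string) pairs of equal length
    (hypothetical training data, one PB letter per residue).
    """
    db = defaultdict(Counter)
    for sequence, pbs in examples:
        for i in range(len(sequence) - 4):
            db[sequence[i:i + 5]][pbs[i + 2]] += 1
    return db


def predict_pbs(sequence, db, unknown="Z"):
    """Predict one PB letter per residue by majority vote over matching fragments."""
    predicted = [unknown] * len(sequence)
    for i in range(len(sequence) - 4):
        counts = db.get(sequence[i:i + 5])
        if counts:
            predicted[i + 2] = counts.most_common(1)[0][0]
    return "".join(predicted)


def q16(predicted, observed):
    """Percentage of positions where the predicted PB letter equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

A real scanning strategy would need a fallback for pentapeptides absent from the database; here such positions simply keep the placeholder letter.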


Improving Prediction Accuracy via Subspace Modeling in a Statistical Geometry Based Computational Protein Mutagenesis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2010100103 ◽

2010 ◽

Vol 1 (4) ◽

pp. 54-68

Author(s):

Majid Masso

Keyword(s):

Bacteriophage T4 ◽

Protein Structures ◽

Dimensional Subspace ◽

Sequence Length ◽

Protein Chain ◽

Contact Potential ◽

Knowledge Based ◽

Environmental Perturbations ◽

Subspace Modeling ◽

Residue Substitution

A computational mutagenesis is detailed whereby each single residue substitution in a protein chain of primary sequence length N is represented as a sparse N-dimensional feature vector, whose M << N nonzero components locally quantify environmental perturbations occurring at the mutated position and its neighbors in the protein structure. The methodology makes use of both the Delaunay tessellation algorithm for representing protein structures, as well as a four-body, knowledge based, statistical contact potential. Feature vectors for each subset of mutants due to all possible residue substitutions at a particular position cohabit the same M-dimensional subspace, where the value of M and the identities of the M nonzero components are similarly position dependent. The approach is used to characterize a large experimental dataset of single residue substitutions in bacteriophage T4 lysozyme, each categorized as either unaffected or affected based on the measured level of mutant activity relative to that of the native protein. Performance of a single classifier trained with the collective set of mutants in N-space is compared to that of an ensemble of position-specific classifiers trained using disjoint mutant subsets residing in significantly smaller subspaces. Results suggest that significant improvements can be achieved through subspace modeling.
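The sparse representation described in this abstract can be sketched as follows. The perturbation scores are made-up numbers, and the helper names are hypothetical; the sketch only shows how mutants at one position share the same M nonzero coordinates inside an N-dimensional space.

```python
def to_dense(n, sparse):
    """Expand a {position: score} mapping into a length-n feature vector."""
    dense = [0.0] * n
    for position, score in sparse.items():
        if not 0 <= position < n:
            raise IndexError(f"position {position} outside chain of length {n}")
        dense[position] = score
    return dense


def project_to_subspace(sparse, support):
    """Keep only the coordinates in `support`, in sorted order (the M-subspace)."""
    return [sparse.get(position, 0.0) for position in sorted(support)]


# Hypothetical chain of length N = 10; two substitutions at position 3
# perturb the same neighbors (positions 3, 4 and 7), so M = 3.
N = 10
support = {3, 4, 7}
mutant_a = {3: -1.2, 4: 0.5, 7: 0.3}
mutant_b = {3: 0.8, 4: -0.1, 7: 0.6}

dense_a = to_dense(N, mutant_a)                    # used by one classifier in N-space
small_a = project_to_subspace(mutant_a, support)   # used by a position-specific classifier
small_b = project_to_subspace(mutant_b, support)
```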


KORP: knowledge-based 6D potential for fast protein and loop modeling

Bioinformatics ◽

10.1093/bioinformatics/btz026 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3013-3019 ◽

Cited By ~ 13

Author(s):

José Ramón López-Blanco ◽

Pablo Chacón

Keyword(s):

Structure Prediction ◽

Protein Structures ◽

Joint Probability ◽

Protein Modeling ◽

Supplementary Information ◽

Joint Probability Distribution ◽

Loop Modeling ◽

Statistical Potentials ◽

Knowledge Based ◽

Backbone Atoms

Abstract. Motivation: Knowledge-based statistical potentials constitute a simpler and easier alternative to physics-based potentials in many applications, including folding, docking and protein modeling. Here, to improve the effectiveness of the current approximations, we attempt to capture the six-dimensional nature of residue–residue interactions from known protein structures using a simple backbone-based representation. Results: We have developed KORP, a knowledge-based pairwise potential for proteins that depends on the relative position and orientation between residues. Using a minimalist representation of only three backbone atoms per residue, KORP utilizes a six-dimensional joint probability distribution to outperform state-of-the-art statistical potentials for native structure recognition and best model selection in recent critical assessment of protein structure prediction and loop-modeling benchmarks. Compared with the existing methods, our side-chain independent potential has a lower complexity and better efficiency. The superior accuracy and robustness of KORP represent a promising advance for protein modeling and refinement applications that require a fast but highly discriminative energy function. Availability and implementation: http://chaconlab.org/modeling/korp. Supplementary information: Supplementary data are available at Bioinformatics online.


Smotifs as structural local descriptors of supersecondary elements: classification, completeness and applications

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2014-0016 ◽

2014 ◽

Vol 10 (4) ◽

Author(s):

Jaume Bonet ◽

Andras Fiser ◽

Baldo Oliva ◽

Narcis Fernandez-Fuentes

Keyword(s):

Protein Design ◽

Structure Prediction ◽

Protein Structures ◽

Regular Structure ◽

Loop Structure ◽

Apparent Lack ◽

Knowledge Based ◽

Limits Of Knowledge ◽

Folding Dynamics ◽

And Function

Abstract. Protein structures are made up of periodic and aperiodic structural elements (i.e., α-helices, β-strands and loops). Despite the apparent lack of regular structure, loops have specific conformations and play a central role in the folding, dynamics, and function of proteins. In this article, we review our previous work on protein loops as local supersecondary structural motifs, or Smotifs. We reexamine our work on the structural classification of loops (ArchDB) and its application to loop structure prediction (ArchPRED), including the assessment of the limits of knowledge-based loop structure prediction methods. We conclude by focusing on the modular nature of proteins, on how the concept of Smotifs provides a convenient and practical approach to decompose proteins into strings of concatenated Smotifs, and on how this can be used in computational protein design and protein structure prediction.


All-Atom Four-Body Knowledge-Based Statistical Potentials to Distinguish Native Protein Structures from Nonnative Folds

BioMed Research International ◽

10.1155/2017/5760612 ◽

2017 ◽

Vol 2017 ◽

pp. 1-17 ◽

Cited By ~ 3

Author(s):

Majid Masso

Keyword(s):

Structure Prediction ◽

Protein Structures ◽

Binding Energies ◽

Coarse Grained ◽

Amber Force Field ◽

Statistical Potentials ◽

Knowledge Based ◽

Native Proteins ◽

Interacting Atoms ◽

Atomic Coordinates

Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds or decoys which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-‘R’-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking 3rd, our best potential tied CHARMM19 and surpassed AMBER force field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment.
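The inverted Boltzmann principle named in this abstract converts observed frequencies into energy-like scores. A minimal sketch follows, assuming natural logarithms, energies in units of kT, and a user-supplied reference (expected) frequency for each quadruplet type; the paper's actual reference state, logarithm base and atom typing are not given in the abstract, and the labels below are placeholders.

```python
import math


def inverse_boltzmann(observed_counts, expected_freqs):
    """Return {type: -ln(f_observed / f_expected)} for every type with data.

    Types that were never observed are skipped, because the logarithm of a
    zero frequency is undefined (a pseudocount would be one alternative).
    """
    total = sum(observed_counts.values())
    if total == 0:
        raise ValueError("no observations")
    energies = {}
    for kind, count in observed_counts.items():
        if count == 0:
            continue
        observed_freq = count / total
        energies[kind] = -math.log(observed_freq / expected_freqs[kind])
    return energies


# Made-up counts for three quadruplet types (placeholder labels).
counts = {"type_A": 50, "type_B": 25, "type_C": 25}
reference = {"type_A": 0.25, "type_B": 0.25, "type_C": 0.50}
energies = inverse_boltzmann(counts, reference)
```

Types seen more often than the reference predicts receive negative (favorable) scores, and types seen less often receive positive ones.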


Identification of native protein structures captured by principal interactions

BMC Bioinformatics ◽

10.1186/s12859-019-3186-6 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Mehdi Mirzaie

Keyword(s):

Protein Structure ◽

Protein Structures ◽

Interaction Model ◽

Principal Component ◽

Potential Functions ◽

Native Protein ◽

Knowledge Based ◽

Total Potential ◽

Full Interaction ◽

Interaction Types

Abstract. Background: Evaluation of protein structure relies on a trustworthy potential function. The total potential of a protein structure is approximated as the sum of all pairwise interaction potentials. Knowledge-based potentials (KBPs) are one type of potential function, derived from known, experimentally determined protein structures. Although several KBP functions based on different methods have been introduced, the key interactions that capture the total potential have not yet been studied. Results: In this study, we seek the interaction types that preserve as much of the total potential as possible. We employ a procedure based on principal component analysis (PCA) to extract the significant and key interactions in native protein structures. We call these principal interactions and show that, in protein fold recognition, the results of a model that considers only these interactions are very close to those of the full interaction model that considers all interactions. In fact, the principal interactions maintain the discriminative power of the full interaction model. The method was evaluated on 3 KBPs with different contact definitions and distance thresholds and revealed that their corresponding principal interactions are very similar. Additionally, the principal interactions made up 20% of the full interactions on average, and they occur between residues that are considered important in protein folding. Conclusions: This work shows that not all interaction types are equally important in discriminating the native structure. Because the results of the reduced model based on principal interactions were very close to those of the full interaction model, a new strategy is needed to capture the role of the remaining (non-principal) interactions and so improve the power of knowledge-based potential functions.
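The pairwise-sum approximation and the idea of a reduced model restricted to selected interaction types can be sketched as below. The 8.0 cutoff, the single-point residue representation, the energy table and the chosen subset are all illustrative assumptions; the PCA step that selects the principal interactions in the paper is not reproduced here.

```python
import math
from itertools import combinations


def total_potential(residues, pair_energy, cutoff=8.0, allowed_types=None):
    """Sum pairwise energies over residue pairs closer than `cutoff`.

    `residues` is a list of (residue_type, (x, y, z)) tuples. If
    `allowed_types` is given, only those unordered type pairs contribute,
    which gives a reduced model.
    """
    total = 0.0
    for (type_1, xyz_1), (type_2, xyz_2) in combinations(residues, 2):
        if math.dist(xyz_1, xyz_2) > cutoff:
            continue
        key = tuple(sorted((type_1, type_2)))
        if allowed_types is not None and key not in allowed_types:
            continue
        total += pair_energy.get(key, 0.0)
    return total


# Made-up coordinates and energies for four residues.
residues = [
    ("LEU", (0.0, 0.0, 0.0)),
    ("VAL", (5.0, 0.0, 0.0)),
    ("ASP", (0.0, 6.0, 0.0)),
    ("LYS", (30.0, 0.0, 0.0)),
]
pair_energy = {("LEU", "VAL"): -1.0, ("ASP", "LEU"): 0.5, ("ASP", "VAL"): 0.25}

full = total_potential(residues, pair_energy)
reduced = total_potential(residues, pair_energy, allowed_types={("LEU", "VAL")})
```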


Improving Prediction Accuracy via Subspace Modeling

Computational Knowledge Discovery for Bioinformatics Research ◽

10.4018/978-1-4666-1785-8.ch003 ◽

2013 ◽

pp. 33-48

Author(s):

Majid Masso

Keyword(s):

Bacteriophage T4 ◽

Protein Structures ◽

Dimensional Subspace ◽

Sequence Length ◽

Protein Chain ◽

Contact Potential ◽

Knowledge Based ◽

Environmental Perturbations ◽

Subspace Modeling ◽

Residue Substitution

A computational mutagenesis is detailed whereby each single residue substitution in a protein chain of primary sequence length N is represented as a sparse N-dimensional feature vector, whose M << N nonzero components locally quantify environmental perturbations occurring at the mutated position and its neighbors in the protein structure. The methodology makes use of both the Delaunay tessellation algorithm for representing protein structures, as well as a four-body, knowledge based, statistical contact potential. Feature vectors for each subset of mutants due to all possible residue substitutions at a particular position cohabit the same M-dimensional subspace, where the value of M and the identities of the M nonzero components are similarly position dependent. The approach is used to characterize a large experimental dataset of single residue substitutions in bacteriophage T4 lysozyme, each categorized as either unaffected or affected based on the measured level of mutant activity relative to that of the native protein. Performance of a single classifier trained with the collective set of mutants in N-space is compared to that of an ensemble of position-specific classifiers trained using disjoint mutant subsets residing in significantly smaller subspaces. Results suggest that significant improvements can be achieved through subspace modeling.
