Sampling Multiple Scoring Functions Can Improve Protein Loop Structure Prediction Accuracy

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] RNA (ribonucleic acid) molecules play a variety of crucial roles in cellular functions at the level of transcription, translation and gene regulation. RNA functions are tied to structures. Because experimental determination of RNA structures by methods such as X-ray crystallography and NMR spectroscopy can be laborious, time-consuming and expensive, it is imperative to develop a reliable theoretical/computational model for predicting RNA structure from sequence. We aim to develop a novel free energy-based model for RNA structures, especially for RNA loops and junctions. One of the major roadblocks for physics-based RNA tertiary structure prediction is the evaluation of the entropy of RNA tertiary folds. In particular, the entropies of structures with multiple loops and helices can be highly convoluted due to the volume exclusion between the loops and helices. In the first project, we develop a new conformational entropy model for RNA structures consisting of multiple helices connected by cross-linked loops. The basic strategy of our approach is to decompose the whole structure into a number of three-body building blocks, where each building block consists of a loop and the two helices that are directly connected to the two ends of the loop. The simple construct of the three-body system allows for accurate computation of the conformational entropy for each building block. Assembly of the building blocks gives the entropy of the whole structure. This approach enables treatment of a large class of RNA tertiary folds. Tests against exact computer enumeration indicate that the method can yield accurate results for the entropy. The method provides a solid first step toward the systematic development of an entropy and free energy model for complex tertiary folds of RNA and other biopolymers.
In the second project, we developed a novel approach to the prediction of loop structures from the sequence. The current loop free energy parameters (such as the Turner rules) depend only on the loop length and ignore the sequence dependence of the loop. Such an oversimplification can lead to significant inaccuracies in the prediction of loop structure and stability. Here we tackle the problem by extracting sequence-dependent scoring functions from the known loop structures. Specifically, based on a survey of all the known RNA structures, we derive a set of virtual bond-based scoring functions for the different types of dinucleotides. To circumvent the problem of reference state selection, we apply an iterative method to extract the effective potential, based on the complete conformational ensemble. This new method has two notable advantages: (1) the statistical potential is extracted from the complete conformational ensemble, including the nonnative structures, and (2) the method predicts low-energy loop structures from the sequence without additional information such as a homologous structural template. With such a set of knowledge-based energy parameters, for a given sequence, we can successfully identify the native structure (the best-scored structure) from a set of structural decoys. Our extensive benchmark tests show consistently encouraging success rates in the coarse-grained loop structure predictions.
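The iterative extraction of an effective potential can be illustrated with a toy sketch. The abstract does not give the update formula, so everything specific here is an assumption: a single discretized structural variable (hypothetical bins of a virtual-bond angle), a fully enumerated toy ensemble summarized by the number of conformations per bin, and the commonly used update E <- E + kT ln(P_predicted / P_observed).

```python
import math

def predicted_distribution(energies, degeneracy, kT=1.0):
    """Boltzmann-weighted bin populations over the complete toy ensemble.

    degeneracy[b] is the number of enumerated conformations that fall in bin b
    (an assumed stand-in for the full conformational ensemble).
    """
    weights = [g * math.exp(-e / kT) for e, g in zip(energies, degeneracy)]
    z = sum(weights)
    return [w / z for w in weights]

def extract_potential(observed, degeneracy, kT=1.0, tol=1e-9, max_iter=100):
    """Iteratively adjust bin energies until predicted populations match
    the observed ones. All observed frequencies must be non-zero."""
    energies = [0.0] * len(observed)
    for _ in range(max_iter):
        predicted = predicted_distribution(energies, degeneracy, kT)
        if max(abs(p - o) for p, o in zip(predicted, observed)) < tol:
            break
        # Raise the energy of over-populated bins, lower under-populated ones.
        energies = [e + kT * math.log(p / o)
                    for e, p, o in zip(energies, predicted, observed)]
    return energies

# Hypothetical data: observed bin frequencies in known structures and the
# number of ensemble conformations per bin.
observed = [0.6, 0.3, 0.1]
degeneracy = [10, 30, 60]
potential = extract_potential(observed, degeneracy)
fitted = predicted_distribution(potential, degeneracy)
```

In this sketch no explicit reference state is chosen: the ensemble itself, including nonnative conformations, supplies the normalization at every iteration.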

Protein Loop Structure Prediction Using Conformational Space Annealing

Journal of Chemical Information and Modeling, DOI 10.1021/acs.jcim.6b00742, 2017, Vol 57 (5), pp. 1068-1078, Cited By ~ 3

Author(s): Seungryong Heo, Juyong Lee, Keehyoung Joo, Hang-Cheol Shin, Jooyoung Lee

Keyword(s): Structure Prediction, Conformational Space, Loop Structure

A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model

DOI 10.1101/205047, 2017, Cited By ~ 1

Author(s): Manato Akiyama, Kengo Sato, Yasubumi Sakakibara

Keyword(s): Machine Learning, Secondary Structure, Structure Prediction, RNA Secondary Structure, Prediction Accuracy, Secondary Structure Prediction, Training Data, Support Vector, RNA Secondary Structure Prediction, Fine Grained

Motivation: A popular approach for predicting RNA secondary structure is the thermodynamic nearest-neighbor model, which finds the thermodynamically most stable secondary structure, that is, the one with the minimum free energy (MFE). For further improvement, an alternative approach based on machine learning techniques has been developed. The machine learning-based approach can employ a fine-grained model that includes much richer feature representations with the ability to fit the training data. Although a machine learning-based fine-grained model achieved extremely high performance in prediction accuracy, a risk of overfitting for such a model has been reported. Results: In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning-based weighted approach. Our fine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed contexts of features that are trained by the structured support vector machine (SSVM) with ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and heavy overfitting is not observed. Availability: The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.

Protein Loop Structure Prediction Methods

Encyclopedia of Optimization, DOI 10.1007/978-0-387-74759-0_530, 2008, pp. 3100-3105

Author(s): Martin Mönnigmann, Christodoulos A. Floudas

Keyword(s): Structure Prediction, Prediction Methods, Loop Structure

Smotifs as structural local descriptors of supersecondary elements: classification, completeness and applications

Bio-Algorithms and Med-Systems, DOI 10.1515/bams-2014-0016, 2014, Vol 10 (4)

Author(s): Jaume Bonet, Andras Fiser, Baldo Oliva, Narcis Fernandez-Fuentes

Keyword(s): Protein Design, Structure Prediction, Protein Structures, Regular Structure, Loop Structure, Apparent Lack, Knowledge Based, Limits Of Knowledge, Folding Dynamics, And Function

Protein structures are made up of periodic and aperiodic structural elements (i.e., α-helices, β-strands and loops). Despite the apparent lack of regular structure, loops have specific conformations and play a central role in the folding, dynamics, and function of proteins. In this article, we review our previous work on the study of protein loops as local supersecondary structural motifs, or Smotifs. We reexamine our work on the structural classification of loops (ArchDB) and its application to loop structure prediction (ArchPRED), including the assessment of the limits of knowledge-based loop structure prediction methods. We conclude by focusing on the modular nature of proteins, on how the concept of Smotifs provides a convenient and practical approach to decompose proteins into strings of concatenated Smotifs, and on how this can be used in computational protein design and protein structure prediction.

Context-Based Features Enhance Protein Secondary Structure Prediction Accuracy

Journal of Chemical Information and Modeling, DOI 10.1021/ci400647u, 2014, Vol 54 (3), pp. 992-1002, Cited By ~ 31

Author(s): Ashraf Yaseen, Yaohang Li

Keyword(s): Secondary Structure, Structure Prediction, Prediction Accuracy, Secondary Structure Prediction, Protein Secondary Structure, Protein Secondary Structure Prediction

High-resolution structure prediction of β-barrel membrane proteins

Proceedings of the National Academy of Sciences, DOI 10.1073/pnas.1716817115, 2018, Vol 115 (7), pp. 1511-1516, Cited By ~ 7

Author(s): Wei Tian, Meishan Lin, Ke Tang, Jie Liang, Hammad Naveed

Keyword(s): Membrane Proteins, Structure Prediction, Prediction Accuracy, Main Chain, Structural Prediction, NMR Structures, Genome Wide, High Resolution Structure, The Difference, Resolution Structure

β-Barrel membrane proteins (βMPs) play important roles, but knowledge of their structures is limited. We have developed a method to predict their 3D structures. We predict strand registers and construct transmembrane (TM) domains of βMPs accurately, including proteins for which no prediction has been attempted before. Our method also accurately predicts structures from protein families with a limited number of sequences and proteins with novel folds. An average main-chain rmsd of 3.48 Å is achieved between predicted and experimentally resolved structures of TM domains, which is a significant improvement (>3 Å) over a recent study. For βMPs with NMR structures, the deviation between predictions and experimentally solved structures is similar to the difference among the NMR structures, indicating excellent prediction accuracy. Moreover, we can now accurately model the extended β-barrels and loops in non-TM domains, increasing the overall coverage of structure prediction by >30%. Our method is general and can be applied to genome-wide structural prediction of βMPs.
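For readers unfamiliar with the rmsd figure quoted above, the following minimal sketch shows how a root-mean-square deviation between two sets of matched atomic coordinates is computed. It assumes the two structures are already superimposed and that atoms are listed in the same order; the superposition protocol used in the paper is not stated in the abstract, and the coordinates below are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) tuples. Assumes the structures are already superimposed
    and that atom i in coords_a corresponds to atom i in coords_b."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical main-chain atoms (in Å): the prediction is shifted by 3 Å in z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
deviation = rmsd(predicted, reference)
```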

Protein loop structure prediction with flexible stem geometries

Proteins: Structure, Function, and Bioinformatics, DOI 10.1002/prot.20669, 2005, Vol 61 (4), pp. 748-762, Cited By ~ 26

Author(s): M. Mönnigmann, C.A. Floudas

Keyword(s): Structure Prediction, Loop Structure, Flexible Stem

A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model

Journal of Bioinformatics and Computational Biology, DOI 10.1142/s0219720018400255, 2018, Vol 16 (06), pp. 1840025, Cited By ~ 5

Author(s): Manato Akiyama, Kengo Sato, Yasubumi Sakakibara

Keyword(s): Machine Learning, Secondary Structure, Structure Prediction, RNA Secondary Structure, Prediction Accuracy, Secondary Structure Prediction, Training Data, Support Vector, RNA Secondary Structure Prediction, Fine Grained

A popular approach for predicting RNA secondary structure is the thermodynamic nearest-neighbor model, which finds the thermodynamically most stable secondary structure, that is, the one with the minimum free energy (MFE). For further improvement, an alternative approach based on machine learning techniques has been developed. The machine learning-based approach can employ a fine-grained model that includes much richer feature representations with the ability to fit the training data. Although a machine learning-based fine-grained model achieved extremely high performance in prediction accuracy, a risk of overfitting for such a model has been reported. In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning-based weighted approach. Our fine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed contexts of features that are trained by the structured support vector machine (SSVM) with ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and heavy overfitting is not observed. The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.
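Both the thermodynamic and the learned scoring models are typically optimized by dynamic programming over substructures. As a deliberately simplified stand-in, the sketch below uses the classic Nussinov recursion, which maximizes the number of base pairs instead of minimizing a nearest-neighbor free energy. It is not the MXfold algorithm and uses no thermodynamic or trained parameters; the function name and the minimum loop length are illustrative assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style dynamic program: maximum number of nested base pairs
    in seq, requiring at least min_loop unpaired bases inside each hairpin.
    Allows Watson-Crick and G-U wobble pairs."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

hairpin_pairs = max_base_pairs("GGGAAAUCC")
```

Replacing the "+ 1" pair score with loop- and stack-dependent energy terms, or with learned feature weights, leads toward the kind of scoring models the abstract describes, at the cost of a considerably more involved recursion.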

Prediction of RNA Secondary Structure Using Quantum-inspired Genetic Algorithms

Current Bioinformatics, DOI 10.2174/1574893614666190916154103, 2020, Vol 15 (2), pp. 135-143

Author(s): Sha Shi, Xin-Li Zhang, Le Yang, Wei Du, Xian-Li Zhao, ...

Keyword(s): Quantum Computing, Secondary Structure, RNA Structure, Structure Prediction, RNA Secondary Structure, Prediction Accuracy, Secondary Structure Prediction, Minimum Free Energy, RNA Sequences, New Strategy

Background: The prediction of RNA secondary structure using optimization algorithms is key to understanding the real structure of an RNA. Evolutionary algorithms (EAs) are popular strategies for RNA secondary structure prediction. However, compared to most state-of-the-art software based on dynamic programming algorithms (DPAs), the performance of EAs is far from satisfactory. Objective: Therefore, a more powerful strategy is required to improve the performance of EAs when applied to the prediction of RNA secondary structures. Methods: The idea of quantum computing is introduced here, yielding a new strategy to find all possible legal paired bases under the constraint of minimum free energy. The state of a stem pool of size N is encoded as a population of the quantum-inspired genetic algorithm (QGA), which is represented by N quantum bits rather than classical bits. The updating of populations is accomplished by so-called quantum crossover, quantum mutation and quantum rotation operations. Results: The numerical results show that the performance of traditional EAs is significantly improved by using QGA with regard to not only prediction accuracy and sensitivity but also complexity. Moreover, for RNA sequences of short to medium length, QGA even improves on the state-of-the-art software based on DPAs in terms of both prediction accuracy and sensitivity. Conclusion: This work sheds light on the applications of quantum computing to RNA structure prediction.
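The encoding of a stem pool as N quantum bits can be sketched as follows. This is a generic quantum-inspired genetic algorithm with only the observation and rotation steps, not the authors' implementation: the quantum crossover and mutation operators are omitted, and the stem gains, the conflict list, the penalty and all parameter values are hypothetical.

```python
import math
import random

def observe(thetas, rng):
    """Collapse each qubit to a classical bit: 1 with probability sin^2(theta)."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]

def rotate_towards(thetas, best_bits, delta=0.05 * math.pi):
    """Rotation step: nudge each qubit angle toward the best solution's bit.
    Angles are kept inside [0.05*pi, 0.45*pi] so no qubit fully collapses."""
    lo, hi = 0.05 * math.pi, 0.45 * math.pi
    rotated = []
    for theta, bit in zip(thetas, best_bits):
        theta = theta + delta if bit == 1 else theta - delta
        rotated.append(min(max(theta, lo), hi))
    return rotated

def qga(fitness, n_bits, generations=60, pop_size=8, seed=0):
    """Maximize fitness over bit strings of length n_bits."""
    rng = random.Random(seed)
    thetas = [math.pi / 4] * n_bits  # equal superposition: P(bit = 1) = 0.5
    best_bits = observe(thetas, rng)
    best_fit = fitness(best_bits)
    for _ in range(generations):
        for _ in range(pop_size):
            candidate = observe(thetas, rng)
            value = fitness(candidate)
            if value > best_fit:
                best_bits, best_fit = candidate, value
        thetas = rotate_towards(thetas, best_bits)
    return best_bits, best_fit

# Hypothetical stem pool: bit i selects stem i; stems 0 and 1 overlap.
STEM_GAIN = [3.0, 2.0, 2.5, 1.0]   # assumed stabilities (higher is better)
CONFLICTS = [(0, 1)]               # assumed incompatible stem pairs

def fitness(bits):
    gain = sum(g for g, b in zip(STEM_GAIN, bits) if b)
    penalty = sum(10.0 for i, j in CONFLICTS if bits[i] and bits[j])
    return gain - penalty

bits, fit = qga(fitness, len(STEM_GAIN))
```

Here a higher fitness plays the role of a lower free energy; a real application would score each stem selection with an energy model and reject incompatible stems outright.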
