S-Motifs as a New Approach to Secondary Structure Prediction: Comparison with State of the Art Methods

Ivan Popov

doi:10.5504/bbeq.2012.0017

ProteinUnet2 for Fast Protein Secondary Structure Prediction: A Step Towards Proper Evaluation

10.21203/rs.3.rs-900318/v1 ◽ 2021

Author(s): Katarzyna Stapor ◽ Krzysztof Kotowski ◽ Tomasz Smolarczyk ◽ Irena Roterman

Keyword(s): Secondary Structure ◽ Structure Prediction ◽ Secondary Structure Prediction ◽ State Of The Art ◽ Protein Secondary Structure ◽ Evaluation Study ◽ Practical Significance ◽ Statistical Methodology ◽ Extensive Evaluation ◽ Benchmark Datasets

Abstract

Background: The importance of protein secondary structure (SS) prediction is widely recognized: solving it helps reveal the role of a protein in an organism. Because experimental methods are expensive and sometimes infeasible, many SS predictors, mostly based on machine learning, have been proposed over the years. SS prediction is an imbalanced classification problem and therefore should not be judged by the commonly used Q3/Q8 metrics. Moreover, because the benchmark datasets are not random samples, classical statistical null hypothesis testing based on the Neyman-Pearson approach is not appropriate. In addition, state-of-the-art predictors usually have relatively long prediction times.

Results: We present ProteinUnet2, a new deep network for SS prediction based on the U-Net convolutional architecture. We also propose a new statistical methodology for assessing prediction performance, based on the significance from Fisher-Pitman permutation tests accompanied by practical significance measured by Cohen’s effect size. Through an extensive evaluation study, we report the performance of ProteinUnet2 in comparison with two state-of-the-art methods, SAINT and SPOT-1D, on the benchmark datasets TEST2016, TEST2018, and CASP12.

Conclusions: Our results suggest that ProteinUnet2 has much shorter prediction times while matching (or outperforming) the mentioned predictors. We strongly believe that our proposed statistical methodology will be adopted and used (and even expanded) by the research community.
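The evaluation idea in the abstract above (a permutation test for statistical significance, paired with Cohen's effect size for practical significance) can be illustrated with a minimal sketch. The abstract does not say which variant of the Fisher-Pitman test or of Cohen's d the authors use, so the paired sign-flip test, the paired form of Cohen's d, the function names, and the example scores below are all assumptions made for illustration only.

```python
import random
import statistics


def paired_permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided Monte Carlo sign-flip permutation test on paired scores.

    Assumption: a[i] and b[i] are scores of two predictors on the same protein.
    Under the null hypothesis the sign of each difference is exchangeable.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.fmean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Randomly swap the two predictors within each pair.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def cohens_d_paired(a, b):
    """Cohen's d for paired samples: mean difference over its standard deviation."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


if __name__ == "__main__":
    # Hypothetical per-protein scores, not taken from any paper.
    model_a = [0.90, 0.80, 0.85, 0.95, 0.70, 0.88, 0.92, 0.81]
    model_b = [0.60, 0.50, 0.55, 0.60, 0.45, 0.50, 0.65, 0.50]
    print("p-value:", paired_permutation_test(model_a, model_b))
    print("Cohen's d:", cohens_d_paired(model_a, model_b))
```

Reporting both numbers separates the question of whether a difference is likely to be real from the question of whether it is large enough to matter.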

New Approach in Genetic Algorithm for RNA Secondary Structure Prediction

Journal of Advances in Information Technology ◽ 10.12720/jait.11.4.249-258 ◽ 2020 ◽ Vol 11 (4) ◽ pp. 249-258

Author(s): Binh Doan Duy ◽ Minh Tuan Pham ◽ Long Dang Duc

Keyword(s): Genetic Algorithm ◽ Secondary Structure ◽ Structure Prediction ◽ RNA Secondary Structure ◽ Secondary Structure Prediction ◽ RNA Secondary Structure Prediction ◽ New Approach

Prediction of Protein Secondary Structure

Jurnal Teknologi ◽ 10.11113/jt.v35.605 ◽ 2012

Author(s): Satya Nanda Vel Arjunan ◽ Safaai Deris ◽ Rosli Md Illias

Keyword(s): Protein Structure ◽ Secondary Structure ◽ Structure Prediction ◽ Large Scale ◽ Secondary Structure Prediction ◽ State Of The Art ◽ Protein Structures ◽ Protein Secondary Structure ◽ Protein Secondary Structure Prediction ◽ General Guide

In the wake of large-scale DNA sequencing projects, accurate tools are needed to predict protein structures. The problem of predicting protein structure from DNA sequence remains fundamentally unsolved even after more than three decades of intensive research. In this paper, the fundamental theory of protein structure is presented as a general guide to protein secondary structure prediction research. An overview of the state of the art in sequence analysis and some principles of the methods involved will be described. Key words: Protein secondary structure prediction; Neural networks

ExpertRNA: A new framework for RNA structure prediction

10.1101/2021.01.18.427087 ◽ 2021

Author(s): Menghan Liu ◽ Giulia Pedrielli ◽ Erik Poppleton ◽ Petr Šulc ◽ Dimitri P. Bertsekas

Keyword(s): Free Energy ◽ Secondary Structure ◽ Structure Prediction ◽ Secondary Structure Prediction ◽ State Of The Art ◽ Data Driven ◽ Prediction Algorithm ◽ Data Sets ◽ Great Effort ◽ Parametric Data

Abstract

Ribonucleic acid (RNA) is a fundamental biological molecule that is essential to all living organisms, performing a versatile array of cellular tasks. The function of many RNA molecules is strongly related to the structure they adopt. As a result, great effort is being dedicated to the design of efficient algorithms that solve the “folding problem”: given a sequence of nucleotides, return a probable list of base pairs, referred to as the secondary structure prediction. Early algorithms have largely relied on finding the structure with minimum free energy. However, these predictions rely on simplified effective free energy models, which may fail to identify the correct structure as the one with the lowest free energy. In light of this, new data-driven approaches that not only consider free energy but also use machine learning techniques to learn motifs have been investigated, and have recently been shown to outperform free-energy-based algorithms on several experimental data sets.

In this work, we introduce the new ExpertRNA algorithm, which provides a modular framework that can easily incorporate an arbitrary number of rewards (free energy or non-parametric/data driven) and secondary structure prediction algorithms. We argue that this capability of ExpertRNA has the potential to balance out the different strengths and weaknesses of state-of-the-art folding tools. We test ExpertRNA on several RNA sequence-structure data sets, and we compare its performance against a state-of-the-art folding algorithm. We find that ExpertRNA produces, on average, more accurate predictions than the structure prediction algorithm used, thus validating the promise of the approach.
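To make the "folding problem" interface described above concrete (sequence of nucleotides in, list of base pairs out), here is a minimal sketch. It is not ExpertRNA and not a minimum free energy method: it uses a Nussinov-style dynamic program that simply maximizes the number of base pairs, a much cruder stand-in objective. The function name, the allowed pairs (including G-U wobble), and the minimum hairpin loop length of 3 are assumptions chosen for illustration.

```python
def nussinov_pairs(seq, min_loop=3):
    """Return a sorted list of (i, j) base pairs maximizing the pair count.

    Assumptions: Watson-Crick and G-U wobble pairs are allowed, pairs may not
    cross, and at least `min_loop` unpaired bases separate paired positions.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # dp[i][j] = maximum number of pairs within seq[i..j]; 0 for short spans.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Trace back one optimal structure with an explicit stack.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain any pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in allowed:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    pairs.append((k, j))
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical toy sequence; indices are 0-based positions in the string.
    print(nussinov_pairs("GGGAAAUCC"))
```

Real folding tools replace the pair count with an energy model or learned scores, which is the gap the abstract says data-driven approaches try to close.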

Protein secondary structure prediction: A survey of the state of the art

Journal of Molecular Graphics and Modelling ◽ 10.1016/j.jmgm.2017.07.015 ◽ 2017 ◽ Vol 76 ◽ pp. 379-402 ◽ Cited By ~ 28

Author(s): Qian Jiang ◽ Xin Jin ◽ Shin-Jye Lee ◽ Shaowen Yao

Keyword(s): Secondary Structure ◽ Structure Prediction ◽ Secondary Structure Prediction ◽ State Of The Art ◽ Protein Secondary Structure ◽ The State ◽ Protein Secondary Structure Prediction

A new approach to secondary structure evaluation: Secondary structure prediction of porcine adenylate kinase and yeast guanylate kinase by CD spectroscopy of overlapping synthetic peptide segments

Biopolymers ◽ 10.1002/(sici)1097-0282(199702)41:2<213::aid-bip8>3.0.co;2-w ◽ 1997 ◽ Vol 41 (2) ◽ pp. 213-231 ◽ Cited By ~ 7

Author(s): Henrik W. Behrends ◽ Gerd Folkers ◽ Annette G. Beck-Sickinger

Keyword(s): Secondary Structure ◽ Adenylate Kinase ◽ Synthetic Peptide ◽ Structure Prediction ◽ Secondary Structure Prediction ◽ CD Spectroscopy ◽ Guanylate Kinase ◽ New Approach ◽ Structure Evaluation ◽ Peptide Segments

Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction

Scientific Reports ◽ 10.1038/s41598-019-48786-x ◽ 2019 ◽ Vol 9 (1) ◽ Cited By ~ 12

Author(s): Mirko Torrisi ◽ Manaz Kaleel ◽ Gianluca Pollastri

Keyword(s): Neural Networks ◽ Secondary Structure ◽ Convolutional Neural Networks ◽ Structure Prediction ◽ Secondary Structure Prediction ◽ State Of The Art ◽ Protein Secondary Structure ◽ Protein Secondary Structure Prediction

Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization

Bioinformatics ◽ 10.1093/bioinformatics/btaa336 ◽ 2020 ◽ Vol 36 (Supplement_1) ◽ pp. i317-i325

Author(s): Spencer Krieger ◽ John Kececioglu

Keyword(s): Secondary Structure ◽ Structure Prediction ◽ Nearest Neighbor ◽ Secondary Structure Prediction ◽ State Of The Art ◽ Hybrid Approach ◽ Protein Secondary Structure ◽ Nearest Neighbor Search ◽ Protein Secondary Structure Prediction ◽ Neighbor Search

Abstract

Motivation: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy.

Method: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods.

Results: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2–10%, and Q3 accuracy by more than 1–3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction.

Availability and implementation: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.

The influence of dataset homology and a rigorous evaluation strategy on protein secondary structure prediction

PLoS ONE ◽ 10.1371/journal.pone.0254555 ◽ 2021 ◽ Vol 16 (7) ◽ pp. e0254555

Author(s): Teng-Ruei Chen ◽ Chia-Hua Lo ◽ Sheng-Hung Juan ◽ Wei-Cheng Lo

Keyword(s): Secondary Structure ◽ Sequence Homology ◽ Structure Prediction ◽ Large Scale ◽ Prediction Models ◽ Secondary Structure Prediction ◽ State Of The Art ◽ Protein Secondary Structure ◽ Substantial Improvement ◽ Protein Secondary Structures

The secondary structure prediction (SSP) of proteins has long been an essential structural biology technique with various applications. Despite its vital role in many research and industrial fields, in recent years, as the accuracy of state-of-the-art secondary structure predictors approaches the theoretical upper limit, SSP has come to be considered either no longer challenging or too challenging for further advances. Believing that substantial improvement of SSP will move forward the many fields that depend on it, we conducted this study, which focuses on three issues that have not yet been noticed or thoroughly examined but may have affected the reliability of the evaluation of previous SSP algorithms. These issues all concern the sequence homology between or within the developmental and evaluation datasets. We therefore designed many different homology layouts of datasets to train and evaluate SSP models. Multiple repeats were performed in each experiment by random sampling. The conclusions obtained with small experimental datasets were verified with large-scale datasets using state-of-the-art SSP algorithms. Contrary to the long-established assumption, we found that the sequence homology between query datasets for training, testing, and independent tests exerts little influence on SSP accuracy. In addition, sequence homology redundancy between or within most datasets causes the accuracy of an SSP algorithm to be overestimated, while redundancy within the reference dataset used for extracting predictive features causes the accuracy to be underestimated. Since the overestimating effects are more significant than the underestimating effect, the accuracy of some SSP methods might have been overestimated. Based on these discoveries, we propose a rigorous procedure for developing SSP algorithms and making reliable evaluations, hoping to bring substantial improvements to future SSP methods and to benefit all research and application fields that rely on accurate prediction of protein secondary structures.

Faculty Opinions recommendation of COFOLD: an RNA secondary structure prediction method that takes co-transcriptional folding into account.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.718010599.793476797 ◽ 2013

Author(s): Scott Silverman

Keyword(s): Secondary Structure ◽ Structure Prediction ◽ RNA Secondary Structure ◽ Secondary Structure Prediction ◽ Prediction Method ◽ RNA Secondary Structure Prediction ◽ Structure Prediction Method ◽ Secondary Structure Prediction Method
