Protein Structure Abstraction and Automatic Clustering Using Secondary Structure Element Sequences

A comparison between internal protein nanoenvironments of α-helices and β-sheets

PLoS ONE ◽

10.1371/journal.pone.0244315 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0244315

Author(s):

Ivan Mazoni ◽

Jose Augusto Salim ◽

Fabio Rogerio de Moraes ◽

Luiz Borro ◽

Goran Neshich

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Univariate Analysis ◽

Protein Structures ◽

Main Idea ◽

Secondary Structure Element ◽

Statistical Tool ◽

Full Characterization ◽

β-Sheets ◽

Almost All

Secondary structure elements are found in almost all protein structures revealed so far. In general, more β-sheets than α-helices are found inside protein structures. For example, considering the PDB, DSSP and Stride definitions of secondary structure elements and using the consensus among them, we found 60,727 helices in 4,376 chains identified in all-α structures and 129,440 helices in 7,898 chains identified in all-α and α + β structures. For β-sheets, we identified 837,345 strands in 184,925 β-sheets located within 50,803 chains of all-β structures and 1,541,961 strands in 355,431 β-sheets located within 86,939 chains in all-β and α + β structures (data extracted on February 1, 2019). In this paper we first provide a full characterization of the nanoenvironment found at β-sheet locations and then compare those characteristics with the ones we already published for α-helical secondary structure elements. For this characterization we use, as in our previous work on α-helical nanoenvironments, a set of STING protein structure descriptors. As in the previous work, we assume that a set of protein structure parameters/attributes/descriptors exists that can fully describe the nanoenvironment around β-sheets, and that appropriate statistical analysis will point to significant changes in the values of those parameters between loci inside and outside a defined secondary structure element. While the univariate analysis is straightforward and intuitively understood, it is severely limited in coverage: it could be applied successfully in at most 25% of the studied cases. Identifying the main descriptors for a specific secondary structure element (SSE) by means of the multivariate MANOVA test is a strong statistical tool for complete discrimination among the SSEs, and it proved to be the approach with the highest coverage.
The complete description of the nanoenvironment can be understood by analogy with a key-lock system, in which all of the lock's mini cylinders must reach the right elevation (controlled by a matching key) for the lock to open. The main idea is as follows: a set of descriptors (the cylinders in the key-lock example) must precisely combine their values (elevations) to form and maintain a specific secondary structure element nanoenvironment (the condition required for a key to open the lock).
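The abstract above mentions taking the consensus among the PDB, DSSP and Stride secondary structure definitions. A minimal sketch of one way such a consensus could be formed is shown below. It assumes the three assignments are already available as equal-length per-residue strings and that "consensus" means unanimous agreement; the paper does not state its exact rule here, and the function name, the `unknown` marker and the example strings are hypothetical.

```python
from typing import Sequence


def consensus_assignment(assignments: Sequence[str], unknown: str = "-") -> str:
    """Keep a per-residue label only where every method agrees.

    Assumption: unanimous agreement defines the consensus. Residues on
    which the methods disagree are marked with `unknown`.
    """
    if not assignments:
        raise ValueError("need at least one assignment string")
    length = len(assignments[0])
    if any(len(a) != length for a in assignments):
        raise ValueError("all assignment strings must have the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else unknown
        for column in zip(*assignments)
    )


# Hypothetical three-state strings (H = helix, E = strand, C = coil).
pdb = "HHHHHCCEEEEC"
dssp = "HHHHCCCEEEEC"
stride = "HHHHHCCEEECC"
print(consensus_assignment([pdb, dssp, stride]))  # HHHH-CCEEE-C
```

A stricter or looser rule (for example, majority vote) would change only the expression inside the join.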

Extension of a de novo TIM barrel with a rationally designed secondary structure element

Protein Science ◽

10.1002/pro.4064 ◽

2021 ◽

Vol 30 (5) ◽

pp. 982-989

Author(s):

Jonas Gregor Wiese ◽

Sooruban Shanmugaratnam ◽

Birte Höcker

Keyword(s):

Secondary Structure ◽

De Novo ◽

Secondary Structure Element ◽

Tim Barrel

Protein Structure Prediction: Assembly of Secondary Structure Elements by Basin-Hopping

ChemPhysChem ◽

10.1002/cphc.201402247 ◽

2014 ◽

Vol 15 (15) ◽

pp. 3378-3390 ◽

Cited By ~ 1

Author(s):

Falk Hoffmann ◽

Ioan Vancea ◽

Sanjay G. Kamat ◽

Birgit Strodel

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Basin Hopping

Outer membrane proteins can be simply identified using secondary structure element alignment

BMC Bioinformatics ◽

10.1186/1471-2105-12-76 ◽

2011 ◽

Vol 12 (1) ◽

pp. 76 ◽

Cited By ~ 16

Author(s):

Ren-Xiang Yan ◽

Zhen Chen ◽

Ziding Zhang

Keyword(s):

Secondary Structure ◽

Membrane Proteins ◽

Outer Membrane ◽

Outer Membrane Proteins ◽

Secondary Structure Element

Advances in Protein Super-Secondary Structure Prediction and Application to Protein Structure Prediction

Methods in Molecular Biology - Protein Supersecondary Structures ◽

10.1007/978-1-4939-9161-7_2 ◽

2019 ◽

pp. 15-45 ◽

Cited By ~ 3

Author(s):

Elijah MacCarthy ◽

Derrick Perry ◽

Dukka B. KC

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Secondary Structure Prediction

SSNN, a method for neural network protein secondary structure fitting using circular dichroism data

Analytical Methods ◽

10.1039/c3ay41831f ◽

2014 ◽

Vol 6 (17) ◽

pp. 6721-6726 ◽

Cited By ~ 6

Author(s):

Vincent Hall ◽

Anthony Nash ◽

Alison Rodger

Keyword(s):

Neural Network ◽

Circular Dichroism ◽

Protein Structure ◽

Secondary Structure ◽

Protein Secondary Structure ◽

Network Approach ◽

Cd Spectra ◽

Neural Network Approach ◽

Self Organising Map

SSNN is a self-organising map neural network approach for estimating protein structure from circular dichroism (CD) spectra. The method for using SSNN is described here, and SSNN is compared with CDSSTR, a well-known methodology for finding secondary structures from CD. SSNN compares well with similar methodologies.

Hermes: an ensemble machine learning architecture for protein secondary structure prediction

10.1101/640656 ◽

2019 ◽

Author(s):

Larry Bliss ◽

Ben Pascoe ◽

Samuel K Sheppard

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Secondary Structure ◽

Structure Prediction ◽

Cross Validation ◽

Secondary Structure Prediction ◽

Protein Structures ◽

Lower Boundary ◽

Protein Secondary Structure ◽

Homologous Proteins

Abstract. Motivation: Protein structure predictions, which combine theoretical chemistry and bioinformatics, are an increasingly important technique in biotechnology and biomedical research, for example in the design of novel enzymes and drugs. Here, we present a new ensemble bi-layered machine learning architecture that builds directly on ten existing pipelines and provides rapid, high-accuracy, 3-state secondary structure prediction of proteins. Results: After training on 1348 solved protein structures, we evaluated the model with four independent datasets: JPRED4, compiled by the authors of the successful predictor of the same name, and CASP11, CASP12 and CASP13, assembled by the Critical Assessment of protein Structure Prediction consortium, which runs biennial experiments focused on objective testing of predictors. These rigorous, pre-established protocols included 7-fold cross-validation and blind testing. This led to a mean Hermes accuracy of 95.5%, significantly (p < 0.05) better than the ten previously published models analysed in this paper. Furthermore, Hermes yielded a reduction in standard deviation, fewer lower-boundary outliers, and reduced dependency on solved structures of homologous proteins, as measured by NEFF score. This architecture provides advantages over other pipelines while remaining accessible to users at any level of bioinformatics experience. Availability and Implementation: The source code for Hermes is freely available at: https://github.com/HermesPrediction/Hermes. This page also includes the cross-validation with corresponding models, and all training/testing data presented in this study with predictions and accuracy.
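The accuracy of a 3-state secondary structure prediction is commonly reported as the fraction of residues whose predicted state matches the observed one (often called Q3). The sketch below illustrates that general measure only. It is not taken from the Hermes code base, the abstract does not say exactly how its accuracy figure was computed, and the function name and example strings are hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label matches the observed one.

    Assumption: both strings use the same three-letter alphabet
    (for example H, E, C) and are aligned residue by residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical example: 8 of 10 residues agree.
observed = "HHHHCCEEEE"
predicted = "HHHCCCEEEH"
print(q3_accuracy(predicted, observed))  # 0.8
```

Averaging this value over all proteins in a test set gives a per-dataset mean of the kind quoted in such evaluations.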

Improved computational methods of protein sequence alignment, model selection and tertiary structure prediction

10.32469/10355/46126 ◽

2013 ◽

Author(s):

Xin Deng

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Model Selection ◽

Sequence Alignment ◽

Protein Sequence ◽

Structure Prediction ◽

Tertiary Structure ◽

Solvent Accessibility ◽

Relative Solvent Accessibility ◽

Tertiary Structure Prediction

Protein sequence and profile alignment is used in most bioinformatics tasks, such as protein structure modeling, function prediction, and phylogenetic analysis. We designed a new algorithm, MSACompro, to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into multiple protein sequence alignment. Our experiments showed that it improved multiple sequence alignment accuracy over most existing methods that do not use structural information, and that it performed comparably to the method using structural features and additional homologous sequences, with only slightly lower scores. We also developed HHpacom, a new profile-profile pairwise alignment method that integrates secondary structure, solvent accessibility, torsion angle and inferred residue pair coupling information. The evaluation showed that the secondary structure, relative solvent accessibility and torsion angle information significantly improved the alignment accuracy in comparison with the state-of-the-art methods HHsearch and HHsuite. The evolutionary constraint information did help in some cases, especially for alignments of short proteins, typically 100 to 500 residues. Protein model selection is also a key step in protein tertiary structure prediction. We developed two SVM model quality assessment methods that take the query-template alignment as input. The assessment results illustrated that this could help improve model selection, protein structure prediction and many other bioinformatics problems. Moreover, we also developed a protein tertiary structure prediction pipeline, many components of which were built into our group's MULTICOM system. MULTICOM performed well in the CASP10 (Critical Assessment of Techniques for Protein Structure Prediction) competition.

Literature Survey of Protein Secondary Structure Prediction

Jurnal Teknologi ◽

10.11113/jt.v34.642 ◽

2012 ◽

Author(s):

Satya Nanda Vel Arjunan ◽

Safaai Deris ◽

Rosli Md Illias

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Structure Prediction ◽

Large Scale ◽

Secondary Structure Prediction ◽

Protein Structures ◽

Protein Secondary Structure ◽

Fundamental Theory ◽

Protein Secondary Structure Prediction ◽

General Guide

In the wake of large-scale DNA sequencing projects, accurate tools are needed to predict protein structures. The problem of predicting protein structure from DNA sequence remains fundamentally unsolved even after more than three decades of intensive research. In this paper, the fundamental theory of protein structure will be presented as a general guide to protein secondary structure prediction research. An overview of the state of the art in sequence analysis and some principles of the methods involved will be described. Key words: protein secondary structure prediction; neural networks.

Prediction of Protein Secondary Structure

Jurnal Teknologi ◽

10.11113/jt.v35.605 ◽

2012 ◽

Author(s):

Satya Nanda Vel Arjunan ◽

Safaai Deris ◽

Rosli Md Illias

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Structure Prediction ◽

Large Scale ◽

Secondary Structure Prediction ◽

State Of The Art ◽

Protein Structures ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction ◽

General Guide

In the wake of large-scale DNA sequencing projects, accurate tools are needed to predict protein structures. The problem of predicting protein structure from DNA sequence remains fundamentally unsolved even after more than three decades of intensive research. In this paper, the fundamental theory of protein structure will be presented as a general guide to protein secondary structure prediction research. An overview of the state of the art in sequence analysis and some principles of the methods involved will be described. Key words: Protein secondary structure prediction; Neural networks
