Peptide Secondary Structure Prediction using Evolutionary Information

2019 ◽  
Author(s):  
Harinder Singh ◽  
Sandeep Singh ◽  
Gajendra Pal Singh Raghava

Background: Many methods have been developed for predicting the secondary structure of proteins, but to the best of the authors' knowledge, none has been developed specifically for peptides. We analyzed the secondary structure of peptides and proteins and observed that the same peptide can adopt a different secondary structure when it occurs within a protein. Given the wide use of peptides in the therapeutic market, we developed a method called PEP2D for predicting the secondary structure of peptides. Results: In this study, 3107 unique peptides were used to train, test and evaluate peptide secondary structure prediction models. Regular secondary structure content (e.g., helix, beta-sheet) increased with peptide length. First, models based on various machine-learning techniques were developed using the binary profile of peptides; these achieved a maximum overall accuracy (Q3) of 79.5%. Performance improved from 79.5% to 83.5% when evolutionary information in the form of a PSSM profile was used. We also evaluated the protein secondary structure prediction method PSIPRED on our dataset; it achieved a maximum accuracy of 76.9% and performed particularly poorly (Q3 71.4%) on small peptides shorter than 10 residues. Overall, PEP2D predicts the beta-sheet (Q3 74%) and coil (Q3 87%) regions of peptides better than PSIPRED (Q3 54.4% for beta-sheet and 77.9% for coil). Measured by segment overlap (SOV), PSIPRED and PEP2D achieved 69.3 and 76.7, respectively. Conclusion: Our observations indicate a need for a separate method for predicting the secondary structure of peptides. Models based on the PSSM profile also performed worse on small peptides than on long peptides. Based on our study, we developed a method for predicting the secondary structure of peptides. A webserver and a standalone version are available at https://webs.iiitd.edu.in/raghava/pep2d/.
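
The Q3 figures quoted in this abstract are per-residue accuracies over three states. As a minimal sketch (not the PEP2D implementation), the function below computes an overall Q3 and a per-state accuracy from an observed and a predicted string. The H/E/C letter codes, the function name `q3_scores`, the example strings, and the per-state definition (correct residues of a state divided by observed residues of that state) are assumptions made for illustration.

```python
from collections import Counter


def q3_scores(observed: str, predicted: str) -> dict:
    """Return overall Q3 and per-state accuracy (in percent) for H/E/C strings."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    if not observed:
        raise ValueError("empty input")

    # Number of residues observed in each state.
    totals = Counter(observed)
    # Number of residues whose predicted state matches the observed state.
    correct = Counter(o for o, p in zip(observed, predicted) if o == p)

    scores = {"Q3": 100.0 * sum(correct.values()) / len(observed)}
    for state in "HEC":
        if totals[state]:  # skip states absent from the observed string
            scores[f"Q_{state}"] = 100.0 * correct[state] / totals[state]
    return scores


# Hypothetical 15-residue peptide: 13 of 15 residues are predicted correctly.
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEECCC"
scores = q3_scores(observed, predicted)
print(scores)
```

Segment overlap (SOV) is a separate, segment-based measure and is not covered by this sketch.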

2004 ◽  
Vol 56 ◽  
pp. 305-327 ◽  
Author(s):  
Yann Guermeur ◽  
Gianluca Pollastri ◽  
André Elisseeff ◽  
Dominique Zelus ◽  
Hélène Paugam-Moisy ◽  
...  

2020 ◽  
Vol 15 (7) ◽  
pp. 767-777
Author(s):  
Lin Guo ◽  
Qian Jiang ◽  
Xin Jin ◽  
Lin Liu ◽  
Wei Zhou ◽  
...  

Background: Protein secondary structure prediction (PSSP) is a fundamental task in bioinformatics that is helpful for understanding the three-dimensional structure and biological function of proteins. Many neural network-based prediction methods have been developed for protein secondary structures. Deep learning and multiple features are two obvious means to improve prediction accuracy. Objective: To promote the development of PSSP, a deep convolutional neural network-based method is proposed to predict both the eight-state and the three-state secondary structure of proteins. Methods: In this model, sequence and evolutionary information of proteins are combined as multiple input features after preprocessing. A deep convolutional neural network with no pooling layer or connection layer is then constructed to predict the secondary structure of proteins. L2 regularization, batch normalization, and dropout techniques are employed to avoid over-fitting and obtain better prediction performance, and an improved cross-entropy is used as the loss function. Results: Our proposed model obtains Q3 prediction results of 86.2%, 84.5%, 87.8%, and 84.7% on the CullPDB, CB513, CASP10 and CASP11 datasets, respectively, with corresponding Q8 prediction results of 74.1%, 70.5%, 74.9%, and 71.3%. Conclusion: We have proposed DCNN-SS, a PSSP method based on a deep convolutional network, and experimental results show that DCNN-SS performs competitively with other methods.
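
Eight-state (Q8) and three-state (Q3) labels are related by a reduction from the DSSP alphabet. The sketch below shows one commonly used reduction; the abstract does not say which mapping DCNN-SS uses, so the table `DSSP_8_TO_3`, the function name, and the treatment of `L`, `-` and `C` as coil are assumptions for illustration only.

```python
# One common reduction of DSSP eight-state labels to three states.
# Assumption: helices G and I are merged into H, bridge B into E,
# and everything else is treated as coil (C).
DSSP_8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # alpha-, 3-10- and pi-helix -> helix
    "E": "E", "B": "E",             # extended strand, isolated bridge -> strand
    "T": "C", "S": "C",             # turn, bend -> coil
    "L": "C", "-": "C", "C": "C",   # loop / unassigned -> coil
}


def reduce_q8_to_q3(q8: str) -> str:
    """Map an eight-state label string to a three-state H/E/C string."""
    try:
        return "".join(DSSP_8_TO_3[state] for state in q8)
    except KeyError as err:
        raise ValueError(f"unknown eight-state label: {err.args[0]!r}") from None


q8_labels = "LLHHHGGTSEEBB-"
q3_labels = reduce_q8_to_q3(q8_labels)
print(q3_labels)
```

Other reductions exist (for example, keeping G or B as coil), and the choice changes the reported Q3, so figures from different papers are only comparable when the mapping matches.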


1992 ◽  
Vol 288 (1) ◽  
pp. 35-40 ◽  
Author(s):  
N Bihoreau ◽  
M P Fontaine-Aupart ◽  
A Lehegarat ◽  
M Desmadril ◽  
J M Yon

The first analysis of the secondary structure of human factor VIII light chain was performed by c.d. spectroscopy. The purification process described in this paper allowed us to obtain the large amounts of purified factor VIII light chains required for c.d. experiments. Since this 80 kDa protein is non-covalently associated with a heavy chain to form the active molecule, isolated factor VIII light chains were obtained after immunoadsorption and dissociation of the immobilized active complexes by EDTA. Furthermore, factor VIII light chains were discriminated from the residual active complexes and the free heavy chains by a final ion-exchange-chromatography step. This f.p.l.c. analysis showed that factor VIII light chains were less electronegative than the active complexes. The results of conformational analysis by c.d. show that the protein possesses a high degree of regular secondary structure (58%) with approx. 22% of alpha-helix and 36% of beta-strand structures. The protein was completely unfolded by 3 M-guanidine hydrochloride. The results obtained from the analysis of c.d. spectra were compared with those predicted from three different statistical methods based on amino-acid sequence. The secondary structure information obtained from these methods was in good agreement with the c.d. results. These results were comparable with the secondary structure prediction of ceruloplasmin, a protein known to show sequence identity to factor VIII.


2020 ◽  
Vol 8 (1) ◽  
pp. 36-50 ◽  
Author(s):  
Devin Willmott ◽  
David Murrugarra ◽  
Qiang Ye

The problem of determining which nucleotides of an RNA sequence are paired or unpaired in the secondary structure of an RNA, which we call RNA state inference, can be studied by different machine learning techniques. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Typical tools for state inference, such as hidden Markov models, exhibit poor performance in RNA state inference, owing in part to their inability to recognize nonlocal dependencies. Bidirectional long short-term memory (LSTM) neural networks have emerged as a powerful tool that can model global nonlinear sequence dependencies and have achieved state-of-the-art performance on many different classification problems. This paper presents a practical approach to RNA secondary structure inference centered around a deep learning method for state inference. State predictions from a deep bidirectional LSTM are used to generate synthetic SHAPE data that can be incorporated into RNA secondary structure prediction via the Nearest Neighbor Thermodynamic Model (NNTM). This method produces predicted secondary structures for a diverse test set of 16S ribosomal RNA that are, on average, 25 percentage points more accurate than undirected MFE structures. Accuracy is highly dependent on the success of our state inference method, and investigating the global features of our state predictions reveals that the accuracy of both our state inference and structure inference methods is highly dependent on the similarity of the pairing patterns of the sequence to the training dataset. Availability of a large training dataset is critical to the success of this approach. Code available at https://github.com/dwillmott/rna-state-inf.
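
State inference needs a paired/unpaired label for every nucleotide. As a small illustration of what such labels look like (not code from the linked repository), the sketch below derives them from a known structure. It assumes the structure is written in dot-bracket notation without pseudoknots; the function name `pairing_states` and the example structure are hypothetical.

```python
from typing import List


def pairing_states(dot_bracket: str) -> List[int]:
    """Return 1 for each paired nucleotide and 0 for each unpaired one.

    Assumes pseudoknot-free dot-bracket notation using only '(', ')' and '.'.
    """
    depth = 0  # number of currently open base pairs
    states = []
    for position, symbol in enumerate(dot_bracket):
        if symbol == "(":
            depth += 1
            states.append(1)
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ')' at position {position}")
            states.append(1)
        elif symbol == ".":
            states.append(0)
        else:
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if depth != 0:
        raise ValueError("unmatched '(' in structure")
    return states


structure = "((..((...))..))"
labels = pairing_states(structure)
print(labels)
```

A sequence model trained on such labels outputs a per-nucleotide pairing score; how those scores are turned into synthetic SHAPE values is specific to the paper and is not reproduced here.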


1986 ◽  
Vol 236 (1) ◽  
pp. 127-130 ◽  
Author(s):  
L Sawyer ◽  
L A Fothergill-Gilmore ◽  
G A Russell

The results of several secondary-structure prediction programs were combined to produce an estimate of the regions of alpha-helix, beta-sheet and reverse turn for both chicken skeletal-muscle and yeast enolase sequences. The predicted secondary-structure content of the chicken enzyme is 27% alpha-helix and less than 10% beta-sheet, whereas in the yeast enolase a similar helix content but virtually no sheet are predicted. These results are in fair agreement with published experimental estimates of the amount of secondary structure in the yeast enzyme. The enzyme appears to be formed from three domains.
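
The abstract does not say how the outputs of the individual programs were combined. One simple possibility, shown here purely as an assumption, is a per-residue majority vote followed by a count of each state to estimate secondary-structure content. The function names, the H/E/C letter codes, the tie-breaking rule, and the example predictions are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def consensus(predictions: Sequence[str], tie_state: str = "C") -> str:
    """Combine equal-length prediction strings by per-residue majority vote.

    Ties between the most frequent states fall back to `tie_state`.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        top_state, top_votes = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_votes:
            top_state = tie_state  # no clear majority at this residue
        result.append(top_state)
    return "".join(result)


def content(structure: str, state: str) -> float:
    """Percentage of residues assigned to `state`."""
    return 100.0 * structure.count(state) / len(structure)


# Three hypothetical predictors for an 8-residue fragment.
predictions = ["HHHHCCEE", "HHHCCCEE", "HHCCCCCE"]
combined = consensus(predictions)
helix_percent = content(combined, "H")
sheet_percent = content(combined, "E")
print(combined, helix_percent, sheet_percent)
```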


Author(s):  
JAYAVARDHANA GUBBI ◽  
DANIEL T. H. LAI ◽  
MARIMUTHU PALANISWAMI ◽  
MICHAEL PARKER

Knowledge of the secondary structure and solvent accessibility of a protein plays a vital role in the prediction of fold, and eventually the tertiary structure of the protein. A challenging issue of predicting protein secondary structure from sequence alone is addressed. Support vector machines (SVM) are employed for the classification and the SVM outputs are converted to posterior probabilities for multi-class classification. The effect of using Chou–Fasman parameters and physico-chemical parameters along with evolutionary information in the form of a position-specific scoring matrix (PSSM) is analyzed. The proposed methods are tested on the RS126 and CB513 datasets. A new dataset (PSS504) is curated using a recent release of CATH. On the CB513 dataset, a sevenfold cross-validation accuracy of 77.9% was obtained using the proposed encoding method. A new method of calculating the reliability index based on the number of votes and the SVM decision value is also proposed. A blind test on the EVA dataset gives an average Q3 accuracy of 74.5% and ranks among the top five protein structure prediction methods. Supplementary material including datasets are available on .
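
The abstract states that SVM outputs are converted to posterior probabilities but does not name the conversion. A widely used option is a Platt-style sigmoid applied to each binary decision value, followed by normalization across the three classes; the sketch below illustrates that option only. The parameters `a` and `b` would normally be fitted on held-out data and are fixed here for illustration, and the function names and decision values are hypothetical.

```python
import math
from typing import Dict


def sigmoid_probability(decision_value: float, a: float = -1.0, b: float = 0.0) -> float:
    """Platt-style mapping of an SVM decision value to a probability.

    P(class | f) = 1 / (1 + exp(a * f + b)); a and b are assumed, not fitted.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def class_posteriors(decision_values: Dict[str, float],
                     a: float = -1.0, b: float = 0.0) -> Dict[str, float]:
    """Normalize one-vs-rest sigmoid outputs so they sum to 1 across classes."""
    raw = {state: sigmoid_probability(value, a, b)
           for state, value in decision_values.items()}
    total = sum(raw.values())
    return {state: prob / total for state, prob in raw.items()}


# Hypothetical one-vs-rest decision values for a single residue.
decision_values = {"H": 1.2, "E": -0.4, "C": 0.1}
posteriors = class_posteriors(decision_values)
predicted_state = max(posteriors, key=posteriors.get)
print(posteriors, predicted_state)
```

With a negative `a`, larger decision values map to larger probabilities, so the ranking of classes is preserved and only the scale changes.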


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254555
Author(s):  
Teng-Ruei Chen ◽  
Chia-Hua Lo ◽  
Sheng-Hung Juan ◽  
Wei-Cheng Lo

The secondary structure prediction (SSP) of proteins has long been an essential structural biology technique with various applications. Despite its vital role in many research and industrial fields, in recent years, as the accuracy of state-of-the-art secondary structure predictors approaches the theoretical upper limit, SSP has been considered either no longer challenging or too challenging to advance. Believing that substantial improvement of SSP will move forward many fields that depend on it, we conducted this study, which focuses on three issues that have not been noticed or thoroughly examined but may have affected the reliability of the evaluation of previous SSP algorithms. These issues all concern the sequence homology between or within the development and evaluation datasets. We therefore designed many different homology layouts of datasets to train and evaluate SSP models. Multiple repeats were performed in each experiment by random sampling. The conclusions obtained with small experimental datasets were verified with large-scale datasets using state-of-the-art SSP algorithms. Contrary to the long-established assumption, we found that the sequence homology between query datasets for training, testing, and independent tests exerts little influence on SSP accuracy. In addition, sequence homology redundancy between or within most datasets leads to an overestimate of the accuracy of an SSP algorithm, whereas redundancy within the reference dataset used for extracting predictive features leads to an underestimate. Since the overestimating effects are more significant than the underestimating effect, the accuracy of some SSP methods might have been overestimated.
Based on these findings, we propose a rigorous procedure for developing SSP algorithms and making reliable evaluations, hoping to bring substantial improvements to future SSP methods and benefit all research and application fields relying on accurate prediction of protein secondary structures.


2021 ◽  
Author(s):  
Shutong Yang ◽  
Yuhong Wang ◽  
Kennie Cruz-Gutierrez ◽  
Fangling Wu ◽  
Chuan-Fan Ding

Background: Protein secondary structure prediction (PSSP) is important for protein structure modeling and design. Over the past few years, deep learning models have shown promising results for PSSP. However, the current best performers for PSSP often require evolutionary information such as multiple sequence alignments and even real protein structures (templates), entire protein sequences, and amino acid property profiles. Results: In this study, we used a fixed-size window of adjacent residues and only amino acid sequences, without any evolutionary information, as inputs, and developed a very simple yet accurate RNN model, LocalNet. The accuracy for three states of secondary structure is as high as 85.15%, indicating that the local amino acid sequence itself contains enough information for PSSP, a well-known classical view. Compared with other predictors, we also achieve state-of-the-art accuracy on the CASP11, CASP12 and CASP13 datasets. Conclusion: The well-trained models are expected to have good applications in protein structure modeling and protein design. The model can be downloaded from https://github.com/lake-chao/protein-secondary-structure-prediction.
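
A fixed-size window of adjacent residues can be turned into model input by one-hot encoding each position. The sketch below is an illustration of that general idea, not LocalNet's actual preprocessing: the window size, the padding channel for positions beyond the termini, the all-zero encoding of non-standard residues, and the function names are assumptions.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PAD_SYMBOL = "-"                               # marks positions past the termini
CHANNELS = len(AMINO_ACIDS) + 1                # 20 residues + 1 padding channel


def one_hot(residue: str) -> List[int]:
    """Encode one residue; non-standard residues (e.g. X) become all zeros."""
    vector = [0] * CHANNELS
    if residue == PAD_SYMBOL:
        vector[CHANNELS - 1] = 1
    elif residue in INDEX:
        vector[INDEX[residue]] = 1
    return vector


def encode_windows(sequence: str, window: int = 5) -> List[List[int]]:
    """Return one flattened one-hot window per residue, centered on that residue."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = PAD_SYMBOL * half + sequence + PAD_SYMBOL * half
    features = []
    for center in range(len(sequence)):
        flat: List[int] = []
        for residue in padded[center:center + window]:
            flat.extend(one_hot(residue))
        features.append(flat)
    return features


# Hypothetical 4-residue fragment with a window of 3 residues.
features = encode_windows("ACDK", window=3)
print(len(features), len(features[0]))
```

Each residue yields a vector of `window * 21` values, so the input size is independent of the sequence length, which is what allows a window-based model to run on sequences of any length.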

