scholarly journals Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3561 ◽  
Author(s):  
Ravindra Kumar ◽  
Bandana Kumari ◽  
Manish Kumar

BackgroundThe endoplasmic reticulum plays an important role in many cellular processes, which includes protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site for quality control of misfolded proteins and entry point of extracellular proteins to the secretory pathway. Hence at any given point of time, endoplasmic reticulum contains two different cohorts of proteins, (i) proteins involved in endoplasmic reticulum-specific function, which reside in the lumen of the endoplasmic reticulum, called as endoplasmic reticulum resident proteins and (ii) proteins which are in process of moving to the extracellular space. Thus, endoplasmic reticulum resident proteins must somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Approximately only 50% of the proteins used in this study as training data had endoplasmic reticulum retention signal, which shows that these signals are not essentially present in all endoplasmic reticulum resident proteins. This also strongly indicates the role of additional factors in retention of endoplasmic reticulum-specific proteins inside the endoplasmic reticulum.MethodsThis is a support vector machine based method, where we had used different forms of protein features as inputs for support vector machine to develop the prediction models. During trainingleave-one-outapproach of cross-validation was used. Maximum performance was obtained with a combination of amino acid compositions of different part of proteins.ResultsIn this study, we have reported a novel support vector machine based method for predicting endoplasmic reticulum resident proteins, named as ERPred. During training we achieved a maximum accuracy of 81.42% withleave-one-outapproach of cross-validation. When evaluated on independent dataset, ERPred did prediction with sensitivity of 72.31% and specificity of 83.69%. We have also annotated six different proteomes to predict the candidate endoplasmic reticulum resident proteins in them. A webserver, ERPred, was developed to make the method available to the scientific community, which can be accessed athttp://proteininformatics.org/mkumar/erpred/index.html.DiscussionWe found that out of 124 proteins of the training dataset, only 66 proteins had endoplasmic reticulum retention signals, which shows that these signals are not an absolute necessity for endoplasmic reticulum resident proteins to remain inside the endoplasmic reticulum. This observation also strongly indicates the role of additional factors in retention of proteins inside the endoplasmic reticulum. Our proposed predictor, ERPred, is a signal independent tool. It is tuned for the prediction of endoplasmic reticulum resident proteins, even if the query protein does not contain specific ER-retention signal.

2020 ◽  
Vol 15 ◽  
Author(s):  
Yiyin Cao ◽  
Chunlu Yu ◽  
Shenghui Huang ◽  
Shiyuan Wang ◽  
Yongchun Zuo ◽  
...  

Background: Presynaptic and postsynaptic neurotoxins are two important neurotoxins. Due to the important role of presynaptic and postsynaptic neurotoxins in pharmacology and neuroscience, identification of them becomes very important in biology. Method: In this study, the statistical test and F-score were used to calculate the difference between amino acids and biological properties. The support vector machine was used to predict the presynaptic and postsynaptic neurotoxins by using the reduced amino acid alphabet types. Results: By using the reduced amino acid alphabet as the input parameters of support vector machine, the overall accuracy of our classifier had increased to 91.07%, which was the highest overall accuracy in this study. When compared with the other published methods, better predictive results were obtained by our classifier. Conclusion: In summary, we analyzed the differences between two neurotoxins in amino acids and biological properties, and constructed a classifier that could predict these two neurotoxins by using the reduced amino acid alphabet.


2018 ◽  
Vol 1 (1) ◽  
pp. 120-130 ◽  
Author(s):  
Chunxiang Qian ◽  
Wence Kang ◽  
Hao Ling ◽  
Hua Dong ◽  
Chengyao Liang ◽  
...  

Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Meanwhile, several mathematical models, such as Artificial Neural Network (ANN) and Decision Tree (DT), were also built and compared with SVM to determine which one could make the most accurate predictions. The material factors and environmental factors that influence the results were considered. The materials factors mainly involved the original concrete strength, the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentration of Mg2+, SO42-, Cl-, temperature and exposing time. It was concluded from the prediction results that the optimized SVM model appeared to perform better than other models in predicting the concrete strength. Based on SVM model, a simulation method of variables limitation was used to determine the sensitivity of various factors and the influence degree of these factors on the degradation of concrete strength.


2019 ◽  
Vol 17 ◽  
Author(s):  
Yanqiu Yao ◽  
Xiaosa Zhao ◽  
Qiao Ning ◽  
Junping Zhou

Background: Glycation is a nonenzymatic post-translational modification process by attaching a sugar molecule to a protein or lipid molecule. It may impair the function and change the characteristic of the proteins which may lead to some metabolic diseases. In order to understand the underlying molecular mechanisms of glycation, computational prediction methods have been developed because of their convenience and high speed. However, a more effective computational tool is still a challenging task in computational biology. Methods: In this study, we showed an accurate identification tool named ABC-Gly for predicting lysine glycation sites. At first, we utilized three informative features, including position-specific amino acid propensity, secondary structure and the composition of k-spaced amino acid pairs to encode the peptides. Moreover, to sufficiently exploit discriminative features thus can improve the prediction and generalization ability of the model, we developed a two-step feature selection, which combined the Fisher score and an improved binary artificial bee colony algorithm based on support vector machine. Finally, based on the optimal feature subset, we constructed the effective model by using Support Vector Machine on the training dataset. Results: The performance of the proposed predictor ABC-Gly was measured with the sensitivity of 76.43%, the specificity of 91.10%, the balanced accuracy of 83.76%, the area under the receiver-operating characteristic curve (AUC) of 0.9313, a Matthew’s Correlation Coefficient (MCC) of 0.6861 by 10-fold cross-validation on training dataset, and a balanced accuracy of 59.05% on independent dataset. Compared to the state-of-the-art predictors on the training dataset, the proposed predictor achieved significant improvement in the AUC of 0.156 and MCC of 0.336. Conclusion: The detailed analysis results indicated that our predictor may serve as a powerful complementary tool to other existing methods for predicting protein lysine glycation. The source code and datasets of the ABC-Gly were provided in the Supplementary File 1.


2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Bin Zhang ◽  
Jinke Gong ◽  
Wenhua Yuan ◽  
Jun Fu ◽  
Yi Huang

In order to effectively predict the sieving efficiency of a vibrating screen, experiments to investigate the sieving efficiency were carried out. Relation between sieving efficiency and other working parameters in a vibrating screen such as mesh aperture size, screen length, inclination angle, vibration amplitude, and vibration frequency was analyzed. Based on the experiments, least square support vector machine (LS-SVM) was established to predict the sieving efficiency, and adaptive genetic algorithm and cross-validation algorithm were used to optimize the parameters in LS-SVM. By the examination of testing points, the prediction performance of least square support vector machine is better than that of the existing formula and neural network, and its average relative error is only 4.2%.


Sign in / Sign up

Export Citation Format

Share Document