Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique

2015 ◽  
Vol 9 ◽  
pp. BBI.S26864 ◽  
Author(s):  
Hebatallah Hassan ◽  
Amr Badr ◽  
M. B. Abdelhalim

O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at specific serine (S) or threonine (T) residues. Several O-glycosylation site predictors have been developed. However, a need for even better prediction tools remains. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a new classification approach based on particle swarm optimization (PSO) and random forest (RF); this approach addresses the imbalanced dataset problem. The PSO parameter settings used in the training process affect the classification accuracy. Thus, in this paper, we optimize the parameters of the PSO algorithm with a genetic algorithm in order to increase the classification accuracy. Our proposed genetic algorithm-based approach outperformed existing predictors in terms of area under the receiver operating characteristic curve. In addition, we implemented a glycosylation predictor tool based on that approach, and we demonstrated that this tool could successfully identify candidate glycosylation sites in a case study protein.
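The idea of tuning PSO parameters with a genetic algorithm can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `sphere` objective stands in for a classifier's validation error, and the parameter bounds, population sizes, and every function name are assumptions chosen for illustration.

```python
import random

def sphere(x):
    # Toy objective (assumption): stands in for a classifier's validation error.
    return sum(v * v for v in x)

def pso(objective, w, c1, c2, dim=3, swarm=10, iters=30, seed=0):
    """Run a basic global-best PSO and return the best objective value found."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so poor parameter choices cannot diverge.
                vel[i][d] = max(-10.0, min(10.0, vel[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest_val

# Assumed search ranges for inertia weight w and acceleration constants c1, c2.
BOUNDS = [(0.1, 1.0), (0.5, 2.5), (0.5, 2.5)]

def ga_tune_pso(pop_size=8, generations=5, mutation_rate=0.2, seed=1):
    """Search (w, c1, c2) with a small GA; fitness is the PSO result (lower is better)."""
    rng = random.Random(seed)

    def fitness(genome):
        return pso(sphere, *genome)

    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness)[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for k, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < mutation_rate:
                    child[k] = rng.uniform(lo, hi)  # mutation: resample one gene
            children.append(child)
        pop = parents + children
    best = min(pop, key=fitness)
    return best, fitness(best)

best_params, best_value = ga_tune_pso()
print(best_params, best_value)
```

In a real predictor the fitness function would presumably train and validate the PSO-based classifier for each candidate parameter set, which is far more expensive than this toy objective.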

2020 ◽  
Vol 27 (3) ◽  
pp. 178-186 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Varadharaju Nithya ◽  
Kuo-Chen Chou ◽  
Govindaraju Archunan

Background: N-glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs at the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understand the N-glycosylation mechanism. Objective: In this article, our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. Methods: We report a random forest method, Nglyc, to predict N-glycosylation sites from protein sequence using 315 sequence features. The method was trained on a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on a dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc prediction was compared with the NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: The Nglyc method achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with the NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods, with high sensitivity and specificity. Conclusion: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. The comparison study shows that our method performs better than the other methods. The applicability and success of our method were further evaluated using human and mouse N-glycosylation sites. The Nglyc method is freely available at https://github.com/bioinformaticsML/Ngly.
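The reported accuracy, sensitivity, and specificity can be related to each other with a short worked example. The confusion-matrix counts below are not stated in the abstract; they are an assumption, inferred as the whole-number counts that reproduce the reported rates on a test set of 295 positive and 253 negative sites.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of true sites recovered
    specificity = tn / (tn + fp)                 # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all calls correct
    return sensitivity, specificity, accuracy

# Assumed counts (inferred, not reported): 245 + 50 = 295 positives, 207 + 46 = 253 negatives.
sn, sp, acc = confusion_metrics(tp=245, fn=50, tn=207, fp=46)
print(round(sn, 4), round(sp, 4), round(acc, 4))  # 0.8305 0.8182 0.8248
```

Because the test set is nearly balanced, the overall accuracy falls between the sensitivity and the specificity, weighted by the class sizes.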


Molecules ◽  
2021 ◽  
Vol 26 (23) ◽  
pp. 7314
Author(s):  
Subash C. Pakhrin ◽  
Kiyoko F. Aoki-Kinoshita ◽  
Doina Caragea ◽  
Dukka B. KC

Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most of the existing predictors for N-linked glycosylation utilize the information that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon. DeepNGlyPred will be a useful resource for the glycobiology community.
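The N-X-[S/T] sequon rule described above can be sketched as a simple candidate-site scan. This is only the necessary-condition filter, not DeepNGlyPred or any other predictor; the function name and the example sequences are hypothetical.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNTS") both be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(sequence):
    """Return 1-based positions of asparagines that start an N-X-[S/T] sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

print(find_sequons("ANGTNPSANKSNAT"))  # [2, 9, 12]
```

The asparagine at position 5 is skipped because the following residue is proline. Every position returned is only a candidate site, since not all sequons are glycosylated.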


Viruses ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1281
Author(s):  
Wei Liu ◽  
Junhua Li ◽  
Hongli Du ◽  
Zhihua Ou

Human papillomavirus type 16 (HPV16) is the most prevalent HPV type causing cervical cancers. Herein, using 1597 full genomes, we systematically investigated the mutation profiles, surface protein glycosylation sites and the codon usage bias (CUB) of HPV16 from different lineages and sublineages. Multiple lineage- or sublineage-conserved mutation sites were identified. Glycosylation analysis showed that, relative to lineage A, HPV16 lineage D contained the highest number of differing glycosylation sites in both the L1 and L2 capsid proteins, which might contribute to the antigenic distance between the two lineages. CUB analysis showed that the HPV16 open reading frames (ORFs) preferred codons ending with A/T. The CUB of HPV16 ORFs was mainly affected by natural selection except for E1, E5 and L2. HPV16 only shared some of the preferred codons with humans, which might help reduce competition for translational resources. These findings increase our understanding of the heterogeneity between HPV16 lineages and sublineages, and the adaptation mechanism of HPV in human cells. In summary, this study might facilitate HPV classification and improve vaccine development and application.
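A preference for codons ending in A/T can be quantified in several ways. The abstract does not say which codon usage statistics the study used, so the following is only a minimal sketch of one simple measure, the fraction of codons with A or T at the third position; the function name and the example ORF are hypothetical.

```python
def at3_fraction(orf):
    """Fraction of complete codons whose third base is A or T."""
    seq = orf.upper().replace("U", "T")   # accept RNA input as well
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(codon[2] in "AT" for codon in codons) / len(codons)

# Codons ATG, AAA, TTT, GGC: third bases G, A, T, C.
print(at3_fraction("ATGAAATTTGGC"))  # 0.5
```

A value well above 0.5 across an ORF would be consistent with a preference for A/T-ending codons, although a full analysis would also need to account for overall base composition.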


2020 ◽  
Vol 19 (3) ◽  
pp. 529-539 ◽  
Author(s):  
Freja Scheys ◽  
Els J. M. Van Damme ◽  
Jarne Pauwels ◽  
An Staes ◽  
Kris Gevaert ◽  
...  

Glycosylation is a common modification of proteins and critical for a wide range of biological processes. Differences in protein glycosylation between sexes have already been observed in humans, nematodes and trematodes, and have recently also been reported in the rice pest insect Nilaparvata lugens. Although protein N-glycosylation in insects is nowadays of high interest because of its potential for exploitation in pest control strategies, the functionality of differential N-glycosylation between sexes is still unknown. In this study, therefore, the occurrence and role of sex-related protein N-glycosylation in insects were examined. A comprehensive investigation of the N-glycosylation sites from the adult stages of N. lugens was conducted, allowing a qualitative and quantitative comparison between sexes at the glycopeptide level. N-glycopeptide enrichment via lectin capturing using the high mannose/paucimannose-binding lectin Concanavalin A, or the Rhizoctonia solani agglutinin which interacts with complex N-glycans, resulted in the identification of over 1300 N-glycosylation sites derived from over 600 glycoproteins. Comparison of these N-glycopeptides revealed striking differences in protein N-glycosylation between sexes. Male- and female-specific N-glycosylation sites were identified, and some of these sex-specific N-glycosylation sites were shown to be derived from proteins with a putative role in insect reproduction. In addition, differential glycan composition between males and females was observed for proteins shared across sexes. Both lectin blotting experiments and transcript expression analyses with complete insects and insect tissues confirmed the observed differences in N-glycosylation of proteins between sexes. In conclusion, this study provides further evidence for protein N-glycosylation to be sex-related in insects. Furthermore, original data on N-glycosylation sites of N. lugens adults are presented, providing novel insights into planthopper biology and information for future biological pest control strategies.


Author(s):  
Jian Liu ◽  
Yuchen Zheng ◽  
Ke Dong ◽  
Haitong Yu ◽  
Jianjun Zhou ◽  
...  

In the classification of fashion article images for e-commerce image recommendation systems, the classification accuracy and computation time do not meet practical requirements. Herein, for the first time to our knowledge, we present two distinct image recognition approaches for the classification of fashion article images, a random-forest method based on a genetic algorithm (GA-RF) and the Visual Geometry Group-Image Enhancement algorithm (VGG-IE), to address the classification accuracy and computation time problems. In GA-RF, the number of segmentation times and the number of decision trees are the key factors affecting the classification results. An improved genetic algorithm is introduced into the parameter optimization of the forest to determine the optimal combination of the two parameters with minimal manual intervention. Finally, we propose six different deep neural network architectures, including VGG-IE, to improve classification accuracy. The VGG-IE algorithm uses batch normalization and seven kinds of training-data augmentation to ease and promote the learning process. We investigate the effectiveness of the proposed methods using the Fashion-MNIST dataset and 70,000 pictures. Experimental results demonstrate that, in comparison with the state-of-the-art algorithms for 10 categories of image recognition, our VGG algorithm has the shortest computational time when it satisfies a certain classification accuracy. The VGG-IE approach has the highest classification accuracy.


1995 ◽  
Vol 311 (3) ◽  
pp. 959-967 ◽  
Author(s):  
C Kronman ◽  
B Velan ◽  
D Marcus ◽  
A Ordentlich ◽  
S Reuveny ◽  
...  

The possible role of post-translational modifications such as subunit oligomerization, protein glycosylation and oligosaccharide processing on the circulatory life-time of proteins was studied using recombinant human acetylcholinesterase (rHuAChE). Different preparations of rHuAChE containing various amounts of tetramers, dimers and monomers are cleared at similar rates from the circulation, suggesting that oligomerization does not play an important role in determining the rate of clearance. An engineered rHuAChE mutant containing only one N-glycosylation site was cleared from the circulation more rapidly than the wild-type triglycosylated enzyme. On the other hand, hyperglycosylated mutants containing either four or five occupied N-glycosylation sites, analogous to those present on the slowly cleared fetal bovine serum acetylcholinesterase (FBS-AChE), were also cleared more rapidly from the bloodstream than the wild-type species. Furthermore, the two different tetraglycosylated mutants were cleared at different rates while the pentaglycosylated mutant exhibited the most rapid clearance profile. These results imply that though the number of N-glycosylation sites plays a role in the circulatory life-time of the enzyme, the number of N-glycan units in itself does not determine the rate of clearance. When saturating amounts of asialofetuin were administered together with rHuAChE, the circulatory half-life of the enzyme was dramatically increased (from 80 min to 19 h) and was found to be similar to that displayed by plasma-derived cholinesterases, while desialylation of these enzymes caused a sharp decrease in the circulatory half-life to approximately 3-5 min. Determination of the average number of sialic acid residues per enzyme subunit of the five different N-glycosylation species generated revealed that the rate of clearance is not a function of the absolute number of appended sialic acid moieties but rather of the number of unoccupied sialic acid attachment sites per enzyme molecule. Specifically, we demonstrate an inverse-linear relationship between the number of vacant sialic acid attachment sites and the values of the enzyme residence time within the bloodstream.


2021 ◽  
Vol 15 ◽  
Author(s):  
Alhassan Alkuhlani ◽  
Walaa Gad ◽  
Mohamed Roushdy ◽  
Abdel-Badeeh M. Salem

Background: Glycosylation is one of the most common post-translational modifications (PTMs) in the cells of organisms. It plays important roles in several biological processes, including cell-cell interaction, protein folding, antigen recognition, and immune response. In addition, glycosylation is associated with many human diseases such as cancer, diabetes and coronavirus infections. The experimental techniques for identifying glycosylation sites are time-consuming and expensive, and they require extensive laboratory work. Therefore, computational intelligence techniques are becoming very important for glycosylation site prediction. Objective: This paper is a theoretical discussion of the technical aspects of applying biotechnological methods (e.g., artificial intelligence and machine learning) to digital bioinformatics research and intelligent biocomputing. Computational intelligence techniques have shown efficient results for predicting N-linked, O-linked and C-linked glycosylation sites. In the last two decades, many studies have been conducted on glycosylation site prediction using these techniques. In this paper, we analyze and compare a wide range of the intelligent techniques used in these studies from multiple aspects. The current challenges and difficulties facing software developers and knowledge engineers in predicting glycosylation sites are also included. Method: The comparison between these studies covers many criteria, such as databases, feature extraction and selection, machine learning classification methods, evaluation measures and performance results. Results and conclusions: Many challenges and problems are presented. Consequently, more effort is needed to obtain more accurate prediction models for the three basic types of glycosylation sites.


2018 ◽  
Vol 145 ◽  
pp. 488-494 ◽  
Author(s):  
Aleksandr Sboev ◽  
Alexey Serenko ◽  
Roman Rybka ◽  
Danila Vlasov ◽  
Andrey Filchenkov
