Position-Specific Analysis and Prediction of Protein Pupylation Sites Based on Multiple Features

BioMed Research International ◽

10.1155/2013/109549 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9 ◽

Cited By ~ 15

Author(s):

Xiaowei Zhao ◽

Jiangyan Dai ◽

Qiao Ning ◽

Zhiqiang Ma ◽

Minghao Yin ◽

...

Keyword(s):

Posttranslational Modifications ◽

Computational Prediction ◽

Support Vector ◽

Good Prediction ◽

Accurate Identification ◽

Multiple Features ◽

Specific Analysis ◽

Experimental Approaches ◽

Optimal Feature

Pupylation is one of the most important posttranslational modifications of proteins; accurate identification of pupylation sites will facilitate the understanding of the molecular mechanism of pupylation. Besides conventional experimental approaches, computational prediction of pupylation sites is highly desirable for its convenience and speed. In this study, we developed a novel predictor of pupylation sites. First, the maximum relevance minimum redundancy (mRMR) and incremental feature selection methods were applied to five kinds of features to select the optimal feature set. Then the prediction model was built on the optimal feature set with the assistance of the support vector machine algorithm. As a result, the overall jackknife success rate of the new predictor on a newly constructed benchmark dataset was 0.764, and the Matthews correlation coefficient was 0.522, indicating good prediction performance. Feature analysis showed that all feature types contributed to the prediction of protein pupylation sites. Further site-specific feature analysis revealed that the features of sites surrounding the central lysine contributed more to the determination of pupylation sites than those of the other sites.

Identification of Protein Pupylation Sites Using Bi-Profile Bayes Feature Extraction and Ensemble Learning

Mathematical Problems in Engineering ◽

10.1155/2013/283129 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7 ◽

Cited By ~ 4

Author(s):

Xiaowei Zhao ◽

Jian Zhang ◽

Qiao Ning ◽

Pingping Sun ◽

Zhiqiang Ma ◽

...

Keyword(s):

Feature Extraction ◽

Correlation Coefficient ◽

Posttranslational Modifications ◽

Protein Identification ◽

Matthews Correlation Coefficient ◽

Predictive Performance ◽

Computational Prediction ◽

Training Dataset ◽

Support Vector ◽

Lysine Residues

Pupylation, one of the most important posttranslational modifications of proteins, typically takes place when prokaryotic ubiquitin-like protein (Pup) is attached to specific lysine residues on a target protein. Identification of pupylation substrates and their corresponding sites will facilitate the understanding of the molecular mechanism of pupylation. Compared with labor-intensive and time-consuming experimental approaches, computational prediction of pupylation sites is highly desirable for its convenience and speed. In this study, a new bioinformatics tool named EnsemblePup was developed that uses an ensemble of support vector machine classifiers to predict pupylation sites. The highlight of EnsemblePup is its use of Bi-profile Bayes feature extraction as the encoding scheme. Using 5-fold cross-validation on the training dataset, EnsemblePup achieved a sensitivity of 79.49%, a specificity of 82.35%, an accuracy of 85.43%, and a Matthews correlation coefficient of 0.617. When compared with other existing methods on a benchmark dataset, EnsemblePup provided better predictive performance, with a sensitivity of 80.00%, a specificity of 83.33%, an accuracy of 82.00%, and a Matthews correlation coefficient of 0.629. These results suggest that EnsemblePup may be useful for identifying and annotating potential pupylation sites in proteins of interest. A web server for predicting pupylation sites was also developed.
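
Bi-profile Bayes encoding, as it is usually described, builds two position-specific profiles from the training data: the frequency of each amino acid at each window position among positive (pupylated) peptides and among negative peptides. A query peptide of length n is then encoded as 2n numbers: the positive-profile probability of its residue at every position, followed by the negative-profile probability. The sketch below is a minimal illustration under that reading; the function names, the plain frequency estimate (no pseudocounts), and the toy peptides are assumptions, not EnsemblePup's code.

```python
from collections import Counter


def position_profiles(peptides):
    """Per-position amino-acid frequencies for a list of equal-length peptides."""
    total = len(peptides)
    profiles = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        profiles.append({aa: c / total for aa, c in counts.items()})
    return profiles


def bi_profile_encode(peptide, pos_profiles, neg_profiles):
    """Encode a peptide as [P+(s_1,1), ..., P+(s_n,n), P-(s_1,1), ..., P-(s_n,n)].

    Residues never seen at a position in a profile get probability 0.0.
    """
    pos_part = [pos_profiles[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    neg_part = [neg_profiles[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    return pos_part + neg_part


# Hypothetical toy windows (real windows would be centered on a lysine).
pos = position_profiles(["AK", "AK"])
neg = position_profiles(["CK", "AK"])
print(bi_profile_encode("CK", pos, neg))
```

In a cross-validation setting, the profiles should be rebuilt from the training folds only, so that a test peptide never contributes to its own encoding.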

Identification of Phage Virion Proteins by Using the g-gap Tripeptide Composition

Letters in Organic Chemistry ◽

10.2174/1570178615666180910112813 ◽

2019 ◽

Vol 16 (4) ◽

pp. 332-339

Author(s):

Liangwei Yang ◽

Hui Gao ◽

Zhen Liu ◽

Lixia Tang

Keyword(s):

Information Gain ◽

State Of The Art ◽

Support Vector ◽

Feature Subset ◽

Bacterial Cells ◽

Accurate Identification ◽

Incremental Feature Selection ◽

Optimal Feature Subset ◽

Phage Proteins ◽

Optimal Feature

Phages are widely distributed in locations populated by bacterial hosts. Phage proteins can be divided into two main categories, virion and non-virion proteins, which have different functions. In practice, researchers mainly use phage virion proteins to clarify the lysis mechanism of bacterial cells and to develop new antibacterial drugs. Accurate identification of phage virion proteins is therefore essential to understanding the phage lysis mechanism. Although several computational methods have been developed to identify virion proteins, their results are not yet satisfactory, leaving room for improvement. In this study, a new sequence-based method was proposed to identify phage virion proteins using g-gap tripeptide composition. In this approach, protein features were first extracted from the g-gap tripeptide composition. Subsequently, an optimal feature subset was obtained by performing incremental feature selection (IFS) with information gain. Finally, a support vector machine (SVM) was used as the classifier to discriminate virion proteins from non-virion proteins. In the 10-fold cross-validation test, the proposed method achieved an accuracy of 97.40% with an AUC of 0.9958, outperforming state-of-the-art methods. These results suggest that the proposed method is a promising approach to phage virion protein identification.
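
A g-gap tripeptide composition counts residue triples (a, b, c) in which b lies g + 1 positions after a and c lies g + 1 positions after b, giving 20^3 = 8000 features per protein; with g = 0 it reduces to the ordinary tripeptide composition. The sketch below assumes this spacing convention, upper-case one-letter sequences, and normalization by the number of valid windows; the paper's exact definition and normalization may differ, and the function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_tripeptide_composition(seq, g):
    """Normalized counts of residue triples seq[i], seq[i+g+1], seq[i+2g+2].

    Returns a dict keyed by all 20**3 = 8000 tripeptides. Windows containing
    non-standard residues (e.g. 'X') are skipped and excluded from the total.
    """
    step = g + 1
    counts = {"".join(t): 0 for t in product(AMINO_ACIDS, repeat=3)}
    for i in range(len(seq) - 2 * step):
        tri = seq[i] + seq[i + step] + seq[i + 2 * step]
        if tri in counts:
            counts[tri] += 1
    total = sum(counts.values())
    if total == 0:  # sequence too short or no valid windows
        return {k: 0.0 for k in counts}
    return {k: c / total for k, c in counts.items()}


# g = 1 on "ACDEA" has a single window: positions 0, 2, 4 -> "ADA".
print(g_gap_tripeptide_composition("ACDEA", 1)["ADA"])
```

Each value of g yields its own 8000-dimensional vector, which is why a feature-selection step such as IFS is needed before training the SVM.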

Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods

Molecules ◽

10.3390/molecules23082000 ◽

2018 ◽

Vol 23 (8) ◽

pp. 2000 ◽

Cited By ~ 15

Author(s):

Jiu-Xin Tan ◽

Fu-Ying Dao ◽

Hao Lv ◽

Peng-Mian Feng ◽

Hui Ding

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Support Vector ◽

Virion Protein ◽

Accurate Identification ◽

Machine Learning Methods ◽

Minimal Redundancy ◽

Maximal Relevance ◽

Optimal Feature ◽

Fold Cross Validation

Accurate identification of phage virion proteins is not only a key step toward understanding their function but also helps in further understanding the lysis mechanism of bacterial cells. Because traditional experimental methods for identifying phage virion proteins are time-consuming and costly, there is a pressing need for machine learning methods that identify them accurately and efficiently. In this work, a support vector machine (SVM) based method was proposed that combines multiple sets of optimal g-gap dipeptide compositions. Analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR) with incremental feature selection (IFS) were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for scientists studying phage virion proteins.
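
The mRMR-plus-IFS procedure can be sketched as two steps: rank features greedily by relevance to the label minus mean redundancy with the features already selected (the mutual-information-difference form of mRMR), then evaluate the top-1, top-2, … prefixes of that ranking and keep the best one. This sketch assumes features are already discretized and takes the evaluator (for example, cross-validated SVM accuracy) as a caller-supplied function; it omits the ANOVA pre-filtering step, and all names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr_rank(features, labels):
    """Greedy mRMR ranking; `features` is a list of discretized feature columns."""
    relevance = [mutual_information(f, labels) for f in features]
    remaining = list(range(len(features)))
    selected = []
    while remaining:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(features[j], features[s])
                             for s in selected)
            return relevance[j] - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(order, evaluate):
    """Score each prefix order[:k] with `evaluate`; return (best_k, best_score)."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(order) + 1):
        s = evaluate(order[:k])
        if s > best_score:
            best_k, best_score = k, s
    return best_k, best_score
```

In the real pipeline `evaluate` would train and cross-validate an SVM on the chosen columns, which makes IFS the expensive step; the mRMR ranking itself is computed only once.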

Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1770 ◽

2020 ◽

Vol 4 (2) ◽

pp. 329-335

Author(s):

Rusydi Umar ◽

Imam Riadi ◽

Purwono

Keyword(s):

Social Media ◽

Gradient Descent ◽

Classification Model ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Svm Algorithm ◽

Vector Machines ◽

Performance Patterns ◽

A Company

The failure of most startups in Indonesia is caused by teams whose performance lacks cohesion and competence. Programmers are an integral part of a startup team. Social media can be used as a strategic tool for recruiting the best programmer candidates for a company. This strategic tool takes the form of an automatic classification system for the social media posts of prospective programmers. The classification results are expected to predict the performance pattern of each candidate, labeling it as good or bad performance. To obtain an effective tool, the classification method with the best accuracy must be chosen, which requires comparing several methods. This study compares the Support Vector Machine (SVM), Random Forest (RF), and Stochastic Gradient Descent (SGD) classification methods. Under 10-fold cross-validation, the classification accuracy reached 81.3% for SVM, 74.4% for RF, and 80.1% for SGD, so SVM was chosen as the model for classifying programmer performance from social media activity.
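
The accuracies above come from 10-fold cross-validation. The sketch below shows one common way to produce the folds and pool the held-out predictions into a single accuracy; the shuffling, the round-robin fold assignment, and the `fit`/`predict` callback interface are assumptions, and any of the compared classifiers (SVM, RF, SGD) would plug in through those callbacks.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k disjoint folds,
    so fold sizes differ by at most one.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield sorted(train), sorted(folds[f])


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Pooled accuracy of held-out predictions over all k folds."""
    correct = 0
    for train, test in kfold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(y)


# Hypothetical baseline: always predict the training folds' majority label.
def majority_fit(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def majority_predict(model, x):
    return model
```

A majority-class baseline is a useful sanity check: a real classifier should clearly beat its cross-validated accuracy on the same folds.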

A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽

10.2174/0929866526666191002104735 ◽

2020 ◽

Vol 27 (4) ◽

pp. 329-336 ◽

Cited By ~ 1

Author(s):

Lei Xu ◽

Guangmin Liang ◽

Baowen Chen ◽

Xu Tan ◽

Huaikun Xiang ◽

...

Keyword(s):

Support Vector Machine ◽

Cell Wall ◽

Experimental Results ◽

Computational Method ◽

Lytic Enzyme ◽

Support Vector ◽

Lytic Enzymes ◽

Data Set ◽

Optimal Feature ◽

Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. Because drug resistance is becoming an increasingly serious problem with antibiotic use, cell lytic enzymes are a good choice for treating bacterial infections, and the study of cell wall lytic enzymes aims to find an efficient way to cure such infections. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which they break the cell wall. Identifying the type of a cell lytic enzyme is therefore meaningful for the study of cell wall enzymes.

Objective: Our motivation is to predict the type of cell lytic enzyme. Because determining this type by experimental methods is time-consuming, we propose an efficient computational method for predicting it.

Method: We propose a computational method for distinguishing endolysins from autolysins. First, a data set containing 27 endolysins and 41 autolysins was built. Each protein was then represented by its tripeptide composition, and features with larger confidence degrees were selected. Finally, a support vector machine classifier was trained on the labeled feature vectors and used to predict the type of cell lytic enzyme.

Results: The overall accuracy reached 97.06% when 44 features were selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06 - 92.9)/92.9). Performance remained stable when the number of selected features ranged from 40 to 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, whereas that of Chou's amphiphilic PseAAC method is 76.2%, so the tripeptide optimal feature set improves the overall accuracy by nearly 18 percentage points.

Conclusion: This paper proposes an efficient method for identifying endolysins and autolysins using a support vector machine. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than that of some existing methods. In conclusion, the 44 selected features improve the overall accuracy of identifying the type of cell lytic enzyme, and the support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
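
The two improvement figures in this abstract use different kinds of comparison: the gain over Ding's method is relative (the difference divided by 92.9), while the roughly 18-point gain over Chou's amphiphilic PseAAC is the absolute difference in accuracy. Computing both forms for each comparison, using only the accuracies quoted in the abstract, makes the distinction explicit.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


def absolute_gain(new, old):
    """Absolute improvement, in percentage points."""
    return new - old


# Versus Ding's method (92.9% overall accuracy).
print(round(relative_gain(97.06, 92.9), 2))  # → 4.48
print(round(absolute_gain(97.06, 92.9), 2))  # → 4.16
# Versus Chou's amphiphilic PseAAC (76.2% overall accuracy).
print(round(absolute_gain(94.12, 76.2), 2))  # → 17.92
print(round(relative_gain(94.12, 76.2), 2))  # → 23.52
```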

Possibility of Human Gender Recognition Using Raman Spectra of Teeth

Molecules ◽

10.3390/molecules26133983 ◽

2021 ◽

Vol 26 (13) ◽

pp. 3983

Author(s):

Ozren Gamulin ◽

Marko Škrabić ◽

Kristina Serec ◽

Matej Par ◽

Marija Baković ◽

...

Keyword(s):

Raman Spectra ◽

Principal Component ◽

Support Vector ◽

Gender Recognition ◽

Proof Of Concept ◽

Male And Female ◽

Tooth Type ◽

Tooth Apex ◽

The Difference

Gender determination of human remains can be very challenging, especially when the remains are incomplete. Herein, we report a proof-of-concept experiment investigating the possibility of gender recognition using Raman spectroscopy of teeth. Raman spectra were recorded from male and female molars and premolars at two distinct sites, the tooth apex and the anatomical neck. The recorded spectra were sorted into suitable datasets and initially analyzed with principal component analysis, which showed a distinction between the spectra of male and female teeth. Then, reduced datasets containing the scores of the first 20 principal components were formed, and two classification algorithms, support vector machine and artificial neural networks, were applied to build classification models for gender recognition. The results showed that gender recognition with Raman spectra of teeth is possible but strongly depends on both the tooth type and the spectrum recording site. The differences in classification accuracy between tooth types and recording sites are discussed in terms of differences in molecular structure caused by masticatory loading or gender-dependent life events.
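
The reduction step, which keeps the scores of the first 20 principal components as classifier inputs, can be sketched in pure Python with power iteration on the sample covariance matrix, restricting each search to the subspace orthogonal to the components already found. This is an illustrative stand-in rather than the authors' pipeline: the function names, iteration count, and seeding are assumptions, and for real spectra with hundreds of wavenumbers an SVD-based PCA from a linear-algebra library would be the practical choice.

```python
import math
import random


def _orthonormalize(v, basis):
    """Remove the components of v along each unit basis vector, then normalize.

    Returns None if nothing is left (v lies in the span of basis).
    """
    for c in basis:
        dot = sum(a * b for a, b in zip(v, c))
        v = [a - dot * b for a, b in zip(v, c)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def pca_scores(data, n_components, iters=500, seed=0):
    """Project each row of `data` onto its top `n_components` principal components.

    Assumes n_components <= number of columns.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in range(d)], components)
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = _orthonormalize(w, components)
            if w is None:  # no variance left in this subspace
                break
            v = w
        components.append(v)
    # Scores: coordinates of each centered row in the component basis.
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]
```

With hypothetical input, `pca_scores(spectra, n_components=20)` would return one 20-element row per spectrum, ready for an SVM or a neural network.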

The application of artificial neural networks and support vector regression for simultaneous spectrophotometric determination of commercial eye drop contents

Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy ◽

10.1016/j.saa.2017.11.056 ◽

2018 ◽

Vol 193 ◽

pp. 297-304 ◽

Cited By ~ 5

Author(s):

Maryam Valizadeh ◽

Mahmoud Reza Sohrabi

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Support Vector Regression ◽

Spectrophotometric Determination ◽

Support Vector ◽

Artificial Neural

Machine Learning Approaches for Prediction of the Compressive Strength of Alkali Activated Termite Mound Soil

Applied Sciences ◽

10.3390/app11114754 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4754

Author(s):

Assia Aboubakar Mahamat ◽

Moussa Mahamat Boukar ◽

Nurudeen Mahmud Ibrahim ◽

Tido Tiwa Stanislas ◽

Numfor Linda Bih ◽

...

Keyword(s):

Compressive Strength ◽

Construction Materials ◽

Curing Temperature ◽

Sustainable Construction ◽

Support Vector ◽

Learning Approaches ◽

Artificial Neural Network Ann ◽

Curing Regime ◽

Alkali Activated

Earth-based materials have shown promise for developing eco-friendly and sustainable construction materials. However, their unconventional use in construction makes estimating their properties difficult and inaccurate. Often, their properties are determined following procedures designed for conventional materials, which leads to an inaccurate understanding of these unconventional materials. To obtain more accurate properties, a support vector machine (SVM), an artificial neural network (ANN), and linear regression (LR) were used to predict the compressive strength of alkali-activated termite mound soil. In this study, activator concentration, Si/Al ratio, initial curing temperature, water absorption, weight, and curing regime were used as input parameters because of their significant effect on compressive strength. The experimental results show that SVM outperforms ANN and LR in terms of R² score and root mean square error (RMSE).
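
R² and RMSE, the two criteria used to compare SVM, ANN, and LR, are conventionally defined as R² = 1 - SS_res/SS_tot and RMSE = sqrt(mean squared error). The sketch below implements those textbook definitions; it is not the authors' evaluation code, and the function names and toy values are hypothetical.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (e.g. MPa)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# A model that always predicts the mean scores R² = 0.
print(r2_score([1, 2, 3], [2, 2, 2]), rmse([1, 2, 3], [2, 2, 2]))
```

Higher R² and lower RMSE both indicate a better fit; R² is scale-free, while RMSE keeps the units of compressive strength, so the two are usually reported together.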

Application of adaptive Neuro-fuzzy interference system, fuzzy interference system and least squares support vector machine for rapid simultaneous spectrophotometric determination of antipsychotic drugs in binary mixtures and biological fluid

Optik ◽

10.1016/j.ijleo.2021.166569 ◽

2021 ◽

Vol 232 ◽

pp. 166569

Author(s):

Mojdeh Alibakhshi ◽

Mahmoud Reza Sohrabi ◽

Mehran Davallo

Keyword(s):

Support Vector Machine ◽

Least Squares ◽

Spectrophotometric Determination ◽

Antipsychotic Drugs ◽

Biological Fluid ◽

Support Vector ◽

Fuzzy Interference System ◽

Interference System ◽

Neuro Fuzzy

From support vector machine learning to the determination of the minimum enclosing zone

Computers & Industrial Engineering ◽

10.1016/s0360-8352(02)00003-7 ◽

2002 ◽

Vol 42 (1) ◽

pp. 59-74 ◽

Cited By ~ 20

Author(s):

A.M. Malyscheff ◽

T.B. Trafalis ◽

S. Raman

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Support Vector
