Identification of DNA-Binding Proteins Using Support Vector Machine with Sequence Information

2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Xin Ma ◽  
Jiansheng Wu ◽  
Xiaoyun Xue

DNA-binding proteins are fundamentally important in understanding cellular processes. Thus, the identification of DNA-binding proteins has particularly important practical applications in various fields, such as drug design. We propose a novel method for predicting DNA-binding proteins using only sequence information. The prediction model developed in this study is constructed with the support vector machine-sequential minimal optimization (SVM-SMO) algorithm in conjunction with a hybrid feature. The hybrid feature incorporates an evolutionary information feature, a physicochemical property feature, and two novel attributes. These two attributes use DNA-binding residues and nonbinding residues in a query protein to obtain its DNA-binding propensity and nonbinding propensity. The results demonstrate that our SVM-SMO model achieves a Matthews correlation coefficient (MCC) of 0.67 and 89.6% overall accuracy, with 88.4% sensitivity and 90.8% specificity. Performance comparisons on various features indicate that the two novel attributes contribute to the performance improvement. In addition, our SVM-SMO model outperforms state-of-the-art methods on an independent test dataset.
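The four metrics reported above are all derived from the confusion matrix of a binary classifier. The following is a minimal sketch of how they relate; the function name `binary_metrics` and the counts used in the example are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, and MCC from confusion counts.

    Assumes at least one positive and one negative example are present.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # fraction of DNA-binding proteins recovered
    specificity = tn / (tn + fp)  # fraction of nonbinding proteins recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Hypothetical counts chosen only for illustration
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside sensitivity and specificity.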

2020 ◽  
Vol 17 (4) ◽  
pp. 302-310
Author(s):  
Yijie Ding ◽  
Feng Chen ◽  
Xiaoyi Guo ◽  
Jijun Tang ◽  
Hongjie Wu

Background: DNA-binding proteins play an important role in multiple biomolecular functions. However, traditional experimental methods for identifying DNA-binding proteins are still time consuming and extremely expensive. Objective: In the past several years, various computational methods have been developed to detect DNA-binding proteins. However, most of them do not integrate multiple sources of information. Methods: In this study, we propose a novel computational method that predicts DNA-binding proteins from sequence information using a two-step Multiple Kernel Support Vector Machine (MK-SVM). First, we extract several features and construct multiple kernels. Then, the kernels are linearly combined by Multiple Kernel Learning (MKL). Finally, an SVM model is built on the combined kernel to predict DNA-binding proteins. Results: The proposed method is tested on two benchmark data sets. Our approach is comparable to existing methods and outperforms them on some data sets. Conclusion: We conclude that MK-SVM is more suitable than a common SVM as the classifier for DNA-binding protein identification.
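The kernel-combination step can be illustrated with a small sketch. It assumes two kernels (linear and RBF) over toy feature vectors and uses fixed, hypothetical weights; in MKL the weights would be learned, and the abstract does not state which kernels or features the authors used.

```python
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram(kernel, samples):
    """Build the Gram (kernel) matrix for a list of feature vectors."""
    return [[kernel(x, y) for y in samples] for x in samples]

def combine(grams, weights):
    """Linearly combine Gram matrices: K = sum_k weights[k] * grams[k]."""
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]

# Toy feature vectors standing in for per-protein features (hypothetical)
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
grams = [gram(linear_kernel, samples), gram(rbf_kernel, samples)]
weights = [0.3, 0.7]  # fixed here for illustration; MKL would learn these
K = combine(grams, weights)
```

With non-negative weights, the combined matrix remains a valid kernel, so it can be passed to any SVM implementation that accepts a precomputed kernel.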


Genes ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 394 ◽  
Author(s):  
Xiu-Juan Liu ◽  
Xiu-Jun Gong ◽  
Hua Yu ◽  
Jia-Hui Xu

Various machine learning-based approaches using sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. In these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the key tasks. Disclosing the significance and contributions of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance, but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework that orchestrates multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and autocross-covariance of secondary structures (AC_Struc), which were extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely-coupled classifiers such as SVM and random forest. Then, a logistic regression model is applied to evaluate the contributions of these individual classifiers and to make the final prediction. When evaluated on the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, the method achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework is able to orchestrate various prediction models flexibly with good performance.
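The stacking idea described above, in which a logistic regression combines the outputs of base classifiers, can be sketched as follows. This is a simplified illustration, not the MSFBinder implementation: the base-classifier probabilities are hypothetical numbers, and the logistic regression is a plain gradient-descent version written for self-containment.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    score = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return 1 if score >= 0.5 else 0

# Hypothetical base-classifier outputs: each row is [P_svm, P_random_forest],
# the probability each base model assigns to "DNA-binding" for one protein.
meta_X = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.3], [0.1, 0.4], [0.3, 0.1]]
meta_y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(meta_X, meta_y)
preds = [predict(w, b, x) for x in meta_X]
```

The learned weights `w` give a rough indication of how much each base classifier contributes to the final decision. In practice the meta-features would come from out-of-fold predictions so that the meta-learner is not trained on outputs the base models produced for their own training data.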


2003 ◽  
Vol 14 (05) ◽  
pp. 575-585 ◽  
Author(s):  
CONG-ZHONG CAI ◽  
WAN-LU WANG ◽  
YU-ZONG CHEN

The support vector machine (SVM) is used in the classification of sonar signals and DNA-binding proteins. Our study on the classification of sonar signals shows that SVM produces a better result than other classification methods, which is consistent with the findings of other studies. The testing accuracy of classification is 95.19%, compared with 90.4% from a multilayered neural network and 82.7% from a nearest neighbor classifier. In the classification of DNA-binding proteins, SVM gives a testing accuracy of 82.32%, which is slightly better than that obtained in an earlier study of SVM classification of protein–protein interactions. Hence, our study indicates the usefulness of SVM in the identification of DNA-binding proteins. Further improvements in the SVM algorithm and parameters are suggested.


2015 ◽  
Vol 9 (Suppl 1) ◽  
pp. S10 ◽  
Author(s):  
Ruifeng Xu ◽  
Jiyun Zhou ◽  
Hongpeng Wang ◽  
Yulan He ◽  
Xiaolong Wang ◽  
...  

2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Weizhong Lu ◽  
Zhengwei Song ◽  
Yijie Ding ◽  
Hongjie Wu ◽  
Yan Cao ◽  
...  

Knowledge of DNA-binding proteins helps to better understand the functions of proteins in cellular biological processes. Research on the prediction of DNA-binding proteins can promote research on drug proteins and computer-aided drug design. In recent years, machine learning methods have commonly been used to predict such proteins. Although current methods achieve good predictive performance, further research is needed to improve it. In this study, the prediction of DNA-binding proteins is studied from the perspective of evolutionary information and the support vector machine method. A machine learning model for predicting DNA-binding proteins based on evolutionary features is put forward using Chou’s 5-step rule. The method achieves accuracies of 86.05% and 75.30% on the benchmark dataset PDB1075 and the independent dataset PDB186, respectively. Thus, the proposed method is comparable to existing methods and may outperform them in some cases.
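The abstract does not specify how the evolutionary features are computed. One common way to turn evolutionary information into a fixed-length input for an SVM is to average the columns of a position-specific scoring matrix (PSSM). The sketch below assumes that approach; the function name `pssm_column_means` and the toy matrix are hypothetical, and a real PSSM has 20 columns, one per amino acid.

```python
import math

def pssm_column_means(pssm):
    """Squash each PSSM score into (0, 1) with a logistic function,
    then average every column over all sequence positions."""
    n_rows = len(pssm)
    n_cols = len(pssm[0])
    squashed = [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in pssm]
    return [sum(row[j] for row in squashed) / n_rows for j in range(n_cols)]

# Toy PSSM with 2 sequence positions and 3 columns (hypothetical values)
toy_pssm = [[0, 2, -2], [0, -2, 2]]
features = pssm_column_means(toy_pssm)
```

The resulting vector has one entry per PSSM column regardless of protein length, which is what allows proteins of different lengths to be fed to the same classifier.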

