Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning

PeerJ ◽

10.7717/peerj.11262 ◽

2021 ◽

Vol 9 ◽

pp. e11262

Author(s):

Guobin Li ◽

Xiuquan Du ◽

Xinlu Li ◽

Le Zou ◽

Guanhong Zhang ◽

...

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

DNA Binding ◽

Binding Proteins ◽

Prediction Models ◽

Short Term Memory ◽

Local Features ◽

DNA Binding Proteins ◽

Superior Performance

DNA-binding proteins (DBPs) play pivotal roles in many biological functions such as alternative splicing, RNA editing, and methylation. Many traditional machine learning (ML) and deep learning (DL) methods have been proposed to predict DBPs. However, these methods either rely on manual feature extraction or fail to capture long-term dependencies in the sequence. In this paper, we propose a method, called PDBP-Fusion, that identifies DBPs from primary sequences alone by fusing local features with long-term dependencies. We use a convolutional neural network (CNN) to learn local features and a bidirectional long short-term memory network (Bi-LSTM) to capture critical long-term dependencies in context. In addition, feature extraction, model training, and model prediction are performed simultaneously. PDBP-Fusion predicts DBPs with 86.45% sensitivity, 79.13% specificity, 82.81% accuracy, and 0.661 MCC on the PDB14189 benchmark dataset. The MCC of the proposed method is at least 9.1% higher than that of other advanced prediction models. Moreover, PDBP-Fusion also achieves superior performance and model robustness on the PDB2272 independent dataset. These results demonstrate that PDBP-Fusion can predict DBPs from sequences accurately and effectively. The online server is at http://119.45.144.26:8080/PDBP-Fusion/.
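
The sensitivity, specificity, accuracy, and MCC quoted in this abstract all derive from the confusion matrix of a binary classifier. The sketch below shows the standard definitions only. The function name `binary_metrics` and the toy counts are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: binding proteins predicted as binding / non-binding.
    tn/fp: non-binding proteins predicted as non-binding / binding.
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Toy counts chosen for illustration only (not results from any paper).
metrics = binary_metrics(tp=8, fn=2, tn=7, fp=3)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when sensitivity and specificity differ.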

An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences

PLoS ONE ◽

10.1371/journal.pone.0225317 ◽

2019 ◽

Vol 14 (11) ◽

pp. e0225317 ◽

Cited By ~ 3

Author(s):

Siquan Hu ◽

Ruixiong Ma ◽

Haiou Wang

Keyword(s):

Deep Learning ◽

Amino Acid ◽

DNA Binding ◽

Binding Proteins ◽

DNA Binding Proteins ◽

Amino Acid Sequences ◽

Learning Method ◽

Contextual Features

iDNA-BiProt: Predicting DNA-binding Proteins via Feature Extraction and Fuzzy K Neighbor Algorithm

10.22323/1.259.0003 ◽

2015 ◽

Author(s):

Xuan Xiao

Keyword(s):

Feature Extraction ◽

DNA Binding ◽

Binding Proteins ◽

DNA Binding Proteins

PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method

BioMed Research International ◽

10.1155/2020/7297631 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Jun Wang ◽

Huiwen Zheng ◽

Yang Yang ◽

Wanyue Xiao ◽

Taigang Liu

Keyword(s):

DNA Binding ◽

Binding Proteins ◽

Transition Probability ◽

DNA Binding Proteins ◽

Computational Method ◽

Superior Performance ◽

Sequence Information ◽

Experimental Approaches ◽

Wet Lab ◽

Two Stages

DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, identifying DBPs with wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely from protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier, which performs the final prediction of DBPs. On the PDB1075 benchmark dataset, we conduct a jackknife cross-validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that PredDBP-Stack is an effective classifier for accurately identifying DBPs from protein sequence information alone.
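
To make the two profile-based feature types in this abstract more concrete, here is a rough sketch of how AAC and TPC might be computed from a profile matrix with one row per sequence position and one column per amino acid. The abstract does not give the formulas, so everything here is an assumption: AAC is taken as the column mean, TPC as the row-normalized sum of products of adjacent positions, and the names `aac_from_profile` and `tpc_from_profile` are hypothetical. The toy profile uses two columns for brevity where a real one would have twenty.

```python
def aac_from_profile(profile):
    """Column-wise mean of a (positions x columns) profile matrix (assumed AAC)."""
    n_pos = len(profile)
    n_col = len(profile[0])
    return [sum(row[j] for row in profile) / n_pos for j in range(n_col)]


def tpc_from_profile(profile):
    """Assumed TPC: accumulate P[k][i] * P[k+1][j] over adjacent positions,
    then normalize each row i so that its entries sum to 1."""
    n_col = len(profile[0])
    raw = [[0.0] * n_col for _ in range(n_col)]
    for cur, nxt in zip(profile, profile[1:]):
        for i in range(n_col):
            for j in range(n_col):
                raw[i][j] += cur[i] * nxt[j]
    tpc = []
    for row in raw:
        total = sum(row)
        # Rows with no mass stay at zero rather than dividing by zero.
        tpc.append([v / total if total else 0.0 for v in row])
    return tpc


# Toy profile: 3 positions x 2 columns (illustrative values only).
toy_profile = [[0.5, 0.5], [0.2, 0.8], [1.0, 0.0]]
aac = aac_from_profile(toy_profile)
tpc = tpc_from_profile(toy_profile)
# Concatenate into one fixed-length feature vector, independent of sequence length.
features = aac + [v for row in tpc for v in row]
```

Under these assumptions a 20-column profile would yield 20 + 400 values per protein, which is what lets sequences of different lengths share one classifier input.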

On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach

PLoS ONE ◽

10.1371/journal.pone.0188129 ◽

2017 ◽

Vol 12 (12) ◽

pp. e0188129 ◽

Cited By ~ 16

Author(s):

Yu-Hui Qu ◽

Hua Yu ◽

Xiu-Jun Gong ◽

Jia-Hui Xu ◽

Hong-Shun Lee

Keyword(s):

Deep Learning ◽

DNA Binding ◽

Binding Proteins ◽

DNA Binding Proteins ◽

Learning Approach

An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences v1 (protocols.io.2rdgd26)

protocols.io ◽

10.17504/protocols.io.2rdgd26 ◽

2019 ◽

Author(s):

Ruixiong Ma

Keyword(s):

Deep Learning ◽

Amino Acid ◽

DNA Binding ◽

Binding Proteins ◽

DNA Binding Proteins ◽

Amino Acid Sequences ◽

Learning Method ◽

Contextual Features

DeepDBP: Deep Neural Networks for Identification of DNA-binding Proteins

10.1101/829432 ◽

2019 ◽

Author(s):

Shadman Shadab ◽

Md Tawab Alam Khan ◽

Nazia Afrin Neezi ◽

Sheikh Adilina ◽

Swakkhar Shatabda

Keyword(s):

Neural Network ◽

Deep Learning ◽

DNA Binding ◽

Binding Proteins ◽

Deep Neural Networks ◽

Defense Mechanism ◽

DNA Binding Proteins ◽

Cellular Level ◽

Test Accuracy ◽

The Past

DNA-binding proteins (DBPs) are associated with many cellular-level functions, including but not limited to the body's defense mechanism and oxygen transportation. They bind DNA and interact with it. In the past, DBPs were identified using experimental lab-based methods. In recent years, however, researchers have used supervised learning to identify DBPs solely from protein sequences. In this paper, we apply deep learning methods to identify DBPs. We propose two deep learning based methods: DeepDBP-ANN and DeepDBP-CNN. DeepDBP-ANN uses a generated set of features trained on a traditional neural network, and DeepDBP-CNN uses a pre-learned embedding and a convolutional neural network. Both methods produced state-of-the-art results when tested on standard benchmark datasets. DeepDBP-ANN had a train accuracy of 99.02% and a test accuracy of 82.80%. DeepDBP-CNN, although it had a train accuracy of 94.32%, excelled at identifying test instances with 84.31% accuracy. All code and methods are available at https://github.com/antorkhan/DNABinding.
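
Embedding-based models of the kind this abstract describes usually need each protein sequence converted to a fixed-length list of integer indices before the embedding layer. The sketch below shows one common way to do that and is not the DeepDBP preprocessing itself: the residue ordering, the reserved `PAD` and `UNK` indices, and the function name `encode_sequence` are all assumptions made for illustration.

```python
# The 20 standard amino acids in an arbitrary (alphabetical) order; assumed here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, UNK = 0, 1  # reserved indices for padding and unknown residues (assumed)
AA_TO_INDEX = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_len):
    """Map residues to integer indices, truncating or right-padding to max_len."""
    ids = [AA_TO_INDEX.get(aa, UNK) for aa in seq.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# 'X' is not a standard residue, so it maps to UNK; the tail is padded with PAD.
encoded = encode_sequence("MKVX", max_len=6)
# encoded == [12, 10, 19, 1, 0, 0]
```

Reserving index 0 for padding lets a downstream embedding layer mask padded positions, assuming the chosen framework supports masking.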

Identification of genes encoding receptor-like protein kinases as possible targets of pathogen- and salicylic acid-induced WRKY DNA-binding proteins in Arabidopsis

The Plant Journal ◽

10.1046/j.1365-313x.2000.00923.x ◽

2000 ◽

Vol 24 (6) ◽

pp. 837-847 ◽

Cited By ~ 143

Author(s):

Liqun Du ◽

Zhixiang Chen

Keyword(s):

Salicylic Acid ◽

DNA Binding ◽

Protein Kinases ◽

Binding Proteins ◽

DNA Binding Proteins ◽

Genes Encoding

Faculty Opinions recommendation of Differential display of DNA-binding proteins reveals heat-shock factor 1 as a circadian transcription factor.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1100379.559066 ◽

2008 ◽

Author(s):

Mark Caddick

Keyword(s):

Transcription Factor ◽

Heat Shock ◽

DNA Binding ◽

Differential Display ◽

Binding Proteins ◽

DNA Binding Proteins ◽

Heat Shock Factor ◽

Heat Shock Factor 1 ◽

Circadian Transcription

Regulation of DNA Metabolism by DNA-Binding Proteins Probed by Single Molecule Spectroscopy

10.21236/ada459264 ◽

2006 ◽

Author(s):

Andreas Hanke

Keyword(s):

DNA Binding ◽

Single Molecule ◽

Binding Proteins ◽

DNA Binding Proteins ◽

Single Molecule Spectroscopy ◽

DNA Metabolism

DBP-PSSM: Combination of evolutionary profiles with the XGBoost algorithm to improve the identification of DNA-binding proteins

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323999201124203531 ◽

2020 ◽

Vol 23 ◽

Author(s):

Yanping Zhang ◽

Pengcheng Chen ◽

Ya Gao ◽

Jianwei Ni ◽

Xiaosheng Wang

Keyword(s):

Logistic Regression ◽

Protein Structure ◽

DNA Binding ◽

Molecular Biology ◽

Binding Proteins ◽

Protein Sequences ◽

Low Complexity ◽

DNA Binding Proteins ◽

Position Information ◽

Position Representation

Aim and Objective: Given the rapidly increasing amount of molecular biology data available, computational methods of low complexity are necessary to infer protein structure, function, and evolution. Method: In this work, we propose a novel method, FermatS, which extracts feature information from protein sequences based on global position information from the curve and local position representation from the normalized moments of inertia. Furthermore, we use the features generated by FermatS to analyze the similarity/dissimilarity of nine ND5 proteins and to establish a prediction model of DNA-binding proteins based on logistic regression with 5-fold cross-validation. Results: In the similarity/dissimilarity analysis of nine ND5 proteins, the results are consistent with evolutionary theory. Moreover, this method can effectively predict DNA-binding proteins in realistic situations. Conclusion: The findings demonstrate that the proposed method is effective for comparing, recognizing, and predicting protein sequences. The main code and datasets can be downloaded from https://github.com/GaoYa1122/FermatS.
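
The 5-fold cross-validation mentioned in this abstract partitions the samples into five disjoint test folds and trains on the remaining four each time. Below is a minimal sketch of the index bookkeeping only. The helper `k_fold_indices` is hypothetical, it does not shuffle or stratify, and it is not taken from the FermatS repository.

```python
def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, test_indices) pairs over range(n_samples).

    Fold sizes differ by at most one; samples are taken in order (no shuffling).
    """
    base, extra = divmod(n_samples, k)
    indices = list(range(n_samples))
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# 12 samples split into 5 folds of sizes 3, 3, 2, 2, 2.
folds = k_fold_indices(12, k=5)
```

In practice the indices would normally be shuffled, and often stratified by class label, before splitting, so that each fold contains a similar proportion of binding and non-binding proteins.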
