iDNA-BiProt: Predicting DNA-binding Proteins via Feature Extraction and Fuzzy K Neighbor Algorithm

Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning

PeerJ, 2021, Vol 9, pp. e11262. DOI: 10.7717/peerj.11262

Author(s): Guobin Li, Xiuquan Du, Xinlu Li, Le Zou, Guanhong Zhang, ...

Keyword(s): Feature Extraction; Deep Learning; DNA Binding; Binding Proteins; Prediction Models; Short Term Memory; Local Features; DNA Binding Proteins; Superior Performance

DNA-binding proteins (DBPs) play pivotal roles in many biological functions such as alternative splicing, RNA editing, and methylation. Many traditional machine learning (ML) and deep learning (DL) methods have been proposed to predict DBPs. However, these methods either rely on manual feature extraction or fail to capture long-term dependencies in the sequence. In this paper, we propose a method, called PDBP-Fusion, that identifies DBPs by fusing local features and long-term dependencies learned only from primary sequences. We use a convolutional neural network (CNN) to learn local features and a bidirectional long short-term memory network (Bi-LSTM) to capture critical long-term dependencies in context. In addition, feature extraction, model training, and model prediction are performed simultaneously. PDBP-Fusion predicts DBPs with 86.45% sensitivity, 79.13% specificity, 82.81% accuracy, and an MCC of 0.661 on the PDB14189 benchmark dataset. Its MCC is at least 9.1% higher than that of other advanced prediction models. PDBP-Fusion also achieves superior performance and robustness on the PDB2272 independent dataset. These results demonstrate that PDBP-Fusion can predict DBPs from sequences accurately and effectively; the online server is available at http://119.45.144.26:8080/PDBP-Fusion/.
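
PDBP-Fusion is scored with sensitivity, specificity, accuracy, and MCC, the usual confusion-matrix metrics for this task. As a minimal Python sketch of the standard definitions (not code from the paper), the function below computes all four from true/false positive and negative counts, treating DBPs as the positive class; the name `binary_metrics` and the counts in the usage line are hypothetical.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary-classification metrics, with DBPs as the positive class."""
    sensitivity = tp / (tp + fn)  # fraction of true DBPs that are recovered
    specificity = tn / (tn + fp)  # fraction of non-DBPs correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when any margin is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}


# Hypothetical counts, for illustration only (not the PDB14189 results)
print(binary_metrics(tp=80, fn=20, tn=70, fp=30))
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when the positive and negative sets differ in size.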

Identification of genes encoding receptor-like protein kinases as possible targets of pathogen- and salicylic acid-induced WRKY DNA-binding proteins in Arabidopsis

The Plant Journal, 2000, Vol 24 (6), pp. 837-847. DOI: 10.1046/j.1365-313x.2000.00923.x. Cited by ~143

Author(s): Liqun Du, Zhixiang Chen

Keyword(s): Salicylic Acid; DNA Binding; Protein Kinases; Binding Proteins; DNA Binding Proteins; Genes Encoding

Faculty Opinions recommendation of Differential display of DNA-binding proteins reveals heat-shock factor 1 as a circadian transcription factor.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2008. DOI: 10.3410/f.1100379.559066

Author(s): Mark Caddick

Keyword(s): Transcription Factor; Heat Shock; DNA Binding; Differential Display; Binding Proteins; DNA Binding Proteins; Heat Shock Factor; Heat Shock Factor 1; Circadian Transcription

Regulation of DNA Metabolism by DNA-Binding Proteins Probed by Single Molecule Spectroscopy

2006. DOI: 10.21236/ada459264

Author(s): Andreas Hanke

Keyword(s): DNA Binding; Single Molecule; Binding Proteins; DNA Binding Proteins; Single Molecule Spectroscopy; DNA Metabolism

DBP-PSSM: Combination of evolutionary profiles with the XGBoost algorithm to improve the identification of DNA-binding proteins

Combinatorial Chemistry & High Throughput Screening, 2020, Vol 23. DOI: 10.2174/1386207323999201124203531

Author(s): Yanping Zhang, Pengcheng Chen, Ya Gao, Jianwei Ni, Xiaosheng Wang

Keyword(s): Logistic Regression; Protein Structure; DNA Binding; Molecular Biology; Binding Proteins; Protein Sequences; Low Complexity; DNA Binding Proteins; Position Information; Position Representation

Aim and Objective: Given the rapidly increasing amount of molecular biology data available, computational methods of low complexity are needed to infer protein structure, function, and evolution. Method: In this work, we propose a novel method, FermatS, which extracts feature information from protein sequences using global position information derived from the curve and local position representation derived from normalized moments of inertia. We then use the features generated by FermatS to analyze the similarity/dissimilarity of nine ND5 proteins and to build a logistic regression prediction model for DNA-binding proteins, evaluated with 5-fold cross-validation. Results: In the similarity/dissimilarity analysis of the nine ND5 proteins, the results are consistent with evolutionary theory. Moreover, the method can effectively predict DNA-binding proteins in realistic settings. Conclusion: The findings demonstrate that the proposed method is effective for comparing, recognizing, and predicting protein sequences. The main code and datasets can be downloaded from https://github.com/GaoYa1122/FermatS.
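
The FermatS abstract evaluates a logistic regression model with 5-fold cross-validation. The sketch below illustrates that evaluation loop in plain Python under stated assumptions: the FermatS feature vectors are taken as already computed (the feature extraction itself is not reproduced), the model is fit by plain batch gradient descent without regularization, folds are unstratified, and all function names and the toy data are hypothetical.

```python
import math
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Label 1 (DBP) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def cross_validate(X, y, k=5):
    """Train on k-1 folds, test on the held-out fold; return mean accuracy."""
    folds = kfold_indices(len(X), k)
    accs = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        w, b = train_logreg([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(w, b, X[j]) == y[j] for j in test_idx)
        accs.append(correct / len(test_idx))
    return sum(accs) / k


# Hypothetical toy data: one feature, label 1 when the feature is positive
X = [[v / 10] for v in range(-20, 21) if abs(v) >= 5]
y = [1 if row[0] > 0 else 0 for row in X]
print(cross_validate(X, y))
```

A stratified split (keeping the DBP/non-DBP ratio equal in every fold) is a common refinement when classes are imbalanced; the abstract does not say which variant was used.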

MK-FSVM-SVDD: A Multiple Kernel-based Fuzzy SVM Model for Predicting DNA-binding Proteins via Support Vector Data Description

Current Bioinformatics, 2020, Vol 15. DOI: 10.2174/1574893615999200607173829

Author(s): Yi Zou, Hongjie Wu, Xiaoyi Guo, Li Peng, Yijie Ding, ...

Keyword(s): DNA Binding; Binding Proteins; Detection Efficiency; DNA Binding Proteins; Support Vector; Support Vector Data Description; Vector Data; Data Description; Multiple Kernel; SVM Model

Background: Detecting DNA-binding proteins (DBPs) with biological and chemical methods is time-consuming and expensive. Objective: In recent years, computational biology methods based on machine learning (ML) have greatly improved the efficiency of DBP detection. Method: In this study, a Multiple Kernel-based Fuzzy SVM Model with Support Vector Data Description (MK-FSVM-SVDD) is proposed to predict DBPs. First, six features are extracted from each protein sequence. Second, multiple kernels are constructed from these sequence features. Then, the kernels are integrated by Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL). Next, fuzzy membership scores of the training samples are calculated with Support Vector Data Description (SVDD). Finally, a fuzzy SVM (FSVM) is trained and used to detect new DBPs. Results: The model is tested on several benchmark datasets. Compared with other methods, MK-FSVM-SVDD achieves the best Matthews correlation coefficient (MCC) on PDB186 (0.7250) and PDB2272 (0.5476). Conclusion: MK-FSVM-SVDD is more suitable than a standard SVM as a classifier for identifying DNA-binding proteins.
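
The MK-FSVM-SVDD pipeline merges its per-feature kernels with centered kernel alignment (CKA). The sketch below shows only that kernel-weighting step, in plain Python, as a simplification: each kernel is weighted by its centered alignment with the ideal kernel y y^T (labels +1/-1), clipped at zero and normalized to sum to 1. This independent-alignment heuristic may differ from the optimization the paper's CKA-MKL actually solves, the SVDD membership and FSVM training steps are not shown, and all names and data are hypothetical.

```python
import math


def center(K):
    """Return H K H with H = I - (1/n) 11^T (double-centre the kernel matrix)."""
    n = len(K)
    row_means = [sum(r) / n for r in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def frob(A, B):
    """Frobenius inner product <A, B>_F."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K1, K2):
    """Centered kernel alignment <K1c, K2c>_F / (||K1c||_F ||K2c||_F)."""
    A, B = center(K1), center(K2)
    denom = math.sqrt(frob(A, A) * frob(B, B))
    return frob(A, B) / denom if denom else 0.0


def combine_kernels(kernels, y):
    """Hypothetical helper: weight kernels by clipped alignment with y y^T."""
    n = len(y)
    ideal = [[yi * yj for yj in y] for yi in y]
    scores = [max(alignment(K, ideal), 0.0) for K in kernels]
    total = sum(scores)
    # Fall back to uniform weights if no kernel aligns positively
    weights = [s / total for s in scores] if total else [1 / len(kernels)] * len(kernels)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
                for i in range(n)]
    return weights, combined


# Hypothetical toy example: 4 samples, labels +1/-1, two candidate kernels
y = [1, 1, -1, -1]
ideal = [[a * b for b in y] for a in y]
eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights, K = combine_kernels([ideal, eye], y)
print(weights)
```

Centering with H = I - (1/n)11^T removes the constant offset shared by all entries, so a kernel is rewarded only for the structure that tracks the labels.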

Bacteriophage-specific DNA-binding proteins in P22-lysogenic and in P22-infected Salmonella typhimurium.

Journal of Virology, 1976, Vol 20 (1), pp. 334-338. DOI: 10.1128/jvi.20.1.334-338.1976. Cited by ~6

Author(s): W Schumann, E Lindenblatt, E G Bade

Keyword(s): DNA Binding; Salmonella Typhimurium; Binding Proteins; DNA Binding Proteins

Identification of three sequence-specific DNA-binding proteins which interact with the Rous sarcoma virus enhancer and upstream promoter elements.

Journal of Virology, 1988, Vol 62 (6), pp. 2186-2190. DOI: 10.1128/jvi.62.6.2186-2190.1988. Cited by ~5

Author(s): G H Goodwin

Keyword(s): DNA Binding; Rous Sarcoma Virus; Binding Proteins; DNA Binding Proteins; Promoter Elements; Rous Sarcoma; Upstream Promoter; Sarcoma Virus

Identification with monoclonal antibodies of virus-specific DNA-binding proteins in the nuclei of cells infected with three serotypes of Marek's disease virus-related viruses.

Journal of Virology, 1986, Vol 59 (1), pp. 154-158. DOI: 10.1128/jvi.59.1.154-158.1986. Cited by ~2

Author(s): K Nakajima, K Ikuta, S Ueda, S Kato, K Hirai

Keyword(s): Monoclonal Antibodies; DNA Binding; Disease Virus; Binding Proteins; DNA Binding Proteins; Marek's Disease Virus; Marek's Disease

Two cellular single-strand-specific DNA-binding proteins interact with two regions of the bovine papillomavirus type 1 genome, including the origin of DNA replication.

Journal of Virology, 1992, Vol 66 (10), pp. 5988-5998. DOI: 10.1128/jvi.66.10.5988-5998.1992. Cited by ~1

Author(s): C Habiger, G Stelzer, U Schwarz, E L Winnacker

Keyword(s): DNA Replication; DNA Binding; Binding Proteins; DNA Binding Proteins; Single Strand; Bovine Papillomavirus; Origin of DNA Replication
