Using evolutionary and structural information to predict DNA‐binding sites on DNA‐binding proteins

Igor B. Kuznetsov; Zhenkun Gou; Run Li; Seungwoo Hwang

doi:10.1002/prot.20977

Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins

Nucleic Acids Research ◽

10.1093/nar/gkg922 ◽

2003 ◽

Vol 31 (24) ◽

pp. 7189-7198 ◽

Cited By ~ 142

Author(s):

S. Jones

Keyword(s):

Dna Binding ◽

Binding Sites ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Electrostatic Potentials ◽

Dna Binding Sites

Analysis and classification of DNA‐binding sites in single‐stranded and double‐stranded DNA‐binding proteins using protein information

IET Systems Biology ◽

10.1049/iet-syb.2013.0048 ◽

2014 ◽

Vol 8 (4) ◽

pp. 176-183 ◽

Cited By ~ 8

Author(s):

Wei Wang ◽

Juan Liu ◽

Yi Xiong ◽

Lida Zhu ◽

Xionghui Zhou

Keyword(s):

Dna Binding ◽

Binding Sites ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Double Stranded Dna ◽

Dna Binding Sites

Target Detection Assay: A General Method to Determine DNA Binding Sites for Putative DNA-Binding Proteins

Immunological Methods ◽

10.1016/b978-0-12-442704-4.50010-4 ◽

1990 ◽

pp. 61-74 ◽

Cited By ~ 3

Author(s):

Hans-Jürgen Thiesen

Keyword(s):

Dna Binding ◽

Target Detection ◽

Binding Sites ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Dna Binding Sites ◽

Detection Assay ◽

General Method

Searching for and predicting the activity of sites for DNA binding proteins: compilation and analysis of the binding sites for Escherichia coli integration host factor (IHF)

Nucleic Acids Research ◽

10.1093/nar/18.17.4993 ◽

1990 ◽

Vol 18 (17) ◽

pp. 4993-5000 ◽

Cited By ~ 186

Author(s):

James A. Goodrich ◽

Michael L. Schwartz ◽

William R. McClure

Keyword(s):

Dna Binding ◽

Binding Sites ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Host Factor

A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers

Genes ◽

10.3390/genes9080394 ◽

2018 ◽

Vol 9 (8) ◽

pp. 394 ◽

Cited By ~ 9

Author(s):

Xiu-Juan Liu ◽

Xiu-Jun Gong ◽

Hua Yu ◽

Jia-Hui Xu

Keyword(s):

Dna Binding ◽

Binding Proteins ◽

Structural Information ◽

Dna Binding Proteins ◽

Feature Representation ◽

Training Dataset ◽

Evolutionary Information ◽

Sequence Information ◽

Coupled Models ◽

Loosely Coupled

Various machine learning-based approaches that use sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most trivial tasks. Revealing the significance and contribution of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework by orchestrating multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and autocross-covariance of secondary structures (AC_Struc), which were extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely-coupled classifiers such as SVM and random forest. A logistic regression model is then applied to evaluate the contributions of these individual classifiers and to make the final prediction. On the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, the method achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework is able to orchestrate various predictive models flexibly with good performance.
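The stacking idea described in this abstract (per-view base classifiers whose outputs feed a logistic regression meta-learner) can be illustrated with a minimal, standard-library-only sketch. This is not the MSFBinder implementation: the data are synthetic, the two feature views are hypothetical, a nearest-centroid scorer stands in for the SVM and random forest base classifiers, and a simple hold-out split is assumed for training the meta-learner.

```python
import math
import random


def sigmoid(v):
    # Clamp the input so math.exp cannot overflow.
    v = max(-60.0, min(60.0, v))
    return 1.0 / (1.0 + math.exp(-v))


def fit_centroids(rows, labels):
    """Base learner (stand-in for SVM/RF): mean vector of each class."""
    cents = {}
    for c in (0, 1):
        members = [r for r, t in zip(rows, labels) if t == c]
        cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return cents


def centroid_score(cents, row):
    """Soft score in (0, 1); higher means closer to the positive centroid."""
    return sigmoid(math.dist(row, cents[0]) - math.dist(row, cents[1]))


def fit_logreg(Z, y, lr=1.0, epochs=2000):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def select(rows, cols):
    """Keep only the columns belonging to one feature view."""
    return [[r[c] for c in cols] for r in rows]


# Synthetic, balanced toy data: 4 features split into two hypothetical views.
random.seed(0)
VIEWS = [(0, 1), (2, 3)]
labels = [i % 2 for i in range(300)]
data = [[random.gauss(1.0 if t else -1.0, 1.0) for _ in range(4)] for t in labels]
X_base, y_base = data[:100], labels[:100]      # trains the base learners
X_meta, y_meta = data[100:200], labels[100:200]  # trains the meta-learner
X_test, y_test = data[200:], labels[200:]

base_models = [fit_centroids(select(X_base, cols), y_base) for cols in VIEWS]


def stack_features(rows):
    """One column per base model: its score for each row."""
    return [
        [centroid_score(m, [r[c] for c in cols]) for m, cols in zip(base_models, VIEWS)]
        for r in rows
    ]


weights, bias = fit_logreg(stack_features(X_meta), y_meta)
preds = [
    int(sigmoid(sum(w * z for w, z in zip(weights, zs)) + bias) >= 0.5)
    for zs in stack_features(X_test)
]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("meta-learner weights per view:", weights)
print("held-out accuracy:", accuracy)
```

The fitted weights give a rough indication of how much each base model contributes to the final prediction, which is the role the abstract assigns to the logistic regression layer.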

Genome wide screens in yeast to identify potential binding sites and target genes of DNA-binding proteins

Nucleic Acids Research ◽

10.1093/nar/gkm1117 ◽

2007 ◽

Vol 36 (1) ◽

pp. e8-e8 ◽

Cited By ~ 21

Author(s):

Jue Zeng ◽

Jizhou Yan ◽

Ting Wang ◽

Deborah Mosbrook-Davis ◽

Kyle T. Dolan ◽

...

Keyword(s):

Dna Binding ◽

Binding Sites ◽

Binding Proteins ◽

Target Genes ◽

Dna Binding Proteins ◽

Genome Wide ◽

Potential Binding

AlphaFold-aware prediction of protein-DNA binding sites using graph transformer

10.1101/2021.08.25.457661 ◽

2021 ◽

Author(s):

Qianmu Yuan ◽

Sheng Chen ◽

Jiahua Rao ◽

Shuangjia Zheng ◽

Huiying Zhao ◽

...

Keyword(s):

Dna Binding ◽

Binding Sites ◽

Structure Prediction ◽

Spatial Information ◽

Structural Information ◽

Biological Activities ◽

Protein Structures ◽

Dna Binding Sites ◽

Novel Drugs ◽

Binding Residues

Abstract. Motivation: Protein-DNA interactions play crucial roles in biological systems, and identifying protein-DNA binding sites is the first step toward a mechanistic understanding of various biological activities (such as transcription and repair) and toward designing novel drugs. Accurately identifying DNA-binding residues from protein sequence alone remains a challenging task. Most existing sequence-based methods consider only contextual features of the sequential neighbors, which limits their ability to capture spatial information. Results: Based on the recent breakthrough in protein structure prediction by AlphaFold2, we propose an accurate predictor, GraphSite, for identifying DNA-binding residues based on the structural models predicted by AlphaFold2. Here, we convert the binding site prediction problem into a graph node classification task and employ a transformer-based variant model to take the protein structural information into account. By leveraging predicted protein structures and a graph transformer, GraphSite substantially improves over the latest sequence-based and structure-based methods. The algorithm was further confirmed on the independent test set of 196 proteins, where GraphSite surpasses the state-of-the-art structure-based method by 12.3% in AUPR and 9.3% in MCC.
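To make the "graph node classification" framing concrete, the sketch below builds a residue contact graph from 3D coordinates and performs one round of neighbor averaging over per-residue features. It is only an illustration under stated assumptions: the distance cutoff, the toy coordinates and features, and the mean-aggregation step are hypothetical choices, and the aggregation is a simple stand-in rather than the graph transformer used by GraphSite.

```python
import math


def contact_graph(coords, cutoff=10.0):
    """Connect residues whose representative atoms lie within `cutoff` angstroms."""
    adj = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def neighbor_mean(adj, feats):
    """One aggregation round: average each node's features with its neighbors'."""
    out = []
    for i in range(len(feats)):
        group = [feats[i]] + [feats[j] for j in adj[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# Toy input: four residues on a line; the last one is far from the others.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
feats = [[1.0], [0.0], [0.0], [1.0]]  # hypothetical per-residue feature

graph = contact_graph(coords, cutoff=5.0)
smoothed = neighbor_mean(graph, feats)
print(graph)
print(smoothed)
```

Each residue becomes a node whose updated features mix in information from spatially close residues, including ones that are far apart in sequence; a per-node classifier would then label each residue as binding or non-binding.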

Genetic and epigenetic control of the spatial organization of the genome

Molecular Biology of the Cell ◽

10.1091/mbc.e16-03-0149 ◽

2017 ◽

Vol 28 (3) ◽

pp. 364-369 ◽

Cited By ~ 9

Author(s):

Jason Brickner

Keyword(s):

Gene Expression ◽

Dna Binding ◽

Chromatin Structure ◽

Binding Sites ◽

Spatial Organization ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Adaptive Function ◽

Nuclear Structures ◽

Eukaryotic Genomes

Eukaryotic genomes are spatially organized within the nucleus by chromosome folding, interchromosomal contacts, and interaction with nuclear structures. This spatial organization is observed in diverse organisms and both reflects and contributes to gene expression and differentiation. This leads to the notion that the arrangement of the genome within the nucleus has been shaped and conserved through evolutionary processes and likely plays an adaptive function. Both DNA-binding proteins and changes in chromatin structure influence the positioning of genes and larger domains within the nucleus. This suggests that the spatial organization of the genome can be genetically encoded by binding sites for DNA-binding proteins and can also involve changes in chromatin structure, potentially through nongenetic mechanisms. Here I briefly discuss the results that support these ideas and their implications for how genomes encode spatial organization.

Combgap contributes to recruitment of Polycomb group proteins in Drosophila

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1520926113 ◽

2016 ◽

Vol 113 (14) ◽

pp. 3826-3831 ◽

Cited By ~ 24

Author(s):

Payal Ray ◽

Sandip De ◽

Apratim Mitra ◽

Karel Bezstarosti ◽

Jeroen A. A. Demmers ◽

...

Keyword(s):

Dna Binding ◽

Binding Sites ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Polycomb Group ◽

Developmentally Regulated ◽

Polycomb Response Elements ◽

Protein Recruitment ◽

Polycomb Repressive Complex 1 ◽

Transcriptional State

Polycomb group (PcG) proteins are responsible for maintaining the silenced transcriptional state of many developmentally regulated genes. PcG proteins are organized into multiprotein complexes that are recruited to DNA via cis-acting elements known as “Polycomb response elements” (PREs). In Drosophila, PREs consist of binding sites for many different DNA-binding proteins, some known and others unknown. Identification of these DNA-binding proteins is crucial to understanding the mechanism of PcG recruitment to PREs. We report here the identification of Combgap (Cg), a sequence-specific DNA-binding protein that is involved in recruitment of PcG proteins. Cg can bind directly to PREs via GTGT motifs and colocalizes with the PcG proteins Pleiohomeotic (Pho) and Polyhomeotic (Ph) at the majority of PREs in the genome. In addition, Cg colocalizes with Ph at a number of targets independent of Pho. Loss of Cg leads to decreased recruitment of Ph at only a subset of sites; some of these sites are binding sites for other Polycomb repressive complex 1 (PRC1) components, others are not. Our data suggest that Cg can recruit Ph in the absence of PRC1 and illustrate the diversity and redundancy of PcG protein recruitment mechanisms.
