DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding

2016 ◽  
Author(s):  
Wenxiu Ma ◽  
Lin Yang ◽  
Remo Rohs ◽  
William Stafford Noble

Abstract Motivation: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites. Results: We describe a sequence+shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (1) the k-spectrum+shape model performs better than the classical k-spectrum kernel, particularly for small k values; (2) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (3) the di-mismatch+shape kernel performs better than the di-mismatch kernel for intermediate k values. Availability: The software is available at https://bitbucket.org/wenxiu/[email protected], [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
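For readers unfamiliar with the classical k-spectrum kernel that the abstract uses as its baseline, the following is a minimal sketch of that baseline only. It is not the authors' sequence+shape or di-mismatch kernel, and the function names `kmer_counts` and `spectrum_kernel` are hypothetical, chosen for illustration. The kernel value is the inner product of the two sequences' k-mer count vectors, which is why no alignment of binding sites is required.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (the k-spectrum feature map)."""
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k, giving no k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Inner product of the k-mer count vectors of x and y.

    Illustrative baseline only: no mismatches and no DNA shape features.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Only k-mers present in both sequences contribute to the dot product.
    return sum(n * cy[kmer] for kmer, n in cx.items() if kmer in cy)


if __name__ == "__main__":
    # "ACGT" contains the 2-mers AC, CG and GT once each.
    print(spectrum_kernel("ACGT", "ACGT", 2))  # 3
    print(spectrum_kernel("ACGT", "TTTT", 2))  # 0
```

Because the feature map depends only on k-mer counts, two probes of different lengths can be compared directly; a kernel matrix built this way could then be passed to any kernel-based learner.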

2019 ◽  
Vol 47 (13) ◽  
pp. 6632-6641 ◽  
Author(s):  
Soumitra Pal ◽  
Jan Hoinka ◽  
Teresa M Przytycka

Abstract Understanding the principles of DNA binding by transcription factors (TFs) is of primary importance for studying gene regulation. Recently, several lines of evidence suggested that both DNA sequence and shape contribute to TF binding. However, the following compelling question is yet to be considered: in the absence of any sequence similarity to the binding motif, can DNA shape still increase binding probability? To address this challenge, we developed Co-SELECT, a computational approach to analyze the results of in vitro HT-SELEX experiments for TF–DNA binding. Specifically, Co-SELECT leverages the presence of motif-free sequences in late HT-SELEX rounds; their enrichment in weak binders allows Co-SELECT to detect evidence for the role of DNA shape features in TF binding. Our approach revealed that, even in the absence of the sequence motif, TFs have a propensity to bind to DNA molecules whose shape is consistent with motif-specific binding. This provides the first direct evidence that shape features that accompany the preferred sequence motifs also bestow an advantage for weak, sequence non-specific binding.


2015 ◽  
Vol 112 (15) ◽  
pp. 4654-4659 ◽  
Author(s):  
Tianyin Zhou ◽  
Ning Shen ◽  
Lin Yang ◽  
Namiko Abe ◽  
John Horton ◽  
...  

DNA binding specificities of transcription factors (TFs) are a key component of gene regulatory processes. Underlying mechanisms that explain the highly specific binding of TFs to their genomic target sites are poorly understood. A better understanding of TF−DNA binding requires the ability to quantitatively model TF binding to accessible DNA as its basic step, before additional in vivo components can be considered. Traditionally, these models were built based on nucleotide sequence. Here, we integrated 3D DNA shape information derived with a high-throughput approach into the modeling of TF binding specificities. Using support vector regression, we trained quantitative models of TF binding specificity based on protein binding microarray (PBM) data for 68 mammalian TFs. The evaluation of our models included cross-validation on specific PBM array designs, testing across different PBM array designs, and using PBM-trained models to predict relative binding affinities derived from in vitro selection combined with deep sequencing (SELEX-seq). Our results showed that shape-augmented models compared favorably to sequence-based models. Although both k-mer and DNA shape features can encode interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space. In addition, analyzing the feature weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not revealed by the DNA sequence. As such, this work combines knowledge from structural biology and genomics, and suggests a new path toward understanding TF binding and genome function.


2013 ◽  
Vol 42 (5) ◽  
pp. 3059-3072 ◽  
Author(s):  
Montse Gustems ◽  
Anne Woellmer ◽  
Ulrich Rothbauer ◽  
Sebastian H. Eck ◽  
Thomas Wieland ◽  
...  

Abstract CpG methylation in mammalian DNA is known to interfere with gene expression by inhibiting the binding of transactivators to their cognate sequence motifs or recruiting proteins involved in gene repression. An Epstein–Barr virus-encoded transcription factor, Zta, was the first example of a sequence-specific transcription factor that preferentially recognizes and selectively binds DNA sequence motifs with methylated CpG residues, reverses epigenetic silencing and activates gene transcription. The DNA binding domain of Zta is homologous to c-Fos, a member of the cellular AP-1 (activator protein 1) transcription factor family, which regulates cell proliferation and survival, apoptosis, transformation and oncogenesis. We have identified a novel AP-1 binding site termed meAP-1, which contains a CpG dinucleotide. If methylated, meAP-1 sites are preferentially bound by the AP-1 heterodimer c-Jun/c-Fos in vitro and in cellular chromatin in vivo. In activated human primary B cells, c-Jun/c-Fos locates to these methylated elements in promoter regions of transcriptionally activated genes. Reminiscent of the viral Zta protein, c-Jun/c-Fos is the first identified cellular member of the AP-1 family of transactivators that can induce expression of genes with methylated, hence repressed promoters, reversing epigenetic silencing.


1991 ◽  
Vol 11 (12) ◽  
pp. 5910-5918 ◽  
Author(s):  
Y L Yuan ◽  
S Fields

The STE12 protein of the yeast Saccharomyces cerevisiae binds to the pheromone response element (PRE) present in the upstream region of genes whose transcription is induced by pheromone. Using DNase I footprinting assays with bacterially made STE12 fragments, we localized the DNA-binding domain to 164 amino acids near the amino terminus. Footprinting of oligonucleotide-derived sequences containing one PRE, or two PREs in head-to-tail or tail-to-tail orientation, showed that the N-terminal 215 amino acids of STE12 has similar binding affinity to either of the dimer sites and a binding affinity 5- to 10-fold lower for the monomer site. This binding cooperativity was also evident on a fragment from the MFA2 gene, which encodes the a-factor pheromone. On this fragment, the 215-amino-acid STE12 fragment protected both a consensus PRE as well as a degenerate PRE containing an additional residue. Mutation of the degenerate site led to a 5- to 10-fold decrease in binding; mutation of the consensus site led to a 25-fold decrease in binding. The ability of PREs to function as pheromone-inducible upstream activation sequences in yeast correlated with their ability to bind the STE12 domain in vitro. The sequence of the STE12 DNA-binding domain contains similarities to the homeodomain, although it is highly diverged from other known examples of this motif. Moreover, the alignment between STE12 and the homeodomain postulates loops after both the putative helix 1 and helix 2 of the STE12 sequence.


2000 ◽  
Vol 20 (8) ◽  
pp. 2852-2864 ◽  
Author(s):  
Mary Baum ◽  
Louise Clarke

ABSTRACT Two functionally important DNA sequence elements in centromeres of the fission yeast Schizosaccharomyces pombe are the centromeric central core and the K-type repeat. Both of these DNA elements show internal functional redundancy that is not correlated with a conserved DNA sequence. Specific, but degenerate, sequences in these elements are bound in vitro by the S. pombe DNA-binding proteins Abp1p (also called Cbp1p) and Cbhp, which are related to the mammalian centromere DNA-binding protein CENP-B. In this study, we determined that Abp1p binds to at least one of its target sequences within S. pombe centromere II central core (cc2) DNA with an affinity (Ks = 7 × 10⁹ M⁻¹) higher than those of other known centromere DNA-binding proteins for their cognate targets. In vivo, epitope-tagged Cbhp associated with centromeric K repeat chromatin, as well as with noncentromeric regions. Like abp1+/cbp1+, we found that cbh+ is not essential in fission yeast, but a strain carrying deletions of both genes (Δabp1 Δcbh) is extremely compromised in growth rate and morphology and missegregates chromosomes at very high frequency. The synergism between the two null mutations suggests that these proteins perform redundant functions in S. pombe chromosome segregation. In vitro assays with cell extracts with these proteins depleted allowed the specific assignments of several binding sites for them within cc2 and the K-type repeat. Redundancy observed at the centromere DNA level appears to be reflected at the protein level, as no single member of the CENP-B-related protein family is essential for proper chromosome segregation in fission yeast. The relevance of these findings to mammalian centromeres is discussed.


1996 ◽  
Vol 16 (7) ◽  
pp. 3814-3824 ◽  
Author(s):  
J D Molkentin ◽  
A B Firulli ◽  
B L Black ◽  
J F Martin ◽  
C M Hustad ◽  
...  

There are four members of the myocyte enhancer binding factor 2 (MEF2) family of transcription factors, MEF2A, -B, -C, and -D, that have homology within an amino-terminal MADS box and an adjacent MEF2 domain that together mediate dimerization and DNA binding. MEF2A, -C, and -D have previously been shown to bind an A/T-rich DNA sequence in the control regions of numerous muscle-specific genes, whereas MEF2B was reported to be unable to bind this sequence unless the carboxyl terminus was deleted. To further define the functions of MEF2B, we analyzed its DNA binding and transcriptional activities. In contrast to previous studies, our results show that MEF2B binds the same DNA sequence as other members of the MEF2 family and acts as a strong transactivator through that sequence. Transcriptional activation by MEF2B is dependent on the carboxyl terminus, which contains two conserved sequence motifs found in all vertebrate MEF2 factors. During mouse embryogenesis, MEF2B transcripts are expressed in the developing cardiac and skeletal muscle lineages in a temporospatial pattern distinct from but overlapping with those of the other Mef2 genes. The mouse Mef2b gene maps to chromosome 8 and is unlinked to other Mef2 genes; its intron-exon organization is similar to that of the other vertebrate Mef2 genes and the single Drosophila Mef2 gene, consistent with the notion that these different Mef2 genes evolved from a common ancestral gene.


1999 ◽  
Vol 181 (3) ◽  
pp. 1035-1038 ◽  
Author(s):  
Kathleen Sandman ◽  
John N. Reeve

ABSTRACT DNA shape recognition determines the preferred binding sites for sequence-independent DNA binding proteins, and here we document that archaeal histones assemble archaeal nucleosomes in vitro centered preferentially within (CTG)6 and (CTG)8 repeats, close to junctions with flanking mixed-sequence DNA. Archaeal nucleosomes were not positioned by (CTG)4-, (CTG)5-, or (CTG)3AA(CTG)3-containing DNA sequences. The features of CTG repeat-containing sequences that direct eucaryal nucleosome positioning may also be similarly recognized by archaeal histones.


2018 ◽  
Author(s):  
Soumitra Pal ◽  
Jan Hoinka ◽  
Teresa M. Przytycka

Abstract Understanding the principles of DNA binding by transcription factors (TFs) is of primary importance for studying gene regulation. Recently, several lines of evidence suggested that both DNA sequence and shape contribute to TF binding. However, the question of whether DNA shape can still increase the probability of binding in the absence of any sequence similarity to the binding motif had yet to be addressed. To address this challenge, we developed Co-SELECT, a computational approach to analyze the results of in vitro HT-SELEX experiments for TF-DNA binding. Specifically, the presence of motif-free sequences in late HT-SELEX rounds and their enrichment in weak binders allowed us to detect evidence for the role of DNA shape features in TF binding. Our approach revealed that, even in the absence of the sequence motif, TFs have a propensity to weakly bind to DNA molecules enriched in specific shape features. Surprisingly, we also found that some properties of DNA shape contribute to promiscuous binding of all tested TF families. Strikingly, such promiscuously bound shapes correspond to the most frequent shape formed by the DNA. We propose that this promiscuous binding facilitates diffusion of a TF along the DNA molecule before it is locked in its binding site.


1992 ◽  
Vol 12 (11) ◽  
pp. 4809-4816
Author(s):  
F Katagiri ◽  
K Seipel ◽  
N H Chua

We have carried out deletion analyses of a tobacco transcription activator, TGA1a, in order to define its functional domains. TGA1a belongs to the basic-region-leucine zipper (bZIP) class of DNA-binding proteins. Like other proteins of this class, it binds to its target DNA as a dimer, and its bZIP domain is necessary and sufficient for specific DNA binding. A mutant polypeptide containing the bZIP domain alone, however, shows a lower DNA-binding affinity than the full-length TGA1a. The C-terminal portion of TGA1a, which is essential for the higher DNA-binding affinity, contains a polypeptide region that can stabilize dimeric forms of the protein. This polypeptide region is designated the dimer stabilization (DS) region. Under our in vitro conditions, TGA1a derivatives with the DS region and those without the region do not form a detectable mixed dimer. This result indicates that in addition to the leucine zipper, the DS region can serve as another determinant of the dimerization specificity of TGA1a. In fact, the DS region, when fused to another bZIP protein, C/EBP, can inhibit dimer formation between the fusion protein and native C/EBP, whereas each of these can form homodimers. Such a portable determinant of dimerization specificity has potential application in studies of DNA-binding proteins as well as in biotechnology.

