Synthetic STARR-seq reveals how DNA shape and sequence modulate transcriptional output and noise

2018 ◽  
Author(s):  
Stefanie Schöne ◽  
Melissa Bothe ◽  
Edda Einfeldt ◽  
Marina Borschiwer ◽  
Philipp Benner ◽  
...  

Abstract: The binding of transcription factors to short recognition sequences plays a pivotal role in controlling the expression of genes. The sequence and shape characteristics of binding sites influence DNA binding specificity and have also been implicated in modulating the activity of transcription factors downstream of binding. To quantitatively assess the transcriptional activity of tens of thousands of designed synthetic sites in parallel, we developed a synthetic version of STARR-seq (synSTARR-seq). We used the approach to systematically analyze how variations in the recognition sequence of the glucocorticoid receptor (GR) affect transcriptional regulation. Our approach resulted in the identification of a novel highly active functional GR binding sequence and revealed that sequence variation both within and flanking GR’s core binding site can modulate GR activity without apparent changes in DNA binding affinity. Notably, we found that the sequence composition of variants with similar activity profiles was highly diverse. In contrast, groups of variants with similar activity profiles showed distinct DNA shape characteristics, indicating that DNA shape may be a better predictor of activity than DNA sequence. Finally, using single-cell experiments with individual enhancer variants, we obtained clues indicating that the architecture of the response element can independently tune expression mean and cell-to-cell variability in gene expression (noise). Together, our studies establish synSTARR as a powerful method to systematically study how DNA sequence and shape modulate transcriptional output and noise.

Author(s):  
Ruby Sharma ◽  
Shanti P. Gangwar ◽  
Ajay K. Saxena

ERG3 (ETS-related gene) is a member of the ETS (erythroblast transformation-specific) family of transcription factors, which contain a highly conserved DNA-binding domain. The ETS family of transcription factors differ in their binding to promoter DNA sequences, and the mechanism of their DNA-sequence discrimination is poorly understood. In the current study, crystals of the ETSi domain (the ETS domain of ERG3 containing a CID motif) in space group P41212 and of its complex with the E74 DNA sequence (DNA9) in space group C2221 were obtained and their structures were determined. Comparative structure analysis of the ETSi domain and its complex with DNA9 with previously determined structures of the ERGi domain (the ETS domain of ERG containing inhibitory motifs) in space group P65212 and of the ERGi–DNA12 complex in space group P41212 was performed. The ETSi domain is observed as a homodimer in solution as well as in the crystallographic asymmetric unit. Superposition of the structure of the ETSi domain on that of the ERGi domain showed a major conformational change at the C-terminal DNA-binding autoinhibitory (CID) motif, while minor changes are observed in the loop regions of the ETSi-domain structure. The ETSi–DNA9 complex in space group C2221 forms a structure that is quite similar to that of the ERG–DNA12 complex in space group P41212. Upon superposition of the complexes, major conformational changes are observed at the 5′ and 3′ ends of DNA9, while the conformation of the core GGA nucleotides was quite conserved. Comparison of the ETSi–DNA9 structure with known structures of ETS class 1 protein–DNA complexes shows the similarities and differences in the promoter DNA binding and specificity of the class 1 ETS proteins.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Janik Sielemann ◽  
Donat Wulf ◽  
Romy Schmidt ◽  
Andrea Bräutigam

Abstract: Understanding gene expression will require understanding where regulatory factors bind genomic DNA. The frequently used sequence-based motifs of protein-DNA binding are not predictive, since a genome contains many more binding sites than are actually bound and transcription factors of the same family share similar DNA-binding motifs. Traditionally, these motifs only depict sequence but neglect DNA shape. Since shape may contribute non-linearly and combinatorially to binding, machine learning approaches ought to be able to better predict transcription factor binding. Here we show that a random forest machine learning approach, which incorporates the 3D shape of DNA, enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members which share the same binding motif. We observed that DNA shape features were individually weighted for each transcription factor, even if they shared the same binding sequence.


2018 ◽  
Author(s):  
Andrea Callegari ◽  
Christian Sieben ◽  
Alexander Benke ◽  
David M. Suter ◽  
Beat Fierz ◽  
...  

Abstract: Transcription factors (TFs) regulate gene expression in both prokaryotes and eukaryotes by recognizing and binding to specific DNA promoter sequences. In higher eukaryotes, it remains unclear how the duration of TF binding to DNA relates to downstream transcriptional output. Here, we address this question for the transcriptional activator NF-κB (p65), by live-cell single molecule imaging of TF-DNA binding kinetics and genome-wide quantification of p65-mediated transcription. We used mutants of p65, perturbing either the DNA binding domain (DBD) or the protein-protein transactivation domain (TAD). We found that p65-DNA binding time was predominantly determined by its DBD and directly correlated with its transcriptional output as long as the TAD is intact. Surprisingly, mutation or deletion of the TAD did not modify p65-DNA binding stability, suggesting that the p65 TAD generally contributes neither to the assembly of an “enhanceosome,” nor to the active removal of p65 from putative specific binding sites. However, TAD removal did reduce p65-mediated transcriptional activation, indicating that protein-protein interactions act to translate the long-lived p65-DNA binding into productive transcription.
Author Summary: To control transcription of a certain gene or a group of genes, both eukaryotes and prokaryotes express specialized proteins, transcription factors (TFs). During gene activation, TFs bind gene promoter sequences to recruit the transcriptional machinery including RNA polymerase II. TFs are often multi-subunit proteins containing a DNA-binding domain (DBD) as well as a protein-protein interaction interface. It was suggested that the duration of a TF-DNA binding event 1) depends on these two subunits and 2) dictates the outcome, i.e. the amount of mRNA produced from an activated gene. We set out to address these hypotheses using the transcriptional activator NF-κB (p65) as well as a number of mutants affecting different functional subunits.
Using a combination of live-cell microscopy and RNA sequencing, we show that p65 DNA-binding time indeed correlates with the transcriptional output, but that this relationship depends on, and hence can be uncoupled by altering, the protein-protein interaction capacity. Our results suggest that, while p65 DNA binding times are dominated by the DBD, a transcriptional output can only be achieved with a functional protein-protein interaction subunit.


2018 ◽  
Author(s):  
Ariel Afek ◽  
Stefan Ilic ◽  
John Horton ◽  
David B. Lukatsky ◽  
Raluca Gordan ◽  
...  

SUMMARY: Primases are key enzymes involved in DNA replication. They act on single-stranded DNA, and catalyze the synthesis of short RNA primers used by DNA polymerases. Here, we investigate the DNA binding and activity of the bacteriophage T7 primase using a new workflow called High-Throughput Primase Profiling (HTPP). Using a unique combination of high-throughput binding assays and biochemical analyses, HTPP reveals a complex landscape of binding specificity and functional activity for the T7 primase, determined by sequences flanking the primase recognition site. We identified specific features, such as G/T-rich flanks, which increase primase-DNA binding up to 10-fold and, surprisingly, also increase the length of newly formed RNA (up to 3-fold). To our knowledge, variability in primer length has not been reported for this primase. We expect that applying HTPP to additional enzymes will reveal new insights into the effects of DNA sequence composition on the DNA recognition and functional activity of primases.


2020 ◽  
Vol 40 (7) ◽  
Author(s):  
Gergely Nagy ◽  
Bence Daniel ◽  
Ixchelt Cuaranta-Monroy ◽  
Laszlo Nagy

ABSTRACT Peroxisome proliferator-activated receptor γ (PPARγ) is a nuclear receptor essential for adipocyte development and the maintenance of the alternatively polarized macrophage phenotype. Biochemical studies have established that as an obligate heterodimer with retinoid X receptor (RXR), PPARγ binds directly repeated nuclear receptor half sites spaced by one nucleotide (direct repeat 1 [DR1]). However, it has not been analyzed systematically and genome-wide how cis factors such as the sequences of DR1s and adjacent sequences and trans factors such as cobinding lineage-determining transcription factors (LDTFs) contribute to the direct binding of PPARγ in different cellular contexts. We developed a novel motif optimization approach using sequence composition and chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) densities from macrophages and adipocytes to complement de novo motif enrichment analysis and to define and classify high-affinity binding sites. We found that approximately half of the PPARγ cistrome represents direct DNA binding; both half sites can be extended upstream, and these are typically not of equal strength within a DR1. Strategically positioned LDTFs have greater impact on PPARγ binding than the quality of DR1, and the presence of the extension of DR1 provides a remarkable synergy with LDTFs. This approach of considering not only nucleotide frequencies but also their contribution to protein binding in a cellular context is applicable to other transcription factors.


2016 ◽  
Author(s):  
Wenxiu Ma ◽  
Lin Yang ◽  
Remo Rohs ◽  
William Stafford Noble

Abstract
Motivation: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites.
Results: We describe a sequence+shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (1) the k-spectrum+shape model performs better than the classical k-spectrum kernel, particularly for small k values; (2) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (3) the di-mismatch+shape kernel performs better than the di-mismatch kernel for intermediate k values.
Availability: The software is available at https://bitbucket.org/wenxiu/[email protected], [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
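
For readers unfamiliar with the classical k-spectrum kernel that the abstract uses as its baseline, the following is a minimal sketch of the textbook definition: the kernel value for two sequences is the inner product of their k-mer count vectors. It is not the authors' software; the function names `kmer_counts` and `k_spectrum_kernel` are hypothetical, and the shape-augmented and di-mismatch kernels described in the abstract are not implemented here.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def k_spectrum_kernel(a: str, b: str, k: int) -> int:
    """Classical k-spectrum kernel: inner product of k-mer count vectors.

    Only k-mers present in both sequences contribute, so iterating over
    the k-mers of one sequence is sufficient.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    # Two short, made-up DNA sequences used purely for illustration.
    print(k_spectrum_kernel("ACGTACGT", "TACGTT", 3))
```

Sequences that share many k-mers receive a large kernel value, and sequences with no k-mer in common receive zero. A support vector machine or similar method can then use these values as a similarity measure without requiring aligned binding sites.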


Acta Naturae ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 31-46
Author(s):  
Oksana G. Maksimenko ◽  
Dariya V. Fursenko ◽  
Elena V. Belova ◽  
Pavel G. Georgiev

In mammals, most of the boundaries of topologically associating domains and all well-studied insulators are rich in binding sites for the CTCF protein. According to existing experimental data, CTCF is a key factor in the organization of the architecture of mammalian chromosomes. A characteristic feature of CTCF is that the central part of the protein contains a cluster consisting of eleven domains of C2H2-type zinc fingers, five of which specifically bind to a long DNA sequence conserved in most animals. The class of transcription factors that carry a cluster of C2H2-type zinc fingers consisting of five or more domains (C2H2 proteins) is widely represented in all groups of animals. The functions of most C2H2 proteins still remain unknown. This review presents data on the structure and possible functions of these proteins, using the example of the vertebrate CTCF protein and several well-characterized C2H2 proteins in Drosophila and mammals.

