Transcription factors recognize DNA shape without nucleotide recognition

2017
Author(s): Md. Abul Hassan Samee, Benoit G. Bruneau, Katherine S. Pollard

Abstract: We hypothesized that transcription factors (TFs) recognize DNA shape without nucleotide sequence recognition. Three observations motivate an independent role for shape: many TF binding sites lack a sequence-motif, DNA shape adds specificity to sequence-motifs, and different sequences can encode similar shapes. We therefore asked whether the binding sites of a TF are enriched for specific patterns of DNA shape-features, e.g., helical twist. We developed ShapeMF, which discovers these shape-motifs de novo without taking sequence information into account. We find that most TFs assayed in ENCODE have shape-motifs and bind regulatory regions by recognizing shape-motifs in the absence of sequence-motifs. When shape- and sequence-recognition co-occur, the two types of motifs can be overlapping, flanking, or separated by consistent spacing. Shape-motifs are prevalent in regions co-bound by multiple TFs. Finally, TFs with identical sequence-motifs have different shape-motifs, explaining their binding at distinct locations. These results establish shape-motifs as drivers of TF-DNA recognition complementary to sequence-motifs.

2019, Vol 47 (13), pp. 6632-6641
Author(s): Soumitra Pal, Jan Hoinka, Teresa M Przytycka

Abstract: Understanding the principles of DNA binding by transcription factors (TFs) is of primary importance for studying gene regulation. Recently, several lines of evidence suggested that both DNA sequence and shape contribute to TF binding. However, the following compelling question is yet to be considered: in the absence of any sequence similarity to the binding motif, can DNA shape still increase binding probability? To address this challenge, we developed Co-SELECT, a computational approach to analyze the results of in vitro HT-SELEX experiments for TF–DNA binding. Specifically, Co-SELECT leverages the presence of motif-free sequences in late HT-SELEX rounds, and their enrichment in weak binders, to detect evidence for the role of DNA shape features in TF binding. Our approach revealed that, even in the absence of the sequence motif, TFs have a propensity to bind to DNA molecules whose shape is consistent with motif-specific binding. This provides the first direct evidence that shape features that accompany the preferred sequence motifs also bestow an advantage for weak, sequence non-specific binding.


2018
Author(s): Doris Bachtrog, Chris Ellison

The repeatability or predictability of evolution is a central question in evolutionary biology, and most often addressed in experimental evolution studies. Here, we infer how genetically heterogeneous natural systems acquire the same molecular changes, to address how genomic background affects adaptation in natural populations. In particular, we take advantage of independently formed neo-sex chromosomes in Drosophila species that have evolved dosage compensation by co-opting the dosage compensation (MSL) complex, to study the mutational paths that have led to the acquisition of 100s of novel binding sites for the MSL complex in different species. This complex recognizes a conserved 21-bp GA-rich sequence motif that is enriched on the X chromosome, and newly formed X chromosomes recruit the MSL complex by de novo acquisition of this binding motif. We identify recently formed sex chromosomes in the Drosophila repleta and robusta species groups by genome sequencing, and generate genomic occupancy maps of the MSL complex to infer the location of novel binding sites. We find that diverse mutational paths were utilized in each species to evolve 100s of de novo binding motifs along the neo-X, including expansions of microsatellites and transposable element insertions. However, the propensity to utilize a particular mutational path differs between independently formed X chromosomes, and appears to be contingent on genomic properties of that species, such as simple repeat or transposable element density. This establishes the “genomic environment” as an important determinant in predicting the outcome of evolutionary adaptations.


2013, Vol 42 (D1), pp. D148-D155
Author(s): Lin Yang, Tianyin Zhou, Iris Dror, Anthony Mathelier, Wyeth W. Wasserman, ...

2019
Author(s): Florian Heyl, Rolf Backofen

The prediction of binding sites (peak calling) is a common task in the data analysis of methods such as crosslinking or chromatin immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq, ChIP-Seq). The predicted binding sites are often analyzed further, for example to predict sequence motifs or structure patterns. However, the peaks in the obtained set can vary in their profile shapes because of the peak caller used, different binding domains of the protein, protocol biases, or other factors. A tool that evaluates and classifies the predicted peaks based on their shapes has so far been missing. We present StoatyDive, a tool that can be used to filter for specific peak profile shapes in sequencing data such as CLIP and ChIP. StoatyDive thereby fine-tunes downstream analysis steps such as structure or sequence motif predictions and acts as a quality control. With StoatyDive we were able to classify distinct peak profile shapes from CLIP-seq data of the histone stem-loop-binding protein (SLBP). We show the potential of StoatyDive as a quality control tool and as a filter to pick different shapes based on biological or methodological questions. StoatyDive is open source and freely available under GPL-3 at https://github.com/BackofenLab/StoatyDive and at bioconda https://anaconda.org/bioconda/stoatydive.


2018
Author(s): Soumitra Pal, Jan Hoinka, Teresa M. Przytycka

Abstract: Understanding the principles of DNA binding by transcription factors (TFs) is of primary importance for studying gene regulation. Recently, several lines of evidence suggested that both DNA sequence and shape contribute to TF binding. However, the question of whether DNA shape can still increase the probability of binding in the absence of any sequence similarity to the binding motif was yet to be addressed. To address this challenge, we developed Co-SELECT, a computational approach to analyze the results of in vitro HT-SELEX experiments for TF-DNA binding. Specifically, the presence of motif-free sequences in late HT-SELEX rounds and their enrichment in weak binders allowed us to detect evidence for the role of DNA shape features in TF binding. Our approach revealed that, even in the absence of the sequence motif, TFs have a propensity to weakly bind to DNA molecules enriched in specific shape features. Surprisingly, we also found that some properties of DNA shape contribute to promiscuous binding of all tested TF families. Strikingly, such promiscuously bound shapes correspond to the most frequent shape formed by the DNA. We propose that this promiscuous binding facilitates diffusion of a TF along the DNA molecule before it is locked in its binding site.


Author(s): Tsu-Pei Chiu, Beibei Xin, Nicholas Markarian, Yingfei Wang, Remo Rohs

Abstract: TFBSshape (https://tfbsshape.usc.edu) is a motif database for analyzing structural profiles of transcription factor binding sites (TFBSs). The main rationale for this database is to derive mechanistic insights into protein–DNA readout modes from sequencing data when no structures are available. We extended the quantity and dimensionality of TFBSshape, from mostly in vitro to in vivo binding and from unmethylated to methylated DNA. This new release of TFBSshape improves its functionality and launches a responsive and user-friendly web interface for easy access to the data. The current expansion includes new entries from the most recent collections of transcription factors (TFs) from the JASPAR and UniPROBE databases, methylated TFBSs derived from in vitro high-throughput EpiSELEX-seq binding assays, and in vivo methylated TFBSs from the MeDReaders database. TFBSshape content has increased to 2428 structural profiles for 1900 TFs from 39 different species. The structural profiles for each TFBS entry now include 13 shape features and minor groove electrostatic potential for standard DNA and four shape features for methylated DNA. We improved the flexibility and accuracy of the shape-based alignment of TFBSs and designed new tools to compare methylated and unmethylated structural profiles of TFs, as well as methods to derive DNA shape-preserving nucleotide mutations in TFBSs.


2018
Author(s): Cory C. Funk, Alex M. Casella, Segun Jung, Matthew A. Richards, Alex Rodriguez, ...

Abstract: There is intense interest in mapping the tissue-specific binding sites of transcription factors in the human genome to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting provides a means to predict genome-wide binding sites for hundreds of transcription factors (TFs) simultaneously. However, despite the public availability of DNase-seq data for hundreds of samples, there is neither a unified analytical workflow nor a publicly accessible database providing the locations of footprints across all available samples. Here, we implemented a workflow for uniform processing of footprints using two state-of-the-art footprinting algorithms: Wellington and HINT. Our workflow scans the footprints generated by these algorithms for 1,530 sequence motifs to predict binding sites for 1,515 human transcription factors. We applied our workflow to detect footprints in 192 DNase-seq experiments from ENCODE spanning 27 human tissues. This collection of footprints describes an expansive landscape of potential TF occupancy. At thresholds optimized through machine learning, we report high-quality footprints covering 9.8% of the human genome. These footprints were enriched for true positive TF binding sites as defined by ChIP-seq peaks, as well as for genetic variants associated with changes in gene expression. Integrating our footprint atlas with summary statistics from genome-wide association studies revealed that risk for neuropsychiatric traits was enriched specifically at high-scoring footprints in human brain, while risk for immune traits was enriched specifically at high-scoring footprints in human lymphoblasts. Our cloud-based workflow is available at github.com/globusgenomics/genomics-footprint, and a database with all footprints and TF binding site predictions is publicly available at http://data.nemoarchive.org/other/grant/sament/sament/footprint_atlas.

