Enhanced TF binding site maps improve regulatory networks learned from accessible chromatin data

2019
Author(s): Shubhada R. Kulkarni, D. Marc Jones, Klaas Vandepoele

Abstract: Determining where transcription factors (TFs) bind in genomes provides insights into which transcriptional programs are active across organs, tissue types, and environmental conditions. Recent advances in high-throughput profiling of regulatory DNA have yielded large amounts of information about chromatin accessibility. Interpreting the functional significance of these datasets requires knowledge of which regulators are likely to bind these regions. This can be achieved by using information about TF binding preferences, or motifs, to identify TF binding events that are likely to be functional. Although different approaches exist to map motifs to DNA sequences, a systematic evaluation of these tools in plants is missing. Here we compare four motif mapping tools widely used in the Arabidopsis research community and evaluate their performance using chromatin immunoprecipitation datasets for 40 TFs. Downstream gene regulatory network (GRN) reconstruction was found to be sensitive to the motif mapper used. We further show that the low recall of FIMO, one of the most frequently used motif mapping tools, can be overcome by using an Ensemble approach, which combines results from different mapping tools. Several examples are provided demonstrating how the Ensemble approach extends our view of transcriptional control for TFs active in different biological processes. Finally, a new protocol is presented to efficiently derive more complete cell type-specific GRNs through the integrative analysis of open chromatin regions, known binding site information, and expression datasets.
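The idea of combining motif matches from several mapping tools and scoring recall against ChIP peaks can be illustrated with a minimal sketch. The abstract does not say how the Ensemble approach merges results, so the simple union used here is an assumption, as are the tuple layout, the function names, the tool name `mapper_b`, and the toy coordinates.

```python
from collections import defaultdict

# Hypothetical record layouts (not taken from the paper):
#   motif match: (tf, chrom, start, end), half-open coordinates
#   ChIP peak:   (chrom, start, end)

def ensemble_hits(hits_by_tool):
    """Union the motif matches of all tools, recording which tools report each match."""
    support = defaultdict(set)
    for tool, hits in hits_by_tool.items():
        for hit in hits:
            support[hit].add(tool)
    return dict(support)

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def recall(hits, chip_peaks, tf):
    """Fraction of ChIP peaks for `tf` that contain at least one motif match."""
    peaks = chip_peaks.get(tf, [])
    if not peaks:
        return 0.0
    found = 0
    for chrom, p_start, p_end in peaks:
        if any(h_tf == tf and h_chrom == chrom
               and overlaps(h_start, h_end, p_start, p_end)
               for h_tf, h_chrom, h_start, h_end in hits):
            found += 1
    return found / len(peaks)

# Toy data: the second tool finds a match that the first one misses.
hits_by_tool = {
    "FIMO": [("TF1", "chr1", 100, 110)],
    "mapper_b": [("TF1", "chr1", 100, 110), ("TF1", "chr1", 500, 510)],
}
chip_peaks = {"TF1": [("chr1", 90, 130), ("chr1", 480, 520)]}

combined = ensemble_hits(hits_by_tool)
print(recall(hits_by_tool["FIMO"], chip_peaks, "TF1"))  # single tool
print(recall(combined, chip_peaks, "TF1"))              # union of both tools
```

A union can only keep or raise recall relative to any single tool; it may also add false positives, which is why the sketch keeps the per-match tool support for later filtering.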

2015
Author(s): Javier Estrada, Teresa Ruiz-Herrero, Clarissa Scholes, Zeba Wunderlich, Angela DePace

DNA-binding proteins control many fundamental biological processes such as transcription, recombination and replication. A major goal is to decipher the role that DNA sequence plays in orchestrating the binding and activity of such regulatory proteins. To address this goal, it is useful to rationally design DNA sequences with desired numbers, affinities and arrangements of protein binding sites. However, removing binding sites from DNA is computationally non-trivial since one risks creating new sites in the process of deleting or moving others. Here we present an online binding site removal tool, SiteOut, that enables users to design arbitrary DNA sequences that entirely lack binding sites for factors of interest. SiteOut can also be used to delete sites from a specific sequence, or to introduce site-free spacers between functional sequences without creating new sites at the junctions. In combination with commercial DNA synthesis services, SiteOut provides a powerful and flexible platform for synthetic projects that interrogate regulatory DNA. Here we describe the algorithm and illustrate the ways in which SiteOut can be used; it is publicly available at https://depace.med.harvard.edu/siteout/
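The difficulty described above, that filling or editing a sequence can create new sites at the junctions, can be made concrete with a small sketch. This is not SiteOut's algorithm, which the abstract does not detail; it assumes binding sites are given as exact words (real tools typically score position weight matrices), and every function name and example sequence is hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def forbidden_words(sites):
    """Site words on both strands."""
    return {w for s in sites for w in (s, reverse_complement(s))}

def count_sites(seq, forbidden):
    """Number of (position, word) occurrences, overlapping ones included."""
    return sum(seq.startswith(w, i) for i in range(len(seq)) for w in forbidden)

def design_spacer(length, sites, left="", right="", seed=0, max_tries=1000):
    """Build a spacer of `length` bases that adds no site inside itself or at
    the junctions with the flanking sequences `left` and `right`."""
    rng = random.Random(seed)
    forbidden = forbidden_words(sites)
    allowed = count_sites(left, forbidden) + count_sites(right, forbidden)
    for _ in range(max_tries):
        seq = left
        for _ in range(length):
            # Keep only bases that do not complete a forbidden word.
            choices = [b for b in "ACGT"
                       if not any((seq + b).endswith(w) for w in forbidden)]
            if not choices:
                break
            seq += rng.choice(choices)
        else:
            # Accept only if the right junction created no new site either.
            if count_sites(seq + right, forbidden) == allowed:
                return seq[len(left):]
    raise ValueError("no site-free spacer found")

sites = ["GATA", "CACGTG"]
left, right = "AAGATAAC", "TTCACGTGAA"   # flanks keep their own sites
spacer = design_spacer(30, sites, left=left, right=right)
print(spacer)
```

Building the spacer base by base rejects a forbidden word as soon as it would be completed, so the left junction is handled during construction and only the right junction needs a final check.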


2017
Vol 35 (4)
pp. 837-854
Author(s): Cristina M Alexandre, James R Urton, Ken Jean-Baptiste, John Huddleston, Michael W Dorrity, ...

Abstract: Variation in regulatory DNA is thought to drive phenotypic variation, evolution, and disease. Prior studies of regulatory DNA and transcription factors across animal species highlighted a fundamental conundrum: Transcription factor binding domains and cognate binding sites are conserved, while regulatory DNA sequences are not. It remains unclear how conserved transcription factors and dynamic regulatory sites produce conserved expression patterns across species. Here, we explore regulatory DNA variation and its functional consequences within Arabidopsis thaliana, using chromatin accessibility to delineate regulatory DNA genome-wide. Unlike in previous cross-species comparisons, the positional homology of regulatory DNA is maintained among A. thaliana ecotypes and less nucleotide divergence has occurred. Of the ∼50,000 regulatory sites in A. thaliana, we found that 15% varied in accessibility among ecotypes. Some of these accessibility differences were associated with extensive, previously unannotated sequence variation, encompassing many deletions and ancient hypervariable alleles. Unexpectedly, for the majority of such regulatory sites, nearby gene expression was unaffected. Nevertheless, regulatory sites with high levels of sequence variation and differential chromatin accessibility were the most likely to be associated with differential gene expression. Finally, and most surprising, we found that the vast majority of differentially accessible sites show no underlying sequence variation. We argue that these surprising results highlight the necessity to consider higher-order regulatory context in evaluating regulatory variation and predicting its phenotypic consequences.


2018
Author(s): Michal Pawlak, Katarzyna Z. Kedzierska, Maciej Migdal, Karim Abu Nahia, Jordan A. Ramilowski, ...

Abstract: The development of an organ involves dynamic regulation of gene transcription and complex multipathway interactions. To better understand the transcriptional regulatory mechanisms driving heart development and the consequences of their disruption, we isolated cardiomyocytes (CMs) from wild-type zebrafish embryos at 24, 48 and 72 hours post fertilization, corresponding to heart looping, chamber formation and heart maturation, and from mutant lines carrying loss-of-function mutations in gata5, tbx5a and hand2, transcription factors (TFs) required for proper heart development. The integration of CM transcriptomics (RNA-seq) and genome-wide chromatin accessibility maps (ATAC-seq) unravelled dynamic regulatory networks driving crucial events of heart development. These networks contained key cardiac TFs including Gata5/6, Nkx2.5, Tbx5/20, and Hand2, and are associated with open chromatin regions enriched for DNA sequence motifs belonging to the family of the corresponding TFs. These networks were disrupted in cardiac TF mutants, indicating their importance in proper heart development. The most prominent gene expression changes, which correlated with chromatin accessibility modifications within their proximal promoter regions, occurred between heart looping and chamber formation, and were associated with a metabolic and hematopoietic/cardiac switch during CM maturation. Furthermore, loss of function of cardiac TFs Gata5, Tbx5a, and Hand2 affected the cardiac regulatory networks and caused global changes in the chromatin accessibility profile. Among regions with differential chromatin accessibility in mutants were highly conserved non-coding elements, which represent putative cis-regulatory elements with a potential role in heart development and disease. Altogether, our results revealed the dynamic regulatory landscape at key stages of heart development and identified molecular drivers of heart morphogenesis.


2018
Author(s): Avanti Shrikumar, Eva Prakash, Anshul Kundaje

Abstract: Support Vector Machines with gapped k-mer kernels (gkm-SVMs) have been used to learn predictive models of regulatory DNA sequence. However, interpreting predictive sequence patterns learned by gkm-SVMs can be challenging. Existing interpretation methods such as deltaSVM, in-silico mutagenesis (ISM), or SHAP either do not scale well or make limiting assumptions about the model that can produce misleading results when the gkm kernel is combined with nonlinear kernels. Here, we propose gkmexplain: a novel approach inspired by the method of Integrated Gradients for interpreting gkm-SVM models. Using simulated regulatory DNA sequences, we show that gkmexplain identifies predictive patterns with high accuracy while avoiding pitfalls of deltaSVM and ISM and being orders of magnitude more computationally efficient than SHAP. We use a novel motif discovery method called TF-MoDISco to recover consolidated TF motifs from gkm-SVM models of in vivo TF binding by aggregating predictive patterns identified by gkmexplain. Finally, we find that mutation impact scores derived through gkmexplain using gkm-SVM models of chromatin accessibility in lymphoblastoid cell lines consistently outperform deltaSVM and ISM at identifying regulatory genetic variants (dsQTLs). Code and example notebooks replicating the workflow are available at https://github.com/kundajelab/gkmexplain. Explanatory videos are available at http://bit.ly/gkmexplainvids.


2019
Author(s): Jonathan Sobel, Claudiane Guay, Adriana Rodriguez-Trejo, Lisa Stoll, Véronique Menoud, ...

Glucose-induced insulin secretion, a peculiar property of fully mature β-cells, is only achieved after birth and is preceded by a phase of intense proliferation. These events occurring in the neonatal period are decisive for the establishment of an appropriate functional β-cell mass that provides the required insulin throughout life. However, key regulators of gene expression involved in cellular reprogramming along pancreatic islet maturation remain to be elucidated. The present study addressed this issue by mapping open chromatin regions in newborn versus adult rat islets using the ATAC-seq assay. Accessible regions were then correlated with the expression profiles of mRNAs to unveil the regulatory networks governing functional islet maturation. This led to the identification of Scrt1, a novel transcriptional repressor controlling β-cell proliferation.


Hypertension
2016
Vol 68 (suppl_1)
Author(s): Maria F Martinez, Silvia Medrano, Masafumi Oka, Ellen S Pentz, Allan W Dickerman, ...

Control of the renin cell phenotype is crucial for the regulation of blood pressure and fluid-electrolyte homeostasis. Enhancers are cis-acting DNA sequences that harbor distinct chromatin features and regulate gene expression in an orientation-independent manner. Recently, clusters of enhancers, or super-enhancers (SEs), that are highly enriched with master transcription factors, possess an open chromatin configuration and lie in close proximity to cell-identity genes have been proposed. We tested the hypothesis that renin cells have unique repertoires of enhancers and super-enhancers, distinct from other cell types. Those regulatory clusters may in turn confer the identity of renin cells. To define the genome-wide enhancer landscape characteristic of renin cells, we studied As4.1 cells, kidney tumor cells that express renin constitutively, and native renin cells sorted from the kidneys of Ren1cKO-YFP+ mice. In these mice, the renin promoter drives YFP expression, thus marking the renin cells. We used genome-wide ChIP-Seq for Med1 (subunit 1 of the Mediator complex), H3K27Ac (active enhancers) and Pol II (to visualize putative genomic areas undergoing transcription). The ROSE algorithm was used to ascertain super-enhancers. Genome-wide chromatin accessibility was assessed using ATAC-Seq. The results were compared to twenty-one other cell types that do not express renin. In As4.1 cells, we identified 14,871 enhancers based on H3K27Ac. Of those, 888 were classified as super-enhancers. The Med1 signal in As4.1 cells showed an SE localized 5 kb upstream of the Ren1 gene, which was ranked at position 25 among other SEs. The H3K27Ac signal showed highest occupancy in the same region. ChIP-Seq for H3K27Ac in YFP+ cells showed 211 SEs of 2,987 peaks. The SE for the renin gene possessed the highest signal and ranked number 1, indicating its importance in renin cells. One hundred and thirteen SEs were unique to renin cells, including the SE associated with the renin gene. ATAC-Seq signals overlapped with the renin SE and the classical enhancer, indicating that the chromatin was accessible for transcription. In summary, renin-expressing cells possess distinct repertoires of unique enhancers and super-enhancers that, acting in concert, are likely to determine the renin phenotype.


2018
Author(s): Alicia Madgwick, Marta Silvia Magri, Christelle Dantec, Damien Gailly, Ulla-Maj Fiuza, ...

Ascidian species of the Phallusia and Ciona genera are distantly related, their last common ancestor dating to several hundred million years ago. Although their genome sequences have extensively diverged since this radiation, Phallusia and Ciona species share almost identical early morphogenesis and stereotyped cell lineages. Here, we explored the evolution of transcriptional control between P. mammillata and C. robusta. We combined genome-wide mapping of open chromatin regions in both species with a comparative analysis of the regulatory sequences of a test set of 10 pairs of orthologous early regulatory genes with conserved expression patterns. We find that ascidian chromatin accessibility landscapes obey similar rules as in other metazoa. Open-chromatin regions are short, highly conserved within each genus and cluster around regulatory genes. The dynamics of chromatin accessibility and closest-gene expression are strongly correlated during early embryogenesis. Open-chromatin regions are highly enriched in cis-regulatory elements: 73% of 49 open chromatin regions around our test genes behaved as either distal enhancers or proximal enhancer/promoters following electroporation in Phallusia eggs. Analysis of this dataset suggests a pervasive use in ascidians of shadow enhancers with partially overlapping activities. Cross-species electroporations point to a deep conservation of both the trans-regulatory logic between these distantly related ascidians and the cis-regulatory activities of individual enhancers. Finally, we found that the relative order and approximate distance to the transcription start site of open chromatin regions can be conserved between Ciona and Phallusia species despite extensive sequence divergence, a property that can be used to identify orthologous enhancers, whose regulatory activity can partially diverge.


2018
Author(s): Adam J. Rubin, Kevin R. Parker, Ansuman T. Satpathy, Yanyan Qi, Beijing Wu, ...

Summary: Here we present Perturb-ATAC, a method which combines multiplexed CRISPR interference or knockout with genome-wide chromatin accessibility profiling in single cells, based on the simultaneous detection of CRISPR guide RNAs and open chromatin sites by assay of transposase-accessible chromatin with sequencing (ATAC-seq). We applied Perturb-ATAC to transcription factors (TFs), chromatin-modifying factors, and noncoding RNAs (ncRNAs) in ∼4,300 single cells, encompassing more than 63 unique genotype-phenotype relationships. Perturb-ATAC in human B lymphocytes uncovered regulators of chromatin accessibility, TF occupancy, and nucleosome positioning, and identified a hierarchical organization of TFs that govern B cell state, variation, and disease-associated cis-regulatory elements. Perturb-ATAC in primary human epidermal cells revealed three sequential modules of cis-elements that specify keratinocyte fate, orchestrated by the TFs JUNB, KLF4, ZNF750, CEBPA, and EHF. Combinatorial deletion of all pairs of these TFs uncovered their epistatic relationships and highlighted genomic co-localization as a basis for synergistic interactions. Thus, Perturb-ATAC is a powerful and general strategy to dissect gene regulatory networks in development and disease.

Highlights:
A new method for simultaneous measurement of CRISPR perturbations and chromatin state in single cells.
Perturb-ATAC reveals regulatory factors that control cis-element accessibility, trans-factor occupancy, and nucleosome positioning.
Perturb-ATAC reveals regulatory modules of coordinated trans-factor activity in B lymphoblasts.
Keratinocyte differentiation is orchestrated by synergistic activities of co-binding TFs on cis-elements.


2018
Author(s): Peyton Greenside, Tyler Shimko, Polly Fordyce, Anshul Kundaje

Abstract

Motivation: Transcription factors bind regulatory DNA sequences in a combinatorial manner to modulate gene expression. Deep neural networks (DNNs) can learn the cis-regulatory grammars encoded in regulatory DNA sequences associated with transcription factor binding and chromatin accessibility. Several feature attribution methods have been developed for estimating the predictive importance of individual features (nucleotides or motifs) in any input DNA sequence to its associated output prediction from a DNN model. However, these methods do not reveal higher-order feature interactions encoded by the models.

Results: We present a new method called Deep Feature Interaction Maps (DFIM) to efficiently estimate interactions between all pairs of features in any input DNA sequence. DFIM accurately identifies ground truth motif interactions embedded in simulated regulatory DNA sequences. DFIM identifies synergistic interactions between GATA1 and TAL1 motifs from in vivo TF binding models. DFIM reveals epistatic interactions involving nucleotides flanking the core motif of the Cbf1 TF in yeast from in vitro TF binding models. We also apply DFIM to regulatory sequence models of in vivo chromatin accessibility to reveal interactions between regulatory genetic variants and proximal motifs of target TFs as validated by TF binding quantitative trait loci. Our approach makes significant strides in improving the interpretability of deep learning models for genomics.

Availability: Code is available at https://github.com/kundajelab/dfim.


2021
Vol 12 (1)
Author(s): Gi Fay Mok, Leighton Folkes, Shannon A. Weldon, Eirini Maniou, Victor Martinez-Heredia, ...

Abstract: Somites arising from paraxial mesoderm are a hallmark of the segmented vertebrate body plan. They form sequentially during axis extension and generate musculoskeletal cell lineages. How paraxial mesoderm becomes regionalised along the axis and how this correlates with dynamic changes of chromatin accessibility and the transcriptome remain unknown. Here, we report a spatiotemporal series of ATAC-seq and RNA-seq along the chick embryonic axis. Footprint analysis shows differential coverage of binding sites for several key transcription factors, including CDX2, LEF1 and members of HOX clusters. Associating accessible chromatin with nearby expressed genes identifies cis-regulatory elements (CREs) for TCF15 and MEOX1. We determine their spatiotemporal activity and evolutionary conservation in Xenopus and human. Epigenome silencing of endogenous CREs disrupts TCF15 and MEOX1 gene expression and recapitulates phenotypic abnormalities of anterior–posterior axis extension. Our integrated approach allows dissection of paraxial mesoderm regulatory circuits in vivo and has implications for investigating gene regulatory networks.

