Automated Incorporation of Pairwise Dependency in Transcription Factor Binding Site Prediction Using Dinucleotide Weight Tensors

2016 ◽  
Author(s):  
Saeed Omidi ◽  
Mihaela Zavolan ◽  
Mikhail Pachkov ◽  
Jeremie Breda ◽  
Severin Berger ◽  
...  

Abstract

Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult for non-experts to use in practice.

Here we present a new regulatory motif model, called dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq datasets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites.

We make a suite of DWT tools available at dwt.unibas.ch, which allow users to automatically perform ‘motif finding’, i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT ‘dilogo’ motifs.

Author Summary

Gene regulatory networks are ultimately encoded in constellations of short binding sites in the DNA and RNA that are recognized by regulatory factors such as transcription factors (TFs). For several decades, computational analysis of regulatory networks has relied on a model of TF sequence-specificity, the position-specific weight matrix (PSWM), that assumes different positions in a binding site contribute independently to the total binding energy of the TF. However, in recent years evidence has been accumulating that, at least for some TFs, this assumption does not hold. Here we present a new model for the sequence-specificity of TFs, the dinucleotide weight tensor (DWT), that takes arbitrary pairwise dependencies between positions in binding sites into account, and show that it consistently outperforms PSWMs on high-throughput datasets on TF binding. Moreover, in contrast to previous approaches, DWTs are directly derived from first principles within a Bayesian framework, and contain no tunable parameters. This allows them to be easily applied in practice, and we make a suite of tools available for computational analysis with DWTs.
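
To make the independence assumption concrete, the following Python sketch builds a log-odds PSWM from aligned sites, scores a sequence window as a sum of per-position weights, and computes the mutual information between two site positions as one simple indicator of pairwise dependency. This is an illustrative sketch only: the function names, the pseudocount of 0.5, the uniform background of 0.25, and the toy sites are all assumptions, and mutual information is a stand-in statistic, not the Bayesian DWT score derived in the paper.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pswm(sites, pseudocount=0.5, background=0.25):
    """Return one {base: log2-odds weight} dict per site position.

    pseudocount and the uniform background are illustrative assumptions.
    """
    pswm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pswm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pswm


def score_site(pswm, site):
    """PSWM score: positions contribute independently, so weights simply add."""
    return sum(column[base] for column, base in zip(pswm, site))


def best_window(pswm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pswm)
    return max(
        (score_site(pswm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


def mutual_information(sites, i, j):
    """Mutual information (bits) between positions i and j of aligned sites.

    A value above zero hints at a dependency that a PSWM cannot represent.
    """
    n = len(sites)
    pair = Counter((s[i], s[j]) for s in sites)
    left = Counter(s[i] for s in sites)
    right = Counter(s[j] for s in sites)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in pair.items()
    )


# Toy aligned sites (hypothetical): positions 0 and 3 always co-vary.
toy_sites = ["ACGA", "ACGA", "TCGT", "TCGT"]
toy_pswm = build_pswm(toy_sites)
```

In the toy sites, the first and last positions are perfectly correlated, which a PSWM cannot distinguish from two independent positions that each prefer A or T equally; a model with pairwise terms can.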

Biosystems ◽  
2009 ◽  
Vol 96 (2) ◽  
pp. 185-193 ◽  
Author(s):  
Max B. Cooper ◽  
Matthew Loose ◽  
John F.Y. Brookfield

2018 ◽  
Vol 15 (138) ◽  
pp. 20170809 ◽  
Author(s):  
Zhipeng Wang ◽  
Davit A. Potoyan ◽  
Peter G. Wolynes

Gene regulatory networks must relay information from extracellular signals to downstream genes in an efficient, timely and coherent manner. Many complex functional tasks, such as the immune response, require system-wide broadcasting of information not to one but to many genes carrying out distinct functions, whose dynamical binding and unbinding characteristics are widely distributed. In such broadcasting networks, the intended target sites are also often dwarfed in number by the even more numerous non-functional binding sites. Taking the genetic regulatory network of NF-κB as an exemplary system, we explore the impact of having numerous distributed sites on the stochastic dynamics of oscillatory broadcasting genetic networks, pointing out how resonances in binding cycles control the network's specificity and performance. We also show that active kinetic regulation of binding and unbinding through molecular stripping of DNA-bound transcription factors can lead to a higher coherence of gene co-expression and synchronous clearance.


2006 ◽  
Vol 27 (2) ◽  
pp. 141-155 ◽  
Author(s):  
Anders Stegmann ◽  
Morten Hansen ◽  
Yulan Wang ◽  
Janus B. Larsen ◽  
Leif R. Lund ◽  
...  

DNA-binding transcription factors bind to promoters that carry their binding sites. Transcription factors therefore function as nodes in gene regulatory networks. In the present work we used a bioinformatic approach to search for transcription factors that might function as nodes in gene regulatory networks during the differentiation of the small intestinal epithelial cell. In addition we have searched for connections between transcription factors and the villus metabolome. Transcriptome data were generated from mouse small intestinal villus, crypt, and fetal intestinal epithelial cells. Metabolome data were generated from crypt and villus cells. Our results show that genes that are upregulated during fetal to adult and crypt to villus differentiation have an overrepresentation of potential hepatocyte nuclear factor (HNF)-4 binding sites in their promoters. Moreover, metabolome analyses by magic angle spinning 1H nuclear magnetic resonance spectroscopy showed that the villus epithelial cells contain higher concentrations of lipid carbon chains than the crypt cells. These findings suggest a model where the HNF-4 transcription factor influences the villus metabolome by regulating genes that are involved in lipid metabolism. Our approach also identifies transcription factors of importance for crypt functions such as DNA replication (E2F) and stem cell maintenance (c-Myc).


2020 ◽  
Vol 36 (16) ◽  
pp. 4532-4534
Author(s):  
Joselyn Chávez ◽  
Carmina Barberena-Jonas ◽  
Jesus E Sotelo-Fonseca ◽  
José Alquicira-Hernández ◽  
Heladia Salgado ◽  
...  

Abstract

Summary: RegulonDB has collected, harmonized and centralized data from hundreds of experiments for nearly two decades and is considered a point of reference for transcriptional regulation in Escherichia coli K12. Here, we present the regutools R package to facilitate programmatic access to RegulonDB data in computational biology. regutools gives researchers the possibility of writing reproducible workflows with automated queries to RegulonDB. The regutools package serves as a bridge between RegulonDB data and the Bioconductor ecosystem by reusing the data structures and statistical methods powered by other Bioconductor packages. We demonstrate the integration of regutools with Bioconductor by analyzing transcription factor DNA binding sites and transcriptional regulatory networks from RegulonDB. We anticipate that regutools will serve as a useful building block in our progress to further our understanding of gene regulatory networks.

Availability and implementation: regutools is an R package available through Bioconductor at bioconductor.org/packages/regutools.


2009 ◽  
Vol 38 (1) ◽  
pp. 1-6 ◽  
Author(s):  
J. S. Torday ◽  
V. K. Rehan

In the postgenomic era, we need an algorithm to readily translate genes into physiologic principles. The failure to advance biomedicine is due to the false hope raised in the wake of the Human Genome Project (HGP) by the promise of systems biology as a ready means of reconstructing physiology from genes. Like the atom in physics, the cell, not the gene, is the smallest completely functional unit of biology. Trying to reassemble gene regulatory networks without accounting for this fundamental feature of evolution will result in a genomic atlas, but not an algorithm for functional genomics. For example, the evolution of the lung can be “deconvoluted” by applying cell-cell communication mechanisms to all aspects of lung biology: development, homeostasis, and regeneration/repair. Gene regulatory networks common to these processes predict ontogeny, phylogeny, and the disease-related consequences of failed signaling. This algorithm elucidates characteristics of vertebrate physiology as a cascade of emergent and contingent cellular adaptational responses. By reducing complex physiological traits to gene regulatory networks and arranging them hierarchically in a self-organizing map, like the periodic table of elements in physics, the first principles of physiology will emerge.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yesid Cuesta-Astroz ◽  
Guilherme Gischkow Rucatti ◽  
Leandro Murgas ◽  
Carol D. SanMartín ◽  
Mario Sanhueza ◽  
...  

Gene Regulatory Networks (GRNs) allow the study of regulation of gene expression of whole genomes. Among the most relevant advantages of using networks to depict this key process are the visual representation of large amounts of information and the application of graph theory to generate new knowledge. Nonetheless, despite the many uses of GRNs, it is still difficult and expensive to assign Transcription Factors (TFs) to the regulation of specific genes. ChIP-Seq allows the determination of TF Binding Sites (TFBSs) over whole genomes, but it is still an expensive technique that can only be applied to one TF at a time and requires replicates to reduce its noise. Once TFBSs are determined, the assignment of each TF and its binding sites to the regulation of specific genes is not trivial, and it is often performed by carrying out site-specific experiments that are unfeasible to perform in all possible binding sites. Here, we addressed these issues with a two-step methodology using Drosophila melanogaster as a case study. In the first step, our protocol gathers all TFBSs determined with ChIP-Seq experiments available at ENCODE and FlyBase, and each TFBS is then used to assign TFs to the regulation of likely target genes based on the proximity of the TFBS to the transcription start site of all genes. In the second step, to select the most likely regulatory TF from those previously assigned to each gene, we employ GENIE3, a random forest-based method, and more than 9,000 RNA-seq experiments from D. melanogaster. We then employed known TF protein-protein interactions to estimate the feasibility of regulatory events in our filtered networks. Finally, we show that known interactions between co-regulatory TFs of each gene increase after the second step of our approach, and with them the consistency of the TF-gene assignment. We also employed our methodology to create a network centered on the Drosophila melanogaster gene Hr96 to demonstrate the role of this transcription factor in mitochondrial gene regulation.
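
The proximity-based assignment in the first step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `assign_tfs_to_genes`, the tuple layouts, the 2,000 bp window, and the example coordinates are all hypothetical, and the abstract does not specify the distance cutoff or how strand is handled.

```python
from bisect import bisect_left
from collections import defaultdict


def assign_tfs_to_genes(binding_sites, genes, max_distance=2000):
    """Assign each TF to genes whose TSS lies within max_distance of a site.

    binding_sites: iterable of (tf_name, chromosome, position)
    genes: iterable of (gene_name, chromosome, tss_position)
    max_distance: window in bp (an assumed value, not taken from the paper)
    Returns {gene_name: set of tf_names}.
    """
    # Index transcription start sites per chromosome, sorted by position.
    tss_by_chrom = defaultdict(list)
    for gene, chrom, tss in genes:
        tss_by_chrom[chrom].append((tss, gene))
    for entries in tss_by_chrom.values():
        entries.sort()

    assignments = defaultdict(set)
    for tf, chrom, position in binding_sites:
        entries = tss_by_chrom.get(chrom, [])
        # First TSS at or after the left edge of the window.
        start = bisect_left(entries, (position - max_distance, ""))
        for tss, gene in entries[start:]:
            if tss > position + max_distance:
                break
            assignments[gene].add(tf)
    return dict(assignments)


# Hypothetical coordinates, for illustration only.
example_genes = [("geneA", "2L", 1000), ("geneB", "2L", 10000), ("geneC", "X", 500)]
example_sites = [
    ("Hr96", "2L", 1500),
    ("Hr96", "2L", 9000),
    ("TF2", "X", 5000),
    ("TF2", "2L", 2900),
]
example_result = assign_tfs_to_genes(example_sites, example_genes)
```

The candidate TF lists produced this way are deliberately permissive; in the approach described above, the expression-based second step is what narrows them down.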

