A map of direct TF-DNA interactions in the human genome

Mapping Intimacies ◽

10.1101/394205 ◽

2018 ◽

Cited By ~ 2

Author(s):

Marius Gheorghe ◽

Geir Kjetil Sandve ◽

Aziz Khan ◽

Jeanne Chèneby ◽

Benoit Ballester ◽

...

Keyword(s):

Human Genome ◽

Protein Interaction Data ◽

Data Sets ◽

Dna Interactions ◽

Regulatory Modules ◽

Protein Protein Interaction ◽

Protein Binding Microarray ◽

Bona Fide ◽

Genomic Regions

ABSTRACT Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF-DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF-DNA interactions within the ChIP-seq peaks remains challenging. We developed a dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF-DNA interactions. Our work culminated with predicted interactions covering >4% of the human genome, obtained by uniformly processing 1,983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were a posteriori assessed using protein binding microarray and ChIP-exo data, and were predominantly found in high quality ChIP-seq peaks. The set of predicted direct TF-DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. Our predictions derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. Finally, we provide this collection of direct TF-DNA interactions and cis-regulatory modules in the human genome through the UniBind web-interface (http://unibind.uio.no).


A map of direct TF–DNA interactions in the human genome

Nucleic Acids Research ◽

10.1093/nar/gky1210 ◽

2018 ◽

Vol 47 (4) ◽

pp. e21-e21 ◽

Cited By ~ 19

Author(s):

Marius Gheorghe ◽

Geir Kjetil Sandve ◽

Aziz Khan ◽

Jeanne Chèneby ◽

Benoit Ballester ◽

...

Keyword(s):

Human Genome ◽

Protein Interaction Data ◽

Data Sets ◽

Dna Interactions ◽

Regulatory Modules ◽

Protein Protein Interaction ◽

Protein Binding Microarray ◽

Bona Fide ◽

Genomic Regions

Abstract Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF–DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF–DNA interactions within the ChIP-seq peaks remains challenging. We developed a dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF–DNA interactions. Our work culminated with predicted interactions covering >2% of the human genome, obtained by uniformly processing 1983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were a posteriori assessed using protein binding microarray and ChIP-exo data, and were predominantly found in high quality ChIP-seq peaks. The set of predicted direct TF–DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. Our predictions derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. We provide this collection of direct TF–DNA interactions and cis-regulatory modules through the UniBind web-interface (http://unibind.uio.no).


Analysis and application of large-scale protein-protein interaction data sets

Chinese Science Bulletin ◽

10.1007/bf03183732 ◽

2005 ◽

Vol 50 (20) ◽

pp. 2267-2272 ◽

Cited By ~ 1

Author(s):

Jingchun Sun ◽

Jinlin Xu ◽

Yixue Li ◽

Tieliu Shi

Keyword(s):

Protein Interaction ◽

Large Scale ◽

Protein Interaction Data ◽

Data Sets ◽

Interaction Data ◽

Protein Protein Interaction


UniBind: maps of high-confidence direct TF-DNA interactions across nine species

10.1101/2020.11.17.384578 ◽

2020 ◽

Author(s):

Rafael Riudavets Puig ◽

Paul Boddie ◽

Aziz Khan ◽

Jaime Abraham Castro-Mondragon ◽

Anthony Mathelier

Keyword(s):

Binding Sites ◽

R Package ◽

Dna Interactions ◽

High Confidence ◽

High Quality ◽

Enhancer Activity ◽

Regulatory Regions ◽

Regulatory Modules ◽

Experimental Approaches ◽

Genomic Regions

ABSTRACT Transcription factors (TFs) bind specifically to TF binding sites (TFBSs) at cis-regulatory regions to control transcription. Hence, it is critical to locate these TF-DNA interactions to understand transcriptional regulation. The availability of datasets generated by chromatin immunoprecipitation followed by sequencing (ChIP-seq) empowers our efforts to predict the specific locations of TFBSs with greater confidence than previously possible by fusing computational and experimental approaches. In this work, we processed ~10,000 public ChIP-seq datasets from nine species to provide high-quality TFBS predictions. After quality control, it culminated with the prediction of ~44 million TFBSs with experimental and computational evidence for direct TF-DNA interactions for 640 TFs in 1,101 cell lines and tissues. These TFBSs were used to predict >183,000 cis-regulatory modules representing clusters of binding events in the corresponding genomes. The high-quality of the TFBSs was reinforced by their evolutionary conservation, enrichment at active cis-regulatory regions, and capacity to predict combinatorial binding of TFs. Further, we confirmed that the cell type and tissue specificity of enhancer activity was correlated with the number of TFs with binding sites predicted in these regions. All the data is provided to the community through the UniBind database that can be accessed through its web-interface (https://unibind.uio.no/), a dedicated RESTful API, and as genomic tracks. Finally, we provide an enrichment tool, available as a web-service and an R package, for users to find TFs with enriched TFBSs in a set of provided genomic regions. UniBind is the first resource of its kind, providing the largest collection of high-confidence direct TF-DNA interactions in nine species.


Genomic and bioinformatics tools to understand the biology of signal transducers and activators of transcription

Hormone Molecular Biology and Clinical Investigation ◽

10.1515/hmbci-2012-0012 ◽

2012 ◽

Vol 10 (1) ◽

Author(s):

Keunsoo Kang ◽

Lothar Hennighausen

Keyword(s):

Regulatory Networks ◽

Cell Types ◽

Regulatory Sequences ◽

Data Sets ◽

Signal Transducers ◽

Genome Wide ◽

Bona Fide ◽

Biochemical Signals ◽

Different Cell Types

Abstract The signal transducer and activator of transcription (STAT) family is activated by cytokines and conveys biochemical signals to the genome through binding to specific regulatory sequences, called IFN-γ-activated sequence (GAS) motifs. As common GAS motifs (TTCnnnGAA) contain only six conserved nucleotides, the mammalian genome harbors hundreds of thousands of copies of this sequence. However, it is not possible to predict which specific GAS motifs bind to STATs and are of functional significance. Here, we apply several layers of statistical, bioinformatics and experimental analyses to narrow down the number of GAS sites that might be of biological relevance. In particular, we determined the number of bona fide GAS motifs by utilizing publicly available genome-wide STAT5 ChIP-seq data sets. Less than 10% of GAS motifs within the mouse genome are recognized by STAT5 in vivo and only a small portion of them are shared across different cell types. However, even bona fide STAT5 binding did not predict that the respective gene was under cytokine-STAT control. Therefore, additional bioinformatics, genomic and epigenetic parameters, such as patterns of histone modifications, are required to more reliably predict the behavior of cytokine-STAT regulatory networks.
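
To illustrate why a motif with only six fixed positions recurs so often, the following is a minimal sketch in Python that scans a DNA string for the TTCnnnGAA consensus quoted in the abstract. It is not the authors' pipeline: the function names `find_gas_sites` and `expected_gas_count`, the toy sequence, and the uniform base-composition model are all assumptions made for illustration. Because TTCnnnGAA is its own reverse complement, scanning one strand is enough.

```python
import re

# GAS consensus from the abstract: TTC, any three bases, then GAA.
# The lookahead makes the scan report overlapping matches as well.
GAS_PATTERN = re.compile(r"(?=(TTC[ACGT]{3}GAA))")


def find_gas_sites(sequence):
    """Return (0-based start, matched 9-mer) for each consensus match."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in GAS_PATTERN.finditer(seq)]


def expected_gas_count(genome_length):
    """Expected number of matches if bases were uniform and independent.

    This null model is an assumption for illustration only; real genomes
    have skewed base composition. Six fixed positions give a per-position
    match probability of (1/4)**6 = 1/4096.
    """
    return genome_length / 4 ** 6


# Hypothetical toy sequence containing two consensus matches.
example = "ggTTCCTGGAAacgtTTCAAAGAAtt"
sites = find_gas_sites(example)
print(sites)
print(expected_gas_count(4096))
```

Under this simplified model, a genome of a few billion base pairs would be expected to contain several hundred thousand matches, which is in line with the abstract's "hundreds of thousands of copies". A consensus scan of this kind only lists candidate sites; as the abstract notes, it cannot tell which of them are bound in vivo.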


Categorizing Biases in High-Confidence High-Throughput Protein-Protein Interaction Data Sets

Molecular & Cellular Proteomics ◽

10.1074/mcp.m111.012500 ◽

2011 ◽

Vol 10 (12) ◽

pp. M111.012500 ◽

Cited By ~ 19

Author(s):

Xueping Yu ◽

Joseph Ivanic ◽

Vesna Memišević ◽

Anders Wallqvist ◽

Jaques Reifman

Keyword(s):

High Throughput ◽

Protein Interaction ◽

Protein Interaction Data ◽

Data Sets ◽

Interaction Data ◽

High Confidence ◽

Protein Protein Interaction


A ChIP-exo screen of 887 Protein Capture Reagents Program transcription factor antibodies in human cells

Genome Research ◽

10.1101/gr.275472.121 ◽

2021 ◽

Vol 31 (9) ◽

pp. 1663-1679

Author(s):

William K.M. Lai ◽

Luca Mariani ◽

Gerson Rothschild ◽

Edwin R. Smith ◽

Bryan J. Venters ◽

...

Keyword(s):

Super Resolution ◽

Cell Types ◽

Data Sets ◽

Protein Capture ◽

Protein Binding Microarray ◽

Cognate Antigen ◽

Specific Proteins ◽

Chromatin Proteins ◽

Genomic Regions ◽

Super Resolution Microscopy

Antibodies offer a powerful means to interrogate specific proteins in a complex milieu. However, antibody availability and reliability can be problematic, whereas epitope tagging can be impractical in many cases. To address these limitations, the Protein Capture Reagents Program (PCRP) generated over a thousand renewable monoclonal antibodies (mAbs) against human presumptive chromatin proteins. However, these reagents have not been widely field-tested. We therefore performed a screen to test their ability to enrich genomic regions via chromatin immunoprecipitation (ChIP) and a variety of orthogonal assays. Eight hundred eighty-seven unique antibodies against 681 unique human transcription factors (TFs) were assayed by ultra-high-resolution ChIP-exo/seq, generating approximately 1200 ChIP-exo data sets, primarily in a single pass in one cell type (K562). Subsets of PCRP mAbs were further tested in ChIP-seq, CUT&RUN, STORM super-resolution microscopy, immunoblots, and protein binding microarray (PBM) experiments. About 5% of the tested antibodies displayed high-confidence target (i.e., cognate antigen) enrichment across at least one assay and are strong candidates for additional validation. An additional 34% produced ChIP-exo data that were distinct from background and thus warrant further testing. The remaining 61% were not substantially different from background, and likely require consideration of a much broader survey of cell types and/or assay optimizations. We show and discuss the metrics and challenges to antibody validation in chromatin-based assays.


Mapping Plant Interactomes Using Literature Curated and Predicted Protein–Protein Interaction Data Sets

The Plant Cell ◽

10.1105/tpc.109.072736 ◽

2010 ◽

Vol 22 (4) ◽

pp. 997-1005 ◽

Cited By ~ 17

Author(s):

KiYoung Lee ◽

David Thorneycroft ◽

Premanand Achuthan ◽

Henning Hermjakob ◽

Trey Ideker

Keyword(s):

Protein Interaction ◽

Protein Interaction Data ◽

Data Sets ◽

Interaction Data ◽

Protein Protein Interaction


No evidence of paralogous loci or new bona fide microRNAs in telomere to telomere (T2T) genomic data

10.1101/2021.12.09.471935 ◽

2021 ◽

Author(s):

Arun H. Patil ◽

Marc K. Halushka ◽

Bastian K. Fromm

Keyword(s):

Human Genome ◽

Genome Project ◽

Segmental Duplications ◽

Base Pairs ◽

Repeat Elements ◽

Satellite Sequences ◽

Bona Fide ◽

Additional Base ◽

Genomic Regions ◽

Unmapped Reads

The telomere to telomere (T2T) genome project discovered and mapped ~240 million additional base pairs of primarily telomeric and centromeric reads. Much of this sequence was comprised of satellite sequences and large segmental duplications. We evaluated the extent to which human bona fide microRNAs (miRNAs) may be found in additional paralogous genomic loci or if previously undescribed microRNAs are present in these newly sequenced regions of the human genome. New genomic regions of the T2T project spanning ~240 million bp of sequence were obtained and evaluated by blastn for the human miRNAs contained in MirGeneDB2.0 (N=556) and miRBase (N = 1917) along with all species of MirGeneDB2.0 miRNAs (N=10,899). Additionally, bowtie was used to compare unmapped reads from >4,000 primary cell samples to the new T2T sequence. Based on sequence and structure, no bona fide miRNAs were identified. Ninety-seven miRNAs of questionable authenticity (frequently known repeat elements) were identified from the miRBase dataset across the newly described regions of the human genome. These 97 represent only 51 miRNA families due to paralogy of highly similar miRNAs such as 24 members of the hsa-mir-548 family. Altogether, this data strongly supports our having identified widely expressed bona fide miRNAs in the human genome and move us further toward the completion of human miRNA discovery.


Transcription factor regulatory modules provide the molecular mechanisms for functional redundancy observed among transcription factors in yeast

BMC Bioinformatics ◽

10.1186/s12859-019-3212-8 ◽

2019 ◽

Vol 20 (S23) ◽

Author(s):

Tzu-Hsien Yang

Keyword(s):

Transcription Factor ◽

Protein Interaction ◽

Molecular Mechanisms ◽

Target Genes ◽

Regulatory Gene ◽

Biological Significance ◽

Functional Redundancy ◽

Protein Interaction Data ◽

Regulatory Modules ◽

Protein Protein Interaction

Abstract Background: Current technologies for understanding the transcriptional reprogramming in cells include the transcription factor (TF) chromatin immunoprecipitation (ChIP) experiments and the TF knockout experiments. The ChIP experiments show the binding targets of TFs against which the antibody directs while the knockout techniques find the regulatory gene targets of the knocked-out TFs. However, it was shown that these two complementary results contain few common targets. Researchers have used the concept of TF functional redundancy to explain the low overlap between these two techniques. But the detailed molecular mechanisms behind TF functional redundancy remain unknown. Without knowing the possible molecular mechanisms, it is hard for biologists to fully unravel the cause of TF functional redundancy. Results: To mine out the molecular mechanisms, a novel algorithm to extract TF regulatory modules that help explain the observed TF functional redundancy effect was devised and proposed in this research. The method first searched for candidate TF sets from the TF binding data. Then based on these candidate sets the method utilized the modified Steiner Tree construction algorithm to construct the possible TF regulatory modules from protein-protein interaction data and finally filtered out the noise-induced results by using confidence tests. The mined-out regulatory modules were shown to correlate to the concept of functional redundancy and provided testable hypotheses of the molecular mechanisms behind functional redundancy. And the biological significance of the mined-out results was demonstrated in three different biological aspects: ontology enrichment, protein interaction prevalence and expression coherence. About 23.5% of the mined-out TF regulatory modules were literature-verified. Finally, the biological applicability of the proposed method was shown in one detailed example of a verified TF regulatory module for pheromone response and filamentous growth in yeast. Conclusion: In this research, a novel method that mined out the potential TF regulatory modules which elucidate the functional redundancy observed among TFs is proposed. The extracted TF regulatory modules not only correlate the molecular mechanisms to the observed functional redundancy among TFs, but also show biological significance in inferring TF functional binding target genes. The results provide testable hypotheses for biologists to further design subsequent research and experiments.


Confidentiality of Statistical Records: A Threat-Monitoring Scheme for On Line Dialogue

Methods of Information in Medicine ◽

10.1055/s-0038-1635718 ◽

1976 ◽

Vol 15 (01) ◽

pp. 36-42 ◽

Cited By ~ 14

Author(s):

J. Schlörer

Keyword(s):

Statistical Data ◽

Cost Benefit ◽

Data Bank ◽

High Ratio ◽

Point Of View ◽

Data Sets ◽

Monitoring Scheme ◽

Access Controls ◽

On Line ◽

Bona Fide

From a statistical data bank containing only anonymous records, the records sometimes may be identified and then retrieved, as personal records, by on line dialogue. The risk mainly applies to statistical data sets representing populations, or samples with a high ratio n/N. On the other hand, access controls are unsatisfactory as a general means of protection for statistical data banks, which should be open to large user communities. A threat monitoring scheme is proposed, which will largely block the techniques for retrieval of complete records. If combined with additional measures (e.g., slight modifications of output), it may be expected to render, from a cost-benefit point of view, intrusion attempts by dialogue valueless, if not absolutely impossible. The bona fide user has to pay by some loss of information, but considerable flexibility in evaluation is retained. The proposal of controlled classification included in the scheme may also be useful for off line dialogue systems.
