Base-pair resolution detection of transcription factor binding site by deep deconvolutional network

Mapping Intimacies ◽

10.1101/254508 ◽

2018 ◽

Author(s):

Sirajul Salekin ◽

Jianqiu (Michelle) Zhang ◽

Yufei Huang

Keyword(s):

Transcription Factor ◽

Base Pair ◽

De Novo ◽

Learning Algorithm ◽

Transcription Factor Binding ◽

Factor Binding ◽

Deep Learning Algorithm ◽

Single Base Pair ◽

Regulatory Analysis ◽

Nucleotide Resolution

Abstract Motivation: Transcription factors (TFs) bind to the promoter region of a gene to control gene expression. Identifying precise transcription factor binding sites (TFBS) is essential for understanding the detailed mechanisms of TF-mediated gene regulation. However, there is a shortage of computational approaches that can deliver single base pair (bp) resolution prediction of TFBS. Results: In this paper, we propose DeepSNR, a Deep Learning algorithm for predicting transcription factor binding location at Single Nucleotide Resolution de novo from DNA sequence. DeepSNR adopts a novel deconvolutional network (deconvNet) model and is inspired by the similarity to image segmentation by deconvNet. The proposed deconvNet architecture is constructed on top of ‘Deep-Bind’, and we trained the entire model using TF-specific data from ChIP-exonuclease (ChIP-exo) experiments. DeepSNR has been shown to outperform motif-search-based methods on several evaluation metrics. We have also demonstrated the usefulness of DeepSNR in the regulatory analysis of TFBS as well as in improving TFBS prediction specificity using ChIP-seq data. Availability: DeepSNR is available open source in the GitHub repository (https://github.com/sirajulsalekin/DeepSNR).
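To make the segmentation analogy in the abstract above concrete, the following is a minimal sketch of how a DNA sequence and a per-base binding label could be represented. It is not taken from the DeepSNR code: the function names `one_hot` and `site_mask`, the A/C/G/T channel order, and the half-open interval convention are all assumptions made for illustration.

```python
BASES = "ACGT"  # assumed channel order, chosen for this illustration only


def one_hot(sequence):
    """Encode DNA as one 4-element row per base; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


def site_mask(length, start, end):
    """Per-base labels: 1 inside the half-open interval [start, end), 0 elsewhere."""
    return [1 if start <= i < end else 0 for i in range(length)]


# Hypothetical example: a 6 bp sequence whose binding site covers positions 2 and 3.
encoded = one_hot("ACGTNA")
labels = site_mask(len(encoded), 2, 4)
print(labels)  # [0, 0, 1, 1, 0, 0]
```

A model working at single-nucleotide resolution would output one value per position of `labels`, much as an image segmentation network outputs one value per pixel.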


EpiSAFARI: Sensitive detection of valleys in epigenetic signals for enhancing annotations of functional elements

Bioinformatics ◽

10.1093/bioinformatics/btz702 ◽

2019 ◽

Author(s):

Arif Harmanci ◽

Akdes Serin Harmanci ◽

Jyothishmathi Swaminathan ◽

Vidya Gopalakrishnan

Keyword(s):

Transcription Factor ◽

Regulatory Elements ◽

Transcription Factor Binding ◽

Computational Method ◽

Sensitive Detection ◽

Supplementary Information ◽

Chip Sequencing ◽

Factor Binding ◽

Nucleotide Resolution ◽

Systematic Identification

Abstract Motivation: Functional genomics experiments generate genome-wide signal profiles that are dense information sources for annotating regulatory elements. These profiles measure epigenetic activity at nucleotide resolution, and they exhibit distinctive patterns as they fluctuate along the genome. Most notable of these patterns are the valley patterns that are prevalently observed in assays such as ChIP sequencing and bisulfite sequencing. The genomic positions of valleys pinpoint locations of cis-regulatory elements such as enhancers and insulators. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements. Nevertheless, the valleys are not reported by the majority of analysis pipelines. Results: We describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability, and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification valleys detected by EpiSAFARI exhibit high conservation and transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of developmentally associated genes. Differential histone valleys exhibit concordance with differential DNase signal at cell-line-specific valleys. DNA methylation valleys exhibit elevated conservation and high transcription factor binding. Specifically, we observed enriched binding of transcription factors associated with chromatin structure around methyl-valleys. Availability: EpiSAFARI is publicly available at https://github.com/harmancilab/EpiSAFARI. Supplementary information: Supplementary data are available at Bioinformatics online.
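The general idea of smoothing a signal profile and then calling valleys can be sketched as follows. This is an illustrative toy, not EpiSAFARI's algorithm: the abstract does not specify its smoothing method, so a plain centered moving average is swapped in, and the names `moving_average`, `find_valleys`, and the `min_depth` threshold are hypothetical. Technical factors such as mappability and nucleotide content are not modeled here.

```python
def moving_average(signal, window=3):
    """Smooth a signal with a centered moving average (a stand-in smoother)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def find_valleys(signal, min_depth=1.0):
    """Return indices of local minima whose flanking summits both rise
    at least min_depth above the minimum."""
    valleys = []
    for i in range(1, len(signal) - 1):
        if signal[i] < signal[i - 1] and signal[i] <= signal[i + 1]:
            left = i
            while left > 0 and signal[left - 1] >= signal[left]:
                left -= 1  # climb to the summit on the left
            right = i
            while right < len(signal) - 1 and signal[right + 1] >= signal[right]:
                right += 1  # climb to the summit on the right
            if min(signal[left], signal[right]) - signal[i] >= min_depth:
                valleys.append(i)
    return valleys


# Made-up coverage profile with one dip between two summits.
profile = [0, 5, 9, 5, 1, 5, 9, 5, 0]
print(find_valleys(moving_average(profile)))  # [4]
```

Requiring both flanking summits to exceed the minimum by `min_depth` is one simple way to ignore shallow dips caused by noise; a real pipeline would also need a statistical test for significance.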


Correction: De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference

PLoS Computational Biology ◽

10.1371/annotation/a0b541dc-472b-4076-a435-499ce9519335 ◽

2011 ◽

Vol 7 (10) ◽

Cited By ~ 4

Author(s):

Jens Keilwagen ◽

Jan Grau ◽

Ivan A. Paponov ◽

Stefan Posch ◽

Marc Strickert ◽

...

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

De Novo ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Factor Binding ◽

Positional Preference


Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites

Vavilov Journal of Genetics and Breeding ◽

10.18699/vj21.002 ◽

2021 ◽

Vol 25 (1) ◽

pp. 7-17

Author(s):

A. V. Tsukanov ◽

V. G. Levitsky ◽

T. I. Merkulova

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

De Novo ◽

Transcription Factor Binding Sites ◽

Structural Heterogeneity ◽

Transcription Factor Binding ◽

Factor Binding ◽

Model Training ◽

Motif Recognition ◽

Positional Weight

The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Two recently proposed models, BaMM and InMoDe, do account for such dependencies. However, application of these models has usually been limited to comparing their recognition accuracies with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. This pipeline includes the stages of model training, assessing recognition accuracy, scanning ChIP-seq peaks, and classifying them based on the scan results. We applied our pipeline to 22 ChIP-seq datasets of TF FOXA2 and considered PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models allowed a significant increase in the fraction of recognized peaks compared to that for the PWM model alone: the increase was 26.3 %. The BaMM model provided the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the medians of the fraction of peaks containing the predictions of a single model only were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 binding sites (BSs) were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
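The independence assumption of the PWM mentioned in the abstract above can be seen in a small sketch: a window's score is a plain sum of per-position terms, so no term depends on the base at any other position. The code is illustrative only and is not part of MultiDeNA; the helper names, the pseudocount, the uniform background of 0.25, and the aligned sites are all made up for the example, and only the bases A, C, G and T are handled.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2(counts[base] / total / background) for base in BASES})
    return pwm


def score_window(pwm, window):
    """Sum per-position log-odds; each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score_window(pwm, sequence[i:i + width]))
    return best, score_window(pwm, sequence[best:best + width])


toy_sites = ["TGTTTAC", "TGTTTGC", "TATTTAC"]  # made-up aligned sites, not real data
pwm = build_pwm(toy_sites)
position, score = best_hit(pwm, "CCCTGTTTACGGG")
print(position)  # 3
```

Models that capture dependencies between positions replace the per-position terms in `score_window` with terms conditioned on neighboring bases, which is why they can recognize sites that a PWM scores poorly.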


De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis

Nucleic Acids Research ◽

10.1093/nar/gkq217 ◽

2010 ◽

Vol 38 (11) ◽

pp. e126-e126 ◽

Cited By ~ 44

Author(s):

Valentina Boeva ◽

Didier Surdez ◽

Noëlle Guillon ◽

Franck Tirode ◽

Anthony P. Fejes ◽

...

Keyword(s):

Transcription Factor ◽

Data Analysis ◽

Binding Sites ◽

De Novo ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Factor Binding ◽

Motif Identification


412. De Novo Identification of Combinations of Hepatocyte-Specific Transcription Factor Binding Sites Using a Novel Bioinformatics Algorithm Yield Robust Liver-Specific Expression

Molecular Therapy ◽

10.1016/s1525-0016(16)38770-6 ◽

2009 ◽

Vol 17 ◽

pp. S161

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

De Novo ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Specific Expression ◽

Specific Transcription Factor ◽

Factor Binding


Base-pair resolution detection of transcription factor binding site by deep deconvolutional network

Bioinformatics ◽

10.1093/bioinformatics/bty383 ◽

2018 ◽

Vol 34 (20) ◽

pp. 3446-3453 ◽

Cited By ~ 5

Author(s):

Sirajul Salekin ◽

Jianqiu Michelle Zhang ◽

Yufei Huang

Keyword(s):

Transcription Factor ◽

Binding Site ◽

Transcription Factor Binding Site ◽

Base Pair ◽

Transcription Factor Binding ◽

Factor Binding Site ◽

Factor Binding


A deep learning model for predicting transcription factor binding location at single nucleotide resolution

2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI) ◽

10.1109/bhi.2017.7897204 ◽

2017 ◽

Cited By ~ 5

Author(s):

Sirajul Salekin ◽

Jianqiu Michelle Zhang ◽

Yufei Huang

Keyword(s):

Transcription Factor ◽

Deep Learning ◽

Transcription Factor Binding ◽

Learning Model ◽

Single Nucleotide ◽

Factor Binding ◽

Nucleotide Resolution ◽

Deep Learning Model ◽

Single Nucleotide Resolution


De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference

PLoS Computational Biology ◽

10.1371/journal.pcbi.1001070 ◽

2011 ◽

Vol 7 (2) ◽

pp. e1001070 ◽

Cited By ~ 34

Author(s):

Jens Keilwagen ◽

Jan Grau ◽

Ivan A. Paponov ◽

Stefan Posch ◽

Marc Strickert ◽

...

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

De Novo ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Factor Binding ◽

Positional Preference


FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data

10.1101/151274 ◽

2017 ◽

Cited By ~ 17

Author(s):

Daniel Quang ◽

Xiaohui Xie

Keyword(s):

Neural Network ◽

Transcription Factor ◽

Deep Learning ◽

Cell Types ◽

Transcription Factor Binding ◽

Cell Type ◽

Neural Network Models ◽

Factor Binding ◽

Binding Data ◽

Nucleotide Resolution

Abstract: Due to the large numbers of transcription factors (TFs) and cell types, querying binding profiles of all TF/cell type pairs is not experimentally feasible, owing to constraints in time and resources. To address this issue, we developed a convolutional-recurrent neural network model, called FactorNet, to computationally impute the missing binding data. FactorNet trains on binding data from reference cell types to make accurate predictions on testing cell types by leveraging a variety of features, including genomic sequences, genome annotations, gene expression, and single-nucleotide resolution sequential signals, such as DNase I cleavage. To the best of our knowledge, this is the first deep learning method to study the rules governing TF binding at such a fine resolution. With FactorNet, a researcher can perform a single sequencing assay, such as DNase-seq, on a cell type and computationally impute dozens of TF binding profiles. This is an integral step for reconstructing the complex networks underlying gene regulation. While neural networks can be computationally expensive to train, we introduce several novel strategies to significantly reduce the overhead. By visualizing the neural network models, we can interpret how the model predicts binding, which in turn reveals additional insights into regulatory grammar. We also investigate the variables that affect cross-cell type predictive performance to explain why the model performs better on some TF/cell types than others, and offer insights to improve upon this field. Our method ranked among the top four teams in the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge.
