FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data

bioRxiv, 2017. DOI: 10.1101/151274. Cited by ~17.

Author(s): Daniel Quang, Xiaohui Xie

Keyword(s): Neural Network, Transcription Factor, Deep Learning, Cell Types, Transcription Factor Binding, Cell Type, Neural Network Models, Factor Binding, Binding Data, Nucleotide Resolution

Abstract
Due to the large numbers of transcription factors (TFs) and cell types, querying the binding profiles of all TF/cell type pairs is not experimentally feasible, owing to constraints in time and resources. To address this issue, we developed a convolutional-recurrent neural network model, called FactorNet, to computationally impute the missing binding data. FactorNet trains on binding data from reference cell types to make accurate predictions on testing cell types by leveraging a variety of features, including genomic sequences, genome annotations, gene expression, and single-nucleotide resolution sequential signals, such as DNase I cleavage. To the best of our knowledge, this is the first deep learning method to study the rules governing TF binding at such a fine resolution. With FactorNet, a researcher can perform a single sequencing assay, such as DNase-seq, on a cell type and computationally impute dozens of TF binding profiles. This is an integral step for reconstructing the complex networks underlying gene regulation. While neural networks can be computationally expensive to train, we introduce several novel strategies to significantly reduce the overhead. By visualizing the neural network models, we can interpret how the model predicts binding, which in turn reveals additional insights into regulatory grammar. We also investigate the variables that affect cross-cell type predictive performance to explain why the model performs better on some TF/cell type pairs than on others, and offer insights for further improvement in this field. Our method ranked among the top four teams in the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge.
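
FactorNet's inputs pair genomic sequence with single-nucleotide resolution signals such as DNase I cleavage. A minimal sketch of how such a per-base input matrix could be assembled, assuming four one-hot DNA channels followed by one signal channel per position; the channel order and the name `encode_window` are hypothetical and not taken from the FactorNet code:

```python
from typing import List, Sequence

# Hypothetical channel layout: A, C, G, T one-hot columns, then one signal column.
BASES = "ACGT"


def encode_window(seq: str, signal: Sequence[float]) -> List[List[float]]:
    """Return an L x 5 matrix: one-hot DNA plus a per-nucleotide signal value.

    Ambiguous bases (e.g. N) get an all-zero one-hot part.
    """
    if len(seq) != len(signal):
        raise ValueError("sequence and signal must have the same length")
    rows = []
    for base, value in zip(seq.upper(), signal):
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        rows.append(one_hot + [float(value)])
    return rows


# Usage: a 5-bp window with made-up DNase I cleavage counts at each base.
for row in encode_window("ACGTN", [0, 2, 5, 1, 0]):
    print(row)
```

In a layout like this, convolutional filters sliding along the length axis see the base identity and the cleavage signal at the same position together.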

FactorNet: A deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data

Methods, Vol 166, pp. 40-47, 2019. DOI: 10.1016/j.ymeth.2019.03.020. Cited by ~32.

Author(s): Daniel Quang, Xiaohui Xie

Keyword(s): Transcription Factor, Deep Learning, Transcription Factor Binding, Sequential Data, Cell Type, Specific Transcription Factor, Factor Binding, Learning Framework, Cell Type Specific, Nucleotide Resolution

A deep learning model for predicting transcription factor binding location at single nucleotide resolution

2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), 2017. DOI: 10.1109/bhi.2017.7897204. Cited by ~5.

Author(s): Sirajul Salekin, Jianqiu Michelle Zhang, Yufei Huang

Keyword(s): Transcription Factor, Deep Learning, Transcription Factor Binding, Learning Model, Single Nucleotide, Factor Binding, Nucleotide Resolution, Deep Learning Model, Single Nucleotide Resolution

Learning from mistakes: Accurate prediction of cell type-specific transcription factor binding

2017. DOI: 10.1101/230011. Cited by ~3.

Author(s): Jens Keilwagen, Stefan Posch, Jan Grau

Keyword(s): Transcription Factor, Cell Types, Transcription Factor Binding, Ensemble Prediction, Training Procedure, Cell Type, Binding Motifs, Factor Binding, Cell Type Specific

Computational prediction of cell type-specific, in vivo transcription factor binding sites is still one of the central challenges in regulatory genomics, and a variety of approaches have been proposed for this purpose. Here, we present our approach that earned a shared first rank in the “ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge” in 2017. This approach employs features derived from chromatin accessibility, binding motifs, gene expression, genomic sequence and annotation to train classifiers using a supervised, discriminative learning principle. Two further key aspects of this approach are learning classifier parameters in an iterative training procedure that successively adds negative examples to the training set, and creating an ensemble prediction by averaging over classifiers obtained for different training cell types. In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance for in vivo binding site predictions. We also show that the iterative training procedure and the ensemble prediction are pivotal for the final prediction performance. To make predictions of this approach readily accessible, we predict 682 peak lists for a total of 31 transcription factors in 22 primary cell types and tissues, which are available for download at https://www.synapse.org/#!Synapse:syn11526239, and we demonstrate that these may help to yield biological conclusions. Finally, we provide a user-friendly version of our approach as open-source software at http://jstacs.de/index.php/[email protected]
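
The abstract highlights two ingredients: an iterative training procedure that successively adds negative examples, and an ensemble that averages classifiers trained on different cell types. A minimal sketch of that control flow, assuming a toy single-feature classifier in place of the published models; the names `train_toy`, `iterative_training`, and `ensemble`, and the rule of adding the highest-scoring unused negatives each round, are illustrative assumptions rather than the authors' implementation:

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def train_toy(pos: Sequence[float], neg: Sequence[float]) -> Model:
    """Hypothetical toy classifier on one feature (e.g. an accessibility score).

    Scores are a logistic function centred between the two class means.
    """
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def score(x: float) -> float:
        z = max(-50.0, min(50.0, x - mid))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    return score


def iterative_training(pos: Sequence[float], neg: Sequence[float], init_n: int,
                       add_n: int, rounds: int, seed: int = 0) -> Tuple[Model, List[float]]:
    """Start from a random subset of negatives, then repeatedly add the unused
    negatives the current model scores highest (its worst false positives)."""
    pool = list(neg)
    random.Random(seed).shuffle(pool)
    train_neg, remaining = pool[:init_n], pool[init_n:]
    model = train_toy(pos, train_neg)
    for _ in range(rounds):
        if not remaining:
            break
        remaining.sort(key=model, reverse=True)
        train_neg += remaining[:add_n]
        remaining = remaining[add_n:]
        model = train_toy(pos, train_neg)  # retrain on the enlarged negative set
    return model, train_neg


def ensemble(models: Sequence[Model], x: float) -> float:
    """Average the predictions of classifiers trained on different cell types."""
    return sum(m(x) for m in models) / len(models)


# Usage with made-up feature values for two hypothetical training cell types.
m1, _ = iterative_training([5.0, 6.0, 7.0], [0.0, 0.5, 1.0, 1.5, 2.0, 4.0, 4.5], 2, 2, 2)
m2, _ = iterative_training([4.0, 5.0, 6.0], [0.0, 1.0, 2.0, 3.0, 3.5], 2, 1, 2)
print(ensemble([m1, m2], 5.5))
```

Adding the negatives that the current model scores highest is one common way to focus later rounds on hard non-binding regions; averaging across training cell types then dampens idiosyncrasies of any single cell type.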

Fast decoding cell type–specific transcription factor binding landscape at single-nucleotide resolution

Genome Research, 2021. DOI: 10.1101/gr.269613.120.

Author(s): Hongyang Li, Yuanfang Guan

Keyword(s): Transcription Factor, Transcription Factor Binding, Cell Type, Single Nucleotide, Specific Transcription Factor, Factor Binding, Fast Decoding, Cell Type Specific, Nucleotide Resolution, Single Nucleotide Resolution

An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced transcription factor binding

Genome Biology, Vol 22 (1), 2021. DOI: 10.1186/s13059-020-02218-6.

Author(s): Divyanshi Srivastava, Begüm Aydin, Esteban O. Mazzoni, Shaun Mahony

Keyword(s): Neural Network, Transcription Factor, Transcription Factors, DNA Binding, DNA Sequences, Binding Specificity, Transcription Factor Binding, Cell Type, Factor Binding, Genome Wide

Abstract
Background: Transcription factor (TF) binding specificity is determined via a complex interplay between the transcription factor’s DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with transcription factor binding in a given cell type have been well characterized. For instance, the binding sites for a majority of transcription factors display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the transcription factor itself and thus provide limited insight into how genome-wide TF-DNA binding patterns became established in the first place. To understand the determinants of transcription factor binding specificity, we therefore need to examine how newly activated transcription factors interact with sequence and preexisting chromatin landscapes.
Results: Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of transcription factors that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced transcription factors. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding: incorporating preexisting chromatin features substantially improves our ability to explain the binding specificity of some transcription factors, but not of others. Furthermore, by analyzing site-level predictors, we show that transcription factor binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences.
Conclusions: Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.
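
Bichrom is described as jointly modeling sequence and preexisting chromatin. A minimal sketch of one way a bimodal model can keep the two modalities separable at its output, assuming each sub-network has already reduced its input to a single score and that the two scores are combined linearly before a sigmoid; the class name `BimodalHead` and its weights are hypothetical, and the sub-networks themselves are not shown:

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BimodalHead:
    """Hypothetical output layer combining two modality-specific scores."""
    w_seq: float
    w_chrom: float
    bias: float

    def contributions(self, seq_score: float, chrom_score: float) -> Tuple[float, float]:
        # Per-site contribution of each modality to the pre-sigmoid logit.
        return self.w_seq * seq_score, self.w_chrom * chrom_score

    def predict(self, seq_score: float, chrom_score: float) -> float:
        s, c = self.contributions(seq_score, chrom_score)
        return 1.0 / (1.0 + math.exp(-(s + c + self.bias)))


# Usage: a site with a strong sequence score in previously inaccessible chromatin.
head = BimodalHead(w_seq=1.0, w_chrom=1.0, bias=-2.0)
print(head.contributions(3.0, 0.0))
print(head.predict(3.0, 0.0))
```

Because the logit in this sketch is a sum of per-modality terms, each site's prediction can be split into a sequence part and a chromatin part, which is the kind of site-level decomposition the abstract refers to.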

CAE-CNN: Predicting transcription factor binding site with convolutional autoencoder and convolutional neural network

Expert Systems with Applications, pp. 115404, 2021. DOI: 10.1016/j.eswa.2021.115404.

Author(s): Yongqing Zhang, Shaojie Qiao, Yuanqi Zeng, Dongrui Gao, Nan Han, ...

Keyword(s): Neural Network, Transcription Factor, Convolutional Neural Network, Binding Site, Transcription Factor Binding Site, Transcription Factor Binding, Factor Binding Site, Factor Binding, Convolutional Autoencoder

Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome

2018. DOI: 10.1101/168419. Cited by ~10.

Author(s): Mehran Karimzadeh, Michael M. Hoffman

Keyword(s): Transcription Factor, Transcription Factors, Binding Sites, Cell Types, Transcription Factor Binding, Regulatory Function, Factor Binding, Link Type, Genomic Regions, Factor Sequence

Abstract
Motivation: Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy, since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific.
Results: We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 transcription factors with a Matthews correlation coefficient greater than 0.3.
Availability: The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 36 transcription factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1455759), and predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Challenge (http://doi.org/10.5281/zenodo.1209308).
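
Virtual ChIP-seq reports performance as a Matthews correlation coefficient (MCC). For reference, MCC is computed from all four confusion-matrix counts and stays informative when, as with genome-wide binding labels, unbound regions vastly outnumber bound ones. A minimal sketch; the function name is hypothetical and the counts in the usage line are invented for illustration:

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any row or column of the confusion matrix is empty,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Usage with made-up counts for genomic bins labelled bound / unbound.
mcc = matthews_corrcoef(tp=40, fp=10, tn=900, fn=50)
print(round(mcc, 3), mcc > 0.3)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), so a threshold such as 0.3 marks a moderate positive correlation.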

SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models

2020. DOI: 10.1101/2020.05.13.093997.

Author(s): Yupeng Wang, Rosario B. Jaime-Lara, Abhrarup Roy, Ying Sun, Xinyue Liu, ...

Keyword(s): Neural Network, Deep Learning, DNA Sequences, Cell Types, Learning Models, Cell Type, Coding Sequences, Sequence Features, Cell Type Specific, Different Cell Types

Abstract
We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k = 5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented: a multi-layer perceptron (MLP), a convolutional neural network (CNN) and a recurrent neural network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers, including gkm-SVM and DanQ, in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL can directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue specificity can be accurately identified from their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
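
The abstract describes features built from sequential k-mer fold changes relative to randomly selected non-coding sequences, but does not spell out the exact definition. A minimal sketch under one possible reading: a pseudocount-smoothed fold change is computed for each k-mer between a foreground set (e.g. enhancer sequences) and a background set, and a sequence is then represented by the fold change of each successive k-mer. The function names and the pseudocount are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, List


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count overlapping k-mers across all sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def fold_change_table(foreground: Iterable[str], background: Iterable[str],
                      k: int, pseudocount: float = 1.0) -> Dict[str, float]:
    """Smoothed foreground/background frequency ratio for every observed k-mer."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    return {
        kmer: ((fg[kmer] + pseudocount) / fg_total) / ((bg[kmer] + pseudocount) / bg_total)
        for kmer in vocab
    }


def sequential_features(seq: str, table: Dict[str, float], k: int) -> List[float]:
    """One fold-change value per k-mer position; unseen k-mers get a neutral 1.0."""
    seq = seq.upper()
    return [table.get(seq[i:i + k], 1.0) for i in range(len(seq) - k + 1)]


# Usage with toy sequences (k=2 for readability; the abstract uses k=5, 7, 9 and 11).
table = fold_change_table(["AAAA"], ["CCCC"], k=2)
print(sequential_features("AAC", table, k=2))
```

A vector like this could be fed to an MLP directly or treated as a one-channel sequence by a CNN or RNN.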

DNA methylation alone does not cause most cell-type selective transcription factor binding

Epigenetics & Chromatin, Vol 6 (S1), 2013. DOI: 10.1186/1756-8935-6-s1-p103. Cited by ~3.

Author(s): Matthew T Maurano, Hao Wang, Anthony Shafer, Sam John, John A Stamatoyannopoulos

Keyword(s): DNA Methylation, Transcription Factor, Transcription Factor Binding, Cell Type, Factor Binding

An integrative framework for combining sequence and epigenomic data to predict transcription factor binding sites using deep learning

IEEE/ACM Transactions on Computational Biology and Bioinformatics, pp. 1-1, 2019. DOI: 10.1109/tcbb.2019.2901789. Cited by ~7.

Author(s): Fang Jing, Shaowu Zhang, Zhen Cao, Shihua Zhang

Keyword(s): Transcription Factor, Deep Learning, Binding Sites, Transcription Factor Binding Sites, Transcription Factor Binding, Factor Binding, Integrative Framework
