marge: An API for Analysis of Motifs Using HOMER in R

2018 ◽  
Author(s):  
Robert A. Amezquita

Profiling of open chromatin regions through the assay for transposase-accessible chromatin (ATAC) and transcription factor binding via chromatin immunoprecipitation (ChIP) sequencing has increased our ability to resolve patterns of putative transcription factor binding sites at the genome-wide level. Popular tools such as HOMER (http://homer.ucsd.edu/homer/) and MEME (http://meme-suite.org/) have driven forward the analyses of sequence composition, deriving de novo motifs and searching for the enrichment of known motifs. However, their interfaces do not allow for the construction of parallel inquiries of multiple datasets. Furthermore, their results do not conform to formats amenable to ‘tidy’ analyses, presenting a significant bottleneck in motif analysis. Here, I present ‘marge’, a companion ‘R’ package that interfaces with HOMER to facilitate the construction of queries and to tidy results for further downstream analyses.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tejaswi Iyyanki ◽  
Baozhen Zhang ◽  
Qixuan Wang ◽  
Ye Hou ◽  
Qiushi Jin ◽  
...  

Abstract. Muscle-invasive bladder cancers are characterized by their distinct expression of luminal and basal genes, which could be used to predict key clinical features such as disease progression and overall survival. Transcriptionally, FOXA1, GATA3, and PPARG are shown to be essential for luminal subtype-specific gene regulation and subtype switching, while TP63, STAT3, and TFAP2 family members are critical for regulation of basal subtype-specific genes. Despite these advances, the underlying epigenetic mechanisms and 3D chromatin architecture responsible for subtype-specific regulation in bladder cancer remain unknown. Results: We determine the genome-wide transcriptome, enhancer landscape, and transcription factor binding profiles of FOXA1 and GATA3 in luminal and basal subtypes of bladder cancer. Furthermore, we report the first-ever mapping of genome-wide chromatin interactions by Hi-C in both bladder cancer cell lines and primary patient tumors. We show that subtype-specific transcription is accompanied by specific open chromatin and epigenomic marks, at least partially driven by distinct transcription factor binding at distal enhancers of luminal and basal bladder cancers. Finally, we identify a novel clinically relevant transcription factor, Neuronal PAS Domain Protein 2 (NPAS2), in luminal bladder cancers that regulates other subtype-specific genes and influences cancer cell proliferation and migration. Conclusion: In summary, our work identifies unique epigenomic signatures and 3D genome structures in luminal and basal urinary bladder cancers and suggests a novel link between the circadian transcription factor NPAS2 and a clinical bladder cancer subtype.


2018 ◽  
Author(s):  
Mehran Karimzadeh ◽  
Michael M. Hoffman

Abstract. Motivation: Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific. Results: We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 transcription factors (Matthews correlation coefficient > 0.3). Availability: The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 36 transcription factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1455759), and predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Challenge (http://doi.org/10.5281/zenodo.1209308).
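For readers unfamiliar with the metric quoted above, the Matthews correlation coefficient (MCC) can be computed from the four cells of a binary confusion matrix. The sketch below is illustrative only: the function name and the example counts are assumptions made for this note, not values or code from the Virtual ChIP-seq paper.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Return the MCC for a binary confusion matrix.

    By convention, returns 0.0 when any marginal total is zero
    (the coefficient is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for genomic bins labelled bound / unbound.
# Binding sites are rare, so true negatives dominate.
example = matthews_corrcoef(tp=30, fp=20, tn=930, fn=20)
print(f"MCC = {example:.3f}")
```

Because MCC uses all four cells, it stays informative when bound regions are a small minority of the genome, a setting in which plain accuracy would look high even for a predictor that never calls a site bound.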


2019 ◽  
Author(s):  
Arif Harmanci ◽  
Akdes Serin Harmanci ◽  
Jyothishmathi Swaminathan ◽  
Vidya Gopalakrishnan

Abstract. Motivation: Functional genomics experiments generate genomewide signal profiles that are dense information sources for annotating the regulatory elements. These profiles measure epigenetic activity at nucleotide resolution and they exhibit distinctive patterns as they fluctuate along the genome. Most notable of these patterns are the valley patterns that are prevalently observed in assays such as ChIP sequencing and bisulfite sequencing. The genomic positions of valleys pinpoint locations of cis-regulatory elements such as enhancers and insulators. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements. Nevertheless, the valleys are not reported by the majority of the analysis pipelines. Results: We describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability, and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification valleys detected by EpiSAFARI exhibit high conservation and transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of the developmentally associated genes. Differential histone valleys exhibit concordance with differential DNase signal at cell line specific valleys. DNA methylation valleys exhibit elevated conservation and high transcription factor binding. Specifically, we observed enriched binding of transcription factors associated with chromatin structure around methyl-valleys. Availability: EpiSAFARI is publicly available at https://github.com/harmancilab/EpiSAFARI. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Sirajul Salekin ◽  
Jianqiu (Michelle) Zhang ◽  
Yufei Huang

Abstract. Motivation: Transcription factors (TFs) bind to the promoter region of a gene to control gene expression. Identifying precise transcription factor binding sites (TFBS) is essential for understanding the detailed mechanisms of TF-mediated gene regulation. However, there is a shortage of computational approaches that can deliver single base pair (bp) resolution prediction of TFBS. Results: In this paper, we propose DeepSNR, a Deep Learning algorithm for predicting transcription factor binding location at Single Nucleotide Resolution de novo from DNA sequence. DeepSNR adopts a novel deconvolutional network (deconvNet) model and is inspired by the similarity to image segmentation by deconvNet. The proposed deconvNet architecture is constructed on top of ‘Deep-Bind’ and we trained the entire model using TF-specific data from ChIP-exonuclease (ChIP-exo) experiments. DeepSNR has been shown to outperform motif search based methods for several evaluation metrics. We have also demonstrated the usefulness of DeepSNR in the regulatory analysis of TFBS as well as in improving the TFBS prediction specificity using ChIP-seq data. Availability: DeepSNR is available open source in the GitHub repository (https://github.com/sirajulsalekin/DeepSNR).


2020 ◽  
Author(s):  
Jan Grau ◽  
Florian Schmidt ◽  
Marcel H. Schulz

Abstract. Several studies suggested that transcription factor (TF) binding to DNA may be impaired or enhanced by DNA methylation. We present MeDeMo, a toolbox for TF motif analysis that combines information about DNA methylation with models capturing intra-motif dependencies. In a large-scale study using ChIP-seq data for 335 TFs, we identify novel TFs that are affected by DNA methylation. Overall, we find that CpG methylation decreases the likelihood of binding for the majority of TFs. For a considerable subset of TFs, we show that intra-motif dependencies are pivotal for accurately modelling the impact of DNA methylation on TF binding.


2021 ◽  
Vol 25 (1) ◽  
pp. 7-17
Author(s):  
A. V. Tsukanov ◽  
V. G. Levitsky ◽  
T. I. Merkulova

The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Two recently proposed models, BaMM and InMoDe, do account for such dependencies. However, application of these models has usually been limited to comparing their recognition accuracies with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. This pipeline includes stages of model training, assessment of recognition accuracy, scanning of ChIP-seq peaks, and classification of the peaks based on scan results. We applied our pipeline to 22 ChIP-seq datasets of TF FOXA2 and considered PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models allowed a significant increase in the fraction of recognized peaks compared to that for the sole PWM model: the increase was 26.3 %. The BaMM model provided the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the medians of the fraction of peaks containing the predictions of sole models were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
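To make the PWM limitation concrete, the following minimal sketch builds a log-odds PWM from a handful of aligned sites and scans a sequence with it. It is not code from MultiDeNA: the function names, the pseudocount, the uniform background, and the toy sites (which are made up and are not real FOXA2 data) are all assumptions for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned sites.

    Each column is estimated on its own, which is exactly the
    positional-independence assumption of the PWM model.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window, or None if too short."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # The window score is a plain sum over positions: no term couples
        # the nucleotide at one position to the nucleotide at another.
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy aligned sites, invented for this example.
toy_sites = ["TGTTTAC", "TGTTTGC", "TATTTAC", "TGTTTAT"]
toy_pwm = build_pwm(toy_sites)
print(best_hit(toy_pwm, "CCCTGTTTACGGG"))
```

The additive score in `best_hit` is what dependency-aware models relax: they let the contribution of a nucleotide depend on nucleotides at other positions, at the cost of more parameters to estimate.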


2010 ◽  
Vol 38 (11) ◽  
pp. e126-e126 ◽  
Author(s):  
Valentina Boeva ◽  
Didier Surdez ◽  
Noëlle Guillon ◽  
Franck Tirode ◽  
Anthony P. Fejes ◽  
...  
