StereoGene: Rapid Estimation of Genomewide Correlation of Continuous or Interval Feature Data

2016
Author(s): Elena D. Stavrovskaya, Tejasvi Niranjan, Elana J. Fertig, Sarah J. Wheelan, Alexander Favorov, ...

Abstract
Motivation: Genomic features with similar genomewide distributions are generally hypothesized to be functionally related; for example, co-localization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms that perform spatial, genomewide correlation among genomic features are required.
Results: Here, we propose a method, StereoGene, that rapidly estimates genomewide correlation between pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene correlates continuous data directly, avoiding data binarization and the resulting loss of information. Correlations are computed among neighboring genomic positions using kernel correlation. Because the correlation is represented as a function of genome position, StereoGene also outputs a local correlation track as part of the analysis. StereoGene accounts for confounders such as input DNA by means of partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across the developmental trajectories of several tissue types that are consistent with known biology, and we find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses illustrate the broad applicability of StereoGene to regulatory genomics.
Availability: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/
Supplementary information: Supplementary data are available online.
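
The kernel-correlation idea can be illustrated with a minimal Python sketch. It assumes a Gaussian kernel and plain Pearson correlation of the kernel-smoothed tracks, and it uses the textbook first-order partial-correlation formula to stand in for the input-DNA adjustment; StereoGene's own kernel, normalization and implementation may differ, and the function names and parameters (`sigma`, `half_width`) are hypothetical. What it shows is why neighboring positions matter: two tracks whose peaks are offset by one position have a negative position-by-position correlation but a strongly positive kernel correlation.

```python
import math
from typing import List, Sequence


def gaussian_kernel(sigma: float, half_width: int) -> List[float]:
    """Discrete Gaussian weights for offsets -half_width..half_width, summing to 1."""
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half_width, half_width + 1)]
    total = sum(w)
    return [v / total for v in w]


def smooth(track: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve a track with a symmetric kernel; positions beyond the ends count as 0."""
    h = len(kernel) // 2
    n = len(track)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            p = i + j - h
            if 0 <= p < n:
                acc += w * track[p]
        out.append(acc)
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary position-by-position Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kernel_correlation(x: Sequence[float], y: Sequence[float],
                       sigma: float = 2.0, half_width: int = 6) -> float:
    """Assumed form (hypothetical): Pearson correlation of the Gaussian-smoothed tracks."""
    k = gaussian_kernel(sigma, half_width)
    return pearson(smooth(x, k), smooth(y, k))


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y controlling for z (e.g. input DNA)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

For example, with `x` peaking at positions 10 and 30 and `y` at 11 and 31 on a 40-position track, `pearson(x, y)` is slightly negative while `kernel_correlation(x, y)` is strongly positive. To adjust for a confounding track `z`, the three pairwise kernel correlations would be passed to `partial_correlation`. A wider kernel (larger `sigma`) trades positional resolution for tolerance to small offsets.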

2020, Vol 36 (11), pp. 3605-3606
Author(s): Pumin Li, Qi Xu, Xu Hua, Zhongwei Xie, Jie Li, ...

Abstract
Summary: The R/Bioconductor package primirTSS is a fast and convenient tool that implements an analytical method for identifying the transcription start sites of microRNAs by integrating ChIP-seq data for H3K4me3 and Pol II. It further ensures precision by employing conservation scores and sequence features. The tool also performs well when H3K4me3 or Pol II ChIP-seq data alone are used as input, which is convenient for applications in which multiple datasets are hard to acquire. This flexible package provides both an R programming interface and a graphical web interface.
Availability and implementation: primirTSS is available at http://bioconductor.org/packages/primirTSS. The package documentation, including an accompanying tutorial, is available at https://bioconductor.org/packages/release/bioc/vignettes/primirTSS/inst/doc/primirTSS.html.
Supplementary information: Supplementary data are available at Bioinformatics online.


2016
Author(s): Francisco Avila Cobos, Jasper Anckaert, Pieter-Jan Volders, Dries Rombaut, Jo Vandesompele, ...

Abstract
Summary: Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing them as independent transcriptional units can be a challenging task. The Zipper plot is an application that enables users to interrogate putative transcription start sites (TSSs) in relation to various features that are indicative of transcriptional activity. These features are obtained from publicly available datasets, including CAGE sequencing (CAGE-seq), ChIP sequencing (ChIP-seq) for histone marks, and DNase sequencing (DNase-seq). The Zipper plot application requires three inputs (chromosome, genomic coordinate of the TSS in hg19, and strand) and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot.
Availability and implementation: The Zipper plot is implemented in the statistical programming language R and is freely available online.
Supplementary information: Supplementary Methods are available online.


Author(s): Mazdak Salavati, Alex Caulton, Richard Clark, Iveta Gazova, Timothy P. L. Smith, ...

The overall aim of the Ovine FAANG project is to provide a comprehensive annotation of the new, highly contiguous sheep reference genome sequence (Oar rambouillet v1.0). Mapping of transcription start sites (TSS) is a key first step in understanding transcript regulation and diversity. Using 56 tissue samples collected from the reference ewe Benz2616, we performed a global analysis of TSS and TSS-enhancer clusters using Cap Analysis Gene Expression (CAGE) sequencing. CAGE measures RNA expression by 5’ cap-trapping and was specifically designed to characterize TSS within promoters at single-nucleotide resolution. We adapted an analysis pipeline that uses TagDust2 for clean-up and trimming, Bowtie2 for mapping, CAGEfightR for clustering and the Integrative Genomics Viewer (IGV) for visualization. Mapping of CAGE tags indicated that the expression levels of CAGE tag clusters varied across tissues. Expression profiles across tissues were validated using corresponding polyA+ mRNA-Seq data from the same samples. After removal of CAGE tags with fewer than 10 read counts, 39.3% of TSS overlapped with the 5’ ends of 31,113 transcripts previously annotated by NCBI (out of a total of 56,308 in the NCBI annotation). For the remaining 25,195 NCBI-annotated transcripts, no TSS meeting stringent criteria were identified. A further 14.7% of TSS mapped to within 50 bp of annotated promoter regions. Intersecting these predicted TSS regions with annotated promoter regions (±50 bp) revealed that 46% of the predicted TSS were ‘novel’ and previously unannotated. Using whole-genome bisulphite sequencing data from the same tissues, we determined that a proportion (32.2%) of these ‘novel’ TSS were hypomethylated, indicating that they are likely to be reproducible rather than ‘noise’. This global analysis of TSS in sheep will significantly enhance the annotation of gene models in the new ovine reference assembly. Our analyses provide one of the highest-resolution annotations of transcript regulation and diversity in a livestock species to date.
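
The read-count filter and the ±50 bp comparison against annotated positions can be sketched as a window lookup over sorted coordinates. This minimal Python sketch assumes TSS positions are integers on a single chromosome and strand, treats the window as inclusive, and uses hypothetical names (`filter_by_count`, `is_annotated`, `fraction_novel`, `min_count`); it is not the CAGEfightR-based pipeline used in the study. It also checks that the transcript counts quoted above are internally consistent: 56,308 − 31,113 = 25,195.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Figures quoted in the abstract: NCBI transcripts in total and those whose
# 5' end overlapped a TSS. Their difference is the 25,195 transcripts without one.
NCBI_TRANSCRIPTS = 56_308
TRANSCRIPTS_WITH_TSS = 31_113
assert NCBI_TRANSCRIPTS - TRANSCRIPTS_WITH_TSS == 25_195


def filter_by_count(clusters: Sequence[Tuple[int, int]],
                    min_count: int = 10) -> List[Tuple[int, int]]:
    """Keep (position, read_count) clusters with at least min_count reads (hypothetical helper)."""
    return [(pos, n) for pos, n in clusters if n >= min_count]


def is_annotated(tss: int, annotated_sorted: Sequence[int], window: int = 50) -> bool:
    """True if an annotated TSS lies within +/- window bp of tss (inclusive).

    annotated_sorted must be in ascending order.
    """
    # First annotated position >= tss - window; it is a hit if it is also <= tss + window.
    i = bisect_left(annotated_sorted, tss - window)
    return i < len(annotated_sorted) and annotated_sorted[i] <= tss + window


def fraction_novel(predicted: Sequence[int], annotated: Sequence[int],
                   window: int = 50) -> float:
    """Fraction of predicted TSS with no annotated TSS within the window."""
    ann = sorted(annotated)
    novel = sum(1 for p in predicted if not is_annotated(p, ann, window))
    return novel / len(predicted)
```

Sorting the annotated positions once and using `bisect` keeps each lookup logarithmic in the number of annotated sites, which matters at genome scale; real promoter annotations are intervals rather than single positions, so an interval-overlap query would be the natural extension.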


2020
Author(s): Pei-Shang Wu, Donald P. Cameron, Jan Grosser, Laura Baranello, Lena Ström

The SMC complex cohesin mediates sister chromatid cohesion established during replication, as well as damage-induced cohesion, which forms in response to DSBs after replication. The translesion synthesis polymerase Polη is required for damage-induced cohesion through a hitherto unknown mechanism. Since Polη is functionally associated with transcription, and transcription triggers de novo cohesion in S. pombe, we hypothesized that active transcription facilitates damage-induced cohesion in S. cerevisiae. Here, we found that the expression of genes involved in chromatin assembly and positive regulation of transcription was relatively enriched in WT compared with Polη-deficient cells (rad30Δ). The rad30Δ mutant showed a dysregulated transcriptional response and increased cohesin binding around transcription start sites. Perturbing histone exchange at promoters adversely affected damage-induced cohesion, similarly to deletion of RAD30. Conversely, altering chromatin accessibility or the regulation of transcription elongation suppressed the lack of damage-induced cohesion in rad30Δ cells. These results indicate that Polη promotes damage-induced cohesion through its role in transcription, and support the model that regulated transcription facilitates the formation of damage-induced cohesion.


Biomolecules, 2020, Vol 10 (6), pp. 827
Author(s): Gabriel Le Berre, Virginie Hossard, Jean-Francois Riou, Anne-Laure Guieysse-Peugeot

Alternative promoter usage, which is involved in the regulation of transcription, splicing, and translation, contributes to proteome diversity and is implicated in a large number of diseases, in particular cancer. Epigenetic mechanisms and cis-regulatory elements are involved in alternative promoter activity. Multiple transcript isoforms can be produced from a gene because transcription initiates at different transcription start sites (TSS). These transcripts may lack regions that allow them to be discriminated by RT-qPCR, which makes quantification technically challenging. This study presents a general method, which we call AP-TSS (analysis of particular TSS), for the relative quantification of a transcript synthesized from a particular TSS. AP-TSS is based on the specific elongation of the cDNA of interest, followed by its quantification by qPCR. As proof of principle, AP-TSS was applied to two non-coding RNAs: telomeric repeat-containing RNAs (TERRA) from a particular subtelomeric TSS, and Alu transcripts. Treatment of cells with a DNA methylation inhibitor was associated with a global increase in the total TERRA level, but TERRA expression from the TSS of interest did not change in HT1080 cells and increased only modestly in HeLa cells. This result suggests that the TERRA upregulation induced by global demethylation of the genome is mainly due to activation from sites other than this particular TSS. For Alu RNA, the signal obtained by AP-TSS is specific for the RNA Polymerase III-dependent Alu transcript. In summary, our method provides a tool, applicable to many genes, for studying the regulation of gene expression from a given transcription start site under different conditions. In particular, AP-TSS can be used to investigate the epigenetic regulation of alternative TSS usage, which is important for the development of epigenetic-targeted therapies.


2020, Vol 11
Author(s): Mazdak Salavati, Alex Caulton, Richard Clark, Iveta Gazova, Timothy P. L. Smith, ...

2016
Author(s): Qin Zhu, Stephen A Fisher, Jamie Shallcross, Junhyong Kim

Abstract
Motivation: RNA-Seq is a powerful technology that delivers digital gene expression data. To measure expression strength at the gene level, one popular approach is to count reads directly after aligning them to a reference genome or transcriptome. HTSeq is one of the most popular ways of counting reads, yet its slow running speed poses a bottleneck for many RNA-Seq pipelines. Gene-level counting programs also lack a robust scheme for quantifying reads that map to non-exonic genomic features, such as intronic and intergenic regions, even though such reads are prevalent in most RNA-Seq data.
Results: In this paper we present VERSE, an RNA-Seq read counting tool that builds on the speed of featureCounts and implements the counting modes of HTSeq. VERSE is more than 30× faster than HTSeq when computing the same gene counts. VERSE also supports a hierarchical assignment scheme, which allows reads to be assigned uniquely and sequentially to different types of features according to user-defined priorities.
Availability: VERSE is implemented in C and is built on top of featureCounts. It is open source and can be downloaded freely from GitHub (https://github.com/qinzhu/VERSE).
Supplementary information: Tables and figures illustrating the counting modes implemented in VERSE and the differences between hierarchical and independent assignment.
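
A hierarchical assignment scheme of this kind can be sketched in a few lines. This minimal Python sketch assumes half-open integer intervals on one chromosome; a read is tested against feature types in priority order and assigned at the first level where it overlaps exactly one feature, and a read overlapping several features at the same level is labeled ambiguous (an assumption modeled on HTSeq-style counting, not necessarily VERSE's exact rule). All names are hypothetical; this is not VERSE's C code or command-line interface.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome
# feature type -> feature name -> intervals, e.g. {"exon": {"geneA": [(100, 200)]}}
FeatureTable = Dict[str, Dict[str, List[Interval]]]


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def assign_hierarchical(read: Interval, features: FeatureTable,
                        priority: Sequence[str]) -> str:
    """Assign a read to the first feature type, in priority order, that it overlaps."""
    for ftype in priority:
        hits = {
            name
            for name, intervals in features.get(ftype, {}).items()
            if any(overlaps(read, iv) for iv in intervals)
        }
        if len(hits) == 1:
            return f"{ftype}:{next(iter(hits))}"
        if len(hits) > 1:
            # Assumption: stop at this level rather than fall through to lower priorities.
            return "ambiguous"
    return "no_feature"


def count_reads(reads: Sequence[Interval], features: FeatureTable,
                priority: Sequence[str]) -> Counter:
    """Each read contributes exactly one count, so every assignment is unique."""
    return Counter(assign_hierarchical(r, features, priority) for r in reads)
```

With `priority = ["exon", "intron"]`, a read lying inside a single exon is counted for that exon, a read lying only in an intron falls through to the intron level, and a read matching nothing is reported as `no_feature`. Reversing the order to `["intron", "exon"]` changes where reads spanning an exon-intron boundary are counted, which is the behavior a user-defined priority is meant to control.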

