HiCcompare: a method for joint normalization of Hi-C datasets and differential chromatin interaction detection

2017 ◽  
Author(s):  
John C. Stansfield ◽  
Mikhail G. Dozmorov

Abstract Changes in spatial chromatin interactions are now emerging as a unifying mechanism orchestrating regulation of gene expression. Evolution of chromatin conformation capture methods into Hi-C sequencing technology now allows an insight into chromatin interactions on a genome-wide scale. However, Hi-C data contains many DNA sequence- and technology-driven biases. These biases prevent effective comparison of chromatin interactions aimed at identifying genomic regions differentially interacting between disease-normal states or different cell types. Several methods have been developed for normalizing individual Hi-C datasets. However, they fail to account for biases between two or more Hi-C datasets, hindering comparative analysis of chromatin interactions. We developed a simple and effective method, HiCcompare, for the joint normalization and differential analysis of multiple Hi-C datasets. The method avoids constraining Hi-C data within a rigid statistical model, allowing a data-driven normalization of biases using locally weighted linear regression (loess). The method identifies region-specific chromatin interaction changes complementary to changes due to large-scale genomic rearrangements, such as copy number variants (CNVs). HiCcompare outperforms methods for normalizing individual Hi-C datasets in detecting a priori known chromatin interaction differences in simulated and real-life settings while detecting biologically relevant changes. HiCcompare is freely available as a Bioconductor R package at https://bioconductor.org/packages/HiCcompare/.

Author Summary Advances in chromosome conformation capture sequencing technologies (Hi-C) have sparked interest in studying the 3-dimensional (3D) chromatin interaction structure of the human genome. The 3D structure of the genome is now considered a primary regulator of gene expression. Changes to the 3D chromatin interactions are now emerging as a hallmark of cancer and other complex diseases.
With the growing availability of Hi-C data generated under different conditions (e.g. tumor-normal, cell-type-specific), methods are needed to compare them. However, biases in Hi-C data hinder their comparative analysis. Several normalization techniques have been developed for removing biases in individual Hi-C datasets, but very few were designed to account for between-dataset biases. We developed a new method and R package, HiCcompare, for the joint normalization of multiple Hi-C datasets and differential chromatin interaction detection. Our results show that joint normalization outperforms methods for normalizing individual datasets in detecting true chromatin interaction changes. HiCcompare enables further research into discovering the dynamics of 3D genomic changes.
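The loess-based joint normalization described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that go beyond the abstract: the between-dataset bias is estimated by a local linear fit of M = log2(IF2/IF1) against the distance D between interacting regions, and the fitted bias is split evenly between the two datasets. The names `tricube`, `loess_at`, and `joint_normalize`, the tricube kernel, and the `span` default are hypothetical choices for illustration and are not the HiCcompare API.

```python
import math

def tricube(u):
    """Tricube kernel weight for a scaled distance u (zero at and beyond |u| = 1)."""
    u = min(abs(u), 1.0)
    return (1.0 - u ** 3) ** 3

def loess_at(x0, xs, ys, span=0.5):
    """Locally weighted linear fit of ys on xs, evaluated at x0 (assumed to be one of xs)."""
    k = min(len(xs), max(2, math.ceil(span * len(xs))))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def joint_normalize(if1, if2, dist, span=0.5):
    """Remove the distance-dependent log-ratio trend between two datasets.

    if1, if2: positive interaction frequencies for the same pairs of regions.
    dist: distance (in bins) between the two regions of each pair.
    """
    m = [math.log2(b / a) for a, b in zip(if1, if2)]
    # Fitted bias at each observed distance.
    fit = {d: loess_at(d, dist, m, span) for d in set(dist)}
    # Split the correction evenly so the product IF1 * IF2 is unchanged.
    adj1 = [a * 2 ** (fit[d] / 2) for a, d in zip(if1, dist)]
    adj2 = [b / 2 ** (fit[d] / 2) for b, d in zip(if2, dist)]
    return adj1, adj2
```

After such an adjustment, log-ratios that remain far from zero at a given distance are the candidates for differential interactions; the statistical test used to call them is outside the scope of this sketch.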

2019 ◽  
Vol 35 (17) ◽  
pp. 2916-2923 ◽  
Author(s):  
John C Stansfield ◽  
Kellen G Cresswell ◽  
Mikhail G Dozmorov

Abstract
Motivation: With the development of chromatin conformation capture technology and its high-throughput derivative Hi-C sequencing, studies of the three-dimensional interactome of the genome that involve multiple Hi-C datasets are becoming available. To account for the technology-driven biases unique to each dataset, there is a distinct need for methods to jointly normalize multiple Hi-C datasets. Previous attempts at removing biases from Hi-C data have made use of techniques which normalize individual Hi-C datasets, or, at best, jointly normalize two datasets.
Results: Here, we present multiHiCcompare, a cyclic loess regression-based joint normalization technique for removing biases across multiple Hi-C datasets. In contrast to other normalization techniques, it properly handles the Hi-C-specific decay of chromatin interaction frequencies with the increasing distance between interacting regions. multiHiCcompare uses the general linear model framework for comparative analysis of multiple Hi-C datasets, adapted for the Hi-C-specific decay of chromatin interaction frequencies. multiHiCcompare outperforms other methods when detecting a priori known chromatin interaction differences from jointly normalized datasets. Applied to the analysis of auxin-treated versus untreated experiments, and CTCF depletion experiments, multiHiCcompare was able to recover the expected epigenetic and gene expression signatures of loss of chromatin interactions and reveal novel insights.
Availability and implementation: multiHiCcompare is freely available on GitHub and as a Bioconductor R package at https://bioconductor.org/packages/multiHiCcompare.
Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Guoliang Li ◽  
Tongkai Sun ◽  
Huidan Chang ◽  
Liuyang Cai ◽  
Ping Hong ◽  
...  

Abstract Understanding chromatin interactions is important since they create chromosome conformation and link the cis- and trans-regulatory elements to their target genes for transcriptional regulation. Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) sequencing is a genome-wide high-throughput technology that detects chromatin interactions associated with a specific protein of interest. We previously developed ChIA-PET Tool in 2010 for ChIA-PET data analysis. Here we present the updated version, ChIA-PET Tool V3, a computational package to process the next-generation sequence data generated from ChIA-PET experiments. It processes short-read and long-read ChIA-PET data with multithreading and generates the statistics of results in an HTML file. In this paper, we provide a detailed demonstration of the design of ChIA-PET Tool V3 and of how to install it and analyze a specific ChIA-PET data set with it. Other ChIA-PET data analysis tools have since been developed, including ChiaSig, MICC, Mango and ChIA-PET2. We compared our tool with these tools using the same public data set on the same machine. Most of the peaks detected by ChIA-PET Tool V3 overlap with those from other tools. Significant chromatin interactions called by ChIA-PET Tool V3 show higher enrichment in the APA plot. ChIA-PET Tool V3 is open source and is available on GitHub (https://github.com/GuoliangLi-HZAU/ChIA-PET_Tool_V3/).


Author(s):  
Fanli Meng ◽  
Hainan Zhao ◽  
Bo Zhu ◽  
Tao Zhang ◽  
Mingyu Yang ◽  
...  

Abstract Enhancers located in introns are abundant and play a major role in the regulation of gene expression in mammalian species. By contrast, the functions of intronic enhancers in plants have largely been unexplored and only a handful of plant intronic enhancers have been reported. We performed a genome-wide prediction of intronic enhancers in Arabidopsis thaliana using open chromatin signatures based on DNase I sequencing. We identified 941 candidate intronic enhancers associated with 806 genes in seedling tissue and 1,271 intronic enhancers associated with 1,069 genes in floral tissue. We validated the function of 15 of 21 (71%) predicted intronic enhancers in transgenic assays using a reporter gene. We also created deletion lines of three intronic enhancers associated with two different genes using CRISPR/Cas. Deletion of these enhancers, which span key transcription factor binding sites, did not abolish gene expression but caused varying levels of transcriptional repression of their cognate genes. Remarkably, the transcriptional repression in the deletion lines occurred at specific developmental stages and resulted in distinct phenotypic effects on plant morphology and development. Clearly, these three intronic enhancers are important in fine-tuning tissue- and development-specific expression of their cognate genes.


2018 ◽  
Author(s):  
Minal Çalışkan ◽  
Elisabetta Manduchi ◽  
H. Shanker Rao ◽  
Julian A Segert ◽  
Marcia Holsbach Beltrame ◽  
...  

Abstract Deciphering the impact of genetic variation on gene regulation is fundamental to understanding common, complex human diseases. Although histone modifications are important markers of gene regulatory regions of the genome, no specific histone modification has been assayed in more than a few individuals in the human liver. As a result, the impacts of genetic variation that direct histone modification states in the liver are poorly understood. Here, we generate the most comprehensive genome-wide dataset of two epigenetic marks, H3K4me3 and H3K27ac, and annotate thousands of putative regulatory elements in the human liver. We integrate these findings with genome-wide gene expression data collected from the same human liver tissues and high-resolution promoter-focused chromatin interaction maps collected from human liver-derived HepG2 cells. We demonstrate widespread functional consequences of natural genetic variation on putative regulatory element activity and gene expression levels. Leveraging these extensive datasets, we fine-map a total of 77 GWAS loci that have been associated with at least one complex phenotype. Our results contribute to the repertoire of genes and regulatory mechanisms governing complex disease development and further the basic understanding of genetic and epigenetic regulation of gene expression in human liver tissue.


2019 ◽  
Author(s):  
Kellen G. Cresswell ◽  
John C. Stansfield ◽  
Mikhail G. Dozmorov

Abstract The three-dimensional (3D) structure of the genome plays a crucial role in regulating gene expression. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), the fundamental building blocks of the genome. Identifying such hierarchical structures is a critical step in understanding regulatory interactions within the genome. Existing tools for TAD calling frequently require tunable parameters, are sensitive to biases such as sequencing depth, resolution, and sparsity of Hi-C data, and are computationally inefficient. Furthermore, the choice of TAD callers within the R/Bioconductor ecosystem is limited. To address these challenges, we frame the problem of TAD detection in a spectral clustering framework. Our SpectralTAD R package has automatic parameter selection, is robust to sequencing depth, resolution and sparsity of Hi-C data, and detects hierarchical, biologically relevant TAD structure. Using simulated and real-life Hi-C data, we show that SpectralTAD outperforms rGMAP and TopDom, two state-of-the-art R-based TAD callers. TAD boundaries that are shared among multiple levels of the hierarchy were more enriched in relevant genomic annotations, e.g., CTCF binding sites, suggesting their higher biological importance. In contrast, boundaries of primary TADs, defined as TADs which cannot be split into sub-TADs, were found to be less enriched in genomic annotations, suggesting their more dynamic role in genome regulation. In summary, we present a simple, fast, and user-friendly R package for robust detection of TAD hierarchies supported by biological evidence. SpectralTAD is available at https://github.com/dozmorovlab/SpectralTAD and on Bioconductor (submitted).
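Framing TAD detection as spectral clustering can be made concrete with a toy Python sketch. It is not the SpectralTAD algorithm: it only splits a small symmetric contact matrix into two domains using the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the normalized graph Laplacian), found here by power iteration. The function names, the fixed iteration count, and the two-way split are assumptions made for illustration; automatic parameter selection and hierarchy detection are not attempted.

```python
import math

def fiedler_vector(contacts, iters=500):
    """Approximate the second eigenvector of the normalized Laplacian by power iteration."""
    n = len(contacts)
    deg = [sum(row) for row in contacts]  # assumes every bin has nonzero contacts
    scale = [1.0 / math.sqrt(d) for d in deg]
    # S = D^-1/2 A D^-1/2, so the normalized Laplacian is L = I - S.
    s = [[contacts[i][j] * scale[i] * scale[j] for j in range(n)] for i in range(n)]
    # Trivial eigenvector of L (eigenvalue 0), scaled to unit length.
    total = math.sqrt(sum(deg))
    v0 = [math.sqrt(d) / total for d in deg]
    v = [float(i) for i in range(n)]  # deterministic start that varies along the genome
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, v0))
        v = [a - dot * b for a, b in zip(v, v0)]  # deflate the trivial direction
        # Multiply by I + S; its top remaining eigenvector is the Fiedler vector of L.
        v = [v[i] + sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def tad_boundaries(contacts):
    """Return bin indices at which the Fiedler vector changes sign."""
    v = fiedler_vector(contacts)
    return [i for i in range(1, len(v)) if (v[i] >= 0) != (v[i - 1] >= 0)]

# Toy contact matrix: two domains of three bins each, dense inside, sparse between.
toy = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
print(tad_boundaries(toy))
```

On the toy matrix the only sign change falls between the third and fourth bins, which is the boundary built into the example. Real Hi-C matrices are sparse and noisy, so a practical caller needs more than a single sign split.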


2016 ◽  
Author(s):  
Sutirtha Chakraborty

Abstract RNAseq technology has revolutionized gene expression profiling by generating read count data measuring the transcript abundances for each queried gene. On the other hand, the underlying technical artefacts generate a wide variety of hidden effects that may distort the primary signals of differential expression between two sample groups. In addition, factors of unwanted biological variability may give rise to a highly complicated pattern of residual expression heterogeneity in the data. Standard normalization techniques fail to correct for these latent variables, leading to a substantial reduction in the power of common statistical tests for differential expression. Here I introduce a novel method, SVAPLSseq, that aims to capture the traces of hidden variability in the data and incorporate them in a regression framework to re-estimate the primary signals for finding the truly positive genes. Application to both simulated and real-life RNAseq data shows the superior performance of the method compared to other available techniques. The method is provided as an R package ‘SVAPLSseq’ that has been submitted to Bioconductor.


2020 ◽  
Vol 48 (8) ◽  
pp. 4066-4080 ◽  
Author(s):  
Miguel Madrid-Mencía ◽  
Emanuele Raineri ◽  
Tran Bich Ngoc Cao ◽  
Vera Pancaldi

Abstract We introduce an R package and a web-based visualization tool for the representation, analysis and integration of epigenomic data in the context of 3D chromatin interaction networks. GARDEN-NET allows for the projection of user-submitted genomic features on pre-loaded chromatin interaction networks, exploiting the functionalities of the ChAseR package to explore the features in combination with chromatin network topology properties. We demonstrate the approach using published epigenomic and chromatin structure datasets in haematopoietic cells, including a collection of gene expression, DNA methylation and histone modifications data in primary healthy myeloid cells from hundreds of individuals. These datasets allow us to test the robustness of chromatin assortativity, which highlights which epigenomic features, alone or in combination, are more strongly associated with 3D genome architecture. We find evidence for genomic regions with specific histone modifications, DNA methylation, and gene expression levels to be forming preferential contacts in 3D nuclear space, to a different extent depending on the cell type and lineage. Finally, we examine replication timing data and find it to be the genomic feature most strongly associated with overall 3D chromatin organization at multiple scales, consistent with previous results from the literature.
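As a rough illustration of chromatin assortativity, the Python sketch below assumes it is computed as the assortativity coefficient of a numeric feature on an undirected interaction network, that is, the Pearson correlation between the feature values found at the two ends of each edge. The function name and the input layout (a list of edges plus a dictionary of per-fragment values) are hypothetical and do not reflect the ChAseR interface.

```python
def assortativity(edges, feature):
    """Pearson correlation of a node feature across the two ends of each edge.

    edges: iterable of (u, v) pairs describing an undirected network.
    feature: dict mapping each node to a numeric value.
    Assumes the feature is not constant across edge ends (nonzero variance).
    """
    xs, ys = [], []
    for u, v in edges:
        # Count each undirected edge in both directions so the measure is symmetric.
        xs += [feature[u], feature[v]]
        ys += [feature[v], feature[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share one mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var

# Hypothetical example: fragments a and b carry a mark, c and d do not.
mark = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
like_with_like = assortativity([("a", "b"), ("c", "d")], mark)
like_with_unlike = assortativity([("a", "c"), ("b", "d")], mark)
```

Values near 1 indicate that fragments with similar feature values preferentially contact each other, values near -1 indicate the opposite, and values near 0 indicate no association with network structure.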


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1440
Author(s):  
Michał Zawisza-Álvarez ◽  
Claudia Pérez-Calles ◽  
Giacomo Gattoni ◽  
Jordi Garcia-Fernàndez ◽  
Èlia Benito-Gutiérrez ◽  
...  

RNA editing is a relatively unexplored process in which transcribed RNA is modified at specific nucleotides before translation, adding another level of regulation of gene expression. Cephalopods use it extensively to increase the regulatory complexity of their nervous systems, and mammals use it too, but less prominently. Nevertheless, little is known about the specifics of RNA editing in most of the other clades and the relevance of RNA editing from an evolutionary perspective remains unknown. Here we analyze a key element of the editing machinery, the ADAR (adenosine deaminase acting on RNA) gene family, in an animal with a key phylogenetic position at the root of chordates: the cephalochordate amphioxus. We show that, as in cephalopods, ADAR genes in amphioxus are predominantly expressed in the nervous system; we identify a number of RNA editing events in amphioxus; and we provide a newly developed method to identify RNA editing events in highly polymorphic genomes using orthology as a guide. Overall, our work lays the foundations for future comparative analysis of RNA-editing events across the metazoan tree.

