Amplification-free Library Preparation Improves Quality of Hi-C Analysis

2019 ◽  
Author(s):  
Longjian Niu ◽  
Wei Shen ◽  
Yingzhang Huang ◽  
Na He ◽  
Yuedong Zhang ◽  
...  

Abstract PCR amplification of Hi-C libraries introduces unusable duplicates and results in a biased representation of chromatin interactions. We present a simplified, fast, and economically efficient Hi-C library preparation procedure that generates sufficient non-amplified ligation products for deep sequencing from 30 million Drosophila cells. Comprehensive analysis of the resulting data indicates that amplification-free Hi-C preserves higher complexity of chromatin interactions and dramatically lowers the sequencing depth required for the same number of unique paired reads. For human cells, which have a large genome, this method recovers enough ligated fragments for direct high-throughput sequencing without amplification from as few as 250 thousand cells. Comparison with published in situ Hi-C on millions of human cells reveals that amplification introduces a distance-dependent bias, which results in a background noise level that increases with genomic distance. With amplification bias avoided, our method may produce a chromatin interaction network that more faithfully reflects the real three-dimensional genomic architecture.
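To make the notion of "unusable duplicates" and "unique paired reads" concrete, here is a minimal Python sketch of how a duplication rate could be computed from mapped read pairs. It assumes a common convention (not stated in the abstract) that two pairs are duplicates when both mates share chromosome, position, and strand, regardless of mate order; the function name `duplication_stats` and the example coordinates are hypothetical.

```python
from collections import Counter


def duplication_stats(pairs):
    """Count total and unique read pairs.

    pairs: iterable of (chrom1, pos1, strand1, chrom2, pos2, strand2).
    Assumption: identical mate coordinates (in either mate order) mark a duplicate.
    """
    def canonical(pair):
        # Order the two mates so that swapped mates map to the same key.
        first, second = pair[:3], pair[3:]
        return (first, second) if first <= second else (second, first)

    counts = Counter(canonical(tuple(p)) for p in pairs)
    total = sum(counts.values())
    unique = len(counts)
    rate = 1 - unique / total if total else 0.0
    return total, unique, rate


example_pairs = [
    ("chr2L", 1000, "+", "chr2L", 52000, "-"),
    ("chr2L", 1000, "+", "chr2L", 52000, "-"),  # exact duplicate
    ("chr2L", 52000, "-", "chr2L", 1000, "+"),  # same pair, mates swapped
    ("chr3R", 700, "+", "chrX", 9100, "+"),
]
total, unique, rate = duplication_stats(example_pairs)
print(total, unique, rate)
```

A lower duplication rate means fewer sequenced pairs are needed to reach a given number of unique pairs, which is the saving in sequencing depth the abstract describes.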

2019 ◽  
Author(s):  
Longjian Niu ◽  
Yingzhang Huang ◽  
Chunhui Hou

Abstract PCR amplification of Hi-C libraries introduces unusable duplicates and results in a biased representation of chromatin interactions. We present a simplified, fast, and economically efficient Hi-C library preparation procedure that generates sufficient non-amplified ligation products for deep sequencing. Comprehensive analysis of the resulting data indicates that amplification-free Hi-C preserves higher complexity of chromatin interaction and lowers sequencing depth dramatically for the same number of unique paired reads. With amplification bias avoided, our method may produce a chromatin interaction network more faithfully reflecting the real three-dimensional genomic architecture.


2019 ◽  
Vol 35 (17) ◽  
pp. 2916-2923 ◽  
Author(s):  
John C Stansfield ◽  
Kellen G Cresswell ◽  
Mikhail G Dozmorov

Abstract Motivation: With the development of chromatin conformation capture technology and its high-throughput derivative Hi-C sequencing, studies of the three-dimensional interactome of the genome that involve multiple Hi-C datasets are becoming available. To account for the technology-driven biases unique to each dataset, there is a distinct need for methods to jointly normalize multiple Hi-C datasets. Previous attempts at removing biases from Hi-C data have made use of techniques which normalize individual Hi-C datasets, or, at best, jointly normalize two datasets. Results: Here, we present multiHiCcompare, a cyclic loess regression-based joint normalization technique for removing biases across multiple Hi-C datasets. In contrast to other normalization techniques, it properly handles the Hi-C-specific decay of chromatin interaction frequencies with the increasing distance between interacting regions. multiHiCcompare uses the general linear model framework for comparative analysis of multiple Hi-C datasets, adapted for the Hi-C-specific decay of chromatin interaction frequencies. multiHiCcompare outperforms other methods when detecting a priori known chromatin interaction differences from jointly normalized datasets. Applied to the analysis of auxin-treated versus untreated experiments, and CTCF depletion experiments, multiHiCcompare was able to recover the expected epigenetic and gene expression signatures of loss of chromatin interactions and reveal novel insights. Availability and implementation: multiHiCcompare is freely available on GitHub and as a Bioconductor R package at https://bioconductor.org/packages/multiHiCcompare. Supplementary information: Supplementary data are available at Bioinformatics online.
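The idea of normalizing jointly while respecting the distance decay can be illustrated with a deliberately simplified Python sketch. It is not multiHiCcompare's algorithm: the loess fit is replaced by a much cruder per-distance median of log2 ratios, only two datasets are handled, and the function name `center_by_distance` and the toy contact maps are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def center_by_distance(contacts_a, contacts_b):
    """Remove the per-distance median log2 ratio between two contact maps.

    contacts_*: dict mapping (bin_i, bin_j) -> interaction frequency (> 0).
    Stand-in for a loess fit: one median offset per genomic distance |j - i|.
    """
    shared = contacts_a.keys() & contacts_b.keys()
    log_ratios = defaultdict(list)
    for i, j in shared:
        log_ratios[abs(j - i)].append(
            math.log2(contacts_b[(i, j)] / contacts_a[(i, j)])
        )
    offset = {dist: median(values) for dist, values in log_ratios.items()}

    adjusted_a, adjusted_b = {}, {}
    for i, j in shared:
        half = offset[abs(j - i)] / 2
        # Split the correction evenly so neither dataset is the reference.
        adjusted_a[(i, j)] = contacts_a[(i, j)] * 2 ** half
        adjusted_b[(i, j)] = contacts_b[(i, j)] / 2 ** half
    return adjusted_a, adjusted_b


map_a = {(0, 1): 100, (1, 2): 80, (0, 2): 20}
map_b = {(0, 1): 200, (1, 2): 160, (0, 2): 20}  # 2x bias at distance 1 only
adj_a, adj_b = center_by_distance(map_a, map_b)
print(adj_a[(0, 1)], adj_b[(0, 1)])
```

Stratifying by distance means a bias that affects only short-range contacts is corrected without distorting long-range ones, which a single global scaling factor could not do.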


2021 ◽  
Author(s):  
Claudio Zaccone ◽  
Edoardo Puglisi ◽  
Fabio Terribile ◽  
Andrea Squartini

A 3-m thick sediment was found in a limestone mine located in the southern part of the Gargano Promontory, Apulia region (south of Italy), at a depth of ca. 25-30 m from the current ground level. Samples from 5 layers were analysed by X-ray diffraction (XRD), elemental analysis (CHNS), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Microbial DNA was also extracted and bacterial diversity analysed by PCR amplification and Illumina High-Throughput Sequencing (HTS) of the V3-V4 hypervariable regions of 16S rRNA. Preliminary data showed that these sediments formed by subsequent weathering of carbonates and silicates, either by in situ oxidation or by dissolution followed by migration and reprecipitation, rather than during the accumulation of shallow marine sediments occurring between the middle Pliocene and the lower Pleistocene, when the extreme western sectors of the Apulian foreland underwent strong subsidence. The main mineral compounds occurring in the 5 layers, from the top to the bottom, were the following: calcite (80%) and clay minerals in sample #1, goethite (75%) and hematite in sample #2, manganese (66%) and iron oxides in sample #3, almost exclusively goethite in sample #4, and calcite (71%) and clay minerals in sample #5. From the microbiological point of view, based on 16S metabarcoding amplicon sequencing analysis, these 5 layers appear to cluster in three groups: a) the uppermost layer (sample #1), dominated by a single and abundant taxon of Arthrobacter sp., which includes species known for their capability of calcite precipitation; b) a middle layer (including samples #2 and #3), without prevailing abundances and with less consistent occurrences across replicates, which featured members of the Oxalobacteraceae family and of the Methylophilus genus. Their closest matches among GenBank subjects included isolates from habitats such as calcium carbonate (moonmilk) muds in percolating waters within caves, mine tailings and other groundwater microcosms; c) a bottom layer (samples #4 and #5), showing an oligarchic situation and high abundances of bacteria, though different from those that prevailed in the top layer, and including members of the Nocardioidaceae family. For these sequence queries too, the closest GenBank subjects include cases with calcium carbonate-precipitating capabilities isolated from cave and groundwater sediments or former mining sites in studies on iron oxidizers in creek sediments at pH 4.4 or at high heavy metal concentrations. Overall, such a distribution suggests that, in both the top and bottom layers, different communities would have undergone in situ reproduction and colonization, metabolically exploiting the substrate, whereas the mid layers would have received bacteria by passive transport in percolating waters.


2020 ◽  
Author(s):  
Timothy Kunz ◽  
Lila Rieber ◽  
Shaun Mahony

ABSTRACT Few existing methods enable the visualization of relationships between regulatory genomic activities and genome organization as captured by Hi-C experimental data. Genome-wide Hi-C datasets are often displayed using “heatmap” matrices, but it is difficult to intuit from these heatmaps which biochemical activities are compartmentalized together. High-dimensional Hi-C data vectors can alternatively be projected onto three-dimensional space using dimensionality reduction techniques. The resulting three-dimensional structures can serve as scaffolds for projecting other forms of genomic information, thereby enabling the exploration of relationships between genome organization and various genome annotations. However, while three-dimensional models are contextually appropriate for chromatin interaction data, some analyses and visualizations may be more intuitively and conveniently performed in two-dimensional space. We present a novel approach to the visualization and analysis of chromatin organization based on the Self-Organizing Map (SOM). The SOM algorithm provides a two-dimensional manifold which adapts to represent the high dimensional chromatin interaction space. The resulting data structure can then be used to assess the relationships between regulatory genomic activities and chromatin interactions. For example, given a set of genomic coordinates corresponding to a given biochemical activity, the degree to which this activity is segregated or compartmentalized in chromatin interaction space can be intuitively visualized on the 2D SOM grid and quantified using Lorenz curve analysis. We demonstrate our approach for exploratory analysis of genome compartmentalization in a high-resolution Hi-C dataset from the human GM12878 cell line. Our SOM-based approach provides an intuitive visualization of the large-scale structure of Hi-C data and serves as a platform for integrative analyses of the relationships between various genomic activities and genome organization.
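As an illustration of how Lorenz curve analysis can quantify compartmentalization, the Python sketch below builds a Lorenz curve from per-node counts (how many regions carrying a given activity map to each SOM node) and summarizes it with a Gini coefficient. Using the Gini coefficient as the summary statistic is an assumption here, since the abstract does not say which statistic the authors derive from the curve; the function names and toy counts are hypothetical.

```python
def lorenz_curve(node_counts):
    """Return Lorenz curve points (cumulative node share, cumulative count share)."""
    values = sorted(node_counts)
    total = sum(values)
    if total == 0:
        raise ValueError("at least one node must have a non-zero count")
    points = [(0.0, 0.0)]
    running = 0
    for rank, value in enumerate(values, start=1):
        running += value
        points.append((rank / len(values), running / total))
    return points


def gini(node_counts):
    """Gini coefficient: 0 for an even spread, approaching 1 for full concentration."""
    points = lorenz_curve(node_counts)
    # Trapezoid rule for the area under the Lorenz curve.
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return 1 - 2 * area


evenly_spread = [5, 5, 5, 5]   # activity present equally on every node
concentrated = [0, 0, 0, 20]   # activity confined to a single node
print(gini(evenly_spread), gini(concentrated))
```

An activity spread evenly over the grid gives a Lorenz curve on the diagonal, while an activity confined to a few nodes bows the curve away from it, so a larger value indicates stronger segregation in chromatin interaction space.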


2017 ◽  
Vol 15 (06) ◽  
pp. 1740008 ◽  
Author(s):  
Lu Liu ◽  
Jianhua Ruan

Chromatin conformation capture with high-throughput sequencing (Hi-C) is a powerful technique to detect genome-wide chromatin interactions. In this paper, we introduce two novel approaches to detect differentially interacting genomic regions between two Hi-C experiments using a network model. To make input data from multiple experiments comparable, we propose a normalization strategy guided by network topological properties. We then devise two measurements, using local and global connectivity information from the chromatin interaction networks, respectively, to assess the interaction differences between two experiments. When multiple replicates are present in experiments, our approaches give users the flexibility either to pool all replicates together and thereby increase the network coverage, or to use the replicates in parallel to increase the signal-to-noise ratio. We show that while the local method works better in detecting changes from simulated networks, the global method performs better on real Hi-C data. The local and global methods, regardless of pooling, are always superior to two existing methods. Furthermore, our methods work well on both unweighted and weighted networks, and our normalization strategy significantly improves the performance compared with raw networks without normalization. Therefore, we believe our methods will be useful for identifying differentially interacting genomic regions.


2021 ◽  
Vol 12 ◽  
Author(s):  
Maria Tsagiopoulou ◽  
Maria Christina Maniou ◽  
Nikolaos Pechlivanis ◽  
Anastasis Togkousidis ◽  
Michaela Kotrová ◽  
...  

A recent refinement in high-throughput sequencing involves the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, in the library preparation steps. A UMI gives each DNA/RNA input molecule a unique identity that is carried through polymerase chain reaction (PCR) amplification, thus reducing the bias of this step. Here, we propose an alignment-free framework, called UMIc, which serves as a preprocessing step for FASTQ files and performs deduplication and correction of reads by building consensus sequences from each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides and the distances between the UMIs and the actual sequences. We have tested the tool using different scenarios of UMI-tagged library data, with wide applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
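The consensus-building idea can be sketched in a few lines of Python. This is not UMIc's implementation (UMIc is written in R and also uses distances between UMIs and between sequences): the sketch only groups reads by exact UMI match and, at each position, keeps the base with the highest summed Phred quality. The function names, the Phred+33 encoding, and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict


def phred(quality_char, offset=33):
    """Decode one FASTQ quality character (Phred+33 encoding assumed)."""
    return ord(quality_char) - offset


def consensus_by_umi(reads):
    """Collapse reads sharing a UMI into one quality-weighted consensus.

    reads: iterable of (umi, sequence, quality_string).
    Simplification: UMIs are matched exactly, with no error-tolerant merging.
    """
    groups = defaultdict(list)
    for umi, sequence, quality in reads:
        groups[umi].append((sequence, quality))

    consensus = {}
    for umi, members in groups.items():
        length = min(len(sequence) for sequence, _ in members)
        bases = []
        for position in range(length):
            score = defaultdict(int)
            for sequence, quality in members:
                score[sequence[position]] += phred(quality[position])
            # sorted() makes ties deterministic (alphabetically first base wins).
            bases.append(max(sorted(score), key=score.get))
        consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AACG", "ACGT", "IIII"),
    ("AACG", "ACGA", "III#"),  # low-quality mismatch at the last base
    ("AACG", "ACGT", "IIII"),
    ("TTGC", "GGCC", "IIII"),
]
collapsed = consensus_by_umi(reads)
print(collapsed)
```

Weighting by quality rather than taking a plain majority vote lets a single high-confidence base outweigh several low-confidence ones, which matters when a UMI group contains only a handful of reads.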


2018 ◽  
Author(s):  
Denise Thiel ◽  
Nataša Djurdjevac Conrad ◽  
Ria X Peschutter ◽  
Heike Siebert ◽  
Annalisa Marsico

Abstract Background: Although several studies have provided insights into the role of long non-coding RNAs (lncRNAs), the majority of them have unknown function. Recent evidence has shown the importance of both lncRNAs and chromatin interactions in transcriptional regulation. Although network-based methods, mainly exploiting gene-lncRNA co-expression, have been applied to characterize lncRNAs of unknown function by means of ‘guilt-by-association’ strategies, no method exists which combines co-expression analysis with 3D chromatin interaction data. Results: To better understand the function of chromatin interactions in the context of lncRNA-mediated gene regulation, we have developed a multi-step graph analysis approach to examine the RNA polymerase II ChIA-PET chromatin interaction network in the K562 human cell line. We have annotated the network with gene and lncRNA coordinates, and chromatin states from the ENCODE project. We used centrality measures, as well as an adaptation of our previously developed Markov State Models (MSM) clustering method, to gain a better understanding of lncRNAs in transcriptional regulation. The novelty of our approach resides in the detection of fuzzy regulatory modules based on network properties and their optimization based on co-expression analysis between genes and gene-lncRNA pairs. This results in our method returning more bona fide regulatory modules than other state-of-the-art approaches for clustering on graphs. Conclusions: Interestingly, we find that lncRNA network hubs tend to be significantly enriched in disease association, positional conservation and enhancer-like functions. We validated regulatory functions for well-known lncRNAs, such as MALAT1 and the enhancer-like lncRNA FALEC. In addition, by investigating the modular structure of bigger components we show that we can propose regulatory functional mechanisms for uncharacterized lncRNAs, such as FLJ37453, RP11442N24 B.1 and LINC00910.


2019 ◽  
Author(s):  
Gustavo A. Ruiz Buendía ◽  
Marion Leleu ◽  
Flavia Marzetta ◽  
Ludovica Vanzan ◽  
Jennifer Y. Tan ◽  
...  

Abstract Expanded CAG/CTG repeats underlie thirteen neurological disorders, including myotonic dystrophy (DM1) and Huntington’s disease (HD). Upon expansion, CAG/CTG repeat loci acquire heterochromatic characteristics. This observation raises the hypothesis that repeat expansion provokes changes to higher order chromatin folding and thereby affects both gene expression in cis and the genetic instability of the repeat tract. Here we tested this hypothesis directly by performing 4C sequencing at the DMPK and HTT loci from DM1 and HD patient-derived cells. Surprisingly, chromatin contacts remain unchanged upon repeat expansion at both loci. This was true for loci with different DNA methylation levels and CTCF binding. Repeat sizes ranging from 15 to 1,700 displayed strikingly similar chromatin interaction profiles. Our findings argue that extensive changes in heterochromatic properties are not enough to alter chromatin folding at expanded CAG/CTG repeat loci. Moreover, the ectopic insertion of an expanded repeat tract did not change three-dimensional chromatin contacts. We conclude that expanded CAG/CTG repeats have little to no effect on chromatin conformation.


2019 ◽  
Author(s):  
Xu Zhang ◽  
Jing Niu ◽  
Guipeng Li ◽  
Qionghai Dai ◽  
Dayong Jin ◽  
...  

ABSTRACT There is increasing interest in understanding how the three-dimensional organization of the genome is regulated. Different strategies have been employed to identify chromatin interactions genome-wide. However, due to the current limitations in resolving genomic contacts, visualization and validation of these genomic loci with sub-kilobase resolution have remained a bottleneck for many years. Here, we describe Tn5 transposase-based Fluorescence in situ Hybridization (Tn5-FISH), a Polymerase Chain Reaction (PCR)-based, cost-effective imaging method that achieves the co-localization of genomic loci with sub-kilobase resolution, allowing genome architecture to be finely dissected and chromatin interactions detected by Chromosome Conformation Capture (3C)-derivative methods to be verified. In particular, Tn5-FISH is very useful for verifying short-range chromatin interactions inside contact domains and Topologically Associated Domains (TADs). It also offers a powerful molecular diagnostic tool for the clinical detection of cytogenetic changes in cancers.

