A sequence-based deep learning approach to predict CTCF-mediated chromatin loop

Author(s):  
Hao Lv ◽  
Fu-Ying Dao ◽  
Hasan Zulfiqar ◽  
Wei Su ◽  
Hui Ding ◽  
...  

Abstract The three-dimensional (3D) architecture of chromosomes is of crucial importance for transcription regulation and DNA replication. Various high-throughput chromosome conformation capture-based methods have revealed that CTCF-mediated chromatin loops are a major component of 3D architecture. However, CTCF-mediated chromatin loops are cell type specific, and most chromatin interaction capture techniques are time-consuming and labor-intensive, which restricts their use across a very large number of cell types. Genomic sequence-based computational models are sophisticated enough to capture important features of chromatin architecture and help to identify chromatin loops. In this work, we develop Deep-loop, a convolutional neural network model that integrates k-tuple nucleotide frequency component, nucleotide pair spectrum encoding, position conservation, position scoring function and natural vector features for the prediction of chromatin loops. In a series of cross-validation tests, Deep-loop shows excellent performance in identifying chromatin loops from different cell types. The source code of Deep-loop is freely available at the repository https://github.com/linDing-group/Deep-loop.
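To make the first of the listed features concrete, the following is a minimal sketch of a k-tuple nucleotide frequency encoding. It is not taken from the Deep-loop repository: the function name `ktuple_frequencies`, the choice to skip windows containing non-ACGT symbols, and the normalization by the number of valid windows are all assumptions made for illustration.

```python
from itertools import product


def ktuple_frequencies(seq, k=2):
    """Return the normalized frequency of every k-tuple over A/C/G/T.

    Hypothetical helper for illustration; the ordering of the output
    follows itertools.product("ACGT", repeat=k), i.e. AA, AC, AG, ...
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with N or other symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

For k = 2 this yields a 16-dimensional vector per sequence; a vector of this kind could be concatenated with the other encodings before being passed to a classifier.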

2018 ◽  
Author(s):  
Yifeng Qi ◽  
Bin Zhang

Abstract We introduce a computational model to simulate chromatin structure and dynamics. Starting from one-dimensional genomics and epigenomics data that are available for hundreds of cell types, this model enables de novo prediction of chromatin structures at five-kilo-base resolution. Simulated chromatin structures recapitulate known features of genome organization, including the formation of chromatin loops, topologically associating domains (TADs) and compartments, and are in quantitative agreement with chromosome conformation capture experiments and super-resolution microscopy measurements. Detailed characterization of the predicted structural ensemble reveals the dynamical flexibility of chromatin loops and the presence of cross-talk among neighboring TADs. Analysis of the model’s energy function uncovers distinct mechanisms for chromatin folding at various length scales.


2016 ◽  
Vol 2 (2) ◽  
pp. e1500882 ◽  
Author(s):  
Steven W. Criscione ◽  
Marco De Cecco ◽  
Benjamin Siranosian ◽  
Yue Zhang ◽  
Jill A. Kreiling ◽  
...  

Replicative cellular senescence is a fundamental biological process characterized by an irreversible arrest of proliferation. Senescent cells accumulate a variety of epigenetic changes, but the three-dimensional (3D) organization of their chromatin is not known. We applied a combination of whole-genome chromosome conformation capture (Hi-C), fluorescence in situ hybridization, and in silico modeling methods to characterize the 3D architecture of interphase chromosomes in proliferating, quiescent, and senescent cells. Although the overall organization of the chromatin into active (A) and repressive (B) compartments and topologically associated domains (TADs) is conserved between the three conditions, a subset of TADs switches between compartments. On a global level, the Hi-C interaction matrices of senescent cells are characterized by a relative loss of long-range and gain of short-range interactions within chromosomes. Direct measurements of distances between genetic loci, chromosome volumes, and chromatin accessibility suggest that the Hi-C interaction changes are caused by a significant reduction of the volumes occupied by individual chromosome arms. In contrast, centromeres oppose this overall compaction trend and increase in volume. The structural model arising from our study provides a unique high-resolution view of the complex chromosomal architecture in senescent cells.


2018 ◽  
Author(s):  
Yusen Ye ◽  
Lin Gao ◽  
Shihua Zhang

Abstract The chromosome conformation capture (3C) technique and its variants have been employed to reveal the existence of a hierarchy of structures in three-dimensional (3D) chromosomal architecture, including compartments, topologically associating domains (TADs), sub-TADs and chromatin loops. However, existing methods for domain detection were designed only for symmetric Hi-C maps, ignoring long-range interaction structures between domains. To this end, we proposed a generic and efficient method to identify multi-scale topological domains (MSTD), including cis- and trans-interacting regions, from a variety of 3D genomic datasets. We first applied MSTD to detect promoter-anchored interaction domains (PADs) from promoter capture Hi-C datasets across 17 primary blood cell types. The boundaries of PADs are significantly enriched with one or a combination of multiple epigenetic factors. Moreover, PADs between functionally similar cell types are significantly conserved in terms of domain regions and expression states. Cell type-specific PADs are involved in distinct cell type-specific activities and regulatory events through dynamic interactions within them. We also employed MSTD to define multi-scale domains from typical symmetric Hi-C datasets and illustrated its superiority to state-of-the-art methods in terms of accuracy, flexibility and efficiency.


2020 ◽  
Author(s):  
Yihang Shen ◽  
Carl Kingsford

Abstract Three-dimensional chromosomal structure plays an important role in gene regulation. Chromosome conformation capture techniques, especially the high-throughput, sequencing-based technique Hi-C, provide new insights into the spatial architecture of chromosomes. However, Hi-C data contains artifacts and systemic biases that substantially influence subsequent analysis. Computational models have been developed to address these biases explicitly; however, it is difficult to enumerate and eliminate all the biases in such models. Other models are designed to correct biases implicitly, but they are also invalid in some situations, such as in the presence of copy number variations. We characterize a new kind of artifact in Hi-C data. We find that this artifact is caused by incorrect alignment of Hi-C reads against approximate repeat regions and can lead to erroneous chromatin contact signals. The artifact cannot be corrected by current Hi-C correction methods. We design a probabilistic method and develop a new Hi-C processing pipeline by integrating our probabilistic method with the HiC-Pro pipeline. We find that the new pipeline can remove this new artifact effectively, while preserving important features of the original Hi-C matrices.


2016 ◽  
Vol 9 (1) ◽  
Author(s):  
Tobias A. Knoch ◽  
Malte Wachsmuth ◽  
Nick Kepper ◽  
Michael Lesnussa ◽  
Anis Abuseiris ◽  
...  

Abstract Background: The dynamic three-dimensional chromatin architecture of genomes and its co-evolutionary connection to its function (the storage, expression, and replication of genetic information) is still one of the central issues in biology. Here, we describe the much debated 3D architecture of the human and mouse genomes from the nucleosomal to the megabase pair level by a novel approach combining selective high-throughput high-resolution chromosomal interaction capture (T2C), polymer simulations, and scaling analysis of the 3D architecture and the DNA sequence. Results: The genome is compacted into a chromatin quasi-fibre with ~5 ± 1 nucleosomes/11 nm, folded into stable ~30–100 kbp loops forming stable loop aggregates/rosettes connected by similar sized linkers. Minor but significant variations in the architecture are seen between cell types and functional states. The architecture and the DNA sequence show very similar fine-structured multi-scaling behaviour confirming their co-evolution and the above. Conclusions: This architecture, its dynamics, and accessibility, balance stability and flexibility ensuring genome integrity and variation enabling gene expression/regulation by self-organization of (in)active units already in proximity. Our results agree with the heuristics of the field and allow “architectural sequencing” at a genome mechanics level to understand the inseparable systems genomic properties.


2017 ◽  
Author(s):  
Oana Ursu ◽  
Nathan Boley ◽  
Maryna Taranova ◽  
Y.X. Rachel Wang ◽  
Galip Gurkan Yardimci ◽  
...  

Abstract Motivation: The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of 3D chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps is challenging due to the multi-scale, hierarchical structure of chromatin contacts and the resulting properties of experimental noise in the data. Measuring concordance of contact maps is important for assessing reproducibility of replicate experiments and for modeling variation between different cellular contexts. Results: We introduce a concordance measure called GenomeDISCO (DIfferences between Smoothed COntact maps) for assessing the similarity of a pair of contact maps obtained from chromosome conformation capture experiments. The key idea is to smooth contact maps using random walks on the contact map graph, before estimating concordance. We use simulated datasets to benchmark GenomeDISCO’s sensitivity to different types of noise that affect chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. GenomeDISCO also generalizes to other chromosome conformation capture assays, such as HiChIP. Availability: Software implementing GenomeDISCO is available at https://github.com/kundajelab/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
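The smoothing idea described in this abstract can be illustrated with a small, dependency-free sketch. This is not the GenomeDISCO implementation: the function names, the default of three random-walk steps, and the scoring rule (one minus the L1 difference of the smoothed maps divided by the number of bins) are simplifying assumptions chosen for illustration only.

```python
def row_normalize(matrix):
    """Turn a contact matrix into a random-walk transition matrix."""
    out = []
    for row in matrix:
        s = sum(row)
        # Rows without contacts stay all-zero instead of dividing by zero.
        out.append([v / s if s else 0.0 for v in row])
    return out


def matmul(a, b):
    """Plain-Python product of two dense matrices given as lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def smooth(contact_map, steps=3):
    """Smooth a contact map by taking `steps` random-walk steps on its graph."""
    transition = row_normalize(contact_map)
    result = transition
    for _ in range(steps - 1):
        result = matmul(result, transition)
    return result


def concordance(map_a, map_b, steps=3):
    """Hypothetical score: 1 minus the per-bin L1 difference of smoothed maps."""
    sa, sb = smooth(map_a, steps), smooth(map_b, steps)
    n = len(sa)
    l1 = sum(abs(sa[i][j] - sb[i][j]) for i in range(n) for j in range(n))
    return 1.0 - l1 / n
```

Because each row of a smoothed map sums to at most one, the per-row L1 difference is at most two, so this sketch returns 1.0 for identical maps and never falls below -1.0. Larger values of `steps` spread each contact over a wider neighbourhood before the comparison is made.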


2016 ◽  
Author(s):  
Tobias A. Knoch ◽  
Malte Wachsmuth ◽  
Nick Kepper ◽  
Michael Lesnussa ◽  
Anis Abuseiris ◽  
...  

Abstract The dynamic three-dimensional chromatin architecture of genomes and its co-evolutionary connection to its function – the storage, expression, and replication of genetic information – is still one of the central issues in biology. Here, we describe the much debated 3D-architecture of the human and mouse genomes from the nucleosomal to the megabase pair level by a novel approach combining selective high-throughput high-resolution chromosomal interaction capture (T2C), polymer simulations, and scaling analysis of the 3D-architecture and the DNA sequence: The genome is compacted into a chromatin quasi-fibre with ∼5±1 nucleosomes/11nm, folded into stable ∼30-100 kbp loops forming stable loop aggregates/rosettes connected by similar sized linkers. Minor but significant variations in the architecture are seen between cell types/functional states. The architecture and the DNA sequence show very similar fine-structured multi-scaling behaviour confirming their co-evolution and the above. This architecture, its dynamics, and accessibility balance stability and flexibility ensuring genome integrity and variation enabling gene expression/regulation by self-organization of (in)active units already in proximity. Our results agree with the heuristics of the field and allow “architectural sequencing” at a genome mechanics level to understand the inseparable systems genomic properties.


2016 ◽  
Author(s):  
Hui Zhang ◽  
Feifei Li ◽  
Yan Jia ◽  
Bingxiang Xu ◽  
Yiqun Zhang ◽  
...  

Abstract High-throughput chromosome conformation capture technologies, such as Hi-C, have made it possible to survey 3D genome structure. However, the ability to obtain 3D profiles at kilobase resolution at low cost remains a major challenge. Therefore, we herein report a computational method to precisely identify chromatin interaction sites at kilobase resolution from MNase-seq data, termed chromatin interaction site detector (CISD), and a CISD-based chromatin loop predictor (CISD_loop) that predicts chromatin-chromatin interaction (CCI) from low-resolution Hi-C data. The methods are built on the hypothesis that CCIs result in a characteristic nucleosome arrangement pattern flanking the interaction sites. Accordingly, we show that the predictions of CISD and CISD_loop overlap closely with chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) anchors and loops, respectively. Moreover, the methods trained in one cell type can be applied to other cell types with high accuracy. The validity of the methods was further supported by chromosome conformation capture (3C) experiments at 5kb resolution. Finally, we demonstrate that only modest amounts of MNase-seq and Hi-C data are sufficient to achieve an ultrahigh-resolution CCI map. The predictive power of CISD/CISD_loop supports the hypothesis that CCIs induce local nucleosome rearrangement and that the pattern may serve as a probe for 3D dynamics of the genome. Thus, our method will facilitate precise and systematic investigations of the interactions between distal regulatory elements on a larger scale than has hitherto been possible.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Emre Sefer

Abstract Chromosome conformation capture experiments such as Hi–C map the three-dimensional spatial organization of genomes on a genome-wide scale. Even though Hi–C interactions are not biased towards any of the histone modifications, previous analysis has revealed denser interactions around many histone modifications. Nevertheless, the simultaneous effects of these modifications on the Hi–C interaction graph have not been fully characterized yet, limiting our understanding of genome shape. Here, we propose ChromatinCoverage and its extension TemporalPrizeCoverage, methods to decompose the Hi–C interaction graph in terms of known histone modifications. Both methods are based on set multicover with pairs, where we attempt to cover each Hi–C interaction with histone modification pairs. We find 4 histone modifications, H3K4me1, H3K4me3, H3K9me3 and H3K27ac, to be significantly predictive of most Hi–C interactions across species, cell types and cell cycles. The proposed methods are quite effective in predicting Hi–C interactions and topologically-associated domains in one species when trained on another species or cell type. Overall, our findings reveal the impact of a subset of histone modifications on chromatin shape via the Hi–C interaction graph.


2018 ◽  
Author(s):  
Ruochi Zhang ◽  
Yuchuan Wang ◽  
Yang Yang ◽  
Yang Zhang ◽  
Jian Ma

Abstract The three-dimensional organization of chromosomes within the cell nucleus is highly regulated. It is known that CTCF is an important architectural protein that mediates long-range chromatin loops. Recent studies have shown that the majority of CTCF binding motif pairs at chromatin loop anchor regions are in convergent orientation. However, it remains unknown whether the genomic context at the sequence level can determine if a convergent CTCF motif pair is able to form a chromatin loop. In this paper, we directly ask whether and what sequence-based features (other than the motif itself) may be important to establish CTCF-mediated chromatin loops. We found that motif conservation measured by “branch-of-origin”, which accounts for motif turn-over in evolution, is an important feature. We developed a new machine learning algorithm called CTCF-MP based on word2vec to demonstrate that sequence-based features alone have the capability to predict if a pair of convergent CTCF motifs would form a loop. Together with functional genomic signals from CTCF ChIP-seq and DNase-seq, CTCF-MP is able to make highly accurate predictions on whether a convergent CTCF motif pair would form a loop in a single cell type and also across different cell types. Our work represents an important step toward understanding the sequence determinants that may guide the formation of complex chromatin architectures.
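As a small illustration of what "convergent orientation" means for a motif pair, the sketch below checks whether two motifs on the same chromosome point toward each other. It is not part of CTCF-MP: the `Motif` record, its field names, and the convention that "+" denotes the forward strand are assumptions made for this example.

```python
from typing import NamedTuple


class Motif(NamedTuple):
    """Hypothetical minimal record for a CTCF motif occurrence."""
    chrom: str
    start: int
    strand: str  # "+" for forward strand, "-" for reverse strand


def is_convergent(a, b):
    """Return True if the two motifs face each other.

    A pair is treated as convergent when the upstream motif lies on the
    forward strand and the downstream motif on the reverse strand.
    """
    if a.chrom != b.chrom:
        return False
    # Order the pair by genomic coordinate so argument order does not matter.
    upstream, downstream = sorted((a, b), key=lambda m: m.start)
    return upstream.strand == "+" and downstream.strand == "-"
```

A filter of this kind would only select candidate pairs; whether a convergent pair actually forms a loop is the prediction problem the abstract addresses.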

