GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs

2017 ◽  
Author(s):  
Oana Ursu ◽  
Nathan Boley ◽  
Maryna Taranova ◽  
Y.X. Rachel Wang ◽  
Galip Gurkan Yardimci ◽  
...  

Abstract Motivation The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of 3D chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps are challenging due to the multi-scale, hierarchical structure of chromatin contacts and the resulting properties of experimental noise in the data. Measuring concordance of contact maps is important for assessing reproducibility of replicate experiments and for modeling variation between different cellular contexts. Results We introduce a concordance measure called GenomeDISCO (DIfferences between Smoothed COntact maps) for assessing the similarity of a pair of contact maps obtained from chromosome conformation capture experiments. The key idea is to smooth contact maps using random walks on the contact map graph, before estimating concordance. We use simulated datasets to benchmark GenomeDISCO’s sensitivity to different types of noise that affect chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. GenomeDISCO also generalizes to other chromosome conformation capture assays, such as HiChIP. Availability Software implementing GenomeDISCO is available at https://github.com/kundajelab/[email protected] information Supplementary data are available at Bioinformatics online.
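The smoothing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default of three random-walk steps, and the scaling of the L1 difference by the number of bins are all assumptions made for illustration. Each contact map is row-normalized into a transition matrix, raised to a power (one multiplication per random-walk step), and the two smoothed maps are then compared.

```python
def row_normalize(matrix):
    """Convert a contact matrix into a random-walk transition matrix."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def smooth(matrix, steps):
    """Smooth a contact map by taking `steps` random-walk steps on its graph."""
    transition = row_normalize(matrix)
    result = transition
    for _ in range(steps - 1):
        result = matmul(result, transition)
    return result


def concordance(map_a, map_b, steps=3):
    """Hypothetical score: 1 minus the per-bin L1 difference of smoothed maps.

    Dividing by the number of bins is an assumption of this sketch.
    """
    smooth_a, smooth_b = smooth(map_a, steps), smooth(map_b, steps)
    n = len(map_a)
    l1 = sum(abs(smooth_a[i][j] - smooth_b[i][j])
             for i in range(n) for j in range(n))
    return 1.0 - l1 / n


# Toy 3-bin contact maps (made-up counts, not real data).
replicate_1 = [[0, 5, 1], [5, 0, 4], [1, 4, 0]]
other_sample = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
score_same = concordance(replicate_1, replicate_1)
score_diff = concordance(replicate_1, other_sample)
```

Because every row of a smoothed map sums to one, the per-row L1 difference is at most two, so this toy score lies between -1 and 1, with 1 meaning identical smoothed maps.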

2018 ◽  
Vol 34 (16) ◽  
pp. 2701-2707 ◽  
Author(s):  
Oana Ursu ◽  
Nathan Boley ◽  
Maryna Taranova ◽  
Y X Rachel Wang ◽  
Galip Gurkan Yardimci ◽  
...  

2017 ◽  
Vol 20 (4) ◽  
pp. 1205-1214
Author(s):  
Jincheol Park ◽  
Shili Lin

Abstract How chromosomes fold and how distal genomic elements interact with one another at a genomic scale have been actively pursued in the past decade following the seminal work describing the Chromosome Conformation Capture (3C) assay. Essentially, 3C-based technologies produce two-dimensional (2D) contact maps that capture interactions between genomic fragments. Accordingly, a plethora of analytical methods have been proposed to take a 2D contact map as input to recapitulate the underlying whole genome three-dimensional (3D) structure of the chromatin. However, their performance in terms of several factors, including data resolution and ability to handle contact map features, has not been sufficiently evaluated. This task is taken up in this article, in which we consider several recent and/or well-regarded methods, both optimization-based and model-based, for their aptness in producing 3D structures using contact maps generated based on a population of cells. These methods are evaluated and compared using both simulated and real data. Several criteria have been used. For simulated data sets, the focus is on accurate recapitulation of the entire structure given the existence of the gold standard. For real data sets, comparison with distances measured by Fluorescence in situ Hybridization and consistency with several genomic features of known biological functions are examined.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Jonas Ibn-Salem ◽  
Miguel A. Andrade-Navarro

Abstract Background Knowledge of the three-dimensional structure of the genome is necessary to understand how gene expression is regulated. Recent experimental techniques such as Hi-C or ChIA-PET measure long-range chromatin interactions genome-wide but are experimentally elaborate and have limited resolution, and such data are only available for a limited number of cell types and tissues. Results While ChIP-seq was not designed to detect chromatin interactions, the formaldehyde treatment in the ChIP-seq protocol cross-links proteins with each other and with DNA. Consequently, regions that are not directly bound by the targeted TF but interact with the binding site via chromatin looping are also co-immunoprecipitated and sequenced. This produces minor ChIP-seq signals at loop anchor regions close to the directly bound site. We use the position and shape of ChIP-seq signals around CTCF motif pairs to predict whether they interact or not. We implemented this approach in a prediction method, termed Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs (7C). We applied 7C to all CTCF motif pairs within 1 Mb in the human genome and validated predicted interactions with high-resolution Hi-C and ChIA-PET. A single ChIP-seq experiment from known architectural proteins (CTCF, Rad21, Znf143) but also from other TFs (like TRIM22 or RUNX3) predicts loops accurately. Importantly, 7C predicts loops in cell types and for TF ChIP-seq datasets not used in training. Conclusion 7C predicts chromatin loops which can help to associate TF binding sites to regulated genes. Furthermore, profiling of hundreds of ChIP-seq datasets results in novel candidate factors functionally involved in chromatin looping. Our method is available as an R/Bioconductor package: http://bioconductor.org/packages/sevenC.
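The core signal described above, similarity of ChIP-seq coverage profiles around two CTCF motifs, can be sketched as a Pearson correlation between fixed-width windows. This is an illustrative sketch only and does not use the sevenC package: the function names, the window width, and the toy coverage vector are assumptions, and the abstract describes a trained predictor rather than a raw correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def window(coverage, center, flank):
    """Coverage values from `center - flank` to `center + flank`, inclusive."""
    return coverage[center - flank:center + flank + 1]


def motif_pair_score(coverage, pos_a, pos_b, flank):
    """Hypothetical score: correlation of coverage profiles around two motifs."""
    return pearson(window(coverage, pos_a, flank), window(coverage, pos_b, flank))


# Toy coverage (made-up values): a strong peak centred at index 5,
# a weaker peak of similar shape at index 15, and a flat region afterwards.
coverage = [0, 0, 1, 4, 9, 12, 9, 4, 1, 0, 0, 0,
            0, 1, 2, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
looping_like = motif_pair_score(coverage, 5, 15, flank=3)
background = motif_pair_score(coverage, 5, 23, flank=3)
```

In this toy case the weaker peak mirrors the shape of the directly bound one, so the pair scores close to 1, while pairing the bound site with a flat region scores 0.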


Genes ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 289 ◽  
Author(s):  
Ping Hong ◽  
Hao Jiang ◽  
Weize Xu ◽  
Da Lin ◽  
Qian Xu ◽  
...  

It is becoming increasingly important to understand how regulatory elements act on target genes over long genomic distances. 3C (chromosome conformation capture) and its derived methods are now widely applied to investigate three-dimensional (3D) genome organization and gene regulation. Digestion-ligation-only Hi-C (DLO Hi-C) is a new technology with high efficiency and cost-effectiveness for whole-genome chromosome conformation capture. Here, we introduce the DLO Hi-C tool, a flexible and versatile pipeline for processing DLO Hi-C data from raw sequencing reads to normalized contact maps and for providing quality controls for different steps. It includes more efficient iterative mapping and linker filtering. We applied the DLO Hi-C tool to different DLO Hi-C datasets and demonstrated its ability to process large data with multithreading. The DLO Hi-C tool is suitable for processing DLO Hi-C and in situ DLO Hi-C datasets. It is convenient and efficient for DLO Hi-C data processing.


2020 ◽  
Vol 36 (12) ◽  
pp. 3645-3651
Author(s):  
Lyam Baudry ◽  
Gaël A Millot ◽  
Agnes Thierry ◽  
Romain Koszul ◽  
Vittore F Scolari

Abstract Motivation Hi-C contact maps reflect the relative contact frequencies between pairs of genomic loci, quantified through deep sequencing. Differential analyses of these maps enable downstream biological interpretations. However, the multi-fractal nature of the chromatin polymer inside the cellular envelope results in contact frequency values spanning several orders of magnitude: contacts between loci pairs separated by large genomic distances are much sparser than closer pairs. The same is true for poorly covered regions, such as repeated sequences. Both distant and poorly covered regions translate into low signal-to-noise ratios. There is no clear consensus to address this limitation. Results We present Serpentine, a fast, flexible procedure operating on raw data, which considers the contacts in each region of a contact map. Binning is performed only when necessary on noisy regions, preserving informative ones. This results in high-quality, low-noise contact maps that can be conveniently visualized for rigorous comparative analyses. Availability and implementation Serpentine is available on the PyPI repository and https://github.com/koszullab/serpentine; documentation and tutorials are provided at https://serpentine.readthedocs.io/en/latest/. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Yusen Ye ◽  
Lin Gao ◽  
Shihua Zhang

Abstract The chromosome conformation capture (3C) technique and its variants have been employed to reveal the existence of a hierarchy of structures in three-dimensional (3D) chromosomal architecture, including compartments, topologically associating domains (TADs), sub-TADs and chromatin loops. However, existing methods for domain detection were only designed based on symmetric Hi-C maps, ignoring long-range interaction structures between domains. To this end, we proposed a generic and efficient method to identify multi-scale topological domains (MSTD), including cis- and trans-interacting regions, from a variety of 3D genomic datasets. We first applied MSTD to detect promoter-anchored interaction domains (PADs) from promoter capture Hi-C datasets across 17 primary blood cell types. The boundaries of PADs are significantly enriched with one or a combination of multiple epigenetic factors. Moreover, PADs between functionally similar cell types are significantly conserved in terms of domain regions and expression states. Cell type-specific PADs are involved in distinct cell type-specific activities and regulatory events through dynamic interactions within them. We also employed MSTD to define multi-scale domains from typical symmetric Hi-C datasets and illustrated its distinct superiority to state-of-the-art methods in terms of accuracy, flexibility and efficiency.


Author(s):  
Filomeno Sánchez Rodríguez ◽  
Shahram Mesdaghi ◽  
Adam J Simpkin ◽  
J Javier Burgos-Mármol ◽  
David L Murphy ◽  
...  

Abstract Summary Covariance-based predictions of residue contacts and inter-residue distances are an increasingly popular data type in protein bioinformatics. Here we present ConPlot, a web-based application for convenient display and analysis of contact maps and distograms. Integration of predicted contact data with other predictions is often required to facilitate inference of structural features. ConPlot can therefore use the empty space near the contact map diagonal to display multiple coloured tracks representing other sequence-based predictions. Popular file formats are natively read and bespoke data can also be flexibly displayed. This novel visualization will enable easier interpretation of predicted contact maps. Availability and implementation ConPlot is available online at www.conplot.org, along with documentation and examples. Alternatively, ConPlot can be installed and used locally using the docker image from the project’s Docker Hub repository. ConPlot is licensed under the BSD 3-Clause. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (17) ◽  
pp. 4560-4567
Author(s):  
Mikhail D Magnitov ◽  
Veronika S Kuznetsova ◽  
Sergey V Ulianov ◽  
Sergey V Razin ◽  
Alexander V Tyakht

Abstract Motivation The application of genome-wide chromosome conformation capture (3C) methods to prokaryotes provided insights into the spatial organization of their genomes and identified patterns conserved across the tree of life, such as chromatin compartments and contact domains. Prokaryotic genomes vary in GC content and the density of restriction sites along the chromosome, suggesting that these properties should be considered when planning experiments and choosing appropriate software for data processing. Diverse algorithms are available for the analysis of eukaryotic chromatin contact maps, but their potential application to prokaryotic data has not yet been evaluated. Results Here, we present a comparative analysis of domain calling algorithms using available single-microbe experimental data. We evaluated the algorithms’ intra-dataset reproducibility, concordance with other tools and sensitivity to coverage and resolution of contact maps. Using RNA-seq as an example, we showed how orthogonal biological data can be utilized to validate the reliability and significance of annotated domains. We also suggest that in silico simulations of contact maps can be used to choose optimal restriction enzymes and estimate theoretical map resolutions before the experiment. Our results provide guidelines for researchers investigating microbes and microbial communities using high-throughput 3C assays such as Hi-C and 3C-seq. Availability and implementation The code of the analysis is available at https://github.com/magnitov/prokaryotic_cids. Supplementary information Supplementary data are available at Bioinformatics online.
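The suggestion to simulate digestion before an experiment can be illustrated with a small in silico sketch. It is not taken from the authors' repository: the function name and toy sequence are hypothetical, the genome is treated as linear although many prokaryotic chromosomes are circular, and the cut is placed at the start of each recognition site as a simplification. GATC is used here as an example four-base recognition sequence.

```python
from statistics import median


def fragment_lengths(sequence, site):
    """Lengths of fragments produced by cutting `sequence` at each `site` start.

    Simplifications of this sketch: linear molecule, cut placed at the
    first base of the recognition site, single strand scanned.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (a site at position 0 would create one).
    return [end - begin for begin, end in zip(bounds, bounds[1:]) if end > begin]


# Toy sequence (made up) containing two GATC sites.
toy_genome = "AAGATCCCGATCTT"
lengths = fragment_lengths(toy_genome, "GATC")
typical_fragment = median(lengths)
```

Comparing the median fragment length across candidate enzymes gives a rough sense of the finest bin size a map could support, since bins much smaller than a typical fragment cannot be resolved.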


2020 ◽  
Author(s):  
Da-Inn Lee ◽  
Sushmita Roy

Abstract The three-dimensional (3D) organization of the genome plays a critical role in gene regulation for diverse normal and disease processes. High-throughput chromosome conformation capture (3C) assays, such as Hi-C, SPRITE, GAM, and HiChIP, have revealed higher-order organizational units such as topologically associating domains (TADs), which can shape the regulatory landscape governing downstream phenotypes. Analysis of high-throughput 3C data depends on the sequencing depth, which directly affects the resolution and the sparsity of the generated 3D contact count map. Identification of TADs remains a significant challenge due to the sensitivity of existing methods to resolution and sparsity. Here we present GRiNCH, a novel matrix-factorization-based approach for simultaneous TAD discovery and smoothing of contact count matrices from high-throughput 3C data. GRiNCH TADs are enriched in known architectural proteins and chromatin modification signals and are stable to the resolution and sparsity of the input data. GRiNCH smoothing improves the recovery of structure and significant interactions from low-depth datasets. Furthermore, enrichment analysis of 746 transcription factor motifs in GRiNCH TADs from developmental time-course and cell-line Hi-C datasets predicted transcription factors with potentially novel genome organization roles. GRiNCH is a broadly applicable tool for the analysis of high-throughput 3C datasets from a variety of platforms including SPRITE and HiChIP to understand 3D genome organization in diverse biological contexts.


2020 ◽  
Author(s):  
Jianwen Chen ◽  
Shuangjia Zheng ◽  
Huiying Zhao ◽  
Yuedong Yang

Abstract Motivation Protein solubility is significant in producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model is highly desired to accurately predict protein solubility from the amino acid sequence. Many methods have been developed, but they are mostly based on the one-dimensional embedding of amino acids, which is limited in capturing spatial structural information. Results In this study, we have developed a new structure-aware method, GraphSol, to predict protein solubility by an attentive graph convolutional network (GCN), where the protein topology attribute graph was constructed through predicted contact maps from the sequence. GraphSol was shown to substantially outperform other sequence-based methods. The model was proven to be stable by a consistent R² of 0.48 in both the cross-validation and independent test of the eSOL dataset. To the best of our knowledge, this is the first study to utilize the GCN for sequence-based predictions. More importantly, this architecture could be extended to other protein prediction tasks. Availability The package is available at http://[email protected] information Supplementary data are available at Bioinformatics online.

