Accurate loop calling for 3D genomic data with cLoops

2018 ◽  
Author(s):  
Yaqiang Cao ◽  
Xingwei Chen ◽  
Daosheng Ai ◽  
Zhaoxiong Chen ◽  
Guoyu Chen ◽  
...  

Abstract: Sequencing-based 3D genome mapping technologies can identify loops formed by interactions between regulatory elements hundreds of kilobases apart. Existing loop-calling tools are mostly restricted to a single data type, with accuracy dependent on a contact matrix of pre-defined resolution or on called peaks, and can have prohibitive hardware costs. Here we introduce cLoops (‘see loops’) to address these limitations. cLoops is based on the clustering algorithm cDBSCAN, which directly analyzes the paired-end tags (PETs) to find candidate loops, and uses a permuted local background to estimate statistical significance. These two data-type-independent processes enable loops to be reliably identified for both sharp and broad peak data, including but not limited to ChIA-PET, Hi-C, HiChIP and Trac-looping data. Loops identified by cLoops showed much less distance-dependent bias and higher enrichment relative to local regions than those from existing tools. Altogether, cLoops improves the accuracy of detecting 3D genomic loops from sequencing data; it is versatile, flexible and efficient, has modest hardware requirements, and is freely available at: https://github.com/YaqiangCao/cLoops.

Author(s):  
Yaqiang Cao ◽  
Zhaoxiong Chen ◽  
Xingwei Chen ◽  
Daosheng Ai ◽  
Guoyu Chen ◽  
...  

Abstract

Motivation: Sequencing-based 3D genome mapping technologies can identify loops formed by interactions between regulatory elements hundreds of kilobases apart. Existing loop-calling tools are mostly restricted to a single data type, with accuracy dependent on a contact matrix of predefined resolution or on called peaks, and can have prohibitive hardware costs.

Results: Here, we introduce cLoops (‘see loops’) to address these limitations. cLoops is based on the clustering algorithm cDBSCAN, which directly analyzes the paired-end tags (PETs) to find candidate loops, and uses a permuted local background to estimate statistical significance. These two data-type-independent processes enable loops to be reliably identified for both sharp and broad peak data, including but not limited to ChIA-PET, Hi-C, HiChIP and Trac-looping data. Loops identified by cLoops showed much less distance-dependent bias and higher enrichment relative to local regions than those from existing tools. Altogether, cLoops improves the accuracy of detecting 3D genomic loops from sequencing data, and is versatile, flexible and efficient, with modest hardware requirements.

Availability and implementation: cLoops with documentation and example data are freely available at: https://github.com/YaqiangCao/cLoops.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
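To make the idea of clustering PETs directly more concrete, the following is a minimal sketch of plain DBSCAN applied to PETs treated as 2D points (left-end position, right-end position). It is not the cDBSCAN implementation from cLoops: the Manhattan distance, the brute-force neighbour search, the function names and the `eps`/`min_pts` values are all assumptions chosen for illustration.

```python
from collections import deque


def manhattan(p, q):
    """City-block distance between two PETs given as (left, right) positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise.

    Hypothetical helper for illustration only; uses an O(n^2) neighbour
    search, so it is suitable for toy inputs, not real PET files.
    """
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if manhattan(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nb)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point, keep expanding
    return labels


# Toy PETs (made-up coordinates): four close together, one isolated.
pets = [(1000, 50000), (1020, 50010), (990, 49980), (1010, 50030),
        (200000, 900000)]
labels = dbscan(pets, eps=100, min_pts=3)
```

Each resulting cluster of PETs would be a candidate loop; the significance test against a permuted local background described in the abstract is a separate step that is not sketched here.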


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Minji Kim ◽  
Meizhen Zheng ◽  
Simon Zhongyuan Tian ◽  
Byoungkoo Lee ◽  
Jeffrey H. Chuang ◽  
...  

Abstract: Single-molecule multiplex chromatin interaction data are generated by emerging 3D genome mapping technologies such as GAM, SPRITE, and ChIA-Drop. These datasets provide insights into high-dimensional chromatin organization, yet introduce new computational challenges. To address these challenges, we developed MIA-Sig, an algorithmic solution based on signal processing and information theory. We demonstrate its ability to de-noise the multiplex data, assess the statistical significance of chromatin complexes, and identify topological domains and frequent inter-domain contacts. On chromatin immunoprecipitation (ChIP)-enriched data, MIA-Sig can clearly distinguish the protein-associated interactions from the non-specific topological domains. Together, MIA-Sig represents a novel algorithmic framework for multiplex chromatin interaction analysis.


2019 ◽  
Author(s):  
Minji Kim ◽  
Meizhen Zheng ◽  
Simon Zhongyuan Tian ◽  
Daniel Capurso ◽  
Byoungkoo Lee ◽  
...  

Abstract: The single-molecule multiplex chromatin interaction data generated by emerging non-ligation-based 3D genome mapping technologies provide novel insights into high-dimensional chromatin organization, yet introduce new computational challenges. We developed MIA-Sig (https://github.com/TheJacksonLaboratory/mia-sig.git), an algorithmic framework to de-noise the data, assess the statistical significance of chromatin complexes, and identify topological domains and inter-domain contacts. On chromatin immunoprecipitation (ChIP)-enriched data, MIA-Sig can clearly distinguish the protein-associated interactions from the non-specific topological domains.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Karolina Stępniak ◽  
Magdalena A. Machnicka ◽  
Jakub Mieczkowski ◽  
Anna Macioszek ◽  
Bartosz Wojtaś ◽  
...  

Abstract: Chromatin structure and accessibility, together with combinatorial binding of transcription factors to regulatory elements in genomic DNA, control transcription. Genetic variations in genes encoding histones, epigenetics-related enzymes or modifiers affect chromatin structure and dynamics and result in alterations in gene expression that contribute to cancer development or progression. Gliomas are brain tumors frequently associated with epigenetics-related gene deregulation. We perform whole-genome mapping of chromatin accessibility, histone modifications and DNA methylation patterns, together with transcriptome analysis, simultaneously in multiple tumor samples to unravel epigenetic dysfunctions driving gliomagenesis. Based on the results of the integrative analysis of the acquired profiles, we create an atlas of active enhancers and promoters in benign and malignant gliomas. We explore these elements and intersect them with Hi-C data to uncover molecular mechanisms instructing gene expression in gliomas.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xinjun Li ◽  
Kristina Sundquist ◽  
Jan Sundquist ◽  
Asta Försti ◽  
Kari Hemminki

Abstract: Childhood acute lymphoblastic leukemia (ALL) has an origin in the fetal period, which may distinguish it from ALL diagnosed later in life. We wanted to test whether familial risks differ between ALL diagnosed in very early childhood and ALL diagnosed later. The Swedish nation-wide family-cancer data were used until year 2016 to calculate standardized incidence ratios (SIRs) for familial risks in ALL in three diagnostic age-groups: 0–4, 5–34 and 35+ years. Among 1335 ALL patients diagnosed before age 5, familial risks were increased for esophageal (4.78), breast (1.42), prostate (1.40) and connective tissue (2.97) cancers and leukemia (2.51, ALL 7.81). In age-group 5–34 years, rectal (1.73) and endometrial (2.40) cancer, myeloma (2.25) and leukemia (2.00, ALL 4.60) reached statistical significance. In the oldest age-group, the only association was with Hodgkin lymphoma (3.42). For breast, prostate and rectal cancers, diagnostic ages of family members of ALL patients were significantly lower than the diagnostic ages for these cancers in the population. The patterns of increased familial cancers suggest that BRCA2 mutations could contribute to associations of ALL with breast and prostate cancers, and mutations in the mismatch repair gene PMS2 to associations with rectal and endometrial cancers. Future DNA sequencing data will be a test for these familial predictions.
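For readers unfamiliar with the measure, an SIR is the ratio of observed cases to the number expected if the study group had the same stratum-specific rates as a reference population. The sketch below illustrates that arithmetic only; the function names and all numbers are hypothetical and are not taken from the study.

```python
def expected_cases(person_years, reference_rates):
    """Expected case count: sum over strata of person-years x reference rate."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def standardized_incidence_ratio(observed, expected):
    """SIR = observed / expected; values above 1 indicate excess risk."""
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    return observed / expected


# Made-up example: two age strata in the group with affected relatives.
person_years = [10_000, 20_000]       # hypothetical follow-up per stratum
reference_rates = [0.0001, 0.0002]    # hypothetical population rates per person-year
expected = expected_cases(person_years, reference_rates)
sir = standardized_incidence_ratio(observed=10, expected=expected)
```

With these invented inputs the expected count is about 5 cases, so 10 observed cases give an SIR of about 2. A real analysis would also report a confidence interval, which is omitted here.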


2021 ◽  
Author(s):  
Diana Makai ◽  
Andras Cseh ◽  
Adel Sepsi ◽  
Szabolcs Makai

Spatial organisation of the genome has a fundamental effect on its biological functions. Chromatin-chromatin interactions and 3D spatial structures are involved in transcriptional regulation and have a decisive role in DNA replication and repair. To understand how individual genes and their regulatory elements function within the larger genomic context, and how the genome reacts as a whole to environmental stimuli, the linear sequence information needs to be interpreted in 3-dimensional space. While recent advances in chromatin conformation capture technologies, including Hi-C, have considerably advanced our understanding of genomes, defining the DNA as it is organized in the cell nucleus is still a challenging task. 3D genome modelling needs to reflect the DNA as a flexible polymer, which can wind up to a fraction of its total length and greatly unwind and stretch to implement a multitude of functions. Here we propose a novel approach to model genomes as a multigraph based on Hi-C contact data. Multigraph-based 3D genome modelling of barley and rice revealed the well-known Rabl and Rosetta chromatin organizations, respectively, as well as other higher-order structures. Our results show that the well-established toolset of graph theory is highly valuable in modelling large genomes in 3D.
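As a rough illustration of what representing Hi-C contacts as a multigraph can look like, the sketch below treats fixed-size genomic bins as nodes and stores each observed contact as one parallel edge, recorded as an edge multiplicity. The binning scheme, the function name and the toy contacts are assumptions for illustration and do not reproduce the authors' method.

```python
from collections import Counter


def build_multigraph(contacts, bin_size):
    """Map Hi-C contacts to a multigraph over genomic bins.

    contacts: iterable of ((chrom_a, pos_a), (chrom_b, pos_b)) pairs.
    Returns a Counter mapping an undirected edge (u, v) to the number of
    parallel edges, where each node is a (chromosome, bin index) tuple.
    """
    edges = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in contacts:
        u = (chrom_a, pos_a // bin_size)
        v = (chrom_b, pos_b // bin_size)
        if u == v:
            continue  # ignore contacts that fall within a single bin
        edges[tuple(sorted((u, v)))] += 1  # undirected: store in sorted order
    return edges


# Toy contacts with made-up coordinates.
contacts = [
    (("chr1", 1_500), ("chr1", 250_000)),
    (("chr1", 260_000), ("chr1", 2_000)),   # same bin pair, reversed order
    (("chr1", 3_000), ("chr2", 90_000)),
    (("chr1", 4_000), ("chr1", 5_000)),     # same bin, skipped
]
graph = build_multigraph(contacts, bin_size=100_000)
```

Keeping parallel edges as counts preserves contact frequency, which a simple graph would discard; standard graph algorithms can then be run on the weighted or expanded form.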


2021 ◽  
Author(s):  
Fei Ge ◽  
Jingtao Qu ◽  
Peng Liu ◽  
Lang Pan ◽  
Chaoying Zou ◽  
...  

Until now, little has been known about the mechanism underlying the genotype-dependence of embryonic callus (EC) induction, which has severely inhibited the development of maize genetic engineering. Here, we report the genome sequence and annotation of a maize inbred line with a high EC induction ratio, A188, assembled from single-molecule sequencing and optical genome mapping. We assembled a genome of 2,210 million bases (Mb) with a scaffold N50 size of 11.61 Mb, compared to 9.73 Mb for B73 and 10.2 Mb for Mo17. Comparative analysis revealed that ~30% of the predicted A188 genes had large structural variations relative to the B73, Mo17 and W22 genomes, which caused considerable protein divergence and might lead to phenotypic variations between the four inbred lines. Combining our new A188 genome, previously reported QTLs and RNA sequencing data, we reveal 8 genes with large structural variations and 4 differentially expressed genes playing potential roles in EC induction.
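The scaffold N50 quoted above is the length L such that scaffolds of length at least L together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name and the scaffold lengths are invented for illustration and are unrelated to the A188 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Made-up scaffold lengths (arbitrary units); total is 290, half is 145.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)  # 80 + 70 = 150 >= 145, so N50 is 70
```

A larger N50 indicates that more of the assembly sits in long scaffolds, which is why it is used to compare assembly contiguity between lines.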


Methods ◽  
2018 ◽  
Vol 142 ◽  
pp. 1-2 ◽  
Author(s):  
Josée Dostie ◽  
Mathieu Blanchette

2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Andrew Currin ◽  
Neil Swainston ◽  
Mark S Dunstan ◽  
Adrian J Jervis ◽  
Paul Mulherin ◽  
...  

Abstract: Synthetic biology utilizes the Design–Build–Test–Learn pipeline for the engineering of biological systems. Typically, this requires the construction of specifically designed, large and complex DNA assemblies. The availability of cheap DNA synthesis and automation enables high-throughput assembly approaches, which generate a heavy demand for DNA sequencing to verify correctly assembled constructs. Next-generation sequencing is ideally positioned to perform this task; however, because of expensive hardware costs and bespoke data analysis requirements, few laboratories utilize this technology in-house. Here a workflow for highly multiplexed sequencing is presented, capable of fast and accurate sequence verification of DNA assemblies using nanopore technology. A novel sample barcoding system using polymerase chain reaction is introduced, and sequencing data are analyzed through a bespoke analysis algorithm. Crucially, this algorithm overcomes the problem of high-error-rate nanopore data (which typically prevents identification of single nucleotide variants) through statistical analysis of strand bias, permitting accurate sequence analysis with single-base resolution. As an example, 576 constructs (6 × 96-well plates) were processed in a single workflow in 72 h (from Escherichia coli colonies to analyzed data). Given our procedure’s low hardware costs and highly multiplexed capability, this provides cost-effective access to powerful DNA sequencing for any laboratory, with applications beyond synthetic biology including directed evolution, single nucleotide polymorphism analysis and gene synthesis.

