Accurate loop calling for 3D genomic data with cLoops

bioRxiv ◽

10.1101/465849 ◽

2018 ◽

Cited By ~ 4

Author(s):

Yaqiang Cao ◽

Xingwei Chen ◽

Daosheng Ai ◽

Zhaoxiong Chen ◽

Guoyu Chen ◽

...

Keyword(s):

Clustering Algorithm ◽

Genome Mapping ◽

Statistical Significance ◽

Regulatory Elements ◽

Data Type ◽

Sequencing Data ◽

3D Genome ◽

Local Background ◽

Single Data ◽

Hardware Costs

Abstract Sequencing-based 3D genome mapping technologies can identify loops formed by interactions between regulatory elements hundreds of kilobases apart. Existing loop-calling tools are mostly restricted to a single data type, with accuracy dependent on a contact matrix of predefined resolution or on called peaks, and can have prohibitive hardware costs. Here we introduce cLoops (‘see loops’) to address these limitations. cLoops is based on the clustering algorithm cDBSCAN, which directly analyzes the paired-end tags (PETs) to find candidate loops, and uses a permuted local background to estimate statistical significance. These two data-type-independent processes enable loops to be reliably identified for both sharp and broad peak data, including but not limited to ChIA-PET, Hi-C, HiChIP and Trac-looping data. Loops identified by cLoops showed much less distance-dependent bias and higher enrichment relative to local regions than those identified by existing tools. Altogether, cLoops improves the accuracy of detecting 3D genomic loops from sequencing data, is versatile, flexible and efficient, has modest hardware requirements, and is freely available at: https://github.com/YaqiangCao/cLoops.
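
The idea of clustering PETs directly, rather than binning them into a contact matrix first, can be illustrated with a minimal sketch. This is not the cDBSCAN implementation from cLoops: it is a plain quadratic-time DBSCAN, and the Manhattan distance, the `eps` and `min_pts` values, and the function names `dbscan` and `candidate_loops` are all assumptions made for illustration. Each PET is treated as a 2D point (left-end coordinate, right-end coordinate); the permuted-background significance test is omitted.

```python
from collections import deque


def manhattan(p, q):
    """Manhattan distance between two PETs given as (left, right) coordinates."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1

    def neighbors(i):
        # A point counts as its own neighbor, as in standard DBSCAN.
        return [j for j, q in enumerate(points) if manhattan(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_nbrs)
    return labels


def candidate_loops(points, labels):
    """Summarize each cluster as a pair of anchor intervals plus its PET count."""
    grouped = {}
    for point, label in zip(points, labels):
        if label != -1:
            grouped.setdefault(label, []).append(point)
    loops = {}
    for label, members in grouped.items():
        lefts = [m[0] for m in members]
        rights = [m[1] for m in members]
        loops[label] = {
            "left": (min(lefts), max(lefts)),
            "right": (min(rights), max(rights)),
            "pets": len(members),
        }
    return loops


# Hypothetical PETs: two dense groups and one isolated pair of ends.
pets = [
    (1000, 50000), (1010, 50020), (990, 49980), (1005, 50010),
    (20000, 90000), (20010, 90015), (19995, 89990),
    (500000, 900000),
]
labels = dbscan(pets, eps=100, min_pts=3)
print(candidate_loops(pets, labels))
```

Because the clusters are found in coordinate space, no resolution has to be chosen up front; the anchor intervals fall out of the extent of each cluster.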

Accurate loop calling for 3D genomic data with cLoops

Bioinformatics ◽

10.1093/bioinformatics/btz651 ◽

2019 ◽

Cited By ~ 4

Author(s):

Yaqiang Cao ◽

Zhaoxiong Chen ◽

Xingwei Chen ◽

Daosheng Ai ◽

Guoyu Chen ◽

...

Keyword(s):

Clustering Algorithm ◽

Genome Mapping ◽

Statistical Significance ◽

Regulatory Elements ◽

Data Type ◽

Supplementary Information ◽

Sequencing Data ◽

3D Genome ◽

Local Background ◽

Hardware Costs

Abstract Motivation: Sequencing-based 3D genome mapping technologies can identify loops formed by interactions between regulatory elements hundreds of kilobases apart. Existing loop-calling tools are mostly restricted to a single data type, with accuracy dependent on a contact matrix of predefined resolution or on called peaks, and can have prohibitive hardware costs. Results: Here, we introduce cLoops (‘see loops’) to address these limitations. cLoops is based on the clustering algorithm cDBSCAN, which directly analyzes the paired-end tags (PETs) to find candidate loops, and uses a permuted local background to estimate statistical significance. These two data-type-independent processes enable loops to be reliably identified for both sharp and broad peak data, including but not limited to ChIA-PET, Hi-C, HiChIP and Trac-looping data. Loops identified by cLoops showed much less distance-dependent bias and higher enrichment relative to local regions than those identified by existing tools. Altogether, cLoops improves the accuracy of detecting 3D genomic loops from sequencing data, is versatile, flexible and efficient, and has modest hardware requirements. Availability and implementation: cLoops with documentation and example data is freely available at: https://github.com/YaqiangCao/cLoops. Supplementary information: Supplementary data are available at Bioinformatics online.

MIA-Sig: multiplex chromatin interaction analysis by signal processing and statistical algorithms

Genome Biology ◽

10.1186/s13059-019-1868-z ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 3

Author(s):

Minji Kim ◽

Meizhen Zheng ◽

Simon Zhongyuan Tian ◽

Byoungkoo Lee ◽

Jeffrey H. Chuang ◽

...

Keyword(s):

Signal Processing ◽

Single Molecule ◽

Genome Mapping ◽

Interaction Analysis ◽

Statistical Significance ◽

Chromatin Interaction ◽

Topological Domains ◽

3D Genome ◽

Algorithmic Solution ◽

Algorithmic Framework

Abstract Single-molecule multiplex chromatin interaction data are generated by emerging 3D genome mapping technologies such as GAM, SPRITE, and ChIA-Drop. These datasets provide insights into high-dimensional chromatin organization, yet introduce new computational challenges. To address these challenges, we developed MIA-Sig, an algorithmic solution based on signal processing and information theory. We demonstrate its ability to de-noise the multiplex data, assess the statistical significance of chromatin complexes, and identify topological domains and frequent inter-domain contacts. On chromatin immunoprecipitation (ChIP)-enriched data, MIA-Sig can clearly distinguish the protein-associated interactions from the non-specific topological domains. Together, MIA-Sig represents a novel algorithmic framework for multiplex chromatin interaction analysis.

Multiplex chromatin interaction analysis by signal processing and statistical algorithms

10.1101/665232 ◽

2019 ◽

Cited By ~ 1

Author(s):

Minji Kim ◽

Meizhen Zheng ◽

Simon Zhongyuan Tian ◽

Daniel Capurso ◽

Byoungkoo Lee ◽

...

Keyword(s):

Single Molecule ◽

Chromatin Immunoprecipitation ◽

Genome Mapping ◽

Interaction Analysis ◽

Statistical Significance ◽

Chromatin Interaction ◽

Interaction Data ◽

Topological Domains ◽

3D Genome ◽

Algorithmic Framework

Abstract Single-molecule multiplex chromatin interaction data generated by emerging non-ligation-based 3D genome mapping technologies provide novel insights into high-dimensional chromatin organization, yet introduce new computational challenges. We developed MIA-Sig (https://github.com/TheJacksonLaboratory/mia-sig.git), an algorithmic framework to de-noise the data, assess the statistical significance of chromatin complexes, and identify topological domains and inter-domain contacts. On chromatin immunoprecipitation (ChIP)-enriched data, MIA-Sig can clearly distinguish the protein-associated interactions from the non-specific topological domains.

Mapping chromatin accessibility and active regulatory elements reveals pathological mechanisms in human gliomas

Nature Communications ◽

10.1038/s41467-021-23922-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Karolina Stępniak ◽

Magdalena A. Machnicka ◽

Jakub Mieczkowski ◽

Anna Macioszek ◽

Bartosz Wojtaś ◽

...

Keyword(s):

Gene Expression ◽

Chromatin Structure ◽

Molecular Mechanisms ◽

Genome Mapping ◽

Malignant Gliomas ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Multiple Tumor ◽

Genes Encoding ◽

Methylation Patterns

Abstract Chromatin structure and accessibility, together with combinatorial binding of transcription factors to regulatory elements in genomic DNA, control transcription. Genetic variations in genes encoding histones, epigenetics-related enzymes or modifiers affect chromatin structure and dynamics and result in alterations in gene expression that contribute to cancer development or progression. Gliomas are brain tumors frequently associated with epigenetics-related gene deregulation. We perform whole-genome mapping of chromatin accessibility, histone modifications and DNA methylation patterns, together with transcriptome analysis, in multiple tumor samples to unravel epigenetic dysfunctions driving gliomagenesis. Based on the results of the integrative analysis of the acquired profiles, we create an atlas of active enhancers and promoters in benign and malignant gliomas. We explore these elements and intersect them with Hi-C data to uncover molecular mechanisms instructing gene expression in gliomas.

Family history of early onset acute lymphoblastic leukemia is suggesting genetic associations

Scientific Reports ◽

10.1038/s41598-021-90542-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Xinjun Li ◽

Kristina Sundquist ◽

Jan Sundquist ◽

Asta Försti ◽

Kari Hemminki

Keyword(s):

Acute Lymphoblastic Leukemia ◽

Lymphoblastic Leukemia ◽

Statistical Significance ◽

Age Groups ◽

Age Group ◽

Genetic Associations ◽

Sequencing Data ◽

Cancer Data ◽

Group 5 ◽

Prostate Cancers

Abstract Childhood acute lymphoblastic leukemia (ALL) has an origin in the fetal period, which may distinguish it from ALL diagnosed later in life. We wanted to test whether familial risks in ALL diagnosed in very early childhood differ from those in ALL diagnosed later. The Swedish nationwide family-cancer data were used up to the year 2016 to calculate standardized incidence ratios (SIRs) for familial risks in ALL in three diagnostic age groups: 0–4, 5–34 and 35+ years. Among 1335 ALL patients diagnosed before age 5, familial risks were increased for esophageal (4.78), breast (1.42), prostate (1.40) and connective tissue (2.97) cancers and leukemia (2.51, ALL 7.81). In the age group 5–34 years, rectal (1.73) and endometrial (2.40) cancer, myeloma (2.25) and leukemia (2.00, ALL 4.60) reached statistical significance. In the oldest age group, the only association was with Hodgkin lymphoma (3.42). For breast, prostate and rectal cancers, the diagnostic ages of family members of ALL patients were significantly lower than the diagnostic ages for these cancers in the population. The patterns of increased familial cancers suggest that BRCA2 mutations could contribute to associations of ALL with breast and prostate cancers, and mutations in the mismatch repair gene PMS2 to associations with rectal and endometrial cancers. Future DNA sequencing data will be a test for these familial predictions.
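
For readers unfamiliar with the measure, a standardized incidence ratio is the number of observed cases divided by the number expected if the study group had the same stratum-specific rates as the reference population. The following sketch shows that arithmetic only; the function names, strata, person-years and rates are hypothetical and are not taken from the study.

```python
def expected_cases(person_years, reference_rates):
    """Expected case count: sum over strata of person-years times reference rate."""
    return sum(person_years[s] * reference_rates[s] for s in person_years)


def sir(observed, expected):
    """Standardized incidence ratio: observed cases over expected cases."""
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    return observed / expected


# Hypothetical inputs for illustration only (not data from the paper).
person_years = {"0-4": 10_000.0, "5-34": 40_000.0}   # follow-up time per stratum
reference_rates = {"0-4": 0.0001, "5-34": 0.0002}    # cases per person-year
expected = expected_cases(person_years, reference_rates)
print(round(sir(observed=18, expected=expected), 2))
```

An SIR above 1 means more cases were observed among relatives than the reference rates predict; whether the excess is statistically significant depends on a confidence interval, which this sketch does not compute.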

Discovering Transcriptional Regulatory Elements From Run‐On and Sequencing Data Using the Web‐Based dREG Gateway

Current Protocols in Bioinformatics ◽

10.1002/cpbi.70 ◽

2018 ◽

Vol 66 (1) ◽

Cited By ~ 4

Author(s):

Tinyi Chu ◽

Zhong Wang ◽

Shao‐Pei Chou ◽

Charles G. Danko

Keyword(s):

Regulatory Elements ◽

Sequencing Data ◽

Web Based ◽

Transcriptional Regulatory Elements ◽

Transcriptional Regulatory ◽

The Web

A Multigraph model of the 3D genome

10.1101/2021.11.11.468281 ◽

2021 ◽

Author(s):

Diana Makai ◽

Andras Cseh ◽

Adel Sepsi ◽

Szabolcs Makai

Keyword(s):

Information Needs ◽

Dimensional Space ◽

Decisive Role ◽

Regulatory Elements ◽

Sequence Information ◽

Genomic Context ◽

3D Genome ◽

Flexible Polymer ◽

Novel Approach ◽

Contact Data

Abstract Spatial organisation of the genome has a fundamental effect on its biological functions. Chromatin-chromatin interactions and 3D spatial structures are involved in transcriptional regulation and have a decisive role in DNA replication and repair. To understand how individual genes and their regulatory elements function within the larger genomic context, and how the genome reacts as a whole to environmental stimuli, the linear sequence information needs to be interpreted in 3-dimensional space. While recent advances in chromatin conformation capture technologies, including Hi-C, have considerably advanced our understanding of genomes, describing the DNA as it is organized in the cell nucleus is still a challenging task. 3D genome modelling needs to reflect the DNA as a flexible polymer, which can wind up to a fraction of its total length and greatly unwind and stretch to implement a multitude of functions. Here we propose a novel approach to model genomes as a multigraph based on Hi-C contact data. Multigraph-based 3D genome modelling of barley and rice revealed the well-known Rabl and Rosette chromatin organizations, respectively, as well as other higher-order structures. Our results show that the well-established toolset of graph theory is highly valuable in modelling large genomes in 3D.
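
The abstract does not say how the multigraph is built, so the sketch below shows only one plausible reading: genomic bins are nodes, and every observed Hi-C contact between two different bins adds one parallel edge, stored as an edge multiplicity. The bin size, the single-chromosome coordinates and the names `build_multigraph` and `degree` are assumptions for illustration, not the authors' model.

```python
from collections import Counter


def build_multigraph(contacts, bin_size):
    """Map each contact (pos_a, pos_b) to a pair of bins and count parallel edges.

    Returns a Counter keyed by (low_bin, high_bin); the value is the number
    of parallel edges between the two bins.
    """
    edges = Counter()
    for pos_a, pos_b in contacts:
        bin_a, bin_b = pos_a // bin_size, pos_b // bin_size
        if bin_a == bin_b:
            continue  # contacts within one bin would be self-loops; skipped here
        edges[(min(bin_a, bin_b), max(bin_a, bin_b))] += 1
    return edges


def degree(edges, node):
    """Degree of a node, counting every parallel edge separately."""
    return sum(mult for (a, b), mult in edges.items() if node in (a, b))


# Hypothetical contacts on one chromosome, as pairs of base-pair positions.
contacts = [(100, 2500), (150, 2900), (2100, 5200), (300, 400)]
graph = build_multigraph(contacts, bin_size=1000)
print(dict(graph), degree(graph, 2))
```

Keeping multiplicities rather than collapsing them to a single weighted edge is what makes the structure a multigraph; standard graph measures such as degree can then be computed over it.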

Genome assembly of the maize inbred line A188 provides a new reference genome for functional genomics

10.1101/2021.03.15.435372 ◽

2021 ◽

Author(s):

Fei Ge ◽

Jingtao Qu ◽

Peng Liu ◽

Lang Pan ◽

Chaoying Zou ◽

...

Keyword(s):

Single Molecule ◽

Inbred Line ◽

Genome Mapping ◽

Maize Inbred Line ◽

Sequencing Data ◽

Structural Variations ◽

Single Molecule Sequencing ◽

Maize Genetic ◽

Induction Ratio ◽

Phenotypic Variations

Abstract Little is known about the mechanism underlying the genotype dependence of embryonic callus (EC) induction, which has severely inhibited the development of maize genetic engineering. Here, we report the genome sequence and annotation of A188, a maize inbred line with a high EC induction ratio, assembled from single-molecule sequencing and optical genome mapping. We assembled a 2,210 megabase (Mb) genome with a scaffold N50 of 11.61 Mb, compared with 9.73 Mb for B73 and 10.2 Mb for Mo17. Comparative analysis revealed that ~30% of the predicted A188 genes had large structural variations relative to the B73, Mo17 and W22 genomes, which caused considerable protein divergence and might lead to phenotypic variations between the four inbred lines. Combining our new A188 genome, previously reported QTLs and RNA sequencing data, we reveal 8 genes with large structural variations and 4 differentially expressed genes that potentially play roles in EC induction.
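
The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the toy lengths are illustrative and unrelated to the A188 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one scaffold length")


# Toy lengths for illustration only.
print(n50([2, 3, 4, 5, 6]))
```

A higher N50 at a similar total assembly size indicates that the assembly is split into fewer, longer scaffolds.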

3D genome mapping and analysis methods

Methods ◽

10.1016/j.ymeth.2018.05.017 ◽

2018 ◽

Vol 142 ◽

pp. 1-2 ◽

Cited By ~ 2

Author(s):

Josée Dostie ◽

Mathieu Blanchette

Keyword(s):

Genome Mapping ◽

3D Genome ◽

Analysis Methods

Highly multiplexed, fast and accurate nanopore sequencing for verification of synthetic DNA constructs and sequence libraries

Synthetic Biology ◽

10.1093/synbio/ysz025 ◽

2019 ◽

Vol 4 (1) ◽

Cited By ~ 4

Author(s):

Andrew Currin ◽

Neil Swainston ◽

Mark S Dunstan ◽

Adrian J Jervis ◽

Paul Mulherin ◽

...

Keyword(s):

Synthetic Biology ◽

Dna Sequencing ◽

Cost Effective ◽

Polymorphism Analysis ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Synthetic Dna ◽

Design Build ◽

Hardware Costs

Abstract Synthetic biology utilizes the Design–Build–Test–Learn pipeline for the engineering of biological systems. Typically, this requires the construction of specifically designed, large and complex DNA assemblies. The availability of cheap DNA synthesis and automation enables high-throughput assembly approaches, which generate a heavy demand for DNA sequencing to verify correctly assembled constructs. Next-generation sequencing is ideally positioned to perform this task; however, because of high hardware costs and bespoke data analysis requirements, few laboratories utilize this technology in-house. Here a workflow for highly multiplexed sequencing is presented, capable of fast and accurate sequence verification of DNA assemblies using nanopore technology. A novel sample barcoding system using polymerase chain reaction is introduced, and sequencing data are analyzed through a bespoke analysis algorithm. Crucially, this algorithm overcomes the problem of high-error-rate nanopore data (which typically prevents identification of single nucleotide variants) through statistical analysis of strand bias, permitting accurate sequence analysis with single-base resolution. As an example, 576 constructs (6 × 96-well plates) were processed in a single workflow in 72 h (from Escherichia coli colonies to analyzed data). Given our procedure’s low hardware costs and highly multiplexed capability, this provides cost-effective access to powerful DNA sequencing for any laboratory, with applications beyond synthetic biology including directed evolution, single nucleotide polymorphism analysis and gene synthesis.
