Cas9-Assisted Targeting of CHromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping

Mapping Intimacies ◽

10.1101/110163 ◽

2017 ◽

Cited By ~ 5

Author(s):

Tslil Gabrieli ◽

Hila Sharim ◽

Yael Michaeli ◽

Yuval Ebenstein

Keyword(s):

Single Molecule ◽

Genome Mapping ◽

Single Point ◽

Read Length ◽

Whole Genome ◽

Sequencing Analysis ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Whole Genome Analysis ◽

Long Read

Abstract Variations in the genetic code, from single point mutations to large structural or copy number alterations, influence susceptibility, onset, and progression of genetic diseases and tumor transformation. Next-generation sequencing analysis is unable to reliably capture aberrations larger than the typical sequencing read length of several hundred bases. Long-read, single-molecule sequencing methods such as SMRT and nanopore sequencing can address larger variations, but require costly whole genome analysis. Here we describe a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis. The isolated target remains intact and is ideally suited for optical genome mapping and long-read sequencing at high coverage. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of optical maps and sequencing data at a fraction of the cost of whole genome sequencing.


DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation

Genome Biology ◽

10.1186/s13059-021-02510-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yang Liu ◽

Wojciech Rosikiewicz ◽

Ziwei Pan ◽

Nathaniel Jillette ◽

Ping Wang ◽

...

Keyword(s):

Dna Methylation ◽

Single Molecule ◽

Evaluation Criteria ◽

Systematic Evaluation ◽

Whole Genome ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Long Read ◽

Genome Scale ◽

Analytical Tools

Abstract Background Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies. Results We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that the methylation prediction at regions with discordant DNA methylation patterns, intergenic regions, low CG density regions, and repetitive regions show room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts. Conclusions Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.
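The abstract above distinguishes per-read from per-site methylation calls. As a rough illustration of how the two relate, the sketch below aggregates per-read binary calls into a per-site methylation frequency. The function name `per_site_frequency`, the tuple layout `(chrom, pos, is_methylated)`, and the `min_coverage` parameter are hypothetical choices made for this example; they are not taken from the paper or from any of the tools it evaluates.

```python
from collections import defaultdict


def per_site_frequency(calls, min_coverage=1):
    """Aggregate per-read methylation calls into per-site frequencies.

    calls: iterable of (chrom, pos, is_methylated) tuples, one per read
           covering a CpG site (hypothetical input layout).
    min_coverage: sites covered by fewer reads than this are dropped.
    Returns a dict mapping (chrom, pos) to the fraction of methylated reads.
    """
    # counts[(chrom, pos)] = [methylated_reads, total_reads]
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, is_methylated in calls:
        entry = counts[(chrom, pos)]
        if is_methylated:
            entry[0] += 1
        entry[1] += 1
    return {
        site: methylated / total
        for site, (methylated, total) in counts.items()
        if total >= min_coverage
    }


example_calls = [
    ("chr1", 100, True),
    ("chr1", 100, False),
    ("chr1", 100, True),
    ("chr1", 200, False),
]
frequencies = per_site_frequency(example_calls)
covered_twice = per_site_frequency(example_calls, min_coverage=2)
```

With a coverage threshold, sites supported by too few reads drop out of the per-site summary, which is one reason CpG site coverage can differ between tools run on the same data.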


Analysis of mitochondrial genome methylation using Nanopore single-molecule sequencing

10.1101/2021.02.05.429923 ◽

2021 ◽

Author(s):

Theresa Lüth ◽

Christine Klein ◽

Susen Schaake ◽

Ronnie Tse ◽

Sandro Pereira ◽

...

Keyword(s):

Mitochondrial Genome ◽

Single Molecule ◽

Biological Significance ◽

Cpg Methylation ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Low Level ◽

Bisulfite Treatment ◽

Synthetic Dna ◽

Long Read

Abstract The level and the biological significance of mitochondrial DNA (mtDNA) methylation in human cells are controversial. Using long-read third-generation sequencing technology, mtDNA methylation can be detected directly from the sequencing data, which overcomes previously suggested biases introduced by bisulfite treatment-dependent methods. We investigated mtDNA from whole blood-derived DNA and established a workflow to detect CpG methylation with Nanopolish. In order to obtain native mtDNA, we adjusted a whole-genome sequencing protocol and performed ligation library preparation and Nanopore sequencing. To validate the workflow, 897 bp of methylated and unmethylated synthetic DNA samples at different dilution ratios were sequenced and CpG methylation was detected. Interestingly, we observed that reads with higher methylation in the synthetic DNA did not pass Guppy calling, possibly affecting conclusions about DNA methylation in Nanopore sequencing. In all blood-derived samples, we detected overall low-level methylation across the mitochondrial genome, with exceptions at certain CpG sites. Our results suggest that Nanopore sequencing is capable of detecting low-level mtDNA methylation. However, further refinement of the bioinformatical pipelines, including the handling of Guppy failed reads, is recommended.


Genome-wide epigenetic profiling of 5-hydroxymethylcytosine by long-read optical mapping

10.1101/260166 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tslil Gabrieli ◽

Hila Sharim ◽

Gil Nifker ◽

Jonathan Jeffet ◽

Tamar Shahal ◽

...

Keyword(s):

Long Range ◽

Single Molecule ◽

Human Peripheral Blood ◽

Read Length ◽

Epigenetic Mark ◽

Sequencing Data ◽

Chromosomal Dna ◽

Genome Wide ◽

Long Read ◽

Genomic Regions

Abstract The epigenetic mark 5-hydroxymethylcytosine (5-hmC) is a distinct product of active enzymatic demethylation that is linked to gene regulation, development and disease. Genome-wide 5-hmC profiles generated by short-read next-generation sequencing are limited in providing long-range epigenetic information relevant to highly variable genomic regions, such as the 3.7 Mbp disease-related Human Leukocyte Antigen (HLA) region. We present a long-read, single-molecule mapping technology that generates hybrid genetic/epigenetic profiles of native chromosomal DNA. The genome-wide distribution of 5-hmC in human peripheral blood cells correlates well with 5-hmC DNA immunoprecipitation (hMeDIP) sequencing. However, the long read length of 100 kbp to 1 Mbp produces 5-hmC profiles across variable genomic regions that failed to show up in the sequencing data. In addition, optical 5-hmC mapping shows strong correlation between the 5-hmC density in gene bodies and the corresponding level of gene expression. The single molecule concept provides information on the distribution and coexistence of 5-hmC signals at multiple genomic loci on the same genomic DNA molecule, revealing long-range correlations and cell-to-cell epigenetic variation.


Cas9 targeted enrichment of mobile elements using nanopore sequencing

Nature Communications ◽

10.1038/s41467-021-23918-y ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Torrin L. McDonald ◽

Weichen Zhou ◽

Christopher P. Castro ◽

Camille Mumm ◽

Jessica A. Switzenberg ◽

...

Keyword(s):

Genetic Disorders ◽

Mobile Element ◽

Read Length ◽

Whole Genome ◽

Nanopore Sequencing ◽

Short Read Sequencing ◽

Human Genomes ◽

Long Read ◽

Genomic Regions ◽

Targeted Enrichment

Abstract Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on-targeted signals and exhibiting a 13.4-54x enrichment over whole-genome approaches. We show an individual flow cell can recover most MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.


MBRS-47. RAPID MOLECULAR SUBGROUPING OF MEDULLOBLASTOMA BASED ON DNA METHYLATION BY NANOPORE SEQUENCING

Neuro-Oncology ◽

10.1093/neuonc/noaa222.556 ◽

2020 ◽

Vol 22 (Supplement_3) ◽

pp. iii406-iii406

Author(s):

Julien Masliah-Planchon ◽

Elodie Girard ◽

Philipp Euskirchen ◽

Christine Bourneix ◽

Delphine Lequin ◽

...

Keyword(s):

Dna Methylation ◽

Single Molecule ◽

Nanopore Sequencing ◽

Molecular Subgroup ◽

Group Assignment ◽

Group 4 ◽

Methylation Assay ◽

Tumor Group ◽

Long Read ◽

Group 3

Abstract Medulloblastoma (MB) can be classified into four molecular subgroups (WNT group, SHH group, group 3, and group 4). The gold standard for assigning a molecular subgroup through DNA methylation profiling uses the Illumina EPIC array. However, this tool has limitations in cost and in the time needed to get results soon enough for clinical use. We present an alternative DNA methylation assay based on nanopore sequencing that allows rapid, cheaper, and reliable subgrouping of clinical MB samples. Low-depth whole-genome sequencing with long-read, single-molecule nanopore technology was used to simultaneously assess copy number profile and MB subgrouping based on DNA methylation. The DNA methylation data generated by nanopore sequencing were compared to a publicly available reference cohort comprising over 2,800 brain tumors including the four subgroups of MB (Capper et al. Nature; 2018) to generate a score that estimates the confidence of a tumor group assignment. All 24 MB analyzed with nanopore sequencing (six WNT, nine SHH, five group 3, and four group 4) were classified in the appropriate subgroup established by expression-based Nanostring subgrouping. In addition to the subgrouping, we also examined the genomic profile. All previously identified clinically relevant genomic rearrangements (mostly MYC and MYCN amplifications) were also detected with our assay. In conclusion, we confirm the full reliability of nanopore sequencing as a novel, rapid, and cheap assay for methylation-based MB subgrouping. We now plan to extend this technology to other embryonal tumors of the central nervous system.


Comprehensive identification of transposable element insertions using multiple sequencing technologies

Nature Communications ◽

10.1038/s41467-021-24041-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Chong Chu ◽

Rebeca Borges-Monroy ◽

Vinayak V. Viswanadham ◽

Soohyun Lee ◽

Heng Li ◽

...

Keyword(s):

Transposable Element ◽

Structure And Function ◽

Endogenous Retroviruses ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Sequencing Technologies ◽

Long Read ◽

And Function

Abstract Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.


Sequoia: an interactive visual analytics platform for interpretation and feature extraction from nanopore sequencing datasets

BMC Genomics ◽

10.1186/s12864-021-07791-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ratanond Koonchanok ◽

Swapna Vidhur Daulatabad ◽

Quoseena Mir ◽

Khairi Reda ◽

Sarath Chandra Janga

Keyword(s):

Single Molecule ◽

Visual Analytics ◽

Visual Analysis ◽

Direct Sequencing ◽

Visual Exploration ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Rna Sequences ◽

Sequencing Technologies ◽

Signal Features

Abstract Background Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Result Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in a Fast5 format, cluster sequences based on electric-current similarities, and drill-down onto signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~ 500k reads from direct RNA sequencing data of human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that characterize these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.


Genome assembly of the maize inbred line A188 provides a new reference genome for functional genomics

10.1101/2021.03.15.435372 ◽

2021 ◽

Author(s):

Fei Ge ◽

Jingtao Qu ◽

Peng Liu ◽

Lang Pan ◽

Chaoying Zou ◽

...

Keyword(s):

Single Molecule ◽

Inbred Line ◽

Genome Mapping ◽

Maize Inbred Line ◽

Sequencing Data ◽

Structural Variations ◽

Single Molecule Sequencing ◽

Maize Genetic ◽

Induction Ratio ◽

Phenotypic Variations

Heretofore, little is known about the mechanism underlying the genotype-dependence of embryonic callus (EC) induction, which has severely inhibited the development of maize genetic engineering. Here, we report the genome sequence and annotation of a maize inbred line with high EC induction ratio, A188, which is assembled from single-molecule sequencing and optical genome mapping. We assembled a 2,210 Mb genome with a scaffold N50 size of 11.61 million bases (Mb), compared to those of 9.73 Mb for B73 and 10.2 Mb for Mo17. Comparative analysis revealed that ~30% of the predicted A188 genes had large structural variations to B73, Mo17 and W22 genomes, which caused considerable protein divergence and might lead to phenotypic variations between the four inbred lines. Combining our new A188 genome, previously reported QTLs and RNA sequencing data, we reveal 8 large structural variation genes and 4 differentially expressed genes playing potential roles in EC induction.
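The scaffold N50 values quoted above follow the usual assembly statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch of that calculation is given below, assuming scaffold lengths are available as a plain list of integers; the function name `n50` and the toy lengths are illustrative and are not data from the A188 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the scaffold that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: total length 54, half is 27; 10 + 9 + 8 reaches 27.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because N50 depends on the total assembled length, comparisons between assemblies of different sizes, such as the A188, B73, and Mo17 values above, are best read alongside the genome sizes themselves.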


Whole genome sequencing of Borrelia miyamotoi isolate Izh-4: reference for a complex bacterial genome

BMC Genomics ◽

10.1186/s12864-019-6388-4 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 4

Author(s):

Konstantin V. Kuleshov ◽

Gabriele Margos ◽

Volker Fingerle ◽

Joris Koetsveld ◽

Irina A. Goptar ◽

...

Keyword(s):

Single Molecule ◽

Bacterial Genome ◽

Borrelia Miyamotoi ◽

Whole Genome ◽

Relapsing Fever ◽

Linear Chromosome ◽

The North ◽

Long Read ◽

Variable Major Proteins ◽

Tick Vectors

Abstract Background The genus Borrelia comprises spirochaetal bacteria maintained in natural transmission cycles by tick vectors and vertebrate reservoir hosts. The main groups are represented by a species complex including the causative agents of Lyme borreliosis and relapsing fever group Borrelia. Borrelia miyamotoi belongs to the relapsing fever group of spirochetes and forms distinct populations in North America, Asia, and Europe. Like all Borrelia species, B. miyamotoi possesses an unusual and complex genome consisting of a linear chromosome and a number of linear and circular plasmids. The species is considered an emerging human pathogen and an increasing number of human cases are being described in the Northern hemisphere. The aim of this study was to produce a high quality reference genome that will facilitate future studies into genetic differences between different populations and the genome plasticity of B. miyamotoi. Results We used multiple available sequencing methods, including Pacific Biosciences single-molecule real-time technology (SMRT) and Oxford Nanopore technology (ONT) supplemented with highly accurate Illumina sequences, to explore the suitability for whole genome assembly of the Russian B. miyamotoi isolate, Izh-4. Plasmids were typed according to their potential plasmid partitioning genes (PF32, 49, 50, 57/62). Comparing and combining results of both long-read (SMRT and ONT) and short-read methods (Illumina), we determined that the genome of the isolate Izh-4 consisted of one linear chromosome, 12 linear and two circular plasmids. Whilst the majority of plasmids had corresponding contigs in the Asian B. miyamotoi isolate FR64b, there were only four that matched plasmids of the North American isolate CT13–2396, indicating differences between B. miyamotoi populations. Several plasmids, e.g. lp41, lp29, lp23, and lp24, were found to carry variable major proteins. Amongst those were variable large proteins (Vlp) subtype Vlp-α, Vlp-γ, Vlp-δ and also Vlp-β. Phylogenetic analysis of common plasmid types showed the uniqueness of Russian/Asian isolates of B. miyamotoi compared to other isolates. Conclusions We here describe the genome of a Russian B. miyamotoi clinical isolate, providing a solid basis for future comparative genomics of B. miyamotoi isolates. This will be a great impetus for further basic, molecular and epidemiological research on this emerging tick-borne pathogen.


Transcriptome Profiling Provides Insight into the Genes in Carotenoid Biosynthesis during the Mesocarp and Seed Developmental Stages of Avocado (Persea americana)

International Journal of Molecular Sciences ◽

10.3390/ijms20174117 ◽

2019 ◽

Vol 20 (17) ◽

pp. 4117 ◽

Cited By ~ 8

Author(s):

Yu Ge ◽

Zhihao Cheng ◽

Xiongyuan Si ◽

Weihong Ma ◽

Lin Tan ◽

...

Keyword(s):

Single Molecule ◽

Developmental Stages ◽

Gene Dosage ◽

Average Length ◽

Beta Carotene ◽

Transcriptome Profiling ◽

Carotenoid Biosynthesis ◽

Persea Americana ◽

Sequencing Data ◽

Long Read

Avocado (Persea americana Mill.) is an economically important crop because of its high nutritional value. However, the absence of a sequenced avocado reference genome has hindered investigations of secondary metabolism. For next-generation high-throughput transcriptome sequencing, we obtained 365,615,152 and 348,623,402 clean reads as well as 109.13 and 104.10 Gb of sequencing data for avocado mesocarp and seed, respectively, during five developmental stages. High-quality reads were assembled into 100,837 unigenes with an average length of 847.40 bp (N50 = 1725 bp). Additionally, 16,903 differentially expressed genes (DEGs) were detected, 17 of which were related to carotenoid biosynthesis. The expression levels of most of these 17 DEGs were higher in the mesocarp than in the seed during five developmental stages. In this study, the avocado mesocarp and seed transcriptomes were also sequenced using single-molecule long-read sequencing to acquire 25.79 and 17.67 Gb of clean data, respectively. We identified 233,014 and 238,219 consensus isoforms in avocado mesocarp and seed, respectively. Furthermore, 104 and 59 isoforms were found to correspond to the putative 11 carotenoid biosynthetic-related genes in the avocado mesocarp and seed, respectively. The isoform numbers of 10 out of the putative 11 genes involved in the carotenoid biosynthetic pathway were higher in the mesocarp than those in the seed. In addition, alpha- and beta-carotene contents in the avocado mesocarp and seed during five developmental stages were measured, and they were higher in the mesocarp than in the seed, which validated the results of transcriptome profiling. Gene expression changes and the associated variations in gene dosage could influence carotenoid biosynthesis. These results will help to further elucidate carotenoid biosynthesis in avocado.
