Measuring the Landscape of CpG Methylation of Individual Repetitive Elements

Mapping Intimacies

10.1101/018531

2015

Cited By ~ 1

Author(s):

Yuta Suzuki

Jonas Korlach

Stephen W. Turner

Tatsuya Tsukahara

Junko Taniguchi

...

Keyword(s):

Human Genome

Single Molecule

Methylation Status

Repetitive Elements

Read Length

CpG Sites

Kinetic Information

Copy Numbers

Long Read

Second Generation Sequencing

Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the reads are too short to map uniquely, especially when the repetitive regions are long and nearly identical to one another. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, it produces long reads, and its kinetic information is sensitive to DNA modifications. We propose a novel algorithm that combines the kinetic information of neighboring CpG sites to increase confidence in identifying the methylation states of those sites. Both the sensitivity and the precision of our algorithm were ∼93.7% on a per-CpG-site basis for the genome of an inbred medaka (Oryzias latipes) strain at a practical read coverage of ∼30-fold. The method is quantitatively accurate: we observed a high correlation coefficient (R = 0.884) between our method and bisulfite sequencing, and the methylation levels of 92.0% of CpG sites agreed within 0.25. Using this method, we characterized the landscape of methylation status of repetitive elements, such as LINEs, in the human genome, revealing a strong correlation between CpG density and unmethylation and detecting unmethylation hot spots of LTRs and LINEs. We could also determine the methylation states of nearly identical active transposons: two novel LINE insertions (∼99% identity, 6050 base pairs (bp) long) in the human genome, and sixteen Tol2 elements (>99.8% identity, 4682 bp long) in the medaka genome.
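
The idea of pooling kinetic evidence across neighboring CpG sites can be illustrated with a minimal sketch. It assumes each CpG already has a per-site score (a hypothetical log-likelihood ratio of methylated versus unmethylated, derived from interpulse-duration kinetics) and simply sums the scores of all CpGs within an assumed distance `max_gap`. This is not the authors' algorithm; it only shows how nearby sites can outweigh a weak or contradictory single-site signal.

```python
from typing import List, Sequence


def combined_scores(positions: Sequence[int], site_scores: Sequence[float],
                    max_gap: int = 50) -> List[float]:
    """Sum the per-site scores of every CpG within max_gap bp of each CpG (itself included).

    site_scores are hypothetical per-site log-likelihood ratios
    (methylated vs. unmethylated); positive values favor methylation.
    """
    combined = []
    for p in positions:
        total = 0.0
        for q, s in zip(positions, site_scores):
            if abs(q - p) <= max_gap:
                total += s
        combined.append(total)
    return combined


def call_methylation(positions: Sequence[int], site_scores: Sequence[float],
                     max_gap: int = 50, threshold: float = 0.0) -> List[bool]:
    """Call a CpG methylated when its pooled score exceeds threshold."""
    return [c > threshold for c in combined_scores(positions, site_scores, max_gap)]


# Three clustered CpGs and one isolated CpG far away.
positions = [100, 120, 140, 1000]
scores = [0.5, -0.2, 0.4, -0.1]
print(call_methylation(positions, scores))
```

Here the CpG at position 120 has a slightly negative score on its own but is called methylated because its two neighbors carry stronger evidence. The quadratic loop is kept for readability; a sorted two-pointer window would make it linear.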

Time-Course Transcriptome Profiling of a Poxvirus Using Long-Read Full-Length Assay

Pathogens

10.3390/pathogens10080919

2021

Vol 10 (8)

pp. 919

Author(s):

Dóra Tombácz

István Prazsák

Gábor Torma

Zsolt Csabai

Zsolt Balázs

...

Keyword(s):

Single Molecule

Time Course

Transcriptome Profiling

Time Lapse

Full Length

Read Length

Transcript Isoforms

Long Read

Second Generation Sequencing

Transcriptional Start Sites

Viral transcriptomes determined using first- and second-generation sequencing techniques are incomplete. Because of their short read lengths, these methods are inefficient at, or fail at, distinguishing between transcript isoforms, polycistronic RNAs, and transcriptional overlaps and readthroughs. Additionally, these approaches are insensitive in identifying splice sites and transcriptional start sites (TSSs) and, in most cases, transcriptional end sites (TESs), especially in transcript isoforms with varying transcript ends and in multi-spliced transcripts. Long-read sequencing can read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome has a high diversity of TSSs and TESs and a high degree of polycistronism, which together lead to enormous complexity. We applied single-molecule real-time and nanopore-based sequencing methods to investigate the time-lapse transcriptome patterns of VACV gene expression.
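
With full-length reads, TSS and TES discovery largely reduces to counting where reads begin and end. A minimal sketch, assuming read 5'-end coordinates have already been extracted from strand-aware alignments on a single strand, greedily merges nearby end positions into clusters and reports the most frequent position of each cluster. The function `cluster_ends` and its `window` and `min_reads` thresholds are hypothetical choices, not parameters from the study.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def cluster_ends(positions: Sequence[int], window: int = 10,
                 min_reads: int = 2) -> List[Tuple[int, int]]:
    """Greedily cluster 5'-end coordinates (one strand) into candidate TSSs.

    Adjacent distinct positions no more than `window` bp apart are chained into
    one cluster; each cluster with >= min_reads reads is reported as
    (most frequent position, read count).
    """
    counts = Counter(positions)
    clusters: List[List[int]] = []
    for pos in sorted(counts):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    result = []
    for members in clusters:
        total = sum(counts[p] for p in members)
        if total >= min_reads:
            peak = max(members, key=lambda p: counts[p])  # first position wins ties
            result.append((peak, total))
    return result


# 5' ends of seven hypothetical reads on the plus strand.
print(cluster_ends([100, 101, 101, 103, 150, 151, 400]))
```

Passing read 3'-end coordinates instead gives candidate TESs. Because clusters are chained, a single cluster can span more than `window` bp.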

Cas9-Assisted Targeting of CHromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping

10.1101/110163

2017

Cited By ~ 5

Author(s):

Tslil Gabrieli

Hila Sharim

Yael Michaeli

Yuval Ebenstein

Keyword(s):

Single Molecule

Genome Mapping

Single Point

Read Length

Whole Genome

Sequencing Analysis

Nanopore Sequencing

Sequencing Data

Whole Genome Analysis

Long Read

Variations in the genetic code, from single point mutations to large structural or copy number alterations, influence the susceptibility to, onset, and progression of genetic diseases and tumor transformation. Next-generation sequencing analysis cannot reliably capture aberrations larger than the typical sequencing read length of several hundred bases. Long-read, single-molecule sequencing methods such as SMRT and nanopore sequencing can address larger variations but require costly whole-genome analysis. Here we describe a method for isolating and enriching a large genomic region of interest for targeted analysis, based on Cas9 excision at two sites flanking the target region and isolation of the excised DNA segment by pulsed-field gel electrophoresis. The isolated target remains intact and is ideally suited for optical genome mapping and long-read sequencing at high coverage. In addition, analysis is performed directly on native genomic DNA, which retains its genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants, as well as detailed analysis through hybrid scaffolds composed of optical maps and sequencing data, at a fraction of the cost of whole-genome sequencing.
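
Choosing two Cas9 cut sites that flank a target can be sketched as a search for SpCas9 targets (a 20-nt protospacer followed by an NGG PAM, cut 3 bp upstream of the PAM) inside windows on either side of the region. The sketch below scans only the forward strand and ignores guide efficiency and off-target effects; the functions and the `flank` window size are hypothetical and are not the guide-selection procedure used by the authors.

```python
from typing import List, Optional, Tuple


def forward_cut_sites(seq: str, start: int, end: int) -> List[int]:
    """Cut coordinates in [start, end) of forward-strand SpCas9 targets.

    A target is a 20-nt protospacer followed by an NGG PAM; SpCas9 cuts
    3 bp upstream of the PAM, i.e. between protospacer positions 17 and 18.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 22):
        if seq[i + 21:i + 23] == "GG":  # PAM is seq[i+20:i+23] = N, G, G
            cut = i + 17
            if start <= cut < end:
                sites.append(cut)
    return sites


def excision_plan(seq: str, target_start: int, target_end: int,
                  flank: int) -> Optional[Tuple[int, int, int]]:
    """Pick the left/right cuts closest to the target; return (left, right, fragment_length)."""
    left = forward_cut_sites(seq, max(0, target_start - flank), target_start)
    right = forward_cut_sites(seq, target_end, target_end + flank)
    if not left or not right:
        return None
    l, r = max(left), min(right)
    return l, r, r - l


# Toy 200-bp sequence with two GG dinucleotides that create PAMs.
s = ["A"] * 200
s[41] = s[42] = "G"
s[161] = s[162] = "G"
print(excision_plan("".join(s), target_start=60, target_end=140, flank=50))
```

Picking the cuts closest to the target keeps the excised fragment as small as possible while still containing the whole region; `None` signals that no usable PAM lies within the flanks.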

Detection of structural variations and fusion genes in breast cancer samples using third-generation sequencing

10.21203/rs.3.rs-953712/v2

2021

Author(s):

Taobo Hu

Jingjing Li

Mengping Long

Jinbo Wu

Zhen Zhang

...

Keyword(s):

Breast Cancer

Cancer Patients

Genetic Alterations

Read Length

Breast Cancer Patients

Structural Variations

Transcriptomic Sequencing

Long Read

Second Generation Sequencing

Generation Sequencing

Background: Structural variations (SVs) are common genetic alterations in the human genome that can cause different phenotypes and various diseases, including cancer. However, detection of SVs by second-generation sequencing is limited by its short read length, which in turn has restricted our understanding of SVs. Methods: In this study, we developed a 28-gene panel for long-read sequencing and applied it on both the Oxford Nanopore Technologies and Pacific Biosciences platforms. We analyzed SVs in the 28 breast cancer-related genes through long-read genomic and transcriptomic sequencing of tumor, para-tumor, and blood samples from 19 breast cancer patients. Results: Some somatic SVs recurred among the selected genes, although the majority occurred in non-exonic regions. We found evidence supporting the existence of hotspot regions for SVs, extending the previous understanding that such hotspots exist only for single-nucleotide variations. Conclusions: We employed long-read genomic and transcriptomic sequencing to identify SVs in breast cancer patients and showed that this approach holds great potential for clinical application.
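
One basic signal that long-read SV callers use is a large insertion or deletion spanned within a single alignment. A minimal sketch, assuming SAM-style CIGAR strings and the commonly used 50-bp SV size cutoff (an assumption here, not a threshold stated by the study), extracts such events from one alignment; `ref_start` is the reference coordinate of the first aligned base. Real callers additionally use split alignments, clustering across reads, and tumor/normal comparison.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def large_indels(ref_start: int, cigar: str, min_len: int = 50) -> List[Tuple[str, int, int]]:
    """Return (type, reference position, length) for each INS/DEL >= min_len in one alignment."""
    events = []
    ref_pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in ("I", "D") and length >= min_len:
            events.append(("INS" if op == "I" else "DEL", ref_pos, length))
        if op in ("M", "D", "N", "=", "X"):  # operations that consume the reference
            ref_pos += length
    return events


print(large_indels(1000, "10S100M60D50M80I20M5D30M"))
```

Soft clips (`S`) and insertions do not advance the reference coordinate, which is why the 80-bp insertion is reported at the position where the preceding match block ends; the 5-bp deletion is below the cutoff and is skipped.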

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

10.1101/635037

2019

Cited By ~ 7

Author(s):

Mitchell R. Vollger

Glennis A. Logsdon

Peter A. Audano

Arvis Sulovari

David Porubsky

...

Keyword(s):

Human Genome

Single Molecule

Tandem Repeats

De Novo

Sequence Data

Gene Annotation

Hydatidiform Mole

High Fidelity

Human Genomes

Long Read

The sequencing and assembly of human genomes using long-read technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequence is recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time and with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short in assembling centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective stand-alone technology for de novo assembly of human genomes.
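
NG50, the contiguity metric quoted above, differs from N50 only in its denominator: it is the contig length L such that contigs of length ≥ L cover at least half of the (estimated) genome size, rather than half of the total assembly length. A minimal sketch computing both follows. The study computed NG50 only within 1 Mbp of centromeres, which would require restricting the input to those regions first; that step is not shown and its exact procedure is not described here.

```python
from typing import Iterable, Optional


def nx50(contig_lengths: Iterable[int], denominator: float) -> Optional[int]:
    """Length L such that contigs >= L sum to at least half of denominator."""
    target = denominator / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None  # assembly covers less than half of the denominator


def n50(contig_lengths: Iterable[int]) -> Optional[int]:
    lengths = list(contig_lengths)
    return nx50(lengths, sum(lengths))


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    return nx50(contig_lengths, genome_size)


contigs = [50, 40, 30, 20, 10]
print(n50(contigs), ng50(contigs, 200))  # 40 30
```

Because NG50 uses a fixed genome size, it penalizes an assembly that simply leaves sequence out, which N50 does not; this makes it the fairer comparison between the HiFi and CLR assemblies of the same genome.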

Improving the diversity of captured full-length isoforms using a normalized single-molecule RNA-sequencing method

Communications Biology

10.1038/s42003-020-01125-7

2020

Vol 3 (1)

Author(s):

Yueming Hu

Xing-Sheng Shu

Jiaxian Yu

Ming-an Sun

Zewei Chen

...

Keyword(s):

RNA Sequencing

Single Molecule

Signet Ring Cell

Full Length

Read Length

Accurate Identification

Human Genes

Sequencing Method

Long Read

Gene Expression Studies

Human genes form a large variety of isoforms after transcription, encoding distinct transcripts that exert different functions. Single-molecule RNA sequencing facilitates accurate identification of these isoforms by substantially extending read length. However, gene and isoform diversity is poorly represented among the mRNA molecules captured by single-molecule RNA sequencing. Here, we show that a cDNA normalization procedure performed before library preparation for PacBio RS II sequencing captures 3.2–6.0-fold more full-length, high-quality isoform species in different human samples than the non-normalized capture procedure. Many lowly expressed, functionally important isoforms can be detected. In addition, normalized PacBio RNA sequencing resolves more allele-specific haplotype transcripts. Finally, we apply the cDNA-normalization-based long-read RNA sequencing method to profile the transcriptome of human gastric signet-ring cell carcinomas and identify new cancer-specific transcriptome signatures, thereby demonstrating the utility of the improved protocol in gene expression studies.
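
Why normalization increases the number of isoform species captured can be reasoned about with a simple sampling model. If n reads are drawn independently from isoforms with relative abundances p_i, the expected number of distinct isoforms observed is the sum over i of 1 - (1 - p_i)^n. The sketch below assumes, purely for illustration, that normalization flattens abundances to p_i proportional to a_i^alpha (alpha = 1: no normalization; alpha = 0: perfectly even). This model and the `alpha` parameter are hypothetical and do not describe the actual cDNA normalization chemistry.

```python
from typing import Sequence


def expected_distinct(abundances: Sequence[float], n_reads: int, alpha: float = 1.0) -> float:
    """Expected number of distinct isoforms seen in n_reads independent draws.

    abundances must be positive; weights a**alpha model a hypothetical
    normalization that flattens the distribution as alpha goes from 1 to 0.
    """
    weights = [a ** alpha for a in abundances]
    total = sum(weights)
    return sum(1.0 - (1.0 - w / total) ** n_reads for w in weights)


# One dominant isoform and 99 rare ones, sampled with 100 reads.
abundances = [1000.0] + [1.0] * 99
raw = expected_distinct(abundances, 100, alpha=1.0)
flat = expected_distinct(abundances, 100, alpha=0.0)
print(f"raw: {raw:.1f}, normalized: {flat:.1f}")
```

In this toy model the flattened distribution yields substantially more distinct isoforms for the same number of reads, because reads are no longer spent re-sampling the dominant transcript.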

Genome-wide epigenetic profiling of 5-hydroxymethylcytosine by long-read optical mapping

10.1101/260166

2018

Cited By ~ 1

Author(s):

Tslil Gabrieli

Hila Sharim

Gil Nifker

Jonathan Jeffet

Tamar Shahal

...

Keyword(s):

Long Range

Single Molecule

Human Peripheral Blood

Read Length

Epigenetic Mark

Sequencing Data

Chromosomal DNA

Genome Wide

Long Read

Genomic Regions

The epigenetic mark 5-hydroxymethylcytosine (5-hmC) is a distinct product of active enzymatic demethylation that is linked to gene regulation, development, and disease. Genome-wide 5-hmC profiles generated by short-read next-generation sequencing are limited in providing long-range epigenetic information for highly variable genomic regions, such as the 3.7 Mbp disease-related Human Leukocyte Antigen (HLA) region. We present a long-read, single-molecule mapping technology that generates hybrid genetic/epigenetic profiles of native chromosomal DNA. The genome-wide distribution of 5-hmC in human peripheral blood cells correlates well with 5-hmC DNA immunoprecipitation (hMeDIP) sequencing. However, the long read length of 100 kbp–1 Mbp produces 5-hmC profiles across variable genomic regions that failed to show up in the sequencing data. In addition, optical 5-hmC mapping shows a strong correlation between the 5-hmC density in gene bodies and the corresponding level of gene expression. The single-molecule concept provides information on the distribution and coexistence of 5-hmC signals at multiple genomic loci on the same genomic DNA molecule, revealing long-range correlations and cell-to-cell epigenetic variation.

High-resolution single-molecule long-fragment rRNA gene amplicon sequencing for uncultured bacterial and fungal communities

10.1101/2021.03.29.437457

2021

Author(s):

Chao Fang

Xiaohuan Sun

Fei Fan

Xiaowei Zhang

Ou Wang

...

Keyword(s):

Single Molecule

Second Generation

Soil Samples

Full Length

Fungal Communities

Read Length

rRNA Genes

rRNA Gene

Second Generation Sequencing

Generation Sequencing

Although several large-scale environmental microbial projects have been initiated in the past two decades, understanding of the role of complex microbiotas is still constrained by the difficulty of detecting and identifying unknown microorganisms [1–6]. Currently, hypervariable regions of rRNA genes, as well as internal transcribed spacer regions, are broadly used to identify bacteria and fungi within complex communities [7, 8], but taxonomic and phylogenetic resolution is hampered by insufficient sequencing length [9–11]. Direct sequencing of full-length rRNA genes is currently limited by read length when second-generation sequencing is used, or comes at the cost of quality and throughput when single-molecule sequencing is used. We developed a novel method to sequence and assemble nearly full-length rRNA genes using second-generation sequencing. Benchmarking was performed on mock bacterial and fungal communities as well as two forest soil samples. The majority of rRNA gene sequences of all species in the mock community samples were successfully recovered with identities above 99.5% compared with the reference sequences. For the soil samples, we obtained exquisite coverage, identified a large number of putative new species, and observed a high abundance correlation between replicates. This approach provides a cost-effective method for obtaining extensive and accurate information on complex environmental microbial communities.
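
An identity figure such as 99.5% depends on how identity is defined, since conventions differ in how gaps are counted. A minimal sketch, assuming a pairwise alignment is already available as two equal-length gapped strings and that identity is matches divided by all alignment columns (gaps included), is shown below; `percent_identity` is a hypothetical helper, not part of the authors' pipeline.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity over alignment columns, counting gap columns as mismatches."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if q == "-" and r == "-":
            continue  # column gapped in both sequences carries no information
        columns += 1
        if q == r:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# A 200-bp alignment with a single mismatch sits exactly at the 99.5% threshold.
print(percent_identity("A" * 199 + "C", "A" * 200))  # 99.5
```

For a roughly 1.5-kb full-length 16S rRNA gene, 99.5% identity under this definition still allows several mismatches or gap columns, so the gap convention can change whether a sequence passes the threshold.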

Comparative Transcriptome Profiling of Disruptive Technology, Single-Molecule Direct RNA Sequencing

Current Bioinformatics

10.2174/1574893614666191017154427

2020

Vol 15 (2)

pp. 165-172

Author(s):

Chaithra Pradeep

Dharam Nandan

Arya A. Das

Dinesh Velayutham

Keyword(s):

RNA Sequencing

Single Molecule

Transcriptome Assembly

Transcriptome Profiling

Read Length

Complex Nature

Disruptive Technology

Sequencing Technology

Sequencing Technologies

Long Read

Background: The standard approach for transcriptomic profiling involves high-throughput short-read sequencing technology, mainly dominated by Illumina. However, short reads have limitations in transcriptome assembly and in obtaining full-length transcripts, owing to the complex nature of transcriptomes, with transcripts of variable length and multiple alternatively spliced isoforms. Recent advances in long-read sequencing by Oxford Nanopore Technologies (ONT) offer both cDNA and direct RNA sequencing and have brought a paradigm change in sequencing technology, greatly improving assembly and expression estimates. ONT enables molecules to be sequenced without fragmentation, resulting in ultra-long reads that allow entire genes and transcripts to be fully characterized. The direct RNA sequencing method additionally circumvents the reverse transcription and amplification steps. Objective: In this study, RNA sequencing methods were assessed by comparing data from Illumina (ILM), ONT cDNA (OCD), and ONT direct RNA (ODR) sequencing. Methods: The sensitivity and specificity of isoform detection were determined from the data generated by Illumina, ONT cDNA, and ONT direct RNA sequencing, using Saccharomyces cerevisiae as the model. Comparative studies were conducted with two pipelines to detect isoforms, novel genes, and variable gene lengths. Results: Mapping metrics and qualitative profiles for the different pipelines are presented to help understand these disruptive technologies. The variability attributable to the sequencing technology and to the analysis pipeline was also studied.
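
Sensitivity and specificity of isoform detection are usually computed by matching detected isoforms against a reference annotation. A minimal sketch, assuming each isoform is reduced to a hashable key (for example, a hypothetical (chromosome, strand, exon intervals) tuple) and that only exact key matches count as true positives, is given below. Because true negatives are ill-defined for isoforms, some benchmarking tools have reported precision, TP / (TP + FP), under the name "specificity"; the sketch reports precision.

```python
from typing import Hashable, Set, Tuple


def isoform_accuracy(detected: Set[Hashable], reference: Set[Hashable]) -> Tuple[float, float]:
    """Return (sensitivity, precision) for exact-match isoform detection."""
    true_positives = len(detected & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0  # TP / (TP + FN)
    precision = true_positives / len(detected) if detected else 0.0      # TP / (TP + FP)
    return sensitivity, precision


# Hypothetical isoform keys: (chromosome, strand, exon intervals).
reference = {
    ("chrI", "+", ((100, 400),)),
    ("chrI", "+", ((100, 250), (300, 400))),
    ("chrII", "-", ((5000, 6200),)),
    ("chrIV", "+", ((800, 1900),)),
}
detected = {
    ("chrI", "+", ((100, 400),)),
    ("chrII", "-", ((5000, 6200),)),
    ("chrIV", "+", ((820, 1900),)),  # different start: not an exact match
}
print(isoform_accuracy(detected, reference))
```

Exact matching is strict: long reads with slightly shifted ends, as in the third detected isoform, count as false positives, so practical comparisons often allow a tolerance on transcript ends.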

Human Genome Assembly in 100 Minutes

10.1101/705616

2019

Cited By ~ 20

Author(s):

Chen-Shan Chin

Asif Khalak

Keyword(s):

Single Molecule

DNA Sequences

Genome Assembly

De Novo

Critical Factor

Read Length

De Novo Genome Assembly

Small Indels

Sequencing Technologies

Long Read

De novo genome assembly provides comprehensive, unbiased genomic information and makes it possible to gain insight into new DNA sequences not present in reference genomes. Many de novo human genomes have been published in the last few years, leveraging a combination of inexpensive short-read and single-molecule long-read technologies. As long-read DNA sequencers become more prevalent, the computational burden of generating assemblies remains a critical factor. The most common approach to long-read assembly, the overlap-layout-consensus (OLC) paradigm, requires all-to-all read comparisons, whose computational cost scales quadratically with the number of reads. We assert that recent achievements in sequencing technology (i.e., accuracy of ~99% and read lengths of ~10–15 kb) enable a fundamentally better strategy for OLC that is effectively linear rather than quadratic. Our genome assembly implementation, Peregrine, uses sparse hierarchical minimizers (SHIMMER) to index reads, thereby avoiding the need for an all-to-all read comparison step. Peregrine can assemble 30x human PacBio CCS read datasets into a high-contiguity assembly (N50 > 20 Mb) in less than 30 CPU hours and around 100 wall-clock minutes. The continued advance of sequencing technologies, coupled with the Peregrine assembler, enables routine generation of human de novo assemblies. This will allow population-scale measurement of more comprehensive genomic variation, beyond SNPs and small indels, as well as novel applications requiring rapid access to de novo assemblies.
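
Minimizer indexing avoids all-to-all comparison because only reads that share a selected k-mer need to be compared. A minimal sketch, assuming (w, k) minimizers with a CRC32 hash and modeling the "hierarchical" part of SHIMMER as re-applying window-minimum selection to the level-1 minimizer list, is shown below. Peregrine's actual hashing, parameters, and reduction scheme may differ, and the function names are hypothetical.

```python
import zlib
from typing import List, Tuple

Minimizer = Tuple[int, int]  # (hash value, k-mer start position)


def minimizers(seq: str, k: int, w: int) -> List[Minimizer]:
    """Smallest-hash k-mer of every window of w consecutive k-mers (consecutive duplicates removed)."""
    hashes = [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]
    selected: List[Minimizer] = []
    for start in range(len(hashes) - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        item = (hashes[best], best)
        if not selected or selected[-1] != item:
            selected.append(item)
    return selected


def reduce_level(items: List[Minimizer], w: int) -> List[Minimizer]:
    """Apply window-minimum selection again over a list of minimizers (one hierarchy level)."""
    selected: List[Minimizer] = []
    for start in range(len(items) - w + 1):
        best = min(items[start:start + w])  # ties broken by smaller position
        if not selected or selected[-1] != best:
            selected.append(best)
    return selected


read = "ACGTTGCATGCA" * 20
level1 = minimizers(read, k=15, w=10)
level2 = reduce_level(level1, w=5)
print(len(read) - 15 + 1, len(level1), len(level2))
```

Each level keeps the guarantee that every window of w consecutive items contributes at least one selected item, while shrinking the index; candidate overlaps could then be found by looking up shared level-2 hashes across reads (not shown).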

New technology uses nano-pores to deliver ultra-long read length single molecule sequence data

Membrane Technology

10.1016/s0958-2118(12)70087-3

2012

Vol 2012 (4)

pp. 16

Keyword(s):

Single Molecule

Sequence Data

New Technology

Read Length

Long Read