Measuring the Landscape of CpG Methylation of Individual Repetitive Elements

2015 ◽  
Author(s):  
Yuta Suzuki ◽  
Jonas Korlach ◽  
Stephen W. Turner ◽  
Tatsuya Tsukahara ◽  
Junko Taniguchi ◽  
...  

Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, it produces long reads, and its kinetic information is sensitive to DNA modifications. We propose a novel algorithm that combines the kinetic information for neighboring CpG sites and increases the confidence in identifying the methylation states of those sites. Both the sensitivity and precision of our algorithm were ∼93.7% on a per-CpG-site basis for the genome of an inbred medaka (Oryzias latipes) strain within a practical read coverage of ∼30-fold. The method is quantitatively accurate: we observed a high correlation coefficient (R = 0.884) between our method and bisulfite sequencing, and 92.0% of CpG sites were in concordance within 0.25. Using this method, we characterized the landscape of the methylation status of repetitive elements, such as LINEs, in the human genome, thereby revealing the strong correlation between CpG density and unmethylation and detecting unmethylation hot spots of LTRs and LINEs. We could uncover the methylation states for nearly identical active transposons, two novel LINE insertions of identity ∼99% and length 6050 base pairs (bp) in the human genome, and sixteen Tol2 elements of identity >99.8% and length 4682 bp in the medaka genome.
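The two agreement figures quoted above (a correlation coefficient and the share of sites "in concordance within 0.25") can be illustrated with a minimal Python sketch. It assumes that each CpG site has one methylation level between 0 and 1 from each method, and that "concordance within 0.25" means the two levels differ by at most 0.25. The function names and the toy values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def concordance_fraction(xs, ys, tol=0.25):
    """Fraction of sites whose two levels differ by at most `tol` (assumed meaning)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(x - y) <= tol for x, y in zip(xs, ys)) / len(xs)


# Toy per-site methylation levels (made up for illustration only).
smrt_levels = [0.0, 0.5, 1.0]
bisulfite_levels = [0.1, 0.5, 0.6]
print(pearson_r(smrt_levels, bisulfite_levels))
print(concordance_fraction(smrt_levels, bisulfite_levels))
```

With the toy values, two of the three sites differ by no more than 0.25, so the concordance fraction is 2/3.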

Pathogens ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 919
Author(s):  
Dóra Tombácz ◽  
István Prazsák ◽  
Gábor Torma ◽  
Zsolt Csabai ◽  
Zsolt Balázs ◽  
...  

Viral transcriptomes that are determined using first- and second-generation sequencing techniques are incomplete. Due to the short read length, these methods are inefficient or fail to distinguish between transcript isoforms, polycistronic RNAs, and transcriptional overlaps and readthroughs. Additionally, these approaches are insensitive for the identification of splice and transcriptional start sites (TSSs) and, in most cases, transcriptional end sites (TESs), especially in transcript isoforms with varying transcript ends, and in multi-spliced transcripts. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome has a high diversity of TSSs and TESs, and a high degree of polycistronism that leads to enormous complexity. We applied single-molecule, real-time, and nanopore-based sequencing methods to investigate the time-lapse transcriptome patterns of VACV gene expression.


2017 ◽  
Author(s):  
Tslil Gabrieli ◽  
Hila Sharim ◽  
Yael Michaeli ◽  
Yuval Ebenstein

Abstract: Variations in the genetic code, from single point mutations to large structural or copy number alterations, influence susceptibility, onset, and progression of genetic diseases and tumor transformation. Next-generation sequencing analysis is unable to reliably capture aberrations larger than the typical sequencing read length of several hundred bases. Long-read, single-molecule sequencing methods such as SMRT and nanopore sequencing can address larger variations, but require costly whole genome analysis. Here we describe a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis. The isolated target remains intact and is ideally suited for optical genome mapping and long-read sequencing at high coverage. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of optical maps and sequencing data at a fraction of the cost of whole genome sequencing.


2021 ◽  
Author(s):  
Taobo Hu ◽  
Jingjing Li ◽  
Mengping Long ◽  
Jinbo Wu ◽  
Zhen Zhang ◽  
...  

Abstract Background: Structural variations (SVs) are common genetic alterations in the human genome that can cause different phenotypes and various diseases, including cancer. However, the detection of structural variations using second-generation sequencing has been limited by its short read length, which in turn has restrained our understanding of structural variations. Methods: In this study, we developed a 28-gene panel for long-read sequencing and applied it on both the Oxford Nanopore Technologies and Pacific Biosciences platforms. We analyzed structural variations in the 28 breast cancer-related genes through long-read genomic and transcriptomic sequencing of tumor, para-tumor and blood samples in 19 breast cancer patients. Results: Our results showed that some somatic SVs were recurring among the selected genes, though the majority of them occurred in non-exonic regions. We found evidence supporting the existence of hotspot regions for SVs, which extends the previous understanding that they exist only for single nucleotide variations. Conclusions: We employed long-read genomic and transcriptomic sequencing to identify SVs in breast cancer patients and showed that this approach holds great potential for clinical application.


2019 ◽  
Author(s):  
Mitchell R. Vollger ◽  
Glennis A. Logsdon ◽  
Peter A. Audano ◽  
Arvis Sulovari ◽  
David Porubsky ◽  
...  

Abstract: The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective stand-alone technology for de novo assembly of human genomes.
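For readers unfamiliar with the NG50 statistic used above, the following minimal Python sketch shows the standard definition: the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of a reference genome (or region) size. N50 is the same calculation with the total assembly length as the reference. The function names and the toy contig lengths are illustrative assumptions. The sketch does not reproduce how the authors restricted the calculation to the regions within 1 Mbp of the centromere.

```python
def ng50(contig_lengths, genome_size):
    """Length of the contig at which the running total, longest first,
    reaches half of `genome_size`. Returns None if the contigs together
    cover less than half of `genome_size`."""
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return None


def n50(contig_lengths):
    """N50: the same statistic measured against the assembly's own total length."""
    return ng50(contig_lengths, sum(contig_lengths))


# Toy contig lengths (arbitrary units, made up for illustration only).
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))        # half of 290 is 145; reached at the 70-unit contig
print(ng50(contigs, 400))  # half of 400 is 200; reached at the 50-unit contig
```

Because NG50 is measured against a fixed reference size rather than the assembly's own length, it lets two assemblies of the same genome be compared directly, as in the HiFi and CLR comparison above.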


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Yueming Hu ◽  
Xing-Sheng Shu ◽  
Jiaxian Yu ◽  
Ming-an Sun ◽  
Zewei Chen ◽  
...  

Abstract: Human genes form a large variety of isoforms after transcription, encoding distinct transcripts that exert different functions. Single-molecule RNA sequencing facilitates accurate identification of the isoforms by extending nucleotide read length significantly. However, gene and isoform diversity is poorly represented among the mRNA molecules captured by single-molecule RNA sequencing. Here, we show that a cDNA normalization procedure before the library preparation for PacBio RS II sequencing captures 3.2–6.0-fold more full-length high-quality isoform species for different human samples, as compared to the non-normalized capture procedure. Many lowly expressed, functionally important isoforms can be detected. In addition, normalized PacBio RNA sequencing also resolves more allele-specific haplotype transcripts. Finally, we apply the cDNA normalization-based long-read RNA sequencing method to profile the transcriptome of human gastric signet-ring cell carcinomas and identify new cancer-specific transcriptome signatures, thereby demonstrating the utility of the improved protocols in gene expression studies.


2018 ◽  
Author(s):  
Tslil Gabrieli ◽  
Hila Sharim ◽  
Gil Nifker ◽  
Jonathan Jeffet ◽  
Tamar Shahal ◽  
...  

Abstract: The epigenetic mark 5-hydroxymethylcytosine (5-hmC) is a distinct product of active enzymatic demethylation that is linked to gene regulation, development and disease. Genome-wide 5-hmC profiles generated by short-read next-generation sequencing are limited in providing long-range epigenetic information relevant to highly variable genomic regions, such as the 3.7 Mbp disease-related Human Leukocyte Antigen (HLA) region. We present a long-read, single-molecule mapping technology that generates hybrid genetic/epigenetic profiles of native chromosomal DNA. The genome-wide distribution of 5-hmC in human peripheral blood cells correlates well with 5-hmC DNA immunoprecipitation (hMeDIP) sequencing. However, the long read length of 100 kbp to 1 Mbp produces 5-hmC profiles across variable genomic regions that failed to show up in the sequencing data. In addition, optical 5-hmC mapping shows strong correlation between the 5-hmC density in gene bodies and the corresponding level of gene expression. The single-molecule concept provides information on the distribution and coexistence of 5-hmC signals at multiple genomic loci on the same genomic DNA molecule, revealing long-range correlations and cell-to-cell epigenetic variation.


2021 ◽  
Author(s):  
Chao Fang ◽  
Xiaohuan Sun ◽  
Fei Fan ◽  
Xiaowei Zhang ◽  
Ou Wang ◽  
...  

Although several large-scale environmental microbial projects have been initiated in the past two decades, understanding of the role of complex microbiotas is still constrained by problems of detecting and identifying unknown microorganisms [1-6]. Currently, hypervariable regions of rRNA genes as well as internal transcribed spacer regions are broadly used to identify bacteria and fungi within complex communities [7, 8], but taxonomic and phylogenetic resolution is hampered by insufficient sequencing length [9-11]. Direct sequencing of full-length rRNA genes is currently limited by read length when using second-generation sequencing, or by reduced quality and throughput when using single-molecule sequencing. We developed a novel method to sequence and assemble nearly full-length rRNA genes using second-generation sequencing. Benchmarking was performed on mock bacterial and fungal communities as well as two forest soil samples. The majority of rRNA gene sequences of all species in the mock community samples were successfully recovered with identities above 99.5% compared to the reference sequences. For soil samples we obtained exquisite coverage with identification of a large number of putative new species, as well as high abundance correlation between replicates. This approach provides a cost-effective method for obtaining extensive and accurate information on complex environmental microbial communities.


2020 ◽  
Vol 15 (2) ◽  
pp. 165-172
Author(s):  
Chaithra Pradeep ◽  
Dharam Nandan ◽  
Arya A. Das ◽  
Dinesh Velayutham

Background: The standard approach for transcriptomic profiling involves high-throughput short-read sequencing technology, mainly dominated by Illumina. However, short reads have limitations in transcriptome assembly and in obtaining full-length transcripts, owing to the complex nature of transcriptomes with variable length and multiple alternatively spliced isoforms. Recent advances in long-read sequencing by Oxford Nanopore Technologies (ONT) offer both cDNA and direct RNA sequencing and have brought a paradigm change in sequencing technology, greatly improving assembly and expression estimates. ONT enables molecules to be sequenced without fragmentation, resulting in ultra-long reads that allow entire genes and transcripts to be fully characterized. The direct RNA sequencing method, in addition, circumvents the reverse transcription and amplification steps. Objective: In this study, RNA sequencing methods were assessed by comparing data from Illumina (ILM), ONT cDNA (OCD) and ONT direct RNA (ODR). Methods: The sensitivity and specificity of isoform detection were determined from the data generated by Illumina, ONT cDNA and ONT direct RNA sequencing technologies using Saccharomyces cerevisiae as a model. Comparative studies were conducted with two pipelines to detect isoforms, novel genes and variable gene length. Results: Mapping metrics and qualitative profiles for the different pipelines are presented to aid understanding of these disruptive technologies. The variability in sequencing technology and in the analysis pipeline was studied.


2019 ◽  
Author(s):  
Chen-Shan Chin ◽  
Asif Khalak

Abstract: De novo genome assembly provides comprehensive, unbiased genomic information and makes it possible to gain insight into new DNA sequences not present in reference genomes. Many de novo human genomes have been published in the last few years, leveraging a combination of inexpensive short-read and single-molecule long-read technologies. As long-read DNA sequencers become more prevalent, the computational burden of generating assemblies persists as a critical factor. The most common approach to long-read assembly, using an overlap-layout-consensus (OLC) paradigm, requires all-to-all read comparisons, which scale quadratically in computational complexity with the number of reads. We assert that recent achievements in sequencing technology (i.e. with accuracy ~99% and read length ~10-15k) enable a fundamentally better strategy for OLC that is effectively linear rather than quadratic. Our genome assembly implementation, Peregrine, uses sparse hierarchical minimizers (SHIMMER) to index reads, thereby avoiding the need for an all-to-all read comparison step. Peregrine can assemble 30x human PacBio CCS read datasets in less than 30 CPU hours and around 100 wall-clock minutes to a high contiguity assembly (N50 > 20Mb). The continued advance of sequencing technologies coupled with the Peregrine assembler enables routine generation of human de novo assemblies. This will allow for population scale measurements of more comprehensive genomic variations (beyond SNPs and small indels) as well as novel applications requiring rapid access to de novo assemblies.
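The minimizer idea underlying the indexing step above can be sketched in a few lines of Python. This is the plain window-minimizer scheme, shown only to illustrate why indexing can replace all-to-all comparison. It is not Peregrine's SHIMMER implementation, which the abstract describes as sparse and hierarchical. The function names, the parameters k (k-mer length) and w (window size), and the use of lexicographic order instead of a hash are assumptions made for the example.

```python
def minimizers(seq, k, w):
    """Return (position, k-mer) pairs chosen as window minimizers.

    Every window of `w` consecutive k-mers contributes its lexicographically
    smallest k-mer (leftmost on ties). Neighboring windows usually pick the
    same k-mer, so far fewer k-mers are kept than exist in the sequence.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        picked.add(min(window, key=lambda i: (kmers[i], i)))
    return [(i, kmers[i]) for i in sorted(picked)]


def shared_minimizers(read_a, read_b, k, w):
    """K-mers selected as minimizers in both reads: candidate overlap anchors."""
    kmers_a = {kmer for _, kmer in minimizers(read_a, k, w)}
    kmers_b = {kmer for _, kmer in minimizers(read_b, k, w)}
    return kmers_a & kmers_b


# Toy sequences (made up for illustration only).
print(minimizers("ACGTACGT", k=3, w=4))
print(shared_minimizers("TTACGTACGTGG", "CCACGTACGTAA", k=3, w=4))
```

Reads that share a long enough exact substring select the same minimizers inside it, so candidate overlaps can be found by looking up shared minimizers in an index instead of aligning every pair of reads.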

