Erratum: Corrigendum: Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform

Zsolt Balázs; Dóra Tombácz; Attila Szűcs; Michael Snyder; Zsolt Boldogkői

doi:10.1038/sdata.2018.32

Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform

Scientific Data; doi:10.1038/sdata.2017.194; 2017; Vol 4 (1); Cited By ~ 13

Author(s): Zsolt Balázs; Dóra Tombácz; Attila Szűcs; Michael Snyder; Zsolt Boldogkői

Keyword(s): RNA Sequencing; Human Cytomegalovirus; Preparation Methods; Human Lung Fibroblast; RNA Molecules; Pacific Biosciences; The Pacific; Sequencing Studies; Long Read; Indispensable Tool

Abstract Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced on the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules, and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files, and combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.

High-coverage, long-read sequencing of Han Chinese trio reference samples

doi:10.1101/562611; 2019; Cited By ~ 1

Author(s): Ying-Chih Wang; Nathan D Olson; Gintaras Deikus; Hardik Shah; Aaron M Wenger; ...

Keyword(s): Single Molecule; Han Chinese; High Coverage; Link Type; Pacific Biosciences; The Pacific; Chinese Descent; Long Read; Reference Samples

Abstract Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome in a Bottle (GIAB) consortium portfolio. The dataset was generated using the Pacific Biosciences Sequel System. The son and each parent were sequenced to an average coverage of 60× and 30×, respectively, with N50 subread lengths between 16 and 18 kb. Raw reads and reads aligned to both GRCh37 and GRCh38 are available at the NCBI GIAB ftp site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/), and the raw read data are archived in NCBI SRA (SRX4739017, SRX4739121, and SRX4739122). This dataset is available for anyone to develop and evaluate long-read bioinformatics methods.
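The N50 subread length quoted in this abstract is a standard summary statistic: the length L such that reads of length L or longer contain at least half of all sequenced bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the example lengths are hypothetical and are not taken from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths (in bases).

    N50 is the length of the read at which the running total, taken
    over reads sorted from longest to shortest, first reaches half of
    the total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical subread lengths, for illustration only.
example_lengths = [2000, 3000, 4000, 5000, 6000]
print(n50(example_lengths))
```

Unlike the mean read length, N50 weights each read by the number of bases it contributes, so many short subreads pull it down less than they would pull down the mean.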

Genome sequences of human cytomegalovirus strain TB40/E variants propagated in fibroblasts and epithelial cells

Virology Journal; doi:10.1186/s12985-021-01583-3; 2021; Vol 18 (1)

Author(s): Ahmed Al Qaffas; Salvatore Camiolo; Mai Vo; Alexis Aguiar; Amine Ourahmane; ...

Keyword(s): Epithelial Cells; Human Cytomegalovirus; Viral Entry; Sequence Data; Laboratory Strain; Serial Passage; Wild Type Virus; Protein Coding; Genetic Changes; Long Read

Abstract The advent of whole genome sequencing has revealed that common laboratory strains of human cytomegalovirus (HCMV) have major genetic deficiencies resulting from serial passage in fibroblasts. In particular, tropism for epithelial and endothelial cells is lost due to mutations disrupting genes UL128, UL130, or UL131A, which encode subunits of a virion-associated pentameric complex (PC) important for viral entry into these cells but not for entry into fibroblasts. The endothelial cell-adapted strain TB40/E has a relatively intact genome and has emerged as a laboratory strain that closely resembles wild-type virus. However, several heterogeneous TB40/E stocks and cloned variants exist that display a range of sequence and tropism properties. Here, we report the use of PacBio sequencing to elucidate the genetic changes that occurred, both at the consensus level and within subpopulations, upon passaging a TB40/E stock on ARPE-19 epithelial cells. The long-read data also facilitated examination of the linkage between mutations. Consistent with inefficient ARPE-19 cell entry, at least 83% of viral genomes present before adaptation contained changes impacting PC subunits. In contrast, and consistent with the importance of the PC for entry into endothelial and epithelial cells, genomes after adaptation lacked these or additional mutations impacting PC subunits. The sequence data also revealed six single noncoding substitutions in the inverted repeat regions, single nonsynonymous substitutions in genes UL26, UL69, US28, and UL122, and a frameshift truncating gene UL141. Among the changes affecting protein-coding regions, only the one in UL122 was strongly selected. This change, resulting in a D390H substitution in the encoded protein IE2, has been previously implicated in rendering another viral protein, UL84, essential for viral replication in fibroblasts. This finding suggests that IE2, and perhaps its interactions with UL84, have important functions unique to HCMV replication in epithelial cells.

Dual Isoform Sequencing Reveals a Multifaceted Transcriptional Architecture of a Prototype Baculovirus

doi:10.21203/rs.3.rs-637036/v1; 2021

Author(s): Gábor Torma; Dóra Tombácz; Norbert Moldován; Ádám Fülöp; István Prazsák; ...

Keyword(s): Protein Coding; RNA Molecules; Non Coding RNA; Oxford Nanopore; The Pacific; Viral Genes; Long Read; Oxford Nanopore Technologies; Overlapping Transcripts

Abstract In this study, we used two long-read sequencing (LRS) techniques, Sequel from Pacific Biosciences and MinION from Oxford Nanopore Technologies, for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS is able to read full-length RNA molecules, and thereby to distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcripts, of which 759 are novel and 116 have been annotated previously. These RNA molecules include 41 novel putative protein-coding transcripts (each containing 5′-truncated in-frame ORFs), 14 monocistronic transcripts, 99 multicistronic RNAs, 101 non-coding RNAs, and 504 length isoforms. We also detected RNA methylation in 12 viral genes and RNA hyper-editing in the longer 5′-UTR transcript isoform of the ORF 19 gene.
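The transcript counts in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the five listed categories partition the 759 novel transcripts, which the abstract implies but does not state explicitly; the variable names are hypothetical.

```python
# Counts as reported in the abstract.
novel_categories = {
    "putative protein-coding (5'-truncated ORFs)": 41,
    "monocistronic": 14,
    "multicistronic": 99,
    "non-coding": 101,
    "length isoforms": 504,
}
previously_annotated = 116

# Assumption: the five categories above are exactly the novel transcripts.
novel_total = sum(novel_categories.values())
all_transcripts = novel_total + previously_annotated

print(novel_total, all_transcripts)
```

Under that assumption the category counts add up to the reported 759 novel transcripts, and adding the 116 previously annotated ones gives the reported total of 875.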

Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation

BioTechniques; doi:10.2144/000113962; 2012; Vol 53 (6); Cited By ~ 33

Author(s): Paul Coupland; Tamir Chandra; Mike Quail; Wolf Reik; Harold Swerdlow

Keyword(s): Direct Sequencing; Library Preparation; Pacific Biosciences; The Pacific

High-Quality Whole-Genome Sequence of an Estradiol-Degrading Strain, Novosphingobium tardaugens NBRC 16725

Microbiology Resource Announcements; doi:10.1128/mra.01715-18; 2019; Vol 8 (11); Cited By ~ 3

Author(s): J. Ibero; D. Sanz; B. Galán; E. Díaz; J. L. García

Keyword(s): Genome Sequence; Complete Sequence; Whole Genome Sequence; Whole Genome; High Quality; Content Type; Pacific Biosciences; Degrading Bacterium; The Pacific

In this work we report the complete sequence and assembly of the estradiol-degrading bacterium Novosphingobium tardaugens NBRC 16725 genome into a single contig using the Pacific Biosciences RS II system.

Microsatellite marker discovery using single molecule real-time circular consensus sequencing on the Pacific Biosciences RS

BioTechniques; doi:10.2144/000114104; 2013; Vol 55 (5); Cited By ~ 15

Author(s): Markus A. Grohme; Roberto Frias Soler; Michael Wink; Marcus Frohme

Keyword(s): Real Time; Microsatellite Marker; Single Molecule; Pacific Biosciences; The Pacific; Marker Discovery; Circular Consensus Sequencing

Exploring DNA structures in real-time polymerase kinetics using Pacific Biosciences sequencer data

doi:10.1101/001024; 2013

Author(s): Sterling Sawaya; James Boocock; Mik Black; Neil Gemmell

Keyword(s): Real Time; DNA Sequences; Double Helix; DNA Structure; R Package; DNA Structures; Pacific Biosciences; Tandem Repeat Sequences; The Pacific; Polymerase Pausing

Pausing of DNA polymerase can indicate the presence of a DNA structure that differs from the canonical double helix. Here we detail a method to investigate how polymerase pausing in the Pacific Biosciences sequencer reads can be related to DNA structure. The Pacific Biosciences sequencer uses optics to view a polymerase and its interaction with a single DNA molecule in real time, offering a unique way to detect potential alternative DNA structures. We have developed a new way to examine polymerase kinetics and relate them to the DNA sequence by using a wavelet transform of read information from the sequencer. We use this method to examine how polymerase kinetics are related to nucleotide base composition. We then examine tandem repeat sequences known for their ability to form different DNA structures: (CGG)n and (CG)n repeats which can, respectively, form G-quadruplex DNA and Z-DNA. We find pausing around the (CGG)n repeat that may indicate the presence of G-quadruplexes in some of the sequencer reads. The (CG)n repeat does not appear to cause polymerase pausing, but its kinetic signature nevertheless suggests the possibility that alternative nucleotide conformations may sometimes be present. We discuss the implications of using our method to discover DNA sequences capable of forming alternative structures. The analyses presented here can be reproduced on any Pacific Biosciences kinetics data for any DNA pattern of interest using an R package that we have made publicly available.

Preparation of next-generation DNA sequencing libraries from ultra-low amounts of input DNA: Application to single-molecule, real-time (SMRT) sequencing on the Pacific Biosciences RS II

doi:10.1101/003566; 2014; Cited By ~ 4

Author(s): Castle Raley; David Munroe; Kristie Jones; Yu-Chih Tsai; Yan Guo; ...

Keyword(s): DNA Sequencing; Single Molecule; SMRT Sequencing; High Input; Single Molecule Sequencing; Pacific Biosciences; Preparation Conditions; PacBio RS II; The Pacific; The Common

We have developed and validated an amplification-free method for generating DNA sequencing libraries from very low amounts of input DNA (500 picograms to 20 nanograms) for single-molecule sequencing on the Pacific Biosciences (PacBio) RS II sequencer. The common challenge of high input requirements for single-molecule sequencing is overcome by using a carrier DNA in conjunction with optimized sequencing preparation conditions and re-use of the MagBead-bound complex. Here we describe how this method can be used to produce sequencing yields comparable to those generated from standard input amounts while using 1000-fold less starting material.

Integrative profiling of Epstein–Barr virus transcriptome using a multiplatform approach

Virology Journal; doi:10.1186/s12985-021-01734-6; 2022; Vol 19 (1)

Author(s): Ádám Fülöp; Gábor Torma; Norbert Moldován; Kálmán Szenthe; Ferenc Bánáti; ...

Keyword(s): Epstein Barr Virus; Open Reading Frames; Integrative Approach; Splice Isoforms; Sequencing Platform; Barr Virus; Pacific Biosciences; Sequencing Technologies; Long Read; Epstein Barr

Abstract Background: Epstein–Barr virus (EBV) is an important human pathogenic gammaherpesvirus with carcinogenic potential. The EBV transcriptome has previously been analyzed using both Illumina-based short-read sequencing and Pacific Biosciences RS II-based long-read sequencing technologies. Since the various sequencing methods have distinct strengths and limitations, the use of multiplatform approaches has proven to be valuable. The aim of this study is to provide a more complete picture of the transcriptomic architecture of EBV. Methods: In this work, we apply the Oxford Nanopore Technologies MinION long-read sequencing (LRS) platform for the generation of novel transcriptomic data, and integrate these with data generated by others using another LRS approach, Pacific Biosciences RS II sequencing, as well as the Illumina CAGE-Seq and Poly(A)-Seq approaches. Both amplified and non-amplified cDNA sequencing were applied for the generation of sequencing reads, including both oligo(dT)-primed and random oligonucleotide-primed reverse transcription. EBV transcripts are identified and annotated using the LoRTIA software suite developed in our laboratory. Results: This study detected novel genes embedded into longer host genes containing 5′-truncated in-frame open reading frames, which potentially encode N-terminally truncated proteins. We also detected a number of novel non-coding RNAs and transcript length isoforms encoded by the same genes but differing in their start and/or end sites. This study also reports the discovery of novel splice isoforms, many of which may represent altered coding potential, and of novel replication-origin-associated transcripts. Additionally, novel mono- and multigenic transcripts were identified. An intricate meshwork of transcriptional overlaps was revealed. Conclusions: An integrative approach applying multi-technique sequencing technologies is suitable for reliable identification of complex transcriptomes because each technique has different advantages and limitations, and they can be used for the validation of the results obtained by a particular approach.
