Determining the Suitability of MinION’s Direct RNA and DNA Amplicon Sequencing for Viral Subtype Identification

Deborah M. Leigh; Christopher Schefer; Carolina Cornejo

doi:10.3390/v12080801

Viruses ◽

10.3390/v12080801 ◽

2020 ◽

Vol 12 (8) ◽

pp. 801

Author(s):

Deborah M. Leigh ◽

Christopher Schefer ◽

Carolina Cornejo

Keyword(s):

DNA Sequencing ◽

RNA Sequencing ◽

Error Rate ◽

De Novo ◽

Sequence Divergence ◽

Virus Detection ◽

Amplicon Sequencing ◽

Sequencing Error ◽

Consensus Sequences ◽

Subtype Identification

The MinION sequencer is increasingly being used for the detection and outbreak surveillance of pathogens due to its rapid throughput. For RNA viruses, MinION’s new direct RNA sequencing is the next significant development. Direct RNA sequencing studies are currently limited and comparisons of its diagnostic performance relative to different DNA sequencing approaches are lacking as a result. We sought to address this gap and sequenced six subtypes from the mycovirus CHV-1 using MinION’s direct RNA sequencing and DNA sequencing based on a targeted viral amplicon. Reads from both techniques could correctly identify viral presence and species using BLAST, though direct RNA reads were more frequently misassigned to closely related CHV species. De novo consensus sequences were error prone but suitable for viral species identification. However, subtype identification was less accurate from both reads and consensus sequences. This is due to the high sequencing error rate and the limited sequence divergence between some CHV-1 subtypes. Importantly, neither RNA nor amplicon sequencing reads could be used to obtain reliable intra-host variants. Overall, both sequencing techniques were suitable for virus detection, though limitations are present due to the error rate of MinION reads.

Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

10.7287/peerj.preprints.778v1 ◽

2015 ◽

Cited By ~ 5

Author(s):

Patrick D Schloss ◽

Sarah L Westcott ◽

Matthew L Jenior ◽

Sarah K Highlander

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

DNA Sequencing ◽

Error Rate ◽

Sequencing Error ◽

rRNA Gene ◽

Sequencing Data ◽

Mock Community ◽

Sequencing Platforms ◽

The 16S rRNA Gene

Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high-quality but short sequences. These platforms have allowed researchers to significantly improve the design of their experiments. The tradeoff has been a decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The synthetic mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 2.16% to 0.32%. Unfortunately, this error rate was still 16 times higher than the error rate that has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the longer reads frequently provided better classification, the wider adoption of this approach for 16S rRNA gene sequencing is likely limited by its high sequencing error and low yield of sequencing data relative to the other available platforms.

De novo Sequencing of Novel Mycoviruses From Fusarium sambucinum: An Attempt on Direct RNA Sequencing of Viral dsRNAs

Frontiers in Microbiology ◽

10.3389/fmicb.2021.641484 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yukiyoshi Mizutani ◽

Kazuma Uesaka ◽

Ayane Ota ◽

Matteo Calassanzio ◽

Claudio Ratti ◽

...

Keyword(s):

RNA Sequencing ◽

Reading Ability ◽

High Throughput Sequencing ◽

De Novo ◽

Genomic Sequence ◽

Homology Search ◽

Fusarium Sambucinum ◽

Consensus Sequences ◽

Molecular Features ◽

Wide Range

An increasing number of viruses are continuously being found in a wide range of organisms, including fungi. Recent studies have revealed a wide viral diversity in microbes and a potential importance of these viruses in the natural environment. Although virus exploration has been accelerated by short-read, high-throughput sequencing (HTS), viral de novo sequencing is still challenging because of several biological and molecular features, such as micro-diversity and the secondary structure of RNA genomes. This study conducted de novo sequencing of multiple double-stranded RNA (dsRNA) elements that were obtained from fungal viruses infecting two Fusarium sambucinum strains, FA1837 and FA2242, using conventional HTS and long-read direct RNA sequencing (DRS). De novo assembly of the read data from both technologies generated near-complete genomic sequences of the viruses, and the sequence homology search and phylogenetic analysis suggested that these represented novel species of the Hypoviridae, Totiviridae, and Mitoviridae families. However, the DRS-based consensus sequences contained numerous indel errors that differed from the HTS consensus sequences, and these errors hampered accurate open reading frame (ORF) prediction. Although, at its present performance, DRS is not yet mature enough to determine viral genome sequences, DRS shows great potential as a user-friendly platform for one-shot, whole-genome sequencing of RNA viruses due to its long-read capability and relatively structure-tolerant nature.

353 ASAS-EAAP Talk: Low-coverage whole-genome sequencing in local livestock breeds

Journal of Animal Science ◽

10.1093/jas/skaa278.149 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 81-82

Author(s):

Joaquim Casellas ◽

Melani Martín de Hijas-Villalba ◽

Marta Vázquez-Gómez ◽

Samir Id Lahoucine

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Error Rate ◽

Allele Frequencies ◽

Paternity Testing ◽

Sequencing Error ◽

Whole Genome ◽

Genomic Evaluation ◽

Sequencing Error Rate ◽

Low Coverage

Current European regulations for autochthonous livestock breeds put a special emphasis on pedigree completeness, which requires laboratory paternity testing by genetic markers in most cases. This entails significant economic expenditure for breed societies and precludes other investments in breeding programs, such as genomic evaluation. Within this context, we developed paternity testing through low-coverage whole-genome data in order to reuse these data for genomic evaluation at no cost. Simulations relied on diploid genomes composed of 30 chromosomes (100 cM each) with 3,000,000 SNP per chromosome. Each population evolved over 1,000 non-overlapping generations with effective size 100, mutation rate 10⁻⁴, and recombination by Kosambi's function. Only those populations with 1,000,000 ± 10% polymorphic SNP per chromosome in generation 1,000 were retained for further analyses, and expanded to the required number of parents and offspring. Individuals were sequenced at 0.01, 0.05, 0.1, 0.5 and 1X depth, with 100, 500, 1,000 or 10,000 base-pair reads and by assuming a random sequencing error rate per SNP between 10⁻² and 10⁻⁵. Assuming known allele frequencies in the population and sequencing error rate, 0.05X depth sufficed to corroborate the true father (85.0%) and to discard other candidates (96.3%). Those percentages increased up to 99.6% and 99.9% with 0.1X depth, respectively (read length = 10,000 bp; smaller read lengths slightly improved the results because they increase the number of sequenced SNP). Results were highly sensitive to biases in allele frequencies and robust to inaccuracies regarding sequencing error rate. Low-coverage whole-genome sequencing data could be subsequently integrated into genomic BLUP equations by appropriately constructing the genomic relationship matrix. This approach increased the correlation between simulated and predicted breeding values by 1.21% (h² = 0.25; 100 parents and 900 offspring; 0.1X depth by 10,000 bp reads). Although small, this increase opens the door to genomic evaluation in local livestock breeds.

Accurate long-read de novo assembly evaluation with Inspector

Genome Biology ◽

10.1186/s13059-021-02527-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yu Chen ◽

Yixin Zhang ◽

Amy Y. Wang ◽

Min Gao ◽

Zechen Chong

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

In Silico ◽

Large Scale ◽

De Novo ◽

Small Scale ◽

De Novo Genome Assembly ◽

Consensus Sequences ◽

Assembly Evaluation ◽

Long Read

Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator which faithfully reports types of errors and their precise locations. Notably, Inspector can correct the assembly errors based on consensus sequences derived from raw reads covering erroneous regions. Based on in silico and long-read assembly results from multiple long-read datasets and assemblers, we demonstrate that in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.

Characterization of an unusually conserved AluI highly reiterated DNA sequence family from the honeybee, Apis mellifera.

Genetics ◽

10.1093/genetics/134.4.1195 ◽

1993 ◽

Vol 134 (4) ◽

pp. 1195-1204

Author(s):

S Tarès ◽

J M Cornuet ◽

P Abad

Keyword(s):

Apis Mellifera ◽

DNA Sequence ◽

DNA Sequences ◽

Sequence Data ◽

Sequence Divergence ◽

Repeated Sequence ◽

Consensus Sequences ◽

DNA Sequence Data ◽

Repeat Class ◽

Honeybee Subspecies

An AluI family of highly reiterated nontranscribed sequences has been found in the genome of the honeybee Apis mellifera. This repeated sequence is shown to be present at approximately 23,000 copies per haploid genome, constituting about 2% of the total genomic DNA. The nucleotide sequence of 10 monomers was determined. The consensus sequence is 176 nucleotides long and has an A + T content of 58%. There are clusters of both direct and inverted repeats. Internal subrepeating units ranging from 11 to 17 nucleotides are observed, suggesting that it could have evolved from a shorter sequence. DNA sequence data reveal that this repeat class is unusually homogeneous compared to the other class of invertebrate highly reiterated DNA sequences. The average pairwise sequence divergence between the repeats is 2.5%. In spite of this unusual homogeneity, divergence has been found in the repeated sequence hybridization ladder between four different honeybee subspecies. Therefore, the AluI highly reiterated sequences provide a new probe for fingerprinting in A. m. mellifera.

Phylogenetic Study of Bufonidae (Amphibia: Anura) from Sumatera and Asia Based on Cytochrome B Gene

Journal of Tropical Life Science ◽

10.11594/jtls.11.03.15 ◽

2021 ◽

Vol 11 (3) ◽

pp. 383-387

Author(s):

Djong Tjong ◽

Dewi Roesma ◽

Silvia Indra ◽

...

Keyword(s):

DNA Sequencing ◽

Cytochrome B ◽

Cell Biology ◽

Sequence Divergence ◽

Phylogenetic Study ◽

Tissue Samples ◽

Duttaphrynus Melanostictus ◽

Level Increase ◽

West Sumatra ◽

Biology Laboratory

Bufonidae is an amphibian family that is widely distributed around the world, including Indonesia (Sumatra) and Asia. Sumatra is an island that became separated by a rise in sea level, which created a barrier to gene flow between amphibians in Sumatra and Asia. The purpose of this research is to analyse the phylogeny of Bufonidae from several locations in Sumatra and Asia based on the cytochrome b gene. Samples were collected from six localities in West Sumatra. DNA extraction and amplification were carried out in the Genetics and Cell Biology Laboratory, Andalas University, whereas DNA sequencing was performed at MacroGen USA DNA Sequencing Laboratory, South Korea. About 14 tissue samples of Bufonidae from West Sumatra were analyzed. The Bufonidae from Sumatra and Asia were divided into three main phylogenetic clusters. Duttaphrynus melanostictus in Sumatra is closer to Duttaphrynus melanostictus from India, with 0.3-0.5% sequence divergence, and it forms a paraphyletic group with Duttaphrynus melanostictus from Vietnam, Taiwan, and China.

Workstation benchmark of Spark Capable Genome Analysis ToolKit 4 Variant Calling

10.1101/2020.05.17.101105 ◽

2020 ◽

Author(s):

Marcus H. Hansen ◽

Anita T. Simonsen ◽

Hans B. Ommen ◽

Charlotte G. Nyvold

Keyword(s):

DNA Sequencing ◽

Genome Analysis ◽

High Speed ◽

High Performance ◽

Variant Calling ◽

Amplicon Sequencing ◽

Targeted Sequencing ◽

Sequencing Analysis ◽

Genome Analysis Toolkit ◽

Order Of Magnitude

Background: Rapid and practical DNA-sequencing processing has become essential for modern biomedical laboratories, especially in the fields of cancer, pathology and genetics. While sequencing turn-over time has been, and still is, a bottleneck in research and diagnostics, the field of bioinformatics is moving at a rapid pace, both in terms of hardware and software development. Here, we benchmarked the local performance of three of the most important Spark-enabled Genome Analysis Toolkit 4 (GATK4) tools in a targeted sequencing workflow: duplicate marking, base quality score recalibration (BQSR) and variant calling on targeted DNA sequencing using a modest hyperthreading 12-core single CPU and a high-speed PCI express solid-state drive. Results: Compared to the previous GATK version, the performance of Spark-enabled BQSR and HaplotypeCaller is shifted towards a more efficient usage of the available CPU cores and outperforms the earlier GATK3.8 version with an order of magnitude reduction in processing time to analysis-ready variants, whereas MarkDuplicatesSpark was found to be thrice as fast. Furthermore, HaplotypeCallerSpark and BQSRPipelineSpark were significantly faster than the equivalent GATK4 standard tools with a combined ∼86% reduction in execution time, reaching a median rate of ten million processed bases per second, and the execution time of duplicate marking was reduced by ∼42%. The called variants were found to be in close agreement between the Spark and non-Spark versions, with an overall concordance of 98%. In this setup, the tools were also highly efficient when compared with execution on a small 72 virtual CPU/18-node Google Cloud cluster. Conclusion: GATK4 offers practical parallelization possibilities for DNA sequence processing, and the Spark-enabled tools optimize performance and utilization of local CPUs. Spark-enabled GATK variant calling is several times faster than previous GATK3.8 multithreading with the same multi-core, single-CPU configuration. The improved opportunities for parallel computations not only hold implications for high-performance clusters, but also for modest laboratory or research workstations for targeted sequencing analysis, such as exome, panel or amplicon sequencing.

Abstract 193: RNA-sequencing of Disease-specific iPSC as a New Filter to Identify Genes Associated With Hypoplastic Left Heart Syndrome

Circulation Research ◽

10.1161/res.117.suppl_1.193 ◽

2015 ◽

Vol 117 (suppl_1) ◽

Author(s):

Almudena Martinez Fernandez ◽

Xing Li ◽

Jeanne L Theis ◽

Andre Terzic ◽

Timothy M Olson ◽

...

Keyword(s):

RNA Sequencing ◽

Hypoplastic Left Heart Syndrome ◽

De Novo ◽

Nuclear Family ◽

Cardiac Differentiation ◽

Left Heart ◽

Hypoplastic Left Heart ◽

Differentiated Cells ◽

Left Heart Syndrome

Hypoplastic Left Heart Syndrome (HLHS) is a complex multifactorial disease for which no definitive genetic causes have been found. Current genetic filtering strategies render lists of genes with unknown relevance in terms of pathogenesis. A complementary filter based on biological evidence would create a new approach to prioritize relevant candidate genes and mutations. In our study, 5 members of a nuclear family including a child with HLHS were evaluated using echocardiography and their genetic information was obtained through whole genome sequencing (WGS). Data filtering including rarity, functional impact and mode of inheritance was implemented, resulting in identification of 34 genes with recessive or de novo variants potentially involved in the pathogenesis of HLHS. Additionally, iPSC were derived from proband and parents and subjected to RNA-sequencing at the undifferentiated state and following spontaneous differentiation. Comparative transcriptional analyses identified genes differentially expressed in proband samples at each stage. These gene sets were used as an additional filter for the previously generated WGS data. This strategy revealed that out of 34 mutated genes originally identified, 10 displayed transcriptional differences in undifferentiated iPSC from the HLHS-affected individual, while 16 out of 34 mutated genes showed significantly different expression levels in differentiated cells from proband. Furthermore, expression dynamics were studied during guided cardiac differentiation for the 9 genes fulfilling all applied criteria. Two genes not previously linked to HLHS, ELF4 and HSPG2, were found to behave significantly differently in HLHS-iPSC when compared to control counterparts. In summary, filtering WGS data according to a new layer of transcriptional information that leverages iPSC plasticity allows prioritization of genes associated with HLHS in an in vitro model of disease.

Expanding the molecular taxonomy of NUT midline carcinomas with multiomic analyses.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.e21008 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. e21008-e21008

Author(s):

Henry G. Kaplan ◽

Alex Barrett ◽

Jiaxin Niu ◽

Somasundaram Subramaniam ◽

Maria Matsangou

Keyword(s):

DNA Sequencing ◽

RNA Sequencing ◽

Molecular Taxonomy ◽

Fusion Partner ◽

Targeted Proteomics ◽

Single Nucleotide Variants ◽

Proteomic Data ◽

Spatial Profiling ◽

Tumor Mutational Burden ◽

Custom Panel

Background: NUT midline carcinoma (NMC) is an aggressive squamous cell carcinoma molecularly defined by a chromosomal rearrangement of nuclear protein in testis (NUTM1) with bromodomain-containing protein 3 or 4 (BRD3/4). While NMCs are characterized by this rare canonical gene rearrangement, little is known about the transcriptome and proteome of this rare disease. As such, we set out to comprehensively characterize five NMC cases in which we attained targeted DNA sequencing, full-transcriptome RNA sequencing, and targeted proteomics. We further examine and integrate these results in order to better understand the relationship between gene expression and protein abundance within the context of NMC. Methods: All cases were analyzed for genomic and transcriptomic alterations against a custom panel via the Tempus xT tissue biopsy assay (DNA sequencing of 648 genes in tumor and matched normal samples at 500x depth and full-transcriptome RNA sequencing) for germline and/or somatic mutations. The xT assay detects single nucleotide variants, specific insertions/deletions, amplifications and gene fusions, as well as tumor mutational burden (TMB) and microsatellite instability (MSI) status. Proteomic data were obtained utilizing digital spatial profiling through Nanostring immune, MAPK and PI3/AKT, and pan tumor nCounter GeoMix panels. Results: Clinical characteristics, histology, and genomic/proteomic alterations for 5 NMC cases are presented. Cases were defined by pathological assessment and the identification of the canonical NUTM1 fusion, further broken down by fusion partner with three patients having NUTM1-BRD4 fusions, one NUT-BRD3, and one NUT-ZMYND8. TMBs ranged from 0.8-.6 mutations/megabase (n=5). All patients were MSI stable (5/5). Of three patients with available PD-L1 IHC results, one had elevated PD-L1 tumor staining at 70%. Results will be presented from full-transcriptome RNA expression analysis indicating overexpression of BRAF, MYC, mTOR, and EGFR, among others. Targeted proteomics were performed to assess relative abundance at the protein level (results to be presented). Clinical follow-up for the five patients revealed that two have survived beyond 7 months. A lung primary patient treated with surgical resection and post-op radiation (XRT) is NED at 63 months. A sinus primary patient is NED at 16 months after a partial response (PR) to taxotere/5FU/cisplatin followed by resection and XRT/cisplatin. One patient had a brief PR from ifosfamide/etoposide/vorinostat. One patient's tumor grew through XRT/cisplatin. Conclusions: Multi-omic analysis has the potential to further elucidate the mechanisms of tumor growth in NMC and identify new targets for the treatment of this aggressive and poor-prognosis disease.

A mushroom-inducing DNA sequence isolated from the Basidiomycete, Schizophyllum commune.

Genetics ◽

10.1093/genetics/129.3.707 ◽

1991 ◽

Vol 129 (3) ◽

pp. 707-716 ◽

Cited By ~ 1

Author(s):

J S Horton ◽

C A Raper

Keyword(s):

DNA Sequence ◽

DNA Sequences ◽

De Novo ◽

Schizophyllum Commune ◽

Sequence Divergence ◽

Fruiting Bodies ◽

Genomic DNAs ◽

Different Strains ◽

Mating Interaction ◽

Normal Requirement

A DNA sequence capable of inducing the de novo development of fruiting bodies (mushrooms) when integrated into the genome of unmated, nonfruiting strains of the Basidiomycete Schizophyllum commune has been isolated and partially characterized. This sequence, designated FRT1, overrides the normal requirement of a mating interaction for fruiting in this organism. It has been shown to integrate stably in different chromosome locations and appears to be trans-acting. It also enhances the normal process of fruiting that occurs after mating. Additional DNA sequences with similarity to FRT1 were detected within the genome of the strain of origin by hybridization of labeled FRT1 DNA to blots of digested genomic DNAs. FRT1 and the genomic sequences similar to it were shown to be genetically linked. Southern hybridization experiments suggested sequence divergence at the FRT1 locus between different strains of S. commune. A testable model for how FRT1 may act as a key element in the pathway for the differentiation of fruiting bodies is presented as a working hypothesis for further investigation.
