Considerations for Optimization of High-Throughput Sequencing Bioinformatics Pipelines for Virus Detection

Christophe Lambert; Cassandra Braxton; Robert Charlebois; Avisek Deyati; Paul Duncan; Fabio La Neve; Heather Malicki; Sebastien Ribrioux; Daniel Rozelle; Brandye Michaels; Wenping Sun; Zhihui Yang; Arifa Khan

doi:10.3390/v10100528

Viruses, 2018, Vol 10 (10), pp. 528. doi:10.3390/v10100528. Cited by ~9.

Author(s): Christophe Lambert; Cassandra Braxton; Robert Charlebois; Avisek Deyati; Paul Duncan; ...

Keyword(s): High Throughput; High Throughput Sequencing; Computational Cost; Virus Detection; Computation Time; Memory Storage; Risk Level; Bioinformatics Pipeline; Sequence Identification; Data Volume

High-throughput sequencing (HTS) has demonstrated capabilities for broad virus detection based upon discovery of known and novel viruses in a variety of samples, including clinical, environmental, and biological. An important goal for HTS applications in biologics is to establish parameter settings that can afford adequate sensitivity at an acceptable computational cost (computation time, computer memory, storage, expense, and/or efficiency) at critical steps in the bioinformatics pipeline, including initial data quality assessment, trimming/cleaning, and assembly (to reduce data volume and increase the likelihood of appropriate sequence identification). Additionally, the quality and reliability of the results depend on the availability of a complete and curated viral database; selection of sequence alignment programs and configurations that retain specificity for broad virus detection with reduced false-positive signals; removal of host sequences without loss of endogenous viral sequences of interest; and use of a meaningful reporting format that retains critical information from the analysis so that data are readily interpretable and results actionable. Furthermore, after alignment, both automated and manual evaluation may be needed to verify the results and help assign a potential risk level to residual, unmapped reads. We hope that the collective considerations discussed in this paper will aid the optimization of data analysis pipelines for virus detection by HTS.

Virus Detection by High-Throughput Sequencing of Small RNAs: Large-Scale Performance Testing of Sequence Analysis Strategies

Phytopathology, 2019, Vol 109 (3), pp. 488-497. doi:10.1094/phyto-02-18-0067-r. Cited by ~29.

Author(s): Sebastien Massart; Michela Chiumenti; Kris De Jonghe; Rachel Glover; Annelies Haegeman; ...

Keyword(s): High Throughput; Large Scale; High Throughput Sequencing; Performance Test; Virus Detection; Performance Testing; Plant Viruses; Reference Sequence; Bioinformatics Pipeline; Double Blind

Recent developments in high-throughput sequencing (HTS), also called next-generation sequencing (NGS), technologies and bioinformatics have drastically changed research on viral pathogens and spurred growing interest in the field of virus diagnostics. However, the reliability of HTS-based virus detection protocols must be evaluated before adopting them for diagnostics. Many different bioinformatics algorithms aimed at detecting viruses in HTS data have been reported, but little attention has been paid thus far to their sensitivity and reliability for diagnostic purposes. Therefore, we compared the ability of 21 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 12 plant viruses through a double-blind large-scale performance test using 10 datasets of 21- to 24-nucleotide small RNA (sRNA) sequences from three different infected plants. The sensitivity of virus detection ranged between 35 and 100% among participants, with a marked negative effect when sequence depth decreased. The false-positive detection rate was very low and mainly related to the identification of host genome-integrated viral sequences or misinterpretation of the results. Reproducibility was high (91.6%). This work revealed the key influence of bioinformatics strategies for the sensitive detection of viruses in HTS sRNA datasets and, more specifically, (i) the difficulty in detecting viral agents when they are novel or their sRNA abundance is low, (ii) the influence of key parameters at both assembly and annotation steps, (iii) the importance of completeness of reference sequence databases, and (iv) the significant level of scientific expertise needed when interpreting pipeline results. Overall, this work underlines key parameters and proposes recommendations for reliable sRNA-based detection of known and unknown viruses.
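As an illustration of how per-participant sensitivity and false positives in a performance test like this could be tallied, the following is a minimal sketch. It assumes sensitivity is the fraction of expected viruses that a pipeline reports; the function names and virus labels are hypothetical and are not taken from the study.

```python
def detection_sensitivity(expected, reported):
    """Return the fraction (0.0 to 1.0) of expected viruses that were reported."""
    expected_set = set(expected)
    if not expected_set:
        raise ValueError("expected must contain at least one virus")
    # True positives are viruses that are both expected and reported.
    return len(expected_set & set(reported)) / len(expected_set)


def false_positives(expected, reported):
    """Return the sorted list of reported viruses that were not expected."""
    return sorted(set(reported) - set(expected))


# Hypothetical dataset: four viruses present, one pipeline's report.
expected_viruses = ["virus_A", "virus_B", "virus_C", "virus_D"]
pipeline_report = ["virus_A", "virus_B", "virus_X"]

sensitivity = detection_sensitivity(expected_viruses, pipeline_report)
extras = false_positives(expected_viruses, pipeline_report)
print(f"sensitivity = {sensitivity:.0%}, false positives = {extras}")
```

Using sets keeps the tally independent of the order and duplication of reported names, although real pipelines would also need to reconcile differing virus nomenclature before comparison.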

Broad-Range Virus Detection and Discovery Using Microfluidic PCR Coupled with High-throughput Sequencing

2020. doi:10.1101/2020.06.10.145052.

Author(s): Ying Tao; Clinton R. Paden; Krista Queen; Jing Zhang; Eishita Tyagi; ...

Keyword(s): High Throughput; High Throughput Sequencing; Virus Detection; Viral Pathogens; Large Numbers; Straightforward Method; Sequence Identification; PCR Assays; Taxonomic Groups; Next Generation Sequencing (NGS)

There is a need for a comprehensive and sensitive method to test for a broad range of viral pathogens in samples without any identifiable pathogen detected. Real-time PCR assays are sensitive and rapid, but their specificity limits their utility in detecting divergent agents. Shotgun high-throughput sequencing methods provide unbiased sequence identification; however, they have limited sensitivity and require complex analyses. To meet the need for a sensitive, high-throughput virus detection and discovery platform, we combine two existing technologies, broadly-reactive consensus-degenerate pan-viral group PCR and the MiSeq sequencer (Illumina), using the Access Array (Fluidigm), a commercially-available microfluidic PCR system. Pan-viral group primers target conserved regions of virus taxonomic groups and can amplify known and potentially novel species. The Access Array employs dozens of these assays in parallel, which are then sequenced all at once on the MiSeq. In this study, we run a respiratory panel of pan-viral group PCR assays using AA-PCR-Seq. We validate the panel on a collection of representative human and animal samples, comparing it to qPCR and shotgun next-generation sequencing (NGS). AA-PCR-Seq provides a robust, straightforward method for screening large numbers of samples for virus detection and discovery.

Comparison of high throughput sequencing to standard protocols for virus detection in berry crops

Plant Disease, 2021. doi:10.1094/pdis-05-21-0949-re.

Author(s): Dan Edward Veloso Villamor; Karen E Keller; Robert Martin; Ioannis Emmanouil Tzanetakis

Keyword(s): High Throughput; High Throughput Sequencing; Wide Spectrum; Virus Detection; RT-PCR; Host Interactions; Growing Seasons; Berry Crops; Sampling Times; Better Than

A comprehensive study comparing virus detection between high throughput sequencing (HTS) and standard protocols in 30 berry selections (12 Fragaria, 10 Vaccinium and 8 Rubus) with known virus profiles was completed. The study examined temporal detection of viruses at four sampling times encompassing two growing seasons. Within the standard protocols, RT-PCR proved better than biological indexing. Detection of known viruses by HTS and RT-PCR nearly mirrored each other. HTS provided superior detection compared to RT-PCR on a wide spectrum of virus variants and discovery of novel viruses. More importantly, in most cases where the two protocols showed parallel virus detection, 11 viruses in 16 berry selections were not consistently detected by both methods at all sampling points. Based on these data we propose a four sampling times/two-year testing requirement for berry and potentially other crops to ensure that no virus remains undetected independent of titer, distribution or other virus/virus or virus/host interactions.

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys, 2018, Vol 39, pp. 29-40. doi:10.3897/mycokeys.39.28109. Cited by ~21.

Author(s): Sten Anslan; R. Henrik Nilsson; Christian Wurzbacher; Petr Baldrian; Leho Tedersoo; ...

Keyword(s): High Throughput; High Throughput Sequencing; Computation Time; Potential Effect; Data Sets; Sequencing Data; Operational Taxonomic Units; High Throughput Sequencing Data; Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and thus fast accumulation of HTS data, there has been a growing need and interest for developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of specific bioinformatics processes largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Current Perspectives on High-Throughput Sequencing (HTS) for Adventitious Virus Detection: Upstream Sample Processing and Library Preparation

Viruses, 2018, Vol 10 (10), pp. 566. doi:10.3390/v10100566. Cited by ~9.

Author(s): Siemon Ng; Cassandra Braxton; Marc Eloit; Szi Feng; Romain Fragnoud; ...

Keyword(s): Sample Preparation; Nucleic Acids; High Throughput; High Throughput Sequencing; Virus Detection; Extraction Methods; Control Measures; Acid Extraction; Library Preparation; Sample Processing

A key step for broad viral detection using high-throughput sequencing (HTS) is optimizing the sample preparation strategy for extracting viral-specific nucleic acids since viral genomes are diverse: they can be single-stranded or double-stranded RNA or DNA, and can vary from a few thousand bases to over a million bases, which might introduce biases during nucleic acid extraction. In addition, viral particles can be enveloped or non-enveloped with variable resistance to pre-treatment, which may influence their susceptibility to extraction procedures. Since the identity of the potential adventitious agents is unknown prior to their detection, efficient sample preparation should be unbiased toward all different viral types in order to maximize the probability of detecting any potential adventitious viruses using HTS. Furthermore, the quality assessment of each step for sample processing is also a critical but challenging aspect. This paper presents our current perspectives for optimizing upstream sample processing and library preparation as part of the discussion in the Advanced Virus Detection Technologies Interest Group (AVDTIG). The topics include: use of nuclease treatment to enrich for encapsidated nucleic acids, techniques for amplifying low amounts of virus nucleic acids, selection of different extraction methods, relevant controls, the use of spike recovery experiments, and quality control measures during library preparation.

MiDRMpol: A High-Throughput Multiplexed Amplicon Sequencing Workflow to Quantify HIV-1 Drug Resistance Mutations against Protease, Reverse Transcriptase, and Integrase Inhibitors

Viruses, 2019, Vol 11 (9), pp. 806. doi:10.3390/v11090806.

Author(s): Shambhu G. Aralaguppe; Anoop T. Ambikan; Manickam Ashokkumar; Milner M. Kumar; Luke Elizabeth Hanna; ...

Keyword(s): Drug Resistance; High Throughput; Large Scale; High Throughput Sequencing; Polymorphism Analysis; Resistance Mutations; Subtype C; Bioinformatics Pipeline; Drug Resistance Mutations; HIV-1

The detection of drug resistance mutations (DRMs) in minor viral populations is of potential clinical importance. However, sophisticated computational infrastructure and competence for analysis of high-throughput sequencing (HTS) data are lacking at most diagnostic laboratories. Thus, we have proposed a new pipeline, MiDRMpol, to quantify DRMs from the HIV-1 pol region. The gag-vpu region of 87 plasma samples from HIV-infected individuals from three cohorts was amplified and sequenced by Illumina HiSeq2500. The sequence reads were adapter-trimmed, followed by analysis using in-house scripts. Samples from Swedish and Ethiopian cohorts were also sequenced by Sanger sequencing. The pipeline was validated against the online tool PASeq (Polymorphism Analysis by Sequencing). Based on an error rate of <1%, a value of >1% was set as reliable to consider a minor variant. Both pipelines detected the mutations in the dominant viral populations, while discrepancies were observed in minor viral populations. In five HIV-1 subtype C samples, minor mutations were detected at the <5% level by MiDRMpol but not by PASeq. MiDRMpol is a computationally efficient and labor-efficient bioinformatics pipeline for the detection of DRMs from HTS data. It identifies minor viral populations (<20%) of DRMs. Our method can be incorporated into large-scale surveillance of HIV-1 DRMs.
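The two frequency cutoffs mentioned in this abstract (>1% as the reliability threshold and <20% for minor viral populations) can be illustrated with a minimal sketch. This is not the MiDRMpol implementation, whose in-house scripts are not described here. The function name, the category labels, and the treatment of frequencies of 20% and above as "major" are assumptions made for illustration.

```python
# Cutoffs taken from the abstract; everything else below is hypothetical.
RELIABILITY_CUTOFF = 0.01      # variants must exceed 1% to be considered reliable
MINOR_POPULATION_LIMIT = 0.20  # minor viral populations are below 20%


def classify_variant(frequency):
    """Label a variant frequency (a fraction between 0 and 1) by population class."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be a fraction between 0 and 1")
    if frequency <= RELIABILITY_CUTOFF:
        return "unreliable"  # at or below the assumed error-rate floor
    if frequency < MINOR_POPULATION_LIMIT:
        return "minor"
    return "major"  # assumption: anything at or above 20% is not minor


# Hypothetical mutation frequencies for one sample.
for name, freq in [("mut_1", 0.005), ("mut_2", 0.04), ("mut_3", 0.65)]:
    print(name, classify_variant(freq))
```

Keeping the cutoffs as named constants makes it straightforward to re-run the classification if a different error rate is measured for a given sequencing run.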

High Throughput Sequencing For Plant Virus Detection and Discovery

Phytopathology, 2019, Vol 109 (5), pp. 716-725. doi:10.1094/phyto-07-18-0257-rvw. Cited by ~44.

Author(s): D. E. V. Villamor; T. Ho; M. Al Rwahnih; R. R. Martin; I. E. Tzanetakis

Keyword(s): High Throughput; Plant Virus; High Throughput Sequencing; Virus Detection; Agricultural Crops; Virus Discovery; Plant Virus Detection; Virus Biology; Disease Causality

Over the last decade, virologists have discovered an unprecedented number of viruses using high throughput sequencing (HTS), which led to the advancement of our knowledge on the diversity of viruses in nature, particularly unraveling the virome of many agricultural crops. However, these new virus discoveries have often widened the gaps in our understanding of virus biology, foremost among them the actual role of a new virus in disease, if any. Yet, when used critically in etiological studies, HTS is a powerful tool to establish disease causality between the virus and its host. Conversely, with globalization, movement of plant material is increasingly common and often a point of dispute between countries. HTS could potentially resolve these issues given its capacity to detect and discover. Although many pipelines are available for plant virus discovery, all share a common backbone. A description of the process of plant virus detection and discovery from HTS data is presented, providing a summary of the different pipelines available for scientists to use in their research.

WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data

Nucleic Acids Research, 2010, Vol 38 (suppl_2), pp. W378-W384. doi:10.1093/nar/gkq431. Cited by ~5.

Author(s): Andreas Massouras; Frederik Decouttere; Korneel Hens; Bart Deplancke

Keyword(s): High Throughput; High Throughput Sequencing; Full Length; Sequencing Data; High Throughput Sequencing Data; Sequence Identification; Clone Sequence; Full Length Clone

Advanced Virus Detection Technologies Interest Group (AVDTIG): Efforts on High Throughput Sequencing (HTS) for Virus Detection

PDA Journal of Pharmaceutical Science and Technology, 2016, Vol 70 (6), pp. 591-595. doi:10.5731/pdajpst.2016.007161. Cited by ~12.

Author(s): A. S. Khan; D. A. Vacante; J.-P. Cassart; S. H. S. Ng; C. Lambert; ...

Keyword(s): Interest Group; High Throughput; High Throughput Sequencing; Virus Detection; Detection Technologies

SARS-CoV-2 Evolution among Oncological Population: In-Depth Virological Analysis of a Clinical Cohort

Microorganisms, 2021, Vol 9 (10), pp. 2145. doi:10.3390/microorganisms9102145.

Author(s): Florian Laubscher; Samuel Cordey; Alex Friedlaender; Cecilia Schweblin; Sarah Noetzlin; ...

Keyword(s): High Throughput; High Throughput Sequencing; Cohort Analysis; Rapid Evolution; Nasopharyngeal Swab; Bioinformatics Pipeline; Oncological Patient; Patient Cohort; Oncological Patients; Minority Variants

Background: Oncological patients have a higher risk of prolonged SARS-CoV-2 shedding, which, in turn, can lead to evolutionary mutations and emergence of novel viral variants. The aim of this study was to analyze biological samples of a cohort of oncological patients by deep sequencing to detect any significant viral mutations. Methods: High-throughput sequencing was performed on selected samples from a SARS-CoV-2-positive oncological patient cohort. Analysis of variants and minority variants was performed using a validated bioinformatics pipeline. Results: Among 54 oncological patients, we analyzed 12 samples of 6 patients, either serial nasopharyngeal swab samples or samples from the upper and lower respiratory tracts, by high-throughput sequencing. We identified amino acid changes D614G and P4715L as well as mutations at nucleotide positions 241 and 3037 in all samples. There were no other significant mutations, but we observed intra-host evolution in some minority variants, mainly in the ORF1ab gene. There was no significant mutation identified in the spike region and no minority variants common to several hosts. Conclusions: There was no major and rapid evolution of viral strains in this oncological patient cohort, but there was minority variant evolution, reflecting a dynamic pattern of quasi-species replication.
