A Primer on the Analysis of High-Throughput Sequencing Data for Detection of Plant Viruses

Denis Kutnjak; Lucie Tamisier; Ian Adams; Neil Boonham; Thierry Candresse; Michela Chiumenti; Kris De Jonghe; Jan F. Kreuze; Marie Lefebvre; Gonçalo Silva; Martha Malapi-Wight; Paolo Margaria; Irena Mavrič Pleško; Sam McGreig; Laura Miozzi; Benoit Remenant; Jean-Sebastien Reynard; Johan Rollin; Mike Rott; Olivier Schumpp; Sébastien Massart; Annelies Haegeman

doi:10.3390/microorganisms9040841

Microorganisms, 2021, Vol 9 (4), pp. 841

Keyword(s): Data Analysis; High Throughput; Plant Virus; High Throughput Sequencing; Plant Viruses; Acid Extraction; Sequencing Data; Bioinformatic Tools; Advantages And Disadvantages; Plant Virus Detection

High-throughput sequencing (HTS) technologies have become indispensable tools for plant virus diagnostics and research thanks to their ability to detect any plant virus in a sample without prior knowledge. Because HTS relies heavily on bioinformatic analysis of the huge number of sequences generated, it is of utmost importance that researchers have access to efficient and reliable bioinformatic tools and understand their principles, advantages, and disadvantages. Here, we present a critical overview of the steps involved in HTS as employed for plant virus detection and virome characterization. We start with sample preparation and nucleic acid extraction appropriate to the chosen HTS strategy, followed by basic data analysis requirements, an extensive overview of in-depth data processing options, and taxonomic classification of the viral sequences detected. By presenting the bioinformatic tools and a detailed overview of the consecutive steps that can be used to implement a well-structured HTS data analysis in an easy and accessible way, this paper is aimed at both beginners and experts engaging in HTS plant virome projects.

Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis

Genomics, 2017, Vol 109 (2), pp. 83-90

doi:10.1016/j.ygeno.2017.01.005

Cited by ~44

Author(s): Yan Guo; Yulin Dai; Hui Yu; Shilin Zhao; David C. Samuels; ...

Keyword(s): Data Analysis; High Throughput; High Throughput Sequencing; Sequencing Data; High Throughput Sequencing Data; Sequencing Data Analysis

High Throughput Sequencing For Plant Virus Detection and Discovery

Phytopathology, 2019, Vol 109 (5), pp. 716-725

doi:10.1094/phyto-07-18-0257-rvw

Cited by ~44

Author(s): D. E. V. Villamor; T. Ho; M. Al Rwahnih; R. R. Martin; I. E. Tzanetakis

Keyword(s): High Throughput; Plant Virus; High Throughput Sequencing; Virus Detection; Agricultural Crops; Virus Discovery; Plant Virus Detection; Virus Biology; Disease Causality

Over the last decade, virologists have discovered an unprecedented number of viruses using high-throughput sequencing (HTS), advancing our knowledge of virus diversity in nature and, in particular, unraveling the viromes of many agricultural crops. However, these new discoveries have often widened the gaps in our understanding of virus biology, foremost among them the actual role, if any, of a new virus in disease. Yet, when used critically in etiological studies, HTS is a powerful tool for establishing disease causality between a virus and its host. In addition, with globalization, the movement of plant material is increasingly common and often a point of dispute between countries; HTS could potentially resolve these disputes given its capacity for both detection and discovery. Although many pipelines are available for plant virus discovery, all share a common backbone. A description of the process of plant virus detection and discovery from HTS data is presented, together with a summary of the different pipelines available for scientists to use in their research.

xIP-seq Platform: An Integrative Framework for High-Throughput Sequencing Data Analysis

2009 Ohio Collaborative Conference on Bioinformatics, 2009

doi:10.1109/occbio.2009.20

Cited by ~2

Author(s): Xin Wang; Mingxiang Teng; Guohua Wang; Yuming Zhao; Xu Han; ...

Keyword(s): Data Analysis; High Throughput; High Throughput Sequencing; Sequencing Data; Integrative Framework; High Throughput Sequencing Data; Sequencing Data Analysis

NASQAR: A web-based platform for high-throughput sequencing data analysis and visualization

2019

doi:10.1101/709980

Cited by ~1

Author(s): Ayman Yousif; Nizar Drou; Jillian Rowe; Mohammed Khalfan; Kristin C Gunsalus

Keyword(s): New York; Data Analysis; Open Source; High Throughput; High Throughput Sequencing; Web Applications; RNA-Seq; Sequencing Data; Web Based; Link Type

Background: As high-throughput sequencing applications continue to evolve, the rapid growth in the quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Effective use of these tools often requires computational skills beyond those of many researchers. To ease this computational barrier, we have created a dynamic web-based platform, NASQAR (Nucleic Acid SeQuence Analysis Resource).

Results: NASQAR offers a collection of custom and publicly available open-source web applications that make extensive use of a variety of R packages to provide interactive data analysis and visualization. The platform is publicly accessible at http://nasqar.abudhabi.nyu.edu/. The open-source code is on GitHub at https://github.com/nasqar/NASQAR, and the system is also available as a Docker image at https://hub.docker.com/r/aymanm/nasqarall. NASQAR is a collaboration between the core bioinformatics teams of the NYU Abu Dhabi and NYU New York Centers for Genomics and Systems Biology.

Conclusions: NASQAR empowers non-programming experts with a versatile and intuitive toolbox to easily and efficiently explore, analyze, and visualize their transcriptomics data interactively. Popular tools for a variety of applications are currently available, including transcriptome data preprocessing, RNA-seq analysis (including single-cell RNA-seq), metagenomics, and gene enrichment.

Recent Advances on Detection and Characterization of Fruit Tree Viruses Using High-Throughput Sequencing Technologies

Viruses, 2018, Vol 10 (8), pp. 436

doi:10.3390/v10080436

Cited by ~30

Author(s): Varvara Maliogka; Angelantonio Minafra; Pasquale Saldarelli; Ana Ruiz-García; Miroslav Glasa; ...

Keyword(s): High Throughput; High Throughput Sequencing; Infected Plant; Fruit Trees; Plant Viruses; Fruit Tree; Advantages And Disadvantages; Sequencing Technologies; Recent Advances

Perennial crops, such as fruit trees, are infected by many viruses, which are transmitted through vegetative propagation and grafting of infected plant material. Some of these pathogens cause severe crop losses and often shorten the productive life of orchards. Detecting and characterizing these agents in fruit trees is challenging; however, in recent years, the wide application of high-throughput sequencing (HTS) technologies has significantly facilitated this task. In this review, we present recent advances in the discovery, detection, and characterization of fruit tree viruses and virus-like agents accomplished by HTS approaches. A large number of new viruses have been described in the last 5 years, some of them exhibiting novel genomic features that have led to proposals for the creation of new genera and to revisions of the current virus taxonomy. Interestingly, several of the newly identified viruses belong to genera previously not known to infect fruit tree species (e.g., Fabavirus, Luteovirus), a fact that challenges our perspective on plant viruses in general. Finally, applied methodologies, including the use of different molecules as templates, the advantages and disadvantages of HTS, and its future directions in fruit tree virology are discussed.

HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

PLoS ONE, 2014, Vol 9 (1), pp. e85879

doi:10.1371/journal.pone.0085879

Cited by ~67

Author(s): Fabrice P. A. David; Julien Delafontaine; Solenne Carat; Frederick J. Ross; Gregory Lefebvre; ...

Keyword(s): Data Analysis; Open Access; High Throughput; Web Application; High Throughput Sequencing; Sequencing Data; High Throughput Sequencing Data; Sequencing Data Analysis

Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data

Canadian Journal of Microbiology, 2016, Vol 62 (8), pp. 692-703

doi:10.1139/cjm-2015-0821

Cited by ~132

Author(s): Gregory B. Gloor; Gregor Reid

Keyword(s): Data Analysis; High Throughput; High Throughput Sequencing; Compositional Data; Critical Role; Compositional Data Analysis; Data Sets; Clear Understanding; Sequencing Data; Microbiome Data

A workshop held at the 2015 annual meeting of the Canadian Society of Microbiologists highlighted compositional data analysis methods and the importance of exploratory data analysis for microbiome data sets generated by high-throughput DNA sequencing. A summary of the content of that workshop, a review of new methods of analysis, and information on the importance of careful analyses are presented herein. The workshop focused on explaining the rationale behind the use of compositional data analysis and on demonstrating these methods on 2 microbiome data sets. A clear understanding of bioinformatics methodologies and of the type of data being analyzed is essential, given the growing number of studies uncovering the critical role of the microbiome in health and disease and the need to understand alterations to its composition and function following intervention with fecal transplant, probiotics, diet, and pharmaceutical agents.
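
As an illustration of what a compositional approach involves (not taken from the workshop materials), the minimal Python sketch below implements the centered log-ratio (CLR) transform, one widely used compositional method: each count is expressed as the log of its ratio to the geometric mean of its own sample, so the result depends only on relative abundances and not on the total number of reads. The function name `clr` and the optional pseudocount for handling zero counts are illustrative assumptions.

```python
import math


def clr(counts, pseudocount=0.0):
    """Centered log-ratio (CLR) transform of one sample's count vector.

    Each value becomes log(x_i / g(x)), where g(x) is the geometric mean
    of the sample. The optional pseudocount is a hypothetical, illustrative
    way to avoid log(0); it is not a recommended value.
    """
    values = [c + pseudocount for c in counts]
    if not values or any(v <= 0 for v in values):
        raise ValueError("all values must be positive; add a pseudocount for zeros")
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


# Two hypothetical samples with the same relative composition but a
# ten-fold difference in sequencing depth (total read count).
shallow = [10, 20, 70]
deep = [100, 200, 700]

print([round(x, 3) for x in clr(shallow)])
print([round(x, 3) for x in clr(deep)])
```

Both print statements show the same values, illustrating that the transform removes the arbitrary total imposed by sequencing depth; the CLR values of a sample also sum to zero.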

Comparison of sequencing data processing pipelines and application to underrepresented African human populations

BMC Bioinformatics, 2021, Vol 22 (1)

doi:10.1186/s12859-021-04407-x

Author(s): Gwenna Breton; Anna C. V. Johansson; Per Sjödin; Carina M. Schlebusch; Mattias Jakobsson

Keyword(s): Best Practices; High Throughput; High Throughput Sequencing; Variant Calling; Human Populations; Sequencing Data; High Coverage; Individual Level; Bioinformatic Tools; The Individual

Background: Population genetic studies of humans make increasing use of high-throughput sequencing in order to capture diversity in an unbiased way. Sequencing technologies and bioinformatic tools abound, and the number of available genomes is increasing. Studies have evaluated and compared some of these technologies and tools, such as the Genome Analysis Toolkit (GATK) and its “Best Practices” bioinformatic pipelines. However, such studies often focus on a few genomes of Eurasian origin in order to detect technical issues. We instead surveyed the use of the GATK tools and established a pipeline for processing high-coverage full genomes from a diverse set of populations, including Sub-Saharan African groups, in order to reveal challenges arising from human diversity and stratification.

Results: We surveyed 29 studies using high-throughput sequencing data and compared their strategies for data pre-processing and variant calling. We found that data processing is highly variable across studies and that the GATK “Best Practices” are seldom followed strictly. We then compared three versions of a GATK pipeline, differing in the inclusion of an indel realignment step and in a modification of the base quality score recalibration step. We applied the pipelines to a diverse set of 28 individuals and compared them in terms of the number of called variants and the overlap of the callsets. The pipelines produced similar callsets, in particular after callset filtering. We also ran one of the pipelines on a larger dataset of 179 individuals and noted that including more individuals at the joint genotyping step resulted in different variant counts. At the individual level, average genome coverage was correlated with the number of variants called.

Conclusions: We conclude that applying the GATK “Best Practices” pipeline, including its recommended reference datasets, to underrepresented populations does not lead to a decrease in the number of called variants compared to alternative pipelines. We recommend aiming for a coverage of >30X if identifying most variants is important, and working with large sample sizes at the variant calling stage, also for underrepresented individuals and populations.
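
To make the >30X coverage figure concrete, the sketch below (not part of the study) applies the standard Lander-Waterman estimate of mean coverage, C = N × L / G, for N reads of length L over a genome of size G. The helper names, the 150 bp read length, and the approximate 3.1 Gb human genome size are illustrative assumptions; the estimate ignores duplicates, unmapped reads, and uneven coverage, so realized coverage after filtering will be lower.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Expected mean depth C = N * L / G (Lander-Waterman estimate)."""
    if genome_size <= 0 or read_length <= 0:
        raise ValueError("genome_size and read_length must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads N with N * L / G >= target_coverage."""
    if genome_size <= 0 or read_length <= 0:
        raise ValueError("genome_size and read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative assumptions: ~3.1 Gb human genome, 150 bp reads
# (each mate of a read pair counted as one read).
HUMAN_GENOME_BP = 3_100_000_000
READ_LENGTH_BP = 150

n = reads_needed(30, READ_LENGTH_BP, HUMAN_GENOME_BP)
print(n)                                                  # 620000000
print(mean_coverage(n, READ_LENGTH_BP, HUMAN_GENOME_BP))  # 30.0
```

Under these assumptions, roughly 620 million 150 bp reads (about 310 million read pairs) are needed to reach a mean coverage of 30X.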

HaTSPiL: A modular pipeline for high-throughput sequencing data analysis

PLoS ONE, 2019, Vol 14 (10), pp. e0222512

doi:10.1371/journal.pone.0222512

Author(s): Edoardo Morandi; Matteo Cereda; Danny Incarnato; Caterina Parlato; Giulia Basile; ...

Keyword(s): Data Analysis; High Throughput; High Throughput Sequencing; Sequencing Data; High Throughput Sequencing Data; Sequencing Data Analysis

High throughput sequencing data analysis workflow: mtDNA variant detection and identification of STR/Y-STR alleles and iso-alleles

Forensic Science International Genetics Supplement Series, 2019, Vol 7 (1), pp. 639-640

doi:10.1016/j.fsigss.2019.10.121

Author(s): C.S. Liu; L. Luo; J. McGuigan; J. Wu; J. Todd; ...

Keyword(s): Data Analysis; High Throughput; High Throughput Sequencing; Sequencing Data; High Throughput Sequencing Data; Detection And Identification; Analysis Workflow; Variant Detection; Sequencing Data Analysis
