AlmostSignificant: Simplifying quality control of high-throughput sequencing data

Mapping Intimacies ◽

10.1101/053702 ◽

2016 ◽

Author(s):

Joseph Ward ◽

Christian Cole ◽

Melanie Febrer ◽

Geoffrey Barton

Keyword(s):

Quality Control ◽

Dna Sequencing ◽

Illumina Sequencing ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Multiple Sources ◽

Meta Data ◽

Sequencing Technologies ◽

High Throughput Sequencing Data

Motivation: The current generation of DNA sequencing technologies produces a large amount of data quickly. All of these data need to pass some form of quality control processing and checking before they can be used for any analysis. The large number of samples that are run through Illumina sequencing machines makes the process of quality control an onerous and time-consuming task that requires multiple pieces of information from several sources.

Results: AlmostSignificant is an open-source platform for aggregating multiple sources of quality metrics as well as meta-data associated with DNA sequencing runs from Illumina sequencing machines. AlmostSignificant is a graphical platform to streamline the quality control of DNA sequencing data, to collect and store these data for future reference and to collect extra meta-data associated with the sequencing runs to check for errors and monitor the volume of data produced by the associated machines. AlmostSignificant has been used to track the quality of over 80 sequencing runs covering over 2500 samples produced over the last three years.

Availability: The code and documentation for AlmostSignificant are freely available at https://github.com/bartongroup/[email protected], [email protected]


Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and thus fast accumulation of HTS data, there has been a growing need and interest for developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.


PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cancer Informatics ◽

10.4137/cin.s13890 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13890 ◽

Cited By ~ 1

Author(s):

Changjin Hong ◽

Solaiappan Manimaran ◽

William Evan Johnson

Keyword(s):

Quality Control ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Computationally Efficient ◽

High Throughput Sequencing Data ◽

Downstream Analysis

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/.


Rqc: A Bioconductor Package for Quality Control of High-Throughput Sequencing Data

Journal of Statistical Software ◽

10.18637/jss.v087.c02 ◽

2018 ◽

Vol 87 (Code Snippet 2) ◽

Cited By ~ 2

Author(s):

Wélliton de Souza ◽

Benilton de Sá Carvalho ◽

Iscia Lopes-Cendes

Keyword(s):

Quality Control ◽

High Throughput ◽

High Throughput Sequencing ◽

Bioconductor Package ◽

Sequencing Data ◽

High Throughput Sequencing Data


Calculating the quality of public high-throughput sequencing data to obtain a suitable subset for reanalysis from the Sequence Read Archive

GigaScience ◽

10.1093/gigascience/gix029 ◽

2017 ◽

Vol 6 (6) ◽

Cited By ~ 5

Author(s):

Tazro Ohta ◽

Takeru Nakazato ◽

Hidemasa Bono

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Sequence Read Archive ◽

High Throughput Sequencing Data


Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btv566 ◽

2015 ◽

pp. btv566 ◽

Cited By ~ 210

Author(s):

Konstantin Okonechnikov ◽

Ana Conesa ◽

Fernando García-Alcalde

Keyword(s):

Quality Control ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Sample Quality ◽

High Throughput Sequencing Data


ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab112 ◽

2021 ◽

Vol 3 (4) ◽

Author(s):

Rajesh Detroja ◽

Alessandro Gorohovski ◽

Olawumi Giwa ◽

Gideon Baum ◽

Milana Frenkel-Morgenstern

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Complex Disease ◽

Single Cells ◽

Reference Database ◽

Sequencing Data ◽

Sequencing Technologies ◽

High Throughput Sequencing Data ◽

Chimeric Rnas ◽

Sensitivity Specificity

Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, the testing of over 20 computational methods showed these to be limited in terms of chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first ‘reference-based’ approach termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods to identify human chimeras, leveraging both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from both simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single cells of the K-562 cell line, which was confirmed experimentally.


A comprehensive review of scaffolding methods in genome assembly

Briefings in Bioinformatics ◽

10.1093/bib/bbab033 ◽

2021 ◽

Author(s):

Junwei Luo ◽

Yawei Wei ◽

Mengna Lyu ◽

Zhengjiang Wu ◽

Xiaoyan Liu ◽

...

Keyword(s):

Genome Assembly ◽

High Throughput Sequencing ◽

Rapid Development ◽

Genomic Research ◽

Future Research ◽

Sequencing Data ◽

Sequencing Technologies ◽

Biological Studies ◽

Downstream Analysis

In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically utilize the alignments between contigs and sequencing data (reads) to determine the orientation and order among contigs and to produce longer scaffolds, which are helpful for genomic downstream analysis. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially in long-range sequencing, which have greatly enhanced the assembly quality of scaffolding methods. As the number of scaffolding methods increases, biology and bioinformatics researchers need to perform in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differences in characteristics among various kinds of reads, the ways in which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.


HTSQualC is a flexible and one-step quality control software for high-throughput sequencing data analysis

Scientific Reports ◽

10.1038/s41598-021-98124-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Renesh Bedre ◽

Carlos Avila ◽

Kranthi Mandadi

Keyword(s):

Quality Control ◽

High Throughput ◽

High Throughput Sequencing ◽

Science Research ◽

Control Analysis ◽

Sequencing Data ◽

Quality Control Analysis ◽

High Throughput Sequencing Data ◽

One Step ◽

Automated Quality Control

Use of high-throughput sequencing (HTS) has become indispensable in life science research. Raw HTS data contain several sequencing artifacts, and as a first step it is imperative to remove the artifacts for reliable downstream bioinformatics analysis. Although multiple stand-alone tools are available that can perform the various quality control steps separately, an integrated tool that allows one-step, automated quality control analysis of HTS datasets would significantly ease the handling of large numbers of samples in parallel. Here, we developed HTSQualC, a stand-alone, flexible, and easy-to-use software for one-step quality control analysis of raw HTS data. HTSQualC can evaluate HTS data quality and perform filtering and trimming analysis in a single run. We evaluated the performance of HTSQualC for conducting batch analysis of HTS datasets with 322 samples with an average of ~1 M (paired-end) sequence reads per sample. HTSQualC accomplished the QC analysis in ~3 h in distributed mode and ~31 h in shared mode, thus underscoring its utility and robust performance. In addition to command-line execution, we integrated HTSQualC into the free, open-source CyVerse cyberinfrastructure resource with a graphical user interface, for wider access to experimental biologists who have limited computational resources and/or programming abilities.
