AlmostSignificant: Simplifying quality control of high-throughput sequencing data

2016 ◽  
Author(s):  
Joseph Ward ◽  
Christian Cole ◽  
Melanie Febrer ◽  
Geoffrey Barton

Abstract Motivation: The current generation of DNA sequencing technologies produces large amounts of data quickly. All of these data must pass some form of quality control processing and checking before they can be used in any analysis. The large number of samples run through Illumina sequencing machines makes quality control an onerous, time-consuming task that requires multiple pieces of information from several sources. Results: AlmostSignificant is an open-source platform for aggregating multiple sources of quality metrics, together with the meta-data associated with DNA sequencing runs from Illumina sequencing machines. It is a graphical platform that streamlines the quality control of DNA sequencing data, collects and stores these data for future reference, and gathers extra meta-data associated with the sequencing runs in order to check for errors and monitor the volume of data produced by the associated machines. AlmostSignificant has been used to track the quality of over 80 sequencing runs covering over 2500 samples produced over the last three years. Availability: The code and documentation for AlmostSignificant are freely available at https://github.com/bartongroup/
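To make the aggregation idea concrete, here is a minimal Python sketch that summarises per-sample quality metrics by sequencing run. It is not AlmostSignificant's code: the `SampleMetrics` record, its fields and the Q30 default threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SampleMetrics:
    """Hypothetical per-sample QC record; field names are illustrative only."""
    run_id: str
    sample: str
    reads: int            # number of reads assigned to the sample
    mean_quality: float   # mean Phred quality across the sample's reads


def summarise_runs(samples, min_quality=30.0):
    """Group sample records by run: total reads per run and samples below min_quality."""
    summary = defaultdict(lambda: {"total_reads": 0, "low_quality": []})
    for s in samples:
        run = summary[s.run_id]
        run["total_reads"] += s.reads
        if s.mean_quality < min_quality:
            run["low_quality"].append(s.sample)
    return dict(summary)


records = [
    SampleMetrics("run1", "A", 1_000_000, 35.2),
    SampleMetrics("run1", "B", 800_000, 27.9),
    SampleMetrics("run2", "C", 1_200_000, 33.0),
]
print(summarise_runs(records))
```

Here `run1` totals 1,800,000 reads and flags sample `B` for review, while `run2` has no low-quality samples.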

MycoKeys ◽  
2018 ◽  
Vol 39 ◽  
pp. 29-40 ◽  
Author(s):  
Sten Anslan ◽  
R. Henrik Nilsson ◽  
Christian Wurzbacher ◽  
Petr Baldrian ◽  
Leho Tedersoo ◽  
...  

Along with recent developments in high-throughput sequencing (HTS) technologies, and the resulting fast accumulation of HTS data, there has been a growing need for, and interest in, tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with its own features, assumptions and outputs. To evaluate how the choice of bioinformatics workflow affects the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a given bioinformatics process largely depend on the platform used. None of the bioinformatics workflows appeared to filter out the accumulated errors perfectly and generate Operational Taxonomic Units (OTUs), although PipeCraft, LotuS and PIPITS performed better than QIIME2 and Galaxy on the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.
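The recommended manual validation can be supported by a simple triage step that lists the OTUs whose taxonomy assignment confidence falls below a cut-off. This is a hypothetical helper, not part of any of the compared platforms; the input layout and the 0.8 cut-off are assumptions.

```python
def flag_for_review(otus, min_confidence=0.8):
    """Return OTU IDs whose taxonomy assignment confidence is below min_confidence.

    `otus` maps a hypothetical OTU ID to (taxon_label, confidence in [0, 1]).
    """
    return sorted(otu_id for otu_id, (_, conf) in otus.items() if conf < min_confidence)


otus = {
    "OTU_1": ("Agaricomycetes", 0.97),
    "OTU_2": ("Fungi", 0.41),
    "OTU_3": ("Ascomycota", 0.79),
}
print(flag_for_review(otus))
```

Only `OTU_2` and `OTU_3` would be passed to a person for checking; a lower cut-off trades review effort against the risk of keeping poorly assigned OTUs.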


2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13890 ◽  
Author(s):  
Changjin Hong ◽  
Solaiappan Manimaran ◽  
William Evan Johnson

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .
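As an illustration of the kind of read preprocessing such toolkits perform, the sketch below trims low-quality bases from the 3' end of a read. It shows a generic technique rather than PathoQC's implementation; Phred+33 quality encoding and the Q20 cut-off are assumed defaults.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding by default)."""
    return [ord(c) - offset for c in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Trim bases from the 3' end while their Phred quality is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# 'I' encodes Q40; '#', '%' and '+' encode Q2, Q4 and Q10.
print(trim_3prime("ACGTACGT", "IIIII#%+"))
```

Trimming only from the 3' end reflects the typical decline of Illumina base quality along the read; sliding-window variants are a common alternative.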


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Rajesh Detroja ◽  
Alessandro Gorohovski ◽  
Olawumi Giwa ◽  
Gideon Baum ◽  
Milana Frenkel-Morgenstern

Abstract Fusion genes, or chimeras, typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, testing of over 20 computational methods showed them to be limited in chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first ‘reference-based’ approach, termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods for identifying human chimeras, using both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from both simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single-cell samples of the K-562 cell line, which was confirmed experimentally.
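The idea of counting junction reads against a known chimeric reference can be sketched as follows: a read aligned to the fused sequence supports the chimera only if it is anchored on both sides of the fusion point. The 0-based coordinates, the 10-base minimum anchor and the function names are illustrative assumptions, not ChiTaH's actual criteria.

```python
def is_junction_read(read_start, read_length, junction, min_anchor=10):
    """True if an aligned read covers at least min_anchor bases on each side of the junction.

    Coordinates are 0-based on the chimeric reference; `junction` is the index of
    the first base contributed by the 3' partner gene.
    """
    read_end = read_start + read_length   # exclusive end of the alignment
    left_anchor = junction - read_start   # bases from the 5' partner
    right_anchor = read_end - junction    # bases from the 3' partner
    return left_anchor >= min_anchor and right_anchor >= min_anchor


def count_junction_reads(alignments, junction, min_anchor=10):
    """Count (start, length) alignments that span the junction with adequate anchors."""
    return sum(is_junction_read(s, n, junction, min_anchor) for s, n in alignments)


# Hypothetical alignments of 50-base reads to a chimera whose junction is at position 100.
print(count_junction_reads([(80, 50), (95, 50), (60, 50), (0, 50)], junction=100))
```

Requiring an anchor on each side guards against reads that merely touch the junction, which is one source of false-positive junction counts.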


Author(s):  
Junwei Luo ◽  
Yawei Wei ◽  
Mengna Lyu ◽  
Zhengjiang Wu ◽  
Xiaoyan Liu ◽  
...  

Abstract In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically use the alignments between contigs and sequencing data (reads) to determine the orientation and order of contigs and to produce longer scaffolds, which benefit downstream genomic analyses. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially long-range sequencing data, which have greatly improved the assembly quality achievable by scaffolding. As the number of scaffolding methods grows, biology and bioinformatics researchers need in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differing characteristics of the various kinds of reads, how current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.
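A simplified sketch of how read alignments become ordering evidence: each read pair whose mates land on two different contigs contributes a link between those oriented contigs, and only well-supported links are kept as candidate edges of a scaffold graph. The input format and `min_support=3` are assumptions; many real scaffolders additionally weigh insert size and gap estimates.

```python
from collections import Counter


def scaffold_links(read_pairs, min_support=3):
    """Count read-pair links between oriented contigs and keep well-supported ones.

    Each element of `read_pairs` is ((contig_a, strand_a), (contig_b, strand_b)),
    the contig and strand ('+' or '-') each mate aligned to. Pairs whose mates
    fall on the same contig carry no ordering information and are skipped.
    """
    counts = Counter()
    for (ca, sa), (cb, sb) in read_pairs:
        if ca == cb:
            continue
        # Canonical (sorted) key so the same link seen from either mate is pooled.
        key = tuple(sorted([(ca, sa), (cb, sb)]))
        counts[key] += 1
    return {link: n for link, n in counts.items() if n >= min_support}


pairs = (
    [(("c1", "+"), ("c2", "-"))] * 3
    + [(("c2", "-"), ("c1", "+"))]
    + [(("c1", "+"), ("c3", "+"))]
    + [(("c1", "+"), ("c1", "-"))]
)
print(scaffold_links(pairs))
```

The `c1`/`c2` link is supported by four pairs and kept, whereas the single `c1`/`c3` pair is treated as noise.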


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Renesh Bedre ◽  
Carlos Avila ◽  
Kranthi Mandadi

Abstract Use of high-throughput sequencing (HTS) has become indispensable in life science research. Raw HTS data contain several sequencing artifacts, and as a first step it is imperative to remove them for reliable downstream bioinformatics analysis. Although multiple stand-alone tools can perform the various quality control steps separately, an integrated tool that allows one-step, automated quality control analysis of HTS datasets would significantly ease the parallel handling of large numbers of samples. Here, we developed HTSQualC, a stand-alone, flexible, and easy-to-use software for one-step quality control analysis of raw HTS data. HTSQualC can evaluate HTS data quality and perform filtering and trimming analysis in a single run. We evaluated the performance of HTSQualC for batch analysis of HTS datasets with 322 samples averaging ~1 M paired-end sequence reads per sample. HTSQualC completed the QC analysis in ~3 h in distributed mode and ~31 h in shared mode, underscoring its utility and robust performance. In addition to command-line execution, we integrated HTSQualC into the free, open-source CyVerse cyberinfrastructure resource as a graphical user interface (GUI), to widen access for experimental biologists who have limited computational resources and/or programming abilities.
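As a rough back-of-envelope reading of the reported timings (both are approximate in the source, so the derived figures are too), the distributed mode corresponds to roughly a tenfold speed-up over the shared mode:

```latex
\[
\text{speed-up} \approx \frac{31\ \text{h}}{3\ \text{h}} \approx 10.3,
\qquad
\frac{31 \times 60\ \text{min}}{322} \approx 5.8\ \text{min/sample (shared)},
\qquad
\frac{3 \times 60\ \text{min}}{322} \approx 0.56\ \text{min/sample (distributed)}.
\]
```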


2016 ◽  
Vol 32 (24) ◽  
pp. 3850-3851 ◽  
Author(s):  
Joseph Ward ◽  
Christian Cole ◽  
Melanie Febrer ◽  
Geoffrey J. Barton
