ngsReports: An R Package for managing FastQC reports and other NGS related log files

Mapping Intimacies ◽

10.1101/313148 ◽

2018 ◽

Cited By ~ 1

Author(s):

Christopher M. Ward ◽

Hein To ◽

Stephen M Pederson

Keyword(s):

Quality Control ◽

Gc Content ◽

R Package ◽

Dependent Manner ◽

Batch Effects ◽

Large Sample ◽

Log Files ◽

Shiny App ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Motivation: High throughput next generation sequencing (NGS) has become exceedingly cheap, facilitating studies containing large sample numbers. Quality control (QC) is an essential stage during analytic pipelines, and QC metrics can be found in the outputs of popular bioinformatics tools such as FastQC and Picard. Although these tools provide considerable power when carrying out QC, large sample numbers can make identification of systemic bias a challenge. Results: We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC output as well as that from aligners such as HISAT2, STAR and Bowtie2. Visualization can be carried out across many samples using heatmaps rendered using ggplot2 and plotly. Moreover, these can be displayed in an interactive shiny app or an HTML report. We also provide methods to assess observed GC content in an organism-dependent manner for both transcriptomic and genomic datasets. Importantly, hierarchical clustering can be carried out on heatmaps with large sample sizes to quickly identify outliers and batch effects. Availability and Implementation: ngsReports is available at https://github.com/UofABioinformaticsHub/ngsReports.
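To make the idea of collecting FastQC results across many samples concrete, here is a minimal Python sketch. It is not the ngsReports API (which is an R package); it only assumes the plain-text `fastqc_data.txt` layout, where each module starts with a line such as `>>Per base sequence quality<TAB>pass` and ends with `>>END_MODULE`. The function name `parse_fastqc_summary` and the two truncated sample reports are hypothetical.

```python
from typing import Dict


def parse_fastqc_summary(text: str) -> Dict[str, str]:
    """Map each FastQC module name to its PASS/WARN/FAIL status."""
    statuses: Dict[str, str] = {}
    for line in text.splitlines():
        # Module headers look like ">>Module name<TAB>status";
        # ">>END_MODULE" closes a module and carries no status.
        if line.startswith(">>") and line != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name] = status.strip().upper()
    return statuses


# Hypothetical, heavily truncated report contents for two samples.
reports = {
    "sampleA": ">>Basic Statistics\tpass\n>>END_MODULE\n"
               ">>Per sequence GC content\twarn\n>>END_MODULE\n",
    "sampleB": ">>Basic Statistics\tpass\n>>END_MODULE\n"
               ">>Per sequence GC content\tfail\n>>END_MODULE\n",
}

# One row per sample, one column per module: the shape a heatmap would need.
summary = {sample: parse_fastqc_summary(text) for sample, text in reports.items()}
for sample, modules in summary.items():
    print(sample, modules)
```

A sample-by-module table like `summary` is the kind of structure that a cross-sample heatmap could be drawn from.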


ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files

Bioinformatics ◽

10.1093/bioinformatics/btz937 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2587-2588 ◽

Cited By ~ 10

Author(s):

Christopher M Ward ◽

Thu-Hien To ◽

Stephen M Pederson

Keyword(s):

Quality Control ◽

R Package ◽

Supplementary Information ◽

Bioconductor Package ◽

Supplementary Data ◽

Large Sample ◽

Log Files ◽

Shiny App ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Motivation: High throughput next generation sequencing (NGS) has become exceedingly cheap, facilitating studies to be undertaken containing large sample numbers. Quality control (QC) is an essential stage during analytic pipelines and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results: We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation: The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information: Supplementary data are available at Bioinformatics online.
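The use of hierarchical clustering to flag outlier libraries can be illustrated with a small, dependency-free Python sketch. This is an assumption-laden toy, not how ngsReports implements clustering: it uses single-linkage agglomerative clustering with Euclidean distance, and the per-library quality profiles and the name `single_linkage` are invented for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(profiles, n_clusters):
    """Agglomerative single-linkage clustering down to n_clusters groups."""
    clusters = [{name} for name in profiles]

    def gap(pair):
        # Single linkage: distance between the closest members of two clusters.
        i, j = pair
        return min(dist(profiles[x], profiles[y])
                   for x in clusters[i] for y in clusters[j])

    while len(clusters) > max(n_clusters, 1):
        # Merge the two clusters with the smallest gap (i < j always holds).
        i, j = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical mean base quality at four read positions for four libraries.
profiles = {
    "lib1": [36.0, 35.5, 34.8, 33.9],
    "lib2": [35.8, 35.6, 34.5, 34.0],
    "lib3": [36.1, 35.2, 34.9, 33.7],
    "lib4": [30.2, 27.5, 24.1, 20.3],
}

clusters = single_linkage(profiles, n_clusters=2)
# Libraries left alone in their own cluster are candidate outliers.
outliers = sorted(name for c in clusters if len(c) == 1 for name in c)
print(outliers)
```

With many samples, reordering a heatmap by the resulting cluster membership places similar libraries next to each other, so a library that clusters alone stands out visually.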


A preliminary Quality Control (QC) for next generation sequencing (NGS) library evaluation turns out to be a very useful tool for a rapid detection of BRCA1/2 deleterious mutations

Clinica Chimica Acta ◽

10.1016/j.cca.2014.06.026 ◽

2014 ◽

Vol 437 ◽

pp. 72-77 ◽

Cited By ~ 16

Author(s):

Paola Concolino ◽

Alessandra Costella ◽

Angelo Minucci ◽

Giovanni Luca Scaglione ◽

Concetta Santonocito ◽

...

Keyword(s):

Quality Control ◽

Next Generation Sequencing ◽

Rapid Detection ◽

Deleterious Mutations ◽

Next Generation ◽

Library Evaluation ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing


The effect of variant interference on de novo assembly for viral deep sequencing

10.1101/815480 ◽

2019 ◽

Cited By ~ 1

Author(s):

Christina J. Castro ◽

Rachel L. Marine ◽

Edward Ramos ◽

Terry Fei Fan Ng

Keyword(s):

Deep Sequencing ◽

De Novo ◽

Gc Content ◽

Read Length ◽

Viral Genomes ◽

Minor Variant ◽

Main Driver ◽

Next Generation Sequencing Ngs ◽

Viral Sequences ◽

Generation Sequencing

Abstract Viruses have high mutation rates and generally exist as a mixture of variants in biological samples. Next-generation sequencing (NGS) has surpassed Sanger sequencing for generating long viral sequences, yet how variants affect NGS de novo assembly remains largely unexplored. Our results from >15,000 simulated experiments showed that the presence of variants can turn an assembly of one genome into tens to thousands of contigs. This “variant interference” (VI) is highly consistent and reproducible across the ten most used de novo assemblers, and occurs independently of genome length, read length, and GC content. The main driver of VI is the pairwise identity between viral variants. These findings were further supported by in silico simulations, where selective removal of minor variant reads from clinical datasets allows the “rescue” of full viral genomes from fragmented contigs. These results call for careful interpretation of contigs and contig numbers from de novo assembly in viral deep sequencing.
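Since pairwise identity between variants is named as the main driver, a short Python sketch of one common way to compute it may help. The abstract does not state which definition the authors used, so this is an assumption: identity is taken as the fraction of matching positions between two already-aligned sequences, ignoring columns where either sequence has a gap. The function name and the example sequences are hypothetical.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        matches += a == b
    return matches / compared if compared else 0.0


# Two hypothetical aligned variants differing at one of ten positions.
major = "ACGTACGTAC"
minor = "ACGTTCGTAC"
print(f"{pairwise_identity(major, minor):.2%}")
```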


Demystification of RNAseq Quality Control

JITA - Journal of Information Technology and Applications (Banja Luka) - APEIRON ◽

10.7251/jit2102073d ◽

2021 ◽

Vol 22 (2) ◽

Author(s):

Dragana Dudić ◽

Bojana Banović Đeri ◽

Vesna Pajić ◽

Gordana Pavlović-Lažetić

Keyword(s):

Quality Control ◽

Next Generation Sequencing ◽

Rna Sequencing ◽

Next Generation ◽

Comprehensive Guidance ◽

Dna And Rna ◽

Control Evaluation ◽

Downstream Analysis ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Next Generation Sequencing (NGS) analysis has become a widely used method for studying the structure of DNA and RNA, but the complexity of the procedure leads to error-prone datasets, which need to be cleansed in order to avoid misinterpretation of the data. We address the usage and proper interpretation of characteristic metrics for RNA sequencing (RNAseq) quality control, implemented in and reported by FastQC, and provide comprehensive guidance for their assessment in the context of total RNAseq quality control of Illumina raw reads. Additionally, we give recommendations on how to adequately perform the quality control preprocessing step of raw total RNAseq Illumina reads according to the results of the quality control evaluation step; the aim is to provide the best dataset for downstream analysis, rather than to get better FastQC results. We also tested the effects of different preprocessing approaches on the downstream analysis and recommend the most suitable approach.


The Complete Chloroplast Genome of Critically Endangered Chimonobambusa hirtinoda (Poaceae: Chimonobambusa) and Phylogenetic Analysis

10.21203/rs.3.rs-1019626/v1 ◽

2021 ◽

Author(s):

Yanjiang Liu ◽

Xiao Zhu ◽

Mingli Wu ◽

Xue Xu ◽

Zhaoxia Dai ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Gc Content ◽

Trna Genes ◽

Protein Coding ◽

Complete Chloroplast Genome ◽

Usage Frequency ◽

Cp Genome ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing ◽

Simple Sequence

Abstract Chimonobambusa hirtinoda is a threatened species that is naturally distributed only in Doupeng Mountain, Duyun, Guizhou, China. Next-generation sequencing (NGS) was used to obtain the complete chloroplast (cp) genome sequence of C. hirtinoda, which was then assembled and analyzed for phylogenetic and evolutionary relationships. We also compared the cp genome with previously published cp genomes of other Chimonobambusa species. The complete cp genome of C. hirtinoda has a total length of 139,561 bp and a GC content of 38.90%. A total of 130 genes were found in the cp genome, including 85 protein-coding genes, 37 tRNA genes and 8 rRNA genes. Some genes are missing and intron loss has occurred in the cp genome of C. hirtinoda. A total of 48 simple sequence repeats (SSRs) were detected, and measuring the codon usage frequency of amino acids revealed an A/U preference at the third nucleotide of codons in the cp genome of C. hirtinoda. Furthermore, phylogenetic analysis using complete cp sequences and the matK gene showed the genetic relationships within the Chimonobambusa genus.
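The GC content figure quoted above is a simple ratio, sketched here in Python. The function `gc_content` and the toy sequence are illustrative assumptions; the sketch also assumes ambiguous bases such as N are excluded from the denominator, which the abstract does not specify. The last lines only convert the reported 38.90% of 139,561 bp into an approximate base count.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100 * gc / called if called else 0.0


# Toy sequence: 4 of 10 bases are G or C.
print(gc_content("ATGCGCATTA"))

# Rough number of G/C bases implied by the figures in the abstract.
genome_length = 139_561
reported_gc_percent = 38.90
approx_gc_bases = round(genome_length * reported_gc_percent / 100)
print(approx_gc_bases)
```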


gpart: human genome partitioning and visualization of high-density SNP data by identifying haplotype blocks

Bioinformatics ◽

10.1093/bioinformatics/btz308 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4419-4421 ◽

Cited By ~ 3

Author(s):

Sun Ah Kim ◽

Myriam Brossard ◽

Delnaz Roshandel ◽

Andrew D Paterson ◽

Shelley B Bull ◽

...

Keyword(s):

Clustering Algorithms ◽

R Package ◽

Supplementary Information ◽

Visualization Tool ◽

Sequencing Data ◽

Haplotype Blocks ◽

Snp Data ◽

Computing Environments ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Summary: For the analysis of high-throughput genomic data produced by next-generation sequencing (NGS) technologies, researchers need to identify linkage disequilibrium (LD) structure in the genome. In this work, we developed an R package gpart which provides clustering algorithms to define LD blocks or analysis units consisting of SNPs. The visualization tool in gpart can display the LD structure and gene positions for up to 20 000 SNPs in one image. The gpart functions facilitate construction of LD blocks and SNP partitions for vast amounts of genome sequencing data within reasonable time and memory limits in personal computing environments. Availability and implementation: The R package is available at https://bioconductor.org/packages/gpart. Supplementary information: Supplementary data are available at Bioinformatics online.


Novel bioinformatics quality control metric for next-generation sequencing experiments in the clinical context

Nucleic Acids Research ◽

10.1093/nar/gkz775 ◽

2019 ◽

Vol 47 (21) ◽

pp. e135-e135

Author(s):

Maxim Ivanov ◽

Mikhail Ivanov ◽

Artem Kasianov ◽

Ekaterina Rozhavskaya ◽

Sergey Musienko ◽

...

Keyword(s):

Quality Control ◽

Next Generation Sequencing ◽

Performance Measure ◽

Next Generation ◽

Coverage Depth ◽

Clinical Context ◽

Context Availability ◽

Control Metric ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract As the use of next-generation sequencing (NGS) for the diagnosis of Mendelian diseases is expanding, the performance of this method has to be improved in order to achieve higher quality. Typically, performance measures are considered to be designed in the context of each application and, therefore, account for a spectrum of clinically relevant variants. We present EphaGen, a new computational methodology for bioinformatics quality control (QC). Given a single NGS dataset in BAM format and a pre-compiled VCF file of targeted clinically relevant variants, it associates this dataset with a single arbiter parameter. Intrinsically, EphaGen estimates the probability of missing any variant from the defined spectrum within a particular NGS dataset. Such a performance measure virtually resembles the diagnostic sensitivity of a given NGS dataset. Here we present case studies of the use of EphaGen in the context of BRCA1/2 and CFTR sequencing in a series of 14 runs across 43 blood samples and 504 publicly available NGS datasets. EphaGen is superior to conventional bioinformatics metrics such as coverage depth and coverage uniformity. We recommend using this software as a QC step in NGS studies in the clinical context. Availability: https://github.com/m4merg/EphaGen or https://hub.docker.com/r/m4merg/ephagen.


High-Throughput Sequencing of Virus-infected Cucurbita pepo Samples Revealed The Presence of Zucchini Shoestring Virus in Zimbabwe

10.21203/rs.2.15833/v1 ◽

2019 ◽

Author(s):

Charles Karavina ◽

Jacques Davy Ibaba ◽

Augustine Gubba

Keyword(s):

South African ◽

Cucurbita Pepo ◽

High Throughput Sequencing ◽

Gc Content ◽

Large Open Reading Frame ◽

Polya Tail ◽

Reading Frame ◽

Growing Seasons ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Objectives: Plant-infecting viruses remain a serious challenge to achieving food security worldwide. Cucurbits in Zimbabwe, as in other parts of the world, are used in various ways. A small-scale cucurbit virus survey was conducted in Zimbabwe during the 2014 and 2015 growing seasons. Cucurbit leaf samples displaying virus-like symptoms were collected and stored until analysis. The samples were then subjected to next-generation sequencing (NGS). The data generated from NGS were analysed using genomics technologies. Zucchini shoestring virus (ZSSV), a cucurbit-infecting potyvirus previously described in South Africa, was one of the viruses identified. The genomes of three ZSSV isolates from Zimbabwe are described in this note. Results: The three ZSSV isolates had the same genome size of 10297 bp, excluding the polyA tail, with a 43% GC content. The large open reading frame (ORF) was found at positions 69 to 10106 on the genome and encodes a 3345-amino-acid polyprotein, which had the same cleavage site sequences as those described for the South African isolates except for the P1-pro site. The smaller ORF, also called the pretty interesting Potyviridae ORF, was located at positions 3611 to 3793 on the genomes of all three ZSSV isolates.
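The ORF coordinates and the polyprotein length reported above can be cross-checked with a little arithmetic, sketched here in Python. The helper `orf_protein_length` is hypothetical, and the sketch assumes the coordinates are 1-based and inclusive and that the large ORF's end position includes the stop codon; the abstract does not state either convention explicitly.

```python
def orf_protein_length(start: int, end: int, includes_stop: bool = True) -> int:
    """Amino acid count for an ORF given 1-based inclusive coordinates."""
    nucleotides = end - start + 1
    if nucleotides % 3:
        raise ValueError("ORF length is not a multiple of three")
    codons = nucleotides // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop else codons


# Large ORF: positions 69 to 10106 span 10038 nt, i.e. 3346 codons.
large_orf_aa = orf_protein_length(69, 10106)
print(large_orf_aa)

# Smaller ORF: positions 3611 to 3793 span 183 nt, i.e. 61 codons.
small_orf_codons = (3793 - 3611 + 1) // 3
print(small_orf_codons)
```

Under these assumptions the large ORF yields 3345 amino acids, consistent with the polyprotein length given in the abstract.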


Next-Generation Sequencing (NGS) in COVID-19: A Tool for SARS-CoV-2 Diagnosis, Monitoring New Strains and Phylodynamic Modeling in Molecular Epidemiology

Current Issues in Molecular Biology ◽

10.3390/cimb43020061 ◽

2021 ◽

Vol 43 (2) ◽

pp. 845-867

Author(s):

Goldin John ◽

Nikhil Shri Sahajpal ◽

Ashis K. Mondal ◽

Sudha Ananth ◽

Colin Williams ◽

...

Keyword(s):

Quality Control ◽

Next Generation Sequencing ◽

Real World ◽

Next Generation ◽

Comprehensive Review ◽

Current Testing ◽

Phylogenetic Evolution ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

This review discusses the current testing methodologies for COVID-19 diagnosis and explores next-generation sequencing (NGS) technology for the detection of SARS-CoV-2 and monitoring phylogenetic evolution in the current COVID-19 pandemic. The review addresses the development, fundamentals, assay quality control and bioinformatics processing of the NGS data. This article provides a comprehensive review of the obstacles and opportunities facing the application of NGS technologies for the diagnosis, surveillance, and study of SARS-CoV-2 and other infectious diseases. Further, we have contemplated the opportunities and challenges inherent in the adoption of NGS technology as a diagnostic test with real-world examples of its utility in the fight against COVID-19.


NGseqBasic - a single-command UNIX tool for ATAC-seq, DNaseI-seq, Cut-and-Run, and ChIP-seq data mapping, high-resolution visualisation, and quality control

10.1101/393413 ◽

2018 ◽

Cited By ~ 8

Author(s):

Jelena Telenius ◽

Jim R. Hughes

Keyword(s):

Quality Control ◽

Big Data ◽

Next Generation Sequencing ◽

High Resolution ◽

Data Processing ◽

Version Control ◽

Data Set ◽

Genome Group ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract With the decreasing cost of next-generation sequencing (NGS), we are observing a rapid rise in the volume of ‘big data’ in the academic research, healthcare and drug discovery sectors. The present bottleneck for extracting value from these ‘big data’ sets is data processing and analysis. Considering this, there is still a lack of reliable, automated and easy-to-use tools that allow experimentalists to assess the quality of the sequenced libraries and explore the data first hand, without computational core analysts needing to invest a lot of time in the early stages of analysis. NGseqBasic is an easy-to-use single-command analysis tool for chromatin accessibility (ATAC, DNaseI) and ChIP sequencing data, which also supports new techniques such as low cell number sequencing and Cut-and-Run. It takes in fastq, fastq.gz or bam files, conducts all quality control, trimming and mapping steps, along with quality control and data processing statistics, and combines all this into a single-click loadable UCSC data hub, with an integral statistics html page providing detailed reports from the analysis tools and quality control metrics. The tool is easy to set up, and no installation is needed. A wide variety of parameters are provided to fine-tune the analysis, with optional settings to generate DNase footprint or high resolution ChIP-seq tracks. A tester script is provided to help in the setup, along with a test data set and downloadable example user cases. NGseqBasic has been used in the routine analysis of next generation sequencing (NGS) data in high-impact publications 1,2. The code is actively developed, and accompanied by Git version control and a GitHub code repository.
Here we demonstrate NGseqBasic analysis and features using DNaseI-seq data from GSM689849, and CTCF-ChIP-seq data from GSM2579421, as well as a Cut-and-Run CTCF data set GSM2433142, and provide the one-click loadable UCSC data hubs generated by the tool, allowing for the ready exploration of the run results and quality control files generated by the tool. Availability: Download, setup and help instructions are available on the NGseqBasic web site http://userweb.molbiol.ox.ac.uk/public/telenius/NGseqBasicManual/external/ Bioconda users can load the tool as library “ngseqbasic”. The source code with Git version control is available at https://github.com/Hughes-Genome-Group/NGseqBasic/
