An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology

Cancer Informatics ◽

10.4137/cin.s30793 ◽

2015 ◽

Vol 14s5 ◽

pp. CIN.S30793 ◽

Cited By ~ 2

Author(s):

Jian Li ◽

Aarif Mohamed Nazeer Batcha ◽

Björn Gaining ◽

Ulrich R. Mansmann

Keyword(s):

Data Storage ◽

Individualized Medicine ◽

Integrated Analysis ◽

Sequencing Data ◽

Molecular Oncology ◽

Analysis Workflow ◽

Sequencing Quality ◽

Technical Simplicity ◽

Next Generation Sequencing Ngs ◽

Ngs Data

Next-generation sequencing (NGS) technologies, which have advanced rapidly in the past few years, have the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decision-making on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionizing oncology. With the help of NGS, we can draw a finer map of the genetic basis of diseases and can improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS faces several critical challenges, including reduction of sequencing cost, enhancement of sequencing quality, improvement of technical simplicity and reliability, and development of semiautomated and integrated analysis workflows. To address these challenges, we conducted a literature review and summarized a four-stage NGS workflow to provide a systematic review of NGS-based analysis, explain the strengths and weaknesses of diverse NGS-based software tools, and elucidate its potential connection to individualized medicine. By presenting this four-stage NGS workflow, we try to provide the minimal structural layout required for NGS data storage and reproducibility.

Communicating Regulatory High Throughput Sequencing Data Using BioCompute Objects

10.1101/2020.12.07.415059 ◽

2020 ◽

Author(s):

Charles Hadley S. King ◽

Jonathon Keeney ◽

Nuria Guimera ◽

Souvik Das ◽

Brian Fochtman ◽

...

Keyword(s):

High Throughput Sequencing ◽

Biological Data ◽

Sequencing Data ◽

Regulatory Submission ◽

High Throughput Sequencing Data ◽

Analysis Workflow ◽

Regulatory Submissions ◽

High Concordance ◽

Next Generation Sequencing Ngs ◽

Ngs Data

For regulatory submissions of next-generation sequencing (NGS) data, it is vital for the analysis workflow to be robust, reproducible, and understandable. This project demonstrates that the use of the IEEE 2791-2020 Standard (BioCompute Objects [BCO]) enables complete and concise communication of NGS data analysis results. One arm of a clinical trial was replicated using synthetically generated data made to resemble real biological data. Two separate, independent analyses were then carried out using BCOs as the tool for communication of analysis: one to simulate a pharmaceutical regulatory submission to the FDA, and another to simulate the FDA review. The two results were compared and tabulated for concordance analysis: of the 118 simulated patient samples generated, the final results of 117 (99.15%) were in agreement. This high concordance rate demonstrates the ability of a BCO, when a verification kit is included, to effectively capture and clearly communicate NGS analyses within regulatory submissions. BCO promotes transparency and induces reproducibility, thereby reinforcing trust in the regulatory submission process.
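
The concordance figure quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `concordance_rate` is an assumption introduced here, and only the two counts (117 of 118) come from the abstract.

```python
def concordance_rate(n_agree: int, n_total: int) -> float:
    """Return the percentage of samples whose two independent results agree."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_agree / n_total


# Counts reported in the abstract: 117 of 118 simulated samples agreed.
rate = concordance_rate(117, 118)
print(f"{rate:.2f}%")  # 99.15%
```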

The ICR96 exon CNV validation series: a resource for orthogonal assessment of exon CNV calling in NGS data

Wellcome Open Research ◽

10.12688/wellcomeopenres.11689.1 ◽

2017 ◽

Vol 2 ◽

pp. 35 ◽

Cited By ~ 7

Author(s):

Shazia Mahamdallie ◽

Elise Ruark ◽

Shawn Yost ◽

Emma Ramsay ◽

Imran Uddin ◽

...

Keyword(s):

Sequencing Data ◽

Targeted Next Generation Sequencing ◽

Negative Results ◽

Targeted Ngs ◽

Predisposition Genes ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Validation Series ◽

Generation Sequencing ◽

Dependent Probe

Detection of deletions and duplications of whole exons (exon CNVs) is a key requirement of genetic testing. Accurate detection of this variant type has proved very challenging in targeted next-generation sequencing (NGS) data, particularly if only a single exon is involved. Many different NGS exon CNV calling methods have been developed over the last five years. Such methods are usually evaluated using simulated and/or in-house data due to a lack of publicly-available datasets with orthogonally generated results. This hinders tool comparisons, transparency and reproducibility. To provide a community resource for assessment of exon CNV calling methods in targeted NGS data, we here present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM or PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single exon CNVs, the most difficult type of exon CNV to detect. The FASTQ files for the ICR96 exon CNV validation series can be accessed through the European Genome-phenome Archive (EGA) under the accession number EGAS00001002428.

Bioinformatic strategies for the analysis of genomic aberrations detected by targeted NGS panels with clinical application

PeerJ ◽

10.7717/peerj.10897 ◽

2021 ◽

Vol 9 ◽

pp. e10897

Author(s):

Jakub Hynst ◽

Veronika Navrkalova ◽

Karol Pal ◽

Sarka Pospisilova

Keyword(s):

Relevant Information ◽

Bioinformatic Analysis ◽

Rapid Identification ◽

Integrated Analysis ◽

Clinical Settings ◽

Sequencing Data ◽

Targeted Ngs ◽

Genomic Aberrations ◽

Validation Procedure ◽

Next Generation Sequencing Ngs

Molecular profiling of tumor samples has acquired importance in cancer research, but currently also plays an important role in the clinical management of cancer patients. Rapid identification of genomic aberrations improves diagnosis, prognosis and effective therapy selection. This can be attributed mainly to the development of next-generation sequencing (NGS) methods, especially targeted DNA panels. Such panels enable a relatively inexpensive and rapid analysis of various aberrations with clinical impact specific to particular diagnoses. In this review, we discuss the experimental approaches and bioinformatic strategies available for the development of an NGS panel for a reliable analysis of selected biomarkers. Compliance with defined analytical steps is crucial to ensure accurate and reproducible results. In addition, a careful validation procedure has to be performed before the application of NGS targeted assays in routine clinical practice. With more focus on bioinformatics, we emphasize the need for thorough pipeline validation and management in relation to the particular experimental setting as an integral part of the NGS method establishment. A robust and reproducible bioinformatic analysis running on powerful machines is essential for proper detection of genomic variants in clinical settings since distinguishing between experimental noise and real biological variants is fundamental. This review summarizes state-of-the-art bioinformatic solutions for careful detection of the SNV/Indels and CNVs for targeted sequencing resulting in translation of sequencing data into clinically relevant information. Finally, we share our experience with the development of a custom targeted NGS panel for an integrated analysis of biomarkers in lymphoproliferative disorders.

HAPDeNovo: a haplotype-based approach for filtering and phasing de novo mutations in linked read sequencing data

10.1101/220830 ◽

2017 ◽

Cited By ~ 1

Author(s):

Xin Zhou ◽

Serafim Batzoglou ◽

Arend Sidow ◽

Lu Zhang

Keyword(s):

False Positive ◽

De Novo ◽

False Positives ◽

Sequencing Data ◽

De Novo Mutations ◽

Congenital Diseases ◽

Genome Wide ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Haplotype Information

Background: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because of their small number relative to the genome-wide false positives in next generation sequencing (NGS) data. Software tools such as DeNovoGear and TrioDeNovo have been developed to detect DNMs, but at good sensitivity they still produce many false positive calls. Results: To address this challenge, we develop HAPDeNovo, a program that leverages phasing information from linked read sequencing to remove false positive DNMs from candidate lists generated by DNM-detection tools. Short reads from each phasing block are allocated to each of the two haplotypes, followed by generating a haploid genotype for each putative DNM. HAPDeNovo removes variants that are called as heterozygous in one of the haplotypes because they are almost certainly false positives. Our experiments on 10X Chromium linked read sequencing trio data reveal that HAPDeNovo eliminates 80% to 99% of false positives regardless of how large the candidate DNM set is. Conclusions: HAPDeNovo leverages the haplotype information from linked read sequencing to remove spurious false positive DNMs effectively, and it increases accuracy of DNM detection dramatically without sacrificing sensitivity.
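
The filtering idea described in this abstract (assign reads to the two haplotypes, genotype each haplotype separately, and discard candidates that look heterozygous within a single haplotype) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not HAPDeNovo's implementation: the `Candidate` record, the `haploid_call` and `keep_candidate` helpers, and the 0.2 allele-fraction cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A putative DNM with (ref_reads, alt_reads) counted per haplotype."""
    name: str
    hap1: tuple  # (ref_reads, alt_reads) assigned to haplotype 1
    hap2: tuple  # (ref_reads, alt_reads) assigned to haplotype 2


def haploid_call(ref: int, alt: int, min_frac: float = 0.2) -> str:
    """Genotype one haplotype from its read counts (cutoff is an assumption)."""
    total = ref + alt
    if total == 0:
        return "missing"
    alt_frac = alt / total
    if alt_frac < min_frac:
        return "ref"
    if alt_frac > 1 - min_frac:
        return "alt"
    # A single haplotype should carry one allele; a mixture is suspicious.
    return "het"


def keep_candidate(c: Candidate) -> bool:
    """Drop candidates that appear heterozygous within either haplotype."""
    calls = (haploid_call(*c.hap1), haploid_call(*c.hap2))
    return "het" not in calls


candidates = [
    Candidate("A", hap1=(12, 0), hap2=(0, 11)),  # clean: ref on one, alt on other
    Candidate("B", hap1=(6, 5), hap2=(10, 0)),   # mixed alleles on haplotype 1
]
kept = [c.name for c in candidates if keep_candidate(c)]
print(kept)  # ['A']
```

A real tool would also need to handle phasing-block boundaries, low coverage, and read-assignment errors, none of which this sketch attempts.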

Broom: Application for non-redundant storage of High Throughput Sequencing data

10.1101/312306 ◽

2018 ◽

Author(s):

Levent Albayrak ◽

Kamil Khanipov ◽

George Golovko ◽

Yuriy Fofanov

Keyword(s):

Data Storage ◽

High Throughput ◽

High Throughput Sequencing ◽

Data Generation ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Quality ◽

Redundant Storage ◽

Recent Trends ◽

The Cost

Motivation: The data generation capabilities of High Throughput Sequencing (HTS) instruments have exponentially increased over the last few years, while the cost of sequencing has dramatically decreased, allowing this technology to become widely used in biomedical studies. For small labs and individual researchers, however, storage and transfer of large amounts of HTS data present a significant challenge. The recent trends in increased sequencing quality and genome coverage can be used to reconsider HTS data storage strategies. Results: We present Broom, a stand-alone application designed to select and store only high-quality sequencing reads at extremely high compression rates. Written in C++, the application accepts single and paired-end reads in FASTQ and FASTA formats and decompresses data in FASTA format. Availability: C++ code available at https://scsb.utmb.edu/labgroups/fofanov/
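
To make the idea of keeping only high-quality reads and emitting FASTA concrete, here is a minimal sketch. It is not Broom's method: the abstract does not state its selection criterion or compression scheme, so the mean-Phred threshold of 30, the Phred+33 encoding, and the helper names `read_fastq`, `mean_phred`, and `high_quality_fasta` are all assumptions, and no compression is attempted.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def high_quality_fasta(handle, min_mean_q: float = 30.0) -> str:
    """Return FASTA text for reads whose mean quality meets the threshold."""
    lines = []
    for name, seq, qual in read_fastq(handle):
        if qual and mean_phred(qual) >= min_mean_q:
            lines.append(f">{name}\n{seq}")
    return "\n".join(lines)


# Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
demo = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
fasta = high_quality_fasta(io.StringIO(demo))
print(fasta)
```

Dropping the quality strings after filtering is what makes FASTA output so much smaller than the FASTQ input, at the cost of losing per-base confidence.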

MitoSuite: a graphical tool for human mitochondrial genome profiling in massive parallel sequencing

PeerJ ◽

10.7717/peerj.3406 ◽

2017 ◽

Vol 5 ◽

pp. e3406 ◽

Cited By ~ 12

Author(s):

Koji Ishiya ◽

Shintaroh Ueda

Keyword(s):

Mitochondrial Genome ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Coverage ◽

Graphical Tool ◽

Genome Variations ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Human Mitochondrial Genome

Recent rapid advances in high-throughput, next-generation sequencing (NGS) technologies have promoted mitochondrial genome studies in the fields of human evolution, medical genetics, and forensic casework. However, scientists unfamiliar with computer programming often find it difficult to handle the massive volumes of data that are generated by NGS. To address this limitation, we developed MitoSuite, a user-friendly graphical tool for analysis of data from high-throughput sequencing of the human mitochondrial genome. MitoSuite generates a visual report on NGS data with simple mouse operations. Moreover, it analyzes high-coverage sequencing data but runs on a stand-alone computer, without the need for file upload. Therefore, MitoSuite offers outstanding usability for handling massive NGS data, and is ideal for evolutionary, clinical, and forensic studies on the human mitochondrial genome variations. It is freely available for download from the website https://mitosuite.com.

Ktrim: an extra-fast and accurate adapter- and quality-trimmer for sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa171 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3561-3562 ◽

Cited By ~ 8

Author(s):

Kun Sun

Keyword(s):

Data Preprocessing ◽

Poor Quality ◽

Read Length ◽

Supplementary Information ◽

Sequencing Data ◽

Efficient Tool ◽

Source Codes ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Motivation: Next-generation sequencing (NGS) data frequently suffer from poor-quality cycles and adapter contamination, and therefore need to be preprocessed before downstream analyses. With the ever-growing throughput and read length of modern sequencers, the preprocessing step has become a bottleneck in data analysis because the performance of current tools has not kept pace. Extra-fast and accurate adapter- and quality-trimming tools for sequencing data preprocessing are therefore still in urgent demand. Results: Ktrim was developed in this work. Key features of Ktrim include built-in support for adapters of common library preparation kits; support for user-supplied, customized adapter sequences; support for both paired-end and single-end data; and support for parallelization to accelerate the analysis. Ktrim was ∼2–18 times faster than current tools and also showed high accuracy when applied to the testing datasets. Ktrim could thus serve as a valuable and efficient tool for short-read NGS data preprocessing. Availability and implementation: Source codes and scripts to reproduce the results described in this article are freely available at https://github.com/hellosunking/Ktrim/, distributed under the GPL v3 license. Supplementary information: Supplementary data are available at Bioinformatics online.
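
For readers unfamiliar with what adapter and quality trimming involve, a naive single-end sketch is shown below. It is not Ktrim's algorithm and makes no attempt at its speed: the exact-match adapter search, the minimum overlap of 3, the Phred cutoff of 20, the Phred+33 encoding, and the function names are all assumptions chosen for illustration, as is the example adapter string.

```python
def adapter_cut(seq: str, adapter: str, min_overlap: int = 3) -> int:
    """Return the index at which the adapter starts, or len(seq) if absent."""
    idx = seq.find(adapter)  # full adapter anywhere in the read
    if idx != -1:
        return idx
    # Otherwise look for a prefix of the adapter hanging off the 3' end.
    for k in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def quality_cut(qual: str, min_q: int = 20, offset: int = 33) -> int:
    """Return the read length left after removing low-quality 3' bases."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return end


def trim_read(seq: str, qual: str, adapter: str):
    """Trim the adapter first, then low-quality bases from the new 3' end."""
    cut = adapter_cut(seq, adapter)
    cut = quality_cut(qual[:cut])
    return seq[:cut], qual[:cut]


# Toy read: 8 genomic bases followed by the first 10 bases of the adapter;
# bases 7 and 8 have low quality ('#' encodes Phred 2).
adapter = "AGATCGGAAGAGC"  # illustrative adapter sequence
seq = "ACGTACGT" + "AGATCGGAAG"
qual = "IIIIII##" + "IIIIIIIIII"
trimmed = trim_read(seq, qual, adapter)
print(trimmed)  # ('ACGTAC', 'IIIIII')
```

Production trimmers typically tolerate mismatches in the adapter match and use windowed quality statistics; exact matching is used here only to keep the example short.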

Sharing of Very Short IBD Segments between Humans, Neandertals, and Denisovans

10.1101/003988 ◽

2014 ◽

Author(s):

Gundula Povysil ◽

Sepp Hochreiter

Keyword(s):

Gene Flow ◽

Rare Variants ◽

Demographic History ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Chromosome X ◽

False Discovery ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

We analyze the sharing of very short identity by descent (IBD) segments between humans, Neandertals, and Denisovans to gain new insights into their demographic history. Short IBD segments convey information about events far back in time because the shorter IBD segments are, the older they are assumed to be. The identification of short IBD segments becomes possible through next generation sequencing (NGS), which offers high variant density and reports variants of all frequencies. However, only recently HapFABIA has been proposed as the first method for detecting very short IBD segments in NGS data. HapFABIA utilizes rare variants to identify IBD segments with a low false discovery rate. We applied HapFABIA to the 1000 Genomes Project whole genome sequencing data to identify IBD segments that are shared within and between populations. Many IBD segments have to be old since they are shared with Neandertals or Denisovans, which explains their shorter lengths compared to segments that are not shared with these ancient genomes. The Denisova genome most prominently matches IBD segments that are shared by Asians. Many of these segments were found exclusively in Asians and they are longer than segments shared between other continental populations and the Denisova genome. Therefore, we could confirm an introgression from Denisovans into ancestors of Asians after their migration out of Africa. While Neandertal-matching IBD segments are most often shared by Asians, Europeans share a considerably higher percentage of IBD segments with Neandertals compared to other populations, too. Again, many of these Neandertal-matching IBD segments are found exclusively in Asians, whereas Neandertal-matching IBD segments that are shared by Europeans are often found in other populations, too. Neandertal-matching IBD segments that are shared by Asians or Europeans are longer than those observed in Africans.
These IBD segments hint at a gene flow from Neandertals into ancestors of Asians and Europeans after they left Africa. Interestingly, many Neandertal- and/or Denisova-matching IBD segments are predominantly observed in Africans - some of them even exclusively. IBD segments shared between Africans and Neandertals or Denisovans are strikingly short, therefore we assume that they are very old. Consequently, we conclude that DNA regions from ancestors of humans, Neandertals, and Denisovans have survived in Africans. As expected, IBD segments on chromosome X are on average longer than IBD segments on the autosomes. Neandertal-matching IBD segments on chromosome X confirm gene flow from Neandertals into ancestors of Asians and Europeans outside Africa that was already found on the autosomes. Interestingly, there is hardly any signal of Denisova introgression on the X chromosome.

Automated processing of NGS data from raw sequencing files to ready-to-use information tables for genome modeling

Genomics and Computational Biology ◽

10.18547/gcb.2018.vol4.iss2.e100042 ◽

2018 ◽

Vol 4 (2) ◽

pp. 100042

Author(s):

Robert Deelen ◽

Martin Wieland ◽

Susanne Gerber ◽

David Fournier

Keyword(s):

Regulation Of Gene Expression ◽

Sequencing Data ◽

Processing Power ◽

Automated Processing ◽

Speed Up ◽

Data Files ◽

On Chip ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data

Epigenetic features such as histone and DNA modifications are important mechanisms for the regulation of gene expression and for cell and tissue development. As a result, extensive efforts are currently undertaken using next-generation sequencing (NGS) to generate vast amounts of data regarding the epigenetic regulation of genomes. Several tools and frameworks for the processing of these NGS data have been developed in the last decade. Nevertheless, each user still bears the challenge of integrating all these tasks to perform the analysis. This procedure is not only tedious but also resource-intensive due to the potentially large processing power involved. To automate, standardize and speed up the handling of NGS data, with a focus on ChIP-seq data, we present a user-friendly pipeline that automatically processes a list of sequencing data files and returns a ready-to-use purified table for subsequent modelling or analysis attempts.

CoMA – an intuitive and user-friendly pipeline for amplicon-sequencing data analysis

PLoS ONE ◽

10.1371/journal.pone.0243241 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243241

Author(s):

Sebastian Hupfauf ◽

Mohammad Etemadi ◽

Marina Fernández-Delgado Juárez ◽

María Gómez-Brandón ◽

Heribert Insam ◽

...

Keyword(s):

Operating System ◽

Data Analysis ◽

Amplicon Sequencing ◽

Sequencing Data ◽

Taxonomic Assignment ◽

Benchmark Test ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data ◽

Mock Communities

In recent years, there has been a veritable boost in next-generation sequencing (NGS) of gene amplicons in biological and medical studies. Huge amounts of data are produced and need to be analyzed adequately. Various online and offline analysis tools are available; however, most of them require extensive expertise in computer science or bioinformatics, and often a Linux-based operating system. Here, we introduce “CoMA–Comparative Microbiome Analysis” as a free and intuitive analysis pipeline for amplicon-sequencing data, compatible with any common operating system. Moreover, the tool offers various useful services including data pre-processing, quality checking, clustering to operational taxonomic units (OTUs), taxonomic assignment, data post-processing, data visualization, and statistical appraisal. The workflow results in highly esthetic and publication-ready graphics, as well as output files in standardized formats (e.g. tab-delimited OTU-table, BIOM, NEWICK tree) that can be used for more sophisticated analyses. The CoMA output was validated by a benchmark test, using three mock communities with different sample characteristics (primer set, amplicon length, diversity). The performance was compared with that of Mothur, QIIME and QIIME2-DADA2, popular packages for NGS data analysis. Furthermore, the functionality of CoMA is demonstrated on a practical example, investigating microbial communities from three different soils (grassland, forest, swamp). All tools performed well in the benchmark test and were able to reveal the majority of all genera in the mock communities. Also for the soil samples, the results of CoMA were congruent to those of the other pipelines, in particular when looking at the key microbial players.
