Identification of single nucleotide variants using position-specific error estimation in deep sequencing data

Mapping Intimacies ◽

10.1101/475947 ◽

2018 ◽

Author(s):

Dimitrios Kleftogiannis ◽

Marco Punta ◽

Anuradha Jayaram ◽

Shahneen Sandhu ◽

Stephen Q. Wong ◽

...

Keyword(s):

Deep Sequencing ◽

Low Frequency ◽

Poisson Model ◽

Real Data ◽

Analytical Sensitivity ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Deep Sequencing Data ◽

Targeted Deep Sequencing

Abstract Background: Targeted deep sequencing is a highly effective technology to identify known and novel single nucleotide variants (SNVs) with many applications in translational medicine, disease monitoring and cancer profiling. However, identification of SNVs using deep sequencing data is a challenging computational problem as different sequencing artifacts limit the analytical sensitivity of SNV detection, especially at low variant allele frequencies (VAFs). Methods: To address the problem of relatively high noise levels in amplicon-based deep sequencing data (e.g. with the Ion AmpliSeq technology) in the context of SNV calling, we have developed a new bioinformatics tool called AmpliSolve. AmpliSolve uses a set of normal samples to model position-specific, strand-specific and nucleotide-specific background artifacts (noise), and deploys a Poisson model-based statistical framework for SNV detection. Results: Our tests on both synthetic and real data indicate that AmpliSolve achieves a good trade-off between precision and sensitivity, even at VAF below 5% and as low as 1%. We further validate AmpliSolve by applying it to the detection of SNVs in 96 circulating tumor DNA samples at three clinically relevant genomic positions and compare the results to digital droplet PCR experiments. Conclusions: AmpliSolve is a new tool for in-silico estimation of background noise and for detection of low frequency SNVs in targeted deep sequencing data. Although AmpliSolve has been specifically designed for and tested on amplicon-based libraries sequenced with the Ion Torrent platform, it can, in principle, be applied to other sequencing platforms as well. AmpliSolve is freely available at https://github.com/dkleftogi/AmpliSolve.
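The general idea of a Poisson test against a position-specific background can be illustrated with a minimal sketch. This is not AmpliSolve's implementation: the function names, the pseudocount, and the significance threshold `alpha` are assumptions made for illustration only. The error rate is pooled from normal samples for one position, strand and nucleotide, and a candidate is called when its alternative read count is unlikely under that noise level.

```python
import math

def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), via the lower tail.

    Terms are evaluated in log space to avoid underflow for large lam.
    Precision is limited to about 1e-16 because of the 1 - cdf step.
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    log_lam = math.log(lam)
    cdf = sum(math.exp(-lam + i * log_lam - math.lgamma(i + 1)) for i in range(k))
    return min(1.0, max(0.0, 1.0 - cdf))

def estimate_error_rate(normal_counts, pseudocount=0.5):
    """Pool (alt_reads, depth) pairs from normal samples into one error rate.

    The pseudocount (an assumption of this sketch) keeps the rate above zero
    when no alternative reads are seen in the normals.
    """
    alt = sum(a for a, _ in normal_counts)
    depth = sum(d for _, d in normal_counts)
    return (alt + pseudocount) / (depth + 1.0)

def call_snv(alt_reads, depth, error_rate, alpha=1e-3):
    """Call an SNV if alt_reads is improbable under Poisson background noise."""
    expected_noise = error_rate * depth
    p_value = poisson_sf(alt_reads, expected_noise)
    return p_value < alpha, p_value

# Hypothetical counts at one position/strand/nucleotide in three normal samples.
normals = [(2, 10000), (3, 12000), (1, 8000)]
rate = estimate_error_rate(normals)
print(call_snv(50, 5000, rate))  # 1% VAF candidate at depth 5000
print(call_snv(1, 5000, rate))   # a single alternative read
```

In this toy setting the expected number of noise reads at depth 5000 is about one, so 50 alternative reads are called while a single read is not. A real caller would also need to handle multiple testing and strand agreement, which are omitted here.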


Identification of single nucleotide variants using position-specific error estimation in deep sequencing data

BMC Medical Genomics ◽

10.1186/s12920-019-0557-9 ◽

2019 ◽

Vol 12 (1) ◽

Cited By ~ 5

Author(s):

Dimitrios Kleftogiannis ◽

Marco Punta ◽

Anuradha Jayaram ◽

Shahneen Sandhu ◽

Stephen Q. Wong ◽

...

Keyword(s):

Error Estimation ◽

Deep Sequencing ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Deep Sequencing Data ◽

Specific Error


Nautilus: A Bioinformatics Package for the Analysis of HIV Type 1 Targeted Deep Sequencing Data

AIDS Research and Human Retroviruses ◽

10.1089/aid.2013.0175 ◽

2013 ◽

Vol 29 (10) ◽

pp. 1361-1364 ◽

Cited By ~ 6

Author(s):

Gustavo H. Kijak ◽

Phuc Pham ◽

Eric Sanders-Buell ◽

Elizabeth A. Harbolick ◽

Leigh Anne Eller ◽

...

Keyword(s):

Deep Sequencing ◽

Sequencing Data ◽

Deep Sequencing Data ◽

Targeted Deep Sequencing ◽

HIV Type 1


Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data

BMC Genomics ◽

10.1186/s12864-015-1456-x ◽

2015 ◽

Vol 16 (1) ◽

Cited By ~ 34

Author(s):

Richard J Orton ◽

Caroline F Wright ◽

Marco J Morelli ◽

David J King ◽

David J Paton ◽

...

Keyword(s):

Deep Sequencing ◽

Low Frequency ◽

Sequencing Data ◽

RT-PCR ◽

Deep Sequencing Data ◽

Sequence Errors


Reanalysis of deep-sequencing data from Austria points towards a small SARS-COV-2 transmission bottleneck on the order of one to three virions

10.1101/2021.02.22.432096 ◽

2021 ◽

Author(s):

Michael A. Martin ◽

Katia Koelle

Keyword(s):

Genetic Variation ◽

Deep Sequencing ◽

De Novo ◽

Low Frequency ◽

Variant Calling ◽

Population Level ◽

Sequencing Data ◽

Deep Sequencing Data ◽

Computational Analyses ◽

Transmission Bottleneck

An early analysis of SARS-CoV-2 deep-sequencing data that combined epidemiological and genetic data to characterize the transmission dynamics of the virus in and beyond Austria concluded that the size of the virus’s transmission bottleneck was large – on the order of 1000 virions. We performed new computational analyses using these deep-sequenced samples from Austria. Our analyses included characterization of transmission bottleneck sizes across a range of variant calling thresholds and examination of patterns of shared low-frequency variants between transmission pairs in cases where de novo genetic variation was present in the recipient. From these analyses, among others, we found that SARS-CoV-2 transmission bottlenecks are instead likely to be very tight, on the order of 1-3 virions. These findings have important consequences for understanding how SARS-CoV-2 evolves between hosts and the processes shaping genetic variation observed at the population level.


Application of Trusight Myeloid Panel on Whole Genome Amplified DNA in Myelodysplastic Syndrome Patients

Blood ◽

10.1182/blood.v128.22.5519.5519 ◽

2016 ◽

Vol 128 (22) ◽

pp. 5519-5519

Author(s):

Laura Palomo ◽

Daniel Alvira ◽

Francisco Fuster-Tormo ◽

Vera Ademà ◽

Maria Pilar Armengol ◽

...

Keyword(s):

Deep Sequencing ◽

Low Frequency ◽

Targeted Sequencing ◽

Target Coverage ◽

Whole Genome ◽

Common Variants ◽

Sequencing Data ◽

Hematopoietic Stem ◽

Paired Samples ◽

Targeted Deep Sequencing

Abstract Background: Whole genome amplification (WGA) has become an invaluable method for working with small amounts of starting DNA and for preserving limited samples of precious stock material. Next-Generation Sequencing (NGS) techniques can benefit from WGA, but due to their high sensitivity, WGA reliability needs to be certified to ensure an unbiased and accurate amplification of whole genomes. Myelodysplastic Syndromes (MDS) are a group of clonal hematopoietic stem cell disorders characterized by somatic mutations in several myeloid-related genes. We have performed whole exome sequencing (WES) and targeted deep sequencing in tumoral samples from MDS patients. To determine whether Multiple Displacement Amplification-based WGA can be applied to perform NGS in these types of samples and to obtain valuable results, targeted deep sequencing was performed on both fresh-DNA and WGA-DNA from the same patients. Methods: Whole bone marrow samples from four MDS patients were included in the study. WGA was performed in tumoral DNA samples with REPLI-g (Qiagen). WES libraries were generated in tumoral-control paired samples using the SureSelect Human Exome Kit 51Mb v4 (Agilent) and sequenced on an Illumina HiSeq2000. Targeted sequencing libraries were prepared for fresh-DNA and WGA-DNA following the manufacturer specifications for the TruSight Myeloid Sequencing Panel protocol (Illumina), and then sequenced in a single run on an Illumina MiSeq. WES sequencing data was analyzed using an in-house pipeline, as previously reported. Targeted sequencing data analysis was performed with the MiSeq Reporter Software (Illumina). Filtering was performed in all cases by eliminating sequencing and mapping errors and by discarding intronic or synonymous variants, variants located at highly variable regions or with low coverage, as well as known polymorphisms. Additional filtering was performed by visualization on Integrative Genomics Viewer Software v.2.3.72.
Results: Regarding targeted sequencing, fresh-DNA samples generated 6 million reads (SD = 1.9 million), with 98.5% (SD = 0.8) of the mapped reads on-target and a mean target coverage of 12148.8x (SD = 3872.9). WGA-DNA samples yielded about 5.2 million reads (SD = 1.5 million), with 98.3% (SD = 0.4) of the mapped reads on-target and a mean target coverage of 10447.5x (SD = 2946.3). A mean of 77% of total bases displayed a Q score ≥30, which did not differ between fresh and WGA-DNA. Comparison of all filtered variants within the four pairs revealed a high level of discordance between fresh/WGA samples (Figure 1A). A mean of 86% of the detected variants, considering both fresh and WGA-DNA, were detected at a low frequency (<10%). Therefore, stricter variant filtering was performed, in which all variants detected at a frequency <10% were removed from further analyses. The pairwise comparison across the paired samples showed a total of 48.1% (SD = 49.3) of common variants, 23.2% (SD = 30.1) of variants exclusively detected in fresh-DNA, and 28.7% (SD = 38.4) of variants exclusively detected in WGA-DNA (Figure 1B). Overall, 100% (n=9) of the common variants were also detected by WES. Regarding fresh-DNA specific variants, 63% (5/8) were seen by WES and 37% (3/8) were not. However, these three variants were detected by targeted sequencing at frequencies of 10-12%. This suggests that even stricter filtering may be necessary when working with WGA-DNA, or that they were not detected by WES because it was performed at a mean coverage of 60x, making it difficult to detect low frequency variants. None of the WGA-DNA specific variants were seen by WES. Taking all these factors into account, we used the fresh-DNA specific variants as the gold standard to further calculate the Positive Predictive Value (PPV) and the sensitivity of the WGA-DNA samples, and thus validate the accuracy of the WGA technique in the sample preparation.
This revealed a sensitivity of 61.7% (SD = 43.3) and a PPV of 53.3% (SD = 54.2). Conclusions: These findings suggest that WGA methods may introduce errors that can be detected at a low frequency, and that some bias can be expected, explaining why some variants present in the gDNA may be lost during the amplification process. Therefore, we believe that applying WGA before library preparation should be restricted to cases with a very limited source of material and should be followed by a more in-depth and strict bioinformatics analysis and filtering process. Disclosures: Sole: Celgene: Membership on an entity's Board of Directors or advisory committees.
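The sensitivity and PPV figures above follow the standard definitions, which can be sketched as below. This is an illustration only: the variant tuples are hypothetical, and the sketch assumes the fresh-DNA call set of a patient is treated as the reference against which the WGA-DNA call set is scored.

```python
def sensitivity_and_ppv(reference, test):
    """Score a test call set against a reference call set.

    sensitivity = TP / (TP + FN), PPV = TP / (TP + FP).
    Returns NaN for a metric whose denominator is zero.
    """
    reference, test = set(reference), set(test)
    tp = len(reference & test)   # variants found in both sets
    fn = len(reference - test)   # reference variants missed by the test set
    fp = len(test - reference)   # test variants absent from the reference
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv

# Hypothetical variants keyed as (chromosome, position, ref, alt).
fresh = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"),
         ("chr4", 300, "C", "T"), ("chr7", 400, "G", "A")}
wga = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"),
       ("chr9", 500, "T", "C")}

sens, ppv = sensitivity_and_ppv(fresh, wga)
print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")
```

With only a handful of variants per patient, each metric can swing between 0 and 1 from one patient to the next, which is consistent with the large standard deviations reported across four patients.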


Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

PLoS ONE ◽

10.1371/journal.pone.0151664 ◽

2016 ◽

Vol 11 (3) ◽

pp. e0151664 ◽

Cited By ~ 95

Author(s):

Anne Bruun Krøigård ◽

Mads Thomassen ◽

Anne-Vibeke Lænkholm ◽

Torben A. Kruse ◽

Martin Jakob Larsen

Keyword(s):

Deep Sequencing ◽

Somatic Mutations ◽

Sequencing Data ◽

Somatic Variant ◽

Deep Sequencing Data ◽

Targeted Deep Sequencing


Intra-host Variation and Evolutionary Dynamics of SARS-CoV-2 Population in COVID-19 Patients

10.1101/2020.05.20.103549 ◽

2020 ◽

Cited By ~ 6

Author(s):

Yanqun Wang ◽

Daxi Wang ◽

Lu Zhang ◽

Wanying Sun ◽

Zhaoyong Zhang ◽

...

Keyword(s):

Genetic Diversity ◽

Evolutionary Dynamics ◽

Single Patient ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Host Interactions ◽

Genomic Variants ◽

Deep Sequencing Data ◽

Serial Samples

Abstract: As of mid-May 2020, the causative agent of COVID-19, SARS-CoV-2, has infected over 4 million people and caused more than 300 thousand deaths according to official reports [1,2]. Understanding the biology and virus-host interactions of SARS-CoV-2 requires knowledge of the mutation and evolution of this virus at both inter- and intra-host levels. However, despite quite a few polymorphic sites identified among SARS-CoV-2 populations, intra-host variant spectra and their evolutionary dynamics remain mostly unknown. Here, using deep sequencing data, we obtained and characterized consensus genomes and intra-host genomic variants from 32 serial samples collected from eight patients with COVID-19. The 32 consensus genomes revealed the coexistence of different genotypes within the same patient. We further identified 40 intra-host single nucleotide variants (iSNVs). Most (30/40) iSNVs were present in a single patient, while ten iSNVs were found in at least two patients or were identical to consensus variants. Comparison of allele frequencies of the iSNVs revealed genetic divergence between intra-host populations of the respiratory tract (RT) and gastrointestinal tract (GIT), mostly driven by bottleneck events among intra-host transmissions. Nonetheless, we observed a maintained viral genetic diversity within GIT, showing an increased population with accumulated mutations developed in the tissue-specific environments. The iSNVs identified here not only show spatial divergence of intra-host viral populations, but also provide new insights into the complex virus-host interactions.


Comparison of somatic variant detection algorithms using Ion Torrent targeted deep sequencing data

BMC Medical Genomics ◽

10.1186/s12920-019-0636-y ◽

2019 ◽

Vol 12 (S9) ◽

Cited By ~ 1

Author(s):

Qing Wang ◽

Vassiliki Kotoula ◽

Pei-Chen Hsu ◽

Kyriaki Papadopoulou ◽

Joshua W. K. Ho ◽

...

Keyword(s):

Deep Sequencing ◽

Ion Torrent ◽

Sequencing Data ◽

Somatic Variation ◽

Somatic Variant ◽

Detection Algorithms ◽

Deep Sequencing Data ◽

Targeted Deep Sequencing ◽

Variant Detection ◽

Paired Tumour

Abstract Background: The application of next-generation sequencing in cancer has revealed the genomic landscape of many tumour types and is nowadays routinely used in research and clinical settings. Multiple algorithms have been developed to detect somatic variation from sequencing data using either paired tumour-blood or tumour-only samples. Most of these methods have been developed and evaluated for the identification of somatic variation using Illumina sequencing datasets of moderate coverage. However, a comprehensive evaluation of somatic variant detection algorithms on Ion Torrent targeted deep sequencing data has not been performed. Methods: We have applied three somatic detection algorithms, Torrent Variant Caller, MuTect2 and VarScan2, on a large cohort of ovarian cancer patients comprising 208 paired tumour-blood samples and 253 tumour-only samples sequenced deeply on the Ion Torrent Proton platform across 330 amplicons. Subsequently, the concordance and performance of the three somatic variant callers were assessed. Results: We have observed low concordance across the algorithms with only 0.5% of SNV and 0.02% of INDEL calls in common across all three methods. The intersection of all methods showed better performance when assessed using correlation with known mutational signatures, overlap with COSMIC variation and by examining the variant characteristics. The Torrent Variant Caller also performed well with the advantage of not eliminating a high number of variants that could lead to high type II error. Conclusions: Our results suggest that caution should be taken when applying state-of-the-art somatic variant algorithms to Ion Torrent targeted deep sequencing data. Better quality control procedures and strategies that combine results from multiple methods should ensure that higher accuracy is achieved. This is essential to ensure that results from bioinformatics pipelines using Ion Torrent deep sequencing can be robustly applied in cancer research and in the clinic.
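Concordance across callers of the kind reported above can be computed as a set intersection. The sketch below is a hedged illustration, not the study's pipeline: the caller labels and variant keys are hypothetical, and the fraction is taken relative to the union of all calls, which is an assumption because the abstract does not state the denominator.

```python
def concordance(call_sets):
    """Return the calls shared by every caller and their share of the union.

    call_sets maps a caller label to an iterable of hashable variant keys.
    """
    sets = [set(calls) for calls in call_sets.values()]
    if not sets:
        return set(), float("nan")
    union = set().union(*sets)
    shared = set.intersection(*sets)
    fraction = len(shared) / len(union) if union else float("nan")
    return shared, fraction

# Hypothetical call sets; integers stand in for variant keys.
calls = {
    "caller_a": {1, 2, 3, 4},
    "caller_b": {3, 4, 5},
    "caller_c": {4, 6},
}
shared, fraction = concordance(calls)
print(sorted(shared), round(fraction, 3))
```

Keeping only the shared calls corresponds to the intersection strategy that the abstract describes as performing better, at the cost of discarding every variant that any one caller misses.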


Moss enables high sensitivity single-nucleotide variant calling from multiple bulk DNA tumor samples

Nature Communications ◽

10.1038/s41467-021-22466-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Chuanyi Zhang ◽

Mohammed El-Kebir ◽

Idoia Ochoa

Keyword(s):

Cancer Genomics ◽

Low Frequency ◽

Variant Calling ◽

High Sensitivity ◽

Single Sample ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Additional Time ◽

Single Nucleotide ◽

Multiple Samples

Abstract: Intra-tumor heterogeneity renders the identification of somatic single-nucleotide variants (SNVs) a challenging problem. In particular, low-frequency SNVs are hard to distinguish from sequencing artifacts. While the increasing availability of multi-sample tumor DNA sequencing data holds the potential for more accurate variant calling, there is a lack of high-sensitivity multi-sample SNV callers that utilize these data. Here we report Moss, a method to identify low-frequency SNVs that recur in multiple sequencing samples from the same tumor. Moss provides any existing single-sample SNV caller the ability to support multiple samples with little additional time overhead. We demonstrate that Moss improves recall while maintaining high precision in a simulated dataset. On multi-sample hepatocellular carcinoma, acute myeloid leukemia and colorectal cancer datasets, Moss identifies new low-frequency variants that meet manual review criteria and are consistent with the tumor’s mutational signature profile. In addition, Moss detects the presence of variants in more samples of the same tumor than reported by the single-sample caller. Moss’ improved sensitivity in SNV calling will enable more detailed downstream analyses in cancer genomics.


Faculty Opinions recommendation of Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726230299.793539032 ◽

2017 ◽

Author(s):

Isabelle Meyts

Keyword(s):

Deep Sequencing ◽

Somatic Mutations ◽

Sequencing Data ◽

Somatic Variant ◽

Deep Sequencing Data ◽

Targeted Deep Sequencing
