Identification of single nucleotide variants using position-specific error estimation in deep sequencing data

2018 ◽  
Author(s):  
Dimitrios Kleftogiannis ◽  
Marco Punta ◽  
Anuradha Jayaram ◽  
Shahneen Sandhu ◽  
Stephen Q. Wong ◽  
...  

Abstract Background: Targeted deep sequencing is a highly effective technology to identify known and novel single nucleotide variants (SNVs) with many applications in translational medicine, disease monitoring and cancer profiling. However, identification of SNVs using deep sequencing data is a challenging computational problem as different sequencing artifacts limit the analytical sensitivity of SNV detection, especially at low variant allele frequencies (VAFs). Methods: To address the problem of relatively high noise levels in amplicon-based deep sequencing data (e.g. with the Ion AmpliSeq technology) in the context of SNV calling, we have developed a new bioinformatics tool called AmpliSolve. AmpliSolve uses a set of normal samples to model position-specific, strand-specific and nucleotide-specific background artifacts (noise), and deploys a Poisson model-based statistical framework for SNV detection. Results: Our tests on both synthetic and real data indicate that AmpliSolve achieves a good trade-off between precision and sensitivity, even at VAF below 5% and as low as 1%. We further validate AmpliSolve by applying it to the detection of SNVs in 96 circulating tumor DNA samples at three clinically relevant genomic positions and compare the results to digital droplet PCR experiments. Conclusions: AmpliSolve is a new tool for in-silico estimation of background noise and for detection of low frequency SNVs in targeted deep sequencing data. Although AmpliSolve has been specifically designed for and tested on amplicon-based libraries sequenced with the Ion Torrent platform, it can, in principle, be applied to other sequencing platforms as well. AmpliSolve is freely available at https://github.com/dkleftogi/AmpliSolve.
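The abstract above describes estimating background noise from normal samples and testing candidate SNVs with a Poisson model. The following is a minimal sketch of that general idea, not AmpliSolve's actual implementation: the function names, the pooling of normal samples into a single rate, the significance threshold `alpha`, and the example counts are all assumptions made for illustration. In practice one rate would be estimated per position, strand and nucleotide.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), as 1 minus the CDF at k - 1."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # pmf at 0
    cdf = term
    for i in range(1, k):
        term *= lam / i  # pmf(i) derived from pmf(i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def background_rate(normal_counts):
    """Pool (alt_reads, depth) pairs from normal samples into one error rate.

    Hypothetical helper: assumes all pairs refer to the same position,
    strand and alternative nucleotide.
    """
    alt = sum(a for a, _ in normal_counts)
    depth = sum(d for _, d in normal_counts)
    return alt / depth


def call_snv(alt_reads, depth, error_rate, alpha=0.01):
    """Flag a variant if its alt-read count is unlikely under background noise."""
    lam = depth * error_rate  # expected number of erroneous alt reads
    p_value = poisson_tail(alt_reads, lam)
    return p_value < alpha, p_value


# Made-up counts for one position in three normal samples.
normals = [(4, 5000), (6, 5000), (5, 5000)]
rate = background_rate(normals)

print(call_snv(50, 5000, rate))  # 1% VAF, far above the expected noise
print(call_snv(5, 5000, rate))   # count matches the expected noise level
```

The tail probability is computed by direct summation, which is adequate for the moderate expected counts used here but loses precision for extremely small p-values; a statistics library would be the safer choice for real data.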

2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Dimitrios Kleftogiannis ◽  
Marco Punta ◽  
Anuradha Jayaram ◽  
Shahneen Sandhu ◽  
Stephen Q. Wong ◽  
...  

2013 ◽  
Vol 29 (10) ◽  
pp. 1361-1364 ◽  
Author(s):  
Gustavo H. Kijak ◽  
Phuc Pham ◽  
Eric Sanders-Buell ◽  
Elizabeth A. Harbolick ◽  
Leigh Anne Eller ◽  
...  

BMC Genomics ◽  
2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Richard J Orton ◽  
Caroline F Wright ◽  
Marco J Morelli ◽  
David J King ◽  
David J Paton ◽  
...  

2021 ◽  
Author(s):  
Michael A. Martin ◽  
Katia Koelle

An early analysis of SARS-CoV-2 deep-sequencing data that combined epidemiological and genetic data to characterize the transmission dynamics of the virus in and beyond Austria concluded that the size of the virus’s transmission bottleneck was large – on the order of 1000 virions. We performed new computational analyses using these deep-sequenced samples from Austria. Our analyses included characterization of transmission bottleneck sizes across a range of variant calling thresholds and examination of patterns of shared low-frequency variants between transmission pairs in cases where de novo genetic variation was present in the recipient. From these analyses, among others, we found that SARS-CoV-2 transmission bottlenecks are instead likely to be very tight, on the order of 1-3 virions. These findings have important consequences for understanding how SARS-CoV-2 evolves between hosts and the processes shaping genetic variation observed at the population level.


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 5519-5519
Author(s):  
Laura Palomo ◽  
Daniel Alvira ◽  
Francisco Fuster-Tormo ◽  
Vera Ademà ◽  
Maria Pilar Armengol ◽  
...  

Abstract Background: Whole genome amplification (WGA) has become an invaluable method for working with small amounts of starting DNA and for preserving limited samples of precious stock material. Next-Generation Sequencing (NGS) techniques can benefit from WGA, but due to their high sensitivity, WGA reliability needs to be certified to ensure an unbiased and accurate amplification of whole genomes. Myelodysplastic Syndromes (MDS) are a group of clonal hematopoietic stem cell disorders characterized by somatic mutations in several myeloid-related genes. We have performed whole exome sequencing (WES) and targeted deep sequencing in tumoral samples from MDS patients. To determine whether Multiple Displacement Amplification-based WGA can be applied to perform NGS in this type of sample and to obtain valuable results, targeted deep sequencing was performed on both fresh-DNA and WGA-DNA from the same patients. Methods: Whole bone marrow samples from four MDS patients were included in the study. WGA was performed in tumoral DNA samples with REPLI-g (Qiagen). WES libraries were generated in tumoral-control paired samples using the SureSelect Human Exome Kit 51Mb v4 (Agilent) and sequenced on an Illumina HiSeq2000. Targeted sequencing libraries were prepared for fresh-DNA and WGA-DNA following the manufacturer specifications for the TruSight Myeloid Sequencing Panel protocol (Illumina), and then sequenced in a single run on an Illumina MiSeq. WES sequencing data was analyzed using an in-house pipeline, as previously reported. Targeted sequencing data analysis was performed with the MiSeq Reporter Software (Illumina). Filtering was performed in all cases by eliminating sequencing and mapping errors and by discarding intronic or synonymous variants, variants located at highly variable regions or with low coverage, as well as known polymorphisms. Additional filtering was performed by visualization on Integrative Genomics Viewer Software v.2.3.72.
Results: Regarding targeted sequencing, fresh-DNA samples generated 6 million reads (SD = 1.9 million), with 98.5% (SD = 0.8) of the mapped reads on-target and a mean target coverage of 12148.8x (SD = 3872.9). WGA-DNA samples yielded about 5.2 million reads (SD = 1.5 million), with 98.3% (SD = 0.4) of the mapped reads on-target and a mean target coverage of 10447.5x (SD = 2946.3). A mean of 77% of total bases displayed a Q score ≥30, which did not differ between fresh and WGA-DNA. Comparison of all filtered variants within the four pairs revealed a high level of discordance between fresh/WGA samples (Figure 1A). A mean of 86% of the detected variants, considering both fresh and WGA-DNA, were detected at a low frequency (<10%). Therefore, stricter variant filtering was performed, in which all variants detected at a frequency <10% were removed from further analyses. The pairwise comparison across the paired samples showed a total of 48.1% (SD = 49.3) of common variants, 23.2% (SD = 30.1) of variants exclusively detected in fresh-DNA, and 28.7% (SD = 38.4) of variants exclusively detected in WGA-DNA (Figure 1B). Overall, 100% (n=9) of the common variants were also detected by WES. Regarding fresh-DNA specific variants, 63% (5/8) were seen by WES and 37% (3/8) were not. However, these three variants were detected by targeted sequencing at frequencies between 10% and 12%. This suggests that even stricter filtering may be necessary when working with WGA-DNA, or that they were not detected by WES because it was performed at a mean coverage of 60x, making it difficult to detect low frequency variants. None of the WGA-DNA specific variants were seen by WES. Taking all these factors into account, we used the fresh-DNA specific variants as the gold standard to further calculate the Positive Predictive Value (PPV) and the sensitivity of the WGA-DNA samples, and thus validate the accuracy of the WGA technique in the sample preparation.
This revealed a sensitivity of 61.7% (SD = 43.3) and a PPV of 53.3% (SD = 54.2). Conclusions: These findings suggest that WGA methods may introduce errors that can be detected at a low frequency, and that some bias can be expected, explaining why some variants present in the gDNA may be lost during the amplification process. Therefore, we believe that applying WGA before library preparation should be restricted to cases with a very limited material source and should be followed by a more in-depth and strict bioinformatics analysis and filtering process. Disclosures Sole: Celgene: Membership on an entity's Board of Directors or advisory committees.
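The sensitivity and PPV reported above follow from comparing a test call set against a gold-standard call set. As a minimal sketch, assuming each variant is represented by a hypothetical `(chromosome, position, ref, alt)` tuple and using made-up variants rather than the study's data, the two metrics can be computed with set operations:

```python
def sensitivity_and_ppv(gold_standard, test_calls):
    """Compare test calls against a gold-standard set of variants.

    sensitivity = TP / (TP + FN); PPV = TP / (TP + FP).
    Returns NaN for a metric whose denominator is zero.
    """
    gold, test = set(gold_standard), set(test_calls)
    tp = len(gold & test)  # found in both sets
    fn = len(gold - test)  # gold-standard variants missed by the test
    fp = len(test - gold)  # test calls absent from the gold standard
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


# Illustrative variants only; not taken from the study.
fresh_dna = {("chr4", 100, "C", "T"), ("chr17", 200, "G", "A"), ("chr20", 300, "A", "G")}
wga_dna = {("chr17", 200, "G", "A"), ("chr20", 300, "A", "G"), ("chr2", 400, "T", "C")}

sens, ppv = sensitivity_and_ppv(fresh_dna, wga_dna)
print(f"sensitivity={sens:.3f} PPV={ppv:.3f}")
```

In a per-patient design such as the one described, the function would be applied to each fresh/WGA pair and the results averaged, which is one way a mean with a large SD can arise from only four pairs.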


PLoS ONE ◽  
2016 ◽  
Vol 11 (3) ◽  
pp. e0151664 ◽  
Author(s):  
Anne Bruun Krøigård ◽  
Mads Thomassen ◽  
Anne-Vibeke Lænkholm ◽  
Torben A. Kruse ◽  
Martin Jakob Larsen

Author(s):  
Yanqun Wang ◽  
Daxi Wang ◽  
Lu Zhang ◽  
Wanying Sun ◽  
Zhaoyong Zhang ◽  
...  

ABSTRACT As of mid-May 2020, the causative agent of COVID-19, SARS-CoV-2, has infected over 4 million people with more than 300 thousand deaths according to official reports [1,2]. Understanding the biology and virus-host interactions of SARS-CoV-2 requires knowledge of the mutation and evolution of this virus at both inter- and intra-host levels. However, despite quite a few polymorphic sites identified among SARS-CoV-2 populations, intra-host variant spectra and their evolutionary dynamics remain mostly unknown. Here, using deep sequencing data, we obtained and characterized consensus genomes and intra-host genomic variants from 32 serial samples collected from eight patients with COVID-19. The 32 consensus genomes revealed the coexistence of different genotypes within the same patient. We further identified 40 intra-host single nucleotide variants (iSNVs). Most (30/40) iSNVs were present in a single patient, while ten iSNVs were found in at least two patients or were identical to consensus variants. Comparison of allele frequencies of the iSNVs revealed genetic divergence between intra-host populations of the respiratory tract (RT) and gastrointestinal tract (GIT), mostly driven by bottleneck events among intra-host transmissions. Nonetheless, we observed a maintained viral genetic diversity within GIT, showing an increased population with accumulated mutations developed in the tissue-specific environments. The iSNVs identified here not only show spatial divergence of intra-host viral populations, but also provide new insights into the complex virus-host interactions.


2019 ◽  
Vol 12 (S9) ◽  
Author(s):  
Qing Wang ◽  
Vassiliki Kotoula ◽  
Pei-Chen Hsu ◽  
Kyriaki Papadopoulou ◽  
Joshua W. K. Ho ◽  
...  

Abstract Background The application of next-generation sequencing in cancer has revealed the genomic landscape of many tumour types and is nowadays routinely used in research and clinical settings. Multiple algorithms have been developed to detect somatic variation from sequencing data using either paired tumour-blood or tumour-only samples. Most of these methods have been developed and evaluated for the identification of somatic variation using Illumina sequencing datasets of moderate coverage. However, a comprehensive evaluation of somatic variant detection algorithms on Ion Torrent targeted deep sequencing data has not been performed. Methods We have applied three somatic detection algorithms, Torrent Variant Caller, MuTect2 and VarScan2, on a large cohort of ovarian cancer patients comprising 208 paired tumour-blood samples and 253 tumour-only samples sequenced deeply on the Ion Torrent Proton platform across 330 amplicons. Subsequently, the concordance and performance of the three somatic variant callers were assessed. Results We have observed low concordance across the algorithms with only 0.5% of SNV and 0.02% of INDEL calls in common across all three methods. The intersection of all methods showed better performance when assessed using correlation with known mutational signatures, overlap with COSMIC variation and by examining the variant characteristics. The Torrent Variant Caller also performed well with the advantage of not eliminating a high number of variants that could lead to high type II error. Conclusions Our results suggest that caution should be taken when applying state-of-the-art somatic variant algorithms to Ion Torrent targeted deep sequencing data. Better quality control procedures and strategies that combine results from multiple methods should ensure that higher accuracy is achieved.
This is essential to ensure that results from bioinformatics pipelines using Ion Torrent deep sequencing can be robustly applied in cancer research and in the clinic.
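The concordance figure in the abstract above is the share of calls made by all three methods. The abstract does not state the denominator, so the sketch below assumes it is the union of all calls; the dictionary keys are just labels, and the variant tuples are made up rather than real caller output.

```python
def concordance(call_sets):
    """Fraction of all distinct calls that every caller reports.

    Assumption: the denominator is the union of calls across callers.
    """
    sets = [set(calls) for calls in call_sets.values()]
    if not sets:
        return 0.0
    shared = set.intersection(*sets)
    union = set.union(*sets)
    return len(shared) / len(union) if union else 0.0


# Illustrative (chromosome, position, ref, alt) calls per caller.
calls = {
    "caller_a": {("chr1", 10, "A", "G"), ("chr2", 20, "C", "T"), ("chr3", 30, "G", "A")},
    "caller_b": {("chr2", 20, "C", "T"), ("chr3", 30, "G", "A"), ("chr4", 40, "T", "C")},
    "caller_c": {("chr3", 30, "G", "A"), ("chr5", 50, "A", "T")},
}

print(f"{concordance(calls):.1%} of distinct calls are shared by all callers")
```

Keeping the intersection as its own call set is also the basis of the consensus strategy the abstract evaluates, at the cost of discarding variants that only one or two callers report.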


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Chuanyi Zhang ◽  
Mohammed El-Kebir ◽  
Idoia Ochoa

Abstract Intra-tumor heterogeneity renders the identification of somatic single-nucleotide variants (SNVs) a challenging problem. In particular, low-frequency SNVs are hard to distinguish from sequencing artifacts. While the increasing availability of multi-sample tumor DNA sequencing data holds the potential for more accurate variant calling, there is a lack of high-sensitivity multi-sample SNV callers that utilize these data. Here we report Moss, a method to identify low-frequency SNVs that recur in multiple sequencing samples from the same tumor. Moss provides any existing single-sample SNV caller the ability to support multiple samples with little additional time overhead. We demonstrate that Moss improves recall while maintaining high precision in a simulated dataset. On multi-sample hepatocellular carcinoma, acute myeloid leukemia and colorectal cancer datasets, Moss identifies new low-frequency variants that meet manual review criteria and are consistent with the tumor’s mutational signature profile. In addition, Moss detects the presence of variants in more samples of the same tumor than reported by the single-sample caller. Moss’ improved sensitivity in SNV calling will enable more detailed downstream analyses in cancer genomics.

