Detection of False-Positive Deletions from the Database of Genomic Variants

2019 ◽  
Vol 2019 ◽  
pp. 1-8
Author(s):  
Junbo Duan ◽  
Han Liu ◽  
Lanling Zhao ◽  
Xiguo Yuan ◽  
Yu-Ping Wang ◽  
...  

Next generation sequencing is an emerging technology that has been widely used in the detection of genomic variants. However, since its depth of coverage, a main signature used for variant calling, is affected greatly by biases such as GC content and mappability, some calls are false positives. In this study, we utilized paired-end read mapping, another signature that is not affected by the aforementioned biases, to detect false-positive deletions in the Database of Genomic Variants. We first identified 1923 suspicious variants that may be false positives and then conducted validation studies on each suspicious variant, which detected 583 false-positive deletions. Finally, we analysed the distribution of these false positives by chromosome, sample, and size. We hope that correcting these false positives in public repositories will help avoid incorrect documentation and annotations in downstream studies.
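The paired-end signature described in this abstract can be illustrated with a minimal sketch. It assumes that read pairs spanning a real deletion map with an apparent insert size well above the library average, so a reported deletion with too few such discordant pairs is treated as suspicious. The function name, the z-score cutoff, and the minimum support count are hypothetical choices for illustration, not the authors' validation procedure.

```python
from statistics import mean, stdev

def is_suspected_false_positive(pairs_near_deletion, library_inserts,
                                min_support=2, z=3.0):
    """Return True when a reported deletion lacks paired-end support.

    pairs_near_deletion: apparent insert sizes (bp) of read pairs mapped
        around the reported breakpoints (hypothetical input layout).
    library_inserts: insert sizes (bp) of concordant pairs sampled elsewhere,
        used as the library background.
    """
    mu = mean(library_inserts)
    sigma = stdev(library_inserts)
    # A pair counts as discordant if its insert size is unusually large.
    cutoff = mu + z * sigma
    discordant = [s for s in pairs_near_deletion if s > cutoff]
    # Too few discordant pairs: the deletion call is not supported.
    return len(discordant) < min_support
```

A call supported by several stretched pairs passes, while a call surrounded only by normally spaced pairs is flagged for closer validation.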

2021 ◽  
pp. 231-238
Author(s):  
Neha M. Jain ◽  
Alison Culley ◽  
Christine M. Micheel ◽  
Travis J. Osterman ◽  
Mia A. Levy

PURPOSE Tumor next-generation sequencing reports typically generate trial recommendations for patients based on their diagnosis and genomic profile. However, these require additional refinement and prescreening, which can add to physician burden. We wanted to use human prescreening efforts to efficiently refine these trial options and also elucidate the high-value parameters that have a major impact on efficient trial matching. METHODS Clinical trial recommendations were generated based on diagnosis and biomarker criteria using an informatics platform and were further refined by manual prescreening. The refined results were then compared with the initial trial recommendations, and the reasons for false-positive matches were evaluated. RESULTS Manual prescreening significantly reduced the number of false positives from the informatics-generated trial recommendations, as expected. We found that trial-specific criteria, especially recruiting status for individual trial arms, were a high-value parameter and led to the largest number of automated false-positive matches. CONCLUSION Reflex clinical trial matching approaches that refine trial recommendations based on the clinical details as well as trial-specific criteria have the potential to help alleviate physician burden for selecting the most appropriate trial for their patient. Investing in publicly available resources that capture the recruiting status of a trial at the cohort or arm level would, therefore, allow us to make meaningful contributions to increasing clinical trial enrollment by eliminating false positives.
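The role of arm-level recruiting status can be sketched as a simple filter. This is a hypothetical illustration only: the `TrialArm` record, its fields, the trial identifiers, and the exact-match rule are assumptions and do not describe the informatics platform used in the study.

```python
from dataclasses import dataclass

@dataclass
class TrialArm:
    trial_id: str      # hypothetical identifier
    arm: str           # cohort or arm label
    diagnosis: str
    biomarker: str
    recruiting: bool   # recruiting status of this arm, not of the whole trial

def match_arms(arms, diagnosis, biomarkers, require_recruiting=True):
    """Return arms matching diagnosis and biomarker, optionally open arms only."""
    matches = []
    for arm in arms:
        if arm.diagnosis != diagnosis or arm.biomarker not in biomarkers:
            continue
        # Dropping closed arms removes matches that look valid on paper only.
        if require_recruiting and not arm.recruiting:
            continue
        matches.append(arm)
    return matches

# Made-up example data for illustration.
arms = [
    TrialArm("TRIAL-A", "cohort 1", "NSCLC", "EGFR L858R", True),
    TrialArm("TRIAL-A", "cohort 2", "NSCLC", "KRAS G12C", False),
    TrialArm("TRIAL-B", "arm 1", "melanoma", "BRAF V600E", True),
]
```

With `require_recruiting=False` the closed cohort is returned as a match, which is the kind of automated false positive the abstract attributes to missing arm-level status.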


2021 ◽  
Author(s):  
Tao Jiang ◽  
Martin Buchkovich ◽  
Alison Motsinger-Reif

Abstract Background: Same-species contamination detection is an important quality control step in genetic data analysis. Due to a scarcity of methods to detect and correct for this quality control issue, same-species contamination is more difficult to detect than cross-species contamination. We introduce a novel machine learning algorithm to detect same-species contamination in next-generation sequencing (NGS) data using a support vector machine (SVM) model. Our approach uniquely detects contamination using variant calling information stored in variant call format (VCF) files for DNA or RNA. Importantly, it can differentiate between same-species contamination and mixtures of tumor and normal cells. In the first stage, a change-point detection method is used to identify copy number variations (CNVs) and copy number aberrations (CNAs) for filtering. Next, single nucleotide polymorphism (SNP) data is used to test for same-species contamination using an SVM model. Based on the assumption that alternative allele frequencies in NGS follow the beta-binomial distribution, the deviation parameter ρ is estimated by the maximum likelihood method. All features of a radial basis function (RBF) kernel SVM are generated using publicly available or private training data. Results: We demonstrate our approach in simulation experiments. The datasets combine, in silico, exome sequencing data of DNA from two lymphoblastoid cell lines (NA12878 and NA10855). We generate VCF files using variants identified in these data and then evaluate the power and false-positive rate of our approach. Our approach can detect contamination levels as low as 5% with a reasonable false-positive rate. Results in real data have sensitivity above 99.99% and specificity of 90.24%, even in the presence of degraded samples with features similar to those of contaminated samples. We provide an R software implementation of our approach. Conclusions: Our approach addresses the gap in methods to test for same-species contamination in NGS. Due to its high sensitivity for degraded samples and tumor-normal samples, it represents an important tool that can be applied within the quality control process. Additionally, the user-friendly software has the unique ability to conduct quality control using the VCF format.
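The maximum likelihood step for ρ can be sketched as follows. The abstract does not give the parameterization, so this example assumes a common one: a beta-binomial with mean allele fraction μ and overdispersion ρ, where α = μ(1 − ρ)/ρ and β = (1 − μ)(1 − ρ)/ρ. The grid search, the function names, and the fixed μ = 0.5 for heterozygous sites are illustrative assumptions, and the sketch is in Python rather than the authors' R implementation.

```python
from math import lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_loglik(alt_counts, depths, mu, rho):
    """Beta-binomial log-likelihood of alternative-allele counts.

    Assumed parameterization: alpha = mu*(1-rho)/rho, beta = (1-mu)*(1-rho)/rho.
    """
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    total = 0.0
    for k, n in zip(alt_counts, depths):
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        total += log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)
    return total

def estimate_rho(alt_counts, depths, mu=0.5, grid=None):
    """Grid-search maximum likelihood estimate of rho on (0, 0.5)."""
    if grid is None:
        grid = [i / 1000 for i in range(1, 500)]
    return max(grid, key=lambda rho: betabinom_loglik(alt_counts, depths, mu, rho))
```

Under these assumptions, allele counts that cluster tightly around half the depth give a small ρ, while widely scattered counts, as might be expected in a contaminated sample, give a larger one.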


GigaScience ◽  
2019 ◽  
Vol 8 (7) ◽  
Author(s):  
Timothy H Webster ◽  
Madeline Couse ◽  
Bruno M Grande ◽  
Eric Karlins ◽  
Tanya N Phung ◽  
...  

Abstract Background Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population genomic inference. Results Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that (1) facilitates the inference of sex chromosome complement from next-generation sequencing data; (2) corrects erroneous read mapping on the sex chromosomes; and (3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We find that sequence homology affects read mapping on the sex chromosomes and this has downstream effects on variant calling. However, we show that XYalign can correct mismapping, resulting in more accurate variant calling. We also show how metrics output by XYalign can be used to identify XX and XY individuals across diverse sequencing experiments, including low- and high-coverage whole-genome sequencing, and exome sequencing. Finally, we discuss how the flexibility of the XYalign framework can be leveraged for other uses including the identification of aneuploidy on the autosomes. XYalign is available open source under the GNU General Public License (version 3). Conclusions Sex chromosome sequence homology causes the mismapping of short reads, which in turn affects downstream analyses. XYalign provides a reproducible framework to correct mismapping and improve variant calling on the sex chromosomes.
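How sequencing depth can hint at sex chromosome complement is sketched below. This is not XYalign's interface or algorithm: the function name and the two cutoffs are hypothetical. The sketch only assumes that, relative to autosomal depth, an XX sample shows roughly full depth on X and almost none on Y, while an XY sample shows roughly half depth on each.

```python
def infer_sex_complement(autosome_depth, x_depth, y_depth,
                         y_present_cutoff=0.25, x_single_cutoff=0.75):
    """Guess XX or XY from mean depths normalized by autosomal depth.

    Cutoffs are illustrative assumptions and would need tuning per
    experiment (for example exome versus whole-genome data).
    """
    if autosome_depth <= 0:
        raise ValueError("autosome_depth must be positive")
    x_ratio = x_depth / autosome_depth
    y_ratio = y_depth / autosome_depth
    has_y = y_ratio >= y_present_cutoff
    single_x = x_ratio < x_single_cutoff
    if has_y and single_x:
        return "XY"
    if not has_y and not single_x:
        return "XX"
    # Mixed signals: possible aneuploidy, contamination, or mismapping.
    return "ambiguous"
```

Returning "ambiguous" rather than forcing a call leaves room for the cases the abstract highlights, where homology-driven mismapping distorts depth on the sex chromosomes.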


2020 ◽  
Vol 5 (4) ◽  
pp. 2473011420S0002
Author(s):  
Yoonjung Choi ◽  
Irvin Oh

Category: Other. Introduction/Purpose: Foot infections are often polymicrobial with diverse microbiomes. Accurate identification of the main pathogen in diabetic foot ulcer (DFU) remains challenging due to contamination or negative cultures, often leading to ineffective post-surgical antibiotic treatment. Application of molecular diagnostics, such as next generation sequencing (NGS), has been explored as an alternative to standard culture in orthopaedic infections. NGS is highly sensitive and detects an entire bacterial genome along with pharmacologic resistance genes in a given sample. We sought to investigate the potential use of NGS for accurate diagnosis and quantification of various species in infected DFU. We hypothesize that NGS will provide a more accurate means of diagnosing and profiling microorganisms in infected DFU compared to the standard culture method. Methods: We investigated 30 infected DFU patients who underwent surgical treatment by a single academic orthopaedic surgeon from October 2018 to September 2019. The average patient age was 60.4 years (range 33-82). Surgical procedures performed were irrigation and debridement (12), toe or ray amputation (13), calcanectomies (4), and below-knee amputation (1). Infected bone specimens were obtained intraoperatively and processed for standard culture and NGS. Quantitative PCR was performed to determine the bacterial burden present in the sample. DNA was amplified by PCR from a highly conserved region of the rRNA gene in the bacteria (16S rRNA). Once a high level of DNA was generated and determined, it was compared against the NIH GenBank database. Concordance between the standard culture and NGS was assessed. Results: In 28 of 29 patients, pathogens were identified by both NGS and culture, with complete consistency of organisms in 13 cases (concordance rate: 43.3%). NGS provided relative quantitative measures and the presence of antibiotic resistance genes for each pathogen. In NGS, Anaerococcus species (79.3%) was the most common organism, followed by Streptococcus species (44.8%), Prevotella species (44.8%), and Finegoldia magna (44.8%). In culture, S. aureus (58.6%) was the most common, followed by Streptococcus species (34.5%), coagulase-negative Staphylococci (24.1%), and Corynebacterium species (20.7%). On average, NGS revealed 5.1 pathogens (range 1-11), whereas standard culture revealed 2.6 pathogens (range 1-6) in a given sample. NGS identified 2 cases with false-positive standard culture and detected antibiotic-resistant organisms in 15 specimens. Conclusion: NGS is an emerging method of microbial identification in orthopaedic infection. It is particularly helpful in profiling diverse microbes in polymicrobial infected DFU. It can identify major pathogens and may correct false-positive or false-negative cultures. NGS may allow a faster initiation of postoperative targeted antibiotic therapy. [Table: see text]


BioTechniques ◽  
2020 ◽  
Vol 68 (1) ◽  
pp. 48-51 ◽  
Author(s):  
Christopher R McEvoy ◽  
Timothy Semple ◽  
Bhargavi Yellapu ◽  
David Y Choong ◽  
Huiling Xu ◽  
...  

Tumor DNA sequencing results can have important clinical implications. However, sequencing is often limited by low DNA input, owing to small tumor biopsy size. To help overcome this limitation we have developed a simple improvement to a commonly used next-generation sequencing (NGS) capture-based library preparation method using formalin-fixed paraffin-embedded-derived tumor DNA. By using on-bead PCR for pre-capture library generation we show that library yields are dramatically increased, resulting in decreased sample failure rates. Improved yields allowed for a reduction in PCR cycles, which translated into improved sequencing parameters without affecting variant calling. This methodology should be applicable to any NGS system in which input DNA is a limiting factor.


2019 ◽  
Vol 66 (1) ◽  
pp. 117-123 ◽  
Author(s):  
Stephen J Salipante ◽  
Keith R Jerome

Abstract BACKGROUND The PCR and its variant, quantitative PCR (qPCR), have revolutionized the practice of clinical microbiology. Continued advancements in PCR have led to a new derivative, digital PCR (dPCR), which promises to address certain limitations inherent to qPCR. CONTENT Here we highlight the important technical differences between qPCR and dPCR, and the potential advantages and disadvantages of each. We then review specific situations in which dPCR has been implemented in clinical microbiology and the results of such applications. Finally, we attempt to place dPCR in the context of other emerging technologies relevant to the clinical laboratory, including next-generation sequencing. SUMMARY dPCR offers certain clear advantages over traditional qPCR, but these are to some degree offset by limitations of the technology, at least as currently practiced. Laboratories considering implementation of dPCR should carefully weigh the potential advantages and disadvantages of this powerful technique for each specific application planned.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Qiandong Zeng ◽  
Natalia T. Leach ◽  
Zhaoqing Zhou ◽  
Hui Zhu ◽  
Jean A. Smith ◽  
...  

Abstract Next-generation sequencing (NGS) is widely used in genetic testing for the highly sensitive detection of single nucleotide changes and small insertions or deletions. However, detection and phasing of structural variants, especially in repetitive or homologous regions, can be problematic due to uneven read coverage or genome reference bias, resulting in false calls. To circumvent this challenge, a computational approach utilizing customized scaffolds as supplementary reference sequences for read alignment was developed, and its effectiveness demonstrated with two CBS gene variants: NM_000071.2:c.833T>C and NM_000071.2:c.[833T>C; 844_845ins68]. Variant c.833T>C is a known causative mutation for homocystinuria, but is not pathogenic when in cis with the insertion, c.844_845ins68, because of alternative splicing. Using simulated reads, the custom scaffolds method resolved all possible combinations with 100% accuracy and, based on > 60,000 clinical specimens, exceeded the performance of current approaches that only align reads to GRCh37/hg19 for the detection of c.833T>C alone or in cis with c.844_845ins68. Furthermore, analysis of two 1000 Genomes Project trios revealed that the c.[833T>C; 844_845ins68] complex variant had previously been undetected in these datasets, likely due to the alignment method used. This approach can be configured for existing workflows to detect other challenging and potentially underrepresented variants, thereby augmenting accurate variant calling in clinical NGS testing.


PLoS ONE ◽  
2014 ◽  
Vol 9 (3) ◽  
pp. e90581 ◽  
Author(s):  
Wan-Ping Lee ◽  
Michael P. Stromberg ◽  
Alistair Ward ◽  
Chip Stewart ◽  
Erik P. Garrison ◽  
...  
