A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference

2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Adam Cornish ◽  
Chittibabu Guda

High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves.
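The sensitivity and PPV figures quoted above come from comparing a pipeline's calls with a truth set. A minimal sketch of that comparison follows, assuming variants are matched exactly on chromosome, position, reference allele, and alternate allele. The `Variant` type, the function name, and the toy data are hypothetical and are not taken from the study.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this illustration."""
    chrom: str
    pos: int
    ref: str
    alt: str


def sensitivity_and_ppv(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Compare a call set with a truth set by exact match on all four fields."""
    tp = len(calls & truth)   # called and present in the truth set
    fn = len(truth - calls)   # in the truth set but missed by the pipeline
    fp = len(calls - truth)   # called but absent from the truth set
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy data (invented for illustration only).
truth = {
    Variant("1", 100, "A", "G"),
    Variant("1", 250, "C", "T"),
    Variant("2", 75, "G", "GA"),
    Variant("2", 300, "T", "C"),
}
calls = {
    Variant("1", 100, "A", "G"),
    Variant("1", 250, "C", "T"),
    Variant("2", 90, "T", "C"),
}

sens, ppv = sensitivity_and_ppv(calls, truth)
print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")
```

Exact matching is a simplification: indels in particular can be represented in several equivalent ways, so a real comparison would normalize variant representation first.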

2019 ◽  
Author(s):  
Elena Nabieva ◽  
Satyarth Mishra Sharma ◽  
Yermek Kapushev ◽  
Sofya K. Garushyants ◽  
Anna V. Fedotova ◽  
...  

Abstract: High-throughput sequencing of fetal DNA is a promising and increasingly common method for the discovery of all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in a mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele in a heterozygous site is covered, on average, by 50% of the reads, and therefore can lead to erroneous genotype calls. We present a panel of methods for reducing the genotyping error in the presence of MCC. All methods start with the output of GATK HaplotypeCaller on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on information about the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first of these methods uses a Bayesian probabilistic model to correct the fetal genotype calls produced by MCC-unaware HaplotypeCaller. The other two methods “learn” the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. Using the test sets, we show that all three methods lead to substantially improved accuracy when compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies. Code and training data availability: https://github.com/bazykinlab/ML-maternal-cell-contamination
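To see why contamination breaks the 50% assumption, the sketch below computes the expected alternate-allele read fraction for a mixed sample and uses it in a simple binomial posterior over the fetal genotype. This is an illustration only and not the authors' model: the uniform prior, the `error_rate` floor, and all function names are assumptions introduced here, and the paternal genotype is ignored for brevity.

```python
from math import comb

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried by the fetus


def expected_alt_fraction(fetal_gt: int, maternal_gt: int, mcc: float) -> float:
    """Expected share of alt reads when a fraction `mcc` of the DNA is maternal."""
    return (1 - mcc) * fetal_gt / 2 + mcc * maternal_gt / 2


def fetal_genotype_posterior(alt_reads, total_reads, maternal_gt, mcc,
                             error_rate=0.01, prior=None):
    """Posterior over fetal genotypes under a binomial read-count likelihood.

    Assumptions (not from the paper): uniform prior unless one is supplied,
    and a sequencing-error floor that keeps probabilities away from 0 and 1.
    """
    if prior is None:
        prior = {g: 1 / len(GENOTYPES) for g in GENOTYPES}
    unnormalized = {}
    for g in GENOTYPES:
        p = expected_alt_fraction(g, maternal_gt, mcc)
        p = min(max(p, error_rate), 1 - error_rate)
        likelihood = (comb(total_reads, alt_reads)
                      * p ** alt_reads
                      * (1 - p) ** (total_reads - alt_reads))
        unnormalized[g] = prior[g] * likelihood
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


# Toy site: 12 alt reads out of 40, heterozygous mother, 40% contamination.
aware = fetal_genotype_posterior(12, 40, maternal_gt=1, mcc=0.4)
naive = fetal_genotype_posterior(12, 40, maternal_gt=1, mcc=0.0)
print("MCC-aware call:", max(aware, key=aware.get))
print("MCC-unaware call:", max(naive, key=naive.get))
```

In this toy case the contamination-aware posterior favours a homozygous-reference fetus, because a heterozygous mother alone contributes an expected 20% of alt reads, while ignoring contamination favours a heterozygous call.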


2021 ◽  
Vol 10 ◽  
Author(s):  
Yansheng Xu ◽  
Xin Ma ◽  
Xing Ai ◽  
Jiangping Gao ◽  
Yiming Liang ◽  
...  

Background: Conventional clinical detection methods such as CT, urine cytology, and ureteroscopy display low sensitivity and/or are invasive in the diagnosis of upper tract urinary carcinoma (UTUC), a factor precluding their use. Previous studies on urine biopsy have not shown satisfactory sensitivity and specificity with either gene mutation or gene methylation panels. These limitations create an urgent need for a sensitive and non-invasive method for the diagnosis of UTUC. Methods: In this study, a total of 161 hematuria patients were enrolled with (n = 69) or without (n = 92) UTUC. High-throughput sequencing of 17 genes and methylation analysis for ONECUT2 CpG sites were combined as a liquid biopsy test panel. Further, a logistic regression prediction model that contained several significant features was used to evaluate the risk of UTUC in these patients. Results: In total, 86 UTUC− and 64 UTUC+ case samples were enrolled for the analysis. A logistic regression analysis of significant features including age, the mutation status of the TERT promoter, and ONECUT2 methylation level resulted in an optimal model with a sensitivity of 94.0%, a specificity of 93.1%, a positive predictive value of 92.2%, and a negative predictive value of 94.7%. Notably, the area under the curve (AUC) was 0.957 in the training dataset while internal validation produced an AUC of 0.962. It is worth noting that during follow-up, a patient diagnosed with ureteral inflammation at the time of diagnosis, who had both positive mutation and methylation test results, was diagnosed with ureteral carcinoma 17 months after his enrollment. Conclusion: This work utilized the epigenetic biomarker ONECUT2 for the first time in the detection of UTUC and discovered its superior performance. To improve its sensitivity, we combined the biomarker with a high-throughput sequencing test of 17 genes. It was found that the selected logistic regression model can evaluate the upper tract urinary carcinoma risk of patients with hematuria and outperform other existing panels in providing clinical recommendations for the diagnosis of UTUC. Moreover, its high negative predictive value is conducive to ruling out UTUC in patients without the disease.


2019 ◽  
Author(s):  
XM Shao ◽  
R Bhattacharya ◽  
J Huang ◽  
IKA Sivakumar ◽  
C Tokheim ◽  
...  

Abstract: Computational prediction of binding between neoantigen peptides and major histocompatibility complex (MHC) proteins is an emerging biomarker for predicting patient response to cancer immunotherapy. Current neoantigen predictors focus on in silico estimation of MHC binding affinity and are limited by low positive predictive value for actual peptide presentation, inadequate support for rare MHC alleles and poor scalability to high-throughput data sets. To address these limitations, we developed MHCnuggets, a deep neural network method to predict peptide-MHC binding. MHCnuggets is the only method to handle binding prediction for common or rare alleles of MHC Class I or II, with a single neural network architecture. Using a long short-term memory network (LSTM), MHCnuggets accepts peptides of variable length and is capable of faster performance than other methods. When compared to methods that integrate binding affinity and HLAp data from mass spectrometry, MHCnuggets yields a fourfold increase in positive predictive value on independent MHC-bound peptide (HLAp) data. We applied MHCnuggets to 26 cancer types in TCGA, processing 26.3 million allele-peptide comparisons in under 2.3 hours, yielding 101,326 unique candidate immunogenic missense mutations (IMMs). Predicted-IMM hotspots occurred in 38 genes, including 24 driver genes. Predicted-IMM load was significantly associated with increased immune cell infiltration (p<2e−16) including CD8+ T cells. Notably, only 0.16% of predicted immunogenic missense mutations were observed in >2 patients, with 61.7% of these derived from driver mutations. Our results provide a new method for neoantigen prediction with high performance characteristics and demonstrate its utility in large data sets across human cancers.
Synopsis: We developed a new in silico predictor of Major Histocompatibility Complex (MHC) ligand binding and demonstrated its utility to assess potential neoantigens and immunogenic missense mutations (IMMs) in 6613 TCGA patients.


Arrhythmia, or irregular heartbeat, has a wide range of clinical manifestations, from benign arrhythmia that needs no medication to life-threatening conditions. It can occur permanently or intermittently. Intermittent arrhythmia requires diagnostic tools that can record the electrocardiogram continuously. This research sought to analyse the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of an arrhythmia monitoring device based on neural-network artificial intelligence. The pivotal clinical trial involved a total of 103 people (healthy subjects and stable arrhythmia patients). The research used a diagnostic test design, comparing the electrocardiogram (ECG) from the prototype with a standard ECG for diagnosing arrhythmia. The Arrhythmia Monitoring System that we developed has three hardware components: a smartphone, a server for arrhythmia detection, and a patchable ECG recorder. All three components are connected with internet of things (IoT) technology. The architecture of the arrhythmia monitoring software includes ECG signal pre-processing, beat detection, feature extraction for detecting VT/VF, and classification for detecting VT/VF. Extracted features include heart rate variability (HRV) and T wave alternans. We compared the ECG of the prototype arrhythmia monitoring device with standard Holter monitoring. We enrolled 103 patients. There was no significant difference in heart rate between the prototype device and standard Holter monitoring (87.26 ± 11.2 vs 86.07 ± 9.15, P = 0.43). There was a significant difference in maximum and minimum heart rate between the prototype device and standard Holter monitoring (121.3 ± 31.7 vs 131.0 ± 10.8, p = 0.000, and 65.1 ± 13.5 vs 73.07 ± 10.02, p = 0.000). The device has low sensitivity, 80% (95% CI 75%–82%), and high specificity, 91.8% (95% CI 85%–92%), for detecting abnormal ECG. The PPV was 63.2% (95% CI 58.8%–67.52%) and the NPV was 96.3% (95% CI 94.7%–98.3%). The device demonstrates an ability to detect PVC and PAC: sensitivity was 71.4% (95% CI 66.4%–76.4%) and 75% (95% CI 72%–78%), and specificity was 97.8% (95% CI 95.8%–99.8%) and 91.7% (95% CI 83.4%–99.7%), respectively. The PPV of the device for detecting PVC and PAC was 71.4% (95% CI 66.4%–76.4%) and 72.7% (95% CI 68.7%–76.7%), respectively. The NPV of the device for detecting PVC and PAC was 97.8% (95% CI 95.8%–99.85) and 98.9% (95% CI 98.1%–99.7%), respectively. This study found the device to be a valuable diagnostic tool that has relatively low sensitivity but high specificity for diagnosing abnormal ECG, PVC and PAC. According to the results of our study, we found the device to be a valuable diagnostic tool that has relatively high sensitivity and specificity for diagnosing abnormal ECG, PVC and PAC.


2017 ◽  
Author(s):  
Darrell O. Ricke ◽  
Anna Shcherbina ◽  
Adam Michaleas ◽  
Philip Fremont-Smith

Abstract: High throughput DNA sequencing technologies enable improved characterization of forensic DNA samples, enabling greater insights into DNA contributor(s). Current DNA forensics techniques rely upon allele sizing of short tandem repeats by capillary electrophoresis. High throughput sequencing enables forensic sample characterizations for large numbers of single nucleotide polymorphism loci. The slowest computational component of the DNA forensics analysis pipeline is the characterization of raw sequence data. This paper optimizes the SNP calling module of the DNA analysis pipeline with runtime results that scale linearly with the number of HTS sequences (patent pending)[1]. GrigoraSNPs can analyze 100 million reads in less than 5 minutes using 3 threads on a 4.0 GHz Intel i7-6700K laptop CPU.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gwenna Breton ◽  
Anna C. V. Johansson ◽  
Per Sjödin ◽  
Carina M. Schlebusch ◽  
Mattias Jakobsson

Abstract Background: Population genetic studies of humans make increasing use of high-throughput sequencing in order to capture diversity in an unbiased way. Sequencing technologies and bioinformatic tools are abundant, and the number of available genomes is increasing. Studies have evaluated and compared some of these technologies and tools, such as the Genome Analysis Toolkit (GATK) and its “Best Practices” bioinformatic pipelines. However, studies often focus on a few genomes of Eurasian origin in order to detect technical issues. We instead surveyed the use of the GATK tools and established a pipeline for processing high coverage full genomes from a diverse set of populations, including Sub-Saharan African groups, in order to reveal challenges from human diversity and stratification. Results: We surveyed 29 studies using high-throughput sequencing data, and compared their strategies for data pre-processing and variant calling. We found that processing of data is very variable across studies and that the GATK “Best Practices” are seldom followed strictly. We then compared three versions of a GATK pipeline, differing in the inclusion of an indel realignment step and with a modification of the base quality score recalibration step. We applied the pipelines to a diverse set of 28 individuals. We compared the pipelines in terms of count of called variants and overlap of the callsets. We found that the pipelines resulted in similar callsets, in particular after callset filtering. We also ran one of the pipelines on a larger dataset of 179 individuals. We noted that including more individuals at the joint genotyping step resulted in different counts of variants. At the individual level, we observed that the average genome coverage was correlated with the number of variants called. Conclusions: We conclude that applying the GATK “Best Practices” pipeline, including their recommended reference datasets, to underrepresented populations does not lead to a decrease in the number of called variants compared to alternative pipelines. We recommend aiming for coverage of > 30X if identifying most variants is important, and working with large sample sizes at the variant calling stage, including for underrepresented individuals and populations.


2020 ◽  
Vol 5 (3) ◽  
pp. 1196-1200
Author(s):  
Manish Raj Pathak ◽  
Mahesh Gautam ◽  
Rashmita Bhandari

Introduction: Breast carcinoma is the second leading cause of cancer-related mortality in females around the world. Ultrasound plays a key role in differentiating cystic and solid lesions and is a convenient and non-invasive diagnostic tool to differentiate between benign and malignant lesions. Objectives: The aim of this study is to evaluate the diagnostic accuracy of ultrasound in palpable breast lesions. Methodology: A prospective cross-sectional study using ultrasound was carried out in patients with palpable breast lesions who presented to the Department of Radiodiagnosis and Imaging of Nobel Medical College over a period of one year, from February 2019 to January 2020. A total of 60 patients were evaluated in the study. Sensitivity, specificity, positive predictive value, negative predictive value and accuracy were calculated. Results: Out of 60 patients evaluated, ultrasound showed 46 (76.7%) cases to be benign and 14 (23.3%) cases to be malignant. FNAC revealed benign disease in 47 (78.3%) patients and malignant disease in 13 (21.7%) patients. The most common benign lesion was fibroadenoma. We found nearly 91.7% of the malignant lesions had spiculated margins and microcalcification. The sensitivity of ultrasound was 95.74% and specificity 92.3% with diagnostic accuracy 95%. Conclusion: Ultrasound is a convenient and non-invasive diagnostic tool with good sensitivity, specificity, positive predictive value, negative predictive value and accuracy in palpable breast lesions.
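The reported metrics all derive from a 2×2 table of ultrasound result against FNAC result. The sketch below shows that arithmetic. The four cell counts are an assumption: the abstract does not give them, and they are reconstructed here as one table consistent with the stated marginal totals (46/14 by ultrasound, 47/13 by FNAC) and percentages. Under this reconstruction the reported sensitivity and specificity are reproduced when the benign category is treated as the positive class.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),   # positives correctly identified
        "specificity": tn / (tn + fp),   # negatives correctly identified
        "ppv": tp / (tp + fp),           # positive calls that are correct
        "npv": tn / (tn + fn),           # negative calls that are correct
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Assumed (reconstructed) counts, with FNAC as reference and benign as "positive":
#   tp: benign on both tests            fn: FNAC benign, ultrasound malignant
#   fp: FNAC malignant, ultrasound benign   tn: malignant on both tests
metrics = diagnostic_metrics(tp=45, fn=2, fp=1, tn=12)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

If malignancy is taken as the positive class instead, the same table swaps the two values, giving a sensitivity of 12/13 and a specificity of 45/47.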


2013 ◽  
Vol 7 (Suppl 6) ◽  
pp. S8 ◽  
Author(s):  
Takahiro Mimori ◽  
Naoki Nariai ◽  
Kaname Kojima ◽  
Mamoru Takahashi ◽  
Akira Ono ◽  
...  

2020 ◽  
Author(s):  
Yansheng Xu ◽  
Xin Ma ◽  
Xing Ai ◽  
Jiangping Gao ◽  
Yiming Liang ◽  
...  

Abstract Background: Conventional clinical detection methods such as CT, urine cytology, and ureteroscopy display low sensitivity and/or are invasive in the diagnosis of upper tract urinary carcinoma (UTUC), a factor precluding their use. Previous studies on urine biopsy have not shown satisfactory sensitivity and specificity with either gene mutation or gene methylation panels. These limitations create an urgent need for a sensitive and non-invasive method for the diagnosis of UTUC. Methods: In this study, a total of 161 hematuria patients were enrolled with (n=69) or without (n=92) UTUC. High-throughput sequencing of 17 genes and methylation analysis for ONECUT2 CpG sites were combined as a liquid biopsy test panel. Further, a logistic regression prediction model that contained several significant features was used to evaluate the risk of UTUC in these patients. Results: In total, 86 UTUC- and 64 UTUC+ case samples were enrolled for the analysis. A logistic regression analysis of significant features including age, the mutation status of the TERT promoter and ONECUT2 methylation level resulted in an optimal model with a sensitivity of 94.0%, a specificity of 93.1%, a positive predictive value of 92.2% and a negative predictive value of 94.7%. Notably, the area under the curve (AUC) was 0.957 in the training dataset while internal validation produced an AUC of 0.962. It is worth noting that during follow-up, a patient diagnosed with ureteral inflammation at the time of diagnosis, who had both positive mutation and methylation test results, was diagnosed with ureteral carcinoma 17 months after his enrollment. Conclusion: This work utilized the epigenetic biomarker ONECUT2 for the first time in the detection of UTUC and discovered its superior performance. To improve its sensitivity, we combined the biomarker with a high-throughput sequencing test of 17 genes. It was found that the selected logistic regression model can evaluate the upper tract urinary carcinoma risk of patients with hematuria and outperform other existing panels in providing clinical recommendations for the diagnosis of UTUC. Moreover, its high negative predictive value is conducive to ruling out UTUC in patients without the disease.


2018 ◽  
Author(s):  
Simon P Sadedin ◽  
Alicia Oshlack

Abstract Background: As costs of high throughput sequencing have fallen, we are seeing vast quantities of short read genomic data being generated. Often, the data is exchanged and stored as aligned reads, which provides high compression and convenient access for many analyses. However, aligned data becomes outdated as new reference genomes and alignment methods become available. Moreover, some applications cannot utilise pre-aligned reads at all, necessitating conversion back to raw format (FASTQ) before they can be used. In both cases, the process of extraction and realignment is expensive and time consuming. Findings: We describe Bazam, a tool that efficiently extracts the original paired FASTQ from reads stored in aligned form (BAM or CRAM format). Bazam extracts reads in a format that directly allows realignment with popular aligners with high concurrency. Through eliminating steps and increasing the accessible concurrency, Bazam facilitates up to a 90% reduction in the time required for realignment compared to standard methods. Bazam can support selective extraction of read pairs from focused genomic regions, further increasing efficiency for targeted analyses. Bazam is additionally suitable as a base for other applications that require efficient paired read information, such as quality control, structural variant calling and alignment comparison. Conclusions: Bazam offers significant improvements for users needing to realign genomic data.

