Converting single nucleotide variants between genome builds: from cautionary tale to solution

Briefings in Bioinformatics ◽

10.1093/bib/bbab069 ◽

2021 ◽

Author(s):

Cathal Ormond ◽

Niamh M Ryan ◽

Aiden Corvin ◽

Elizabeth A Heron

Keyword(s):

Human Genome ◽

Simple Procedure ◽

Single Nucleotide Variants ◽

Simple Method ◽

Single Nucleotide ◽

Position Information ◽

Sequencing Studies ◽

Accurate Position ◽

Generation Sequencing ◽

Human Genome Reference

Abstract Next-generation sequencing studies are dependent on a high-quality reference genome for single nucleotide variant (SNV) calling. Although the two most recent builds of the human genome are widely used, position information is typically not directly comparable between them. Re-alignment gives the most accurate position information, but this procedure is often computationally expensive, and therefore, tools such as liftOver and CrossMap are used to convert data from one build to another. However, the positions of converted SNVs do not always match SNVs derived from aligned data, and in some instances, SNVs are known to change chromosome when converted. This is a significant problem when compiling sequencing resources or comparing results across studies. Here, we describe a novel algorithm to identify positions that are unstable when converting between human genome reference builds. These positions are detected independently of the conversion tools and are determined by the chain files, which provide a mapping of contiguous positions from one build to another. We also provide the list of unstable positions for converting between the two most commonly used builds, GRCh37 and GRCh38. Excluding SNVs at these positions prior to conversion results in SNVs that are stable to conversion. This simple procedure gives the same final list of stable SNVs as applying the algorithm and subsequently removing variants at unstable positions. This work highlights the care that must be taken when converting SNVs between genome builds and provides a simple method for ensuring higher-confidence converted data. Unstable positions and algorithm code are available at https://github.com/cathaloruaidh/genomeBuildConversion
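The pre-exclusion step described in this abstract can be illustrated with a minimal Python sketch. It assumes the unstable positions are supplied as BED-style intervals (chromosome, 0-based start, exclusive end) and that SNVs are given as (chromosome, 1-based position) pairs; both the input formats and the function names `load_unstable`, `is_unstable` and `filter_stable` are hypothetical and are not taken from the authors' repository.

```python
from bisect import bisect_right
from collections import defaultdict


def load_unstable(bed_lines):
    """Parse BED-style lines into sorted, merged intervals per chromosome.

    Assumption: each line is 'chrom start end' with 0-based, half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                # Overlapping or touching interval: extend the previous one.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def is_unstable(merged, chrom, pos):
    """Return True if the 1-based position falls inside an unstable interval."""
    intervals = merged.get(chrom, [])
    zero_based = pos - 1
    # Locate the last interval whose start is <= the query position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def filter_stable(snvs, merged):
    """Keep only SNVs that do not sit at an unstable position."""
    return [(chrom, pos) for chrom, pos in snvs if not is_unstable(merged, chrom, pos)]
```

The SNVs returned by `filter_stable` would then be passed to a conversion tool such as liftOver or CrossMap. Merging the intervals first keeps each lookup to a single binary search.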


Simultaneous identification of clinically relevant single nucleotide variants, copy number alterations and gene fusions in solid tumors by targeted next-generation sequencing

Oncotarget ◽

10.18632/oncotarget.25229 ◽

2018 ◽

Vol 9 (32) ◽

pp. 22749-22768 ◽

Cited By ~ 3

Author(s):

Duarte Mendes Oliveira ◽

Teresa Mirante ◽

Chiara Mignogna ◽

Marianna Scrima ◽

Simona Migliozzi ◽

...

Keyword(s):

Next Generation Sequencing ◽

Solid Tumors ◽

Copy Number ◽

Gene Fusions ◽

Copy Number Alterations ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Targeted Next Generation Sequencing ◽

Simultaneous Identification ◽

Generation Sequencing


A Model Study of In Silico Proficiency Testing for Clinical Next-Generation Sequencing

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2016-0194-cp ◽

2016 ◽

Vol 140 (10) ◽

pp. 1085-1091 ◽

Cited By ~ 21

Author(s):

Eric J. Duncavage ◽

Haley J. Abel ◽

Jason D. Merker ◽

John B. Bodner ◽

Qin Zhao ◽

...

Keyword(s):

Next Generation Sequencing ◽

Proficiency Testing ◽

In Silico ◽

Absolute Difference ◽

Ion Torrent ◽

Next Generation ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Clinical Laboratories ◽

Generation Sequencing

Context.—Most current proficiency testing challenges for next-generation sequencing assays are methods-based proficiency testing surveys that use DNA from characterized reference samples to test both the wet-bench and bioinformatics/dry-bench aspects of the tests. Methods-based proficiency testing surveys are limited by the number and types of mutations that either are naturally present or can be introduced into a single DNA sample. Objective.—To address these limitations by exploring a model of in silico proficiency testing in which sequence data from a single well-characterized specimen are manipulated electronically. Design.—DNA from the College of American Pathologists reference genome was enriched using the Illumina TruSeq and Life Technologies AmpliSeq panels and sequenced on the MiSeq and Ion Torrent platforms, respectively. The resulting data were mutagenized in silico and 26 variants, including single-nucleotide variants, deletions, and dinucleotide substitutions, were added at variant allele fractions (VAFs) from 10% to 50%. Participating clinical laboratories downloaded these files and analyzed them using their clinical bioinformatics pipelines. Results.—Laboratories using the AmpliSeq/Ion Torrent and/or the TruSeq/MiSeq participated in the 2 surveys. On average, laboratories identified 24.6 of 26 variants (95%) overall and 21.4 of 22 variants (97%) with VAFs greater than 15%. No false-positive calls were reported. The most frequently missed variants were single-nucleotide variants with VAFs less than 15%. Across both challenges, reported VAF concordance was excellent, with less than 1% median absolute difference between the simulated VAF and mean reported VAF. Conclusions.—The results indicate that in silico proficiency testing is a feasible approach for methods-based proficiency testing, and demonstrate that the sensitivity and specificity of current next-generation sequencing bioinformatics across clinical laboratories are high.
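The summary statistics reported in this abstract (the fraction of variants identified, and the median absolute difference between the simulated VAF and the mean reported VAF) can be reproduced in form with a short Python sketch. The function names and the data layout (dictionaries keyed by a variant identifier) are assumptions made for illustration; only the figures 24.6 of 26 and 21.4 of 22 come from the abstract.

```python
from statistics import mean, median


def detection_percent(found, total):
    """Percentage of variants identified, rounded to the nearest whole percent."""
    return round(100 * found / total)


def median_abs_vaf_difference(simulated, reported):
    """Median over variants of |simulated VAF - mean reported VAF|.

    simulated: dict mapping variant id -> simulated VAF (percent)
    reported:  dict mapping variant id -> list of VAFs reported by laboratories
    Variants with no reported values are skipped.
    """
    diffs = [
        abs(simulated[variant] - mean(reported[variant]))
        for variant in simulated
        if reported.get(variant)
    ]
    return median(diffs)


# Figures quoted in the abstract.
overall = detection_percent(24.6, 26)       # overall detection
above_15 = detection_percent(21.4, 22)      # variants with VAF > 15%
```

With the abstract's figures, `overall` evaluates to 95 and `above_15` to 97, matching the percentages quoted.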


Abstract LB-227: Guideline adherent clinical validation of a comprehensive DNA/RNA panel (523 genes-TruSight Oncology 500) for determination of single nucleotide variants (SNV’s), small insertions or deletions (Indels), copy number variations (CNV’s), splice variations (SV’s), gene fusions (GF’s), tumor mutation burden (TMB) and micro-satellite instability (MSI) on a next-generation sequencing (NGS) platform in a CLIA setting

10.1158/1538-7445.sabcs18-lb-227 ◽

2019 ◽

Author(s):

Ravindra Kolhe ◽

Pankaj Ahluwalia ◽

Saleh Heneidi ◽

Sudha Ananth ◽

Vamsi Kota ◽

...

Keyword(s):

Copy Number ◽

Copy Number Variations ◽

Clinical Validation ◽

Gene Fusions ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Tumor Mutation Burden ◽

Mutation Burden ◽

Generation Sequencing


Integrating Multiple Genomic Data to Predict Disease-Causing Nonsynonymous Single Nucleotide Variants in Exome Sequencing Studies

PLoS Genetics ◽

10.1371/journal.pgen.1004237 ◽

2014 ◽

Vol 10 (3) ◽

pp. e1004237 ◽

Cited By ~ 32

Author(s):

Jiaxin Wu ◽

Yanda Li ◽

Rui Jiang

Keyword(s):

Exome Sequencing ◽

Genomic Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Sequencing Studies


PileLine: a toolbox to handle genome position information in next-generation sequencing studies

BMC Bioinformatics ◽

10.1186/1471-2105-12-31 ◽

2011 ◽

Vol 12 (1) ◽

Cited By ~ 7

Author(s):

Daniel Glez-Peña ◽

Gonzalo Gómez-López ◽

Miguel Reboiro-Jato ◽

Florentino Fdez-Riverola ◽

David G Pisano

Keyword(s):

Next Generation Sequencing ◽

Next Generation ◽

Position Information ◽

Sequencing Studies ◽

Generation Sequencing


Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data

Scientific Reports ◽

10.1038/srep14955 ◽

2015 ◽

Vol 5 (1) ◽

Cited By ~ 8

Author(s):

Mengmeng Wu ◽

Jiaxin Wu ◽

Ting Chen ◽

Rui Jiang

Keyword(s):

Exome Sequencing ◽

Genomic Data ◽

Integrative Learning ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Sequencing Studies


Prostate cancer heterogeneity assessment with multi-regional sampling and alignment-free methods

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa062 ◽

2020 ◽

Vol 2 (3) ◽

Author(s):

Ross G Murphy ◽

Aideen C Roddy ◽

Shambhavi Srivastava ◽

Esther Baena ◽

David J Waugh ◽

...

Keyword(s):

Next Generation Sequencing ◽

Next Generation ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Cancer Heterogeneity ◽

Alignment Free ◽

Treatment Indications ◽

Patient Heterogeneity ◽

Genomic Locations ◽

Generation Sequencing

Abstract Combining alignment-free methods for phylogenetic analysis with multi-regional sampling using next-generation sequencing can provide an assessment of intra-patient tumour heterogeneity. From the divergent branching seen in multi-regional sampling, we validated two different lesions within a patient’s prostate. Where multi-regional sampling has not been used, a single sample from one of these areas could mislead the choice of drugs or therapies that would best benefit this patient, because these tumours appear to be genetically different. This application has the power to render, in a fraction of the time used by other approaches, intra-patient heterogeneity and decipher aberrant biomarkers. Another alignment-free method for calling single-nucleotide variants from raw next-generation sequencing samples has determined possible variants and genomic locations that may be able to characterize the differences between the two main branching patterns. Alignment-free approaches have been applied to relevant clinical multi-regional samples and may be considered as a valuable option for comparing and determining heterogeneity to help deliver personalized medicine through more robust efforts in identifying targetable pathways and therapeutic strategies. Our study highlights the application these tools could have on patient-aligned treatment indications.


SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors

Bioinformatics ◽

10.1093/bioinformatics/btq040 ◽

2010 ◽

Vol 26 (6) ◽

pp. 730-736 ◽

Cited By ~ 157

Author(s):

Rodrigo Goya ◽

Mark G.F. Sun ◽

Ryan D. Morin ◽

Gillian Leung ◽

Gavin Ha ◽

...

Keyword(s):

Next Generation Sequencing ◽

Next Generation ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Generation Sequencing


SNVHMM: predicting single nucleotide variants from next generation sequencing

BMC Bioinformatics ◽

10.1186/1471-2105-14-225 ◽

2013 ◽

Vol 14 (1) ◽

pp. 225 ◽

Cited By ~ 3

Author(s):

Jiawen Bian ◽

Chenglin Liu ◽

Hongyan Wang ◽

Jing Xing ◽

Priyanka Kachroo ◽

...

Keyword(s):

Next Generation Sequencing ◽

Next Generation ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Generation Sequencing


UMI-Gen: a UMI-based reads simulator for variant calling evaluation in paired-end sequencing NGS libraries

10.1101/2020.04.22.027532 ◽

2020 ◽

Author(s):

Vincent Sater ◽

Pierre-Julien Viailly ◽

Thierry Lecroq ◽

Philippe Ruminy ◽

Caroline Bérard ◽

...

Keyword(s):

Variant Calling ◽

Copy Number Variations ◽

Biological Data ◽

Single Nucleotide Variants ◽

Background Error ◽

Single Nucleotide ◽

Low Frequencies ◽

Paired End Sequencing ◽

Very High ◽

Generation Sequencing

Abstract Motivation: With Next Generation Sequencing becoming more affordable every year, NGS technologies have asserted themselves as the fastest and most reliable way to detect Single Nucleotide Variants (SNV) and Copy Number Variations (CNV) in cancer patients. These technologies can be used to sequence DNA at very high depths, allowing the detection of abnormalities present at very low frequencies in tumor cells. Many variant callers are publicly available and usually call variants well. However, when frequencies drop below 1%, the specificity of these tools suffers greatly, as true variants at very low frequencies are easily confused with sequencing or PCR artifacts. The recent use of Unique Molecular Identifiers (UMI) in NGS experiments offers a way to accurately separate true variants from artifacts. UMI-based variant callers are slowly replacing raw-read-based variant callers as the standard method for accurate detection of variants at very low frequencies. However, the benchmarking in the tools' publications is usually performed on real biological data in which the true variants are not known, making it difficult to assess their accuracy. Results: We present UMI-Gen, a UMI-based reads simulator for targeted sequencing paired-end data. UMI-Gen generates reference reads covering the targeted regions at a user-customizable depth. After that, using a number of control files, it estimates the background error rate at each position and then modifies the generated reads to mimic real biological data. Finally, it inserts real variants in the reads from a list provided by the user. Availability: The entire pipeline is available at https://gitlab.com/vincent-sater/umigen-master under the MIT license.
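One simple reading of "estimates the background error rate at each position" is the pooled fraction of non-reference bases observed across the control samples. The Python sketch below illustrates that reading only; it is not taken from the UMI-Gen code, and the function name `background_error_rates` and the pileup layout (one dictionary of base counts per position per control) are hypothetical.

```python
def background_error_rates(reference, control_pileups):
    """Estimate a per-position background error rate from control samples.

    reference:       string of reference bases for the targeted region
    control_pileups: list of controls; each control is a list with one
                     dict per position mapping base -> observed read count
    Returns a list of floats, one per position: non-reference reads divided
    by total depth, pooled over all controls (0.0 where there is no coverage).
    """
    rates = []
    for i, ref_base in enumerate(reference):
        depth = 0
        errors = 0
        for pileup in control_pileups:
            counts = pileup[i]
            total = sum(counts.values())
            depth += total
            # Every read that does not carry the reference base counts as an error.
            errors += total - counts.get(ref_base, 0)
        rates.append(errors / depth if depth else 0.0)
    return rates
```

A simulator could then use these rates as per-position probabilities when introducing errors into generated reads; how UMI-Gen itself does this is not stated in the abstract.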
