False Alarms in Consumer Genomics Add to Public Fear and Potential Health Care Burden

Xiaoming Liu; Deborah Cragun; Jinyong Pang; Swamy R. Adapa; Renee Fonseca; Rays H. Y. Jiang

doi:10.3390/jpm10040187

Journal of Personalized Medicine ◽

10.3390/jpm10040187 ◽

2020 ◽

Vol 10 (4) ◽

pp. 187

Author(s):

Xiaoming Liu ◽

Deborah Cragun ◽

Jinyong Pang ◽

Swamy R. Adapa ◽

Renee Fonseca ◽

...

Keyword(s):

Genetic Diseases ◽

False Alarms ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Potential Health ◽

Single Nucleotide ◽

Direct To Consumer ◽

Success Stories ◽

Health Care Burden ◽

Genotyping Microarray

We have entered an era of direct-to-consumer (DTC) genomics. Patients have relayed many success stories in which DTC genomics identified causal mutations of genetic diseases before any symptoms appeared, allowing them to take precautions. However, consumers may also take unnecessary medical actions based on false alarms of “pathogenic alleles”. The severity of this problem is not well known. Using publicly available data, we compared DTC microarray genotyping data with deep-sequencing data of 5 individuals and manually checked each inconsistently reported single nucleotide variant (SNV). We estimated that, on average, a person would have ~5 “pathogenic” alleles reported due to wrongly reported genotypes if using a 23andMe genotyping microarray. We also found that the number of wrongly classified “pathogenic” alleles per person is at least as large as the number due to wrongly reported genotypes. We show that the scale of the false alarm problem could be large enough that the medical costs will become a burden to public health.
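The comparison step described in this abstract (microarray genotypes checked against deep-sequencing genotypes) can be illustrated with a minimal sketch. The site IDs, the two-letter genotype strings, and the `--` no-call marker below are assumptions chosen for illustration; this is not the authors' pipeline, and in the study each discordant site was additionally reviewed by hand.

```python
def normalize(genotype):
    """Sort the two alleles so that 'AG' and 'GA' compare equal."""
    return "".join(sorted(genotype.upper()))


def discordant_sites(array_calls, sequencing_calls):
    """Return the site IDs where both platforms made a call and the calls differ."""
    discordant = []
    for site, array_gt in array_calls.items():
        seq_gt = sequencing_calls.get(site)
        # Skip sites missing from the sequencing data or not called on the array.
        if seq_gt is None or "-" in array_gt:
            continue
        if normalize(array_gt) != normalize(seq_gt):
            discordant.append(site)
    return sorted(discordant)


# Toy data with hypothetical site IDs (not taken from the paper).
array_calls = {"rs0001": "AG", "rs0002": "CC", "rs0003": "--", "rs0004": "TT"}
sequencing_calls = {"rs0001": "GA", "rs0002": "CT", "rs0003": "AA", "rs0004": "TT"}

mismatches = discordant_sites(array_calls, sequencing_calls)
print(mismatches)
```

Only the sites returned by `discordant_sites` would then need manual inspection, which keeps the hand-checking workload proportional to the number of disagreements and not to the size of the array.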


A gain-of-function single nucleotide variant creates a new promoter which acts as an orientation-dependent enhancer-blocker

Nature Communications ◽

10.1038/s41467-021-23980-6 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Yavor K. Bozhilov ◽

Damien J. Downes ◽

Jelena Telenius ◽

A. Marieke Oudelaar ◽

Emmanuel N. Olivier ◽

...

Keyword(s):

Gene Expression ◽

Genetic Diseases ◽

Regulatory Elements ◽

Base Change ◽

Dependent Manner ◽

Globin Genes ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Super Enhancer ◽

Single Base Change

Abstract: Many single nucleotide variants (SNVs) associated with human traits and genetic diseases are thought to alter the activity of existing regulatory elements. Some SNVs may also create entirely new regulatory elements which change gene expression, but the mechanism by which they do so is largely unknown. Here we show that a single base change in an otherwise unremarkable region of the human α-globin cluster creates an entirely new promoter and an associated unidirectional transcript. This SNV downregulates α-globin expression, causing α-thalassaemia. Of note, the new promoter lying between the α-globin genes and their associated super-enhancer disrupts their interaction in an orientation-dependent manner. Together these observations show how both the order and orientation of the fundamental elements of the genome determine patterns of gene expression and support the concept that active genes may act to disrupt enhancer-promoter interactions in mammals as in Drosophila. Finally, these findings should prompt others to fully evaluate SNVs lying outside of known regulatory elements as causing changes in gene expression by creating new regulatory elements.


Implications of Genetic Distance to Reference and De Novo Genome Assembly for Clinical Genomics in Africans

10.1101/2020.09.25.20201780 ◽

2020 ◽

Author(s):

Daniel Shriner ◽

Adebowale Adeyemo ◽

Charles Rotimi

Keyword(s):

Genetic Distance ◽

De Novo ◽

Reference Sequence ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

De Novo Genome Assembly ◽

Single Nucleotide ◽

Clinical Genomics ◽

Advantages And Disadvantages ◽

False Discovery

In clinical genomics, variant calling from short-read sequencing data typically relies on a pan-genomic, universal human reference sequence. A major limitation of this approach is that the number of reads that incorrectly map or fail to map increases as the reads diverge from the reference sequence. In the context of genome sequencing of genetically diverse Africans, we investigate the advantages and disadvantages of using a de novo assembly of the read data as the reference sequence in single sample calling. Conditional on sufficient read depth, the alignment-based and assembly-based approaches yielded comparable sensitivity and false discovery rates for single nucleotide variants when benchmarked against a gold standard call set. The alignment-based approach yielded coverage of an additional 270.8 Mb over which sensitivity was lower and the false discovery rate was higher. Although both approaches detected and missed clinically relevant variants, the assembly-based approach identified more such variants than the alignment-based approach. Of particular relevance to individuals of African descent, the assembly-based approach identified four heterozygous genotypes containing the sickle allele whereas the alignment-based approach identified no occurrences of the sickle allele. Variant annotation using dbSNP and gnomAD identified systematic biases in these databases due to underrepresentation of Africans. Using the counts of homozygous alternate genotypes from the alignment-based approach as a measure of genetic distance to the reference sequence GRCh38.p12, we found that the numbers of misassemblies, total variant sites, potentially novel single nucleotide variants (SNVs), and certain variant classes (e.g., splice acceptor variants, stop loss variants, missense variants, synonymous variants, and variants absent from gnomAD) were significantly correlated with genetic distance. In contrast, genomic coverage and other variant classes (e.g., ClinVar pathogenic or likely pathogenic variants, start loss variants, stop gain variants, splice donor variants, incomplete terminal codons, variants with CADD score ≥20) were not correlated with genetic distance. With improvement in coverage, the assembly-based approach can offer a viable alternative to the alignment-based approach, with the advantage that it can obviate the need to generate diverse human reference sequences or collections of alternate scaffolds.
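The two benchmarking metrics named in this abstract, sensitivity and false discovery rate against a gold standard call set, follow from simple set arithmetic. The sketch below assumes each variant is keyed by a (chromosome, position, ref, alt) tuple and uses made-up toy calls; it illustrates the standard definitions and is not the authors' benchmarking code.

```python
def benchmark(called, truth):
    """Return (sensitivity, false discovery rate) of a call set against a truth set."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)    # called and present in the gold standard
    false_pos = len(called - truth)   # called but absent from the gold standard
    false_neg = len(truth - called)   # in the gold standard but missed
    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    fdr = false_pos / (true_pos + false_pos) if called else float("nan")
    return sensitivity, fdr


# Toy variants keyed as (chromosome, position, ref, alt); values are illustrative only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 900, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "A", "C"),
}

sens, fdr = benchmark(called, truth)
print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

In this toy case three of four true variants are recovered and one of four calls is spurious, so both approaches in a real comparison would be scored by running the same function on each call set.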


Highly multiplexed, fast and accurate nanopore sequencing for verification of synthetic DNA constructs and sequence libraries

Synthetic Biology ◽

10.1093/synbio/ysz025 ◽

2019 ◽

Vol 4 (1) ◽

Cited By ~ 4

Author(s):

Andrew Currin ◽

Neil Swainston ◽

Mark S Dunstan ◽

Adrian J Jervis ◽

Paul Mulherin ◽

...

Keyword(s):

Synthetic Biology ◽

Dna Sequencing ◽

Cost Effective ◽

Polymorphism Analysis ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Synthetic Dna ◽

Design Build ◽

Hardware Costs

Abstract: Synthetic biology utilizes the Design–Build–Test–Learn pipeline for the engineering of biological systems. Typically, this requires the construction of specifically designed, large and complex DNA assemblies. The availability of cheap DNA synthesis and automation enables high-throughput assembly approaches, which generate a heavy demand for DNA sequencing to verify correctly assembled constructs. Next-generation sequencing is ideally positioned to perform this task; however, because of expensive hardware and bespoke data analysis requirements, few laboratories utilize this technology in-house. Here a workflow for highly multiplexed sequencing is presented, capable of fast and accurate sequence verification of DNA assemblies using nanopore technology. A novel sample barcoding system using polymerase chain reaction is introduced, and sequencing data are analyzed through a bespoke analysis algorithm. Crucially, this algorithm overcomes the problem of high-error rate nanopore data (which typically prevents identification of single nucleotide variants) through statistical analysis of strand bias, permitting accurate sequence analysis with single-base resolution. As an example, 576 constructs (6 × 96 well plates) were processed in a single workflow in 72 h (from Escherichia coli colonies to analyzed data). Given our procedure’s low hardware costs and highly multiplexed capability, this provides cost-effective access to powerful DNA sequencing for any laboratory, with applications beyond synthetic biology including directed evolution, single nucleotide polymorphism analysis and gene synthesis.
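The abstract does not say which statistic the authors use for strand bias. As a stand-in, the sketch below applies a two-sided Fisher exact test to a 2 × 2 table of reference and alternate read counts on the forward and reverse strands, a common way to flag mismatches that appear on one strand only. The function name, the read counts, and the choice of test are all assumptions for illustration.

```python
from math import comb


def fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher exact p-value for the table [[ref_fwd, ref_rev], [alt_fwd, alt_rev]]."""
    row1, row2 = ref_fwd + ref_rev, alt_fwd + alt_rev
    col1 = ref_fwd + alt_fwd
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x reference reads on the forward strand.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(ref_fwd)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical read counts: the alternate base is seen equally on both strands ...
p_balanced = fisher_exact_two_sided(50, 50, 10, 10)
# ... versus only on the forward strand, which suggests a strand-specific error.
p_biased = fisher_exact_two_sided(50, 50, 20, 0)
print(p_balanced, p_biased)
```

A small p-value indicates that the alternate base is distributed across strands differently from the reference base, which is the pattern expected from a systematic sequencing error and not from a real sequence difference.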


neoepiscope improves neoepitope prediction with multivariant phasing

Bioinformatics ◽

10.1093/bioinformatics/btz653 ◽

2019 ◽

Vol 36 (3) ◽

pp. 713-720 ◽

Cited By ~ 5

Author(s):

Mary A Wood ◽

Austin Nguyen ◽

Adam J Struck ◽

Kyle Ellrott ◽

Abhinav Nellore ◽

...

Keyword(s):

False Negative ◽

Supplementary Information ◽

Supplementary File ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Somatic Variant ◽

Negative Results ◽

Multiple Datasets ◽

False Negative Results

Abstract: Motivation: The vast majority of tools for neoepitope prediction from DNA sequencing of complementary tumor and normal patient samples do not consider germline context or the potential for the co-occurrence of two or more somatic variants on the same mRNA transcript. Without consideration of these phenomena, existing approaches are likely to produce both false-positive and false-negative results, resulting in an inaccurate and incomplete picture of the cancer neoepitope landscape. We developed neoepiscope chiefly to address this issue for single nucleotide variants (SNVs) and insertions/deletions (indels). Results: Herein, we illustrate how germline and somatic variant phasing affects neoepitope prediction across multiple datasets. We estimate that up to ∼5% of neoepitopes arising from SNVs and indels may require variant phasing for their accurate assessment. neoepiscope is performant, flexible and supports several major histocompatibility complex binding affinity prediction tools. Availability and implementation: neoepiscope is available on GitHub at https://github.com/pdxgx/neoepiscope under the MIT license. Scripts for reproducing results described in the text are available at https://github.com/pdxgx/neoepiscope-paper under the MIT license. Additional data from this study, including summaries of variant phasing incidence and benchmarking wallclock times, are available in Supplementary Files 1, 2 and 3. Supplementary File 1 contains Supplementary Table 1, Supplementary Figures 1 and 2, and descriptions of Supplementary Tables 2–8. Supplementary File 2 contains Supplementary Tables 2–6 and 8. Supplementary File 3 contains Supplementary Table 7. Raw sequencing data used for the analyses in this manuscript are available from the Sequence Read Archive under accessions PRJNA278450, PRJNA312948, PRJNA307199, PRJNA343789, PRJNA357321, PRJNA293912, PRJNA369259, PRJNA305077, PRJNA306070, PRJNA82745 and PRJNA324705; from the European Genome-phenome Archive under accessions EGAD00001004352 and EGAD00001002731; and by direct request to the authors. Supplementary information: Supplementary data are available at Bioinformatics online.
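Why phasing matters for neoepitope prediction can be shown with a small sketch: when a germline variant sits on the same haplotype as a somatic variant and close enough to share a peptide window, the candidate peptides change. The protein sequence, positions, substitutions, and helper names below are hypothetical, and the code works at the amino acid level for brevity; it is not neoepiscope's implementation or API.

```python
def apply_substitutions(protein, substitutions):
    """Apply {0-based position: new residue} substitutions to a protein string."""
    residues = list(protein)
    for position, residue in substitutions.items():
        residues[position] = residue
    return "".join(residues)


def peptides_covering(protein, position, k):
    """Return all k-mers of `protein` that include the residue at `position`."""
    first_start = max(0, position - k + 1)
    last_start = min(position, len(protein) - k)
    return {protein[s:s + k] for s in range(first_start, last_start + 1)}


# Hypothetical 22-residue protein with two nearby substitutions.
reference = "MKTAYIAKQRQISFVKSHFSRQ"
somatic = {10: "L"}    # tumor-specific change (Q -> L)
germline = {13: "Y"}   # inherited change on the same haplotype (F -> Y)

# Ignoring germline context: only the somatic change is applied.
unphased = peptides_covering(apply_substitutions(reference, somatic), 10, 9)
# Phased: both changes are applied to the same transcript.
phased = peptides_covering(apply_substitutions(reference, {**germline, **somatic}), 10, 9)

print(len(phased - unphased), "of", len(phased), "9-mers differ once phasing is considered")
```

Every 9-mer that spans both positions differs between the two sets, so a tool that ignores the germline neighbour would score peptides that the tumor never actually presents.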


Reference exome data for a Northern Brazilian population

Scientific Data ◽

10.1038/s41597-020-00703-y ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Alexia L. Weeks ◽

Richard W. Francis ◽

Joao I. C. F. Neri ◽

Nathaly M. C. Costa ◽

Nivea M. R. Arrais ◽

...

Keyword(s):

Rare Diseases ◽

Sequence Data ◽

Genetic Diseases ◽

Specific Reference ◽

Single Nucleotide Variants ◽

Data Set ◽

Single Nucleotide ◽

Rare Genetic Diseases ◽

Genetics And Genomics ◽

Brazilian Cohort

Abstract: Exome sequencing is widely used in the diagnosis of rare genetic diseases and provides useful variant data for analysis of complex diseases. There is not always adequate population-specific reference data to assist in assigning a diagnostic variant to a specific clinical condition. Here we provide a catalogue of variants called after sequencing the exomes of 45 babies from Rio Grande do Norte in Brazil. Sequence data were processed using an ‘intersect-then-combine’ (ITC) approach, using GATK and SAMtools to call variants. A total of 612,761 variants were identified in at least one individual in this Brazilian cohort, including 559,448 single nucleotide variants (SNVs) and 53,313 insertions/deletions. Of these, 58,111 overlapped with nonsynonymous (nsSNVs) or splice site (ssSNVs) SNVs in dbNSFP. As an aid to clinical diagnosis of rare diseases, we used the American College of Medical Genetics and Genomics (ACMG) guidelines to assign pathogenic/likely pathogenic status to 185 (0.32%) of the 58,111 nsSNVs and ssSNVs. Our data set provides a useful reference point for diagnosis of rare diseases in Brazil.
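The abstract names an 'intersect-then-combine' (ITC) approach without spelling out the steps. One plausible reading, which is an assumption here and is not confirmed by the abstract, is that calls from the two variant callers are first intersected within each sample and the agreed calls are then combined across samples. The sketch below illustrates that reading with hypothetical sample names and variant keys.

```python
def intersect_then_combine(calls_by_sample):
    """Map each variant to the set of samples in which every caller reported it.

    calls_by_sample has the shape {sample: {caller_name: set_of_variant_keys}}.
    """
    combined = {}
    for sample, calls_by_caller in calls_by_sample.items():
        # Intersect: keep only variants reported by all callers for this sample.
        agreed = set.intersection(*calls_by_caller.values())
        # Combine: merge the agreed variants into the cohort-level catalogue.
        for variant in agreed:
            combined.setdefault(variant, set()).add(sample)
    return combined


# Hypothetical variant keys as (chromosome, position, ref, alt).
v1 = ("chr1", 1000, "A", "G")
v2 = ("chr1", 2000, "C", "T")
v3 = ("chr2", 3000, "G", "GA")

calls = {
    "sample_01": {"gatk": {v1, v2}, "samtools": {v1, v3}},
    "sample_02": {"gatk": {v1, v3}, "samtools": {v1, v3}},
}

catalogue = intersect_then_combine(calls)
for variant, samples in sorted(catalogue.items()):
    print(variant, sorted(samples))
```

Under this reading, a variant enters the catalogue if at least one sample has it confirmed by both callers, which matches the abstract's phrase "identified in at least one individual".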


The discrepancy among single nucleotide variants detected by DNA and RNA high throughput sequencing data

BMC Genomics ◽

10.1186/s12864-017-4022-x ◽

2017 ◽

Vol 18 (S6) ◽

Cited By ~ 16

Author(s):

Yan Guo ◽

Shilin Zhao ◽

Quanhu Sheng ◽

David C Samuels ◽

Yu Shyr

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Dna And Rna ◽

High Throughput Sequencing Data


Misannotation of multiple-nucleotide variants risks misdiagnosis

Wellcome Open Research ◽

10.12688/wellcomeopenres.15420.1 ◽

2019 ◽

Vol 4 ◽

pp. 145

Author(s):

Matthew N. Wakeling ◽

Thomas W. Laver ◽

Kevin Colclough ◽

Andrew Parish ◽

Sian Ellard ◽

...

Keyword(s):

Best Practices ◽

False Negative ◽

Simulated Data ◽

Sequencing Analysis ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Public Resources ◽

Next Generation Sequencing Analysis ◽

Optimal Approach

Multiple nucleotide variants (MNVs) are miscalled by the most widely utilised next generation sequencing (NGS) analysis pipelines, presenting the potential for missing diagnoses that would previously have been made by standard Sanger (dideoxy) sequencing. These variants, which should be treated as a single insertion-deletion mutation event, are commonly called as separate single nucleotide variants (SNVs). This can result in misannotation, incorrect amino acid predictions and potentially false positive and false negative diagnostic results. This risk will increase as confirmatory Sanger sequencing of SNVs ceases to be standard practice. Using simulated data and re-analysis of sequencing data from a diagnostic targeted gene panel, we demonstrate that the widely adopted pipeline, GATK best practices, results in miscalling of MNVs and that alternative tools can call these variants correctly. The adoption of calling methods that annotate MNVs correctly would present a solution for individual laboratories; however, GATK best practices are the basis for important public resources such as the gnomAD database. We suggest that integrating a solution into these guidelines would be the optimal approach.
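The misannotation described here can be reproduced with a single codon. In the sketch below, two adjacent base changes fall in the same codon: annotated separately, one looks like a missense change and the other like a stop gain, while annotated together they produce a different missense change. The codon and substitutions are a constructed example, and the lookup table holds only the four standard-code codons the example needs.

```python
# Subset of the standard genetic code, limited to the codons used in this example.
CODON_TO_AMINO_ACID = {"TCA": "Ser", "CCA": "Pro", "TAA": "Stop", "CAA": "Gln"}


def mutate(codon, changes):
    """Apply (0-based offset, alternate base) changes to a three-base codon."""
    bases = list(codon)
    for offset, alt in changes:
        bases[offset] = alt
    return "".join(bases)


reference_codon = "TCA"   # serine
snv_a = (0, "C")          # T>C at the first codon position
snv_b = (1, "A")          # C>A at the second codon position

# Each change annotated as an independent SNV against the reference codon.
separate = [CODON_TO_AMINO_ACID[mutate(reference_codon, [snv])] for snv in (snv_a, snv_b)]
# Both changes annotated together as one multiple nucleotide variant.
together = CODON_TO_AMINO_ACID[mutate(reference_codon, [snv_a, snv_b])]

print("separately:", separate)
print("as one MNV:", together)
```

Neither of the two separate annotations matches the protein change that actually results when both bases are altered on the same allele, which is how a false stop-gain call (or a missed one, in the reverse situation) can arise.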


Identification of single nucleotide variants using position-specific error estimation in deep sequencing data

10.1101/475947 ◽

2018 ◽

Author(s):

Dimitrios Kleftogiannis ◽

Marco Punta ◽

Anuradha Jayaram ◽

Shahneen Sandhu ◽

Stephen Q. Wong ◽

...

Keyword(s):

Deep Sequencing ◽

Low Frequency ◽

Poisson Model ◽

Real Data ◽

Analytical Sensitivity ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Deep Sequencing Data ◽

Targeted Deep Sequencing

Abstract: Background: Targeted deep sequencing is a highly effective technology to identify known and novel single nucleotide variants (SNVs) with many applications in translational medicine, disease monitoring and cancer profiling. However, identification of SNVs using deep sequencing data is a challenging computational problem as different sequencing artifacts limit the analytical sensitivity of SNV detection, especially at low variant allele frequencies (VAFs). Methods: To address the problem of relatively high noise levels in amplicon-based deep sequencing data (e.g. with the Ion AmpliSeq technology) in the context of SNV calling, we have developed a new bioinformatics tool called AmpliSolve. AmpliSolve uses a set of normal samples to model position-specific, strand-specific and nucleotide-specific background artifacts (noise), and deploys a Poisson model-based statistical framework for SNV detection. Results: Our tests on both synthetic and real data indicate that AmpliSolve achieves a good trade-off between precision and sensitivity, even at VAF below 5% and as low as 1%. We further validate AmpliSolve by applying it to the detection of SNVs in 96 circulating tumor DNA samples at three clinically relevant genomic positions and compare the results to digital droplet PCR experiments. Conclusions: AmpliSolve is a new tool for in-silico estimation of background noise and for detection of low frequency SNVs in targeted deep sequencing data. Although AmpliSolve has been specifically designed for and tested on amplicon-based libraries sequenced with the Ion Torrent platform it can, in principle, be applied to other sequencing platforms as well. AmpliSolve is freely available at https://github.com/dkleftogi/AmpliSolve.
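The idea of a Poisson model for background noise can be sketched as follows: given a background error rate for one position, strand, and nucleotide (which would be estimated from normal samples), the expected number of erroneous reads at a given depth is their product, and a variant is supported when the observed count is improbably high under that expectation. The function names, error rate, depth, and significance threshold below are assumptions for illustration; this is not AmpliSolve's code.

```python
from math import exp


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i   # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def supports_variant(alt_reads, depth, error_rate, alpha=1e-6):
    """Flag a site when alt_reads is unlikely to arise from background noise alone."""
    expected_noise = depth * error_rate
    return poisson_upper_tail(alt_reads, expected_noise) < alpha


# Hypothetical site: 2000 reads, background error rate of 0.1% for this base and strand.
depth, error_rate = 2000, 0.001
p_three_reads = poisson_upper_tail(3, depth * error_rate)    # 3 alt reads, VAF 0.15%
p_twenty_reads = poisson_upper_tail(20, depth * error_rate)  # 20 alt reads, VAF 1%

print(p_three_reads, p_twenty_reads)
print(supports_variant(3, depth, error_rate), supports_variant(20, depth, error_rate))
```

With an expected noise count of two reads, three alternate reads are unremarkable while twenty are not, which illustrates how a per-position noise estimate makes calls at around 1% VAF feasible at sufficient depth.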


CRISPR/Cas9 inhibits rather than induces non-targeted DNA cleavage more likely to cause off-target single-nucleotide variants

10.22541/au.163253413.33671876/v1 ◽

2021 ◽

Author(s):

Ze Zhang ◽

Yuanyuan Guo ◽

Rongjia Zhang ◽

Wuchen Yang ◽

Zhengqing Xie ◽

...

Keyword(s):

Dna Cleavage ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Free System ◽

Cell Free System ◽

Cell Extracts ◽

Target Dna ◽

Target Sequences

CRISPR/Cas9 gene targeting has become the most widely used gene editing technology in both plants and animals. However, substantial off-target effects remain a major imperfection hindering its further application. Here, a Nicotiana benthamiana leaf cell-free system was used to simulate the in vivo environment, and the effects of different CRISPR/Cas9 components on DNA stability in the cell-free system were studied to explore possible mechanisms causing CRISPR off-target effects. The results showed that overexpressing Cas9, nCas9 and dCas9 significantly inhibited DNA cleavage in the cell extracts. While overexpressing RNPs accelerated target DNA cleavage but inhibited non-target DNA digestion in cell extracts, overexpressing nRNP and dRNP blocked the cleavage of both target and non-target sequences. Meanwhile, analysis of whole-genome sequencing data from mice and rice edited by different CRISPR tools revealed that the main off-target mutations were single nucleotide variants (SNVs), rather than the insertions and deletions (indels) that are readily induced by DNA double-strand breaks. The off-target sites did not match the conventionally predicted locations but were preferentially PAM-rich sites. Our study suggests that PAM-dependent binding of CRISPR/Cas9 to non-target sequences without cleavage may increase off-target mutation risks by impeding the cleavage process necessary for repairing spontaneous or environmentally induced non-targeted DNA mutations.


DeepSVP: Integration of genotype and phenotype for structural variant prioritization using deep learning

10.1101/2021.01.28.428557 ◽

2021 ◽

Author(s):

Azza Althagafi ◽

Lamia Alsubaie ◽

Nagarajan Kathiresan ◽

Katsuhiko Mineta ◽

Taghrid Aloraini ◽

...

Keyword(s):

Research Group ◽

Genetic Diseases ◽

Computational Method ◽

Structural Variants ◽

Single Nucleotide Variants ◽

Structural Genomic ◽

Single Nucleotide ◽

Coding Regions ◽

Molecular Features ◽

Gene Functions

Abstract: Motivation: Structural genomic variants account for much of human variability and are involved in several diseases. Structural variants are complex and may affect coding regions of multiple genes, or affect the functions of genomic regions in different ways from single nucleotide variants. Interpreting the phenotypic consequences of structural variants relies on information about gene functions, haploinsufficiency or triplosensitivity, and other genomic features. Phenotype-based methods for identifying variants that are involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been applied successfully to single nucleotide variants, as well as short insertions and deletions, the complexity of structural variants makes it more challenging to link them to phenotypes. Furthermore, structural variants can affect a large number of coding regions, and phenotype information may not be available for all of them. Results: We developed DeepSVP, a computational method to prioritize structural variants involved in genetic diseases by combining genomic information with information about gene functions. We incorporate phenotypes linked to genes, functions of gene products, gene expression in individual cell types, and anatomical sites of expression, and systematically relate them to their phenotypic consequences through ontologies and machine learning. DeepSVP significantly improves the success rate of finding causative variants in several benchmarks and can identify novel pathogenic structural variants in consanguineous families. Availability: https://github.com/bio-ontology-research-group/[email protected]
