Complementary information on single nucleotide variants, INDELs and functional translocations can be obtained with RNAseq using different library preparations

Mapping Intimacies

10.1101/301010

2018

Author(s): Riccardo Panero, Maddalena Arigoni, Martina Olivero, Francesca Cordero, Alessandro Weisz, ...

Keyword(s): RNA-seq, Single Nucleotide Variants, Complementary Information, Sequencing Platform, Variant Frequency, Selection Protocol, Variant Detection

Abstract Background: RNA-seq is an attractive methodology for detecting functional genomic variants because it integrates variant frequency with expression. However, although dedicated statistical frameworks have been designed to detect SNVs, INDELs and gene fusions in RNA-seq data, very little has been done to understand how library preparation protocols affect variant detection in RNA-seq data. Results: Here, we compared RNA-seq results obtained on a short-read sequencing platform with two protocols: one based on polyA+ RNA selection (POLYA) and the other based on capture of exonic regions (ACCESS). Our data indicate that ACCESS detects 10% more coding SNVs/INDELs than POLYA, making this protocol more suitable for this goal. Furthermore, ACCESS requires fewer reads than POLYA for coding SNV detection. On the other hand, if the analysis also aims to identify SNVs/INDELs in the 5′ and 3′ UTRs, POLYA is clearly the preferred method. Neither ACCESS nor POLYA offers a particular advantage for detecting fusion transcripts. Conclusion: The data show that careful selection of the "wet" protocol adds specific features that cannot be obtained with bioinformatics alone.


A Second Generation, Multiple Myeloma-Specific, Targeted Sequencing Platform for Detecting Translocations, Copy Number Alterations, and Single Nucleotide Variants

Blood

10.1182/blood.v126.23.4207.4207

2015

Vol 126 (23)

pp. 4207-4207

Author(s): Brian S White, Irena Lanc, Daniel Auclair, Robert Fulton, Mark A Fiala, ...

Keyword(s): Multiple Myeloma, Cell Biology, Copy Number, Targeted Sequencing, RNA-seq, Copy Number Alterations, Single Nucleotide Variants, Data Set, Sequencing Platform

Abstract Background: Multiple myeloma (MM) is a hematologic cancer characterized by diverse genetic lesions: translocations, copy number alterations (CNAs), and single nucleotide variants (SNVs). The prognostic value of translocations and CNAs is well established. Determining the clinical significance of SNVs, which are recurrently mutated at much lower frequencies, and how that significance is affected by translocations and CNAs requires additional large-scale correlative studies. Cost-effective targeted sequencing approaches can facilitate such studies. Hence, we designed a single-platform targeted sequencing approach capable of detecting all three variant types. Methods: We designed oligonucleotide probes complementary to the coding regions of 467 genes and to the IgH and MYC loci, allowing each probe to closely match at most 5 regions within the genome. Genes were selected if they were expressed in an independent MM RNA-seq data set and harbored germline SNP-filtered variants that: (1) occurred with frequency >3%, (2) were clustered in hotspots, (3) occurred in recurrently mutated "cancer genes" (as annotated in COSMIC or MutSig), or (4) occurred in genes involved in DNA repair and/or B-cell biology. IgH and MYC tiling was unbiased (with respect to annotated features within the loci) and extended from 50 kilobase pairs (kbp) upstream of both regions to 50 kbp downstream of IgH and 100 kbp downstream of MYC. Results: We performed targeted sequencing of 96 CD138-enriched samples derived from MM patients, together with matched peripheral blood leukocyte normal controls. Sequencing depth (mean 107X) was commensurate with that of available exome sequencing data from these samples (mean 71X). Samples harbored a mean of 25 non-silent variants, including those in known MM-associated genes: NRAS (24%), KRAS (22%), FAM46C (17%), TP53 (10%), DIS3 (8%), and BRAF (3%). Variants detected by both platforms showed a strong correlation (r² = 0.8).
The capture array detected activating oncogenic variants NRAS Q61K (n=3 patients) and KRAS G12C/D/R/V (n=5) that were not detected in the exome data. Additionally, we found non-silent, capture-specific variants in MTOR (3%) and in two transcription-related genes previously implicated in cancer: ZFHX4 (5%) and CHD3 (5%). To assess the potential role of deeply subclonal variants and our ability to detect them, we performed additional sequencing (mean 565X) on six of the tumor/normal pairs. This revealed 14 manually reviewed, non-silent variants that the initial targeted sequencing had not detected. These had a mean variant allele frequency of 2.8% and included mutations in DNMT3A and FAM46C. Five of the six re-sequenced samples harbored at least one of these 14 variants. This highlights the importance of the additional depth, which will be used in future studies. Our approach detected CNAs at close to expected frequencies, including hyperdiploidy (52%), del(13) (43%), and gain of 1q (35%). Similarly, it inferred IgH translocations at expected frequencies: t(4;14) (14%), t(6;14) (3%), t(11;14) (15%), and t(14;20) (1%). As expected, translocation breakpoints occur predominantly within the IgH constant region, but also frequently 5′ (i.e., telomeric) of the IGHM switch region, and occasionally within the V and D regions. We detected MYC-associated translocations, whose frequencies have been debated, in 10% of patients (n=9): five involving IgH, three with both partners in or near MYC, and one with both types. Finally, our platform detected novel IgH translocations with partners near DERL3 (n=2), MYCN (n=1), and FLT3 (n=1). Additional evidence suggests that DERL3 and MYCN may be targets of IgH-induced overexpression: of 84 RNA-seq patient samples, six exhibited outlying expression of DERL3, including one sample in which we detected the translocation in the corresponding DNA, and one exhibited outlying expression of MYCN.
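Why the extra depth (565X vs 107X) surfaces additional subclonal variants can be illustrated with a simple binomial model, in which the number of alternate reads at a site is Binomial(depth, VAF). This is a sketch under stated assumptions: the minimum of 5 alternate reads is a hypothetical caller threshold, not the study's, and the model ignores sequencing error, mapping bias and over-dispersion.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of observing at least `min_alt_reads` alternate reads.

    Models alternate-read counts as Binomial(depth, vaf). This ignores
    sequencing error, mapping bias and over-dispersion, and
    `min_alt_reads` is a hypothetical caller threshold.
    """
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Mean VAF of the deep-only variants (2.8%) at the two mean depths reported above.
    for depth in (107, 565):
        p = detection_probability(depth, vaf=0.028, min_alt_reads=5)
        print(f"{depth}X: expected alt reads = {depth * 0.028:.1f}, P(>= 5 alt reads) = {p:.3f}")
```

Under these assumptions, a 2.8% variant is expected to contribute about 3 alternate reads at 107X but about 16 at 565X, so it can easily fall below a calling threshold at the lower depth.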
Conclusion: Our MM-specific targeted sequencing strategy can detect deeply subclonal SNVs in addition to CNAs and IgH and MYC translocations. Although additional validation is required, particularly for translocation detection, we anticipate that such technology will soon enable clinical testing on a single sequencing platform. Disclosures: Vij: Celgene, Onyx, Takeda, Novartis, BMS, Sanofi, Janssen, Merck: Consultancy; Takeda, Onyx: Research Funding.


scSNV: accurate dscRNA-seq SNV co-expression analysis using duplicate tag collapsing

Genome Biology

10.1186/s13059-021-02364-5

2021

Vol 22 (1)

Author(s): Gavin W. Wilson, Mathieu Derouet, Gail E. Darling, Jonathan C. Yeung

Keyword(s): Genetic Variants, False Positive, Variant Calling, Call Rate, RNA-seq, Single Nucleotide Variants, Variant Call, Co-detection

Abstract Identifying single nucleotide variants has become common practice in droplet-based single-cell RNA-seq experiments; however, no existing pipeline is designed to maximize variant-calling accuracy. Furthermore, the molecular duplicates generated in these experiments have not been used to optimally detect variant co-expression. Herein, we introduce scSNV, designed from the ground up to "collapse" molecular duplicates and accurately identify variants and their co-expression. We demonstrate that scSNV is fast, has a reduced false-positive variant call rate, and enables the co-detection of genetic variants and A>G RNA edits across twenty-two samples.
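A minimal sketch of the general idea behind duplicate collapsing in droplet single-cell data: reads that share a cell barcode, UMI and position derive from one captured molecule, so a per-molecule consensus discards PCR and sequencing errors that appear in only some copies. This is not scSNV's implementation; the tuple layout and the 0.75 majority threshold are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# One aligned base observation: (cell barcode, UMI, position, base).
Observation = Tuple[str, str, int, str]
MoleculeKey = Tuple[str, str, int]


def collapse_duplicates(
    observations: Iterable[Observation], min_fraction: float = 0.75
) -> Dict[MoleculeKey, Optional[str]]:
    """Collapse duplicate reads of one molecule into a single consensus base.

    Reads sharing (cell barcode, UMI, position) are treated as copies of the
    same captured molecule. The majority base is kept only if it accounts for
    at least `min_fraction` of those reads (an illustrative threshold);
    otherwise the molecule is left uncalled (None).
    """
    counts: Dict[MoleculeKey, Counter] = defaultdict(Counter)
    for cell, umi, pos, base in observations:
        counts[(cell, umi, pos)][base] += 1

    consensus: Dict[MoleculeKey, Optional[str]] = {}
    for key, bases in counts.items():
        base, n = bases.most_common(1)[0]
        consensus[key] = base if n / sum(bases.values()) >= min_fraction else None
    return consensus


if __name__ == "__main__":
    reads = (
        [("CELL1", "UMI1", 1000, "A")] * 4      # four copies agree on A...
        + [("CELL1", "UMI1", 1000, "G")]        # ...one copy carries an error
        + [("CELL1", "UMI2", 1000, "G")] * 3    # a second molecule carrying G
    )
    print(collapse_duplicates(reads))
```

Here the error-bearing copy of UMI1 is outvoted, so CELL1 ends up with one A molecule and one G molecule at position 1000: both alleles are co-detected in the same cell from molecule counts rather than raw read counts.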


Ultra-sensitive mutation detection and genome-wide DNA copy number reconstruction by error corrected circulating tumour DNA sequencing

10.1101/213306

2017

Cited By ~ 3

Author(s): Sonia Mansukhani, Louise J. Barber, Sing Yu Moorcraft, Michael Davidson, Andrew Woolston, ...

Keyword(s): Error Correction, Copy Number, Parallel Evolution, Tumour Tissue, Single Nucleotide Variants, Target Region, Circulating Free DNA, Genome Wide, Variant Frequency

Abstract Minimally invasive circulating free DNA (cfDNA) analysis can portray cancer genome landscapes, but highly sensitive and specific genetic approaches are necessary to accurately detect mutations, which often have low variant frequencies. We developed a targeted cfDNA sequencing technology that uses novel off-the-shelf molecular barcodes for error correction in combination with custom solution hybrid capture enrichment. Modelling based on cfDNA yields from 58 patients shows that our assay, which requires 25 ng of cfDNA input, should be applicable to >95% of patients with metastatic colorectal cancer. Sequencing of a 163.3 kb target region including 32 genes detected 100% of single nucleotide variants at 0.15% variant frequency in cfDNA spike-in experiments. Molecular barcode error correction reduced false positive mutation calls by 98.6%. In a series of 28 patients with metastatic colorectal cancer, 80 of 91 (88%) mutations previously detected by tumour tissue sequencing were called in the cfDNA. Call rates were similar for single nucleotide variants and small insertions/deletions. Mutations called only in cfDNA and not detectable in matched tumour tissue included, among others, a subclonal driver mutation of resistance to anti-EGFR antibodies in the KRAS gene, multiple activating PIK3CA mutations in each of two patients (indicative of parallel evolution), and TP53 mutations originating from clonal haematopoiesis. Furthermore, we demonstrate that analysis of cfDNA off-target reads allows reconstruction of genome-wide copy number aberration profiles in 71% of these 28 cases. This error-corrected ultra-deep cfDNA sequencing assay, with a target region that can be readily customized, enables broad insights into cancer genomes and evolution.
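The 25 ng input requirement and the 0.15% detection limit can be related with back-of-the-envelope arithmetic. This sketch assumes roughly 3.3 pg of DNA per haploid human genome and perfect recovery of every input molecule; real library conversion is lower, so the numbers are an upper bound rather than the assay's actual molecule counts.

```python
import math

# Approximate mass of one haploid human genome in picograms (assumption).
HAPLOID_GENOME_PG = 3.3


def genome_equivalents(input_ng: float) -> float:
    """Haploid genome copies contained in `input_ng` nanograms of DNA (1 ng = 1000 pg)."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_copies(input_ng: float, vaf: float) -> float:
    """Expected mutant molecules at one locus, assuming every molecule is recovered."""
    return genome_equivalents(input_ng) * vaf


def prob_no_mutant_molecule(input_ng: float, vaf: float) -> float:
    """Poisson probability that the input contains zero mutant molecules."""
    return math.exp(-expected_mutant_copies(input_ng, vaf))


if __name__ == "__main__":
    print(f"genome equivalents in 25 ng: {genome_equivalents(25):.0f}")
    print(f"expected mutant copies at 0.15% VAF: {expected_mutant_copies(25, 0.0015):.1f}")
    print(f"P(no mutant molecule): {prob_no_mutant_molecule(25, 0.0015):.2e}")
```

With these assumptions, 25 ng corresponds to roughly 7,600 haploid genome equivalents, so a 0.15% variant is carried by only about 11 molecules, which helps explain why barcode-based error correction matters at this frequency.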


Single-nucleotide variants in human RNA: RNA editing and beyond

Briefings in Functional Genomics

10.1093/bfgp/ely032

2018

Vol 18 (1)

pp. 30-39

Cited By ~ 4

Author(s): Yan Guo, Hui Yu, David C Samuels, Wei Yue, Scott Ness, ...

Keyword(s): RNA Editing, RNA-seq, RNA Modifications, Single Nucleotide Variants, Genomic Variants, High Prevalence, Gene Expression Quantification

Abstract Through analysis of paired high-throughput DNA-Seq and RNA-Seq data, researchers quickly recognized that RNA-Seq can be used for more than gene expression quantification. The alternative applications of RNA-Seq data are abundant; we are particularly interested in its usefulness for detecting single-nucleotide variants, which arise from RNA editing, genomic variants and other RNA modifications. A striking discovery from RNA-Seq analyses is the unexpectedly high prevalence of RNA-editing events, many of which cannot be explained by known RNA-editing mechanisms. Over the past 6–7 years, substantial efforts have been made to maximize the potential of RNA-Seq data. In this review, we describe the controversial history of mining RNA-editing events from RNA-Seq data and the corresponding development of methodologies to identify, predict, assess the quality of, and catalog RNA-editing events as well as genomic variants.
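The distinction between genomic variants and RNA editing that paired DNA-Seq and RNA-Seq make possible can be illustrated with a simplified triage of an RNA-seq SNV against matched DNA. This is a hypothetical sketch: it assumes the site is well covered in both assays, that alleles are reported on the reference '+' strand, and that the gene's strand is known. Real pipelines additionally filter repeats, paralogous mis-mapping, read-end artifacts and known SNPs.

```python
from typing import Optional

# A-to-I editing is read as A>G on the transcribed strand; when alleles are
# reported on the reference '+' strand, a minus-strand gene shows it as T>C.
EDITING_SIGNATURE = {"+": ("A", "G"), "-": ("T", "C")}


def classify_rna_snv(ref: str, rna_alt: str, dna_alt: Optional[str], gene_strand: str) -> str:
    """Rough triage of an SNV seen in RNA-seq at a site also covered by DNA-seq.

    dna_alt is the alternate allele seen in matched DNA, or None if the DNA
    matches the reference at this site.
    """
    if dna_alt == rna_alt:
        return "genomic variant"
    if dna_alt is None and (ref, rna_alt) == EDITING_SIGNATURE[gene_strand]:
        return "candidate A-to-I editing"
    # Anything else, e.g. C>T differences or mismatches with the DNA allele.
    return "other RNA-DNA difference"


if __name__ == "__main__":
    print(classify_rna_snv("A", "G", None, "+"))
    print(classify_rna_snv("T", "C", None, "-"))
    print(classify_rna_snv("A", "G", "G", "+"))
    print(classify_rna_snv("C", "T", None, "+"))
```

The last category deliberately stays broad: differences that match neither the DNA nor the canonical A-to-I signature are exactly the events the review describes as hard to explain with known editing mechanisms.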


A short plus long-amplicon based sequencing approach improves genomic coverage and variant detection in the SARS-CoV-2 genome

PLoS ONE

10.1371/journal.pone.0261014

2022

Vol 17 (1)

pp. e0261014

Author(s): Carlos Arana, Chaoying Liang, Matthew Brock, Bo Zhang, Jinchun Zhou, ...

Keyword(s): Virus Genome, Positive Control, Nasopharyngeal Swab, Single Nucleotide Variants, Spike Gene, Synonymous Mutations, Variant Detection, Variant Analysis, New Mutations

High viral transmission during the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that may affect genome sequencing methods. The ARTIC.v3 primer pool, which amplifies short amplicons in a multiplex PCR reaction, is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured by ARTIC primers. To improve genomic coverage and variant detection across these intervals, we designed long-amplicon primers and evaluated the performance of a short (ARTIC) plus long-amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA, followed by sequencing of nasopharyngeal swab specimens from fifteen COVID-19-positive patients. ARTIC data covered 94.47% of the virus genome in the positive control and patient samples. Variant analysis of the ARTIC data detected 217 mutations, including 209 single nucleotide variants (SNVs) and eight insertions and deletions. The long-amplicon data, in turn, detected 156 mutations, 80% of which were concordant with the ARTIC data. Combined analysis of ARTIC + MRL data improved genomic coverage to 97.03% and identified 214 high-confidence mutations: 203 SNVs, 8 deletions and 3 insertions. The analysis showed 26 SARS-CoV-2 lineage-defining mutations, including four mutations of known variants of concern in the spike gene (K417N, E484K, N501Y, P618H). Hybrid analysis identified 7 nonsynonymous and 5 synonymous mutations across the genome that were either ambiguous or not called in the ARTIC data. For example, the G172V mutation in the ORF3a protein and the A2A mutation in the Membrane protein were missed by the ARTIC assay. Thus, we show that while the short-amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementing poorly captured intervals with long-amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.
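How combining the two assays raises genome coverage can be sketched by pooling per-position depth from each assay and counting positions that reach a minimum depth. The depth tracks and the 10X threshold below are hypothetical; the abstract does not state the coverage criterion the authors used.

```python
from typing import Sequence


def covered_fraction(*depth_tracks: Sequence[int], min_depth: int = 10) -> float:
    """Fraction of positions whose pooled depth across assays reaches min_depth.

    Each track holds per-position read depth over the same genome coordinates.
    Pooling (summing) depths models a combined analysis of both data sets;
    min_depth=10 is a hypothetical threshold.
    """
    if not depth_tracks:
        raise ValueError("need at least one depth track")
    length = len(depth_tracks[0])
    if any(len(track) != length for track in depth_tracks):
        raise ValueError("depth tracks must span the same positions")
    covered = sum(1 for depths in zip(*depth_tracks) if sum(depths) >= min_depth)
    return covered / length


if __name__ == "__main__":
    # Toy 5-position genome: the short-amplicon assay drops out at positions 0 and 3.
    short_amplicon = [0, 50, 50, 5, 0]
    long_amplicon = [20, 0, 0, 6, 0]
    print(covered_fraction(short_amplicon))                 # 0.4
    print(covered_fraction(short_amplicon, long_amplicon))  # 0.8
```

Position 3 shows why pooling matters: neither assay reaches the threshold alone, but together they do.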


Unexpected CRISPR off-target mutation pattern in vivo are not typically germline-like

10.1101/193565

2017

Cited By ~ 1

Author(s): Zhiting Wei, Funan He, Guohui Chuai, Hanhui Ma, Zhixi Su, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Whole Genome, Single Nucleotide Variants, Cas9 Nuclease, Mutation Pattern

To the Editor: Schaefer et al.¹ (referred to as Study_1) recently presented the provocative conclusion that CRISPR-Cas9 nuclease can induce many unexpected off-target mutations across the genome, arising from sites with poor homology to the gRNA. As Wilson et al.² pointed out, however, selecting a co-housed mouse as the control is insufficient to attribute the observed mutation differences between the CRISPR-treated mice and control mice to CRISPR editing. Therefore, the causes of these mutations need further investigation. In 2015, Iyer et al.³ (referred to as Study_2) used Cas9 and a pair of sgRNAs to mutate the Ar gene in vivo, and investigated off-target mutations by comparing the control mice with the offspring of the modified mice. After analyzing whole genome sequencing (WGS) data from the offspring and the control mice, they concluded that off-target mutations from CRISPR-Cas9 engineering are rare. Notably, their study focused only on indel off-target mutations. We re-analyzed the WGS data from these two studies and detected both single nucleotide variants (SNVs) and indel mutations.


Finding a suitable library size to call variants in RNA-seq

10.1101/2019.12.18.881870

2019

Author(s): Anna Quaglieri, Christoffer Flensburg, Terence P Speed, Ian J Majewski

Keyword(s): Gene Expression, Cancer Biology, Expression Patterns, Cancer Type, RNA-seq, Single Nucleotide Variants, Library Size

Abstract Background: RNA-Seq allows the study of both gene expression changes and transcribed mutations, providing a highly effective way to gain insight into cancer biology. When planning the sequencing of a large cohort of samples, library size is a fundamental factor affecting both the overall cost and the quality of the results. While several studies have analysed the effect of library size on differential expression analyses, sensitivity analysis for variant detection has received far less attention. Results: We simulated shallower sequencing depths by downsampling 45 AML samples from the Leucegene project that were originally sequenced at high depth. We compared the sensitivity of six methods in recovering validated mutations in the same samples. The methods combine three popular callers (MuTect, VarScan, and VarDict) with two filtering strategies. We observed an incremental loss in sensitivity when simulating libraries of 80M, 50M, 40M, 30M and 20M fragments, with the largest loss below 30M fragments, where sensitivity fell below 90%. Sensitivity for indels varied markedly between callers, with VarDict showing the highest sensitivity (60%). Single nucleotide variant sensitivity was relatively consistent across methods, apart from MuTect, whose default filters need adjustment for RNA-Seq. We also analysed 136 RNA-Seq samples from the TCGA-LAML cohort, assessing the change in sensitivity between the initial libraries (average 59M fragments) and after downsampling to 40M fragments. For single nucleotide variants in recurrently mutated myeloid genes, performance was comparable, with a 3% average loss in sensitivity at 40M fragments. Conclusions: Between 30M and 40M fragments are needed to recover 90%–95% of the initial variants in recurrently mutated myeloid genes.
To extend this result to another cancer type, we suggest exploring the characteristics of its mutations and gene expression patterns.
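The downsampling experiment and the sensitivity metric can be sketched as follows. Sampling whole fragments rather than individual reads keeps mate pairs together, which matches reporting library size in fragments. This is an illustrative in-memory version with hypothetical mutation IDs; on real BAM files, tools such as `samtools view -s` or Picard `DownsampleSam` perform fraction-based subsampling.

```python
import random
from typing import Hashable, List, Sequence, Set


def downsample_fragments(
    fragments: Sequence[Hashable], target: int, seed: int = 0
) -> List[Hashable]:
    """Draw `target` fragments uniformly at random without replacement.

    Sampling whole fragments (read pairs) mimics a smaller library.
    """
    if target > len(fragments):
        raise ValueError("target exceeds the original library size")
    return random.Random(seed).sample(list(fragments), target)


def sensitivity(called: Set[Hashable], validated: Set[Hashable]) -> float:
    """Fraction of validated mutations recovered among the caller's calls."""
    if not validated:
        raise ValueError("no validated mutations to recover")
    return len(called & validated) / len(validated)


if __name__ == "__main__":
    library = [f"fragment_{i}" for i in range(1000)]
    smaller = downsample_fragments(library, target=400)  # e.g. 40% of the original
    print(len(smaller))

    validated = {"mut1", "mut2", "mut3", "mut4"}  # hypothetical mutation IDs
    called = {"mut1", "mut2", "mut3", "other"}
    print(sensitivity(called, validated))
```

Repeating the call-and-score step at each simulated library size traces out a sensitivity curve like the one summarized in the abstract; a fixed seed makes each downsampled library reproducible.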


Genome-Wide Development and Validation of Cost-Effective KASP Marker Assays for Genetic Dissection of Heat Stress Tolerance in Maize

International Journal of Molecular Sciences

10.3390/ijms21197386

2020

Vol 21 (19)

pp. 7386

Author(s): Ashok Babadev Jagtap, Yogesh Vikal, Gurmukh Singh Johal

Keyword(s): Heat Stress, Stress Tolerance, Cost Effective, Genetic Dissection, RNA-seq, Single Nucleotide Variants, Genome Wide, Heat Stress Tolerance, KASP Markers

Maize is the third most important cereal crop worldwide. However, its production is vulnerable to heat stress, which is expected to become increasingly severe in the coming years. Germplasm resilient to heat stress has been identified, but its underlying genetic basis remains poorly understood. Genomic mapping technologies can fill this gap, provided robust markers are available to tease apart the genotype–phenotype relationship. In the present investigation, we used data from an RNA-seq experiment to identify single nucleotide polymorphisms (SNPs) between two contrasting lines, LM11 and CML25, which are sensitive and tolerant to heat stress, respectively. The RNA-seq libraries were prepared after heat stress treatment from three separate tissues/organs (top leaf, ovule, and pollen), all of which are highly vulnerable to heat stress damage. Single nucleotide variant (SNV) calling combined the STAR aligner and the GATK variant caller in a single pipeline to identify highly accurate SNPs between the two lines. A total of 554,423, 410,698, and 596,868 SNVs were discovered between LM11 and CML25 by comparing the transcript sequence reads from the leaf, pollen, and ovule libraries, respectively. Hundreds of these SNPs were then selected for development into genome-wide Kompetitive Allele-Specific PCR (KASP) markers, which proved robust, with a successful SNP conversion rate of 71%. Subsequently, these KASP markers were used to genotype an F2 mapping population derived from a cross of LM11 and CML25. Being highly cost-effective, these KASP markers provide a reliable molecular marker toolkit that can both facilitate the genetic dissection of heat stress tolerance and accelerate the breeding of heat-resilient maize through marker-assisted selection (MAS).
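A KASP marker that is polymorphic between two inbred parents should segregate 1:2:1 (homozygous LM11 allele : heterozygous : homozygous CML25 allele) in an F2 population, assuming Mendelian inheritance and no segregation distortion. A chi-square goodness-of-fit test is one common way to check this; the sketch below is illustrative and is not a procedure described in the abstract.

```python
from typing import Tuple

# Upper 5% critical value of the chi-square distribution with 2 degrees of freedom.
CHI2_CRITICAL_DF2_ALPHA05 = 5.991


def chi_square_121(n_parent1: int, n_het: int, n_parent2: int) -> float:
    """Chi-square statistic for 1:2:1 segregation of a codominant F2 marker."""
    total = n_parent1 + n_het + n_parent2
    if total == 0:
        raise ValueError("no genotyped individuals")
    observed: Tuple[int, int, int] = (n_parent1, n_het, n_parent2)
    expected = (total / 4, total / 2, total / 4)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_mendelian_ratio(n_parent1: int, n_het: int, n_parent2: int) -> bool:
    """True if 1:2:1 segregation is not rejected at alpha = 0.05."""
    return chi_square_121(n_parent1, n_het, n_parent2) < CHI2_CRITICAL_DF2_ALPHA05


if __name__ == "__main__":
    # Hypothetical genotype counts for one marker in an F2 population.
    counts = (48, 95, 47)
    print(chi_square_121(*counts), fits_mendelian_ratio(*counts))
```

Markers that fail this check may reflect genotyping errors or genuine segregation distortion, and are typically examined before being placed on a linkage map.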


Investigation of somatic mutations in human brains targeting genes associated with Parkinson's disease

10.1101/2020.05.15.20094722

2020

Author(s): Melissa Leija-Salazar, Alan Pittman, Katya Mokretar, Huw Morris, Anthony HV Schapira, ...

Keyword(s): Somatic Mutations, Monozygotic Twin, Brain Regions, Somatic Variation, Single Nucleotide Variants, Somatic Variant, LRRK2 G2019S, Variant Detection, Human Brains

Background: Somatic mutations occur in neurons, but their role in synucleinopathies is unknown. Aim: We aimed to identify disease-relevant low-level somatic single nucleotide variants (SNVs) in brains from sporadic patients with synucleinopathies and from a monozygotic twin carrying LRRK2 G2019S whose penetrance could be explained by somatic variation. Methods and Results: We included different brain regions from 26 Parkinson's disease (PD) cases, 1 incidental Lewy body disease case, 3 multiple system atrophy cases and 12 controls. The whole SNCA locus and the exons of other genes associated with PD and neurodegeneration were deeply sequenced using molecular barcodes to improve accuracy. We selected 21 variants at 0.33–5% allele frequencies for validation using accurate methods for somatic variant detection. Conclusions: We could not detect disease-relevant somatic SNVs; however, we cannot exclude their presence at earlier stages of degeneration. Our results suggest that coding somatic SNVs in neurodegeneration are rare, but other types of somatic variants may have pathological consequences in synucleinopathies.


A robust benchmark for evaluating and improving mosaic variant calling strategies

10.21203/rs.3.rs-871399/v1

2022

Author(s): Yoo-Jin Ha, Jisoo Kim, Seungseok Kang, Junhan Kim, Se-Young Jo, ...

Keyword(s): Best Practices, Variant Calling, Reference Standard, Single Nucleotide Variants, Deletion Mutations, Genomic Variants, Variant Detection, Control Samples

Abstract The rapid advances in sequencing and analysis technologies have enabled accurate detection of diverse forms of genomic variants, including germline, somatic, and mosaic mutations. However, unlike for germline and somatic mutations, best practices for mosaic variant calling remain unsettled because of the technical and conceptual difficulties of evaluation. Here, we present a benchmark of nine feasible strategies for mosaic variant detection, based on a systematically designed reference standard that mimics mosaic samples and contains 390,153 control-positive and 35,208,888 control-negative single-nucleotide variants and insertion–deletion mutations. Rather than a single winner, we identified condition-dependent strengths and weaknesses of the current strategies with respect to variant allele frequency, variant sharing, and the use of control samples. Moreover, feature-level investigation points the way to both immediate and longer-term improvements in mosaic variant calling. Our results will guide researchers in selecting suitable calling algorithms and suggest future strategies for developers.
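Scoring a caller against a reference standard with known positive and negative sites comes down to counting true positives, false positives and false negatives. The sketch below is a simplified, hypothetical scorer: it treats variants as hashable identifiers, ignores calls at sites outside the standard, and does not stratify by variant allele frequency as the benchmark does.

```python
from typing import Dict, Hashable, Set


def score_calls(
    calls: Set[Hashable], positives: Set[Hashable], negatives: Set[Hashable]
) -> Dict[str, float]:
    """Precision, recall and F1 of a call set against a reference standard.

    Calls at sites that are in neither the positive nor the negative set are
    ignored, because the standard makes no claim about them.
    """
    tp = len(calls & positives)
    fp = len(calls & negatives)
    fn = len(positives - calls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    positives = {"v1", "v2", "v3", "v4"}   # hypothetical control-positive sites
    negatives = {"n1", "n2", "n3"}         # hypothetical control-negative sites
    calls = {"v1", "v2", "v3", "n1", "x"}  # "x" lies outside the standard
    print(score_calls(calls, positives, negatives))
```

Having an explicit negative set is what lets false positives be counted directly instead of inferred, which is why a reference standard with both control-positive and control-negative sites is useful for mosaic calling, where true variants are rare.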
