CRISPR-Cas off-target detection using Oxford Nanopore sequencing - is the mitochondrial genome more vulnerable to off-targets?

Mapping Intimacies ◽

10.1101/741322 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sandeep Chakraborty

Keyword(s):

Mitochondrial Genome ◽

Human Subjects ◽

Pcr Amplification ◽

Distal Region ◽

Error Rates ◽

Fast Method ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore ◽

Bona Fide

Abstract: Oxford Nanopore sequencing of DNA molecules is fast gaining popularity for generating longer reads, albeit with higher error rates, in much less time, and without the errors introduced by PCR amplification. Recently, CRISPR-Cas9 has been used to enrich genomic regions (nCATS [1]). This was applied to 10 genomic loci (median length = 18 kb). Here, using the sequencing data (accession: PRJNA531320), it is shown that the same workflow can be used to identify CRISPR-Cas9 off-target edits (OTEs). OTEs are an important, but unfortunately underestimated, aspect of CRISPR-Cas gene editing. An OTE in the mitochondrial genome is shown that has 7 mismatches with one of the 10 gRNAs used (GPX1) and, in some samples, as much enrichment as the targeted genomic loci. A previous study has shown that Cas9 binds to off-targets having as many as 10 mismatches in the PAM-distal region. This OTE was not reported in the original study (still a preprint), which states that sequences from parts other than the target locations arise ‘from ligation of nanopore adaptors to random breakage points, with no clear evidence of off-target cleavage by Cas9’ [1]. Furthermore, many reads aligning to the mitochondrial genome (sometimes full length) are inverted after the edit. It remains to be seen whether these are bona fide translocations after the Cas9 edit or Oxford Nanopore sequencing artifacts. This also raises the question of whether the mitochondrial genome is more prone to off-targets by virtue of being non-nuclear. Another locus on ChrX (13121412) has only 1 mismatch with the second BRAF gRNA (GACCAAGGATTTCGTGGTGA). Although the number of reads for this OTE is lower, it is very unlikely to be random, since it occurs in 8 out of 11 samples. With the increasing use of gene-editing tools (TALEN/ZFN/CRISPR-Cas9) on human subjects, this provides a fast method to query gRNAs for off-targets in cells obtained from the patient, which will have their own unique off-targets due to single-nucleotide polymorphisms or other variants.
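The abstract's mismatch counts between a gRNA and a candidate off-target site can be illustrated with a minimal sketch. It is not the author's pipeline: the function names are hypothetical, the candidate site is made up, and the 10-base PAM-distal window is an assumption chosen only to mirror the figure quoted above. The spacer is taken 5' to 3' with the PAM at the 3' end, so the PAM-distal region is the start of the string.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Hamming distance between a gRNA spacer and an equal-length genomic site."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def pam_distal_mismatches(guide: str, site: str, distal_len: int = 10) -> int:
    """Mismatches in the first `distal_len` bases (5' end, far from the PAM).

    The default window of 10 bases is an illustrative assumption.
    """
    return count_mismatches(guide[:distal_len], site[:distal_len])


# Second BRAF gRNA, as quoted in the abstract.
BRAF_GUIDE = "GACCAAGGATTTCGTGGTGA"

# Hypothetical candidate site: the guide with its first base changed (G -> A).
# This is NOT the real ChrX sequence, which the abstract does not give.
candidate = "A" + BRAF_GUIDE[1:]

print(count_mismatches(BRAF_GUIDE, candidate))
print(pam_distal_mismatches(BRAF_GUIDE, candidate))
```

In practice the candidate sites would come from read alignments or a genome-wide search rather than a hand-written string; the sketch only shows the comparison step.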

SACall: a neural network basecaller for Oxford Nanopore sequencing data based on self-attention mechanism

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2020.3039244 ◽

2020 ◽

pp. 1-1

Author(s):

Neng Huang ◽

Fan Nie ◽

Peng Ni ◽

Feng Luo ◽

Jianxin Wang

Keyword(s):

Neural Network ◽

Attention Mechanism ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore

Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.761791 ◽

2021 ◽

Vol 12 ◽

Author(s):

Davide Bolognini ◽

Alberto Magi

Keyword(s):

Variant Calling ◽

Research Report ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Factors Affecting ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Sequencing Studies ◽

Long Read

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
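Precision and recall of an SV callset, as mentioned above, are usually derived from counts of true positives, false positives and false negatives against a truth set. The following is a minimal sketch of that arithmetic only. It is not taken from the EViNCe pipeline, the function name is hypothetical, and the counts are made up for illustration.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from callset comparison counts.

    tp: calls matching a truth-set SV
    fp: calls with no match in the truth set
    fn: truth-set SVs with no matching call
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts for one hypothetical aligner/caller combination.
p, r, f = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

How a call is judged to "match" a truth-set SV (breakpoint tolerance, size similarity, genotype agreement) is the harder design question and is deliberately left out here.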

Detection of Clinically Relevant Molecular Alterations in Chronic Lymphocytic Leukemia (CLL) By Nanopore Sequencing

Blood ◽

10.1182/blood-2018-99-110948 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 1847-1847 ◽

Cited By ~ 1

Author(s):

Adam Burns ◽

David Robert Bruce ◽

Pauline Robbe ◽

Adele Timbs ◽

Basile Stamatopoulos ◽

...

Keyword(s):

Error Correction ◽

Low Cost ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Mutation Status ◽

Short Read ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Low Coverage ◽

Oxford Nanopore Technologies

Abstract

Introduction: Chronic Lymphocytic Leukaemia (CLL) is the most prevalent leukaemia in the Western world and is characterised by clinical heterogeneity. IgHV mutation status, mutations in the TP53 gene and deletions of the p-arm of chromosome 17 are currently used to predict an individual patient's response to therapy and to give an indication of their long-term prognosis. Current clinical guidelines recommend screening patients prior to initial, and any subsequent, treatment. Routine clinical laboratory practices for CLL involve three separate assays, each of which is time-consuming and requires significant investment in equipment. Nanopore sequencing offers a rapid, low-cost alternative, generating a full prognostic dataset on a single platform. In addition, Nanopore sequencing promises low failure rates on degraded material such as FFPE and excellent detection of structural variants owing to its long read lengths. Importantly, Nanopore technology does not require expensive equipment, is low-maintenance and is ideal for patient-near testing, making it an attractive DNA sequencing device for low-to-middle-income countries.

Methods: Eleven untreated CLL samples were selected for the analysis, harbouring both mutated (n=5) and unmutated (n=6) IgHV genes, seven TP53 mutations (five missense, one stop-gain and one frameshift) and two del(17p) events. Primers were designed to amplify all exons of TP53, along with the IgHV locus, and each primer included universal tails for individual sample barcoding. The resulting PCR amplicons were prepared for sequencing using a ligation sequencing kit (SQK-LSK108, Oxford Nanopore Technologies, Oxford, UK). All IgHV libraries were pooled and sequenced on one R9.4 flowcell, with the TP53 libraries pooled and sequenced on a second R9.4 flowcell. Whole genome libraries were prepared from 400 ng of genomic DNA for each sample using a rapid sequencing kit (SQK-RAD004, Oxford Nanopore Technologies, Oxford, UK), and each sample was sequenced on an individual flowcell on a MinION Mk1B instrument (Oxford Nanopore Technologies, Oxford, UK). We developed a bespoke bioinformatics pipeline to detect copy-number changes, TP53 mutations and IgHV mutation status from the Nanopore sequencing data. Results were compared to short-read sequencing data obtained earlier by targeted deep sequencing (MiSeq, Illumina Inc, San Diego, CA, USA) and whole genome sequencing (HiSeq 2500, Illumina Inc, San Diego, CA, USA).

Results: Following basecalling and adaptor trimming, the raw data were submitted to the IMGT database. In the absence of error correction, it was possible to identify the correct VH family for each sample; however, the germline homology was not sufficient to differentiate between IgHVmut and IgHVunmut CLL cases. Following bioinformatic error correction and consensus building, the percentage germline homology was the same as that obtained from short-read sequencing, and nanopore sequencing also called the same productive rearrangements in all cases. A total of 77 TP53 variants were identified, including 68 in non-coding regions and three synonymous SNVs. The remaining 6 were predicted to be functional variants (five missense and one stop-gain) and had all been identified in the earlier MiSeq targeted sequencing. However, the frameshift mutation was not called by the analysis pipeline, although it is present in the aligned reads. Using the low-coverage WGS data, we were able to identify del(17p) events, of 19 Mb and 20 Mb length, in both patients with high confidence.

Conclusions: Here we demonstrate that characterization of the IgHV locus in CLL cases is possible using the MinION platform, provided sufficient downstream analysis, including error correction, is applied. Furthermore, somatic SNVs in TP53 can be identified, although, as with second-generation sequencing, variant calling of small insertions and deletions is more problematic. Identification of del(17p) is possible from low-coverage WGS on the MinION and is inexpensive. Our data demonstrate that Nanopore sequencing can be a viable, patient-near, low-cost alternative to established screening methods, with the potential for diagnostic implementation in resource-poor regions of the world.

Disclosures: Schuh: Giles, Roche, Janssen, AbbVie: Honoraria.

Nanocall: An Open Source Basecaller for Oxford Nanopore Sequencing Data

10.1101/046086 ◽

2016 ◽

Cited By ~ 10

Author(s):

Matei David ◽

L.J. Dursi ◽

Delia Yao ◽

Paul C. Boutros ◽

Jared T. Simpson

Keyword(s):

Open Source ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Computing Platform ◽

Internet Connection ◽

Oxford Nanopore ◽

Efficient Processing ◽

Cloud Computing Platform ◽

New Applications ◽

Single Dna Molecule

Abstract

Motivation: The highly portable Oxford Nanopore MinION sequencer has enabled new applications of genome sequencing directly in the field. However, the MinION currently relies on a cloud computing platform, Metrichor (metrichor.com), for translating locally generated sequencing data into basecalls.

Results: To allow offline and private analysis of MinION data, we created Nanocall. Nanocall is the first freely available, open-source basecaller for Oxford Nanopore sequencing data and does not require an internet connection. On two E. coli and two human samples, with natural as well as PCR-amplified DNA, Nanocall reads have ~68% identity, directly comparable to Metrichor "1D" data. Further, Nanocall is efficient, processing ~500 Kbp of sequence per core hour, and fully parallelized. Using 8 cores, Nanocall could basecall a MinION sequencing run in real time. Metrichor provides the ability to integrate the "1D" sequencing of template and complement strands of a single DNA molecule, and create a "2D" read. Nanocall does not currently integrate this technology, and addition of this capability will be an important future development. In summary, Nanocall is the first open-source, freely available, offline basecaller for Oxford Nanopore sequencing data.

Availability: Nanocall is available at github.com/mateidavid/nanocall, released under the MIT license.

Contact: matei.david at oicr.on.ca
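The real-time claim above follows from simple throughput arithmetic, sketched below. The per-core rate is the ~500 Kbp per core hour quoted in the abstract. Perfectly linear scaling across cores is an assumption, and the sequencer output rate is a hypothetical parameter, since the abstract does not state one.

```python
KBP_PER_CORE_HOUR = 500  # approximate figure quoted in the abstract


def basecall_rate_kbp_per_hour(cores: int) -> int:
    """Aggregate basecalling rate, assuming perfectly linear scaling across cores."""
    return cores * KBP_PER_CORE_HOUR


def keeps_up(cores: int, sequencer_kbp_per_hour: float) -> bool:
    """True if basecalling is at least as fast as the sequencer produces data.

    `sequencer_kbp_per_hour` is a hypothetical input; it is not given in the abstract.
    """
    return basecall_rate_kbp_per_hour(cores) >= sequencer_kbp_per_hour


# With 8 cores the aggregate rate is 8 x 500 Kbp per hour.
print(basecall_rate_kbp_per_hour(8))

# Hypothetical run yielding 3,000 Kbp per hour.
print(keeps_up(8, sequencer_kbp_per_hour=3000))
```

Under these assumptions, 8 cores would keep pace with any run producing up to about 4 Mbp of sequence per hour.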

An attention-based neural network basecaller for Oxford Nanopore sequencing data

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm47256.2019.8983231 ◽

2019 ◽

Author(s):

Neng Huang ◽

Fan Nie ◽

Peng Ni ◽

Feng Luo ◽

Jianxin Wang

Keyword(s):

Neural Network ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore

A comparative analysis of computational tools for the prediction of epigenetic DNA methylation from long-read sequencing data

10.1101/2021.04.24.441281 ◽

2021 ◽

Author(s):

Shruta Sandesh Pai ◽

Aimee Rachel Mathew ◽

Roy Anindya

Keyword(s):

Dna Methylation ◽

Dna Modification ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Computational Tools ◽

Specific Location ◽

Oxford Nanopore ◽

Long Read ◽

Dna Methylations ◽

Higher Eukaryotes

Abstract: The recent development of Oxford Nanopore long-read sequencing has opened new avenues for identifying epigenetic DNA methylation. Among the different epigenetic DNA methylations, N6-methyladenosine is the most prevalent DNA modification in prokaryotes and 5-methylcytosine is common in higher eukaryotes. Here we investigated whether N6-methyladenosine (6mA) and 5-methylcytosine (5mC) modifications could be predicted from nanopore sequencing data. Using publicly available genome sequencing data of Saccharomyces cerevisiae, we compared open-access computational tools, including Tombo, mCaller, Nanopolish and DeepSignal, for predicting 6mA and 5mC. Our results suggest that Tombo and mCaller can predict DNA N6-methyladenosine modifications at a specific location, whereas Tombo dampened fraction, Nanopolish methylation likelihood and DeepSignal methylation probability have comparable efficiency for 5-methylcytosine prediction from Oxford Nanopore sequencing data.

A highly contiguous genome for the Golden-fronted Woodpecker (Melanerpes aurifrons) via a hybrid Oxford Nanopore and short read assembly

10.1101/2020.01.03.894444 ◽

2020 ◽

Author(s):

Graham Wiley ◽

Matthew J. Miller

Keyword(s):

Molecular Evolution ◽

Mitochondrial Genome ◽

Transposable Elements ◽

Comparative Studies ◽

Low Cost ◽

Bird Species ◽

Hybrid Zones ◽

Sequencing Data ◽

Short Read ◽

Oxford Nanopore

Abstract

Background: Woodpeckers are found in nearly every part of the world, absent only from Antarctica, Australasia, and Madagascar. Woodpeckers have been important for studies of biogeography, phylogeography, and macroecology. Woodpecker hybrid zones are often studied to understand the dynamics of introgression between bird species. Notably, woodpeckers are gaining attention for their enriched levels of transposable elements (TEs) relative to most other birds. This enrichment of TEs may have substantial effects on woodpecker molecular evolution. The Golden-fronted Woodpecker (Melanerpes aurifrons) is a member of the largest radiation of New World woodpeckers. However, comparative studies of woodpecker genomes are hindered by the fact that no high-contiguity genome exists for any woodpecker species.

Findings: Using hybrid assembly methods that combine long-read Oxford Nanopore and short-read Illumina sequencing data, we generated a highly contiguous genome assembly for the Golden-fronted Woodpecker. The final assembly is 1.31 Gb and comprises 441 contigs plus a full mitochondrial genome. Half of the assembly is represented by 28 contigs (contig L50), each of which is at least 16 Mb in size (contig N50). High recovery (92.6%) of bird-specific BUSCO genes suggests our assembly is both relatively complete and relatively accurate. Accuracy is also demonstrated by the recovery of a putatively error-free mitochondrial genome. Over a quarter (25.8%) of the genome consists of repetitive elements, with 287 Mb (21.9%) of those elements assignable to the CR1 superfamily of transposable elements, the highest proportion of CR1 repeats reported for any bird genome to date.

Conclusion: Our assembly provides a useful tool for comparative studies of molecular evolution and genomics in woodpeckers and allies, a group emerging as important for studies on the role that TEs may play in avian evolution. Additionally, the sequencing and bioinformatic resources used to generate this assembly were relatively low-cost and should provide a direction for the development of high-quality genomes for future studies of animal biodiversity.
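The contiguity statistics quoted in this abstract can be computed from a list of contig lengths. Under the usual definitions, N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly, and L50 is the number of contigs in that set. The sketch below uses made-up lengths, not the woodpecker assembly, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: the cumulative sum always reaches half the total")


# Toy assembly of five contigs totalling 30 units.
n50, l50 = n50_l50([2, 10, 4, 8, 6])
print(n50, l50)
```

For the toy input, the two longest contigs (10 and 8) already cover more than half of the total, so the function returns an N50 of 8 and an L50 of 2.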

Benchmarking Long-Read Assemblers for Genomic Analyses of Bacterial Pathogens Using Oxford Nanopore Sequencing

International Journal of Molecular Sciences ◽

10.3390/ijms21239161 ◽

2020 ◽

Vol 21 (23) ◽

pp. 9161

Author(s):

Zhao Chen ◽

David L. Erickson ◽

Jianghong Meng

Keyword(s):

Virulence Genes ◽

Bacterial Pathogens ◽

Error Rates ◽

Nanopore Sequencing ◽

Long Reads ◽

Oxford Nanopore ◽

Genomic Analyses ◽

Long Read ◽

Genome Analyses ◽

Assembly Algorithms

Oxford Nanopore sequencing can be used to achieve complete bacterial genomes. However, the error rates of Oxford Nanopore long reads are higher than those of Illumina short reads. Long-read assemblers using a variety of assembly algorithms have been developed to overcome this deficiency, but they have not been benchmarked for genomic analyses of bacterial pathogens using Oxford Nanopore long reads. In this study, the long-read assemblers Canu, Flye, Miniasm/Racon, Raven, Redbean, and Shasta were therefore benchmarked using Oxford Nanopore long reads of bacterial pathogens. Ten species were tested for mediocre- and low-quality simulated reads, and 10 species were tested for real reads. Raven was the most robust assembler, obtaining complete and accurate genomes. All Miniasm/Racon and Raven assemblies of mediocre-quality reads provided accurate antimicrobial resistance (AMR) profiles, while the Raven assembly of Klebsiella variicola with low-quality reads was the only assembly with an accurate AMR profile among all assemblers and species. All assemblers functioned well for predicting virulence genes using mediocre-quality and real reads, whereas only the Raven assemblies of low-quality reads had accurate numbers of virulence genes. Regarding multilocus sequence typing (MLST), Miniasm/Racon was the most effective assembler for mediocre-quality reads, while only the Raven assemblies of Escherichia coli O157:H7 and K. variicola with low-quality reads showed positive MLST results. Miniasm/Racon and Raven were the best performers for MLST using real reads. The Miniasm/Racon and Raven assemblies showed accurate phylogenetic inference. For the pan-genome analyses, Raven was the strongest assembler for simulated reads, whereas Miniasm/Racon and Raven performed the best for real reads. Overall, the most robust and accurate assembler was Raven, closely followed by Miniasm/Racon.

Nanocall: an open source basecaller for Oxford Nanopore sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btw569 ◽

2016 ◽

Vol 33 (1) ◽

pp. 49-55 ◽

Cited By ~ 53

Author(s):

Matei David ◽

L. J. Dursi ◽

Delia Yao ◽

Paul C. Boutros ◽

Jared T. Simpson

Keyword(s):

Open Source ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Oxford Nanopore

Detecting and phasing minor single-nucleotide variants from long-read sequencing data

Nature Communications ◽

10.1038/s41467-021-23289-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Zhixing Feng ◽

Jose C. Clemente ◽

Brandon Wong ◽

Eric E. Schadt

Keyword(s):

Genetic Heterogeneity ◽

Error Rates ◽

Metagenomic Data ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Long Read ◽

Technological Limitations

Abstract: Cellular genetic heterogeneity is common in many biological conditions including cancer, microbiome, and co-infection of multiple pathogens. Detecting and phasing minor variants play an instrumental role in deciphering cellular genetic heterogeneity, but they are still difficult tasks because of technological limitations. Recently, long-read sequencing technologies, including those by Pacific Biosciences and Oxford Nanopore, provide an opportunity to tackle these challenges. However, high error rates make it difficult to take full advantage of these technologies. To fill this gap, we introduce iGDA, an open-source tool that can accurately detect and phase minor single-nucleotide variants (SNVs), whose frequencies are as low as 0.2%, from raw long-read sequencing data. We also demonstrate that iGDA can accurately reconstruct haplotypes in closely related strains of the same species (divergence ≥0.011%) from long-read metagenomic data.
