Nanopore sequencing and assembly of a human genome with ultra-long reads

Miten Jain; Sergey Koren; Karen H Miga; Josh Quick; Arthur C Rand; Thomas A Sasani; John R Tyson; Andrew D Beggs; Alexander T Dilthey; Ian T Fiddes; Sunir Malla; Hannah Marriott; Tom Nieto; Justin O'Grady; Hugh E Olsen; Brent S Pedersen; Arang Rhie; Hollian Richardson; Aaron R Quinlan; Terrance P Snutch; Louise Tee; Benedict Paten; Adam M Phillippy; Jared T Simpson; Nicholas J Loman; Matthew Loose

doi:10.1038/nbt.4060

Faculty Opinions recommendation of Nanopore sequencing and assembly of a human genome with ultra-long reads.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.732576831.793542122 ◽ 2018

Author(s): James Coker

Keyword(s): Human Genome ◽ Nanopore Sequencing ◽ Long Reads

Fast and sensitive mapping of error-prone nanopore sequencing reads with GraphMap

10.1101/020719 ◽ 2015 ◽ Cited By ~ 1

Author(s): Ivan Sovic ◽ Mile Sikic ◽ Andreas Wilm ◽ Shannon Nicole Fenlon ◽ Swaine Chen ◽ ...

Keyword(s): Human Genome ◽ Variant Calling ◽ Error Rates ◽ Nanopore Sequencing ◽ Structural Variants ◽ Specific Identification ◽ Long Reads ◽ Long Read ◽ Specific Error ◽ Very High

Exploiting the power of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. We present the first nanopore read mapper (GraphMap) that uses a read-funneling paradigm to robustly handle variable error rates and fast graph traversal to align long reads with speed and very high precision (>95%). Evaluation on MinION sequencing datasets against short and long-read mappers indicates that GraphMap increases mapping sensitivity by at least 15-80%. GraphMap alignments are the first to demonstrate consensus calling with <1 error in 100,000 bases, variant calling on the human genome with 76% improvement in sensitivity over the next best mapper (BWA-MEM), precise detection of structural variants from 100bp to 4kbp in length and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.

Faculty Opinions recommendation of Nanopore sequencing and assembly of a human genome with ultra-long reads.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.732576831.793544828 ◽ 2018

Author(s): Kevin Yip

Keyword(s): Human Genome ◽ Nanopore Sequencing ◽ Long Reads

Nanopore sequencing and assembly of a human genome with ultra-long reads

10.1101/128835 ◽ 2017 ◽ Cited By ~ 51

Author(s): Miten Jain ◽ S Koren ◽ J Quick ◽ AC Rand ◽ TA Sasani ◽ ...

Keyword(s): Human Genome ◽ Cancer Progression ◽ De Novo ◽ Sequence Data ◽ Point Of Care ◽ Genetic Diseases ◽ Nanopore Sequencing ◽ Repeat Structure ◽ Long Reads ◽ Amazon Web Services

Nanopore sequencing is a promising technique for genome sequencing because of its portability, its ability to sequence long reads from single molecules, and its ability to simultaneously assay DNA methylation. However, until recently nanopore sequencing has been applied mainly to small genomes because of the limited output attainable. We present nanopore sequencing and assembly of the GM12878 Utah/Ceph human reference genome generated using the Oxford Nanopore MinION and R9.4 version chemistry. We generated 91.2 Gb of sequence data (∼30× theoretical coverage) from 39 flowcells. De novo assembly yielded a highly complete and contiguous assembly (NG50 ∼3 Mb). We observed considerable variability in homopolymeric tract resolution between different basecallers. The data permitted sensitive detection of both large structural variants and epigenetic modifications. Further, we developed a new approach exploiting the long-read capability of this system and found that adding an additional 5× coverage of ‘ultra-long’ reads (read N50 of 99.7 kb) more than doubled the assembly contiguity. Modelling the repeat structure of the human genome predicts that extraordinarily contiguous assemblies may be possible using nanopore reads alone. Portable de novo sequencing of human genomes may be important for rapid point-of-care diagnosis of rare genetic diseases and cancer, and for monitoring of cancer progression. The complete dataset, including raw signal, is available as an Amazon Web Services Open Dataset at: https://github.com/nanopore-wgs-consortium/NA12878.
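The coverage and contiguity figures quoted in this abstract rest on simple arithmetic, which the following sketch illustrates. It is not taken from the study: the haploid genome size of about 3.1 Gb is an assumption (the abstract does not state the value used), and the contig lengths are toy numbers. The hypothetical helper `n50` returns N50 when no target is given and NG50 when the target is the genome size.

```python
def theoretical_coverage(total_bases: float, genome_size: float) -> float:
    """Mean depth if the sequenced bases were spread evenly over the genome."""
    return total_bases / genome_size


def n50(lengths, target_total=None):
    """Length L such that sequences of length >= L cover half of the target.

    With target_total=None the target is the sum of the lengths (N50).
    Passing the genome size as target_total gives NG50 instead.
    """
    total = sum(lengths) if target_total is None else target_total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # the sequences cover less than half of the target


ASSUMED_GENOME_SIZE = 3.1e9  # assumption: approximate haploid human genome size

# 91.2 Gb of sequence data, as reported in the abstract
depth = theoretical_coverage(91.2e9, ASSUMED_GENOME_SIZE)
print(f"theoretical coverage: {depth:.1f}x")

toy_contigs = [6, 5, 4, 3, 2]  # toy lengths in arbitrary units, not real data
print("N50:", n50(toy_contigs))
print("NG50:", n50(toy_contigs, target_total=30))  # toy genome of size 30
```

NG50 is never larger than N50 when the assembly is smaller than the genome, which is why it is the more conservative contiguity measure when comparing assemblies of different total size.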

Faculty Opinions recommendation of Nanopore sequencing and assembly of a human genome with ultra-long reads.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.732576831.793554178 ◽ 2018

Author(s): Roderic Guigo

Keyword(s): Human Genome ◽ Nanopore Sequencing ◽ Long Reads

De novo Genome Assembly of the indica Rice Variety IR64 Using Linked-Read Sequencing and Nanopore Sequencing

G3 Genes|Genome|Genetics ◽ 10.1534/g3.119.400871 ◽ 2020 ◽ Vol 10 (5) ◽ pp. 1495-1501 ◽ Cited By ~ 1

Author(s): Tsuyoshi Tanaka ◽ Ryo Nishijima ◽ Shota Teramoto ◽ Yuka Kitomi ◽ Takeshi Hayashi ◽ ...

Keyword(s): Functional Genomics ◽ Genome Assembly ◽ De Novo ◽ Rice Variety ◽ Rice Genome ◽ High Yield ◽ Nanopore Sequencing ◽ Long Reads ◽ A Genome ◽ Modern Varieties

IR64 is a high-yield rice variety that has been widely cultivated around the world. IR64 has been replaced by modern varieties in most growing areas. Given that modern varieties are mostly progenies or relatives of IR64, genetic analysis of IR64 is valuable for rice functional genomics. However, chromosome-level genome sequences of IR64 have not previously been available. Here, we sequenced the IR64 genome using synthetic long reads obtained by linked-read sequencing and ultra-long reads obtained by nanopore sequencing. We integrated these data and generated a de novo assembly of the IR64 genome of 367 Mb, equivalent to 99% of the estimated size. Continuity of the IR64 genome assembly was improved compared with that of a publicly available IR64 genome assembly generated from short reads only. We annotated 41,458 protein-coding genes, including 657 IR64-specific genes that are missing in other high-quality rice genome assemblies: IRGSP-1.0 of japonica cultivar Nipponbare and R498 of indica cultivar Shuhui498. The IR64 genome assembly will serve as a genome resource for rice functional genomics as well as genomics-driven and/or molecular breeding.

Long reads nanopore sequencing to recover SARS-CoV-2 whole genome v1 (protocols.io.bfe4jjgw)

protocols.io ◽ 10.17504/protocols.io.bfe4jjgw ◽ 2020

Author(s): Paola Resende

Keyword(s): Whole Genome ◽ Nanopore Sequencing ◽ Long Reads

Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Frontiers in Genetics ◽ 10.3389/fgene.2021.761791 ◽ 2021 ◽ Vol 12

Author(s): Davide Bolognini ◽ Alberto Magi

Keyword(s): Variant Calling ◽ Research Report ◽ Nanopore Sequencing ◽ Sequencing Data ◽ Factors Affecting ◽ Sequencing Technologies ◽ Long Reads ◽ Oxford Nanopore ◽ Sequencing Studies ◽ Long Read

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
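Precision and recall of an SV callset against a truth set can be sketched as follows. This is a minimal illustration, not the EViNCe pipeline: the `SV` record, the 500 bp breakpoint tolerance, and the greedy one-to-one matching rule are all assumptions made for the example, and real benchmarks typically also compare SV length and genotype.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record used only for this sketch."""
    chrom: str
    pos: int
    svtype: str  # e.g. "DEL", "INS", "INV"


def is_match(call: SV, truth: SV, tol: int) -> bool:
    """Same chromosome and type, with breakpoints within tol bases."""
    return (call.chrom == truth.chrom
            and call.svtype == truth.svtype
            and abs(call.pos - truth.pos) <= tol)


def precision_recall(calls, truth, tol=500):
    """Greedy one-to-one matching of calls against truth records.

    tol=500 is an assumed tolerance, not a value from the report.
    """
    unmatched = list(truth)
    true_positives = 0
    for call in calls:
        for i, record in enumerate(unmatched):
            if is_match(call, record, tol):
                true_positives += 1
                del unmatched[i]  # each truth record may be matched once
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, not taken from the study
truth_set = [SV("chr1", 1000, "DEL"), SV("chr1", 5000, "INS"), SV("chr2", 300, "DEL")]
call_set = [SV("chr1", 1100, "DEL"), SV("chr1", 9000, "INS")]

precision, recall = precision_recall(call_set, truth_set)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision falls when a caller reports SVs absent from the truth set; recall falls when true SVs are missed, which is why both are needed to compare aligner and caller combinations.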

Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes

F1000Research ◽ 10.12688/f1000research.6037.2 ◽ 2015 ◽ Vol 4 ◽ pp. 17 ◽ Cited By ~ 55

Author(s): Ron Ammar ◽ Tara A. Paton ◽ Dax Torti ◽ Adam Shlien ◽ Gary D. Bader

Keyword(s): Medical Decision ◽ Nanopore Sequencing ◽ Clinical Environment ◽ Sequencing Technologies ◽ Long Reads ◽ Long Read ◽ Complete Genomics ◽ Nanopore Sequencer ◽ Actionable Findings ◽ Haplotype Information

Haplotypes are often critical for the interpretation of genetic laboratory observations into medically actionable findings. Current massively parallel DNA sequencing technologies produce short sequence reads that are often unable to resolve haplotype information. Phasing short read data typically requires supplemental statistical phasing based on known haplotype structure in the population or parental genotypic data. Here we demonstrate that the MinION nanopore sequencer is capable of producing very long reads to resolve both variants and haplotypes of HLA-A, HLA-B and CYP2D6 genes important in determining patient drug response in sample NA12878 of CEPH/UTAH pedigree 1463, without the need for statistical phasing. Long read data from a single 24-hour nanopore sequencing run was used to reconstruct haplotypes, which were confirmed by HapMap data and statistically phased Complete Genomics and Sequenom genotypes. Our results demonstrate that nanopore sequencing is an emerging standalone technology with potential utility in a clinical environment to aid in medical decision-making.

Nanopore sequencing detects structural variants in cancer

10.1101/028290 ◽ 2015 ◽ Cited By ~ 3

Author(s): Alexis L. Norris ◽ Rachael E. Workman ◽ Yunfan Fan ◽ James R. Eshleman ◽ Winston Timp

Keyword(s): Detection Efficiency ◽ Electrical Current ◽ Read Length ◽ Therapeutic Monitoring ◽ Base Substitution ◽ Nanopore Sequencing ◽ Structural Variants ◽ Large Deletions ◽ Long Reads ◽ Generation Sequencing

Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long-reads (up to 20kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring.
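A rough sense of why about 500 reads could suffice at a 1:100 dilution comes from a simple binomial model. The sketch below is an illustration under stated assumptions, not the authors' analysis: it treats "1:100" as a 1% chance that any given read derives from the variant amplicon, assumes reads are independent, and ignores mapping and detection errors.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    below_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


n_reads = 500            # reads per sample, as quoted in the abstract
variant_fraction = 0.01  # assumption: 1:100 dilution read as a 1% fraction

expected_supporting = n_reads * variant_fraction
print("expected variant-supporting reads:", expected_supporting)

# Chance of seeing at least k supporting reads under the assumed model
for k in (1, 3, 5):
    print(k, round(prob_at_least(k, n_reads, variant_fraction), 3))
```

Under these assumptions the expected number of supporting reads is small, so the required number of supporting reads per call strongly affects how often a diluted SV would be detected.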
