Efficient data structures for mobile de novo genome assembly by third-generation sequencing

The immune North American grapevine species Vitis rotundifolia Michaux (subgen. Muscadinia Planch.) withstands such dangerous grape diseases as powdery and downy mildew and is therefore regarded as a potential donor of disease resistance genes. The cultivar ‘Dixie’ is the only representative of this species preserved ex situ in Russia: it is maintained by the N.I. Vavilov All-Russian Institute of Plant Genetic Resources (VIR) in the orchards of its branch, Krymsk Experiment Breeding Station. Third-generation sequencing on the MinION platform was performed to determine the primary structure of the cultivar’s genomic DNA, supplemented by Illumina sequencing results available in databases. The technique used for grapevine genome sequencing and whole-genome sequence assembly is described in detail, including the modifications made at various stages. The modified technique retained the main stages of the original protocol recommended by the MinION manufacturer: 1) DNA extraction; 2) preparation of libraries for sequencing; 3) MinION sequencing and bioinformatic data processing; 4) de novo whole-genome sequence assembly using only MinION data or hybrid assembly (MinION+Illumina data); and 5) functional annotation of the whole-genome assembly. Stage 4 included not only de novo assembly, but also the analysis of the available bioinformatic data, which minimized errors and increased the precision of the assembly of the studied genome. The DNA isolated from the leaves of cv. ‘Dixie’ was sequenced using two MinION flow cells (R9.4.1).


Third-Generation Sequencing: The Spearhead towards the Radical Transformation of Modern Genomics

Life ◽ 10.3390/life12010030 ◽ 2021 ◽ Vol 12 (1) ◽ pp. 30

Author(s): Konstantina Athanasopoulou ◽ Michaela A. Boti ◽ Panagiotis G. Adamopoulos ◽ Paraskevi C. Skourou ◽ Andreas Scorilas

Keyword(s): De Novo ◽ Direct Detection ◽ Transcriptional Profiling ◽ Third Generation ◽ De Novo Genome Assembly ◽ Rna Molecules ◽ Third Generation Sequencing ◽ Long Reads ◽ Long Read ◽ Generation Sequencing

Although next-generation sequencing (NGS) technology revolutionized sequencing, offering tremendous capacity with groundbreaking depth and accuracy, it still has serious limitations. In the early 2010s, the introduction of a novel set of sequencing methodologies by two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), gave birth to third-generation sequencing (TGS). These long-read technologies make genome sequencing easier to handle: they greatly reduce the average time of library construction workflows, and the long reads they generate simplify de novo genome assembly. Long reads produced by both TGS methodologies have already facilitated transcriptional profiling, since they enable the identification of full-length transcripts without the need for assembly or sophisticated bioinformatics tools. Long-read technologies have also provided new insights into epitranscriptomics by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies, discusses their limitations, and provides an in-depth comparison of their scientific background and available protocols, as well as their potential utility in research and clinical applications.


De novo Genome Assembly from Next-Generation Sequencing (NGS) Reads

Next-Generation Sequencing Data Analysis ◽ 10.1201/b19532-11 ◽ 2016 ◽ pp. 144-155

Keyword(s): Next Generation Sequencing ◽ Genome Assembly ◽ De Novo ◽ Next Generation ◽ De Novo Genome Assembly ◽ Next Generation Sequencing Ngs ◽ Generation Sequencing


De Novo Genome Assembly of Next-Generation Sequencing Data

Compendium of Plant Genomes - The Brassica rapa Genome ◽ 10.1007/978-3-662-47901-8_4 ◽ 2015 ◽ pp. 41-51

Author(s): Min Liu ◽ Dongyuan Liu ◽ Hongkun Zheng

Keyword(s): Next Generation Sequencing ◽ Genome Assembly ◽ De Novo ◽ Next Generation Sequencing Data ◽ Next Generation ◽ Sequencing Data ◽ De Novo Genome Assembly ◽ Generation Sequencing


de novo repeat detection based on the third generation sequencing reads

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽ 10.1109/bibm47256.2019.8982959 ◽ 2019 ◽ Cited By ~ 1

Author(s): Xingyu Liao ◽ Xiankai Zhang ◽ Fang-Xiang Wu ◽ Jianxin Wang

Keyword(s): De Novo ◽ Third Generation ◽ The Third ◽ Third Generation Sequencing ◽ Generation Sequencing ◽ Repeat Detection


A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies

PLoS ONE ◽ 10.1371/journal.pone.0017915 ◽ 2011 ◽ Vol 6 (3) ◽ pp. e17915 ◽ Cited By ~ 144

Author(s): Wenyu Zhang ◽ Jiajia Chen ◽ Yang Yang ◽ Yifei Tang ◽ Jing Shang ◽ ...

Keyword(s): Next Generation Sequencing ◽ Genome Assembly ◽ De Novo ◽ Software Tools ◽ Next Generation ◽ De Novo Genome Assembly ◽ Sequencing Technologies ◽ Generation Sequencing ◽ Assembly Software


Using Apache Spark on genome assembly for scalable overlap-graph reduction

Human Genomics ◽ 10.1186/s40246-019-0227-1 ◽ 2019 ◽ Vol 13 (S1) ◽ Cited By ~ 1

Author(s): Alexander J. Paul ◽ Dylan Lawrence ◽ Myoungkyu Song ◽ Seung-Hwan Lim ◽ Chongle Pan ◽ ...

Keyword(s): Genome Assembly ◽ De Novo ◽ Time Frame ◽ Apache Spark ◽ Reference Sequence ◽ Graph Reduction ◽ De Novo Genome Assembly ◽ String Graph ◽ Edge Graph ◽ Generation Sequencing

Abstract

Background: De novo genome assembly builds the genome of a specimen from the overlaps between genomic fragments, without relying on a reference sequence. Sequence fragments (called reads) are assembled into contigs and scaffolds according to their overlaps. The quality of a de novo assembly depends on the length and continuity of the assembly. To enable faster and more accurate assembly, sequencing techniques such as high-throughput next-generation sequencing and long-read third-generation sequencing have been proposed. However, resolving the very large overlap graphs that result from these techniques requires large amounts of computer memory and is difficult to parallelize.

Results: To address these limitations, we propose an algorithmic approach called Scalable Overlap-graph Reduction Algorithms (SORA). SORA is an algorithm package that performs string graph reduction on Apache Spark. Its implementations are designed to run de novo genome assembly on either a single machine or a distributed computing platform. SORA efficiently compacts edges along paths in enormous graphs by adapting the scalable features of the Apache Spark graph-processing libraries GraphX and GraphFrames.

Conclusions: We share the algorithms and the experimental results at our project website, https://github.com/BioHPC/SORA. We evaluated SORA with human genome samples. First, it processed a graph of nearly one billion edges on a distributed cloud cluster. Second, it processed small to mid-size graphs on a single workstation within a short time frame. Overall, SORA scaled linearly as the number of computing instances increased.


Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework

BMC Genomics ◽ 10.1186/1471-2164-16-s12-s9 ◽ 2015 ◽ Vol 16 (Suppl 12) ◽ pp. S9 ◽ Cited By ~ 4

Author(s): Chih-Hao Fang ◽ Yu-Jung Chang ◽ Wei-Chun Chung ◽ Ping-Heng Hsieh ◽ Chung-Yen Lin ◽ ...

Keyword(s): Next Generation Sequencing ◽ Genome Assembly ◽ De Novo ◽ Subset Selection ◽ Next Generation ◽ Mapreduce Framework ◽ De Novo Genome Assembly ◽ Generation Sequencing ◽ High Depth ◽ Selection Of
