scholarly journals Efficient data structures for mobile de novo genome assembly by third-generation sequencing

2017 ◽  
Vol 110 ◽  
pp. 440-447 ◽  
Author(s):  
Franco Milicchio ◽  
Mattia Prosperi
2021 ◽  
Vol 182 (2) ◽  
pp. 63-71
Author(s):  
M. M. Agakhanov ◽  
E. A. Grigoreva ◽  
E. K. Potokina ◽  
P. S. Ulianich ◽  
Y. V. Ukhatova

The immune North American grapevine species Vitis rotundifolia Michaux (subgen. Muscadinia Planch.) is regarded as a potential donor of disease resistance genes, withstanding such dangerous diseases of grapes as powdery and downy mildews. The cultivar ‘Dixie’ is the only representative of this species preserved ex situ in Russia: it is maintained by the N.I. Vavilov All-Russian Institute of Plant Genetic Resources (VIR) in the orchards of its branch, Krymsk Experiment Breeding Station. Third-generation sequencing on the MinION platform was performed to obtain information on the primary structure of the cultivar’s genomic DNA, employing also the results of Illumina sequencing available in databases. A detailed description of the technique with modifications at various stages is presented, as it was used for grapevine genome sequencing and whole-genome sequence assembly. The modified technique included the main stages of the original protocol recommended by the MinION producer: 1) DNA extraction; 2) preparation of libraries for sequencing; 3) MinION sequencing and bioinformatic data processing; 4) de novo whole-genome sequence assembly using only MinION data or hybrid assembly (MinION+Illumina data); and 5) functional annotation of the whole-genome assembly. Stage 4 included not only de novo sequencing, but also the analysis of the available bioinformatic data, thus minimizing errors and increasing precision during the assembly of the studied genome. The DNA isolated from the leaves of cv. ‘Dixie’ was sequenced using two MinION flow cells (R9.4.1).


Life ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 30
Author(s):  
Konstantina Athanasopoulou ◽  
Michaela A. Boti ◽  
Panagiotis G. Adamopoulos ◽  
Paraskevi C. Skourou ◽  
Andreas Scorilas

Although next-generation sequencing (NGS) technology revolutionized sequencing, offering a tremendous sequencing capacity with groundbreaking depth and accuracy, it continues to demonstrate serious limitations. In the early 2010s, the introduction of a novel set of sequencing methodologies, presented by two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Sequencing (ONT), gave birth to third-generation sequencing (TGS). The innovative long-read technologies turn genome sequencing into an ease-of-handle procedure by greatly reducing the average time of library construction workflows and simplifying the process of de novo genome assembly due to the generation of long reads. Long sequencing reads produced by both TGS methodologies have already facilitated the decipherment of transcriptional profiling since they enable the identification of full-length transcripts without the need for assembly or the use of sophisticated bioinformatics tools. Long-read technologies have also provided new insights into the field of epitranscriptomics, by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies, discusses their limitations and provides an in-depth comparison regarding their scientific background and available protocols as well as their potential utility in research and clinical applications.


2019 ◽  
Vol 13 (S1) ◽  
Author(s):  
Alexander J. Paul ◽  
Dylan Lawrence ◽  
Myoungkyu Song ◽  
Seung-Hwan Lim ◽  
Chongle Pan ◽  
...  

Abstract Background De novo genome assembly is a technique that builds the genome of a specimen using overlaps of genomic fragments without additional work with reference sequence. Sequence fragments (called reads) are assembled as contigs and scaffolds by the overlaps. The quality of the de novo assembly depends on the length and continuity of the assembly. To enable faster and more accurate assembly of species, existing sequencing techniques have been proposed, for example, high-throughput next-generation sequencing and long-reads-producing third-generation sequencing. However, these techniques require a large amounts of computer memory when very huge-size overlap graphs are resolved. Also, it is challenging for parallel computation. Results To address the limitations, we propose an innovative algorithmic approach, called Scalable Overlap-graph Reduction Algorithms (SORA). SORA is an algorithm package that performs string graph reduction algorithms by Apache Spark. The SORA’s implementations are designed to execute de novo genome assembly on either a single machine or a distributed computing platform. SORA efficiently compacts the number of edges on enormous graphing paths by adapting scalable features of graph processing libraries provided by Apache Spark, GraphX and GraphFrames. Conclusions We shared the algorithms and the experimental results at our project website, https://github.com/BioHPC/SORA. We evaluated SORA with the human genome samples. First, it processed a nearly one billion edge graph on a distributed cloud cluster. Second, it processed mid-to-small size graphs on a single workstation within a short time frame. Overall, SORA achieved the linear-scaling simulations for the increased computing instances.


BMC Genomics ◽  
2015 ◽  
Vol 16 (Suppl 12) ◽  
pp. S9 ◽  
Author(s):  
Chih-Hao Fang ◽  
Yu-Jung Chang ◽  
Wei-Chun Chung ◽  
Ping-Heng Hsieh ◽  
Chung-Yen Lin ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document