The bioinformatics tools for the genome assembly and analysis based on third-generation sequencing

Background: Graph-based representation of genome assemblies has recently been used in different applications, from gene finding to haplotype separation. While most of these applications are based on the alignment of molecular sequences to assembly graphs, existing software tools for finding such alignments have important limitations. Results: We present a novel tool, SPAligner, for aligning long diverged molecular sequences to assembly graphs and demonstrate that SPAligner is an efficient solution for mapping third-generation sequencing data and can also facilitate the identification of known genes in complex metagenomic datasets. Conclusions: Our work will help accelerate the development of graph-based approaches to the problem of aligning sequences to genome assemblies. SPAligner is implemented as part of the SPAdes tools library and is available at https://github.com/ablab/spades/archive/spaligner-paper.zip.

Genome assembly of Vitis rotundifolia Michx. using third-generation sequencing (Oxford Nanopore Technologies)

PROCEEDINGS ON APPLIED BOTANY GENETICS AND BREEDING ◽

10.30901/2227-8834-2021-2-63-71 ◽

2021 ◽

Vol 182 (2) ◽

pp. 63-71

Author(s):

M. M. Agakhanov ◽

E. A. Grigoreva ◽

E. K. Potokina ◽

P. S. Ulianich ◽

Y. V. Ukhatova

Keyword(s):

Genome Sequence ◽

Genome Assembly ◽

De Novo ◽

Whole Genome Sequence ◽

Whole Genome ◽

Third Generation ◽

Vitis Rotundifolia ◽

Third Generation Sequencing ◽

Genome Sequence Assembly ◽

Generation Sequencing

The immune North American grapevine species Vitis rotundifolia Michaux (subgen. Muscadinia Planch.) is regarded as a potential donor of disease resistance genes, withstanding such dangerous diseases of grapes as powdery and downy mildews. The cultivar ‘Dixie’ is the only representative of this species preserved ex situ in Russia: it is maintained by the N.I. Vavilov All-Russian Institute of Plant Genetic Resources (VIR) in the orchards of its branch, Krymsk Experiment Breeding Station. Third-generation sequencing on the MinION platform was performed to obtain information on the primary structure of the cultivar’s genomic DNA, employing also the results of Illumina sequencing available in databases. A detailed description of the technique with modifications at various stages is presented, as it was used for grapevine genome sequencing and whole-genome sequence assembly. The modified technique included the main stages of the original protocol recommended by the MinION producer: 1) DNA extraction; 2) preparation of libraries for sequencing; 3) MinION sequencing and bioinformatic data processing; 4) de novo whole-genome sequence assembly using only MinION data or hybrid assembly (MinION+Illumina data); and 5) functional annotation of the whole-genome assembly. Stage 4 included not only de novo sequencing, but also the analysis of the available bioinformatic data, thus minimizing errors and increasing precision during the assembly of the studied genome. The DNA isolated from the leaves of cv. ‘Dixie’ was sequenced using two MinION flow cells (R9.4.1).

Efficient data structures for mobile de novo genome assembly by third-generation sequencing

Procedia Computer Science ◽

10.1016/j.procs.2017.06.115 ◽

2017 ◽

Vol 110 ◽

pp. 440-447 ◽

Cited By ~ 2

Author(s):

Franco Milicchio ◽

Mattia Prosperi

Keyword(s):

Data Structures ◽

Genome Assembly ◽

De Novo ◽

Third Generation ◽

De Novo Genome Assembly ◽

Third Generation Sequencing ◽

Efficient Data ◽

Efficient Data Structures ◽

Generation Sequencing

New insights on Pseudoalteromonas haloplanktis TAC125 genome organization and benchmarks of genome assembly applications using next and third generation sequencing technologies

Scientific Reports ◽

10.1038/s41598-019-52832-z ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

Weihong Qi ◽

Andrea Colarusso ◽

Miriam Olombrada ◽

Ermenegilda Parrilli ◽

Andrea Patrignani ◽

...

Keyword(s):

Genome Assembly ◽

Model Organisms ◽

Nanopore Sequencing ◽

Third Generation ◽

Small Plasmid ◽

Pseudoalteromonas Haloplanktis ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Generation Sequencing

Pseudoalteromonas haloplanktis TAC125 is among the most commonly studied bacteria adapted to cold environments. Aside from its ecological relevance, P. haloplanktis has a potential use for biotechnological applications. Due to its importance, we decided to take advantage of next generation sequencing (Illumina) and third generation sequencing (PacBio and Oxford Nanopore) technologies to resequence its genome. The availability of a reference genome, obtained using whole genome shotgun sequencing, allowed us to study and compare the results obtained by the different technologies and draw useful conclusions for future de novo genome assembly projects. We found that assembly polishing using Illumina reads is needed to achieve a consensus accuracy over 99.9% when using Oxford Nanopore sequencing, but not in PacBio sequencing. However, the dependency of consensus accuracy on coverage is lower in Oxford Nanopore than in PacBio, suggesting that a cost-effective solution might be the use of low coverage Oxford Nanopore sequencing together with Illumina reads. Despite the differences in consensus accuracy, all sequencing technologies revealed the presence of a large plasmid, pMEGA, which was undiscovered until now. Among the most interesting features of pMEGA is the presence of a putative error-prone polymerase regulated through the SOS response. Aside from the characterization of the newly discovered plasmid, we confirmed the sequence of the small plasmid pMtBL and uncovered the presence of a potential partitioning system. Crucially, this study shows that the combination of next and third generation sequencing technologies gives us an unprecedented opportunity to characterize our bacterial model organisms at a very detailed level.
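
The 99.9% consensus-accuracy threshold quoted in this abstract corresponds to a Phred-scaled quality of Q30. The sketch below illustrates that conversion only; the function names and the simple errors-per-aligned-base definition of accuracy are assumptions for illustration and are not taken from the study.

```python
import math


def consensus_accuracy(num_errors: int, aligned_length: int) -> float:
    """Fraction of aligned consensus bases that agree with the reference.

    Assumption: accuracy is defined as 1 - errors / aligned length.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 1.0 - num_errors / aligned_length


def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_accuracy(quality: float) -> float:
    """Convert a Phred-scaled quality value back to an accuracy."""
    return 1.0 - 10.0 ** (-quality / 10.0)


# Hypothetical numbers: 4 errors over 4,000 aligned bases is 99.9% accuracy.
example_accuracy = consensus_accuracy(num_errors=4, aligned_length=4000)
example_quality = accuracy_to_phred(example_accuracy)
```

On this scale, each additional 10 quality points means a tenfold lower error rate, which is why small differences near 99.9% matter for downstream analyses.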

Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms

10.1101/2020.03.16.993428 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nadège Guiglielmoni ◽

Antoine Houtain ◽

Alessandro Derzelle ◽

Karine van Doninck ◽

Jean-François Flot

Keyword(s):

Genome Assembly ◽

Model Organism ◽

Model Organisms ◽

Third Generation ◽

Post Processing ◽

Third Generation Sequencing ◽

Long Reads ◽

Long Read ◽

Generation Sequencing ◽

Chromosome Level

Third-generation sequencing, also called long-read sequencing, is revolutionizing genome assembly: as PacBio and Nanopore technologies become more accessible, both technically and in cost, long-read assemblers flourish and are starting to deliver chromosome-level assemblies. However, these long reads are also error-prone, making the generation of a haploid reference out of a diploid genome a difficult enterprise. Although failure to properly collapse haplotypes results in fragmented and/or structurally incorrect assemblies and wreaks havoc on orthology inference pipelines, this serious issue is rarely acknowledged and dealt with in genomic projects, and an independent, comparative benchmark of the capacity of assemblers and post-processing tools to properly collapse or purge haplotypes is still lacking. To fill this gap, we tested different assembly strategies on the genome of the rotifer Adineta vaga, a non-model organism for which high coverages of both PacBio and Nanopore reads were available. The assemblers we tested (Canu, Flye, NextDenovo, Ra, Raven, Shasta and wtdbg2) exhibited strikingly different behaviors when dealing with highly heterozygous regions, resulting in variable amounts of uncollapsed haplotypes. Filtering out shorter reads generally improved haploid assemblies, and we also benchmarked three post-processing tools aimed at detecting and purging uncollapsed haplotypes in long-read assemblies: HaploMerger2, purge_haplotigs and purge_dups. Testing these strategies separately and in combination revealed several approaches able to generate haploid assemblies with genome sizes, coverage distributions, and completeness close to expectations.
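
The read-length filtering step mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming unwrapped four-line FASTQ records; the function names, the example reads, and the length threshold are hypothetical and do not reproduce the authors' pipeline or parameters.

```python
from typing import Iterable, Iterator, Tuple

# (read name, sequence, quality string)
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from unwrapped four-line FASTQ text (an assumption)."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    # Step through complete groups of four lines: header, sequence, '+', quality.
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' from the header


def filter_by_length(records: Iterable[FastqRecord],
                     min_length: int) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence has at least min_length bases."""
    for name, seq, qual in records:
        if len(seq) >= min_length:
            yield name, seq, qual


# Toy input; real long reads are thousands of bases long.
EXAMPLE_FASTQ = [
    "@read_long\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n",
    "@read_short\n", "ACGT\n", "+\n", "IIII\n",
]

kept_names = [name for name, _seq, _qual
              in filter_by_length(parse_fastq(EXAMPLE_FASTQ), min_length=8)]
```

In practice, the threshold trades coverage for read length, so a sensible value depends on the read-length distribution of the dataset.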

Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly

Molecular Medicine Reports ◽

10.3892/mmr.2021.11890 ◽

2021 ◽

Vol 23 (4) ◽

Author(s):

Marios Gavrielatos ◽

Konstantinos Kyriakidis ◽

Demetrios Spandidos ◽

Ioannis Michalopoulos

Keyword(s):

Genome Assembly ◽

De Novo ◽

Third Generation ◽

De Novo Genome Assembly ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Generation Sequencing

BUILDING CATALOGUE OF LIFE: ULTRAHIGH THROUGHPUT DNA BARCODING USING THIRD GENERATION SEQUENCING

MOLECULAR PHYLOGENETICS ◽

10.30826/molphy2018-05 ◽

2018 ◽

Author(s):

P.D.N. HEBERT ◽

T.W.A. BRAUKMANN ◽

S.W.J. PROSSER ◽

S. RATNASINGHAM ◽

...

Keyword(s):

Dna Barcoding ◽

Third Generation ◽

Third Generation Sequencing ◽

Generation Sequencing

IsoDetect: Detection of splice isoforms from third generation long reads based on short feature sequences

Current Bioinformatics ◽

10.2174/1574893615666200316101205 ◽

2020 ◽

Vol 15 ◽

Author(s):

Hongdong Li ◽

Wenjing Zhang ◽

Yuwen Luo ◽

Jianxin Wang

Keyword(s):

Sequence Similarity ◽

Detection Methods ◽

Sequence Information ◽

Third Generation ◽

Sequencing Data ◽

Splice Isoforms ◽

Third Generation Sequencing ◽

Long Reads ◽

Feature Sequence ◽

Generation Sequencing

Aims: Accurately detect isoforms from third generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms, such as humans, is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they are based exclusively on sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as the “short feature sequence”, which is used to distinguish different splice isoforms. Second, we align these feature sequences to long reads and divide the long reads into groups that contain the same set of feature sequences, thereby avoiding pairwise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets, from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
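
The grouping step described in the Method section can be sketched roughly as follows. This is not the IsoDetect implementation: exact substring matching stands in for the alignment of feature sequences to reads, and the function name, read identifiers, and junction sequences are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def group_reads_by_features(
    reads: Dict[str, str],
    features: Dict[str, str],
) -> Dict[FrozenSet[str], List[str]]:
    """Group read IDs by the set of feature sequences each read contains.

    Simplification: a feature counts as present only if it occurs as an
    exact substring, whereas error-prone long reads need real alignment.
    """
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for read_id, read_seq in reads.items():
        present = frozenset(
            name for name, feature_seq in features.items()
            if feature_seq in read_seq
        )
        groups[present].append(read_id)
    return dict(groups)


# Toy data: two junction features and four reads.
EXAMPLE_FEATURES = {"junction_1": "ACCGG", "junction_2": "TTAAC"}
EXAMPLE_READS = {
    "read_1": "AAACCGGTTT",
    "read_2": "TTACCGGAAA",
    "read_3": "GGGGTTAACC",
    "read_4": "AAAAAAA",
}

example_groups = group_reads_by_features(EXAMPLE_READS, EXAMPLE_FEATURES)
```

Reads sharing a key only need to be compared within their own group, which is what avoids all-against-all comparison. Reads under the empty key contain no feature sequence and would go to the separate similarity-based clustering that the abstract describes.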

Comparative and comprehensive analysis on bacterial communities of two full-scale wastewater treatment plants by second and third-generation sequencing

Bioresource Technology Reports ◽

10.1016/j.biteb.2020.100450 ◽

2020 ◽

Vol 11 ◽

pp. 100450

Author(s):

Bin Ji ◽

Shulian Wang ◽

Dabin Guo ◽

Heliang Pang

Keyword(s):

Wastewater Treatment ◽

Bacterial Communities ◽

Wastewater Treatment Plants ◽

Comprehensive Analysis ◽

Full Scale ◽

Third Generation ◽

Third Generation Sequencing ◽

Generation Sequencing
