Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data

Vasanthan Jayakumar; Yasubumi Sakakibara

doi:10.1093/bib/bbx147

Improved Genome Assembly and Annotation of the Soybean Aphid (Aphis glycines Matsumura)

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400954 ◽

2020 ◽

Vol 10 (3) ◽

pp. 899-906 ◽

Cited By ~ 8

Author(s):

Thomas C. Mathers

Keyword(s):

Genome Assembly ◽

Sequence Data ◽

Parasitoid Wasp ◽

Single Copy ◽

Aphid Species ◽

Soybean Aphid ◽

Aphis Glycines ◽

Data Sets ◽

Conserved Genes ◽

Long Read

Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid (Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence data. However, version 1 of the soybean aphid genome assembly has low contiguity (contig N50 = 57 Kb, scaffold N50 = 174 Kb), poor representation of conserved genes and the presence of genomic scaffolds likely derived from parasitoid wasp contamination. Here, I use recently developed methods to reassemble the soybean aphid genome. The version 2 genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds (contig N50 = 2.00 Mb, scaffold N50 = 2.51 Mb) and contains 11% more conserved single-copy arthropod genes than version 1. To demonstrate the utility of this improved assembly, I identify a region of conserved synteny between aphids and Drosophila containing members of the Osiris gene family that was split over multiple scaffolds in the original assembly. The improved genome assembly and annotation of A. glycines demonstrates the benefit of applying new methods to old data sets and will provide a useful resource for future comparative genome analysis of aphids.
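The contiguity figures quoted in this abstract (contig and scaffold N50, and "half of the genome in only 40 scaffolds") follow the usual definitions of N50 and L50. The sketch below is illustrative only and is not the author's pipeline: the function names `n50` and `l50` and the toy lengths are assumptions introduced here.

```python
def n50(lengths):
    """Return the N50: the length of the sequence at which the running
    total, taken from longest to shortest, first reaches half of the
    total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def l50(lengths):
    """Return the L50: the smallest number of sequences, taken from
    longest to shortest, that together hold half of the assembly."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count


# Hypothetical scaffold lengths (arbitrary units), not data from the paper.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds), l50(toy_scaffolds))
```

Under these definitions, a statement such as "half of the genome in only 40 scaffolds" corresponds to an L50 of 40, and the scaffold N50 is the length of the last scaffold counted.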

NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy

GigaScience ◽

10.1093/gigascience/giaa105 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Willem de Koning ◽

Milad Miladi ◽

Saskia Hiltemann ◽

Astrid Heikema ◽

John P Hays ◽

...

Keyword(s):

Genome Assembly ◽

Bioinformatics Analysis ◽

De Novo ◽

Sequence Data ◽

Ease Of Use ◽

Easy Access ◽

Complex Data ◽

Sequencing Data ◽

Long Read ◽

Sequencing Platforms

Abstract Background Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. Results The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy” is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. Conclusions A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.

Nanopore Long Read DNA Sequencing of Protozoan Parasites: Hybrid Genome Assembly of Trypanosoma cruzi

Methods in Molecular Biology - Parasite Genomics ◽

10.1007/978-1-0716-1681-9_1 ◽

2021 ◽

pp. 3-13

Author(s):

Florencia Díaz-Viraqué ◽

Gonzalo Greif ◽

Luisa Berná ◽

Carlos Robello

Keyword(s):

Trypanosoma Cruzi ◽

Dna Sequencing ◽

Genome Assembly ◽

Protozoan Parasites ◽

Hybrid Genome ◽

Long Read

Improved genome assembly and annotation of the soybean aphid (Aphis glycines Matsumura)

10.1101/781617 ◽

2019 ◽

Author(s):

Thomas C. Mathers

Keyword(s):

Genome Assembly ◽

Sequence Data ◽

Parasitoid Wasp ◽

Single Copy ◽

Aphid Species ◽

Soybean Aphid ◽

Aphis Glycines ◽

Data Sets ◽

Conserved Genes ◽

Long Read

Abstract Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid (Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence data. However, version 1 of the soybean aphid genome assembly has low contiguity (contig N50 = 57 Kb, scaffold N50 = 174 Kb), poor representation of conserved genes and the presence of genomic scaffolds likely derived from parasitoid wasp contamination. Here, I use recently developed methods to reassemble the soybean aphid genome. The version 2 genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds (contig N50 = 2.00 Mb, scaffold N50 = 2.51 Mb) and contains 11% more conserved single-copy arthropod genes than version 1. To demonstrate the utility of this improved assembly, I identify a region of conserved synteny between aphids and Drosophila containing members of the Osiris gene family that was split over multiple scaffolds in the original assembly. The improved genome assembly and annotation of A. glycines demonstrates the benefit of applying new methods to old data sets and will provide a useful resource for future comparative genome analysis of aphids.

Construction of a new chromosome-scale, long-read reference genome assembly of the Syrian hamster, Mesocricetus auratus

10.1101/2021.07.05.451071 ◽

2021 ◽

Author(s):

R. Alan Harris ◽

Muthuswamy Raveendran ◽

Dustin T Lyfoung ◽

Fritz J Sedlazeck ◽

Medhat Mahmoud ◽

...

Keyword(s):

Genome Assembly ◽

Syrian Hamster ◽

Reference Genome ◽

Sequence Data ◽

Mesocricetus Auratus ◽

Protein Coding ◽

Protein Coding Genes ◽

Sequencing Technologies ◽

Long Read ◽

Short Read Sequence

Background The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was published in 2013 using whole-genome shotgun sequencing with short-read sequence data. Current more advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and higher continuity. Findings Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of a previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity with a scaffold N50 that is 6.7 times greater than MesAur1.0. Furthermore, 21,616 protein coding genes and 10,459 noncoding genes were annotated in BCM_Maur_2.0 compared to 20,495 protein coding genes and 4,168 noncoding genes in MesAur1.0. This new assembly also improves the unresolved regions as measured by nucleotide ambiguities, where approximately 17.11% of bases in MesAur1.0 were unresolved compared to BCM_Maur_2.0 in which the number of unresolved bases is reduced to 3.00%. Conclusions Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.
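The "unresolved bases" percentages above are a simple ratio of ambiguous positions to total assembly length. The sketch below shows one way such a figure could be computed; it is not the authors' method. It assumes that only the character `N` marks an unresolved base (other IUPAC ambiguity codes are ignored), and the function name `unresolved_fraction` and the toy sequences are hypothetical.

```python
def unresolved_fraction(sequences):
    """Return the fraction of bases recorded as 'N' across an iterable of
    sequence strings. Assumption: only 'N' (either case) counts as
    unresolved."""
    total = 0
    unresolved = 0
    for seq in sequences:
        total += len(seq)
        unresolved += seq.upper().count("N")
    if total == 0:
        raise ValueError("no bases supplied")
    return unresolved / total


# Toy sequences, not data from either hamster assembly.
toy_assembly = ["ACGTNNNN", "acgtnnAC"]
print(f"{unresolved_fraction(toy_assembly):.2%} of bases unresolved")
```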

Long-Read Genome Assembly of Saccharomyces uvarum Strain CBS 7001

Microbiology Resource Announcements ◽

10.1128/mra.00972-21 ◽

2022 ◽

Author(s):

Jingxuan Chen ◽

David J. Garfinkel ◽

Casey M. Bergman

Keyword(s):

Genome Assembly ◽

Reference Genome ◽

Sequence Data ◽

Sensu Stricto ◽

Whole Genome ◽

Saccharomyces Uvarum ◽

Content Type ◽

Whole Genome Shotgun Sequence ◽

Long Read ◽

Genome Shotgun Sequence

Here, we report a long-read genome assembly for Saccharomyces uvarum strain CBS 7001 based on PacBio whole-genome shotgun sequence data. Our assembly provides an improved reference genome for an important yeast in the Saccharomyces sensu stricto clade.

Genome Sequence Data of Peronophythora litchii, an Oomycete Pathogen Causing Litchi Downy Blight

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-11-20-0303-a ◽

2021 ◽

Author(s):

Hengyuan Guo ◽

Jiandong Bao ◽

Lianyu Lin ◽

Zhixin Wang ◽

Mingyue Shi ◽

...

Keyword(s):

Genome Assembly ◽

Sequence Data ◽

Protein Coding ◽

Rxlr Effectors ◽

Oxford Nanopore ◽

Long Read ◽

Infection Mechanisms ◽

Oomycete Pathogen ◽

High Quality Genome ◽

Oxford Nanopore Technologies

Peronophythora litchii is an oomycete pathogen that exclusively infects litchi, with infection stages affecting a broad range of tissues. In this study, we obtained a near chromosome-level genome assembly of P. litchii strain ZL2018 from China using Oxford Nanopore Technologies (ONT) long-read sequencing and Illumina short-read sequencing. The genome assembly was 64.15 Mb in size and consisted of 81 contigs with an N50 of 1.43 Mb and a maximum length of 4.74 Mb. Excluding 34.67% of repeat sequences, a total of 14,857 protein-coding genes were identified, among which 14,447 genes were annotated. We also predicted 306 candidate RXLR effectors in the assembly. The high-quality genome assembly and annotation resources reported in this study will provide new insight into the infection mechanisms of P. litchii.

A reference genome sequence resource for the sugar beet root rot pathogen Aphanomyces cochlioides

10.1101/2021.08.11.456025 ◽

2021 ◽

Author(s):

Jacob Botkin ◽

Ashok K Chanda ◽

Frank N Martin ◽

Cory D Hirsch

Keyword(s):

Sugar Beet ◽

Genome Assembly ◽

Root Rot ◽

De Novo ◽

Sequence Data ◽

Aphanomyces Cochlioides ◽

Beet Root ◽

Long Read ◽

Illumina Sequence ◽

Sugar Beet Root

Aphanomyces cochlioides, the causal agent of damping-off and root rot of sugar beet (Beta vulgaris L.), is a soil-dwelling oomycete responsible for yield losses in all major sugar beet growing regions. Currently, genomic resources for A. cochlioides are limited. Here we report a de novo genome assembly using a combination of long-read MinION (Oxford Nanopore Technologies) and short-read Illumina sequence data for A. cochlioides isolate 103-1, from Breckenridge, MN. The assembled genome was 76.3 Mb, with a contig N50 of 2.6 Mb. The reference assembly was annotated and was composed of 32.1% repetitive elements and 20,274 gene models. This high-quality genome assembly of A. cochlioides will be a valuable resource for understanding genetic variation, virulence factors, and comparative genomics of this important sugar beet pathogen.

Using MinION nanopore sequencing to generate a de novo eukaryotic draft genome: preliminary physiological and genomic description of the extremophilic red alga Galdieria sulphuraria strain SAG 107.79

10.1101/076208 ◽

2016 ◽

Cited By ~ 5

Author(s):

Amanda M. Davis ◽

Manuela Iovinella ◽

Sally James ◽

Thomas Robshaw ◽

Jennifer R. Dodson ◽

...

Keyword(s):

Dna Sequences ◽

Genome Assembly ◽

De Novo ◽

Sequence Data ◽

Draft Genome ◽

Carbon Sources ◽

Red Alga ◽

Full Genome Sequencing ◽

Galdieria Sulphuraria ◽

Long Read

Abstract We report here the de novo assembly of a eukaryotic genome using only MinION nanopore DNA sequence data by examining a novel Galdieria sulphuraria genome: strain SAG 107.79. This extremophilic red alga was targeted for full genome sequencing as we found that it could grow on a wide variety of carbon sources and could take up several precious and rare-earth metals, which makes it an interesting biological target for disparate industrial biotechnological uses. Phylogenetic analysis clearly places this as a species of G. sulphuraria. Here we additionally show that the genome assembly generated via nanopore long-read data was of high quality with regard to the low total number of contiguous DNA sequences and the long length of assemblies. Collectively, the MinION platform looks to rival other competing approaches for de novo genome acquisition with available informatics tools for assembly. The genome assembly is publicly released as NCBI BioProject PRJNA330791. Further work is needed to reduce small insertion-deletion errors, relative to short-read assemblies.

A new standard for crustacean genomes: the highly contiguous, annotated genome assembly of the clam shrimp Eulimnadia texana reveals HOX gene order and identifies the sex chromosome

10.1101/222869 ◽

2017 ◽

Author(s):

James G. Baldwin-Brown ◽

Stephen C. Weeks ◽

Anthony D. Long

Keyword(s):

Genome Assembly ◽

Hox Genes ◽

Sex Chromosome ◽

Sequence Data ◽

Adult Males ◽

Hox Gene ◽

Clam Shrimp ◽

Long Reads ◽

Long Read ◽

Sex Determination System

Abstract Vernal pool clam shrimp (Eulimnadia texana) are a promising model system due to their ease of lab culture, short generation time, modest-sized genome, a somewhat rare stable androdioecious sex determination system, and a requirement to reproduce via desiccated diapaused eggs. We generated a highly contiguous genome assembly using 46X of PacBio long-read data and 216X of Illumina short reads, and annotated it using Illumina RNAseq obtained from adult males or hermaphrodites. 85% of the 120 Mb genome is contained in the largest 8 contigs, the smallest of which is 4.6 Mb. The assembly contains 98% of transcripts predicted via RNAseq. This assembly is qualitatively different from scaffolded Illumina assemblies: it is produced from long reads that contain sequence data along their entire length, and is thus gap free. The contiguity of the assembly allows us to order the HOX genes within the genome, identifying two loci that contain HOX gene orthologs, and which approximately maintain the order observed in other arthropods. We identified a partial duplication of the Antennapedia gene adjacent to the few genes homologous to the Bithorax locus. Because the sex chromosome of an androdioecious species is of special interest, we used existing allozyme and microsatellite markers to identify the E. texana sex chromosome, and find that it comprises nearly half of the genome of this species. Linkage patterns indicate that recombination is extremely rare and perhaps absent in hermaphrodites, and as a result the location of the sex-determining locus will be difficult to refine using recombination mapping.
