Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data

2017 ◽  
Vol 20 (3) ◽  
pp. 866-876 ◽  
Author(s):  
Vasanthan Jayakumar ◽  
Yasubumi Sakakibara
2020 ◽  
Vol 10 (3) ◽  
pp. 899-906 ◽  
Author(s):  
Thomas C. Mathers

Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid (Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence data. However, version 1 of the soybean aphid genome assembly has low contiguity (contig N50 = 57 Kb, scaffold N50 = 174 Kb), poor representation of conserved genes and the presence of genomic scaffolds likely derived from parasitoid wasp contamination. Here, I use recently developed methods to reassemble the soybean aphid genome. The version 2 genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds (contig N50 = 2.00 Mb, scaffold N50 = 2.51 Mb) and contains 11% more conserved single-copy arthropod genes than version 1. To demonstrate the utility of this improved assembly, I identify a region of conserved synteny between aphids and Drosophila containing members of the Osiris gene family that was split over multiple scaffolds in the original assembly. The improved genome assembly and annotation of A. glycines demonstrates the benefit of applying new methods to old data sets and will provide a useful resource for future comparative genome analysis of aphids.


GigaScience ◽  
2020 ◽  
Vol 9 (10) ◽  
Author(s):  
Willem de Koning ◽  
Milad Miladi ◽  
Saskia Hiltemann ◽  
Astrid Heikema ◽  
John P Hays ◽  
...  

Background: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. Results: The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy” is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. Conclusions: A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.


2019 ◽  
Author(s):  
Thomas C. Mathers

Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid (Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence data. However, version 1 of the soybean aphid genome assembly has low contiguity (contig N50 = 57 Kb, scaffold N50 = 174 Kb), poor representation of conserved genes and the presence of genomic scaffolds likely derived from parasitoid wasp contamination. Here, I use recently developed methods to reassemble the soybean aphid genome. The version 2 genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds (contig N50 = 2.00 Mb, scaffold N50 = 2.51 Mb) and contains 11% more conserved single-copy arthropod genes than version 1. To demonstrate the utility of this improved assembly, I identify a region of conserved synteny between aphids and Drosophila containing members of the Osiris gene family that was split over multiple scaffolds in the original assembly. The improved genome assembly and annotation of A. glycines demonstrates the benefit of applying new methods to old data sets and will provide a useful resource for future comparative genome analysis of aphids.


2021 ◽  
Author(s):  
R. Alan Harris ◽  
Muthuswamy Raveendran ◽  
Dustin T Lyfoung ◽  
Fritz J Sedlazeck ◽  
Medhat Mahmoud ◽  
...  

Background: The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was published in 2013 using whole-genome shotgun sequencing with short-read sequence data. Current, more advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and higher continuity. Findings: Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of a previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes were annotated in BCM_Maur_2.0, compared to 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. This new assembly also improves the unresolved regions as measured by nucleotide ambiguities: approximately 17.11% of bases in MesAur1.0 were unresolved, whereas in BCM_Maur_2.0 the proportion of unresolved bases is reduced to 3.00%. Conclusions: Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.


Author(s):  
Jingxuan Chen ◽  
David J. Garfinkel ◽  
Casey M. Bergman

Here, we report a long-read genome assembly for Saccharomyces uvarum strain CBS 7001 based on PacBio whole-genome shotgun sequence data. Our assembly provides an improved reference genome for an important yeast in the Saccharomyces sensu stricto clade.


Author(s):  
Hengyuan Guo ◽  
Jiandong Bao ◽  
Lianyu Lin ◽  
Zhixin Wang ◽  
Mingyue Shi ◽  
...  

Peronophythora litchii is an oomycete pathogen that exclusively infects litchi, with infection stages affecting a broad range of tissues. In this study, we obtained a near chromosome-level genome assembly of P. litchii strain ZL2018 from China using Oxford Nanopore Technologies (ONT) long-read sequencing and Illumina short-read sequencing. The genome assembly was 64.15 Mb in size and consisted of 81 contigs with an N50 of 1.43 Mb and a maximum length of 4.74 Mb. Excluding 34.67% of repeat sequences, a total of 14,857 protein-coding genes were identified, among which 14,447 genes were annotated. We also predicted 306 candidate RXLR effectors in the assembly. The high-quality genome assembly and annotation resources reported in this study will provide new insight into the infection mechanisms of P. litchii.


2021 ◽  
Author(s):  
Jacob Botkin ◽  
Ashok K Chanda ◽  
Frank N Martin ◽  
Cory D Hirsch

Aphanomyces cochlioides, the causal agent of damping-off and root rot of sugar beet (Beta vulgaris L.), is a soil-dwelling oomycete responsible for yield losses in all major sugar beet growing regions. Currently, genomic resources for A. cochlioides are limited. Here we report a de novo genome assembly using a combination of long-read MinION (Oxford Nanopore Technologies) and short-read Illumina sequence data for A. cochlioides isolate 103-1, from Breckenridge, MN. The assembled genome was 76.3 Mb, with a contig N50 of 2.6 Mb. The reference assembly was annotated and was composed of 32.1% repetitive elements and 20,274 gene models. This high-quality genome assembly of A. cochlioides will be a valuable resource for understanding genetic variation, virulence factors, and comparative genomics of this important sugar beet pathogen.


2016 ◽  
Author(s):  
Amanda M. Davis ◽  
Manuela Iovinella ◽  
Sally James ◽  
Thomas Robshaw ◽  
Jennifer R. Dodson ◽  
...  

We report here the de novo assembly of a eukaryotic genome using only MinION nanopore DNA sequence data by examining a novel Galdieria sulphuraria genome: strain SAG 107.79. This extremophilic red alga was targeted for full genome sequencing as we found that it could grow on a wide variety of carbon sources and could take up several precious and rare-earth metals, which places it as an interesting biological target for disparate industrial biotechnological uses. Phylogenetic analysis clearly places this as a species of G. sulphuraria. Here we additionally show that the genome assembly generated via nanopore long-read data was of a high quality with regard to low total number of contiguous DNA sequences and long length of assemblies. Collectively, the MinION platform looks to rival other competing approaches for de novo genome acquisition with available informatics tools for assembly. The genome assembly is publicly released as NCBI BioProject PRJNA330791. Further work is needed to reduce small insertion-deletion errors, relative to short-read assemblies.


2017 ◽  
Author(s):  
James G. Baldwin-Brown ◽  
Stephen C. Weeks ◽  
Anthony D. Long

Vernal pool clam shrimp (Eulimnadia texana) are a promising model system due to their ease of lab culture, short generation time, modest-sized genome, a somewhat rare stable androdioecious sex determination system, and a requirement to reproduce via desiccated diapaused eggs. We generated a highly contiguous genome assembly using 46X of PacBio long-read data and 216X of Illumina short reads, and annotated it using Illumina RNAseq obtained from adult males or hermaphrodites. 85% of the 120 Mb genome is contained in the largest 8 contigs, the smallest of which is 4.6 Mb. The assembly contains 98% of transcripts predicted via RNAseq. This assembly is qualitatively different from scaffolded Illumina assemblies: it is produced from long reads that contain sequence data along their entire length, and is thus gap free. The contiguity of the assembly allows us to order the HOX genes within the genome, identifying two loci that contain HOX gene orthologs, and which approximately maintain the order observed in other arthropods. We identified a partial duplication of the Antennapedia gene adjacent to the few genes homologous to the Bithorax locus. Because the sex chromosome of an androdioecious species is of special interest, we used existing allozyme and microsatellite markers to identify the E. texana sex chromosome, and find that it comprises nearly half of the genome of this species. Linkage patterns indicate that recombination is extremely rare and perhaps absent in hermaphrodites, and as a result the location of the sex determining locus will be difficult to refine using recombination mapping.

