Improved genome assembly and annotation of the soybean aphid (Aphis glycines Matsumura)


10.1101/781617 ◽

2019 ◽

Author(s):

Thomas C. Mathers

Keyword(s):

Genome Assembly ◽

Sequence Data ◽

Parasitoid Wasp ◽

Single Copy ◽

Aphid Species ◽

Soybean Aphid ◽

Aphis Glycines ◽

Data Sets ◽

Conserved Genes ◽

Long Read

Abstract: Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid (Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence data. However, version 1 of the soybean aphid genome assembly has low contiguity (contig N50 = 57 Kb, scaffold N50 = 174 Kb), poor representation of conserved genes and the presence of genomic scaffolds likely derived from parasitoid wasp contamination. Here, I use recently developed methods to reassemble the soybean aphid genome. The version 2 genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds (contig N50 = 2.00 Mb, scaffold N50 = 2.51 Mb) and contains 11% more conserved single-copy arthropod genes than version 1. To demonstrate the utility of this improved assembly, I identify a region of conserved synteny between aphids and Drosophila containing members of the Osiris gene family that was split over multiple scaffolds in the original assembly. The improved genome assembly and annotation of A. glycines demonstrates the benefit of applying new methods to old data sets and will provide a useful resource for future comparative genome analysis of aphids.
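The contiguity figures quoted in this abstract (contig and scaffold N50, and "half of the genome in only 40 scaffolds", often called L50) follow a standard definition. The Python below is a minimal sketch of that definition, not the code used in the paper; the function name `n50_l50` and the toy lengths are made up for illustration.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half the assembly size.
    L50 is how many sequences were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy assembly (hypothetical lengths, total 300): the two longest
# sequences already cover half of it.
toy = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy))  # (70, 2)
```

In these terms, the version 2 assembly described above would have a scaffold L50 of 40 and a scaffold N50 of 2.51 Mb.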


Improved Genome Assembly and Annotation of the Soybean Aphid (Aphis glycines Matsumura)

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400954 ◽

2020 ◽

Vol 10 (3) ◽

pp. 899-906 ◽

Cited By ~ 8

Author(s):

Thomas C. Mathers

Keyword(s):

Genome Assembly ◽

Sequence Data ◽

Parasitoid Wasp ◽

Single Copy ◽

Aphid Species ◽

Soybean Aphid ◽

Aphis Glycines ◽

Data Sets ◽

Conserved Genes ◽

Long Read

Aphids are an economically important insect group due to their role as plant disease vectors. Despite this economic impact, genomic resources have only been generated for a small number of aphid species. The soybean aphid (Aphis glycines Matsumura) was the third aphid species to have its genome sequenced and the first to use long-read sequence data. However, version 1 of the soybean aphid genome assembly has low contiguity (contig N50 = 57 Kb, scaffold N50 = 174 Kb), poor representation of conserved genes and the presence of genomic scaffolds likely derived from parasitoid wasp contamination. Here, I use recently developed methods to reassemble the soybean aphid genome. The version 2 genome assembly is highly contiguous, containing half of the genome in only 40 scaffolds (contig N50 = 2.00 Mb, scaffold N50 = 2.51 Mb) and contains 11% more conserved single-copy arthropod genes than version 1. To demonstrate the utility of this improved assembly, I identify a region of conserved synteny between aphids and Drosophila containing members of the Osiris gene family that was split over multiple scaffolds in the original assembly. The improved genome assembly and annotation of A. glycines demonstrates the benefit of applying new methods to old data sets and will provide a useful resource for future comparative genome analysis of aphids.


Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data

Briefings in Bioinformatics ◽

10.1093/bib/bbx147 ◽

2017 ◽

Vol 20 (3) ◽

pp. 866-876 ◽

Cited By ~ 30

Author(s):

Vasanthan Jayakumar ◽

Yasubumi Sakakibara

Keyword(s):

Genome Assembly ◽

Comprehensive Evaluation ◽

Sequence Data ◽

Third Generation ◽

Hybrid Genome ◽

Long Read


A chromosome-anchored genome assembly for Lake Trout (Salvelinus namaycush)

10.22541/au.161792605.50398568/v1 ◽

2021 ◽

Author(s):

Seth Smith ◽

Eric Normandeau ◽

Haig Djambazian ◽

Pubudu Nawarathna ◽

Pierre Berube ◽

...

Keyword(s):

Genome Assembly ◽

Lake Trout ◽

Single Copy ◽

Salvelinus Namaycush ◽

Genomic Research ◽

Total Size ◽

Salmonid Species ◽

Long Read ◽

Potential Applications ◽

Conservation Concern

Here we present an annotated, chromosome-anchored genome assembly for Lake Trout (Salvelinus namaycush), a highly diverse salmonid species of notable conservation concern and an excellent model for research on adaptation and speciation. We leveraged Pacific Biosciences long-read sequencing, paired-end Illumina sequencing, proximity ligation (Hi-C), and a previously published linkage map to produce a highly contiguous assembly composed of 7,378 contigs (contig N50 = 1.8 Mb) assigned to 4,120 scaffolds (scaffold N50 = 44.975 Mb). In total, 84.7% of the genome was assigned to 42 chromosome-sized scaffolds and 93.2% of Benchmarking Universal Single-Copy Orthologs were recovered, putting this assembly on par with the best currently available salmonid genomes. Estimates of genome size based on k-mer frequency analysis were highly similar to the total size of the finished genome, suggesting that the entirety of the genome was recovered. A mitome assembly was also produced. Self-vs-self synteny analysis allowed us to identify homeologs resulting from the salmonid-specific autotetraploid event (Ss4R), and alignment with three other salmonid species allowed us to identify homologous chromosomes in other species. We also generated multiple resources useful for future genomic research on Lake Trout, including a repeat library and a sex-averaged recombination map. A novel RNA sequencing dataset was also used to produce a publicly available set of gene annotations using the National Center for Biotechnology Information Eukaryotic Genome Annotation Pipeline. Potential applications of these resources to population genetics and the conservation of native populations are discussed.


Highly accurate long-read HiFi sequencing data for five complex genomes

Scientific Data ◽

10.1038/s41597-020-00743-4 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Ting Hon ◽

Kristin Mars ◽

Greg Young ◽

Yu-Chih Tsai ◽

Joseph W. Karalius ◽

...

Keyword(s):

Sequence Data ◽

Genome Structure ◽

Data Sets ◽

Sequencing Data ◽

Complex Samples ◽

Bioinformatic Tools ◽

Long Reads ◽

Sequencing Method ◽

Sample Data ◽

Long Read

Abstract: The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.


NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy

GigaScience ◽

10.1093/gigascience/giaa105 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Willem de Koning ◽

Milad Miladi ◽

Saskia Hiltemann ◽

Astrid Heikema ◽

John P Hays ◽

...

Keyword(s):

Genome Assembly ◽

Bioinformatics Analysis ◽

De Novo ◽

Sequence Data ◽

Ease Of Use ◽

Easy Access ◽

Complex Data ◽

Sequencing Data ◽

Long Read ◽

Sequencing Platforms

Background: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing "nanopore" platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. Results: The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed "NanoGalaxy" is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. Conclusions: A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.


Acquisition and Transmissibility of U.S. Soybean dwarf virus Isolates by the Soybean Aphid, Aphis glycines

Plant Disease ◽

10.1094/pdis-10-10-0726 ◽

2011 ◽

Vol 95 (8) ◽

pp. 945-950 ◽

Cited By ~ 11

Author(s):

V. D. Damsteegt ◽

A. L. Stone ◽

M. Kuhlmann ◽

F. E. Gildow ◽

L. L. Domier ◽

...

Keyword(s):

The United States ◽

Aphid Species ◽

Soybean Aphid ◽

Aphis Glycines ◽

Dwarf Virus ◽

Yield Losses ◽

Vector Specificity ◽

Soybean Dwarf Virus ◽

Virus Isolates ◽

Aphid Populations

Soybean dwarf virus (SbDV) exists as several distinct strains based on symptomatology, vector specificity, and host range. Originally characterized Japanese isolates of SbDV were specifically transmitted by Aulacorthum solani. More recently, additional Japanese isolates and endemic U.S. isolates have been shown to be transmitted by several different aphid species. The soybean aphid, Aphis glycines, the only aphid that colonizes soybean, has been shown to be a very inefficient vector of some SbDV isolates from Japan and the United States. Transmission experiments have shown that the soybean aphid can transmit certain isolates of SbDV from soybean to soybean and clover species and from clover to clover and soybean with long acquisition and inoculation access periods. Although transmission of SbDV by the soybean aphid is very inefficient, the large soybean aphid populations that develop on soybean may have epidemiological potential to produce serious SbDV-induced yield losses.


The genome of the endangered Macadamia jansenii displays little diversity but represents an important genetic resource for plant breeding

10.1101/2021.09.08.459545 ◽

2021 ◽

Cited By ~ 1

Author(s):

Priyanka Sharma ◽

Valentine Murigneux ◽

Jasmine Haimovitz ◽

Catherine J. Nock ◽

Wei Tian ◽

...

Keyword(s):

Genome Assembly ◽

Morphological Characteristics ◽

Single Copy ◽

Protein Coding ◽

Core Eudicots ◽

Wide Range ◽

Genes Encoding ◽

Long Read ◽

In The Wild

Summary: Macadamia, a recently domesticated expanding nut crop in the tropical and subtropical regions of the world, is one of the most economically important genera in the diverse and widely adapted Proteaceae family. All four species of Macadamia are rare in the wild with the most recently discovered, M. jansenii, being endangered. The M. jansenii genome has been used as a model for testing sequencing methods using a wide range of long-read sequencing techniques. Here we report a chromosome-level genome assembly, generated using a combination of Pacific Biosciences sequencing and Hi-C, comprising 14 pseudo-molecules, with an N50 of 58 Mb and a total genome assembly size of 758 Mb, of which 56% is repetitive. Completeness assessment revealed that the assembly covered 96.9% of the conserved single-copy genes. Annotation predicted 31,591 protein-coding genes and allowed the characterization of genes encoding biosynthesis of cyanogenic glycosides, fatty acid metabolism and anti-microbial proteins. Re-sequencing of seven other genotypes confirmed low diversity and low heterozygosity within this endangered species. Important morphological characteristics of this species such as small tree size and high kernel recovery suggest that M. jansenii is an important source of these commercial traits for breeding. As a member of a small group of families that are sister to the core eudicots, this high-quality genome also provides a key resource for evolutionary and comparative genomics studies.


An improved Plasmodium cynomolgi genome assembly reveals an unexpected methyltransferase gene expansion

Wellcome Open Research ◽

10.12688/wellcomeopenres.11864.1 ◽

2017 ◽

Vol 2 ◽

pp. 42 ◽

Cited By ~ 22

Author(s):

Erica M Pasini ◽

Ulrike Böhme ◽

Gavin G. Rutledge ◽

Annemarie Voorberg-Van der Wel ◽

Mandy Sanders ◽

...

Keyword(s):

Genome Sequence ◽

Genome Assembly ◽

Malaria Parasite ◽

Reference Genome ◽

Sequence Data ◽

Single Copy ◽

Chromosome 9 ◽

Reference Genome Sequence ◽

Plasmodium Cynomolgi ◽

Average Gene

Background: Plasmodium cynomolgi, a non-human primate malaria parasite species, has been an important model parasite since its discovery in 1907. Similarities in the biology of P. cynomolgi to the closely related, but less tractable, human malaria parasite P. vivax make it the model parasite of choice for liver biology and vaccine studies pertinent to P. vivax malaria. Molecular and genome-scale studies of P. cynomolgi have relied on the current reference genome sequence, which remains highly fragmented with 1,649 unassigned scaffolds and little representation of the subtelomeres. Methods: Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated a new reference genome sequence, PcyM, sourced from an Indian rhesus monkey. We compare the newly assembled genome sequence with those of several other Plasmodium species, including a re-annotated P. coatneyi assembly. Results: The new PcyM genome assembly is of significantly higher quality than the existing reference, comprising only 56 pieces, no gaps and an improved average gene length. Detailed manual curation has ensured a comprehensive annotation of the genome with 6,632 genes, nearly 1,000 more than previously attributed to P. cynomolgi. The new assembly also has an improved representation of the subtelomeric regions, which account for nearly 40% of the sequence. Within the subtelomeres, we identified more than 1,300 Plasmodium interspersed repeat (pir) genes, as well as a striking expansion of 36 methyltransferase pseudogenes that originated from a single copy on chromosome 9. Conclusions: The manually curated PcyM reference genome sequence is an important new resource for the malaria research community. The high quality and contiguity of the data have enabled the discovery of a novel expansion of methyltransferase in the subtelomeres, and illustrate the new comparative genomics capabilities that are being unlocked by complete reference genomes.


The Importance of an Invasive Aphid Species in Vectoring a Persistently Transmitted Potato Virus: Aphis glycines Is a Vector of Potato leafroll virus

Plant Disease ◽

10.1094/pdis-92-11-1515 ◽

2008 ◽

Vol 92 (11) ◽

pp. 1515-1523 ◽

Cited By ~ 16

Author(s):

J. A. Davis ◽

E. B. Radcliffe

Keyword(s):

Seed Potato ◽

Potato Virus ◽

Green Peach Aphid ◽

Potato Leafroll Virus ◽

Aphid Species ◽

Soybean Aphid ◽

Aphis Glycines ◽

Electrical Penetration Graph ◽

Phloem Sap ◽

Leafroll Virus

Experiments were undertaken to determine soybean aphid (i) landing rates in potato fields, (ii) population dynamics on potato, (iii) feeding behavior compared with green peach aphid on potato using the electrical penetration graph technique (EPG), (iv) acquisition, retention, and transmission of Potato leafroll virus (PLRV), and (v) if soybean aphid–infested crop borders could increase PLRV spread in seed potato. Soybean aphid (Aphis glycines) landed on potato but failed to establish colonies. EPG showed no significant differences between the aphid species in preprobe, xylem phase, sieve element salivation, and phloem sap ingestion durations on potato. Soybean aphid acquired PLRV 78% of the time, and 75 and 70% of individual aphids retained infectivity after 72 and 144 h, respectively. Soybean aphid transmitted PLRV to susceptible potato with 6 to 9% efficiency. Prior to the invasion of this exotic pest, soybean borders were commonly used in Minnesota and North Dakota to protect seed potato against spread of Potato virus Y. In 2002 and 2004, PLRV incidence was not different in potatoes with soybean borders whether treated with insecticide or not. In 2005, with extreme soybean aphid pressure, potatoes with untreated (no insecticide) borders had significantly greater PLRV spread. This is the first report of soybean aphid transmitting PLRV.


Construction of a new chromosome-scale, long-read reference genome assembly of the Syrian hamster, Mesocricetus auratus

10.1101/2021.07.05.451071 ◽

2021 ◽

Author(s):

R. Alan Harris ◽

Muthuswamy Raveendran ◽

Dustin T Lyfoung ◽

Fritz J Sedlazeck ◽

Medhat Mahmoud ◽

...

Keyword(s):

Genome Assembly ◽

Syrian Hamster ◽

Reference Genome ◽

Sequence Data ◽

Mesocricetus Auratus ◽

Protein Coding ◽

Protein Coding Genes ◽

Sequencing Technologies ◽

Long Read ◽

Short Read Sequence

Background: The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was published in 2013 using whole-genome shotgun sequencing with short-read sequence data. Current, more advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and higher continuity. Findings: Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of a previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes were annotated in BCM_Maur_2.0 compared to 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. This new assembly also reduces the unresolved regions as measured by nucleotide ambiguities: approximately 17.11% of bases in MesAur1.0 were unresolved, compared with 3.00% in BCM_Maur_2.0. Conclusions: Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.
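The "unresolved bases" percentages in this abstract can be read as the share of ambiguous positions in an assembly. The Python below is a minimal sketch of that calculation. It assumes ambiguity means the character N only (other IUPAC ambiguity codes are ignored), and the function name `percent_unresolved` and the toy sequences are hypothetical rather than taken from the paper.

```python
def percent_unresolved(sequences):
    """Return the percentage of bases that are 'N' across all sequences.

    Assumption: only N/n is treated as unresolved; other IUPAC
    ambiguity codes (R, Y, etc.) are counted as resolved.
    """
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        raise ValueError("no bases to count")
    unresolved = sum(seq.upper().count("N") for seq in sequences)
    return 100.0 * unresolved / total


# Two toy scaffolds (hypothetical): 6 of 20 bases are N.
toy_scaffolds = ["ACGTNNNNAC", "GGNNCCTTAA"]
print(percent_unresolved(toy_scaffolds))  # 30.0
```

For a real assembly the sequences would come from a FASTA parser, which is left out here to keep the sketch free of external dependencies.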
