Hybrid Genome Assembly of a Neotropical Mutualistic Ant

Juliane Hartke; Tilman Schell; Evelien Jongepier; Hanno Schmidt; Philipp P Sprenger; Juraj Paule; Erich Bornberg-Bauer; Thomas Schmitt; Florian Menzel; Markus Pfenninger; Barbara Feldmeyer

doi:10.1093/gbe/evz159

Hybrid Genome Assembly of a Neotropical Mutualistic Ant

Genome Biology and Evolution ◽

10.1093/gbe/evz159 ◽

2019 ◽

Vol 11 (8) ◽

pp. 2306-2311

Author(s):

Juliane Hartke ◽

Tilman Schell ◽

Evelien Jongepier ◽

Hanno Schmidt ◽

Philipp P Sprenger ◽

...

Keyword(s):

Communication System ◽

De Novo ◽

Draft Genome ◽

Cuticular Hydrocarbon ◽

Mechanistic Explanation ◽

High Quality ◽

Great Opportunity ◽

Long Reads ◽

Oxford Nanopore ◽

Hybrid Genome

Abstract The success of social insects is largely intertwined with their highly advanced chemical communication system that facilitates recognition and discrimination of species and nest-mates, recruitment, and division of labor. Hydrocarbons, which cover the cuticle of insects, not only serve as waterproofing agents but also constitute a major component of this communication system. Two cryptic Crematogaster species, which share their nest with Camponotus ants, show striking diversity in their cuticular hydrocarbon (CHC) profile. This mutualistic system therefore offers a great opportunity to study the genetic basis of CHC divergence between sister species. As a basis for further genome-wide studies high-quality genomes are needed. Here, we present the annotated draft genome for Crematogaster levior A. By combining the three most commonly used sequencing techniques—Illumina, PacBio, and Oxford Nanopore—we constructed a high-quality de novo ant genome. We show that even low coverage of long reads can add significantly to overall genome contiguity. Annotation of desaturase and elongase genes, which play a role in CHC biosynthesis revealed one of the largest repertoires in ants and a higher number of desaturases in general than in other Hymenoptera. This may provide a mechanistic explanation for the high diversity observed in C. levior CHC profiles.

Download Full-text

First de novo draft genome sequence of Oryza coarctata, the only halophytic species in the genus Oryza

F1000Research ◽

10.12688/f1000research.12414.1 ◽

2017 ◽

Vol 6 ◽

pp. 1750 ◽

Cited By ~ 8

Author(s):

Tapan Kumar Mondal ◽

Hukam Chand Rawal ◽

Kishor Gaikwad ◽

Tilak Raj Sharma ◽

Nagendra Kumar Singh

Keyword(s):

West Bengal ◽

De Novo ◽

Draft Genome ◽

Nanopore Sequencing ◽

Sequencing Technology ◽

Genome Sequences ◽

Oxford Nanopore ◽

Hybrid Genome ◽

Genus Oryza ◽

First Time

Oryza coarctata plants, collected from Sundarban delta of West Bengal, India, have been used in the present study to generate draft genome sequences, employing the hybrid genome assembly with Illumina reads and third generation Oxford Nanopore sequencing technology. We report for the first time that more than 85.71 % of the genome coverage and the data have been deposited in NCBI SRA, with BioProject ID PRJNA396417.

Download Full-text

Fast and accurate de novo genome assembly from long uncorrected reads

10.1101/068122 ◽

2016 ◽

Cited By ~ 8

Author(s):

Robert Vaser ◽

Ivan Sović ◽

Niranjan Nagarajan ◽

Mile Šikić

Keyword(s):

Error Correction ◽

De Novo ◽

High Quality ◽

De Novo Genome Assembly ◽

Consensus Sequences ◽

Long Reads ◽

Oxford Nanopore ◽

Order Of Magnitude ◽

Correction Step ◽

Consensus Module

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource intensive error correction and consensus generation steps to obtain high quality assemblies. We show that the error correction step can be omitted and high quality consensus sequences can be generated efficiently with a SIMD accelerated, partial order alignment based stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore datasets we show that Racon coupled with Miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.Racon is available open source under the MIT license at https://github.com/isovic/racon.git.

Download Full-text

LongStitch: High-quality genome assembly correction and scaffolding using long reads

10.1101/2021.06.17.448848 ◽

2021 ◽

Author(s):

Lauren Coombe ◽

Janet X Li ◽

Theodora Lo ◽

Johnathan Wong ◽

Vladimir Nikolic ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Draft Genome ◽

Model Organisms ◽

High Quality ◽

De Novo Genome Assembly ◽

Long Reads ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies

Background Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which includes initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.

Download Full-text

First de novo draft genome sequence of Oryza coarctata, the only halophytic species in the genus Oryza

F1000Research ◽

10.12688/f1000research.12414.2 ◽

2017 ◽

Vol 6 ◽

pp. 1750 ◽

Cited By ~ 5

Author(s):

Tapan Kumar Mondal ◽

Hukam Chand Rawal ◽

Kishor Gaikwad ◽

Tilak Raj Sharma ◽

Nagendra Kumar Singh

Keyword(s):

West Bengal ◽

De Novo ◽

Draft Genome ◽

Nanopore Sequencing ◽

Sequencing Technology ◽

Genome Sequences ◽

Oxford Nanopore ◽

Hybrid Genome ◽

Genus Oryza ◽

First Time

Oryza coarctata plant, collected from Sundarban delta of West Bengal, India, has been used in the present study to generate draft genome sequences, employing the hybrid genome assembly with Illumina reads and third generation Oxford Nanopore sequencing technology. We report for the first time the draft genome with the coverage of 85.71 % and deposited the raw data in NCBI SRA, with BioProject ID PRJNA396417.

Download Full-text

Draft genome sequence of the pulse crop blackgram [Vigna mungo (L.) Hepper] reveals potential R-genes

10.1101/2020.06.21.163923 ◽

2020 ◽

Author(s):

J Souframanien ◽

Avi Raizada ◽

Punniyamoorthy Dhanasekar ◽

Penna Suprasanna

Keyword(s):

Genome Sequence ◽

De Novo ◽

Draft Genome ◽

Vigna Mungo ◽

Draft Genome Sequence ◽

Sequence Length ◽

Legume Crop ◽

Oxford Nanopore ◽

Hybrid Genome ◽

First Time

AbstractBlackgram [Vigna mungo (L.) Hepper] (2n = 2x = 22), an important Asiatic legume crop, is a major source of dietary protein for the predominantly vegetarian population. Here we construct a draft genome sequence of blackgram, for the first time, by employing hybrid genome assembly with Illumina reads and third generation Oxford Nanopore sequencing technology. The final de novo whole genome of blackgram is ~ 475 Mb (82 % of the genome) and has maximum scaffold length of 6.3 Mb with scaffold N50 of1.42 Mb. Genome analysis identified 18655 genes with mean coding sequence length of 970bp. Around 96.7 % of predicted genes were annotated. Nearly half of the assembled sequence is composed of repetitive elements with retrotransposons as major (47.3% of genome) transposable elements, whereas, DNA transposons made up only 2.29% of the genome. A total of 166014 SSRs, including 65180 compound SSRs, were identified and primer pairs for 34816 SSRs were designed. Out of the 18665 proteins, 678 proteins showed presence of R-gene related domains. KIN class was found in majority of the proteins (372) followed by RLK (79) and N (79). The genome sequence of blackgram will facilitate identification of agronomically important genes and accelerate the genetic improvement of blackgram.

Download Full-text

Draft genome sequence of the pulse crop blackgram [Vigna mungo (L.) Hepper] reveals potential R-genes

Scientific Reports ◽

10.1038/s41598-021-90683-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Souframanien Jegadeesan ◽

Avi Raizada ◽

Punniyamoorthy Dhanasekar ◽

Penna Suprasanna

Keyword(s):

Genome Sequence ◽

De Novo ◽

Draft Genome ◽

Vigna Mungo ◽

Draft Genome Sequence ◽

Sequence Length ◽

Legume Crop ◽

Oxford Nanopore ◽

Hybrid Genome ◽

First Time

AbstractBlackgram [Vigna mungo (L.) Hepper] (2n = 2x = 22), an important Asiatic legume crop, is a major source of dietary protein for the predominantly vegetarian population. Here we construct a draft genome sequence of blackgram, for the first time, by employing hybrid genome assembly with Illumina reads and third generation Oxford Nanopore sequencing technology. The final de novo whole genome of blackgram is ~ 475 Mb (82% of the genome) and has maximum scaffold length of 6.3 Mb with scaffold N50 of 1.42 Mb. Genome analysis identified 42,115 genes with mean coding sequence length of 1131 bp. Around 80.6% of predicted genes were annotated. Nearly half of the assembled sequence is composed of repetitive elements with retrotransposons as major (47.3% of genome) transposable elements, whereas, DNA transposons made up only 2.29% of the genome. A total of 166,014 SSRs, including 65,180 compound SSRs, were identified and primer pairs for 34,816 SSRs were designed. Out of the 33,959 proteins, 1659 proteins showed presence of R-gene related domains. KIN class was found in majority of the proteins (905) followed by RLK (239) and RLP (188). The genome sequence of blackgram will facilitate identification of agronomically important genes and accelerate the genetic improvement of blackgram.

Download Full-text

LongStitch: high-quality genome assembly correction and scaffolding using long reads

BMC Bioinformatics ◽

10.1186/s12859-021-04451-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Lauren Coombe ◽

Janet X. Li ◽

Theodora Lo ◽

Johnathan Wong ◽

Vladimir Nikolic ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Draft Genome ◽

Model Organisms ◽

High Quality ◽

De Novo Genome Assembly ◽

Long Reads ◽

Long Read ◽

Genomic Regions ◽

Genome Assemblies

Abstract Background Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which includes initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.

Download Full-text

SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme

BMC Bioinformatics ◽

10.1186/s12859-021-04081-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Lidong Guo ◽

Mengyang Xu ◽

Wenchao Wang ◽

Shengqiang Gu ◽

Xia Zhao ◽

...

Keyword(s):

High Efficiency ◽

De Novo ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Draft Assembly ◽

Screening Algorithm ◽

Long Reads ◽

Hybrid Genome ◽

Genomics Research ◽

Negative Effect

Abstract Background Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust standalone scaffolder with high efficiency is warranted for hybrid genome assembly. Results In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard Similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also exploited a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349 fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled by next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder. Conclusions SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.

Download Full-text

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (kilobases to megabases order) and then, using efficient algorithms, provide high quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture between the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads and generally, they inspect the nucleotide one by one, and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we proposed Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long-reads) to polish genome assemblies and in particular assemblies of diploid and heterozygous genomes.

Download Full-text

Chromosome-scale assembly of the Sparassis latifolia genome obtained using long-read and Hi-C sequencing

10.1101/2021.01.08.426014 ◽

2021 ◽

Author(s):

Chi yang ◽

Lu Ma ◽

Donglai Xiao ◽

Xiaoyu Liu ◽

Xiaoling Jiang ◽

...

Keyword(s):

Repetitive Sequences ◽

Draft Genome ◽

Edible Mushroom ◽

Illumina Hiseq ◽

Protein Coding ◽

Long Reads ◽

Oxford Nanopore ◽

Genome Features ◽

Long Read ◽

Genomic Studies

Sparassis latifolia is a valuable edible mushroom cultivated in China. In 2018, our research group reported an incomplete and low quality genome of S. latifolia was obtained by Illumina HiSeq 2500 sequencing. These limitations in the available genome have constrained genetic and genomic studies in this mushroom resource. Herein, an updated draft genome sequence of S. latifolia was generated by Oxford Nanopore sequencing and the Hi-C technique. A total of 8.24 Gb of Oxford Nanopore long reads representing ~198.08X coverage of the S. latifolia genome were generated. Subsequently, a high-quality genome of 41.41 Mb, with scaffold and contig N50 sizes of 3.31 Mb and 1.51 Mb, respectively, was assembled. Hi-C scaffolding of the genome resulted in 12 pseudochromosomes containing 93.56% of the bases in the assembled genome. Genome annotation further revealed that 17.47% of the genome was composed of repetitive sequences. In addition, 13,103 protein-coding genes were predicted, among which 98.72% were functionally annotated. BUSCO assay results further revealed that there were 92.07% complete BUSCOs. The improved chromosome-scale assembly and genome features described here will aid further molecular elucidation of various traits, breeding of S. latifolia, and evolutionary studies with related taxa.

Download Full-text