scholarly journals Rapid de novo assembly of the European eel genome from nanopore sequencing reads

2017 ◽  
Author(s):  
Hans J. Jansen ◽  
Michael Liem ◽  
Susanne A. Jong-Raadsen ◽  
Sylvie Dufour ◽  
Finn-Arne Weltzien ◽  
...  

AbstractWe have sequenced the genome of the endangered European eel using the MinION by Oxford Nanopore, and assembled these data using a novel algorithm specifically designed for large eukaryotic genomes. For this 860 Mbp genome, the entire computational process takes two days on a single CPU. The resulting genome assembly significantly improves on a previous draft based on short reads only, both in terms of contiguity (N50 1.2 Mbp) and structural quality. This combination of affordable nanopore sequencing and light-weight assembly promises to make high-quality genomic resources accessible for many non-model plants and animals.

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Hans J. Jansen ◽  
Michael Liem ◽  
Susanne A. Jong-Raadsen ◽  
Sylvie Dufour ◽  
Finn-Arne Weltzien ◽  
...  

2018 ◽  
Author(s):  
Stáphane Deschamps ◽  
Yun Zhang ◽  
Victor Llaca ◽  
Liang Ye ◽  
Gregory May ◽  
...  

The advent of long-read sequencing technologies has greatly facilitated assemblies of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer were combined with BioNano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final hybrid assembly consists of 29 scaffolds, encompassing in most cases entire chromosome arms. It has a scaffold N50 value of 33.28Mbps and covers >90% of Sorghum bicolor expected genome length. A sequence accuracy of 99.67% was obtained in unique regions after aligning contigs against Illumina Tx430 data. Alignments showed that 99.4% of the 34,211 public gene models are present in the assembly, including 94.2% mapping end-to-end. Comparisons of the DLS optical maps against the public Sorghum Bicolor v3.0.1 BTx623 genome assembly suggest the presence of substantial genomic rearrangements whose origin remains to be determined.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1750 ◽  
Author(s):  
Tapan Kumar Mondal ◽  
Hukam Chand Rawal ◽  
Kishor Gaikwad ◽  
Tilak Raj Sharma ◽  
Nagendra Kumar Singh

Oryza coarctata plants, collected from Sundarban delta of West Bengal, India, have been used in the present study to generate draft genome sequences, employing the hybrid genome assembly with Illumina reads and third generation Oxford Nanopore sequencing technology. We report for the first time that more than 85.71 % of the genome coverage and the data have been deposited in NCBI SRA, with BioProject ID PRJNA396417.


2016 ◽  
Author(s):  
Robert Vaser ◽  
Ivan Sović ◽  
Niranjan Nagarajan ◽  
Mile Šikić

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource intensive error correction and consensus generation steps to obtain high quality assemblies. We show that the error correction step can be omitted and high quality consensus sequences can be generated efficiently with a SIMD accelerated, partial order alignment based stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore datasets we show that Racon coupled with Miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.Racon is available open source under the MIT license at https://github.com/isovic/racon.git.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2016 ◽  
Author(s):  
Chengxi Ye ◽  
Zhanshan (Sam) Ma

Motivation.The third generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15–40%, significantly higher than those of the prevalent next generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such asde novogenome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences.Results.We describe a versatile and efficient linear complexity consensus algorithm Sparc to facilitatede novogenome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path which approximates the most likely genome sequence is searched through a sparsity-induced reweighted graph as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments with Sparc show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with error rate <0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve similar error rate when combined with NGS data. Compared with the existing approaches, Sparc[i] calculates the consensus with higher accuracy, uses 80% less memory and time, approximately. The source code is available for download athttps://github.com/yechengxi/Sparc.


2020 ◽  
Author(s):  
Kaushik Panda ◽  
R. Keith Slotkin

AbstractHigh-quality transcript-based annotations of genes facilitates both genome-wide analyses and detailed single locus research. In contrast, transposable element (TE) annotations are rudimentary, consisting of only information on location and type of TE. When analyzing TEs, their repetitiveness and limited annotation prevents the ability to distinguish between potentially functional expressed elements and degraded copies. To improve genome-wide TE bioinformatics, we performed long-read Oxford Nanopore sequencing of cDNAs from Arabidopsis lines deficient in multiple layers of TE repression. We used these uniquely-mapping transcripts to identify the set of TEs able to generate mRNAs, and created a new transcript-based annotation of TEs that we have layered upon the existing high-quality community standard TAIR10 annotation. The improved annotation enables us to test specific standing hypotheses in the TE field. We demonstrate that inefficient TE splicing does not trigger small RNA production, and the cell more strongly targets DNA methylation to TEs that have the potential to make mRNAs. This work provides a transcript-based TE annotation for Arabidopsis, and serves as a blueprint to reduce the genomic complexity associated with repetitive TEs in any organism.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 23-24
Author(s):  
Kimberly M Davenport ◽  
Derek M Bickhart ◽  
Kim Worley ◽  
Shwetha C Murali ◽  
Noelle Cockett ◽  
...  

Abstract Sheep are an important agricultural species used for both food and fiber in the United States and globally. A high-quality reference genome enhances the ability to discover genetic and biological mechanisms influencing important traits, such as meat and wool quality. The rapid advances in genome assembly algorithms and emergence of increasingly long sequence read length provide the opportunity for an improved de novo assembly of the sheep reference genome. Tissue was collected postmortem from an adult Rambouillet ewe selected by USDA-ARS for the Ovine Functional Annotation of Animal Genomes project. Short-read (55x coverage), long-read PacBio (75x coverage), and Hi-C data from this ewe were retrieved from public databases. We generated an additional 50x coverage of Oxford Nanopore data and assembled the combined long-read data with canu v1.9. The assembled contigs were polished with Nanopolish v0.12.5 and scaffolded using Hi-C data with Salsa v2.2. Gaps were filled with PBsuite v15.8.24 and polished with Nanopolish v0.12.5 followed by removal of duplicate contigs with PurgeDups v1.0.1. Chromosomes were oriented by identifying centromeres and telomeres with RepeatMasker v4.1.1, indicating a need to reverse the orientation of chromosome 11 relative to Oar_rambouillet_v1.0. Final polishing was performed with two rounds of a pipeline which consisted of freebayes v1.3.1 to call variants, Merfin to validate them, and BCFtools to generate the consensus fasta. The ARS-UI_Ramb_v2.0 assembly has improved continuity (contig N50 of 43.19 Mb) with a 19-fold and 38-fold decrease in the number of scaffolds compared with Oar_rambouillet_v1.0 and Oar_v4.0. ARS-UI_Ramb_v2.0 has greater per-base accuracy and fewer insertions and deletions identified from mapped RNA sequence than previous assemblies. This significantly improved reference assembly, public at NCBI GenBank under accession number GCA_016772045, will optimize the functional annotation of the sheep genome and facilitate improved mapping accuracy of genetic variant and expression data for traits relevant the sheep industry.


2019 ◽  
Vol 11 (8) ◽  
pp. 2306-2311
Author(s):  
Juliane Hartke ◽  
Tilman Schell ◽  
Evelien Jongepier ◽  
Hanno Schmidt ◽  
Philipp P Sprenger ◽  
...  

Abstract The success of social insects is largely intertwined with their highly advanced chemical communication system that facilitates recognition and discrimination of species and nest-mates, recruitment, and division of labor. Hydrocarbons, which cover the cuticle of insects, not only serve as waterproofing agents but also constitute a major component of this communication system. Two cryptic Crematogaster species, which share their nest with Camponotus ants, show striking diversity in their cuticular hydrocarbon (CHC) profile. This mutualistic system therefore offers a great opportunity to study the genetic basis of CHC divergence between sister species. As a basis for further genome-wide studies high-quality genomes are needed. Here, we present the annotated draft genome for Crematogaster levior A. By combining the three most commonly used sequencing techniques—Illumina, PacBio, and Oxford Nanopore—we constructed a high-quality de novo ant genome. We show that even low coverage of long reads can add significantly to overall genome contiguity. Annotation of desaturase and elongase genes, which play a role in CHC biosynthesis revealed one of the largest repertoires in ants and a higher number of desaturases in general than in other Hymenoptera. This may provide a mechanistic explanation for the high diversity observed in C. levior CHC profiles.


2015 ◽  
Vol 25 (11) ◽  
pp. 1750-1756 ◽  
Author(s):  
Sara Goodwin ◽  
James Gurtowski ◽  
Scott Ethe-Sayers ◽  
Panchajanya Deshpande ◽  
Michael C. Schatz ◽  
...  

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1750 ◽  
Author(s):  
Tapan Kumar Mondal ◽  
Hukam Chand Rawal ◽  
Kishor Gaikwad ◽  
Tilak Raj Sharma ◽  
Nagendra Kumar Singh

Oryza coarctata plant, collected from Sundarban delta of West Bengal, India,  has been used in the present study to generate draft genome sequences, employing the hybrid genome assembly with Illumina reads and third generation Oxford Nanopore sequencing technology. We report for the first time the draft genome with the coverage of 85.71 %  and  deposited the raw data in NCBI SRA, with BioProject ID PRJNA396417.


Sign in / Sign up

Export Citation Format

Share Document