FinisherSC: A repeat-aware tool for upgrading de-novo assembly using long reads

A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals chromosomal rearrangements in rainbow trout

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab052 ◽

2021 ◽

Author(s):

Guangtu Gao ◽

Susana Magadan ◽

Geoffrey C Waldbieser ◽

Ramey C Youngblood ◽

Paul A Wheeler ◽

...

Keyword(s):

Rainbow Trout ◽

Chromosome Number ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Sequence Data ◽

Structural Variations ◽

High Coverage ◽

Haploid Chromosome Number ◽

Long Reads

Abstract Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line was originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high coverage PacBio long-reads sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and additional 15 smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.

Hybrid error correction approach and de novo assembly for minion sequencing long reads

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2016.7822504 ◽

2016 ◽

Author(s):

Mehdi Kchouk ◽

Mourad Elloumi

Keyword(s):

Error Correction ◽

De Novo Assembly ◽

De Novo ◽

Long Reads

De novo assembly of the complex genome of Nippostrongylus brasiliensis using MinION long reads

BMC Biology ◽

10.1186/s12915-017-0473-4 ◽

2018 ◽

Vol 16 (1) ◽

Cited By ~ 16

Author(s):

David Eccles ◽

Jodie Chandler ◽

Mali Camberis ◽

Bernard Henrissat ◽

Sergey Koren ◽

...

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Nippostrongylus Brasiliensis ◽

Long Reads ◽

Complex Genome

De novo whole-genome assembly of Chrysanthemum makinoi, a key wild chrysanthemum

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab358 ◽

2021 ◽

Author(s):

Natascha van Lieshout ◽

Martijn van Kaauwen ◽

Linda Kodde ◽

Paul Arens ◽

Marinus J M Smulders ◽

...

Keyword(s):

Ab Initio ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Its Sequence ◽

Whole Genome ◽

Annotation Pipeline ◽

Long Reads ◽

Oxford Nanopore ◽

The World

Abstract Chrysanthemum is among the top ten cut, potted and perennial garden flowers in the world. Despite this, to date, only the genomes of two wild diploid chrysanthemums have been sequenced and assembled. Here we present the most complete and contiguous chrysanthemum de novo assembly published so far, as well as a corresponding ab initio annotation. The cultivated hexaploid varieties are thought to originate from a hybrid of wild chrysanthemums, among which the diploid Chrysanthemum makinoi has been mentioned. Using a combination of Oxford Nanopore long reads, Pacific Biosciences long reads, Illumina short reads, Dovetail sequences and a genetic map, we assembled 3.1 Gb of its sequence into 9 pseudochromosomes, with an N50 of 330 Mb and BUSCO complete score of 92.1%. Our ab initio annotation pipeline predicted 95 074 genes and marked 80.0% of the genome as repetitive. This genome assembly of C. makinoi provides an important step forward in understanding the chrysanthemum genome, evolution and history.

Whole-Genome Sequencing and De Novo Assembly of Malassezia pachydermatis Isolated from the Ear Canal of a Dog with Otitis

Microbiology Resource Announcements ◽

10.1128/mra.00205-21 ◽

2021 ◽

Vol 10 (21) ◽

Author(s):

S. D’Andreano ◽

J. Viñes ◽

O. Francino

Keyword(s):

Genome Sequencing ◽

Genome Sequence ◽

De Novo Assembly ◽

De Novo ◽

Whole Genome ◽

Ear Canal ◽

Malassezia Pachydermatis ◽

Content Type ◽

Long Reads ◽

Genome Assemblies

We have de novo assembled the genome sequence of Malassezia pachydermatis isolated from a canine otitis sample with Nanopore-only long reads. With 99× coverage and 8.23 Mbp, the genome sequence was assembled in 10 contigs, with 6 of them corresponding to chromosomes, improving the scaffolding of previous genome assemblies for the species.

Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements

PLoS ONE ◽

10.1371/journal.pone.0106689 ◽

2014 ◽

Vol 9 (9) ◽

pp. e106689 ◽

Cited By ~ 137

Author(s):

Rajiv C. McCoy ◽

Ryan W. Taylor ◽

Timothy A. Blauwkamp ◽

Joanna L. Kelley ◽

Michael Kertesz ◽

...

Keyword(s):

Transposable Elements ◽

De Novo Assembly ◽

De Novo ◽

Long Reads

A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals structural genome variation in rainbow trout

10.1101/2020.12.28.424581 ◽

2020 ◽

Author(s):

Guangtu Gao ◽

Susana Magadan ◽

Geoffrey C. Waldbieser ◽

Ramey C. Youngblood ◽

Paul A. Wheeler ◽

...

Keyword(s):

Rainbow Trout ◽

Chromosome Number ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Sequence Data ◽

Haploid Chromosome Number ◽

Long Reads ◽

Homozygous Line ◽

Igh Genes

Abstract Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line was originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high coverage PacBio long-reads sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and additional 15 smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.

Article Summary A de-novo genome assembly was generated for the Arlee homozygous line of rainbow trout to enable identification and characterization of genome variants towards developing a rainbow trout pan-genome reference. The new assembly was generated using the PacBio sequencing technology and scaffolding with Hi-C contact maps and Bionano optical mapping. A contiguous genome assembly was obtained, with the contig and scaffold N50 over 15.6 Mb and 39 Mb, respectively, and 95% of the assembly in chromosome sequences. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes.

phasebook: haplotype-aware de novo assembly of diploid genomes from long reads

10.1101/2021.07.02.450883 ◽

2021 ◽

Author(s):

Xiao Luo ◽

Xiongbin Kang ◽

Alexander Schoenhuth

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Haplotype Diversity ◽

Read Length ◽

Diploid Genome ◽

Sequencing Technologies ◽

Novel Approach ◽

Long Reads ◽

Long Read

Haplotype-aware diploid genome assembly is crucial in genomics, precision medicine, and many other disciplines. Long-read sequencing technologies have greatly improved genome assembly thanks to advantages of read length. However, current long-read assemblers usually introduce disturbing biases or fail to capture the haplotype diversity of the diploid genome. Here, we present phasebook, a novel approach for reconstructing the haplotypes of diploid genomes from long reads de novo. Benchmarking experiments demonstrate that our method outperforms other approaches in terms of haplotype coverage by large margins, while preserving competitive performance or even achieving advantages in terms of all other aspects relevant for genome assembly.

Efficient hybrid de novo assembly of human genomes with WENGAN

Nature Biotechnology ◽

10.1038/s41587-020-00747-w ◽

2020 ◽

Author(s):

Alex Di Genova ◽

Elena Buena-Atienza ◽

Stephan Ossowski ◽

Marie-France Sagot

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Consensus Sequence ◽

Computational Cost ◽

Sequencing Data ◽

Human Genomes ◽

Long Reads ◽

High Gene ◽

Computational Resources ◽

Genome Assemblies

Abstract Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides very high quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24–80.64 Mb), few assembly errors (contig NGA50: 11.8–59.59 Mb), good consensus quality (QV: 27.84–42.88) and high gene completeness (BUSCO complete: 94.6–95.2%), while consuming low computational resources (CPU hours: 187–1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).
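
Note on the contiguity figures quoted in these abstracts: the sketch below shows how N50 and NG50 are conventionally computed from a list of contig lengths. N50 measures against the total assembled length, while NG50 measures against an estimated genome size. The function name `nx`, its parameters, and the toy contig lengths are illustrative assumptions and are not taken from WENGAN or any other tool listed here.

```python
def nx(lengths, x=50, total=None):
    """Return Nx of the given contig lengths.

    If `total` is None, the threshold is x% of the assembled length (Nx).
    If `total` is an estimated genome size, the result is NGx instead.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    reference = total if total is not None else sum(lengths)
    target = reference * x / 100
    running = 0
    # Walk contigs from longest to shortest until x% of the reference is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    # Only reachable for NGx: the assembly covers less than x% of the genome.
    return 0


# Toy contig lengths (hypothetical, arbitrary units).
contigs = [80, 70, 50, 40, 30, 20]

print(nx(contigs))             # N50 against the 290 assembled units -> 70
print(nx(contigs, total=400))  # NG50 against an assumed genome size of 400 -> 50
```

Because NG50 is tied to a fixed genome size rather than to each assembly's own length, it allows assemblies of different total size to be compared directly, which is why the abstract above compares NG50 values against GRCh38.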

BIGMAC: Breaking Inaccurate Genomes and Merging Assembled Contigs for long read metagenomic assembly

10.1101/045690 ◽

2016 ◽

Author(s):

Ka-Kit Lam ◽

Richard Hall ◽

Alicia Clum ◽

Satish Rao

Keyword(s):

Quality Improvement ◽

De Novo Assembly ◽

De Novo ◽

Post Processing ◽

Assembly Quality ◽

Alternative Perspective ◽

Long Reads ◽

Processing Step ◽

Long Read ◽

Metagenomic Assembly

Abstract The problem of de-novo assembly for metagenomes using only long reads is gaining attention. We study whether post-processing metagenomic assemblies with the original input long reads can result in quality improvement. Previous approaches have focused on pre-processing reads and optimizing assemblers. BIGMAC takes an alternative perspective to focus on the post-processing step. Using both the assembled contigs and original long reads as input, BIGMAC first breaks the contigs at potentially mis-assembled locations and subsequently scaffolds contigs. Our experiments on metagenomes assembled from long reads show that BIGMAC can improve assembly quality by reducing the number of mis-assemblies while maintaining/increasing N50 and N75. The software is available at https://github.com/kakitone/BIGMAC
