De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture

Shivani Mahajan; Kevin H.-C. Wei; Matthew J. Nalley; Lauren Gibilisco; Doris Bachtrog

doi:10.1371/journal.pbio.2006348

Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome

Nature Genetics; doi:10.1038/ng.3802; 2017; Vol 49 (4); pp. 643-650; Cited By ~ 225

Author(s): Derek M Bickhart; Benjamin D Rosen; Sergey Koren; Brian L Sayre; Alex R Hastie; ...

Keyword(s): Single Molecule; De Novo; Chromatin Conformation; Domestic Goat; Single Molecule Sequencing; Reference Assembly; Chromatin Conformation Capture

De novo assembly of the cattle reference genome with single-molecule sequencing

GigaScience; doi:10.1093/gigascience/giaa021; 2020; Vol 9 (3); Cited By ~ 35

Author(s): Benjamin D Rosen; Derek M Bickhart; Robert D Schnabel; Sergey Koren; Christine G Elsik; ...

Keyword(s): Single Molecule; De Novo Assembly; Reference Genome; De Novo; Bos Taurus; Future Research; Protein Coding; Single Molecule Sequencing; Assembly Accuracy; Genomic Tools

Abstract. Background: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10–12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. Results: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. Conclusions: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.
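The contig N50 and L50 values quoted in this abstract are standard contiguity statistics. As a minimal sketch (the function name `n50_l50` and the toy contig lengths are illustrative assumptions, not taken from the paper): sort contigs from longest to shortest and accumulate their lengths; N50 is the length of the contig at which the running sum first reaches half of the total assembly size, and L50 is the number of contigs needed to get there.

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines N50; its rank is L50.
        if running >= half_total:
            return length, count


# Toy lengths (arbitrary units), purely for illustration.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))  # (70, 2)
```

On the toy data the total is 300, so the two longest contigs (80 + 70) already cover half of it. A larger N50 together with a smaller L50 indicates a more contiguous assembly.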

De Novo Assembly of the Streptomyces sp. Strain Mg1 Genome Using PacBio Single-Molecule Sequencing

Genome Announcements; doi:10.1128/genomea.00535-13; 2013; Vol 1 (4); Cited By ~ 17

Author(s): B. C. Hoefler; K. Konganti; P. D. Straight

Keyword(s): Single Molecule; De Novo Assembly; De Novo; Single Molecule Sequencing; Streptomyces Sp

lra: A long read aligner for sequences and contigs

PLoS Computational Biology; doi:10.1371/journal.pcbi.1009078; 2021; Vol 17 (6); pp. e1009078

Author(s): Jingwen Ren; Mark J. P. Chaisson

Keyword(s): Dynamic Programming; Single Molecule; De Novo Assembly; De Novo; Concave Function; Single Molecule Sequencing; Link Type; Oxford Nanopore; Concave Cost; Long Read

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and github (https://github.com/ChaissonLab/LRA).
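The difference between linear and concave gap costs can be illustrated with a deliberately naive quadratic-time chaining routine. This is a sketch under stated assumptions, not lra's algorithm or API: the function `chain_score`, the anchor tuples, and both cost functions (including their constants) are hypothetical and chosen only to show why a concave penalty lets a chain span a large indel.

```python
import math


def chain_score(anchors, gap_cost):
    """Best chain score over exact-match anchors (query_pos, ref_pos, length).

    Naive O(n^2) chaining for illustration; efficient SDP implementations
    avoid comparing every pair of anchors.
    """
    anchors = sorted(anchors)
    best = []
    for i, (qi, ri, li) in enumerate(anchors):
        score = li  # a chain may start at this anchor
        for j in range(i):
            qj, rj, lj = anchors[j]
            # Anchor j must end before anchor i starts on both sequences.
            if qj + lj <= qi and rj + lj <= ri:
                # Gap length = difference between the diagonals of the two anchors.
                gap = abs((ri - rj) - (qi - qj))
                score = max(score, best[j] + li - gap_cost(gap))
        best.append(score)
    return max(best)


def linear_cost(gap):
    return 0.5 * gap  # hypothetical slope


def concave_cost(gap):
    return 2.0 * math.log(1 + gap)  # hypothetical concave penalty


# Four 20-bp anchors; the last two sit 1000 bp further along the reference,
# mimicking a 1 kb deletion in the query relative to the reference.
anchors = [(0, 0, 20), (20, 20, 20), (40, 1040, 20), (60, 1060, 20)]
print(chain_score(anchors, linear_cost))   # chain breaks at the gap
print(chain_score(anchors, concave_cost))  # chain spans the gap
```

With the linear cost, bridging the 1000 bp gap costs more than the anchors on the far side are worth, so the best chain covers only two anchors. With the concave cost the penalty grows slowly with gap length, so all four anchors end up in one chain, which is the behavior that helps when discovering larger structural variants.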

MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads

Nature Methods; doi:10.1038/nmeth.4432; 2017; Vol 14 (11); pp. 1072-1074; Cited By ~ 114

Author(s): Chuan-Le Xiao; Ying Chen; Shang-Qian Xie; Kai-Ning Chen; Yan Wang; ...

Keyword(s): Error Correction; Single Molecule; De Novo Assembly; De Novo; Fast Mapping; Single Molecule Sequencing

Hybrid error correction and de novo assembly of single-molecule sequencing reads

Nature Biotechnology; doi:10.1038/nbt.2280; 2012; Vol 30 (7); pp. 693-700; Cited By ~ 699

Author(s): Sergey Koren; Michael C Schatz; Brian P Walenz; Jeffrey Martin; Jason T Howard; ...

Keyword(s): Error Correction; Single Molecule; De Novo Assembly; De Novo; Single Molecule Sequencing

De Novo Assembly of the Quorum-Sensing Pandoraea sp. Strain RB-44 Complete Genome Sequence Using PacBio Single-Molecule Real-Time Sequencing Technology

Genome Announcements; doi:10.1128/genomea.00245-14; 2014; Vol 2 (2); Cited By ~ 8

Author(s): R. Ee; Y.-L. Lim; W.-F. Yin; K.-G. Chan

Keyword(s): Quorum Sensing; Real Time; Genome Sequence; Single Molecule; Complete Genome Sequence; De Novo Assembly; Complete Genome; De Novo; Sequencing Technology

A whole genome atlas of 81 Psilocybe genomes as a resource for psilocybin production.

F1000Research; doi:10.12688/f1000research.55301.2; 2021; Vol 10; pp. 961

Author(s): Kevin McKernan; Liam Kane; Yvonne Helbert; Lei Zhang; Nathan Houde; ...

Keyword(s): Gene Cluster; Single Molecule; Reference Genome; De Novo; Genomic Diversity; Sequence Coverage; Single Molecule Sequencing; Contiguous Gene; Long Read; Interesting Variation

The Psilocybe genus is well known for the synthesis of valuable psychoactive compounds such as Psilocybin, Psilocin, Baeocystin and Aeruginascin. The ubiquity of Psilocybin synthesis in Psilocybe has been attributed to a horizontal gene transfer mechanism of a ~20Kb gene cluster. A recently published highly contiguous reference genome derived from long read single molecule sequencing has underscored interesting variation in this Psilocybin synthesis gene cluster. This reference genome has also enabled the shotgun sequencing of spores from many Psilocybe strains to better catalog the genomic diversity in the Psilocybin synthesis pathway. Here we present the de novo assembly of 81 Psilocybe genomes compared to the P.envy reference genome. Surprisingly, the genomes of Psilocybe galindoi, Psilocybe tampanensis and Psilocybe azurescens lack sequence coverage over the previously described Psilocybin synthesis pathway but do demonstrate amino acid sequence homology to a less contiguous gene cluster and may illuminate the previously proposed evolution of psilocybin synthesis.

AsmMix: A pipeline for high quality diploid de novo assembly

doi:10.1101/2021.01.15.426893; 2021

Author(s): Pei Wu; Chao Liu; Ou Wang; Xia Zhao; Fang Chen; ...

Keyword(s): Single Molecule; De Novo; Variant Calling; The Other; Second Step; Small Scale; Mixing Process; High Quality; Single Molecule Sequencing; Long Read

Abstract. In this paper, we report a pipeline, AsmMix, which is capable of producing both contiguous and high-quality diploid genomes. The pipeline consists of two steps. In the first step, two sets of assemblies are generated: one is based on co-barcoded reads and is highly accurate and haplotype-resolved but contains many gaps; the other is based on single-molecule sequencing reads and is contiguous but error-prone. In the second step, those two sets of assemblies are compared and integrated into a haplotype-resolved assembly with fewer errors. We test our pipeline using a dataset of human genome NA24385, perform variant calling from those assemblies, and then compare against the GIAB benchmark. We show that the AsmMix pipeline can produce highly contiguous, accurate, and haplotype-resolved assemblies. In particular, the assembly mixing process can effectively reduce small-scale errors in the long-read assembly.

Efficient long single molecule sequencing for cost effective and accurate sequencing, haplotyping, and de novo assembly

doi:10.1101/324392; 2018

Author(s): Ou Wang; Robert Chin; Xiaofang Cheng; Michelle Ka Wu; Qing Mao; ...

Keyword(s): Single Molecule; Genome Assembly; De Novo; Low Cost; Variant Calling; Cost Effective; High Quality; Single Molecule Sequencing; Single Tube; Complex Structural

Obtaining accurate sequences from long DNA molecules is very important for genome assembly and other applications. Here we describe single tube long fragment read (stLFR), a technology that enables this at low cost. It is based on adding the same barcode sequence to sub-fragments of the original long DNA molecule (DNA co-barcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically non-redundant co-barcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique co-barcoding of over 8 million 20-300 kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high quality variant calling and phasing into contigs up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries and their construction did not significantly add to the time or cost of whole genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
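A rough back-of-the-envelope check on the "practically non-redundant" claim can be made with a birthday-style estimate. The sketch below assumes, purely for illustration, that the 50 million barcodes in a sample are drawn uniformly and independently from the pool of 3.6 billion; the abstract does not state the sampling model, and the function name `shared_barcode_fraction` is hypothetical.

```python
import math


def shared_barcode_fraction(pool_size, draws):
    """Probability that one draw shares its barcode with at least one other draw.

    Assumes `draws` barcodes are sampled uniformly and independently from a
    pool of `pool_size` distinct barcodes (an illustrative model only).
    """
    if pool_size <= 1 or draws < 1:
        raise ValueError("pool_size must exceed 1 and draws must be positive")
    # 1 - (1 - 1/pool_size)^(draws - 1), evaluated in a numerically stable way.
    return -math.expm1((draws - 1) * math.log1p(-1.0 / pool_size))


POOL = 3_600_000_000      # unique barcode sequences quoted in the abstract
PER_SAMPLE = 50_000_000   # barcodes per sample quoted in the abstract

print(f"{shared_barcode_fraction(POOL, PER_SAMPLE):.4f}")
```

Under this uniform-sampling assumption, roughly 1.4% of barcodes in a sample would coincide with another one, which is in line with the abstract's description of co-barcoding as practically non-redundant.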
