Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome

Derek M Bickhart; Benjamin D Rosen; Sergey Koren; Brian L Sayre; Alex R Hastie; Saki Chan; Joyce Lee; Ernest T Lam; Ivan Liachko; Shawn T Sullivan; Joshua N Burton; Heather J Huson; John C Nystrom; Christy M Kelley; Jana L Hutchison; Yang Zhou; Jiajie Sun; Alessandra Crisà; F Abel Ponce de León; John C Schwartz; John A Hammond; Geoffrey C Waldbieser; Steven G Schroeder; George E Liu; Maitreya J Dunham; Jay Shendure; Tad S Sonstegard; Adam M Phillippy; Curtis P Van Tassell; Timothy P L Smith

doi:10.1038/ng.3802

De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture

PLoS Biology ◽

10.1371/journal.pbio.2006348 ◽

2018 ◽

Vol 16 (7) ◽

pp. e2006348 ◽

Cited By ~ 42

Author(s):

Shivani Mahajan ◽

Kevin H.-C. Wei ◽

Matthew J. Nalley ◽

Lauren Gibilisco ◽

Doris Bachtrog

Keyword(s):

Y Chromosome ◽

Single Molecule ◽

De Novo Assembly ◽

De Novo ◽

Chromatin Conformation ◽

Single Molecule Sequencing ◽

Chromatin Conformation Capture

Download Full-text

De novo assembly of the cattle reference genome with single-molecule sequencing

GigaScience ◽

10.1093/gigascience/giaa021 ◽

2020 ◽

Vol 9 (3) ◽

Cited By ~ 35

Author(s):

Benjamin D Rosen ◽

Derek M Bickhart ◽

Robert D Schnabel ◽

Sergey Koren ◽

Christine G Elsik ◽

...

Keyword(s):

Single Molecule ◽

De Novo Assembly ◽

Reference Genome ◽

De Novo ◽

Bos Taurus ◽

Future Research ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Assembly Accuracy ◽

Genomic Tools

Abstract Background Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10–12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. Results We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. Conclusions We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.

Download Full-text

A whole genome atlas of 81 Psilocybe genomes as a resource for psilocybin production.

F1000Research ◽

10.12688/f1000research.55301.2 ◽

2021 ◽

Vol 10 ◽

pp. 961

Author(s):

Kevin McKernan ◽

Liam Kane ◽

Yvonne Helbert ◽

Lei Zhang ◽

Nathan Houde ◽

...

Keyword(s):

Gene Cluster ◽

Single Molecule ◽

Reference Genome ◽

De Novo ◽

Genomic Diversity ◽

Sequence Coverage ◽

Single Molecule Sequencing ◽

Contiguous Gene ◽

Long Read ◽

Interesting Variation

The Psilocybe genus is well known for the synthesis of valuable psychoactive compounds such as Psilocybin, Psilocin, Baeocystin and Aeruginascin. The ubiquity of Psilocybin synthesis in Psilocybe has been attributed to a horizontal gene transfer mechanism of a ~20Kb gene cluster. A recently published highly contiguous reference genome derived from long read single molecule sequencing has underscored interesting variation in this Psilocybin synthesis gene cluster. This reference genome has also enabled the shotgun sequencing of spores from many Psilocybe strains to better catalog the genomic diversity in the Psilocybin synthesis pathway. Here we present the de novo assembly of 81 Psilocybe genomes compared to the P.envy reference genome. Surprisingly, the genomes of Psilocybe galindoi, Psilocybe tampanensis and Psilocybe azurescens lack sequence coverage over the previously described Psilocybin synthesis pathway but do demonstrate amino acid sequence homology to a less contiguous gene cluster and may illuminate the previously proposed evolution of psilocybin synthesis.

Download Full-text

AsmMix: A pipeline for high quality diploid de novo assembly

10.1101/2021.01.15.426893 ◽

2021 ◽

Author(s):

Pei Wu ◽

Chao Liu ◽

Ou Wang ◽

Xia Zhao ◽

Fang Chen ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Variant Calling ◽

The Other ◽

Second Step ◽

Small Scale ◽

Mixing Process ◽

High Quality ◽

Single Molecule Sequencing ◽

Long Read

AbstractIn this paper, we report a pipeline, AsmMix, which is capable of producing both contiguous and high-quality diploid genomes. The pipeline consists of two steps. In the first step, two sets of assemblies are generated: one is based on co-barcoded reads, which are highly accurate and haplotype-resolved but contain many gaps, the other assembly is based on single-molecule sequencing reads, which is contiguous but error-prone. In the second step, those two sets of assemblies are compared and integrated into a haplotype-resolved assembly with fewer errors. We test our pipeline using a dataset of human genome NA24385, perform variant calling from those assemblies and then compare against GIAB Benchmark. We show that AsmMix pipeline could produce highly contiguous, accurate, and haplotype-resolved assemblies. Especially the assembly mixing process could effectively reduce small-scale errors in the long read assembly.

Download Full-text

Efficient long single molecule sequencing for cost effective and accurate sequencing, haplotyping, and de novo assembly

10.1101/324392 ◽

2018 ◽

Author(s):

Ou Wang ◽

Robert Chin ◽

Xiaofang Cheng ◽

Michelle Ka Wu ◽

Qing Mao ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Low Cost ◽

Variant Calling ◽

Cost Effective ◽

High Quality ◽

Single Molecule Sequencing ◽

Single Tube ◽

Complex Structural

Obtaining accurate sequences from long DNA molecules is very important for genome assembly and other applications. Here we describe single tube long fragment read (stLFR), a technology that enables this a low cost. It is based on adding the same barcode sequence to sub-fragments of the original long DNA molecule (DNA co-barcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process up to 3.6 billion unique barcode sequences were generated on beads, enabling practically non-redundant co-barcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique co-barcoding of over 8 million 20-300 kb genomic DNA fragments. Analysis of the genome of the human genome NA12878 with stLFR demonstrated high quality variant calling and phasing into contigs up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries and their construction did not significantly add to the time or cost of whole genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.

Download Full-text

Genetic Findings of Sanger and Nanopore Single-Molecule Sequencing in Patients with X-Linked Hearing Loss and Incomplete Partition Type III

10.21203/rs.3.rs-501574/v1 ◽

2021 ◽

Author(s):

Ying Chen ◽

Jiajun Qiu ◽

Yingwei Wu ◽

Huan Jia ◽

Yi Jiang ◽

...

Keyword(s):

Single Molecule ◽

Hearing Aids ◽

Sanger Sequencing ◽

Clinical Characteristics ◽

De Novo ◽

Genetic Mutations ◽

Single Molecule Sequencing ◽

Long Read ◽

Incomplete Partition ◽

The Mean

Abstract BackgroundPOU3F4 is the causative gene for X-linked deafness-2 (DFNX2), characterized by incomplete partition type III (IP-III) malformation of the inner ear. The aim of this study was to investigate the clinical characteristics and molecular findings by Sanger or Nanopore single-molecule sequencing in IP-III patients. MethodsDiagnosis of IP-III was mainly based on clinical characteristics including radiological and audiological findings. Sanger sequencing of POU3F4 were carried out for these IP-III patients. For those patients with negative results for POU3F4 Sanger sequencing, Nanopore long-read single-molecule sequencing was used to identify the possible pathogenic variants. Hearing intervention outcomes of hearing aids fitting and cochlear implantation were also analyzed. Grouped by different locations of POU3F4 variants, aided PTA was further compared between patients in whom the variants located in the exon region or in the upstream region.ResultsIn total, 18 male patients from 14 unrelated families were diagnosed with IP-III. 10 variants were identified in POU3F4 by Sanger sequencing and 9 of these were novel (p.Val321Gly, p.Gln181*, p.Cys233*, p.Val215Gly, p.Arg282Gln, p.Trp57*, p.Gln316*, c.903_912 delins TGCCA and p.Arg205del). Four different deletions (DELs) that varied from 80 to 486 kb were identified 876-1503 kb upstream of POU3F4 by Nanopore long-read single-molecule sequencing. Of them, de novo genetic mutations occurred in 21.4% (3/14) of patients with POU3F4 mutations. Of these 18 patients, 7 had bilateral hearing aids (HAs) and 10 patients received unilateral cochlear implantation (CI). The mean aided pure tone average (PTA) for HAs and CI users were 41.1±5.18 and 40.3±7.59 dB HL respectively. The mean PTAs for whom the variants located in the exon and upstream regions were 39.6±6.31 vs 43.0±7.10 dB HL, which presented no significant difference (p=0.342).ConclusionsAmong IP-III patients, 28.6% (4/14) had no definite mutation in exon region of POU3F4, however, possible pathogenic deletions were identified in upstream region of this gen. De novo genetic mutations occurred in 21.4% (3/14) of patients with POU3F4 mutation. Hearing intervention outcomes of IP-III patients presented no difference regardless of the variants locations on exon or upstream regions.

Download Full-text

Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing

10.1101/008003 ◽

2014 ◽

Cited By ~ 13

Author(s):

Konstantin Berlin ◽

Sergey Koren ◽

Chen-Shan Chin ◽

James Drake ◽

Jane M Landolin ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Locality Sensitive Hashing ◽

Model Organisms ◽

Smrt Sequencing ◽

High Coverage ◽

Celera Assembler ◽

Single Molecule Sequencing ◽

Long Reads ◽

Long Read

We report reference-grade de novo assemblies of four model organisms and the human genome from single-molecule, real-time (SMRT) sequencing. Long-read SMRT sequencing is routinely used to finish microbial genomes, but the available assembly methods have not scaled well to larger genomes. Here we introduce the MinHash Alignment Process (MHAP) for efficient overlapping of noisy, long reads using probabilistic, locality-sensitive hashing. Together with Celera Assembler, MHAP was used to reconstruct the genomes of Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, and human from high-coverage SMRT sequencing. The resulting assemblies include fully resolved chromosome arms and close persistent gaps in these important reference genomes, including heterochromatic and telomeric transition sequences. For D. melanogaster, MHAP achieved a 600-fold speedup relative to prior methods and a cloud computing cost of a few hundred dollars. These results demonstrate that single-molecule sequencing alone can produce near-complete eukaryotic genomes at modest cost.

Download Full-text

A chromosome-level genome assembly of the Chinese tupelo Nyssa sinensis

Scientific Data ◽

10.1038/s41597-019-0296-y ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Xuchen Yang ◽

Minghui Kang ◽

Yanting Yang ◽

Haifeng Xiong ◽

Mingcheng Wang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Chromosome Conformation ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Data Matching ◽

Long Reads ◽

Autumn Leaf ◽

Chromosome Level

AbstractThe deciduous Chinese tupelo (Nyssa sinensis Oliv.) is a popular ornamental tree for the spectacular autumn leaf color. Here, using single-molecule sequencing and chromosome conformation capture data, we report a high-quality, chromosome-level genome assembly of N. sinensis. PacBio long reads were de novo assembled into 647 polished contigs with a total length of 1,001.42 megabases (Mb) and an N50 size of 3.62 Mb, which is in line with genome sizes estimated using flow cytometry and the k-mer analysis. These contigs were further clustered and ordered into 22 pseudo-chromosomes based on Hi-C data, matching the chromosome counts in Nyssa obtained from previous cytological studies. In addition, a total of 664.91 Mb of repetitive elements were identified and a total of 37,884 protein-coding genes were predicted in the genome of N. sinensis. All data were deposited in publicly available repositories, and should be a valuable resource for genomics, evolution, and conservation biology.

Download Full-text

De Novo Assembly of the Streptomyces sp. Strain Mg1 Genome Using PacBio Single-Molecule Sequencing

Genome Announcements ◽

10.1128/genomea.00535-13 ◽

2013 ◽

Vol 1 (4) ◽

Cited By ~ 17

Author(s):

B. C. Hoefler ◽

K. Konganti ◽

P. D. Straight

Keyword(s):

Single Molecule ◽

De Novo Assembly ◽

De Novo ◽

Single Molecule Sequencing ◽

Streptomyces Sp

Download Full-text

lra: A long read aligner for sequences and contigs

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009078 ◽

2021 ◽

Vol 17 (6) ◽

pp. e1009078

Author(s):

Jingwen Ren ◽

Mark J. P. Chaisson

Keyword(s):

Dynamic Programming ◽

Single Molecule ◽

De Novo Assembly ◽

De Novo ◽

Concave Function ◽

Single Molecule Sequencing ◽

Link Type ◽

Oxford Nanopore ◽

Concave Cost ◽

Long Read

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1kb and when discovering variation from ONT reads, while having runtime that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and github (https://github.com/ChaissonLab/LRA).

Download Full-text