De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture

PLoS Biology ◽  
2018 ◽  
Vol 16 (7) ◽  
pp. e2006348 ◽  
Author(s):  
Shivani Mahajan ◽  
Kevin H.-C. Wei ◽  
Matthew J. Nalley ◽  
Lauren Gibilisco ◽  
Doris Bachtrog

2017 ◽  
Vol 49 (4) ◽  
pp. 643-650 ◽  
Author(s):  
Derek M Bickhart ◽  
Benjamin D Rosen ◽  
Sergey Koren ◽  
Brian L Sayre ◽  
Alex R Hastie ◽  
...  

GigaScience ◽  
2020 ◽  
Vol 9 (3) ◽  
Author(s):  
Benjamin D Rosen ◽  
Derek M Bickhart ◽  
Robert D Schnabel ◽  
Sergey Koren ◽  
Christine G Elsik ◽  
...  

Abstract
Background: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10–12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies.
Results: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use.
Conclusions: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.
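Contig N50 and L50, quoted above, are standard contiguity statistics: sort contigs from longest to shortest and add them up until the running total reaches half the assembly size; N50 is the length of the contig that crosses that threshold and L50 is how many contigs it took. The following is a minimal Python sketch using the common "at least half" convention (some tools use "more than half"); the contig lengths in the usage example are made up for illustration and are not ARS-UCD1.2 data.

```python
from typing import List, Tuple


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the shortest contig in the smallest set of longest
         contigs whose combined length is at least half the assembly.
    L50: number of contigs in that set.
    """
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, counting how many we have used.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # reached half of the total assembly length
            return length, count
    raise AssertionError("unreachable: the loop always reaches half the total")


if __name__ == "__main__":
    toy = [80, 70, 50, 40, 30, 20, 10]  # hypothetical contig lengths
    print(n50_l50(toy))  # (70, 2)
```

A higher N50 together with a lower L50 means that half of the assembly sits in fewer, longer pieces, which is the sense in which the new assembly is "more continuous".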


2021 ◽  
Vol 17 (6) ◽  
pp. e1009078
Author(s):  
Jingwen Ren ◽  
Mark J. P. Chaisson

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies, to a reference genome. One approach to efficiently aligning SMS reads is sparse dynamic programming (SDP), in which optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for structural variant (SV) discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while keeping runtimes comparable to current methods (1.05–3.76×). When applied to calling variation from de novo assembly contigs, lra yields a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and GitHub (https://github.com/ChaissonLab/LRA).
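To make the linear-versus-concave contrast concrete, here is a minimal Python sketch of chaining exact-match anchors by dynamic programming. It is not lra's implementation: it uses a simple quadratic-time loop over anchors (efficient SDP implementations avoid this), and the `concave_gap` constants (an opening cost of 5 plus 4·log of the gap length) and the `Anchor`/`best_chain` names are hypothetical, chosen only for illustration. The gap between two anchors is measured as the change in diagonal (reference position minus read position), which corresponds to the size of the indel separating them.

```python
import math
from typing import Callable, List, NamedTuple, Tuple


class Anchor(NamedTuple):
    q: int       # start of the exact match in the read/contig
    t: int       # start of the exact match in the reference
    length: int  # length of the exact match


def linear_gap(g: int) -> float:
    """Linear penalty: cost grows in proportion to gap length."""
    return 1.0 * g


def concave_gap(g: int) -> float:
    """Hypothetical concave penalty: fixed opening cost plus log(gap length)."""
    return 0.0 if g == 0 else 5.0 + 4.0 * math.log(g)


def best_chain(anchors: List[Anchor],
               gap_cost: Callable[[int], float]) -> Tuple[float, List[Anchor]]:
    """Quadratic-time chaining: maximize matched bases minus gap penalties."""
    if not anchors:
        return 0.0, []
    anchors = sorted(anchors)  # by read position, then reference position
    score = [float(a.length) for a in anchors]  # best chain ending at each anchor
    prev = [-1] * len(anchors)                  # back-pointers for traceback
    for i, ai in enumerate(anchors):
        for j in range(i):
            aj = anchors[j]
            # aj must end before ai starts on both sequences (colinear, no overlap)
            if aj.q + aj.length <= ai.q and aj.t + aj.length <= ai.t:
                # indel size = change in diagonal (t - q) between the two anchors
                gap = abs((ai.t - ai.q) - (aj.t - aj.q))
                cand = score[j] + ai.length - gap_cost(gap)
                if cand > score[i]:
                    score[i], prev[i] = cand, j
    end = max(range(len(anchors)), key=score.__getitem__)
    best = score[end]
    chain = []
    while end != -1:  # follow back-pointers from the best-scoring anchor
        chain.append(anchors[end])
        end = prev[end]
    return best, chain[::-1]


if __name__ == "__main__":
    # Two 500 bp exact matches flanking a 2,000 bp deletion in the read.
    anchors = [Anchor(0, 0, 500), Anchor(500, 2500, 500)]
    for name, cost in (("linear", linear_gap), ("concave", concave_gap)):
        s, chain = best_chain(anchors, cost)
        print(name, round(s, 1), len(chain))
```

In this toy case the linear penalty (2,000 for the gap) outweighs the 500 matched bases it would add, so the best chain keeps only one anchor and the deletion is missed; the concave penalty grows slowly with gap length, so both anchors are chained and the 2 kb deletion is exposed. This mirrors the abstract's point that concave gap costs help most for larger variants.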


2017 ◽  
Vol 14 (11) ◽  
pp. 1072-1074 ◽  
Author(s):  
Chuan-Le Xiao ◽  
Ying Chen ◽  
Shang-Qian Xie ◽  
Kai-Ning Chen ◽  
Yan Wang ◽  
...  

2012 ◽  
Vol 30 (7) ◽  
pp. 693-700 ◽  
Author(s):  
Sergey Koren ◽  
Michael C Schatz ◽  
Brian P Walenz ◽  
Jeffrey Martin ◽  
Jason T Howard ◽  
...  

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 961
Author(s):  
Kevin McKernan ◽  
Liam Kane ◽  
Yvonne Helbert ◽  
Lei Zhang ◽  
Nathan Houde ◽  
...  

The Psilocybe genus is well known for the synthesis of valuable psychoactive compounds such as psilocybin, psilocin, baeocystin and aeruginascin. The ubiquity of psilocybin synthesis in Psilocybe has been attributed to horizontal gene transfer of a ~20 kb gene cluster. A recently published, highly contiguous reference genome derived from long-read single-molecule sequencing has underscored interesting variation in this psilocybin synthesis gene cluster. This reference genome has also enabled shotgun sequencing of spores from many Psilocybe strains to better catalog genomic diversity in the psilocybin synthesis pathway. Here we present de novo assemblies of 81 Psilocybe genomes and compare them to the P. envy reference genome. Surprisingly, the genomes of Psilocybe galindoi, Psilocybe tampanensis and Psilocybe azurescens lack sequence coverage over the previously described psilocybin synthesis pathway, but they do show amino acid sequence homology to a less contiguous gene cluster, which may illuminate the previously proposed evolution of psilocybin synthesis.


2021 ◽  
Author(s):  
Pei Wu ◽  
Chao Liu ◽  
Ou Wang ◽  
Xia Zhao ◽  
Fang Chen ◽  
...  

Abstract
In this paper, we report a pipeline, AsmMix, that is capable of producing diploid genome assemblies that are both contiguous and of high quality. The pipeline consists of two steps. In the first step, two sets of assemblies are generated: one from co-barcoded reads, which is highly accurate and haplotype-resolved but contains many gaps, and one from single-molecule sequencing reads, which is contiguous but error-prone. In the second step, the two sets of assemblies are compared and integrated into a haplotype-resolved assembly with fewer errors. We test the pipeline on a dataset from the human genome NA24385, perform variant calling on the resulting assemblies, and compare the calls against the GIAB benchmark. We show that the AsmMix pipeline can produce highly contiguous, accurate, and haplotype-resolved assemblies. In particular, the assembly-mixing step can effectively reduce small-scale errors in the long-read assembly.


2018 ◽  
Author(s):  
Ou Wang ◽  
Robert Chin ◽  
Xiaofang Cheng ◽  
Michelle Ka Wu ◽  
Qing Mao ◽  
...  

Obtaining accurate sequences from long DNA molecules is very important for genome assembly and other applications. Here we describe single tube long fragment read (stLFR), a technology that enables this at low cost. It is based on adding the same barcode sequence to sub-fragments of the original long DNA molecule (DNA co-barcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically non-redundant co-barcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique co-barcoding of over 8 million 20–300 kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phasing into contigs with an N50 of up to 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
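The "3.6 billion" barcode space and the claim of "practically non-redundant" co-barcoding can be checked with back-of-the-envelope arithmetic. The Python sketch below assumes three split-and-pool ligation rounds of 1,536 barcodes each (an assumption about the bead design; the text above states only the total) and treats a sample's 50 million barcodes as uniform random draws from that space (a simplification; real bead synthesis and loading need not be uniform). The names `combinatorial_space` and `shared_barcode_probability` are hypothetical helpers for illustration.

```python
import math

# Assumption: three split-and-pool ligation rounds with 1,536 barcodes each.
BARCODES_PER_ROUND = 1536
ROUNDS = 3


def combinatorial_space(per_round: int, rounds: int) -> int:
    """Number of distinct barcodes from `rounds` independent choices of `per_round`."""
    return per_round ** rounds


def shared_barcode_probability(n_used: int, space: int) -> float:
    """Chance that one barcode, among n_used drawn uniformly at random from
    `space` possibilities, matches at least one of the other n_used - 1."""
    # 1 - (1 - 1/space)^(n_used - 1), computed stably for a very large `space`
    return -math.expm1((n_used - 1) * math.log1p(-1.0 / space))


if __name__ == "__main__":
    space = combinatorial_space(BARCODES_PER_ROUND, ROUNDS)
    print(f"{space:,} possible barcodes")  # 3,623,878,656 possible barcodes
    p = shared_barcode_probability(50_000_000, space)
    print(f"P(barcode shared) ~ {p:.4f}")
```

Under these assumptions, 1,536³ is about 3.6 billion, matching the figure above, and only roughly 1.4% of the 50 million barcodes would be expected to coincide with another one, which is consistent with describing the co-barcoding as practically non-redundant.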

