When Less is More: "Slicing" Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality

2015 ◽  
Author(s):  
Stefano Lonardi ◽  
Hamid Mirebrahim ◽  
Steve Wanamaker ◽  
Matthew Alpert ◽  
Gianfranco Ciardo ◽  
...  

Since the invention of DNA sequencing in the seventies, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate for the first time the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. Specifically, we explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to BAC clones (in the context of the combinatorial pooling design proposed by our group), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases beyond a certain threshold, sequencing errors make these two problems progressively harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades as more data are added. For the first problem, we propose an effective solution based on "divide and conquer": we "slice" a large dataset into smaller samples of optimal size, decode each slice independently, then merge the results. Experimental results on over 15,000 barley BACs and over 4,000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
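The divide-and-conquer scheme the abstract describes (slice the dataset, decode each slice independently, merge the results) can be sketched in a few lines. The `decode_slice` function below is a hypothetical placeholder standing in for the authors' combinatorial-pooling decoder, not their actual tool.

```python
def slice_reads(reads, slice_size):
    """Yield consecutive slices of at most `slice_size` reads."""
    for start in range(0, len(reads), slice_size):
        yield reads[start:start + slice_size]

def decode_slice(reads):
    # Placeholder: a real pipeline would assign each read to a BAC
    # clone here; we just tag every read with a dummy label.
    return {read: "BAC?" for read in reads}

def slice_and_merge(reads, slice_size):
    """Decode each slice independently, then merge the per-slice results."""
    merged = {}
    for chunk in slice_reads(reads, slice_size):
        merged.update(decode_slice(chunk))
    return merged

reads = [f"read_{i}" for i in range(10)]
result = slice_and_merge(reads, slice_size=4)
print(len(result))  # 10 (all reads decoded across 3 slices)
```

The point of the structure is that each slice stays below the depth threshold at which sequencing errors start to hurt, while the merge step recovers whole-dataset results.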

2015 ◽  
Author(s):  
Ivan Sovic ◽  
Kresimir Krizanovic ◽  
Karolj Skala ◽  
Mile Sikic

The recent emergence of nanopore sequencing technology has set a challenge for established assembly methods, which are not optimized for the combination of read lengths and high error rates of nanopore reads. In this work we assessed how existing de novo assembly methods perform on these reads. We benchmarked three non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers that use third-generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of E. coli K-12, using several sequencing coverages of nanopore data (20x, 30x, 40x and 50x). We assessed the quality of assembly at each of these coverages to estimate the requirements for closed bacterial genome assembly. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data, and perform relatively well at lower nanopore coverages. Furthermore, when coverage is above 40x, all non-hybrid methods correctly assemble the E. coli genome, including a non-hybrid method tailored for Pacific Biosciences reads. While that method requires higher coverage than a method designed specifically for nanopore reads, its running time is significantly lower.
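For context, downsampling a read set to target coverages like the 20x-50x levels tested above reduces to a simple calculation (coverage = total sequenced bases / genome length). The genome size and mean read length below are illustrative assumptions (E. coli K-12 is roughly 4.6 Mbp), not figures taken from the study.

```python
import random

def reads_for_coverage(target_cov, genome_len, read_len):
    """Number of reads needed so reads * read_len / genome_len ≈ target_cov."""
    return round(target_cov * genome_len / read_len)

def downsample(read_ids, n, seed=42):
    """Pick `n` reads uniformly at random (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(read_ids, n)

# ~4.6 Mbp genome, assumed 9 kbp mean nanopore read length:
n = reads_for_coverage(20, genome_len=4_600_000, read_len=9_000)
print(n)  # 10222

subset = downsample([f"read_{i}" for i in range(100)], 20)
print(len(subset))  # 20
```

In practice one would subsample with a dedicated tool rather than in Python, but the arithmetic governing how many reads yield a given coverage is the same.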


2021 ◽  
Author(s):  
Michael A. Martin ◽  
Katia Koelle

An early analysis of SARS-CoV-2 deep-sequencing data that combined epidemiological and genetic data to characterize the transmission dynamics of the virus in and beyond Austria concluded that the size of the virus’s transmission bottleneck was large – on the order of 1000 virions. We performed new computational analyses using these deep-sequenced samples from Austria. Our analyses included characterization of transmission bottleneck sizes across a range of variant calling thresholds and examination of patterns of shared low-frequency variants between transmission pairs in cases where de novo genetic variation was present in the recipient. From these analyses, among others, we found that SARS-CoV-2 transmission bottlenecks are instead likely to be very tight, on the order of 1-3 virions. These findings have important consequences for understanding how SARS-CoV-2 evolves between hosts and the processes shaping genetic variation observed at the population level.
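A back-of-the-envelope illustration of why the bottleneck size matters: under simple binomial sampling of founder virions (an assumption for illustration, not the authors' full inference model), the chance that a low-frequency donor variant is transmitted depends sharply on the number of founding virions.

```python
def transmission_prob(freq, bottleneck):
    """Chance that at least one of `bottleneck` founder virions carries a
    variant present at frequency `freq` in the donor (binomial sampling)."""
    return 1 - (1 - freq) ** bottleneck

# A variant at 2% frequency in the donor:
print(round(transmission_prob(0.02, 2), 3))     # tight bottleneck: 0.04
print(round(transmission_prob(0.02, 1000), 3))  # wide bottleneck: 1.0
```

This is why shared low-frequency variants between transmission pairs are informative: a wide bottleneck would transmit them routinely, while a bottleneck of 1-3 virions almost never would.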


2012 ◽  
Vol 5 (1) ◽  
pp. 338
Author(s):  
Sharon Ben-Zvi ◽  
Adi Givati ◽  
Noam Shomron

2017 ◽  
Vol 26 ◽  
pp. 1-11 ◽  
Author(s):  
Molly M. Rathbun ◽  
Jennifer A. McElhoe ◽  
Walther Parson ◽  
Mitchell M. Holland

2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Lena Veselovska ◽  
Sebastien A. Smallwood ◽  
Heba Saadeh ◽  
Kathleen R. Stewart ◽  
Felix Krueger ◽  
...  

Biology ◽  
2012 ◽  
Vol 1 (2) ◽  
pp. 297-310 ◽  
Author(s):  
Xiaozeng Yang ◽  
Lei Li

2014 ◽  
Vol 2014 ◽  
pp. 1-8
Author(s):  
Momchilo Vuyisich ◽  
Ayesha Arefin ◽  
Karen Davenport ◽  
Shihai Feng ◽  
Cheryl Gleasner ◽  
...  

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.

