When Less is More: "Slicing" Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality

2015 ◽  
Author(s):  
Stefano Lonardi ◽  
Hamid Mirebrahim ◽  
Steve Wanamaker ◽  
Matthew Alpert ◽  
Gianfranco Ciardo ◽  
...  

Since the invention of DNA sequencing in the seventies, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate for the first time the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. Specifically, we explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to BAC clones (in the context of the combinatorial pooling design proposed by our group), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases beyond a certain threshold, sequencing errors make these two problems progressively harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades as more data are added. For the first problem, we propose an effective solution based on "divide and conquer": we "slice" a large dataset into smaller samples of optimal size, decode each slice independently, then merge the results. Experimental results on over 15,000 barley BACs and over 4,000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
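The divide-and-conquer scheme the abstract describes (slice the dataset, decode each slice independently, merge the results) can be sketched in a few lines. The `decode_slice` function below is a hypothetical placeholder standing in for the authors' combinatorial-pooling decoder, not their actual tool.

```python
def slice_reads(reads, slice_size):
    """Yield consecutive slices of at most `slice_size` reads."""
    for start in range(0, len(reads), slice_size):
        yield reads[start:start + slice_size]

def decode_slice(reads):
    # Placeholder: a real pipeline would assign each read to a BAC
    # clone here; we just tag every read with a dummy label.
    return {read: "BAC?" for read in reads}

def slice_and_merge(reads, slice_size):
    """Decode each slice independently, then merge the per-slice results."""
    merged = {}
    for chunk in slice_reads(reads, slice_size):
        merged.update(decode_slice(chunk))
    return merged

reads = [f"read_{i}" for i in range(10)]
result = slice_and_merge(reads, slice_size=4)
print(len(result))  # 10 (all reads decoded across 3 slices)
```

The point of the structure is that each slice stays below the depth threshold at which sequencing errors start to hurt, while the merge step recovers whole-dataset results.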

2015 ◽  
Author(s):  
Ivan Sovic ◽  
Kresimir Krizanovic ◽  
Karolj Skala ◽  
Mile Sikic

The recent emergence of nanopore sequencing technology has set a challenge for established assembly methods, which are not optimized for the combination of read lengths and high error rates of nanopore reads. In this work we assessed how existing de novo assembly methods perform on these reads. We benchmarked three non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers that use third-generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of E. coli K-12, using several sequencing coverages of nanopore data (20x, 30x, 40x and 50x). We assessed the quality of assembly at each of these coverages to estimate the requirements for closed bacterial genome assembly. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data, and perform relatively well at lower nanopore coverages. Furthermore, when coverage is above 40x, all non-hybrid methods correctly assemble the E. coli genome, including a non-hybrid method tailored for Pacific Biosciences reads. While that method requires higher coverage than a method designed specifically for nanopore reads, its running time is significantly lower.
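For context, downsampling a read set to target coverages like the 20x-50x levels tested above reduces to a simple calculation (coverage = total sequenced bases / genome length). The genome size and mean read length below are illustrative assumptions (E. coli K-12 is roughly 4.6 Mbp), not figures taken from the study.

```python
import random

def reads_for_coverage(target_cov, genome_len, read_len):
    """Number of reads needed so reads * read_len / genome_len ≈ target_cov."""
    return round(target_cov * genome_len / read_len)

def downsample(read_ids, n, seed=42):
    """Pick `n` reads uniformly at random (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(read_ids, n)

# ~4.6 Mbp genome, assumed 9 kbp mean nanopore read length:
n = reads_for_coverage(20, genome_len=4_600_000, read_len=9_000)
print(n)  # 10222

subset = downsample([f"read_{i}" for i in range(100)], 20)
print(len(subset))  # 20
```

In practice one would subsample with a dedicated tool rather than in Python, but the arithmetic governing how many reads yield a given coverage is the same.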


2021 ◽  
Author(s):  
Michael A. Martin ◽  
Katia Koelle

An early analysis of SARS-CoV-2 deep-sequencing data that combined epidemiological and genetic data to characterize the transmission dynamics of the virus in and beyond Austria concluded that the size of the virus’s transmission bottleneck was large – on the order of 1000 virions. We performed new computational analyses using these deep-sequenced samples from Austria. Our analyses included characterization of transmission bottleneck sizes across a range of variant calling thresholds and examination of patterns of shared low-frequency variants between transmission pairs in cases where de novo genetic variation was present in the recipient. From these analyses, among others, we found that SARS-CoV-2 transmission bottlenecks are instead likely to be very tight, on the order of 1-3 virions. These findings have important consequences for understanding how SARS-CoV-2 evolves between hosts and the processes shaping genetic variation observed at the population level.
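A back-of-the-envelope illustration of why the bottleneck size matters: under simple binomial sampling of founder virions (an assumption for illustration, not the authors' full inference model), the chance that a low-frequency donor variant is transmitted depends sharply on the number of founding virions.

```python
def transmission_prob(freq, bottleneck):
    """Chance that at least one of `bottleneck` founder virions carries a
    variant present at frequency `freq` in the donor (binomial sampling)."""
    return 1 - (1 - freq) ** bottleneck

# A variant at 2% frequency in the donor:
print(round(transmission_prob(0.02, 2), 3))     # tight bottleneck: 0.04
print(round(transmission_prob(0.02, 1000), 3))  # wide bottleneck: 1.0
```

This is why shared low-frequency variants between transmission pairs are informative: a wide bottleneck would transmit them routinely, while a bottleneck of 1-3 virions almost never would.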


2012 ◽  
Vol 5 (1) ◽  
pp. 338
Author(s):  
Sharon Ben-Zvi ◽  
Adi Givati ◽  
Noam Shomron

2017 ◽  
Vol 26 ◽  
pp. 1-11 ◽  
Author(s):  
Molly M. Rathbun ◽  
Jennifer A. McElhoe ◽  
Walther Parson ◽  
Mitchell M. Holland

2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Lena Veselovska ◽  
Sebastien A. Smallwood ◽  
Heba Saadeh ◽  
Kathleen R. Stewart ◽  
Felix Krueger ◽  
...  

Biology ◽  
2012 ◽  
Vol 1 (2) ◽  
pp. 297-310 ◽  
Author(s):  
Xiaozeng Yang ◽  
Lei Li

2014 ◽  
Vol 2014 ◽  
pp. 1-8
Author(s):  
Momchilo Vuyisich ◽  
Ayesha Arefin ◽  
Karen Davenport ◽  
Shihai Feng ◽  
Cheryl Gleasner ◽  
...  

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.

