Do Read Errors Matter for Genome Assembly?

Mapping Intimacies ◽

10.1101/014399 ◽

2015 ◽

Cited By ~ 5

Author(s):

Ilan Shomorony ◽

Thomas Courtade ◽

David Tse

Keyword(s):

Dna Sequencing ◽

High Throughput ◽

Error Rate ◽

Genome Assembly ◽

Error Rates ◽

Read Length ◽

Basic Question ◽

Sequencing Technologies ◽

Long Reads ◽

High Throughput Dna Sequencing

Abstract: While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read length and error rate in terms of the information needed for the perfect assembly of the genome. Using an adversarial erasure error model, we make progress on this problem by establishing a critical read length, as a function of the genome and the error rate, above which perfect assembly is guaranteed. For several real genomes, including those from the GAGE dataset, we verify that this critical read length is not significantly greater than the read length required for perfect assembly from reads without errors.

Genetics of non-syndromic childhood obesity and the use of high-throughput DNA sequencing technologies

Journal of Diabetes and its Complications ◽

10.1016/j.jdiacomp.2017.04.026 ◽

2017 ◽

Vol 31 (10) ◽

pp. 1549-1561 ◽

Cited By ~ 11

Author(s):

Ana Carolina Proença da Fonseca ◽

Claudio Mastronardi ◽

Angad Johar ◽

Mauricio Arcos-Burgos ◽

Gilberto Paz-Filho

Keyword(s):

Childhood Obesity ◽

Dna Sequencing ◽

High Throughput ◽

Sequencing Technologies ◽

High Throughput Dna Sequencing

Finding Long Tandem Repeats In Long Noisy Reads

Bioinformatics ◽

10.1093/bioinformatics/btaa865 ◽

2020 ◽

Author(s):

Shinichi Morishita ◽

Kazuki Ichikawa ◽

Gene Myers

Keyword(s):

Tandem Repeat ◽

Error Rate ◽

Tandem Repeats ◽

Repeat Unit ◽

Error Rates ◽

De Bruijn Graph ◽

Frequency Distributions ◽

Sequencing Technologies ◽

Long Reads ◽

Repeat Expansions

Abstract

Motivation: Long tandem repeat expansions of more than 1000 nt have been suggested to be associated with diseases, but remain largely unexplored in individual human genomes because read lengths have been too short. However, new long-read sequencing technologies can produce single reads of 10,000 nt or more that can span such repeat expansions, although these long reads have high error rates, of 10%-20%, which complicates the detection of repetitive elements. Moreover, most traditional algorithms for finding tandem repeats are designed to find short tandem repeats (< 1000 nt) and cannot effectively handle the high error rate of long reads in a reasonable amount of time.

Results: Here, we report an efficient algorithm for solving this problem that takes advantage of the length of the repeat. Namely, a long tandem repeat has hundreds or thousands of approximate copies of the repeated unit, so despite the error rate, many short k-mers will be error-free in many copies of the unit. We exploited this characteristic to develop a method for first estimating regions that could contain a tandem repeat, by analyzing the k-mer frequency distributions of fixed-size windows across the target read, followed by an algorithm that assembles the k-mers of a putative region into the consensus repeat unit by greedily traversing a de Bruijn graph. Experimental results indicated that the proposed algorithm largely outperformed Tandem Repeats Finder (TRF), a widely used program for finding tandem repeats, in terms of sensitivity.

Software availability: https://github.com/morisUtokyo/mTR
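The two steps described in this abstract (scoring fixed-size windows by their k-mer frequency distribution, then greedily walking a de Bruijn graph to recover a repeat unit) can be illustrated with a toy sketch. This is not the mTR implementation: the function names, the scoring rule (share of k-mer occurrences that belong to a k-mer seen more than once), the non-overlapping windows, and the threshold are all assumptions chosen for illustration, and the example read is error-free.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeat_score(window, k):
    """Share of k-mer occurrences whose k-mer appears more than once.

    Hypothetical scoring rule: close to 1.0 inside a tandem repeat,
    close to 0.0 in non-repetitive sequence.
    """
    counts = kmer_counts(window, k)
    total = sum(counts.values())
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total if total else 0.0


def candidate_windows(read, window, k, threshold):
    """Start offsets of non-overlapping windows scoring at least threshold."""
    return [
        start
        for start in range(0, len(read) - window + 1, window)
        if repeat_score(read[start:start + window], k) >= threshold
    ]


def greedy_unit(region, k):
    """Greedy walk over the de Bruijn graph of region.

    Start at the most frequent k-mer and repeatedly step to the most
    frequent successor k-mer (overlap of k-1 characters). Stop when the
    walk would revisit a k-mer or no successor occurs in the region.
    The first character of each visited k-mer spells the unit.
    """
    counts = kmer_counts(region, k)
    if not counts:
        return ""
    start = max(counts, key=counts.get)
    path = [start]
    visited = {start}
    while True:
        successors = [path[-1][1:] + base for base in "ACGT"]
        best = max(successors, key=lambda kmer: counts.get(kmer, 0))
        if counts.get(best, 0) == 0 or best in visited:
            break
        path.append(best)
        visited.add(best)
    return "".join(kmer[0] for kmer in path)


if __name__ == "__main__":
    flank = "ACGTTGCA"            # toy non-repetitive flank
    read = flank + "CAG" * 8 + flank
    hits = candidate_windows(read, window=8, k=3, threshold=0.5)
    region = read[hits[0]:hits[-1] + 8]
    print(hits, greedy_unit(region, k=3))
```

On this clean toy read the three windows covering the CAG run are flagged and the walk closes after three k-mers, giving the unit CAG. On a real noisy read the recovered unit may be a rotation of the true unit, and erroneous k-mers would need to be filtered by frequency before the walk.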

Amplicon Sequencing in the Era of Highly-Accurate Long Reads

ARPHA Conference Abstracts ◽

10.3897/aca.4.e65405 ◽

2021 ◽

Vol 4 ◽

Author(s):

Benjamin Callahan

Keyword(s):

Dna Sequencing ◽

New Technology ◽

Amplicon Sequencing ◽

Error Rates ◽

Important Advance ◽

Sequencing Technologies ◽

Long Reads ◽

Long Read

An important advance in DNA sequencing has been the development of long-read sequencing technologies that produce sequencing reads of tens to hundreds of kilobases in length. However, these technologies typically have high (~8%) per-base error rates. Recently, an effectively new technology, which I call highly-accurate long-read sequencing, has been developed that allows for the generation of multi-kilobase reads with extremely high per-base accuracies (>99.9%). I will present and evaluate two such technologies, PacBio HiFi and LoopSeq SLR sequencing, and discuss potential metabarcoding applications of highly-accurate long-read amplicon sequencing in general.

phasebook: haplotype-aware de novo assembly of diploid genomes from long reads

10.1101/2021.07.02.450883 ◽

2021 ◽

Author(s):

Xiao Luo ◽

Xiongbin Kang ◽

Alexander Schoenhuth

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Haplotype Diversity ◽

Read Length ◽

Diploid Genome ◽

Sequencing Technologies ◽

Novel Approach ◽

Long Reads ◽

Long Read

Haplotype-aware diploid genome assembly is crucial in genomics, precision medicine, and many other disciplines. Long-read sequencing technologies have greatly improved genome assembly thanks to advantages of read length. However, current long-read assemblers usually introduce disturbing biases or fail to capture the haplotype diversity of the diploid genome. Here, we present phasebook, a novel approach for reconstructing the haplotypes of diploid genomes from long reads de novo. Benchmarking experiments demonstrate that our method outperforms other approaches in terms of haplotype coverage by large margins, while preserving competitive performance or even achieving advantages in terms of all other aspects relevant for genome assembly.

Application of high-throughput DNA sequencing technology in forensic genetics

Issues of Forensic Science ◽

10.34836/pk.2019.304.1 ◽

2019 ◽

Vol 304 ◽

pp. 64-73

Author(s):

Anna Woźniak ◽

Michał Boroń ◽

Renata Zbieć-Piekarska ◽

Magdalena Spólnicka ◽

...

Keyword(s):

Dna Sequencing ◽

High Throughput ◽

Cost Reduction ◽

Forensic Genetics ◽

Practical Application ◽

Sequencing Technology ◽

Sequencing Technologies ◽

High Throughput Dna Sequencing ◽

Generation Sequencing ◽

Forensic Genetic

The turn of the 20th and 21st centuries marks the beginning of high-throughput DNA sequencing methods, which, owing to increasing efficiency and gradual cost reduction, have revolutionized biomedical research. This article discusses the most popular next-generation sequencing technologies and their practical application in forensic genetic analysis.

Correction to: Ciliate Diversity From Aquatic Environments in the Brazilian Atlantic Forest as Revealed by High-Throughput DNA Sequencing

Microbial Ecology ◽

10.1007/s00248-021-01691-1 ◽

2021 ◽

Author(s):

Noemi M. Fernandes ◽

Pedro H. Campello-Nunes ◽

Thiago S. Paiva ◽

Carlos A. G. Soares ◽

Inácio D. Silva-Neto

Keyword(s):

Dna Sequencing ◽

Atlantic Forest ◽

High Throughput ◽

Aquatic Environments ◽

Brazilian Atlantic Forest ◽

High Throughput Dna Sequencing

Chemiluminiscence sensor for high-throughput DNA sequencing

Procedia Chemistry ◽

10.1016/j.proche.2009.07.272 ◽

2009 ◽

Vol 1 (1) ◽

pp. 1091-1094

Author(s):

A R A Rahman ◽

Shihui Foo ◽

Sanket Goel

Keyword(s):

Dna Sequencing ◽

High Throughput ◽

High Throughput Dna Sequencing

Polymorphism discovery and allele frequency estimation using high-throughput DNA sequencing of target-enriched pooled DNA samples

BMC Genomics ◽

10.1186/1471-2164-13-16 ◽

2012 ◽

Vol 13 (1) ◽

pp. 16 ◽

Cited By ~ 12

Author(s):

Michael P Mullen ◽

Christopher J Creevey ◽

Donagh P Berry ◽

Matt S McCabe ◽

David A Magee ◽

...

Keyword(s):

Dna Sequencing ◽

Allele Frequency ◽

High Throughput ◽

Frequency Estimation ◽

Allele Frequency Estimation ◽

Polymorphism Discovery ◽

Pooled Dna ◽

High Throughput Dna Sequencing

From Genetics to Genomics of Epilepsy

Neurology Research International ◽

10.1155/2012/876234 ◽

2012 ◽

Vol 2012 ◽

pp. 1-18 ◽

Cited By ~ 7

Author(s):

Silvio Garofalo ◽

Marisa Cornacchione ◽

Alfonso Di Costanzo

Keyword(s):

Dna Sequencing ◽

Dna Microarrays ◽

Molecular Cytogenetics ◽

Entire Genome ◽

Sequencing Technologies ◽

Genomic Approach ◽

Molecular Karyotype ◽

High Throughput Dna Sequencing ◽

Laboratory Technology ◽

Near Future

The introduction of DNA microarrays and DNA sequencing technologies in medical genetics and diagnostics has been a challenge that has significantly transformed medical practice and patient management. Because of the great advancements in molecular genetics and the development of simple laboratory technology to identify the mutations in the causative genes, the diagnostic approach to epilepsy has also changed significantly. However, the clinical use of molecular cytogenetics and high-throughput DNA sequencing technologies, which are able to test an entire genome for genetic variants that are associated with the disease, is preparing a further revolution in the near future. Molecular Karyotype and Next-Generation Sequencing have the potential to identify causative genes or loci even in sporadic or non-familial epilepsy cases and may well represent the transition from a genetic to a genomic approach to epilepsy.

Advance in Sequence Chemistry of High-Throughput DNA Sequencing by Synthesis (高通量DNA合成测序化学研究进展)

Bioprocess ◽

10.4236/bp.2012.21001 ◽

2012 ◽

Vol 02 (01) ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

陈婧

Keyword(s):

Dna Sequencing ◽

High Throughput ◽

High Throughput Dna Sequencing ◽

Sequencing By Synthesis