Optimization of de novo transcriptome assembly from next-generation sequencing data

2010 ◽  
Vol 20 (10) ◽  
pp. 1432-1440 ◽  
Author(s):  
Y. Surget-Groba ◽  
J. I. Montoya-Burgos
2012 ◽  
Vol 12 (5) ◽  
pp. 834-845 ◽  
Author(s):  
V. Cahais ◽  
P. Gayral ◽  
G. Tsagkogeorga ◽  
J. Melo-Ferreira ◽  
M. Ballenghien ◽  
...  

2015 ◽  
Vol 43 (7) ◽  
pp. e46-e46 ◽  
Author(s):  
Xutao Deng ◽  
Samia N. Naccache ◽  
Terry Ng ◽  
Scot Federman ◽  
Linlin Li ◽  
...  

Abstract Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarity to annotated genes to confidently predict gene function or homology. Recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also propose new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested using in silico spike-in, clinical and environmental NGS datasets, achieved significantly better contigs than current approaches.


PLoS ONE ◽  
2013 ◽  
Vol 8 (4) ◽  
pp. e62856 ◽  
Author(s):  
Yen-Chun Chen ◽  
Tsunglin Liu ◽  
Chun-Hui Yu ◽  
Tzen-Yuh Chiang ◽  
Chi-Chuan Hwang

2016 ◽  
Vol 4 (2) ◽  
pp. 94-105 ◽  
Author(s):  
Xuan Li ◽  
Yimeng Kong ◽  
Qiong-Yi Zhao ◽  
Yuan-Yuan Li ◽  
Pei Hao

2013 ◽  
Vol 18 (5) ◽  
pp. 500-514 ◽  
Author(s):  
Yiming He ◽  
Zhen Zhang ◽  
Xiaoqing Peng ◽  
Fangxiang Wu ◽  
Jianxin Wang

GigaScience ◽  
2021 ◽  
Vol 10 (7) ◽  
Author(s):  
Michael D Linderman ◽  
Crystal Paudyal ◽  
Musab Shakeel ◽  
William Kelley ◽  
Ali Bashir ◽  
...  

Abstract Background: Structural variants (SVs) play a causal role in numerous diseases but are difficult to detect and accurately genotype (determine zygosity) in whole-genome next-generation sequencing data. SV genotypers that assume that the aligned sequencing data uniformly reflect the underlying SV or use existing SV call sets as training data can only partially account for variant and sample-specific biases. Results: We introduce NPSV, a machine learning–based approach for genotyping previously discovered SVs that uses next-generation sequencing simulation to model the combined effects of the genomic region, sequencer, and alignment pipeline on the observed SV evidence. We evaluate NPSV alongside existing SV genotypers on multiple benchmark call sets. We show that NPSV consistently achieves or exceeds state-of-the-art genotyping accuracy across SV call sets, samples, and variant types. NPSV can specifically identify putative de novo SVs in a trio context and is robust to offset SV breakpoints. Conclusions: Growing SV databases and the increasing availability of SV calls from long-read sequencing make stand-alone genotyping of previously identified SVs an increasingly important component of genome analyses. By treating potential biases as a “black box” that can be simulated, NPSV provides a framework for accurately genotyping a broad range of SVs in both targeted and genome-scale applications.

