An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data

Xutao Deng; Samia N. Naccache; Terry Ng; Scot Federman; Linlin Li; Charles Y. Chiu; Eric L. Delwart

doi:10.1093/nar/gkv002

Nucleic Acids Research, Vol 43 (7), e46, 2015. Cited by ~125.

Keyword(s): Next Generation Sequencing; De Novo Assembly; De Novo; Next Generation Sequencing Data; De Bruijn Graph; Next Generation; Sequencing Data; Short Reads; Ensemble Strategy; Generation Sequencing

Abstract: Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarity to annotated genes to confidently predict gene function or homology. Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also propose new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested using in silico spike-in, clinical and environmental NGS datasets, produces significantly better contigs than current approaches.
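The abstract relies on de Bruijn graph assemblers without describing how they work. As a rough illustration, the sketch below builds a de Bruijn graph from reads (each distinct k-mer is an edge joining its (k-1)-mer prefix to its (k-1)-mer suffix) and spells out unbranched paths as contigs. It is a minimal sketch under stated assumptions, not the authors' ensemble pipeline: it ignores reverse complements, sequencing errors, coverage and cycles made only of unbranched nodes, and the function names `build_de_bruijn` and `contigs_from_graph` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_de_bruijn(reads: Iterable[str], k: int) -> Tuple[Dict[str, Set[str]], Dict[str, int]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is an edge."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    indegree: Dict[str, int] = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            if suffix not in graph[prefix]:  # count each distinct k-mer only once
                graph[prefix].add(suffix)
                indegree[suffix] += 1
    return graph, indegree


def contigs_from_graph(graph: Dict[str, Set[str]], indegree: Dict[str, int]) -> List[str]:
    """Spell out maximal non-branching paths (unitigs) as contig strings.

    Simplification: cycles made only of 1-in/1-out nodes are not reported.
    """
    nodes = set(graph) | set(indegree)

    def is_simple(node: str) -> bool:
        # A node in the middle of an unbranched path has exactly one edge in and one out.
        return indegree.get(node, 0) == 1 and len(graph.get(node, ())) == 1

    contigs: List[str] = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # interior nodes are reached by walking from a path start
        for node in sorted(graph.get(start, ())):
            contig = start + node[-1]
            while is_simple(node):
                node = next(iter(graph[node]))
                contig += node[-1]  # each step extends the contig by one base
            contigs.append(contig)
    return contigs
```

With k = 4, the overlapping reads `ACGTTG` and `GTTGCA` collapse into the single contig `ACGTTGCA`; with k = 3, the reads `AACGT` and `AACGG` branch after `CG` and yield the three contigs `AACG`, `CGG` and `CGT`. Branches of this kind, caused by sequencing errors, repeats or strain variation, are one reason short-read metagenomic assemblies fragment.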

De novo assembly of transcriptome from next-generation sequencing data

Quantitative Biology, Vol 4 (2), pp. 94-105, 2016. doi:10.1007/s40484-016-0069-y. Cited by ~5.

Author(s): Xuan Li; Yimeng Kong; Qiong-Yi Zhao; Yuan-Yuan Li; Pei Hao

Keyword(s): Next Generation Sequencing; De Novo Assembly; De Novo; Next Generation Sequencing Data; Next Generation; Sequencing Data; Generation Sequencing

De novo assembly methods for next generation sequencing data

Tsinghua Science & Technology, Vol 18 (5), pp. 500-514, 2013. doi:10.1109/tst.2013.6616523. Cited by ~12.

Author(s): Yiming He; Zhen Zhang; Xiaoqing Peng; Fangxiang Wu; Jianxin Wang

Keyword(s): Next Generation Sequencing; De Novo Assembly; De Novo; Next Generation Sequencing Data; Next Generation; Sequencing Data; Generation Sequencing

De Novo Genome Assembly of Next-Generation Sequencing Data

Compendium of Plant Genomes - The Brassica rapa Genome, pp. 41-51, 2015. doi:10.1007/978-3-662-47901-8_4.

Author(s): Min Liu; Dongyuan Liu; Hongkun Zheng

Keyword(s): Next Generation Sequencing; Genome Assembly; De Novo; Next Generation Sequencing Data; Next Generation; Sequencing Data; De Novo Genome Assembly; Generation Sequencing

PERHAPS: Paired-End short Reads-based HAPlotyping from next-generation Sequencing data

Briefings in Bioinformatics, 2020. doi:10.1093/bib/bbaa320.

Author(s): Jie Huang; Stefano Pallotti; Qianling Zhou; Marcus Kleber; Xiaomeng Xin; ...

Keyword(s): Next Generation Sequencing; SNP Array; Simple Approach; Next Generation Sequencing Data; Next Generation; Sequencing Data; Short Read; Array Data; Short Reads; Generation Sequencing

Abstract: The identification of rare haplotypes may greatly expand our knowledge of the genetic architecture of both complex and monogenic traits. To this aim, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the classic APOE polymorphism (*1/*2/*3/*4), since it is one of the best examples of a functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the large Whole Exome Sequencing (WES) and SNP-array datasets from the multi-ethnic UK Biobank (UKBB, N=48,855). By applying PERHAPS, which pieces together paired-end reads according to their FASTQ labels, we extracted the haplotype data, along with their frequencies and each individual's diplotype. Concordance rates between diplotypes called directly from WES and those generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), whether the sample is stratified by SNP-array genotyping batch or by self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of the obtained haplotype frequencies with those available from the 1000 Genomes Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximately 0.5%) in the African Yoruba population. Despite some acknowledged technical shortcomings, PERHAPS is a novel and simple approach that will partly overcome the limitations of direct haplotype calling from short-read sequencing.
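The core idea summarized above is that both mates of a read pair share one FASTQ label, so alleles seen on either mate can be joined into a single haplotype observation. The sketch below illustrates that idea for the two APOE SNPs; it is not the PERHAPS implementation. It assumes a hypothetical input of (read name, SNP id, base) tuples already extracted from aligned reads, the older `/1`/`/2` mate-suffix naming convention (newer Illumina names put the mate number in a separate comment field instead), and the standard mapping of rs429358/rs7412 alleles to APOE*1 through *4. The helper names `mate_key` and `call_haplotypes` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple

# Standard APOE haplotypes as (rs429358, rs7412) allele pairs.
APOE_HAPLOTYPES = {
    ("T", "T"): "APOE*2",
    ("T", "C"): "APOE*3",
    ("C", "C"): "APOE*4",
    ("C", "T"): "APOE*1",
}


def mate_key(read_name: str) -> str:
    """Strip a trailing /1 or /2 mate suffix so both mates share one key (assumed naming)."""
    if read_name.endswith(("/1", "/2")):
        return read_name[:-2]
    return read_name


def call_haplotypes(observations: Iterable[Tuple[str, str, str]]) -> Counter:
    """Merge mates by their shared label and count APOE haplotypes.

    A pair contributes only if together its mates cover both SNPs and never
    disagree at the same SNP.
    """
    merged = defaultdict(dict)
    conflicts = set()
    for read_name, snp_id, base in observations:
        key = mate_key(read_name)
        seen = merged[key].get(snp_id)
        if seen is not None and seen != base:
            conflicts.add(key)  # mates disagree at one SNP: discard this pair
        merged[key][snp_id] = base

    counts: Counter = Counter()
    for key, alleles in merged.items():
        if key in conflicts:
            continue
        pair = (alleles.get("rs429358"), alleles.get("rs7412"))
        haplotype = APOE_HAPLOTYPES.get(pair)  # None if a SNP is not covered
        if haplotype is not None:
            counts[haplotype] += 1
    return counts
```

For example, a pair whose first mate reports T at rs429358 and whose second mate reports C at rs7412 is counted once as APOE*3, while pairs covering only one SNP, or carrying conflicting calls at the same SNP, are skipped.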

Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly

PLoS ONE, Vol 8 (4), e62856, 2013. doi:10.1371/journal.pone.0062856. Cited by ~121.

Author(s): Yen-Chun Chen; Tsunglin Liu; Chun-Hui Yu; Tzen-Yuh Chiang; Chi-Chuan Hwang

Keyword(s): Next Generation Sequencing; Genome Assembly; De Novo; Next Generation Sequencing Data; Next Generation; Sequencing Data; De Novo Genome Assembly; GC Bias; Generation Sequencing

Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data

PLoS ONE, Vol 8 (4), e60204, 2013. doi:10.1371/journal.pone.0060204. Cited by ~42.

Author(s): Aarti Desai; Veer Singh Marwah; Akshay Yadav; Vineet Jha; Kishor Dhaygude; ...

Keyword(s): Next Generation Sequencing; Genome Assembly; De Novo; Sequencing Depth; Next Generation Sequencing Data; Next Generation; Sequencing Data; De Novo Genome Assembly; Generation Sequencing

Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

Pharmaceutics, Vol 7 (4), pp. 523-541, 2015. doi:10.3390/pharmaceutics7040523. Cited by ~15.

Author(s): Hao Ye; Joe Meehan; Weida Tong; Huixiao Hong

Keyword(s): Next Generation Sequencing; Precision Medicine; Next Generation Sequencing Data; Next Generation; Sequencing Data; Crucial Step; Short Reads; Generation Sequencing

NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data

GigaScience, Vol 10 (7), 2021. doi:10.1093/gigascience/giab046.

Author(s): Michael D Linderman; Crystal Paudyal; Musab Shakeel; William Kelley; Ali Bashir; ...

Keyword(s): Next Generation Sequencing; De Novo; Training Data; Next Generation Sequencing Data; Whole Genome Sequencing Data; Whole Genome; Next Generation; Structural Variants; Sequencing Data; Generation Sequencing

Abstract:
Background: Structural variants (SVs) play a causal role in numerous diseases but are difficult to detect and accurately genotype (determine zygosity) in whole-genome next-generation sequencing data. SV genotypers that assume that the aligned sequencing data uniformly reflect the underlying SV, or that use existing SV call sets as training data, can only partially account for variant- and sample-specific biases.
Results: We introduce NPSV, a machine learning–based approach for genotyping previously discovered SVs that uses next-generation sequencing simulation to model the combined effects of the genomic region, sequencer, and alignment pipeline on the observed SV evidence. We evaluate NPSV alongside existing SV genotypers on multiple benchmark call sets. We show that NPSV consistently achieves or exceeds state-of-the-art genotyping accuracy across SV call sets, samples, and variant types. NPSV can specifically identify putative de novo SVs in a trio context and is robust to offset SV breakpoints.
Conclusions: Growing SV databases and the increasing availability of SV calls from long-read sequencing make stand-alone genotyping of previously identified SVs an increasingly important component of genome analyses. By treating potential biases as a “black box” that can be simulated, NPSV provides a framework for accurately genotyping a broad range of SVs in both targeted and genome-scale applications.

Optimization of de novo transcriptome assembly from next-generation sequencing data

Genome Research, Vol 20 (10), pp. 1432-1440, 2010. doi:10.1101/gr.103846.109. Cited by ~259.

Author(s): Y. Surget-Groba; J. I. Montoya-Burgos

Keyword(s): Next Generation Sequencing; De Novo; Transcriptome Assembly; Next Generation Sequencing Data; De Novo Transcriptome Assembly; Next Generation; Sequencing Data; De Novo Transcriptome; Generation Sequencing

Faculty Opinions recommendation of VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2014. doi:10.3410/f.718272765.793499663.

Author(s): Gary Bader; Mohamed Helmy

Keyword(s): Next Generation Sequencing; Network Analysis; Next Generation Sequencing Data; Cancer Genes; Next Generation; Sequencing Data; Generation Sequencing
