Reducing the number of artifactual repeats in de novo assembly of RNA-Seq data by optimizing the assembly pipeline

A practical guide to build de-novo assemblies for single tissues of non-model organisms: the example of a Neotropical frog

PeerJ ◽

10.7717/peerj.3702 ◽

2017 ◽

Vol 5 ◽

pp. e3702 ◽

Cited By ~ 5

Author(s):

Santiago Montero-Mendieta ◽

Manfred Grabherr ◽

Henrik Lantz ◽

Ignacio De la Riva ◽

Jennifer A. Leonard ◽

...

Keyword(s):

Defense Mechanisms ◽

De Novo ◽

Transcriptome Assembly ◽

Cost Effective ◽

Model Organisms ◽

Rna Seq ◽

Assembly Pipeline ◽

Wide Variability ◽

History Of ◽

Inexperienced User

Whole genome sequencing (WGS) is a very valuable resource for understanding the evolutionary history of poorly known species. However, in organisms with large genomes, as is the case for most amphibians, WGS is still excessively challenging, and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome, so the transcriptome must be assembled de novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to the immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de novo transcriptome assemblies using readily available software, and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.
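Expressed as fractions of the 422,999 putative genes, the annotation counts above are modest. The short Python calculation below derives these percentages, and the ratio of assembled transcripts to putative genes, using only the numbers given in the abstract; the derived values are computed here and are not reported by the authors.

```python
from __future__ import annotations

# Counts copied from the abstract; everything computed below is derived, not reported.
TOTAL_GENES = 422_999        # putative genes
TOTAL_TRANSCRIPTS = 550_871  # assembled transcripts

ANNOTATED = {
    "Pfam": 23_500,
    "EggNOG": 37_349,
    "KEGG": 38_120,
    "GO": 45_885,
}


def annotation_rates(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of `total` putative genes with a hit in each database."""
    return {db: 100.0 * n / total for db, n in counts.items()}


if __name__ == "__main__":
    for db, pct in annotation_rates(ANNOTATED, TOTAL_GENES).items():
        print(f"{db}: {pct:.1f}% of putative genes")
    # Average number of assembled transcripts (isoforms) per putative gene.
    print(f"Transcripts per putative gene: {TOTAL_TRANSCRIPTS / TOTAL_GENES:.2f}")
```

This gives roughly 5.6% (Pfam), 8.8% (EggNOG), 9.0% (KEGG) and 10.8% (GO) of putative genes annotated, and about 1.30 transcripts per putative gene.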

Optimizing de novo genome assembly from PCR-amplified metagenomes

10.7287/peerj.preprints.27453 ◽

2018 ◽

Author(s):

Simon Roux ◽

Gareth Trubl ◽

Danielle Goudeau ◽

Nandita Nath ◽

Estelle Couradeau ◽

...

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Pcr Amplification ◽

Error Rates ◽

De Novo Genome Assembly ◽

Low Input ◽

Assembly Algorithm ◽

Coverage Bias ◽

Assembly Pipeline

Background. Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest that they typically yield smaller and more fragmented assemblies than regular metagenomes. Methods. Here we evaluate de novo assembly methods on 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are best suited for PCR-amplified metagenomes. Results. Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets than in unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. 
We found that a pipeline combining read deduplication with an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved the assembly of contigs ≥ 10 kb by 10- to 100-fold for low input metagenomes. Conclusions. PCR-amplified metagenomes have enabled scientists to explore communities that are traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to improved de novo genome assembly from PCR-amplified datasets, and enables better genome recovery from low input metagenomes.
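Two parts of this evaluation are simple enough to sketch in Python: the assembly-size metric (total bp in contigs ≥ 10 kb) and read deduplication. The abstract does not name the deduplication tool or its criteria, so the sketch below assumes exact-match removal of identical read pairs; the intent is that collapsing duplicated pairs trims the excess depth that PCR presumably concentrates on short inserts. The function names and the `ReadPair` alias are hypothetical, and the single-cell SPAdes assembly step itself is not reproduced.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical alias: a read pair as (R1 sequence, R2 sequence).
ReadPair = Tuple[str, str]


def assembly_size(contig_lengths: Iterable[int], min_len: int = 10_000) -> int:
    """Total bp in contigs of at least `min_len` bp (the size metric in the abstract)."""
    return sum(length for length in contig_lengths if length >= min_len)


def dedup_exact_pairs(pairs: Iterable[ReadPair]) -> List[ReadPair]:
    """Keep the first occurrence of each identical (R1, R2) pair, preserving order.

    Assumption: exact-sequence deduplication only. Pairs differing by a single
    sequencing error, or duplicates read from the opposite strand, are kept.
    """
    seen: Set[ReadPair] = set()
    kept: List[ReadPair] = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


if __name__ == "__main__":
    print(assembly_size([12_500, 9_999, 10_000, 3_200, 48_000]))
    reads = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("GGCC", "AATT")]
    print(dedup_exact_pairs(reads))
```

For contig lengths of 12,500, 9,999, 10,000, 3,200 and 48,000 bp, `assembly_size` returns 70,500: the 9,999 bp and 3,200 bp contigs fall below the 10 kb threshold.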

Author Correction: De novo assembly of the Platycladus orientalis (L.) Franco transcriptome provides insight into the development and pollination mechanism of female cone based on RNA-Seq data

Scientific Reports ◽

10.1038/s41598-019-53777-z ◽

2019 ◽

Vol 9 (1) ◽

Author(s):

Wei Zhou ◽

Qi Chen ◽

Xiao-Bing Wang ◽

Tyler O. Hughes ◽

Jian-Jun Liu ◽

...

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Rna Seq ◽

Female Cone ◽

Pollination Mechanism ◽

Platycladus Orientalis ◽

Insight Into

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

De novo assembly of red clover transcriptome based on RNA-Seq data provides insight into drought response, gene discovery and marker identification

BMC Genomics ◽

10.1186/1471-2164-15-453 ◽

2014 ◽

Vol 15 (1) ◽

pp. 453 ◽

Cited By ~ 83

Author(s):

Steven A Yates ◽

Martin T Swain ◽

Matthew J Hegarty ◽

Igor Chernukin ◽

Matthew Lowe ◽

...

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Gene Discovery ◽

Red Clover ◽

Drought Response ◽

Rna Seq ◽

Response Gene ◽

Insight Into ◽

Marker Identification

De novo assembly of highly polymorphic metagenomic data using in situ generated reference sequences and a novel BLAST-based assembly pipeline

BMC Bioinformatics ◽

10.1186/s12859-017-1630-z ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 9

Author(s):

You-Yu Lin ◽

Chia-Hung Hsieh ◽

Jiun-Hong Chen ◽

Xuemei Lu ◽

Jia-Horng Kao ◽

...

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Metagenomic Data ◽

Assembly Pipeline ◽

Reference Sequences

Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq

Science China Life Sciences ◽

10.1007/s11427-013-4442-z ◽

2013 ◽

Vol 56 (2) ◽

pp. 143-155 ◽

Cited By ~ 37

Author(s):

BingXin Lu ◽

ZhenBing Zeng ◽

TieLiu Shi

Keyword(s):

Comparative Study ◽

De Novo Assembly ◽

De Novo ◽

Rna Seq ◽

Transcriptome Reconstruction ◽

Guided Assembly

De novo assembly and characterisation of the field pea transcriptome using RNA-Seq

BMC Genomics ◽

10.1186/s12864-015-1815-7 ◽

2015 ◽

Vol 16 (1) ◽

Cited By ~ 31

Author(s):

Shimna Sudheesh ◽

Timothy I. Sawbridge ◽

Noel OI Cogan ◽

Peter Kennedy ◽

John W. Forster ◽

...

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Field Pea ◽

Rna Seq

In Silico identification and annotation of non-coding RNAs by RNA-seq and De Novo assembly of the transcriptome of Tomato Fruits

PLoS ONE ◽

10.1371/journal.pone.0171504 ◽

2017 ◽

Vol 12 (2) ◽

pp. e0171504 ◽

Cited By ~ 14

Author(s):

Daria Scarano ◽

Rosa Rao ◽

Giandomenico Corrado

Keyword(s):

De Novo Assembly ◽

In Silico ◽

De Novo ◽

Rna Seq ◽

Tomato Fruits ◽

In Silico Identification ◽

Non Coding Rnas

Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies

10.7287/peerj.preprints.2284 ◽

2016 ◽

Author(s):

Cédric Cabau ◽

Frédéric Escudié ◽

Anis Djari ◽

Yann Guiguen ◽

Julien Bobe ◽

...

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Error Rates ◽

Rna Seq ◽

De Novo Transcriptome ◽

Software Packages ◽

Redundancy Reduction ◽

Assembly Pipeline ◽

Free Open Source

Background. De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even when their results are of good quality, they can still be improved in several ways, including redundancy reduction and error correction. Trinity and Oases are two commonly used de novo transcriptome assemblers. The contig sets they produce are of good quality; still, their compaction (the number of contigs needed to represent the transcriptome) and their quality (chimera and nucleotide error rates) can be improved. Results. We built a de novo RNA-Seq Assembly Pipeline (DRAP) that wraps these two assemblers (Trinity and Oases) in order to improve their results with respect to the above-mentioned criteria. Depending on the read set and the assembler used, DRAP reduces the number of resulting contigs by 1.3- to 15-fold. This article presents seven assembly comparisons, some of which show drastic improvements when using DRAP. DRAP does not significantly impair assembly quality metrics such as read realignment rate or protein reconstruction counts. Conclusion. Transcriptome assembly is a challenging computational task. Even though good solutions are already available to end-users, they can still be improved while conserving the overall representation and quality of the assembly. The de novo RNA-Seq Assembly Pipeline (DRAP) is an easy-to-use software package that produces compact and corrected transcript sets. DRAP is free, open-source and available at http://www.sigenae.org/drap .
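The abstract does not describe how DRAP compacts the Trinity and Oases contig sets. As a generic illustration of redundancy reduction, and explicitly not DRAP's algorithm, the Python sketch below drops any contig whose sequence, on either strand, is an exact substring of a contig already kept, then computes the fold reduction in contig count, the quantity the abstract reports as 1.3- to 15-fold. All function names are hypothetical; practical redundancy-reduction tools typically also tolerate mismatches and partial overlaps, which this exact-match version does not.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_contained(contigs: List[str]) -> List[str]:
    """Remove contigs contained, on either strand, in a kept contig.

    Contigs are processed longest first (ties keep input order), so every kept
    contig is at least as long as the ones tested against it. Naive O(n^2)
    exact substring search, suitable only for small illustrations.
    """
    kept: List[str] = []
    for seq in sorted(contigs, key=len, reverse=True):
        rc = revcomp(seq)
        if not any(seq in k or rc in k for k in kept):
            kept.append(seq)
    return kept


def compaction_fold(n_before: int, n_after: int) -> float:
    """Fold reduction in contig count (e.g. 1.3 means 1.3 times fewer contigs)."""
    return n_before / n_after


if __name__ == "__main__":
    contigs = ["ACGTACGTAA", "CGTACG", "TTACGTACGT", "GGGGCCCC"]
    kept = drop_contained(contigs)
    print(kept)
    print(compaction_fold(len(contigs), len(kept)))
```

In this toy input, `TTACGTACGT` is the reverse complement of `ACGTACGTAA` and `CGTACG` is a substring of it, so two of the four contigs remain, a 2.0-fold compaction.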

RNA-Seq De Novo Assembly of Clonal Immunoglobulin Rearrangements Identifies Interesting Biology and Uncovers Prognostic Features in Multiple Myeloma

Blood ◽

10.1182/blood.v128.22.195.195 ◽

2016 ◽

Vol 128 (22) ◽

pp. 195-195

Author(s):

David Mosen-Ansorena ◽

Rachael Bashford-Rogers ◽

Niccolo Bolli ◽

Stephane Minvielle ◽

Florence Magrangeas ◽

...

Keyword(s):

Multiple Myeloma ◽

Light Chain ◽

De Novo Assembly ◽

Poor Prognosis ◽

De Novo ◽

Clinical Course ◽

Rna Seq ◽

Prognostic Features ◽

Vh Gene ◽

Cluster A

Introduction. Although monoclonal immunoglobulin (Ig) production by myeloma cells is one of the central features of the disease, genotypic identification of the clonal Ig sequence remains understudied in multiple myeloma (MM). Here, using extensive RNA-seq data, we study molecular features of clonal Ig rearrangements, as well as their association with other MM markers and patient outcome. Methods. We performed deep RNA-seq on purified CD138+ MM cells from 429 newly diagnosed, uniformly treated patients with long clinical follow-up. For each sample, we performed de novo assembly using sequences that appeared in the library at a frequency of at least one in a million. Germline V and J genes were then BLASTed against the assembled contigs to determine the clonal germline genes and pinpoint mutations. Using the sequences reconstructed from the Ig contigs and the BLAST output, we ran IgBLAST to fully characterize the predominant Ig V(D)J sequence. Results. We tested the accuracy of our approach on 24 technical duplicates and one triplicate. In all cases, the predicted gene and allele were consistent across replicates. Next, we evaluated our large patient cohort, identifying IGHV3 as the most common clonal VH gene subgroup (53.3%), followed by IGHV4 (17.8%) and IGHV1 (15.6%). Importantly, we observed a significant association between IGHV3 and poorer prognosis, for both progression-free survival (PFS) (p=0.0019) and overall survival (OS) (p=0.012). IGHV3-30 (11%, the most commonly rearranged VH gene) and IGHV3-9 (4.8%) drove this poor prognosis (IGHV3-30: PFS p=0.021, OS p=0.013; IGHV3-9: PFS p=0.002). IGHV3-30 was rearranged even more often than in normal B-cell VH repertoires, both from previous studies (8.5%, 6.3%) and from ours (2%). Remarkably, these results sharply contrast with what has been observed in CLL. 
In this malignancy, IGHV3-30 use has been reported to be underrepresented and usually characterizes an indolent clinical course, while IGHV3-21 and possibly IGHV3-23 carry a poor prognosis. We predicted light chain usage from the presence of clonal VL sequences. The most frequent VL genes were from the κ locus (69.4% total): IGKV1-33 (12.4%), IGKV1-5 (11.3%), IGKV3-20 (9.9%) and IGKV1-39 (8.0%). Del(22q) was observed more frequently in patients with IGλ (OR=10.0, p=6e-15) and, within this group, del(22q) was more frequent if Vλ belonged to the more centromeric V-clusters C or B, in contrast to cluster A (OR=8.4, p=5e-4). Remarkably, patients with a Vλ gene from cluster A presented worse OS (vs. Vκ: p=0.0079; vs. Vλ B,C: p=0.067). The proportion of mutated bases was higher in the heavy chain than in the light chain (mean 7.0% vs. 4.8%, max 14.6% vs. 14.3%), and it was associated with OS (heavy p=0.0020, light p=0.036, both p=0.0056), but not with PFS. Interestingly, mutated Ig in CLL results in a more benign clinical course. We further found that 24.9% and 22.7% of the mutations lay within WRCY or RGYW AID motifs in the light and heavy chains, respectively (enrichment p<1e-16), while AID mutations in a TW or WA context accounted for 22.9% and 25.7% (p=0.14, p=0.64). Higher ratios of mutations in WRCY vs. RGYW motifs within the light chain were highly predictive of poor prognosis (PFS p=0.0019, OS p=6.3e-4). Strikingly, IGλ usage was linked to higher ratios (p=3e-6), an association not explained by germline sequence variability (p=0.24). The usage of IGHV3 genes and the AID WRCY/RGYW motif ratio were independent of each other (p=1) and of other markers of poor prognosis in MM, such as the presence of either t(4;14) or del(17p) (IGHV3 p=0.10; motif ratio p=0.49). In conclusion, de novo Ig heavy and light chain assembly using RNA-seq identifies interesting biology, may provide MM markers and highlights a novel application of high-throughput genomics. 
Disclosures. Anderson: OncoPep Inc.: Equity Ownership, Membership on an entity's Board of Directors or advisory committees. Avet-Loiseau: Sanofi: Consultancy; Celgene: Consultancy; Amgen: Consultancy; Janssen: Consultancy.
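The WRCY vs. RGYW mutation ratio described in this abstract can be sketched in Python. This is an illustrative sketch, not the authors' pipeline, and it rests on two stated assumptions: the germline and assembled Ig sequences are aligned without gaps, so mutations can be read off position by position; and a mutation counts toward a motif only when the mutated germline base is the motif's C (WRCY) or G (RGYW), the conventional AID hotspot position, whereas the abstract says only "within". IUPAC codes used: W = A/T, R = A/G, Y = C/T. All function names are hypothetical.

```python
from typing import List, Optional

W, R, Y = set("AT"), set("AG"), set("CT")  # IUPAC: W=A/T, R=A/G, Y=C/T


def mismatch_positions(germline: str, observed: str) -> List[int]:
    """Positions where two gaplessly aligned, equal-length sequences differ."""
    if len(germline) != len(observed):
        raise ValueError("sketch assumes a gapless alignment of equal length")
    return [i for i, (g, o) in enumerate(zip(germline, observed)) if g != o]


def at_wrcy_c(seq: str, i: int) -> bool:
    """True if seq[i] is the C of a WRCY motif in seq."""
    return (
        2 <= i < len(seq) - 1
        and seq[i - 2] in W and seq[i - 1] in R
        and seq[i] == "C" and seq[i + 1] in Y
    )


def at_rgyw_g(seq: str, i: int) -> bool:
    """True if seq[i] is the G of an RGYW motif (the reverse complement of WRCY)."""
    return (
        1 <= i < len(seq) - 2
        and seq[i - 1] in R and seq[i] == "G"
        and seq[i + 1] in Y and seq[i + 2] in W
    )


def wrcy_rgyw_ratio(germline: str, observed: str) -> Optional[float]:
    """Mutations at WRCY C's divided by mutations at RGYW G's (None if no RGYW hits)."""
    positions = mismatch_positions(germline, observed)
    wrcy = sum(at_wrcy_c(germline, i) for i in positions)
    rgyw = sum(at_rgyw_g(germline, i) for i in positions)
    return wrcy / rgyw if rgyw else None


if __name__ == "__main__":
    print(mismatch_positions("AGCTAGCT", "AATTAGTT"))
    print(wrcy_rgyw_ratio("AGCTAGCT", "AATTAGTT"))
```

With germline AGCTAGCT and observed AATTAGTT, mismatches fall at positions 1, 2 and 6; two hit the C of a WRCY motif and one hits the G of an RGYW motif, giving a ratio of 2.0. AGCT is its own reverse complement, so it carries both motifs.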
