Proteomic Identification and Meta-Analysis in Salvia hispanica RNA-Seq de novo Assemblies

Ashwil Klein; Lizex H. H. Husselmann; Achmat Williams; Liam Bell; Bret Cooper; Brent Ragar; David L. Tabb

doi:10.3390/plants10040765

Plants, 2021, Vol 10 (4), pp. 765

Keyword(s): Genome Sequence; Protein Sequence; Seed Protein; De Novo; Meta Analysis; Model Organisms; RNA-Seq; Proteomics Experiment; Good Identification; Taxonomic Order

While proteomics has demonstrated its value for model organisms and for organisms with mature genome sequence annotations, proteomics has been of less value in nonmodel organisms that are unaccompanied by genome sequence annotations. This project sought to determine the value of RNA-Seq experiments as a basis for establishing a set of protein sequences to represent a nonmodel organism, in this case, the pseudocereal chia. Assembling four publicly available chia RNA-Seq datasets produced transcript sequence sets with a high BUSCO completeness, though the number of transcript sequences and Trinity “genes” varied considerably among them. After six-frame translation, ProteinOrtho detected substantial numbers of orthologs among other species within the taxonomic order Lamiales. These protein sequence databases demonstrated a good identification efficiency for three different LC-MS/MS proteomics experiments, though a seed proteome showed considerable variability in the identification of peptides based on seed protein sequence inclusion. If a proteomics experiment emphasizes a particular tissue, an RNA-Seq experiment incorporating that same tissue is more likely to support a database search identification of that proteome.
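
The six-frame translation step described above, which turns transcript sequences into a searchable protein database, can be sketched in Python. This is a minimal illustration that assumes the standard genetic code (NCBI translation table 1) and DNA input. The function names are hypothetical, and the code is not the authors' pipeline.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# first base varying slowest, third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stop codons, 'X' marks unknown or ambiguous codons."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict[str, str]:
    """Return the three forward (+1..+3) and three reverse-complement (-1..-3) translations."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames
```

Before a database search, each frame could then be split at stop codons ('*') into candidate protein segments. The minimum segment length to keep is a design choice that is not specified here.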

A practical guide to build de-novo assemblies for single tissues of non-model organisms: the example of a Neotropical frog

Santiago Montero-Mendieta; Manfred Grabherr; Henrik Lantz; Ignacio De la Riva; Jennifer A. Leonard; ...

doi:10.7717/peerj.3702

PeerJ, 2017, Vol 5, pp. e3702. Cited by ~5

Keyword(s): Defense Mechanisms; De Novo; Transcriptome Assembly; Cost Effective; Model Organisms; RNA-Seq; Assembly Pipeline; Wide Variability; History Of; Inexperienced User

Whole genome sequencing (WGS) is a very valuable resource for understanding the evolutionary history of poorly known species. However, in organisms with large genomes, as in most amphibians, WGS is still excessively challenging, and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome, and the transcriptome must be assembled de novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to the immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user intuitively through all the necessary steps to build de novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.
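
The abstract distinguishes assembled transcripts from putative genes. Some assemblers, such as Trinity, group isoforms under a shared gene identifier. The abstract does not name the assembler, so the hypothetical helper below simply assumes Trinity-style FASTA headers (for example `TRINITY_DN10_c0_g1_i2`) and shows how the two counts can be derived from the same file.

```python
import re

# Assumed Trinity-style isoform suffix, e.g. "_i2" in "TRINITY_DN10_c0_g1_i2".
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def count_transcripts_and_genes(fasta_headers):
    """Count distinct transcripts and distinct 'genes' (isoform suffix stripped)."""
    transcripts = set()
    genes = set()
    for header in fasta_headers:
        # Keep only the identifier: drop the leading '>' and any description after whitespace.
        ident = header.lstrip(">").split()[0]
        transcripts.add(ident)
        genes.add(ISOFORM_SUFFIX.sub("", ident))
    return len(transcripts), len(genes)
```

Because each "gene" can carry several isoforms, the transcript count is always at least the gene count. The gap between the two reflects how many alternative isoforms the assembler reported.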

De novo transcriptomes of 14 gammarid individuals for proteogenomic analysis of seven taxonomic groups

Yannick Cogne; Davide Degli-Esposti; Olivier Pible; Duarte Gouveia; Adeline François; ...

doi:10.1038/s41597-019-0192-5

Scientific Data, 2019, Vol 6 (1). Cited by ~5

Keyword(s): Aquatic Ecosystems; Molecular Diversity; De Novo; Model Organisms; RNA-Seq; Specific Expression; Gammarus Fossarum; Male Individual; Taxonomic Groups; Whole Transcriptome

Gammarids are amphipods distributed worldwide in fresh and marine waters. They play an important role in aquatic ecosystems and are well-established sentinel species in ecotoxicology. In this study, we sequenced the transcriptomes of one male and one female individual for each of seven different taxonomic groups belonging to the two genera Gammarus and Echinogammarus: Gammarus fossarum A, G. fossarum B, G. fossarum C, Gammarus wautieri, Gammarus pulex, Echinogammarus berilloni, and Echinogammarus marinus. These taxa were chosen to explore the molecular diversity of transcribed genes in genotyped individuals from these groups. Transcriptomes were assembled de novo and annotated. High-quality assembly was confirmed by BUSCO comparison against the Arthropod dataset. The 14 RNA-Seq-derived protein sequence databases proposed here will be a significant resource for proteogenomics studies of these ecotoxicologically relevant non-model organisms. These transcriptomes represent reliable reference sequences for whole-transcriptome and proteome studies on other gammarids, for primer design to clone specific genes or monitor their specific expression, and for analyses of molecular differences between gammarid species.

rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data

Elena Bushmanova; Dmitry Antipov; Alla Lapidus; Andrey D. Prjibelski

doi:10.1101/420208

2018. Cited by ~13

Keyword(s): De Novo; Transcriptome Assembly; Model Organisms; Challenging Problem; RNA-Seq; De Novo Transcriptome; Weak Points; Transcriptome Reconstruction; Evaluation Approaches; Genome Assembler

Summary: The ability to generate large RNA-seq datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in transcriptomic studies, their application is limited to model organisms with finished and annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, complicated by varying expression levels across genes, alternative splicing and paralogous genes. In this paper we describe a novel transcriptome assembler called rnaSPAdes, which is built on top of the SPAdes genome assembler and explores surprising computational parallels between the assembly of transcriptomes and that of single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-Seq datasets, and briefly highlight the strong and weak points of different assemblers.

Availability and implementation: rnaSPAdes is implemented in C++ and Python and is freely available at cab.spbu.ru/software/rnaspades/.
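
rnaSPAdes is built on the SPAdes genome assembler, which works on a de Bruijn graph of k-mers. The textbook construction below is a sketch and not rnaSPAdes code. It records each k-mer as an edge from its (k-1)-mer prefix to its (k-1)-mer suffix and keeps a count per edge. In transcriptome data these counts vary with expression level, which is one way uneven coverage enters the graph. Single-cell genome data show a similar unevenness.

```python
from collections import Counter


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph as a Counter of edges.

    Each k-mer in a read becomes an edge (prefix, suffix), where prefix and suffix
    are its first and last k-1 bases. The count is how often that k-mer was seen,
    i.e. its coverage.
    """
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges
```

For example, `de_bruijn_edges(["ACGT", "ACGT", "CGTA"], 3)` gives the edge `("CG", "GT")` a count of 3, because the k-mer `CGT` occurs in all three reads. Real assemblers add error correction, graph simplification and path resolution on top of this structure.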

Genome sequencing of the multicellular alga Astrephomene provides insights into convergent evolution of germ-soma differentiation

Shota Yamashita; Kayoko Yamamoto; Ryo Matsuzaki; Shigekatsu Suzuki; Haruyo Yamaguchi; ...

doi:10.1038/s41598-021-01521-x

Scientific Reports, 2021, Vol 11 (1)

Keyword(s): Genome Sequence; Convergent Evolution; Tandem Duplication; De Novo; Molecular Genetic; Somatic Cells; Whole Genome Sequence; RNA-Seq; Genome Data; Genes Encoding

Germ-soma differentiation evolved independently in many eukaryotic lineages and contributed to complex multicellular organizations. However, the molecular genetic bases of such convergent evolution remain unresolved. Two multicellular volvocine green algae, Volvox and Astrephomene, exhibit convergent evolution of germ-soma differentiation. The complete genome sequence is now available for Volvox, while genome information is scarce for Astrephomene. Here, we generated the de novo whole genome sequence of Astrephomene gubernaculifera and conducted RNA-seq analysis of isolated somatic and reproductive cells. In Volvox, tandem duplication and neofunctionalization of the ancestral transcription factor gene (RLS1/rlsD) might have led to the evolution of regA, the master regulator for Volvox germ-soma differentiation. However, our genome data demonstrated that Astrephomene has not undergone tandem duplication of the RLS1/rlsD homolog or acquisition of a regA-like gene. Our RNA-seq analysis revealed the downregulation of photosynthetic and anabolic gene expression in Astrephomene somatic cells, as in Volvox. Among genes with high expression in somatic cells of Astrephomene, we identified three genes encoding putative transcription factors, which may regulate somatic cell differentiation. Thus, the convergent evolution of germ-soma differentiation in the volvocine algae may have occurred by the acquisition of different regulatory circuits that generate a similar division of labor.

The evaluation of RNA-Seq de novo assembly by PacBio long read sequencing

Yifan Yang; Michael Gribskov

doi:10.1101/735621

2019

Keyword(s): Real Time; De Novo; Critical Issue; Evaluation Methods; Model Organisms; RNA-Seq; Long Reads; Long Read; Set Up; Downstream Analysis

RNA-Seq de novo assembly is an important method for generating transcriptomes of non-model organisms before any downstream analysis. Although many good de novo assembly methods have been developed, there is still no consensus on how they should be evaluated. Setting up a benchmark for evaluating the quality of de novo assemblies is therefore critical. Addressing this challenge will deepen our insight into the properties of different de novo assemblers and their evaluation methods, and will provide hints for choosing the best assembly sets as transcriptomes of non-model organisms for further functional analysis. In this article, we generate a “real time” transcriptome using PacBio long reads as a benchmark for evaluating five de novo assemblers and two model-based de novo assembly evaluation methods. By comparing the de novo assemblies generated from RNA-Seq short reads with the “real time” transcriptome from the same biological sample, we find that Trinity is best at completeness, generating more assemblies than the alternative assemblers, but its assemblies are less continuous and contain more misassemblies; Oases is best at continuity and specificity, but less complete; SOAPdenovo-Trans, Trans-ABySS and IDBA-Tran perform between these two. Among the evaluation methods, DETONATE considers multiple aspects of an assembly set and ranks the set with average performance as the best, while its contig score is a good metric for selecting assemblies with high completeness, specificity and continuity, although it is not sensitive to misassemblies; the TransRate contig score is useful for removing misassemblies, yet the optimal set it selects often contains too few assemblies to be used as a transcriptome.
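
The abstract compares assemblers on continuity but does not name the measure used. N50 is a common contiguity summary for any assembly. The sketch below is a generic illustration and is not necessarily the metric used in the study. N50 is the length L for which contigs of length L or longer together cover at least half of the total assembled length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 for an empty assembly)."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Stop at the first contig where the cumulative length reaches half the total.
        if 2 * running >= total:
            return length
    return 0
```

For contig lengths 100, 200, 300 and 400 (total 1,000), the cumulative sum first reaches 500 at the 300 contig, so N50 is 300. Joining two contigs into a chimeric misassembly produces a longer contig, so N50 alone can reward the very errors described above. This is one reason benchmarks pair it with correctness checks.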

Ion channel profiling of the Lymnaea stagnalis ganglia via transcriptome analysis

Nan Dong; Julia Bandura; Zhaolei Zhang; Yan Wang; Karine Labadie; ...

doi:10.21203/rs.3.rs-31358/v1

2020

Keyword(s): Ion Channels; Reference Genome; Lymnaea Stagnalis; De Novo; Model Organisms; Sequence Length; Pond Snail; Sequence Information; Functional Domain; RNA-Seq

Background. The pond snail Lymnaea stagnalis (L. stagnalis) has been widely used as a model organism in neurobiology, ecotoxicology, and parasitology due to the relative simplicity of its CNS. However, its usefulness is restricted by the limited availability of transcriptome data. While sequence information for L. stagnalis CNS transcripts has been obtained from an EST library and a de novo RNA-seq assembly, the quality of these assemblies is limited by a combination of the low coverage of EST libraries, the fragmented nature of de novo assemblies, and the lack of a reference genome.

Results. In this study, taking advantage of the recent availability of the L. stagnalis reference genome, we generated an RNA-seq library from the adult L. stagnalis CNS and used a combination of genome-guided and de novo assembly programs to identify 17,832 protein-coding L. stagnalis transcripts. We combined our library with existing resources to produce a transcript set with greater sequence length, completeness, and diversity than previously available ones. Using our assembly and functional domain analysis, we profiled L. stagnalis CNS transcripts encoding ion channels and ionotropic receptors, which are key proteins for CNS function, and compared their sequences to those of other vertebrate and invertebrate model organisms. Interestingly, L. stagnalis transcripts encoding numerous putative Ca2+ channels showed the most sequence similarity to those of mouse, zebrafish, Xenopus tropicalis, fruit fly, and C. elegans, suggesting that many calcium channel-related signaling pathways may be evolutionarily conserved.

Conclusions. Our study provides the most thorough characterization to date of the L. stagnalis transcriptome and provides insights into differences between vertebrates and invertebrates in CNS transcript diversity, according to function and protein class. Furthermore, this study is, to the best of our knowledge, the first to provide a complete characterization of the ion channels of a single species, opening new avenues for future research on fundamental neurobiological processes.

TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly

R.E. Rivera-Vicéns; C. Garcia Escudero; N. Conci; M. Eitel; G. Wörheide

doi:10.1101/2021.02.18.431773

2021

Keyword(s): De Novo; Transcriptome Assembly; Model Organisms; RNA-Seq; Analysis Pipeline; User Input; Genome Data; Differential Gene; Transcriptomic Level; Genome Information

The use of RNA-Seq data and the generation of de novo transcriptome assemblies have been pivotal for studies in ecology and evolution. This is especially true for non-model organisms, for which no genome information is available; yet studies of differential gene expression, DNA enrichment bait design, and phylogenetics can all be accomplished with data gathered at the transcriptomic level. Multiple tools are available for transcriptome assembly; however, no single tool provides the best assembly for all datasets. Therefore, a multi-assembler approach followed by a reduction step is often used to generate an improved representation of the assembly. To reduce errors in these complex analyses while attaining reproducibility and scalability, automated workflows have become essential in the analysis of RNA-Seq data. However, most of these tools are designed for species where genome data are used as a reference for the assembly process, limiting their use in non-model organisms. We present TransPi, a comprehensive pipeline for de novo transcriptome assembly that requires minimal user input without sacrificing a thorough analysis. A combination of different model organisms, k-mer sets, read lengths, and read quantities was used to assess the tool. Furthermore, a total of 49 non-model organisms spanning different phyla were also analyzed. Compared to approaches using a single assembler only, TransPi produces higher BUSCO completeness percentages together with a significant reduction in duplication rates. TransPi is easy to configure and can be deployed seamlessly using Conda, Docker and Singularity.
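
BUSCO completeness and duplication, the two figures on which TransPi is compared above, come from the same breakdown. Complete BUSCOs (C) are the sum of single-copy (S) and duplicated (D) hits, and fragmented (F) and missing (M) BUSCOs make up the rest of the n searched. The hypothetical helper below formats raw counts in a layout modeled on BUSCO's one-line summary. It does not parse real BUSCO output.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format BUSCO-style percentages from raw counts.

    Complete (C) = single-copy (S) + duplicated (D); n is the total number searched.
    """
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCOs counted")

    def pct(count):
        return 100.0 * count / n

    return (
        f"C:{pct(single + duplicated):.1f}%"
        f"[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )
```

With invented counts, `busco_summary(900, 50, 30, 20)` returns `"C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:1000"`. Merging several assemblies tends to raise both C and D, because the same gene can be recovered more than once. A reduction step aims to lower D while keeping C high.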

A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

Mickael Orgeur; Marvin Martens; Stefan T. Börno; Bernd Timmermann; Delphine Duprez; ...

doi:10.1101/156406

2017. Cited by ~2

Keyword(s): Genome Sequence; De Novo; Gene Annotation; Transcriptome Assembly; Draft Genome; Transcript Abundance; Accurate Estimation; RNA-Seq; A Genome; Transcript Discovery

The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads and of the gene annotation that defines gene features must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads at the mapping step, while an inaccurate definition of gene features induces imprecise read counts at the assignment step. Both steps can significantly bias the interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining genome-guided gene prediction and de novo transcriptome assembly. This dual approach increased the assignment rate of RNA-seq data by nearly 20% compared with using only the chicken reference annotation, thereby contributing to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with a partial genome sequence and/or lacking a manually curated reference annotation in order to improve the accuracy of gene expression studies.

Glutton: large-scale integration of non-model organism transcriptome data for comparative analysis

Alan Medlar; Laura Laakso; Andreia Miraldo; Ari Löytynoja

doi:10.1101/077511

2016. Cited by ~2

Keyword(s): Comparative Analysis; Large Scale; De Novo; Sequence Data; Model Organism; Model Organisms; RNA-Seq; Reference Species; Wide Range; The Impact

High-throughput RNA-seq data has become ubiquitous in the study of non-model organisms, but its use in comparative analysis remains a challenge. Without a reference genome for mapping, sequence data have to be assembled de novo, producing large numbers of short, highly redundant contigs. Preparing these assemblies for comparative analyses requires removing redundant isoforms, assigning orthologs and converting fragmented transcripts into gene alignments. In this article we present Glutton, a novel tool to process transcriptome assemblies for downstream evolutionary analyses. Glutton takes as input a set of fragmented, possibly erroneous transcriptome assemblies. Using phylogeny-aware alignment and reference data from a closely related species, it reconstructs one transcript per gene, finds orthologous sequences and produces accurate multiple alignments of coding sequences. We present a comprehensive analysis of Glutton’s performance across a wide range of divergence times between study and reference species. We demonstrate the impact that the choice of assembler has on both the number of alignments and the correctness of ortholog assignment, and show substantial improvements over heuristic methods without sacrificing correctness. Finally, using inference of Darwinian selection as an example of downstream analysis, we show that Glutton-processed RNA-seq data give results comparable to those obtained from full-length gene sequences, even with distantly related reference species. Glutton is available from http://wasabiapp.org/software/glutton/ and is licensed under the GPLv3.
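
For contrast with Glutton's phylogeny-aware reconstruction, one simple heuristic for removing redundant isoforms keeps only the longest transcript per gene. The abstract does not say which heuristics Glutton was compared against. The hypothetical function below shows only this baseline, not Glutton's method, and it assumes the grouping of transcripts into genes is already known.

```python
def longest_per_gene(records):
    """Keep the longest transcript for each gene.

    records: iterable of (gene_id, transcript_id, sequence) tuples.
    Returns {gene_id: transcript_id}; on ties the first transcript seen is kept.
    """
    best = {}
    for gene_id, transcript_id, seq in records:
        if gene_id not in best or len(seq) > best[gene_id][1]:
            best[gene_id] = (transcript_id, len(seq))
    return {gene: tid for gene, (tid, _) in best.items()}
```

This heuristic cannot repair a transcript that is fragmented or chimeric. It only chooses among the sequences it is given, which is part of why reference-guided approaches can do better.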

Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment

Anish M.S. Shrestha; Joyce Emlyn B. Guiao; Kyle Christian R. Santiago

doi:10.1101/2021.04.23.441097

2021

Keyword(s): Gene Expression; Differential Expression; Expression Analysis; De Novo; Transcriptome Assembly; Differential Expression Analysis; Homology Search; Model Organisms; RNA-Seq; Protein Database

RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. Conventional differential expression analysis for organisms without reference sequences requires performing computationally expensive and error-prone de-novo transcriptome assembly, followed by homology search against a high-confidence protein database for functional annotation. We propose a shortcut, where we obtain counts for differential expression analysis by directly aligning RNA-seq reads to the protein database. Through experiments on simulated and real data, we show drastic reductions in run-time and memory usage, with no loss in accuracy. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar
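
The shortcut above turns read-to-protein alignments directly into per-protein counts. The abstract does not describe how reads with several hits are handled. The hypothetical sketch below therefore assumes a simple best-hit rule: each read is assigned to its highest-scoring protein, and reads are then counted per protein.

```python
from collections import Counter


def count_reads_per_protein(hits):
    """Assign each read to its highest-scoring protein hit and count reads per protein.

    hits: iterable of (read_id, protein_id, score) tuples from a read-to-protein aligner.
    On equal scores, the first hit seen for a read is kept.
    """
    best = {}
    for read_id, protein_id, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (protein_id, score)
    return Counter(protein for protein, _ in best.values())
```

The resulting counts per sample could then be passed to a standard differential expression tool. How multi-mapping reads are split is a design choice that can change the counts for protein families with close paralogs.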
