De novo transcriptome analysis of dermal tissue from the rough-skinned newt, Taricha granulosa, enables investigation of tetrodotoxin expression

TrancriptomeReconstructoR, A Data-Driven Annotation of Complex Transcriptomes

10.21203/rs.3.rs-131404/v1 ◽

2020 ◽

Author(s):

Maxim Ivanov ◽

Albin Sandelin ◽

Sebastian Marquardt

Keyword(s):

De Novo ◽

Gene Annotation ◽

R Package ◽

Sequence Information ◽

Rna Seq ◽

Sequencing Data ◽

Gene Model ◽

Preparation Methods ◽

Downstream Analysis

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


De novo assembly and inferred functional annotation of the transcriptome of Heterosigma akashiwo

10.22541/au.164112034.43396317/v1 ◽

2022 ◽

Author(s):

Masanao Sato ◽

Masahide Seki ◽

Yutaka Suzuki ◽

Shoko Ueki

Keyword(s):

Functional Annotation ◽

De Novo ◽

Molecular Level ◽

Transcriptome Assembly ◽

Practical Interest ◽

Heterosigma Akashiwo ◽

Sequence Information ◽

Biological Characterization ◽

Nucleotide Database

Heterosigma akashiwo is a eukaryotic, cosmopolitan, unicellular alga (class: Raphidophyceae) that produces fish-killing blooms. There is substantial scientific and practical interest in the ecophysiological characteristics that determine its bloom dynamics and its adaptation to broad climate zones. Well-annotated genomic/genetic sequence information enables researchers to characterize organisms using modern molecular technology. As of December 2021, the chloroplast and mitochondrial genome sequences and transcriptome sequence assembly (TSA) datasets of limited size are available for H. akashiwo in the NCBI nucleotide database; there is no doubt that more genetic information on the species will greatly enhance the progress of its biological characterization. Here, we conducted H. akashiwo RNA sequencing, a de novo transcriptome assembly (NCBI TSA ICRV01) of a large number of high-quality short-read sequences, and the functional annotation of predicted genes. Based on our transcriptome, we confirmed that the organism possesses genes predicted to function in phagocytosis, supporting earlier observations of H. akashiwo bacterivory. Along with its capability for photosynthesis, the mixotrophy of H. akashiwo may partially explain its high adaptability to various environmental conditions. Our study provides an important toehold for deciphering H. akashiwo ecophysiology at a molecular level.


Construction of a reference transcriptome for the analysis of male sterility in sugi (Cryptomeria japonica D. Don) focusing on MALE STERILITY 1 (MS1)

PLoS ONE ◽

10.1371/journal.pone.0247180 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0247180

Author(s):

Fu-Jin Wei ◽

Saneyoshi Ueno ◽

Tokuko Ujino-Ihara ◽

Maki Saito ◽

Yoshihiko Tsumura ◽

...

Keyword(s):

Male Sterility ◽

Cryptomeria Japonica ◽

Evolutionary Biology ◽

De Novo ◽

Transcriptome Assembly ◽

Single Copy ◽

Reference Transcriptome ◽

Three Stages ◽

Significant Expression ◽

Short Timeframe

Sugi (Cryptomeria japonica D. Don) is an important conifer used for afforestation in Japan. At 11 Gbp, the genome of this species is too large to assemble within a short timeframe. Transcriptomics is one approach that can address this deficiency. Here we designed a three-stage workflow for de novo transcriptome assembly using Oases and Trinity. The three stages were independent assembly, automatic and semi-manual integration, and refinement by filtering out potential contamination. We identified a set of 49,795 cDNAs and an equal number of translated proteins. According to the benchmark set by BUSCO, 87.01% of the cDNAs identified were complete genes, and 78.47% were complete and single-copy genes. Compared to other full-length cDNA resources collected by Sanger and PacBio sequencers, the extent of the coverage in our dataset was the highest, indicating that these data can be safely used for further studies. When tissue-specific libraries were compared, there were significant expression differences between male strobili and the leaf and bark sets. Moreover, subtle expression differences between male-fertile and male-sterile libraries were detected. Orthologous genes from other model plants and conifer species were identified. We demonstrated that our transcriptome assembly output (CJ3006NRE) can serve as a reference transcriptome for future functional genomics and evolutionary biology studies.


Proteotranscriptomics assisted gene annotation and spatial proteomics of Bombyx mori BmN4 cell line

10.21203/rs.3.rs-23159/v2 ◽

2020 ◽

Author(s):

Michal Levin ◽

Marion Scheibe ◽

Falk Butter

Keyword(s):

Mass Spectrometry ◽

Bombyx Mori ◽

Cell Line ◽

De Novo ◽

High Resolution Mass Spectrometry ◽

Gene Annotation ◽

Transcriptome Assembly ◽

Model Organisms ◽

Sequence Information ◽

A Genome

Abstract Background: The process of identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-Seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through great efforts of big consortia, for most systems this kind of data is either absent or not adequately precise. Results: Combining in-depth transcriptome sequencing and high-resolution mass spectrometry, we here use proteotranscriptomics to improve gene annotation of protein-coding genes in the Bombyx mori cell line BmN4, which is an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach, we provide the exact coding sequence and protein-level evidence for more than 6,200 genes. Furthermore, using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage. Conclusions: We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de novo transcriptome assembly, it makes it possible to study any species, even in the absence of genome sequence information, where proteogenomics would be impossible.


De novo assembly and inferred functional annotation of the transcriptome of Heterosigma akashiwo

10.22541/au.164112034.43396317/v2 ◽

2022 ◽

Author(s):

Masanao Sato ◽

Masahide Seki ◽

Yutaka Suzuki ◽

Shoko Ueki

Keyword(s):

Functional Annotation ◽

De Novo ◽

Molecular Level ◽

Transcriptome Assembly ◽

Practical Interest ◽

Heterosigma Akashiwo ◽

Sequence Information ◽

Biological Characterization ◽

Nucleotide Database

Heterosigma akashiwo is a eukaryotic, cosmopolitan, unicellular alga (class: Raphidophyceae) that produces fish-killing blooms. There is substantial scientific and practical interest in the ecophysiological characteristics that determine its bloom dynamics and its adaptation to broad climate zones. Well-annotated genomic/genetic sequence information enables researchers to characterize organisms using modern molecular technology. As of December 2021, the chloroplast and mitochondrial genome sequences and transcriptome sequence assembly (TSA) datasets of limited size are available for H. akashiwo in the NCBI nucleotide database; there is no doubt that more genetic information on the species will greatly enhance the progress of its biological characterization. Here, we conducted H. akashiwo RNA sequencing, a de novo transcriptome assembly (NCBI TSA ICRV01) of a large number of high-quality short-read sequences, and the functional annotation of predicted genes. Based on our transcriptome, we confirmed that the organism possesses genes predicted to function in phagocytosis, supporting earlier observations of H. akashiwo bacterivory. Along with its capability for photosynthesis, the mixotrophy of H. akashiwo may partially explain its high adaptability to various environmental conditions. Our study provides an important toehold for deciphering H. akashiwo ecophysiology at a molecular level.


TrancriptomeReconstructoR: data-driven annotation of complex transcriptomes

10.1101/2020.12.10.418897 ◽

2020 ◽

Author(s):

Maxim Ivanov ◽

Albin Sandelin ◽

Sebastian Marquardt

Keyword(s):

De Novo ◽

Gene Annotation ◽

R Package ◽

Sequence Information ◽

Rna Seq ◽

Sequencing Data ◽

Gene Model ◽

Preparation Methods ◽

Downstream Analysis

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5’ and 3’ tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


Development of a relevant strategy using de novo transcriptome assembly method for transcriptome comparisons between Muscovy and common duck species and their reciprocal inter-specific Mule and Hinny hybrids fed ad libitum and overfed

10.21203/rs.3.rs-18946/v3 ◽

2020 ◽

Author(s):

Xi Liu ◽

Frédéric Hérault ◽

Christian Diot ◽

Erwan Corre

Keyword(s):

Gene Expression ◽

De Novo ◽

Transcriptome Assembly ◽

Genetic Type ◽

Rna Seq ◽

Genetic Types ◽

Assembly Method ◽

Muscovy Ducks ◽

Ad Libitum ◽

Different Response

Abstract Background: Common Pekin and Muscovy ducks and their intergeneric hinny and mule hybrids have different abilities for fatty liver production. RNA-Seq analyses of livers from these different genetic types, fed ad libitum or overfed, would help to identify genes that respond differently to overfeeding. However, comparing RNA-seq analyses across different species is challenging. The goal of this study was to develop a relevant strategy for transcriptome analysis and comparison between different species. Results: Transcriptomes were first assembled with a reference-based approach. Important mapping biases were observed when heterologous mapping was conducted on the common duck reference genome, suggesting that this reference-based strategy was not suited to comparing the four genetic types. De novo transcriptome assemblies were then performed using Trinity and Oases. Assemblies of transcriptomes were not relevant when more than a single genetic type was considered. Finally, single genetic type transcriptomes were assembled with DRAP into a mega-transcriptome. No bias was observed when reads from the different genetic types were mapped on this mega-transcriptome, and differences in gene expression between the four genetic types could be identified. Conclusions: Analyses using both reference-based and de novo transcriptome assemblies point to a good performance of the de novo approach for the analysis of gene expression in different species. It also allowed the identification of differences in responses to overfeeding between Pekin and Muscovy ducks and hinny and mule hybrids.


Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies

10.7287/peerj.preprints.2284 ◽

2016 ◽

Author(s):

Cédric Cabau ◽

Frédéric Escudié ◽

Anis Djari ◽

Yann Guiguen ◽

Julien Bobe ◽

...

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Error Rates ◽

Rna Seq ◽

De Novo Transcriptome ◽

Software Packages ◽

Redundancy Reduction ◽

Assembly Pipeline ◽

Free Open Source

Background: De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality, it is still possible to improve them in several ways, including redundancy reduction and error correction. Trinity and Oases are two commonly used de novo transcriptome assemblers. The contig sets they produce are of good quality. Still, their compaction (number of contigs needed to represent the transcriptome) and their quality (chimera and nucleotide error rates) can be improved. Results: We built a de novo RNA-Seq Assembly Pipeline (DRAP) which wraps these two assemblers (Trinity and Oases) in order to improve their results regarding the above-mentioned criteria. DRAP reduces the number of resulting contigs by 1.3- to 15-fold, depending on the read set and the assembler used. This article presents seven assembly comparisons showing, in some cases, drastic improvements when using DRAP. DRAP does not significantly impair assembly quality metrics such as read realignment rate or protein reconstruction counts. Conclusion: Transcriptome assembly is a challenging computational task. Even though good solutions are already available to end-users, these solutions can still be improved while conserving the overall representation and quality of the assembly. The de novo RNA-Seq Assembly Pipeline (DRAP) is an easy-to-use software package for producing compact and corrected transcript sets. DRAP is free, open-source and available at http://www.sigenae.org/drap .


De novo assembly and annotation of the eastern fence lizard (Sceloporus undulatus) transcriptome

10.1101/136069 ◽

2017 ◽

Author(s):

Mariana B. Grizante ◽

Marc Tollis ◽

Juan J. Rodriguez ◽

Ofir Levy ◽

Michael J. Angilletta ◽

...

Keyword(s):

Complex Traits ◽

De Novo ◽

Transcriptome Assembly ◽

Single Copy ◽

Genomic Research ◽

Sceloporus Undulatus ◽

Protein Coding ◽

Average Contig Length ◽

Green Anole Lizard ◽

Eastern Fence Lizard

Abstract Background: The eastern fence lizard (Sceloporus undulatus) has been a model species for ecological and evolutionary research. Genomic and transcriptomic resources for this species would promote investigation of genetic mechanisms that underpin plastic responses to environmental stress, such as climate warming. Moreover, such resources would aid comparative studies of complex traits at the molecular level, such as the transition from oviparous to viviparous reproduction, which happened at least four times within Sceloporus. Findings: A de novo transcriptome assembly for Sceloporus undulatus, Sund_v1.0, was generated using over 179 million Illumina reads obtained from three tissues (whole brain, skeletal muscle, and embryo) as well as previously reported liver sequences. The Sund_v1.0 assembly had an average contig length of 782 nucleotides and an E90N50 statistic of 2,550 nucleotides. Comparing S. undulatus transcripts with the benchmarking universal single-copy orthologs (BUSCO) for tetrapod species yielded 97.2% gene representation. A total of 13,422 protein-coding orthologs were identified in comparison to the genome of the green anole lizard, Anolis carolinensis, which is the most closely related species with genomic data available. Conclusions: The multi-tissue transcriptome of S. undulatus is the first for a member of the family Phrynosomatidae, offering an important resource to advance studies of adaptation in this species and genomic research in reptiles.


Development of a relevant strategy using de novo transcriptome assembly method for transcriptome comparisons between Muscovy and common duck species and their reciprocal inter-specific Mule and Hinny hybrids fed ad libitum and overfed

10.21203/rs.3.rs-18946/v2 ◽

2020 ◽

Author(s):

Xi Liu ◽

Frédéric Hérault ◽

Christian Diot ◽

Erwan Corre

Keyword(s):

Gene Expression ◽

De Novo ◽

Transcriptome Assembly ◽

Genetic Type ◽

Rna Seq ◽

Genetic Types ◽

Assembly Method ◽

Muscovy Ducks ◽

Ad Libitum ◽

Different Response

Abstract Background: Common Pekin and Muscovy ducks and their intergeneric hinny and mule hybrids have different abilities for fatty liver production. RNA-Seq analyses of livers from these different genetic types, fed ad libitum or overfed, would help to identify genes that respond differently to overfeeding. However, comparing RNA-seq analyses across different species is challenging. The goal of this study was to develop a relevant strategy for transcriptome analysis and comparison between different species. Results: Transcriptomes were first assembled with a reference-based approach. Important mapping biases were observed when heterologous mapping was conducted on the common duck reference genome, suggesting that this reference-based strategy was not suited to comparing the four genetic types. De novo transcriptome assemblies were then performed using Trinity and Oases. Assemblies of transcriptomes were not relevant when more than a single genetic type was considered. Finally, single genetic type transcriptomes were assembled with DRAP into a mega-transcriptome. No bias was observed when reads from the different genetic types were mapped on this mega-transcriptome, and differences in gene expression between the four genetic types could be identified. Conclusions: Analyses using both reference-based and de novo transcriptome assemblies point to a good performance of the de novo approach for the analysis of gene expression in different species. It also allowed the identification of differences in responses to overfeeding between Pekin and Muscovy ducks and hinny and mule hybrids.
