Transcriptome sequencing reveals high isoform diversity in the ant Formica exsecta

PeerJ ◽

10.7717/peerj.3998 ◽

2017 ◽

Vol 5 ◽

pp. e3998 ◽

Cited By ~ 4

Author(s):

Kishor Dhaygude ◽

Kalevi Trontti ◽

Jenni Paviala ◽

Claire Morandin ◽

Christopher Wheat ◽

...

Keyword(s):

Rna Sequencing ◽

De Novo ◽

Splice Variants ◽

Transcriptome Assembly ◽

Sequencing Data ◽

Genetic Studies ◽

Final Assembly ◽

Isoform Diversity ◽

Gene Ontologies ◽

Scaffolding Software

Transcriptome resources for social insects have the potential to provide new insight into polyphenism, i.e., how divergent phenotypes arise from the same genome. Here we present a transcriptome based on paired-end RNA sequencing data for the ant Formica exsecta (Formicidae, Hymenoptera). The RNA sequencing libraries were constructed from samples of several life stages of both sexes and of the female castes (queens and workers), in order to maximize representation of expressed genes. We first compare the performance of common assembly and scaffolding software (Trinity, Velvet-Oases, and SOAPdenovo-trans) in producing de novo assemblies. Second, we annotate the resulting expressed contigs against the currently published genomes of ants and other insects, including the honeybee, to retain genes that have annotation evidence of being true genes. Our pipeline resulted in a final assembly of altogether 39,262 mRNA transcripts, with an average coverage of >300X, belonging to 17,496 unique genes with annotation in the related ant species. Of these genes, 536 were unique to one caste or sex only, highlighting the importance of comprehensive sampling. Our final assembly also showed expression of several splice variants in 6,975 genes, and we show that accounting for splice variants affects the outcome of downstream analyses such as gene ontologies. Our transcriptome provides an outstanding resource for future genetic studies on F. exsecta and other ant species, and the presented transcriptome assembly can be adapted to any non-model species that has genomic resources available from a related taxon.
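As a quick reading of the figures reported in the abstract above, the following minimal Python sketch derives a few summary ratios. The variable names are illustrative, and the derived ratios are plain arithmetic on the quoted numbers, not values reported by the authors.

```python
# Figures quoted in the abstract above.
transcripts = 39_262          # mRNA transcripts in the final assembly
genes = 17_496                # unique annotated genes
multi_isoform_genes = 6_975   # genes expressing several splice variants
caste_or_sex_specific = 536   # genes unique to one caste or sex

# Derived ratios (simple arithmetic, not reported by the authors).
transcripts_per_gene = transcripts / genes
share_multi_isoform = multi_isoform_genes / genes
share_specific = caste_or_sex_specific / genes

print(f"{transcripts_per_gene:.2f} transcripts per gene on average")
print(f"{share_multi_isoform:.1%} of genes have several splice variants")
print(f"{share_specific:.1%} of genes are caste- or sex-specific")
```

Ratios like these give a rough sense of why collapsing transcripts to genes, or failing to do so, can shift downstream results such as gene ontology analyses.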


Computational Approaches for Transcriptome Assembly Based on Sequencing Technologies

Current Bioinformatics ◽

10.2174/1574893614666190410155603 ◽

2020 ◽

Vol 15 (1) ◽

pp. 2-16

Author(s):

Yuwen Luo ◽

Xingyu Liao ◽

Fang-Xiang Wu ◽

Jianxin Wang

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Critical Role ◽

High Sensitivity ◽

Biological Properties ◽

Sequencing Data ◽

Sequencing Technologies ◽

Long Reads ◽

Massive Sequencing ◽

Generation Sequencing

Transcriptome assembly plays a critical role in studying biological properties and examining the expression levels of genomes in specific cells. It is also the basis of many downstream analyses. As sequencing speed increases and cost decreases, massive amounts of sequencing data continue to accumulate. A large number of assembly strategies based on different computational methods and experiments have been developed. How to efficiently perform transcriptome assembly with high sensitivity and accuracy has become a key issue. In this work, the issues with transcriptome assembly are explored based on different sequencing technologies. Specifically, transcriptome assemblies with next-generation sequencing reads are divided into reference-based assemblies and de novo assemblies. Examples from different species are used to illustrate that long reads produced by third-generation sequencing technologies can cover full-length transcripts without assembly. In addition, different transcriptome assemblies using the Hybrid-seq methods and other tools are also summarized. Finally, we discuss the future directions of transcriptome assemblies.


The rate and spectrum of mosaic mutations during embryogenesis revealed by RNA sequencing of 49 tissues

10.1101/687822 ◽

2019 ◽

Cited By ~ 1

Author(s):

Francesc Muyas ◽

Luis Zapata ◽

Roderic Guigó ◽

Stephan Ossowski

Keyword(s):

Rna Sequencing ◽

De Novo ◽

Genetic Disorders ◽

Adult Life ◽

Diagnostic Procedures ◽

Cancer Predisposition ◽

Sequencing Data ◽

Individual Study ◽

Similar Frequency ◽

Mutational Spectrum

Background: Mosaic mutations acquired during early embryogenesis can lead to severe early-onset genetic disorders and cancer predisposition, but are often undetectable in blood samples. The rate and mutational spectrum of embryonic mosaic mutations (EMMs) have only been studied in a few tissues and their contribution to genetic disorders is unknown. Therefore, we investigated how frequently mosaic mutations occur during embryogenesis across all germ layers and tissues. Results: Using RNA sequencing data from the Genotype-Tissue Expression (GTEx) cohort comprising 49 normal tissues and 570 individuals, we found that newborns on average harbour 0.5 to 1 EMMs in the exome affecting multiple organs (1.3230 × 10⁻⁸ per nucleotide per individual), a similar frequency as reported for germline de novo mutations. Our multi-tissue, multi-individual study design allowed us to distinguish mosaic mutations acquired during different stages of embryogenesis and adult life, as well as to provide insights into the rate and spectrum of mosaic mutations. We observed that EMMs are dominated by a mutational signature associated with spontaneous deamination of methylated cytosines and the number of cell divisions. After birth, cells continue to accumulate somatic mutations, which can lead to the development of cancer. Investigation of the mutational spectrum of the gastrointestinal tract revealed a mutational pattern associated with the food-borne carcinogen aflatoxin, a signature that has so far only been reported in liver cancer. Conclusion: In summary, our multi-tissue, multi-individual study reveals a surprisingly high number of embryonic mosaic mutations in coding regions, implying novel hypotheses and diagnostic procedures for investigating genetic causes of disease and cancer predisposition.


YerA41, a Yersinia ruckeri Bacteriophage: Determination of a Non-Sequencable DNA Bacteriophage Genome via RNA-Sequencing

Viruses ◽

10.3390/v12060620 ◽

2020 ◽

Vol 12 (6) ◽

pp. 620

Author(s):

Katarzyna Leskinen ◽

Maria I. Pajunen ◽

Miguel Vincente Gomez-Raya Vilanova ◽

Saija Kiljunen ◽

Andrew Nelson ◽

...

Keyword(s):

Rna Sequencing ◽

Transcriptional Control ◽

De Novo ◽

Genomic Sequence ◽

Pcr Amplification ◽

Yersinia Ruckeri ◽

Sequencing Data ◽

Bacterial Gene ◽

Sequencing Technologies ◽

Bacterial Gene Expression

YerA41 is a Myoviridae bacteriophage that was originally isolated due to its ability to infect Yersinia ruckeri bacteria, the causative agent of enteric redmouth disease of salmonid fish. Several attempts to determine its genomic DNA sequence using traditional and next-generation sequencing technologies failed, indicating that the phage genome is modified in such a way that it is an unsuitable template for PCR amplification and for conventional sequencing. To determine the YerA41 genome sequence, we performed RNA-sequencing of phage-infected Y. ruckeri cells at different time points post-infection. The host-genome-specific reads were subtracted and de novo assembly was performed on the remaining unaligned reads. This resulted in nine phage-specific scaffolds with a total length of 143 kb that shared only low-level, scattered identity with known sequences deposited in DNA databases. Annotation of the sequences revealed 201 predicted genes, most of which had no homologs in the databases. Proteome studies identified altogether 63 phage particle-associated proteins. The RNA-sequencing data were used to characterize the transcriptional control of YerA41 and to investigate its impact on bacterial gene expression. Overall, our results indicate that RNA-sequencing can be successfully used to obtain the genomic sequence of non-sequencable phages, providing simultaneous information about the phage–host interactions during the process of infection.
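The host-read subtraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the abstract names neither the aligner nor the assembler, so the sketch simply assumes the reads were aligned to the host genome and written in SAM text format, and the example records and the function name `unmapped_reads` are hypothetical.

```python
def unmapped_reads(sam_lines):
    """Yield (name, sequence) for reads the aligner could not place on the host."""
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:                # SAM flag bit 0x4: segment unmapped
            yield name, seq

# Made-up records: read1 maps to the host, read2 does not.
example_sam = [
    "@SQ\tSN:host_chr\tLN:3700000",
    "read1\t0\thost_chr\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCGA\tIIIIIIII",
]
candidates = list(unmapped_reads(example_sam))
print(candidates)
```

Reads kept by such a filter would then be passed to a de novo assembler; in the study above, that step produced the nine phage-specific scaffolds.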


Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbz058 ◽

2019 ◽

Vol 21 (4) ◽

pp. 1164-1181 ◽

Cited By ~ 9

Author(s):

Leandro Lima ◽

Camille Marchet ◽

Ségolène Caboche ◽

Corinne Da Silva ◽

Benjamin Istace ◽

...

Keyword(s):

Error Correction ◽

Rna Sequencing ◽

Gene Families ◽

Error Rates ◽

Open Reading Frames ◽

Sequencing Data ◽

Isoform Diversity ◽

Long Reads ◽

Long Read ◽

Read Error Correction

Motivation: Nanopore long-read sequencing technology offers promising alternatives to high-throughput short read sequencing, especially in the context of RNA-sequencing. However, this technology is currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames and creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error correction of Nanopore RNA-sequencing long reads remain limited. Results: In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that not only reports classical error correction metrics but also the effect of correction on gene families, isoform diversity, bias toward the major isoform and splice site detection. We find that long read error correction tools that were originally developed for DNA are also suitable for the correction of Nanopore RNA-sequencing data, especially in terms of increasing base pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error correction tools should be used, depending on the application type. Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser


De Novo Assembly of a Bell Pepper Endornavirus Genome Sequence Using RNA Sequencing Data

Genome Announcements ◽

10.1128/genomea.00061-15 ◽

2015 ◽

Vol 3 (2) ◽

Cited By ~ 5

Author(s):

Yeonhwa Jo ◽

Hoseng Choi ◽

Won Kyong Cho

Keyword(s):

Rna Sequencing ◽

Genome Sequence ◽

De Novo Assembly ◽

De Novo ◽

Bell Pepper ◽

Sequencing Data


Raw transcriptomics data to gene specific SSRs: a validated free bioinformatics workflow for biologists

Scientific Reports ◽

10.1038/s41598-020-75270-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

D. N. U. Naranpanawa ◽

C. H. W. M. R. B. Chandrasekara ◽

P. C. G. Bandaranayake ◽

A. U. Bandaranayake

Keyword(s):

De Novo ◽

Sequence Data ◽

Transcriptome Assembly ◽

Low Cost ◽

Santalum Album ◽

Sequencing Data ◽

Illumina Hiseq ◽

Tissue Samples ◽

Downstream Analysis ◽

Bioinformatics Workflow

Recent advances in next-generation sequencing technologies have paved the way for generating considerable amounts of sequencing data at a relatively low cost. This has revolutionized genomics and transcriptomics studies. However, handling such data with available bioinformatics platforms now poses challenges, both in assembly and in the downstream analysis performed to infer correct biological meaning. Though there are a handful of commercial software tools for some of the procedures, the cost of such tools has made them prohibitive for most research laboratories. While individual open-source or free software tools are available for most bioinformatics applications, those components usually operate standalone and are not combined into a user-friendly workflow. Therefore, beginners in bioinformatics might find analysis procedures starting from raw sequence data too complicated and time-consuming, given the associated learning curve. Here, we outline a procedure for de novo transcriptome assembly and Simple Sequence Repeats (SSR) primer design solely based on tools that are available online for free use. For validation of the developed workflow, we used Illumina HiSeq reads of different tissue samples of Santalum album (sandalwood), generated from a previous transcriptomics project. A portion of the designed primers were tested in the lab with relevant samples and all of them successfully amplified the targeted regions. The presented bioinformatics workflow can accurately assemble quality transcriptomes and develop gene-specific SSRs. Beginner biologists and researchers in bioinformatics can easily utilize this workflow for research purposes.
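To make the SSR step more concrete, here is a minimal Python sketch of how simple sequence repeats can be located in an assembled contig with a regular expression. It is an illustration only: the abstract does not name the tools used in the workflow, and the motif length range (2 to 6 bases), the minimum of four tandem copies, the function name `find_ssrs`, and the example contig are all assumptions made for this sketch. Primer design is not covered.

```python
import re

# A motif of 2-6 bases followed by at least three more copies of itself
# (four or more tandem copies in total); thresholds are illustrative.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")

def find_ssrs(sequence):
    """Return (start, motif, copies) for each tandem repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:   # skip homopolymer runs such as AAAAAAAA
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Made-up contig containing an (AG)6 and a (CAT)4 repeat.
contig = "TTGCA" + "AG" * 6 + "CCTGA" + "CAT" * 4 + "GGTC"
print(find_ssrs(contig))
```

The lazy quantifier makes the pattern prefer the shortest repeating unit, so a run such as ATATATAT is reported with the motif AT rather than ATAT.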


Transcriptome analysis reveals higher levels of mobile element-associated abnormal gene transcripts in temporal lobe epilepsy patients

10.1101/2021.05.14.444199 ◽

2021 ◽

Author(s):

Hai Hu ◽

Ping Liang

Keyword(s):

Temporal Lobe Epilepsy ◽

Temporal Lobe ◽

Molecular Mechanisms ◽

De Novo ◽

Splice Variants ◽

Transcriptome Assembly ◽

Hippocampal Sclerosis ◽

Mobile Element ◽

Loss Of Function ◽

Gene Transcripts

Objective: To determine the role of abnormal splice variants associated with mobile elements in epilepsy. Methods: We used publicly available human RNA-seq-based transcriptome data for laser-captured dentate granule cells of post-mortem hippocampal tissues from temporal lobe epilepsy patients with (TLE, N=14 for 7 subjects) and without hippocampal sclerosis (TLE-HS, N=8 for 5 subjects) and healthy individuals (N=51), as well as for surgically resected bulk neocortex tissues from TLE patients (TLE-NC, N=17). For each individual sample, de novo transcriptome assembly was performed, followed by identification of spliced gene transcripts containing mobile element (ME) sequences (ME-transcripts) to compare the ME-transcript frequency across the sample groups. Enrichment analysis for genes associated with ME-transcripts and detailed sequence examination for representative epileptic genes were performed to analyze the pattern and mechanism of ME-transcripts on gene function. Results: We observed significantly higher levels of ME-transcripts in the hippocampal tissues of epileptic patients, particularly in TLE-HS. Among ME classes, SINEs were shown to be the most frequent contributor to ME-transcripts, followed by LINEs and DNA transposons. These ME sequences in almost all cases represent older MEs normally located in the intron sequences, leading to abnormal splicing variants. For protein-coding genes, ME sequences were mostly found in the 3′-UTR regions, with a significant portion also in the coding sequences (CDS), leading to reading frame disruption. Genes associated with ME-transcripts showed enrichment for involvement in the mRNA splicing process in all sample groups, with bias towards neural and epilepsy-associated genes in the epileptic transcriptomes. Significance: Our data suggest that abnormal splicing involving MEs, leading to loss of function in critical genes, plays a role in epilepsy, particularly in TLE-HS, providing a novel insight into the molecular mechanisms underlying epileptogenesis.


A de novo Transcriptome Assembly of the European Flounder (Platichthys flesus): The Preselection of Transcripts Encoding Active Forms of Enzymes

Frontiers in Marine Science ◽

10.3389/fmars.2021.618779 ◽

2021 ◽

Vol 8 ◽

Author(s):

Konrad Pomianowski ◽

Artur Burzyński ◽

Ewa Kulczykowska

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Head Kidney ◽

Sequencing Data ◽

Mrna Isoforms ◽

Protein Database ◽

Next Generation Sequencing Technology ◽

Platichthys Flesus ◽

European Flounder ◽

High Level

The RNA sequencing data sets available for different fish species show a potentially high variety of forms of enzymes just in teleosts. This is primarily considered an effect of the first round of whole-genome duplication with mutations in duplicated genes (isozymes) and alternative splicing of mRNA (isoforms). However, the abundance of the mRNA transcript variants is not necessarily reflected in the abundance of active forms of proteins. We have investigated the transcriptional profiles of two enzymes, aralkylamine N-acetyltransferase (AANAT: EC 2.3.1.87) and N-acetylserotonin O-methyltransferase (ASMT: EC 2.1.1.4), in the eyeball, brain, intestines, spleen, heart, liver, head kidney, gonads, and skin of the European flounder (Platichthys flesus). High-throughput next-generation sequencing technology NovaSeq6000 was used to generate 500M sequencing reads. These were then assembled and filtered, producing 75k reliable contigs. Gene ontology (GO) terms were assigned to the majority of annotated contigs/unigenes based on the results of PFAM, PANTHER, UniProt, and InterPro protein database searches. BUSCO statistics for the metazoa, vertebrata, and actinopterygii databases showed that the reported transcriptome represents a high level of completeness. In this article, we show how to preselect transcripts encoding the active enzymes (isozymes or isoforms), using AANAT and ASMT in the European flounder as examples. The data can be used as a tool to design experiments as well as a basis for discussion of the diversity of enzyme forms and their physiological relevance in teleosts.


Comparative assessment of long-read error-correction software applied to RNA-sequencing data

10.1101/476622 ◽

2018 ◽

Cited By ~ 2

Author(s):

Leandro Lima ◽

Camille Marchet ◽

Ségolène Caboche ◽

Corinne Da Silva ◽

Benjamin Istace ◽

...

Keyword(s):

Error Correction ◽

Rna Sequencing ◽

Gene Families ◽

Error Rates ◽

Open Reading Frames ◽

Sequencing Data ◽

Sequencing Technologies ◽

Isoform Diversity ◽

Long Read ◽

Read Error Correction

Motivation: Long-read sequencing technologies offer promising alternatives to high-throughput short read sequencing, especially in the context of RNA-sequencing. However, these technologies are currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames, and the creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error-correction of RNA-sequencing long reads remain limited. Results: In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that not only reports classical error-correction metrics but also the effect of correction on gene families, isoform diversity, bias towards the major isoform, and splice site detection. We find that long read error-correction tools that were originally developed for DNA are also suitable for the correction of RNA-sequencing data, especially in terms of increasing base-pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error-correction tools should be used, depending on the application type. Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser


Deconvolution of Expression for Nascent RNA Sequencing Data (DENR) Highlights Pre-RNA Isoform Diversity in Human Cells

10.1101/2021.03.16.435537 ◽

2021 ◽

Author(s):

Yixin Zhao ◽

Noah Dukler ◽

Gilad Barshad ◽

Shushan Toneyan ◽

Charles G. Danko ◽

...

Keyword(s):

T Cells ◽

Rna Sequencing ◽

Cell Types ◽

Transcription Unit ◽

Human Cells ◽

Computational Method ◽

Rna Seq ◽

Sequencing Data ◽

Isoform Diversity ◽

Nascent Rna

Quantification of mature-RNA isoform abundance from RNA-seq data has been extensively studied, but much less attention has been devoted to quantifying the abundance of distinct precursor RNAs based on nascent RNA sequencing data. Here we address this problem with a new computational method called Deconvolution of Expression for Nascent RNA sequencing data (DENR). DENR models the nascent RNA read counts at each locus as a mixture of user-provided isoforms. The performance of the baseline algorithm is enhanced by the use of machine-learning predictions of transcription start sites (TSSs) and an adjustment for the typical “shape profile” of read counts along a transcription unit. We show using simulated data that DENR clearly outperforms simple read-count-based methods for estimating the abundances of both whole genes and isoforms. By applying DENR to previously published PRO-seq data from K562 and CD4+ T cells, we find that transcription of multiple isoforms per gene is widespread, and the dominant isoform frequently makes use of an internal TSS. We also identify > 200 genes whose dominant isoforms make use of different TSSs in these two cell types. Finally, we apply DENR and StringTie to newly generated PRO-seq and RNA-seq data, respectively, for human CD4+ T cells and CD14+ monocytes, and show that entropy at the pre-RNA level makes a disproportionate contribution to overall isoform diversity, especially across cell types. Altogether, DENR is the first computational tool to enable abundance quantification of pre-RNA isoforms based on nascent RNA sequencing data, and it reveals high levels of pre-RNA isoform diversity in human cells.
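The entropy-based view of isoform diversity mentioned above can be illustrated with a short Python sketch. The abstract does not give the formula used, so this assumes the common choice of Shannon entropy over the relative abundances of the isoforms at one locus; the function name `isoform_entropy` and the abundance values are hypothetical, and DENR's own definition may differ.

```python
import math

def isoform_entropy(abundances):
    """Shannon entropy (bits) of the relative isoform abundances at one locus."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    # Isoforms with zero abundance contribute nothing to the entropy.
    fractions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log2(p) for p in fractions)

# Made-up abundance estimates for two loci.
single_dominant = isoform_entropy([98.0, 1.0, 1.0])        # one isoform dominates
evenly_split = isoform_entropy([25.0, 25.0, 25.0, 25.0])   # four equal isoforms

print(f"dominant-isoform locus: {single_dominant:.3f} bits")
print(f"evenly split locus:     {evenly_split:.3f} bits")
```

Under this definition, a locus dominated by one isoform scores close to zero, while a locus whose expression is spread evenly over several isoforms scores higher, which is the sense in which entropy serves as a diversity measure.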
