Fast and interpretable alternative splicing and differential gene-level expression analysis using transcriptome segmentation with Yanagi

2018
Author(s):
Mohamed K Gunady
Stephen M Mount
Héctor Corrada Bravo

Abstract. Introduction: Analysis of differential alternative splicing from RNA-seq data is complicated by the fact that many RNA-seq reads map to multiple transcripts; moreover, the annotated transcripts are often a small subset of a gene's possible transcripts. Here we describe Yanagi, a tool for segmenting the transcriptome to create a library of maximal L-disjoint segments from a complete transcriptome annotation. The segment library preserves all transcriptome substrings of length L and the structural relationships between transcripts while eliminating unnecessary sequence duplication. Contributions: In this paper, we formalize the concept of transcriptome segmentation and propose an efficient algorithm for generating segment libraries based on a length parameter that depends on the specific RNA-seq library construction. The resulting segment sequences can be used with pseudo-alignment tools to quantify expression at the segment level. We characterize the segment libraries for the reference transcriptomes of Drosophila melanogaster and Homo sapiens and provide gene-level visualization of the segments for better interpretability. We then demonstrate the use of segment-level quantification in gene expression and alternative splicing analysis. The notion of transcriptome segmentation as introduced here and implemented in Yanagi opens the door to applying lightweight, ultra-fast pseudo-alignment algorithms in a wide variety of RNA-seq analyses. Conclusion: Using a segment library rather than the standard transcriptome significantly reduces ambiguous alignments, in which reads multimap to several sequences in the reference. This makes it possible to skip the quantification step required by standard k-mer-based pipelines for gene expression analysis. Moreover, using segment counts as statistics for alternative splicing analysis achieves performance comparable to counting-based approaches (e.g. rMATS) while relying on fast, lightweight pseudo-alignment.
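To make the multimapping problem concrete, the Python sketch below indexes every length-L substring (L-mer) of two hypothetical isoforms and records which transcripts contain it. It is not Yanagi's segmentation algorithm; it only illustrates the redundancy that segmentation targets. L-mers shared by both isoforms are where a read can pseudo-align to more than one transcript, and the gap between the total and the distinct L-mer counts is the sequence duplication a segment library is meant to remove. The exon sequences, the transcript names `t1`/`t2`, and L = 5 are illustrative assumptions (Yanagi ties L to the RNA-seq library construction).

```python
from collections import defaultdict


def lmer_owners(transcripts: dict[str, str], L: int) -> dict[str, frozenset[str]]:
    """Map every length-L substring (L-mer) to the set of transcripts containing it."""
    owners: dict[str, set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - L + 1):
            owners[seq[i:i + L]].add(name)
    return {lmer: frozenset(names) for lmer, names in owners.items()}


def duplication_summary(transcripts: dict[str, str], L: int) -> tuple[int, int, int]:
    """Return (L-mers counted per transcript, distinct L-mers, L-mers shared by >1 transcript)."""
    owners = lmer_owners(transcripts, L)
    total = sum(max(len(seq) - L + 1, 0) for seq in transcripts.values())
    shared = sum(1 for names in owners.values() if len(names) > 1)
    return total, len(owners), shared


# Hypothetical gene: isoform t1 = exons A+B+C, isoform t2 skips exon B.
exon_a, exon_b, exon_c = "ACGTTGCA", "GGATCC", "TTAGCTAG"
isoforms = {"t1": exon_a + exon_b + exon_c, "t2": exon_a + exon_c}

print(duplication_summary(isoforms, L=5))  # → (30, 22, 8)
owners = lmer_owners(isoforms, L=5)
print(sorted(owners["ACGTT"]))  # inside exon A → ['t1', 't2'] (ambiguous)
print(sorted(owners["CAGGA"]))  # spans the A|B junction → ['t1'] (unique)
```

In this toy gene, only the eight L-mers lying entirely inside exon A or exon C are ambiguous; every L-mer that crosses a splice junction points to a single isoform.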

2021
Vol 8 (1)
Author(s):
Bin Liu
Shuo Zhao
Pengli Li
Yilu Yin
Qingliang Niu
...

Abstract. In plants, alternative splicing (AS) is markedly induced in response to environmental stresses, but it is unclear why plants generate multiple transcripts under stress conditions. In this study, RNA-seq was performed to identify AS events in cucumber seedlings grown under different light intensities. We identified a novel transcript of the gibberellin (GA)-deactivating enzyme gibberellin 2-beta-dioxygenase 8 (CsGA2ox8). Compared with the canonical CsGA2ox8.1, the CsGA2ox8.2 isoform retains the intron between the second and third exons. Functional analysis showed that CsGA2ox8.1, but not CsGA2ox8.2, plays a role in the deactivation of bioactive GAs. Moreover, expression analysis demonstrated that both transcripts were upregulated by increased light intensity; however, CsGA2ox8.1 expression increased slowly when the light intensity exceeded 400 µmol m⁻² s⁻¹ PPFD (photosynthetic photon flux density), whereas CsGA2ox8.2 transcript levels increased rapidly when the light intensity exceeded 200 µmol m⁻² s⁻¹ PPFD. Our findings provide evidence that plants might fine-tune their GA levels by using AS to buffer against the normal CsGA2ox8 transcript.


BMC Genomics
2021
Vol 22 (1)
Author(s):
Clemens Falker-Gieske
Andrea Mott
Sören Franzenburg
Jens Tetens

Abstract. Background: Retinol (RO) and its active metabolite retinoic acid (RA) are major regulators of gene expression in vertebrates and influence various processes such as organ development, cell differentiation, and immune response. To characterize a general transcriptomic response to RA exposure in vertebrates, independent of species- and tissue-specific effects, we analyzed four publicly available RNA-Seq datasets from Homo sapiens, Mus musculus, and Xenopus laevis. To increase species and cell-type diversity, we generated RNA-seq data from chicken hepatocellular carcinoma (LMH) cells. Additionally, we compared the response of LMH cells to RA and RO at different time points. Results: By conducting a transcriptome meta-analysis, we identified three retinoic acid response core clusters (RARCCs) consisting of 27 interacting proteins, seven of which had not previously been associated with retinoids. Comparing the transcriptional response of LMH cells to RO and RA exposure at different time points led to the identification of non-coding RNAs (ncRNAs) that are differentially expressed (DE) only during the early response. Conclusions: We propose that these RARCCs sit at the top of a regulatory RA hierarchy common to vertebrates. Based on the protein sets in these clusters, we identified an RA-response cluster, a control-center-type cluster, and a cluster that directs cell proliferation. From the comparison of the cellular responses to RA and RO, we conclude that ncRNAs play an underestimated role in retinoid-mediated gene regulation.


Genes
2020
Vol 11 (12)
pp. 1487
Author(s):
Marie Lataretu
Martin Hölzer

RNA-Seq enables the identification and quantification of RNA molecules, often with the aim of detecting differentially expressed genes (DEGs). Although RNA-Seq has evolved into a standard technique, there is no universal gold standard for the computational analysis of these data. Moreover, previous studies have demonstrated the irreproducibility of RNA-Seq studies. Here, we present a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline for detecting DEGs that ensures a high level of reproducibility. The pipeline automatically handles common pitfalls such as ribosomal RNA removal and low-abundance gene filtering. Apart from various visualizations of the DEG results, we incorporated downstream pathway analysis for common species such as Homo sapiens and Mus musculus. We evaluated the DEG detection functionality using qRT-PCR data as a reference and observed a very high correlation between the log-transformed gene expression fold changes from RNA-Seq and qRT-PCR.
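Two of the steps mentioned above, low-abundance gene filtering and the comparison of RNA-Seq log fold changes against qRT-PCR, can be sketched in a few lines of Python. This is not the pipeline's Nextflow code. The filter rule (at least `min_count` reads in at least `min_samples` samples), the pseudocount of 1, the use of Pearson correlation, and all gene names and values are hypothetical assumptions chosen for illustration.

```python
import math


def filter_low_abundance(counts, min_count=10, min_samples=2):
    """Keep genes with >= min_count reads in >= min_samples samples (hypothetical rule)."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for c in row if c >= min_count) >= min_samples
    }


def log2_fold_change(row, control_idx, treated_idx, pseudocount=1.0):
    """log2 of (mean treated + pseudocount) / (mean control + pseudocount)."""
    control = sum(row[i] for i in control_idx) / len(control_idx)
    treated = sum(row[i] for i in treated_idx) / len(treated_idx)
    return math.log2((treated + pseudocount) / (control + pseudocount))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw counts; columns are control, control, treated, treated.
counts = {
    "geneA": [100, 120, 400, 420],
    "geneB": [300, 280, 150, 160],
    "geneC": [50, 55, 50, 52],
    "geneD": [2, 0, 1, 3],  # low abundance: removed by the filter
    "geneE": [80, 90, 330, 350],
}
# Hypothetical qRT-PCR log2 fold changes used as the reference.
qpcr_log2fc = {"geneA": 1.8, "geneB": -1.1, "geneC": 0.1, "geneE": 2.1}

kept = filter_low_abundance(counts)
genes = sorted(set(kept) & set(qpcr_log2fc))
rnaseq = [log2_fold_change(kept[g], control_idx=[0, 1], treated_idx=[2, 3]) for g in genes]
reference = [qpcr_log2fc[g] for g in genes]
print(genes)  # → ['geneA', 'geneB', 'geneC', 'geneE']
print(round(pearson(rnaseq, reference), 3))
```

The pseudocount keeps the ratio defined for genes with zero counts. A real analysis would also normalize for library size (for example with DESeq2 or edgeR) before computing fold changes, which this sketch skips.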


2012
Vol 30 (30_suppl)
pp. 56-56
Author(s):
Byung-In Lee
Kahuku Oades
Lien Vo
Jerry Lee
Mark Landers
...

Background: Gene expression profiling has been shown to be effective for analyzing postoperative tumor samples in various cancers. However, when analyzing small specimens such as core biopsies, the limited amount of available material makes multi-gene analyses difficult or impossible. Microarray-based analyses also offer only a limited dynamic range. We describe the development of a targeted RNA-sequencing methodology that combines universal RNA amplification with next-generation sequencing (NGS) for ultra-deep expression analysis of multiple target genes, enabling multi-gene analysis from <100 ng of sample input in a single-tube format. Methods: Gene expression patterns of triple-negative breast cancer FFPE samples were analyzed using a 96-gene breast cancer biomarker panel across three different platforms: Affymetrix Human Gene ST 1.0 microarrays, a pre-developed OncoScore qRT-PCR panel, and targeted RNA-seq. For targeted RNA-seq analysis, the 96-gene panel was amplified using a universal, single-tube “XP-PCR” amplification strategy, followed by sequence analysis on the Ion Torrent Personal Genome Machine. Results: Targeted RNA-seq provided the highest sensitivity in terms of detection rate with <100 ng of FFPE RNA input and offers a dynamic range that is unlimited given sufficient sequencing depth. Expression ratio compression, typically associated with a high number of pre-amplification cycles in standard multiplex-primed methods, was not observed. Low-expressing genes that were undetectable by qRT-PCR from 1,000 ng of input FFPE RNA were detected with a significant number of sequencing reads, making them eligible for expression analysis. Sequencing the target transcripts with targeted RNA-seq also makes alternative transcription/splicing analysis possible. Conclusions: By combining universally primed pre-amplification and NGS for multi-gene expression analysis, targeted RNA-seq provides the most sensitive gene expression analysis methodology.


Blood
2018
Vol 132 (Supplement 1)
pp. 4500-4500
Author(s):
Mariateresa Fulciniti
Michael A Lopez
Anil Aktas Samur
Eugenio Morelli
Hervé Avet-Loiseau
...

Abstract. Gene expression profiling has provided interesting insights into disease biology, helped develop new risk stratification, and identified novel druggable targets in multiple myeloma (MM). However, alternative pre-mRNA splicing (AS), one of the key modifiers of the transcriptome, also has a significant impact. Splice variants increase transcriptomic complexity, and misregulated splicing affects disease behavior and therapeutic considerations in various disease processes, including cancer. Our large, well-annotated deep RNA sequencing dataset from purified MM cells of 420 homogeneously treated, newly diagnosed patients identified 1,534 genes with one or more splicing events observed in at least 10% of patients. The median number of alternative splicing events per patient was 595 (range 223 to 2,735). These global alternative splicing events in MM include aberrant splicing of critical growth and survival genes, affecting both disease biology and overall survival. Moreover, the decrease in cell viability observed across a large panel of MM cell lines after inhibition of splicing at the pre-mRNA complex, with stalling at the A complex, confirmed that MM cells are exquisitely sensitive to pharmacological inhibition of splicing. Based on these data, we further focused on understanding the molecular mechanisms driving aberrant alternative splicing in MM. An increasing body of evidence indicates that altered expression of regulatory splicing factors (SFs) can be oncogenic by affecting the AS of cancer-associated genes. We used our large RNA-seq dataset to create a genome-wide map of SF alterations and identified several splicing factors that are significantly dysregulated in MM compared with normal plasma cells and that affect clinical outcome.
The splicing factor serine and arginine rich splicing factor 1 (SRSF1), which regulates the initiation of spliceosome assembly, was selected for further evaluation because its impact on clinical outcome was confirmed in two additional independent myeloma datasets. In gain-of-function (GOF) studies, enforced expression of SRSF1 in MM cells significantly increased proliferation, especially in the presence of bone marrow stromal cells; conversely, in loss-of-function (LOF) studies, downregulation of SRSF1 using stable or doxycycline-inducible shRNA systems significantly inhibited MM cell proliferation and survival over time. We used SRSF1 mutants to dissect the mechanisms of SRSF1-mediated MM growth induction and observed that the growth-promoting effect of SRSF1 in MM cells was mainly due to its splicing activity. We next investigated the impact of SRSF1 on allelic isoforms of specific target genes by RNA-seq in LOF studies and confirmed the results in GOF studies. Splicing profiles showed widespread changes in AS induced by SRSF1 knockdown. Compared with control cells, the most recurrent splicing events were skipped exon (SE) and alternative first exon (AF) events. SE events were primarily upregulated, whereas AF events were evenly split between upregulation and downregulation. Following SRSF1 depletion, most genes with splicing events in these categories showed no significant difference in overall gene expression level compared with control. Analyzing the cellular functions of SRSF1-regulated splicing events, we found that SRSF1 knockdown affects genes in the RNA processing pathway as well as genes involved in cancer-related functions such as the mTOR and MYC-related pathways. The splicing analysis was corroborated by immunoprecipitation (IP) followed by mass spectrometry (MS) analysis of MM cells expressing T7-tagged SRSF1.
We observed increased levels of SRSF1 phosphorylation, which regulates its subcellular localization and activity, in MM cell lines and primary patient MM cells compared with normal donor PBMCs. Moreover, we evaluated the chemical compound TG003, an inhibitor of Cdc2-like kinases (CLK) 1 and 4, which regulate splicing by fine-tuning the phosphorylation of SR proteins. Treatment with TG003 decreased SRSF1 phosphorylation, preventing spliceosome assembly and inducing a dose-dependent inhibition of MM cell viability. In conclusion, we provide mechanistic insights into myeloma-related splicing dysregulation and establish SRSF1 as a tumor-promoting gene with therapeutic potential.
Disclosures: Avet-Loiseau: Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees; Celgene: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Sanofi: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Abbvie: Membership on an entity's Board of Directors or advisory committees; Amgen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Takeda: Membership on an entity's Board of Directors or advisory committees, Research Funding. Munshi: OncoPep: Other: Board of director.


2019
Author(s):
Zou Yutong
Bui Thuy Tien
Kumar Selvarajoo

Abstract. Here we report ABioTrans, a biostatistics/bioinformatics tool developed in R for gene expression analysis. The tool allows users to directly read RNA-Seq data files deposited in the Gene Expression Omnibus (GEO) database. Running in any web browser, ABioTrans provides easy options for fitting multiple statistical distributions, Pearson and Spearman rank correlations, PCA, k-means and hierarchical clustering, differential expression analysis, Shannon entropy and noise (square of the coefficient of variation) analyses, as well as Gene Ontology classifications.
Availability and implementation: ABioTrans is available at https://github.com/buithuytien/ABioTrans.
Operating system(s): Platform independent (web browser).
Programming language: R (RStudio).
Other requirements: Bioconductor genome-wide annotation databases and the R packages shiny, LSD, fitdistrplus, actuar, entropy, moments, RUVSeq, edgeR, DESeq2, NOISeq, AnnotationDbi, ComplexHeatmap, circlize, clusterProfiler, reshape2, DT, plotly, shinycssloaders, dplyr, and ggplot2. These packages are installed automatically when ABioTrans.R is executed in RStudio.
No restriction of usage for non-academic users.
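The two per-gene and per-sample statistics named above are simple to state: noise is the squared coefficient of variation, CV² = variance / mean², and Shannon entropy is H = -Σ pᵢ log₂ pᵢ with pᵢ = xᵢ / Σⱼ xⱼ. The Python sketch below computes both. It is not ABioTrans code (ABioTrans is written in R); the use of population variance, the base-2 logarithm, and the example values are assumptions, and the tool's own definitions may differ.

```python
import math
from statistics import mean, pvariance


def noise_cv2(values):
    """Squared coefficient of variation: population variance divided by mean squared."""
    m = mean(values)
    if m == 0:
        return float("nan")  # undefined for an all-zero gene
    return pvariance(values, mu=m) / (m * m)


def shannon_entropy(expression):
    """Shannon entropy in bits of one sample's expression profile, treated as a distribution."""
    total = sum(expression)
    h = 0.0
    for x in expression:
        if x > 0:  # terms with x == 0 contribute 0 (0 * log 0 is taken as 0)
            p = x / total
            h -= p * math.log2(p)
    return h


# Hypothetical replicate measurements of one gene.
print(noise_cv2([5, 15]))  # → 0.25
# Hypothetical expression of four genes in one sample.
print(shannon_entropy([10, 10, 10, 10]))  # uniform over 4 genes → 2.0
print(shannon_entropy([40, 0, 0, 0]))  # all reads in one gene → 0.0
```

Dividing by the squared mean puts genes with very different expression levels on a comparable noise scale, while entropy is maximal when expression is spread evenly across genes and zero when a single gene accounts for all reads.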

