A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies

2017 ◽  
Author(s):  
Gabriela A. Merino ◽  
Ana Conesa ◽  
Elmer A. Fernández

ABSTRACT Over the last few years, RNA-seq has been used to study alterations in alternative splicing related to several diseases. Bioinformatics workflows used to perform these studies can be divided into two groups: those finding changes in absolute isoform expression and those studying differential splicing. Many computational methods for transcriptomics analysis have been developed, evaluated and compared; however, there are not enough reports of systematic and objective assessment of processing pipelines as a whole. Moreover, comparative studies have considered changes in absolute and relative isoform expression levels separately. Consequently, no consensus exists about the best practices and appropriate workflows to analyse alternative and differential splicing. To assist in choosing an adequate pipeline, we present here a benchmarking of nine commonly used workflows to detect differential isoform expression and splicing. We evaluated workflow performance over three different experimental scenarios where changes in absolute and relative isoform expression occurred simultaneously. In addition, the effects of the number of isoforms per gene and of the magnitude of the expression change on pipeline performance were also evaluated. Our results suggest that workflow performance is influenced by the number of replicates per condition and the heterogeneity of the conditions. In general, workflows based on DESeq, DEXSeq, Limma and NOISeq performed well over a wide range of transcriptomics experiments. In particular, we suggest the use of workflows based on Limma when high precision is required, and DESeq2 and DEXSeq pipelines to prioritize sensitivity. When several replicates per condition are available, NOISeq and Limma pipelines are indicated.
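The distinction the abstract draws between changes in absolute isoform expression and changes in relative isoform expression (differential splicing) can be illustrated with a toy calculation. The sketch below is not any of the benchmarked workflows; the function names, the pseudocount and the counts are hypothetical and chosen only for illustration.

```python
import math


def isoform_proportions(counts):
    """Relative usage of each isoform within one gene (fractions summing to 1)."""
    total = sum(counts.values())
    if total == 0:
        return {iso: 0.0 for iso in counts}
    return {iso: c / total for iso, c in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Absolute expression change per isoform; the pseudocount is an assumption."""
    return {
        iso: math.log2((treated[iso] + pseudocount) / (control[iso] + pseudocount))
        for iso in control
    }


def max_usage_shift(control, treated):
    """Largest change in isoform proportion, a crude proxy for differential splicing."""
    p_control = isoform_proportions(control)
    p_treated = isoform_proportions(treated)
    return max(abs(p_treated[iso] - p_control[iso]) for iso in control)


# Hypothetical counts for one gene with two isoforms.
control = {"iso_A": 100, "iso_B": 100}
doubled = {"iso_A": 200, "iso_B": 200}   # absolute expression changes, usage does not
switched = {"iso_A": 150, "iso_B": 50}   # gene total is unchanged, usage changes

print(log2_fold_changes(control, doubled))
print(max_usage_shift(control, doubled))
print(max_usage_shift(control, switched))
```

In the `doubled` case every isoform is differentially expressed while the proportions stay fixed; in the `switched` case the gene total is constant while isoform usage shifts. Real workflows replace these point estimates with statistical models that account for replicates and dispersion.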

2017 ◽  
Author(s):  
María José Nueda ◽  
Jordi Martorell-Marugan ◽  
Cristina Martí ◽  
Sonia Tarazona ◽  
Ana Conesa

Abstract As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data. Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as those genes with a shift in the major expressed isoform. The package is freely available under the LGPL license from the Bioconductor web site (http://bioconductor.org).
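The idea of a "shift in major expressed isoform" across time can be sketched in a few lines. This is a toy illustration only and not the Iso-maSigPro method or its R API; the function names and the expression values are hypothetical, and ties are broken alphabetically as an arbitrary assumption.

```python
def major_isoform(expr):
    """Name of the most expressed isoform; ties resolve to the alphabetically first name."""
    return max(sorted(expr), key=lambda iso: expr[iso])


def major_isoform_switches(series):
    """Report time points at which the major isoform differs from the previous one.

    `series` is a list of (time, {isoform: expression}) pairs in time order.
    Returns a list of (time, previous_major, new_major) tuples.
    """
    switches = []
    previous = None
    for time, expr in series:
        current = major_isoform(expr)
        if previous is not None and current != previous:
            switches.append((time, previous, current))
        previous = current
    return switches


# Hypothetical mean expression of two isoforms of one gene at three time points.
series = [
    (0, {"iso_A": 80, "iso_B": 20}),
    (6, {"iso_A": 55, "iso_B": 45}),
    (12, {"iso_A": 30, "iso_B": 70}),
]

print(major_isoform_switches(series))
```

A statistical approach would additionally test whether the isoform profiles differ significantly over time rather than comparing raw means.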


2017 ◽  
Vol 25 (1) ◽  
pp. 4-12 ◽  
Author(s):  
Reem Almugbel ◽  
Ling-Hong Hung ◽  
Jiaming Hu ◽  
Abeer Almutairy ◽  
Nicole Ortogero ◽  
...  

Abstract. Objective: Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods: We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, and process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results: BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion: Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary and as ubiquitous for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols.


2021 ◽  
Vol 4 (4) ◽  
pp. 68
Author(s):  
Alexandros C. Dimopoulos ◽  
Konstantinos Koukoutegos ◽  
Fotis E. Psomopoulos ◽  
Panagiotis Moulos

RNA sequencing has become the standard technique for high-resolution, genome-wide monitoring of gene expression. As such, it is often the first step towards understanding the complex molecular mechanisms driving various phenotypes, from organ development to disease genesis, monitoring and progression. An advantage of RNA sequencing is its ability to capture complex transcriptomic events such as alternative splicing, which results in alternate isoform abundance. At the same time, exploiting this advantage remains algorithmically and computationally challenging, especially with the emergence of even higher resolution technologies such as single-cell RNA sequencing. Although several algorithms have been proposed for the effective detection of differential isoform expression from RNA-Seq data, no widely accepted gold standards have been established. This is further compounded by the significant differences in the output of different algorithms when applied to the same data. In addition, many of the proposed algorithms remain scarce and poorly maintained. Driven by these challenges, we developed a novel integrative approach that combines the most widely used algorithms for differential transcript and isoform analysis using state-of-the-art machine learning techniques. We demonstrate its usability by applying it to simulated data based on several organisms and evaluating it with several performance metrics; we conclude that our strategy outperforms the application of the individual algorithms. Finally, our approach is implemented as an R Shiny application, with the underlying data analysis pipelines also available as Docker containers.


2021 ◽  
Author(s):  
Elisabeth Rebboah ◽  
Fairlie Reese ◽  
Katherine Williams ◽  
Gabriela Balderrama-Gutierrez ◽  
Cassandra McGill ◽  
...  

Abstract Alternative RNA isoforms are defined by promoter choice, alternative splicing, and polyA site selection. Although differential isoform expression is known to play a large regulatory role in eukaryotes, it has proved challenging to study with standard short-read RNA-seq because of the uncertainties it leaves about the full-length structure and precise termini of transcripts. The rise in throughput and quality of long-read sequencing now makes it possible, in principle, to unambiguously identify most transcript isoforms from beginning to end. However, its application to single-cell RNA-seq has been limited by throughput and expense. Here, we develop and characterize long-read Split-seq (LR-Split-seq), which uses a combinatorial barcoding-based method for sequencing single cells and nuclei with long reads. We show that LR-Split-seq can associate isoforms with cell types with relative economy and design flexibility. We characterize LR-Split-seq for whole cells and nuclei by using the well-studied mouse C2C12 system in which mononucleated myoblast cells differentiate and fuse into multinucleated myotubes. We show that the overall results are reproducible when comparing long- and short-read data from the same cell or nucleus. We find substantial evidence of differential isoform expression during differentiation including alternative transcription start site (TSS) usage. We integrate the resulting isoform expression dynamics with snATAC-seq chromatin accessibility to validate TSS-driven isoform choices. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells that can be further quantified with companion deep short-read scRNA-seq from the same cell populations.


2017 ◽  
Vol 34 (3) ◽  
pp. 524-526 ◽  
Author(s):  
María José Nueda ◽  
Jordi Martorell-Marugan ◽  
Cristina Martí ◽  
Sonia Tarazona ◽  
Ana Conesa

2017 ◽  
Author(s):  
Reem Almugbel ◽  
Ling-Hong Hung ◽  
Jiaming Hu ◽  
Abeer Almutairy ◽  
Nicole Ortogero ◽  
...  

ABSTRACT. Objective: Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, to allow users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods: We present three different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets and process RNA-seq data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need for installation of any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results: BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion: Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as ubiquitous and necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 311
Author(s):  
Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed that use at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of values of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion), respectively, without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Matthew Chung ◽  
Vincent M. Bruno ◽  
David A. Rasko ◽  
Christina A. Cuomo ◽  
José F. Muñoz ◽  
...  

Abstract Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.

