SpliceNet: recovering splicing isoform-specific differential gene networks from RNA-Seq data of normal and diseased samples

Hari Krishna Yalamanchili; Zhaoyuan Li; Panwen Wang; Maria P. Wong; Jianfeng Yao; Junwen Wang

doi:10.1093/nar/gku577


Nucleic Acids Research ◽

10.1093/nar/gku577 ◽

2014 ◽

Vol 42 (15) ◽

pp. e121-e121 ◽

Cited By ~ 18

Author(s):

Hari Krishna Yalamanchili ◽

Zhaoyuan Li ◽

Panwen Wang ◽

Maria P. Wong ◽

Jianfeng Yao ◽

...

Keyword(s):

Sample Size ◽

Gene Networks ◽

Simulated Data ◽

Exon Array ◽

R Package ◽

Mapk Signaling ◽

Rna Seq ◽

Gene Expressions ◽

Exon Level ◽

Splicing Isoforms

Abstract: Conventionally, overall gene expressions from microarrays are used to infer gene networks, but it is challenging to account for splicing isoforms. High-throughput RNA sequencing has made splice variant profiling practical. However, its true merit in quantifying splicing isoforms and isoform-specific exon expressions is not well explored in inferring gene networks. This study demonstrates SpliceNet, a method to infer isoform-specific co-expression networks from exon-level RNA-Seq data, using large dimensional trace. It goes beyond differentially expressed genes and infers splicing isoform network changes between normal and diseased samples. It eases the sample size bottleneck; evaluations on simulated data and on lung cancer-specific ERBB2 and MAPK signaling pathways, with varying numbers of samples, show its merit in handling datasets with a high exon-to-sample-size ratio. Inferred network rewiring of the well-established Bcl-x and EGFR centered networks from lung adenocarcinoma expression data is in good agreement with the literature. Gene-level evaluations demonstrate a substantial performance advantage of SpliceNet over canonical correlation analysis, a method that is currently applied to exon-level RNA-Seq data. SpliceNet can also be applied to exon array data. SpliceNet is distributed as an R package available at http://www.jjwanglab.org/SpliceNet.


SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data

BioMed Research International ◽

10.1155/2015/780519 ◽

2015 ◽

Vol 2015 ◽

pp. 1-5 ◽

Cited By ~ 2

Author(s):

Yuxiang Tan ◽

Yann Tambouret ◽

Stefano Monti

Keyword(s):

Sample Size ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Simulated Data ◽

Real Data ◽

Rna Seq ◽

Sequencing Data ◽

Detection Algorithms ◽

Fusion Detection

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
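
The accuracy measures named in this abstract can be illustrated with a minimal sketch. It assumes that both the simulated truth set and a detector's calls are available as sets of gene pairs; the function name `precision_recall` and the gene labels are hypothetical and are not part of SimFuse.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) of predicted fusions against a known truth set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical gene-pair labels, used only for illustration.
simulated_fusions = {
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_E", "GENE_F"),
    ("GENE_G", "GENE_H"),
}
called_fusions = {
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_X", "GENE_Y"),  # a call that is not in the simulated truth set
}

precision, recall = precision_recall(called_fusions, simulated_fusions)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Because the truth set is known exactly in a simulation, both ratios can be computed directly, and repeating the calculation per supporting-read level would give the read-specific performance the abstract describes.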


consensusDE: an R package for assessing consensus of multiple RNA-seq algorithms with RUV correction

10.1101/692582 ◽

2019 ◽

Author(s):

Ashley J. Waardenberg ◽

Matt A. Field

Keyword(s):

Differential Expression ◽

Simulated Data ◽

R Package ◽

Bioconductor Package ◽

Rna Seq ◽

User Input ◽

Extensive Evaluation ◽

The Impact ◽

Transcript Database ◽

Multiple Algorithms

Abstract: Extensive evaluation of RNA-seq methods has demonstrated that no single algorithm consistently outperforms all others. Removal of unwanted variation (RUV) has also been proposed as a method for stabilizing differential expression (DE) results. Despite this, it remains a challenge to run multiple RNA-seq algorithms to identify significant differences common to multiple algorithms, whilst also integrating and assessing the impact of RUV into all algorithms. consensusDE was developed to automate the process of identifying significant DE by combining the results from multiple algorithms with minimal user input and with the option to automatically integrate RUV. consensusDE only requires a table describing the sample groups, a directory containing BAM files or preprocessed count tables and an optional transcript database for annotation. It supports merging of technical replicates and paired analyses, and outputs a compendium of plots to guide the user in subsequent analyses. Herein, we also assess the ability of RUV to improve DE stability when combined with multiple algorithms through application to real and simulated data. We find that, although RUV demonstrated improved FDR in a setting of low replication, the effect was algorithm specific and diminished with increased replication, reinforcing increased replication for recovery of true DE genes. We finish by offering some rules and considerations for the application of RUV in a consensus-based setting. consensusDE is freely available, implemented in R and available as a Bioconductor package, under the GPL-3 license, along with a comprehensive vignette describing functionality: http://bioconductor.org/packages/consensusDE/


Identifying progressive gene network perturbation from single-cell RNA-seq data

10.1101/297275 ◽

2018 ◽

Author(s):

Sumit Mukherjee ◽

Alberto Carignano ◽

Georg Seelig ◽

Su-In Lee

Keyword(s):

Single Cell ◽

Gene Networks ◽

Regulatory Networks ◽

Simulated Data ◽

Alternative Methods ◽

Rna Seq ◽

Differential Network Analysis ◽

Master Regulators ◽

Differential Network ◽

Gene Regulatory

Abstract: Identifying the gene regulatory networks that control development or disease is one of the most important problems in biology. Here, we introduce a computational approach, called PIPER (ProgressIve network PERturbation), to identify the perturbed genes that drive differences in the gene regulatory network across different points in a biological progression. PIPER employs algorithms tailor-made for single cell RNA sequencing (scRNA-seq) data to jointly identify gene networks for multiple progressive conditions. It then performs differential network analysis along the identified gene networks to identify master regulators. We demonstrate that PIPER outperforms state-of-the-art alternative methods on simulated data and is able to predict known key regulators of differentiation on real scRNA-Seq datasets.


SPARSim single cell: a count data simulator for scRNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz752 ◽

2019 ◽

Cited By ~ 2

Author(s):

Giacomo Baruzzo ◽

Ilaria Patuzzi ◽

Barbara Di Camillo

Keyword(s):

Single Cell ◽

Count Data ◽

Simulated Data ◽

Real Data ◽

R Package ◽

Supplementary Information ◽

Rna Seq ◽

Distribution Of Zeros ◽

New Methods ◽

Research Fields

Abstract Motivation: Single cell RNA-seq (scRNA-seq) count data show many differences compared with bulk RNA-seq count data, making the application of many RNA-seq pre-processing/analysis methods not straightforward or even inappropriate. For this reason, the development of new methods for handling scRNA-seq count data is currently one of the most active research fields in bioinformatics. To help the development of such new methods, the availability of simulated data could play a pivotal role. However, only a few scRNA-seq count data simulators are available, and they often show poor or undemonstrated similarity with real data. Results: In this article we present SPARSim, a scRNA-seq count data simulator based on a Gamma-Multivariate Hypergeometric model. We demonstrate that SPARSim can generate count data that resemble real data in terms of count intensity, variability and sparsity, performing comparably to or better than one of the most used scRNA-seq simulators, Splat. In particular, SPARSim simulated count matrices closely resemble the distribution of zeros across different expression intensities observed in real count data. Availability and implementation: The SPARSim R package is freely available at http://sysbiobig.dei.unipd.it/?q=SPARSim and at https://gitlab.com/sysbiobig/sparsim. Supplementary information: Supplementary data are available at Bioinformatics online.
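
A Gamma-Multivariate Hypergeometric model of the kind named in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the SPARSim implementation: the function `simulate_cell`, its parameters, the gamma parametrization (mean equal to `intensity`, squared coefficient of variation equal to `variability`) and the `pool_scale` factor are all hypothetical choices made for the example.

```python
import numpy as np


def simulate_cell(intensity, variability, library_size, rng, pool_scale=1000):
    """Draw one cell's counts: gamma expression levels, then sampling without replacement."""
    intensity = np.asarray(intensity, dtype=float)
    variability = np.asarray(variability, dtype=float)
    # Gamma draw with mean = intensity and squared CV = variability (assumed parametrization).
    expression = rng.gamma(shape=1.0 / variability, scale=intensity * variability)
    # Convert relative expression into an integer pool of molecules to sample from.
    pool = np.rint(expression / expression.sum() * pool_scale * library_size).astype(np.int64)
    # Never ask for more reads than the pool contains.
    n_reads = min(library_size, int(pool.sum()))
    # Multivariate hypergeometric = sequencing modelled as sampling without replacement.
    return rng.multivariate_hypergeometric(pool, n_reads)


rng = np.random.default_rng(1)
intensity = np.array([50.0, 5.0, 0.5, 0.05])   # illustrative per-gene mean levels
variability = np.array([0.2, 0.5, 1.0, 2.0])   # illustrative per-gene variability
counts = simulate_cell(intensity, variability, library_size=200, rng=rng)
print(counts)
```

In this sketch, zeros arise naturally for low-intensity genes because few or none of their molecules are drawn into a small library, which is the kind of intensity-dependent sparsity the abstract refers to.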


flexiMAP: a regression-based method for discovering differential alternative polyadenylation events in standard RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa854 ◽

2020 ◽

Author(s):

Krzysztof J Szkop ◽

David S Moss ◽

Irene Nobeli

Keyword(s):

Simulated Data ◽

Alternative Polyadenylation ◽

Real Data ◽

R Package ◽

Supplementary Information ◽

Beta Regression ◽

Rna Seq ◽

Good Balance ◽

Flexible Modeling ◽

Specificity And Sensitivity

Abstract Motivation: We present flexible Modeling of Alternative PolyAdenylation (flexiMAP), a new beta-regression-based method implemented in R, for discovering differential alternative polyadenylation events in standard RNA-seq data. Results: We show, using both simulated and real data, that flexiMAP exhibits a good balance between specificity and sensitivity and compares favourably to existing methods, especially at low fold changes. In addition, the tests on simulated data reveal some hitherto unrecognized caveats of existing methods. Importantly, flexiMAP allows modeling of multiple known covariates that often confound the results of RNA-seq data analysis. Availability and implementation: The flexiMAP R package is available at: https://github.com/kszkop/flexiMAP. Scripts and data to reproduce the analysis in this paper are available at: https://doi.org/10.5281/zenodo.3689788. Supplementary information: Supplementary data are available at Bioinformatics online.


i2d: an R package for simulating data from images and the implications in biomedical research

Bioinformatics ◽

10.1093/bioinformatics/btaa991 ◽

2020 ◽

Author(s):

Xiaoyu Liang ◽

Ying Hu ◽

Chunhua Yan ◽

Ke Xu

Keyword(s):

Gene Networks ◽

Simulated Data ◽

R Package ◽

Quantitative Information ◽

Human Vision ◽

Supplementary Information ◽

Biological Research ◽

Limited Capacity ◽

Complex Information ◽

Quality Imaging

Abstract Motivation: High-quality imaging analyses have been proposed to drive innovation in biomedical and biological research. However, the application of images remains underexploited because of the limited capacity of human vision and the challenges in extracting quantitative information from images. Computationally extracting quantitative information from images is critical to overcoming this limitation. Here, we present a novel R package, i2d, to simulate data from an image based on digital convolution. Results: The R package i2d allows users to transform an image into a simulated dataset that can be used to extract and analyze complex information in biomedical and biological research. The package also includes three novel and efficient methods for graph clustering based on simulated data, which can be used to dissect complex gene networks into sub-clusters that have similar biological functions. Availability and implementation: The code, the documentation, a tutorial and example data are available as open source at: github.com/XiaoyuLiang/i2d. Supplementary information: Supplementary data are available at Bioinformatics online.


A large-scale comparative study of isoform expressions measured on four platforms

10.21203/rs.2.11536/v2 ◽

2019 ◽

Author(s):

Wei Zhang ◽

Raphael Petegrosso ◽

Jae Woong Chang ◽

Jiao Sun ◽

Jeongsik Yong ◽

...

Keyword(s):

Gene Expression ◽

Comparative Study ◽

Cell Lines ◽

Cancer Cell ◽

Large Scale ◽

Exon Array ◽

Cancer Cell Lines ◽

Rna Seq ◽

Gene Expressions ◽

Isoform Quantification

Abstract Background: Most eukaryotic genes produce different transcripts of multiple isoforms by inclusion or exclusion of particular exons. The isoforms of a gene often play diverse functional roles and thus it is necessary to accurately measure isoform expressions as well as gene expressions. While previous studies have demonstrated the strong agreement between mRNA-sequencing (RNA-seq) and array-based gene and/or isoform quantification platforms (Microarray gene expression and Exon-array), the more recently developed NanoString platform has not been systematically evaluated and compared, especially in large-scale studies across different cancer domains. Results: In this paper, we present a large-scale comparative study among RNA-seq, NanoString, array-based, and RT-qPCR platforms using 46 cancer cell lines across different cancer types. The goal is to understand and evaluate the calibers of the platforms for measuring gene and isoform expressions in cancer studies. We first performed NanoString experiments on 59 cancer cell lines with 403 custom-designed probes for measuring the expressions of 478 isoforms in 155 genes and additional RT-qPCR experiments for a subset of the measured isoforms in 13 cell lines. We then combined the data with the matched RNA-seq, Exon-array and Microarray data of 46 of the 59 cell lines for the comparative analysis. 
Conclusion: In the comparisons of the platforms for evaluating expressions at both isoform and gene levels, we found that (1) the degree of agreement across platforms is lower for isoform expressions than for gene expressions; (2) NanoString and Exon-array are not consistent on isoform quantification even though both techniques are based on hybridization reactions; (3) RT-qPCR experiments are more consistent with RNA-seq and Exon-array quantification results at the isoform level than NanoString is; (4) different RNA-seq isoform quantification algorithms showed inconsistent results, and two isoform quantification methods, Net-RSTQ and eXpress, are more consistent across the platforms in the comparison; and (5) RNA-seq has the best overall consistency with the other platforms on gene expression quantification.


Data-based RNA-seq Simulations by Binomial Thinning

10.1101/758524 ◽

2019 ◽

Cited By ~ 1

Author(s):

David Gerard

Keyword(s):

Theoretical Model ◽

Single Cell ◽

Differential Expression Analysis ◽

Simulated Data ◽

Real Data ◽

Theoretical Models ◽

Simulation Method ◽

R Package ◽

Rna Seq ◽

Ideal Model

Abstract: With the explosion in the number of methods designed to analyze bulk and single-cell RNA-seq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, as real data often exhibit violations from theoretical models, this can result in unsubstantiated claims of a method’s performance. Rather than generate data from a theoretical model, in this paper we develop methods to add signal to real RNA-seq datasets. Since the resulting simulated data are not generated from an unrealistic theoretical model, they exhibit realistic (annoying) attributes of real data. This lets RNA-seq methods developers assess their procedures in non-ideal (model-violating) scenarios. Our procedures may be applied to both single-cell and bulk RNA-seq. We show that our simulation method results in more realistic datasets and can alter the conclusions of a differential expression analysis study. We also demonstrate our approach by comparing various factor analysis techniques on RNA-seq datasets. Our tools are available in the seqgendiff R package on the Comprehensive R Archive Network: https://cran.r-project.org/package=seqgendiff.
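
The idea of adding signal to real counts by binomial thinning can be sketched for a simple two-group design. This is a simplified illustration and not the seqgendiff API: the function `thin_two_group`, its arguments and the stand-in count matrix are assumptions made for the example. Each count is replaced by a binomial draw whose success probability is chosen so that the expected ratio between the two groups equals the requested fold change.

```python
import numpy as np


def thin_two_group(counts, group, log2_fc, rng):
    """Thin a genes x samples count matrix so group 1 vs group 0 differs by 2**log2_fc."""
    counts = np.asarray(counts)
    group = np.asarray(group)
    log2_fc = np.asarray(log2_fc, dtype=float)
    # exponent[g, s] = log2_fc[g] * group[s] - max(log2_fc[g], 0), which is always <= 0,
    # so the lower-expressed group is thinned and the other group is left untouched.
    exponent = np.outer(log2_fc, group) - np.maximum(log2_fc, 0.0)[:, None]
    keep_prob = 2.0 ** exponent
    return rng.binomial(counts, keep_prob)


rng = np.random.default_rng(0)
# Stand-in for a real count matrix (3 genes x 6 samples); real data would be used in practice.
real_counts = rng.poisson(lam=200, size=(3, 6))
group = np.array([0, 0, 0, 1, 1, 1])
log2_fc = np.array([1.0, 0.0, -1.0])  # up in group 1, null, down in group 1

thinned = thin_two_group(real_counts, group, log2_fc, rng)
print(thinned)
```

Because thinning only ever removes reads, the output keeps the sample-to-sample structure of the input counts, which is what lets the simulated data retain the non-ideal attributes of real data.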


consensusDE: an R package for assessing consensus of multiple RNA-seq algorithms with RUV correction

PeerJ ◽

10.7717/peerj.8206 ◽

2019 ◽

Vol 7 ◽

pp. e8206 ◽

Cited By ~ 6

Author(s):

Ashley J. Waardenberg ◽

Matthew A. Field

Keyword(s):

Differential Expression ◽

Simulated Data ◽

R Package ◽

Bioconductor Package ◽

Rna Seq ◽

User Input ◽

Extensive Evaluation ◽

The Impact ◽

Transcript Database ◽

Multiple Algorithms

Extensive evaluation of RNA-seq methods has demonstrated that no single algorithm consistently outperforms all others. Removal of unwanted variation (RUV) has also been proposed as a method for stabilizing differential expression (DE) results. Despite this, it remains a challenge to run multiple RNA-seq algorithms to identify significant differences common to multiple algorithms, whilst also integrating and assessing the impact of RUV into all algorithms. consensusDE was developed to automate the process of identifying significant DE by combining the results from multiple algorithms with minimal user input and with the option to automatically integrate RUV. consensusDE only requires a table describing the sample groups, a directory containing BAM files or preprocessed count tables and an optional transcript database for annotation. It supports merging of technical replicates and paired analyses, and outputs a compendium of plots to guide the user in subsequent analyses. Herein, we assess the ability of RUV to improve DE stability when combined with multiple algorithms and between algorithms, through application to real and simulated data. We find that RUV increased fold change stability between algorithms and demonstrated improved FDR in a setting of low replication for the intersect; however, the effect was algorithm specific and diminished with increased replication, reinforcing increased replication for recovery of true DE genes. We finish by offering some rules and considerations for the application of RUV in a consensus-based setting. consensusDE is freely available, implemented in R and available as a Bioconductor package, under the GPL-3 license, along with a comprehensive vignette describing functionality: http://bioconductor.org/packages/consensusDE/.
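
Taking the intersect of significant results across algorithms can be sketched as below. This is only an illustration of the general idea, not the consensusDE interface: the function `consensus_de`, the method labels, the gene names and the choice to report the largest adjusted p-value per gene are all hypothetical.

```python
def consensus_de(results, alpha=0.05):
    """Return genes significant in every method, mapped to their largest adjusted p-value.

    results: dict mapping method name -> dict mapping gene -> adjusted p-value.
    """
    significant_sets = [
        {gene for gene, p_adj in p_values.items() if p_adj < alpha}
        for p_values in results.values()
    ]
    shared = set.intersection(*significant_sets) if significant_sets else set()
    # Reporting the maximum p-value is a conservative summary chosen for this sketch.
    return {
        gene: max(p_values[gene] for p_values in results.values())
        for gene in sorted(shared)
    }


# Hypothetical adjusted p-values from three unnamed algorithms.
results = {
    "method_a": {"g1": 0.001, "g2": 0.03, "g3": 0.20},
    "method_b": {"g1": 0.004, "g2": 0.08, "g3": 0.01},
    "method_c": {"g1": 0.010, "g2": 0.02, "g3": 0.04},
}

consensus = consensus_de(results)
print(consensus)
```

Only genes that pass the threshold in every method survive the intersect, so the consensus set trades some sensitivity for agreement between algorithms.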


Comparative Analysis of the Liver Transcriptome among Cattle Breeds Using RNA-seq

Veterinary Sciences ◽

10.3390/vetsci6020036 ◽

2019 ◽

Vol 6 (2) ◽

pp. 36

Author(s):

Chandra Pareek ◽

Mateusz Sachajko ◽

Jedrzej Jaskowski ◽

Magdalena Herudzinska ◽

Mariusz Skowronski ◽

...

Keyword(s):

Gene Expression ◽

Metabolic Pathways ◽

Gene Networks ◽

Cholesterol Biosynthesis ◽

Bovine Liver ◽

Differentially Expressed ◽

Rna Seq ◽

Gene Expressions ◽

Cattle Breeds ◽

Liver Transcriptome

Global gene expression in the liver transcriptome varies among cattle breeds. The present investigation aimed to identify the differentially expressed genes (DEGs), metabolic gene networks and metabolic pathways in the bovine liver transcriptome of young bulls. In this study, we comparatively analyzed the bovine liver transcriptome of dairy (Polish Holstein Friesian (HF); n = 6), beef (Hereford; n = 6), and dual purpose (Polish-Red; n = 6) cattle breeds. This study identified 895, 338, and 571 significant (p < 0.01) differentially expressed (DE) gene-transcripts, represented as 745, 265, and 498 hepatic DE genes, through the Polish-Red versus Hereford, Polish-HF versus Hereford, and Polish-HF versus Polish-Red breed comparisons, respectively. By combining all breed comparisons, 75 hepatic DE genes (p < 0.01) were identified as commonly shared among all three breed comparisons; 70, 160, and 38 hepatic DE genes were commonly shared between the following comparisons: (i) Polish-Red versus Hereford and Polish-HF versus Hereford; (ii) Polish-Red versus Hereford and Polish-HF versus Polish-Red; and (iii) Polish-HF versus Hereford and Polish-HF versus Polish-Red, respectively. A total of 440, 82, and 225 hepatic DE genes were uniquely observed for the Polish-Red versus Hereford, Polish-HF versus Hereford, and Polish-Red versus Polish-HF comparisons, respectively. Gene ontology (GO) analysis identified top-ranked enriched GO terms (p < 0.01) including 17, 16, and 31 functional groups and 151, 61, and 140 gene functions that were DE in all three breed liver transcriptome comparisons.
Gene network analysis identified several potential metabolic pathways involved in glutamine family amino-acid, triglyceride synthesis, gluconeogenesis, p38MAPK cascade regulation, cholesterol biosynthesis (Polish-Red versus Hereford); IGF-receptor signaling, catecholamine transport, lipoprotein lipase, tyrosine kinase binding receptor (Polish-HF versus Hereford); and PGF-receptor binding (Polish-HF versus Polish-Red). Validation results showed that the relative expression values were consistent with those obtained by RNA-seq, and significantly correlated between the quantitative reverse transcription PCR (RT-qPCR) and RNA-seq (Pearson’s r > 0.90). Our results provide new insights into bovine liver gene expressions among dairy versus dual versus beef breeds by identifying a large number of DEG markers, submitted to the NCBI Gene Expression Omnibus (GEO) under accession number GSE114233, which can serve as useful genetic tools to develop gene assays for trait-associated studies as well as for effective implementation in genomic selection (GS) cattle breeding programs in Poland.
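
The overlap counts reported in this abstract can be checked against the per-comparison totals with a short calculation. The sketch assumes that the pairwise "commonly shared" counts (70, 160, 38) exclude the 75 genes shared by all three comparisons; the short labels for the comparisons are introduced here only for the example.

```python
# Labels: PR = Polish-Red, HF = Polish-HF, H = Hereford.
shared_all = 75
shared_pairs = {
    ("PR_vs_H", "HF_vs_H"): 70,
    ("PR_vs_H", "HF_vs_PR"): 160,
    ("HF_vs_H", "HF_vs_PR"): 38,
}
unique = {"PR_vs_H": 440, "HF_vs_H": 82, "HF_vs_PR": 225}
reported_totals = {"PR_vs_H": 745, "HF_vs_H": 265, "HF_vs_PR": 498}

# Total for a comparison = unique genes + genes shared by all three
# + genes shared with exactly one other comparison.
recomputed = {
    comparison: unique[comparison]
    + shared_all
    + sum(n for pair, n in shared_pairs.items() if comparison in pair)
    for comparison in unique
}

for comparison, total in recomputed.items():
    print(comparison, total, "reported:", reported_totals[comparison])
```

Under that assumption the recomputed totals match the reported 745, 265, and 498 hepatic DE genes, for example 440 + 75 + 70 + 160 = 745 for Polish-Red versus Hereford.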
