Topconfects: a package for confident effect sizes in differential expression analysis provides improved usability ranking genes of interest

2018 ◽  
Author(s):  
Paul F. Harrison ◽  
Andrew D. Pattison ◽  
David R. Powell ◽  
Traude H. Beilharz

Abstract Background A differential gene expression analysis may produce a set of significantly differentially expressed genes that is too large to easily investigate, so a means of ranking genes by their biological interest is desirable. The life sciences have grappled with the abuse of p-values to rank genes for this purpose. As an alternative, a lower confidence bound on the magnitude of Log Fold Change (LFC) could be used to rank genes, but it has been unclear how to reconcile this with the need to perform False Discovery Rate (FDR) correction. The TREAT test of McCarthy and Smyth is a step in this direction, finding genes significantly exceeding a specified LFC threshold. Here we describe the use of test inversion on TREAT to present genes ranked by a confidence bound on the LFC, while still controlling FDR. Results Testing the Topconfects R package with simulated gene expression data shows the method outperforming current statistical approaches across a wide range of experiment sizes in the identification of genes with the largest LFCs. Applying the method to a TCGA breast cancer dataset shows that it ranks some genes with large LFC higher than traditional ranking by p-value would. Importantly, these two ranking methods lead to a different biological emphasis, in terms of both specific highly ranked genes and gene-set enrichment. Conclusions The choice of ranking method in differential expression analysis can affect the biological interpretation. The common default of ranking by p-value implicitly ranks by an effect size in which each gene is standardized to its own variability, rather than comparing genes on a common scale, which may not be appropriate. The Topconfects approach of presenting genes ranked by confident LFC effect size is a variation on the TREAT method with improved usability, removing the need to fine-tune a threshold parameter and removing the temptation to abuse p-values as a de facto effect size.
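
The contrast between ranking by significance and ranking by a confident effect size can be illustrated with a minimal Python sketch. It is not the Topconfects algorithm: it uses a plain per-gene normal interval with no TREAT test inversion and no FDR control, and the gene names, LFC estimates and standard errors are invented for illustration.

```python
from statistics import NormalDist

def confident_magnitude(lfc, se, level=0.95):
    """Lower bound on |LFC| from a symmetric normal interval; 0 if the interval covers 0.

    Simplification: a per-gene interval, NOT the FDR-controlled bound of the abstract.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return max(0.0, abs(lfc) - z * se)

# Hypothetical (gene, estimated LFC, standard error) triples.
genes = [
    ("geneA", 0.5, 0.05),  # small change, very precisely estimated
    ("geneB", 4.0, 1.00),  # large change, noisier estimate
    ("geneC", 1.0, 0.60),  # interval covers zero
]

# Ranking by |LFC| / SE mirrors ranking by p-value: each gene is
# standardized to its own variability.
by_z = sorted(genes, key=lambda g: abs(g[1]) / g[2], reverse=True)

# Ranking by the lower bound compares genes on a common LFC scale.
by_bound = sorted(genes, key=lambda g: confident_magnitude(g[1], g[2]), reverse=True)

rank_by_z = [g[0] for g in by_z]
rank_by_bound = [g[0] for g in by_bound]
print(rank_by_z)
print(rank_by_bound)
```

In this toy data the precisely measured but small change in geneA tops the significance ranking, while the larger change in geneB tops the ranking by confidence bound.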

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2010 ◽  
Author(s):  
Monther Alhamdoosh ◽  
Charity W. Law ◽  
Luyi Tian ◽  
Julie M. Sheridan ◽  
Milica Ng ◽  
...  

Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package solves this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene signature specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators, and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
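
The idea of a consensus ranking across several gene set tests can be sketched as follows. This is only an illustration in Python: the abstract does not say how EGSEA combines its methods, so averaging per-method ranks is an assumption here, and the method names, gene set names and p-values are hypothetical.

```python
from statistics import mean

# Hypothetical p-values per gene set from three hypothetical testing methods.
p_values = {
    "method1": {"setA": 0.001, "setB": 0.020, "setC": 0.300},
    "method2": {"setA": 0.010, "setB": 0.005, "setC": 0.200},
    "method3": {"setA": 0.002, "setB": 0.040, "setC": 0.030},
}

def ranks(pvals):
    """Rank gene sets within one method: 1 = smallest p-value (ties not handled)."""
    ordered = sorted(pvals, key=pvals.get)
    return {name: i + 1 for i, name in enumerate(ordered)}

# One rank table per method, then the average rank of each gene set.
per_method = [ranks(p) for p in p_values.values()]
gene_sets = sorted(per_method[0])
avg_rank = {s: mean(r[s] for r in per_method) for s in gene_sets}

# Assumed consensus rule: order gene sets by average rank, best first.
consensus = sorted(gene_sets, key=avg_rank.get)
print(consensus)
```

Averaging ranks rather than p-values keeps methods with differently calibrated p-values on an equal footing, which is one common motivation for rank-based ensembles.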


2016 ◽  
Vol 36 (suppl_1) ◽  
Author(s):  
Elisa C Maruko ◽  
Hao Xu ◽  
Sushma Kaul ◽  
Brian J Capaldo ◽  
Nathalie Pamir ◽  
...  

Atherosclerosis is a disease of both lipids and inflammatory immune cells. More specifically, elevated plasma levels of low-density lipoproteins (LDL) lead to migration of circulating monocytes into the artery wall. Lipid-loaded monocytes subsequently proliferate in the arterial walls, becoming macrophage foam cells, a hallmark of atherosclerotic lesions. A proposed mechanism for the protective effects of high-density lipoprotein (HDL) is apolipoprotein A-I (apo A-I) acting as a mediator of cholesterol efflux and subsequent foam cell regression. To better understand the biological changes stimulated by apo A-I treatment, differential expression analysis of microarray data was performed on spleen cells from apo A-I treated mice. LDL receptor null (LDLr -/- ) and LDL receptor and apo A-I null (LDLr -/- , apoA-I -/- ) mice were fed a western diet consisting of 0.2% cholesterol and 42% of calories as fat for 12 weeks. After 6 weeks of diet, a subset of mice of each genotype was subcutaneously injected with 200 micrograms of apo A-I 3 times a week for the remaining 6 weeks. The control group mice were subcutaneously injected with 200 micrograms of saline or BSA. Spleen cell RNA was isolated, purified, and analyzed for differential expression using Illumina BeadArray microarray technology. Individual gene expression analysis for LDLr -/- , apoA-I -/- apo A-I treated mice showed 281 significantly differentially expressed genes compared to BSA treated mice. LDLr -/- A-I treated mice had 1502. Of the significant genes, 189 intersected across both genotypes. LDLr -/- , apoA-I -/- A-I mice showed 73 up-regulated and 116 down-regulated genes. Similarly, LDLr -/- A-I mice had 71 up-regulated and 118 down-regulated. One-directional Gene Set Enrichment Analysis (GSEA) of LDLr -/- , apoA-I -/- A-I mice revealed 49 significant pathways, while a total of 63 were found for LDLr -/- . Of these pathways, 21 were up-regulated and 13 were down-regulated in both genotypes. Eight of the top 10 most significant up-regulated pathways in both genotypes were immune cell related. Their functions involve receptor, adhesion, and chemokine signaling. Overall, preliminary analysis suggests A-I treatment induces similar gene expression changes across different genotypes.


2015 ◽  
Vol 9s3 ◽  
pp. BBI.S29470 ◽  
Author(s):  
Mikhail G. Dozmorov ◽  
Nicolas Dominguez ◽  
Krista Bean ◽  
Susan R. Macwana ◽  
Virginia Roberts ◽  
...  

Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by complex interplay among immune cell types. SLE activity is experimentally assessed by several blood tests, including gene expression profiling of heterogeneous populations of cells in peripheral blood. To better understand the contribution of different cell types to SLE pathogenesis, we applied two methods for cell-type-specific differential expression analysis, csSAM and DSection, to identify cell-type-specific gene expression differences in heterogeneous gene expression measures obtained using RNA-seq technology. We identified B-cell-, monocyte-, and neutrophil-specific gene expression differences. Immunoglobulin-coding gene expression was altered in B-cells, while a ribosomal signature was prominent in monocytes. In contrast, genes differentially expressed in the heterogeneous mixture of cells did not show any functional enrichment. Our results identify antigen binding and structural constituents of ribosomes as functions altered by B-cell- and monocyte-specific gene expression differences, respectively. Finally, these results position both csSAM and DSection as viable techniques for cell-type-specific differential expression analysis, which may help uncover pathogenic, cell-type-specific processes in SLE.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Federico Marini ◽  
Annekathrin Ludt ◽  
Jan Linke ◽  
Konstantin Strauch

Abstract Background The interpretation of results from transcriptome profiling experiments via RNA sequencing (RNA-seq) can be a complex task, where the essential information is distributed among different tabular and list formats: normalized expression values, results from differential expression analysis, and results from functional enrichment analyses. A number of tools and databases are widely used to identify relevant functional patterns, yet their contextualization within the data and results at hand is often not straightforward, especially if these analytic components are not combined efficiently. Results We developed the GeneTonic software package, which serves as a comprehensive toolkit for streamlining the interpretation of functional enrichment analyses, by fully leveraging the information of expression values in a differential expression context. GeneTonic is implemented in R and Shiny, leveraging packages that enable HTML-based interactive visualizations for executing drilldown tasks seamlessly, viewing the data at a level of increased detail. GeneTonic is integrated with the core classes of existing Bioconductor workflows, and can accept the output of many widely used tools for pathway analysis, making this approach applicable to a wide range of use cases. Users can effectively navigate interlinked components (otherwise available as flat text or spreadsheet tables), bookmark features of interest during the exploration sessions, and obtain at the end a tailored HTML report, thus combining the benefits of both interactivity and reproducibility. Conclusion GeneTonic is distributed as an R package in the Bioconductor project (https://bioconductor.org/packages/GeneTonic/) under the MIT license. Offering both bird’s-eye views of the components of transcriptome data analysis and the detailed inspection of single genes, individual signatures, and their relationships, GeneTonic aims at simplifying the process of interpretation of complex and compelling RNA-seq datasets for many researchers with different expertise profiles.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1444
Author(s):  
Charity W. Law ◽  
Kathleen Zeglinski ◽  
Xueyi Dong ◽  
Monther Alhamdoosh ◽  
Gordon K. Smyth ◽  
...  

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by analysis pipelines that are well described. However, there are two crucial steps in the analysis process that can be a stumbling block for many: the set up of an appropriate model via design matrices and the set up of comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue for design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation and differences between models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
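
As a concrete picture of the two objects discussed above, the following minimal sketch builds a design matrix and a contrast for the simplest case, a single two-level explanatory variable. It is written in plain Python rather than the limma-style R code the article provides, and the sample labels and expression values are hypothetical.

```python
# Group label for each of six hypothetical samples.
groups = ["ctrl", "ctrl", "ctrl", "treat", "treat", "treat"]
levels = sorted(set(groups))  # column order: ["ctrl", "treat"]

# Means-model design matrix: one indicator column per group, no intercept.
design = [[1 if g == lev else 0 for lev in levels] for g in groups]

# Contrast "treat - ctrl", expressed as one weight per design column.
weights = {"ctrl": -1, "treat": 1}
contrast = [weights[lev] for lev in levels]

# Hypothetical log-expression values for one gene, one per sample.
y = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]

# With non-overlapping indicator columns, the least-squares coefficient
# for each column is simply the mean of that group's samples.
coef = []
for j in range(len(levels)):
    values = [y[i] for i in range(len(y)) if design[i][j] == 1]
    coef.append(sum(values) / len(values))

# The contrast combines the coefficients into the comparison of interest.
log_fold_change = sum(c * b for c, b in zip(contrast, coef))
print(design)
print(contrast, round(log_fold_change, 3))
```

The design matrix fixes what each coefficient means (here, a group mean), and the contrast states which combination of coefficients answers the question being asked. More complex designs change the columns, but this division of labour stays the same.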


2017 ◽  
Author(s):  
Charlotte Soneson ◽  
Mark D. Robinson

Abstract Background As single-cell RNA-seq (scRNA-seq) is becoming increasingly common, the amount of publicly available data grows rapidly, generating a useful resource for computational method development and extension of published results. Although processed data matrices are typically made available in public repositories, the procedure to obtain these varies widely between data sets, which may complicate reuse and cross-data set comparison. Moreover, while many statistical methods for performing differential expression analysis of scRNA-seq data are becoming available, their relative merits and the performance compared to methods developed for bulk RNA-seq data are not sufficiently well understood. Results We present conquer, a collection of consistently processed, analysis-ready public single-cell RNA-seq data sets. Each data set has count and transcripts per million (TPM) estimates for genes and transcripts, as well as quality control and exploratory analysis reports. We use a subset of the data sets available in conquer to perform an extensive evaluation of the performance and characteristics of statistical methods for differential gene expression analysis, evaluating a total of 30 statistical approaches on both experimental and simulated scRNA-seq data. Conclusions Considerable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Pre-filtering of lowly expressed genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.


BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Yingying Cao ◽  
Simo Kitanovski ◽  
Daniel Hoffmann

Abstract Background RNA-Seq, the high-throughput sequencing (HT-Seq) of mRNAs, has become an essential tool for characterizing gene expression differences between different cell types and conditions. Gene expression is regulated by several mechanisms, including epigenetic regulation by post-translational histone modifications, which can be assessed by ChIP-Seq (Chromatin Immuno-Precipitation Sequencing). As more and more biological samples are analyzed by the combination of ChIP-Seq and RNA-Seq, the integrated analysis of the corresponding data sets becomes, theoretically, a unique option to study gene regulation. However, technically such analyses are still in their infancy. Results Here we introduce intePareto, a computational tool for the integrative analysis of RNA-Seq and ChIP-Seq data. With intePareto we match RNA-Seq and ChIP-Seq data at the level of genes, perform differential expression analysis between biological conditions, and prioritize genes with consistent changes in RNA-Seq and ChIP-Seq data using Pareto optimization. Conclusion intePareto facilitates comprehensive understanding of high dimensional transcriptomic and epigenomic data. Its superiority to a naive differential gene expression analysis with RNA-Seq and to an available integrative approach is demonstrated by analyzing a public dataset.
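
Prioritizing genes by Pareto optimization can be pictured with a short sketch of non-dominated sorting. It is a generic Python illustration, not intePareto's implementation: the abstract does not define the objectives, so the two scores per gene (imagined here as measures of change in RNA-Seq and in ChIP-Seq, larger being better) and all gene names and values are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_fronts(scores):
    """Split {name: score tuple} into successive non-dominated fronts (larger is better)."""
    remaining = dict(scores)
    fronts = []
    while remaining:
        # A gene is in the current front if no other remaining gene dominates it.
        front = sorted(
            name
            for name, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != name)
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts

# Hypothetical (RNA-Seq score, ChIP-Seq score) pairs per gene.
scores = {
    "geneA": (3.0, 2.5),
    "geneB": (2.0, 3.0),
    "geneC": (1.0, 1.0),
    "geneD": (2.5, 0.5),
}

fronts = pareto_fronts(scores)
print(fronts)
```

Genes in the first front are those that no other gene beats on both scores at once. Later fronts give a natural priority order without having to weight one data type against the other.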

