pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components

10.1101/493551 ◽

2018 ◽

Cited By ~ 1

Author(s):

Federico Marini ◽

Harald Binder

Keyword(s):

Principal Components ◽

Principal Component ◽

R Package ◽

Expression Data ◽

Rna Seq ◽

Functional Interpretation ◽

Software Packages ◽

Bioconductor Project ◽

Interactive Data ◽

User Friendly

Background: Principal component analysis (PCA) is frequently used in genomics applications for quality assessment and exploratory analysis in high-dimensional data, such as RNA sequencing (RNA-seq) gene expression assays. Despite the availability of many software packages developed for this purpose, an interactive and comprehensive interface for performing these operations is lacking. Results: We developed the pcaExplorer software package to enhance commonly performed analysis steps with an interactive and user-friendly application, which provides state saving as well as the automated creation of reproducible reports. pcaExplorer is implemented in R using the Shiny framework and exploits data structures from the open-source Bioconductor project. Users can easily generate a wide variety of publication-ready graphs, while assessing the expression data in the different modules available, including a general overview, dimension reduction on samples and genes, as well as functional interpretation of the principal components. Conclusion: pcaExplorer is distributed as an R package in the Bioconductor project (http://bioconductor.org/packages/pcaExplorer/), and is designed to assist a broad range of researchers in the critical step of interactive data exploration.
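
As a rough illustration of the kind of sample-level PCA the abstract refers to, the sketch below computes the first principal component of a tiny expression matrix using power iteration in plain Python. It is not pcaExplorer's code or API: the function names, the toy expression values, and the choice of power iteration are all assumptions made for this example.

```python
import math

def center_columns(matrix):
    """Subtract each gene's (column's) mean across samples."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def first_principal_component(matrix, iterations=200):
    """Return (loadings, scores, explained_fraction) for PC1 via power iteration."""
    centered = center_columns(matrix)
    n, p = len(centered), len(centered[0])
    # Gene-by-gene sample covariance matrix (p x p).
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(p)] for i in range(p)]
    # Non-uniform start vector, so we are unlikely to start orthogonal to PC1.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance along PC1 (v has unit length).
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    # Project each centered sample onto the PC1 loadings.
    scores = [sum(x * l for x, l in zip(row, v)) for row in centered]
    return v, scores, eigenvalue / total_variance

# Hypothetical log-expression matrix: rows are samples, columns are genes.
samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
expression = [
    [5.0, 3.0, 8.0],
    [5.2, 3.1, 8.1],
    [9.0, 3.0, 4.0],
    [9.1, 2.9, 4.2],
]
loadings, scores, explained = first_principal_component(expression)
for name, score in zip(samples, scores):
    print(f"{name}: PC1 = {score:.2f}")
print(f"PC1 explains {explained:.1%} of the variance")
```

In this toy data the two control samples and the two treated samples fall on opposite sides of PC1, which is the pattern a quality-assessment PCA plot is usually inspected for. Real analyses would rely on an established linear algebra routine rather than hand-written power iteration.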

CLUSTERING GENE EXPRESSION DATA WITH KERNEL PRINCIPAL COMPONENTS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720005001168 ◽

2005 ◽

Vol 03 (02) ◽

pp. 303-316 ◽

Cited By ~ 6

Author(s):

ZHENQIU LIU ◽

DECHANG CHEN ◽

HALIMA BENSMAIL ◽

YING XU

Keyword(s):

Gene Expression ◽

Principal Component Analysis ◽

Gene Expression Data ◽

Microarray Data ◽

Principal Components ◽

Data Clustering ◽

Principal Component ◽

Kernel Principal Component Analysis ◽

Expression Data ◽

Fuzzy C Means

Kernel principal component analysis (KPCA) has been applied to data clustering and graphic cut in the last couple of years. This paper discusses the application of KPCA to microarray data clustering. A new algorithm based on KPCA and fuzzy C-means is proposed. Experiments with microarray data show that the proposed algorithm is in general superior to traditional algorithms.
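
The abstract does not spell out the proposed algorithm, so the following is only a sketch of the standard fuzzy C-means step that such a pipeline would build on. It assumes the input points are already low-dimensional coordinates (for example, the output of a kernel PCA projection, which is not implemented here); the function name, the fuzzifier `m = 2`, and the toy points are illustrative choices.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Standard fuzzy C-means: alternate membership and center updates.

    points  : list of coordinate tuples
    centers : initial cluster centers (list of tuples)
    m       : fuzzifier, must be > 1
    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, computed in the final iteration.
    """
    centers = [tuple(c) for c in centers]
    dims = len(points[0])
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in points:
            dists = [math.dist(x, c) for c in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                row = [v / sum(inv) for v in inv]
            memberships.append(row)
        # Each center becomes the membership-weighted mean of all points.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * x[d] for w, x in zip(weights, points)) / wsum
                for d in range(dims)))
        centers = new_centers
    return centers, memberships

# Hypothetical 2-D coordinates forming two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
final_centers, memberships = fuzzy_c_means(points, [(0.0, 0.0), (5.0, 5.0)])
for point, row in zip(points, memberships):
    print(point, [round(u, 3) for u in row])
```

Unlike hard clustering, every point receives a graded membership in each cluster, and each row of memberships sums to one.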

Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components

Cancer Informatics ◽

10.1177/1176935118771082 ◽

2018 ◽

Vol 17 ◽

pp. 117693511877108 ◽

Cited By ~ 4

Author(s):

Min Wang ◽

Steven M Kornblau ◽

Kevin R Coombes

Keyword(s):

Principal Components ◽

Myeloid Leukemia ◽

Principal Component ◽

R Package ◽

Biological Data ◽

Data Sets ◽

Proteomics Data ◽

Data Set ◽

Apoptosis Pathway ◽

Biological Interpretation

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.
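
To make the "number of significant PCs" problem concrete, here is a minimal sketch of the broken-stick rule, a classical heuristic from the wider PCA literature. It is not the automated graphical Bayesian procedure described in the abstract, and the function name and eigenvalues are made up for illustration. The rule keeps leading components as long as their share of variance exceeds the expected length of the corresponding piece of a stick broken at random into p pieces, b_k = (1/p) * sum_{i=k..p} 1/i.

```python
def broken_stick_dimension(eigenvalues):
    """Count leading PCs whose variance share beats the broken-stick expectation."""
    values = sorted(eigenvalues, reverse=True)
    p = len(values)
    total = sum(values)
    kept = 0
    for k in range(1, p + 1):
        # Expected share of the k-th longest piece of a randomly broken stick.
        expected = sum(1.0 / i for i in range(k, p + 1)) / p
        if values[k - 1] / total > expected:
            kept += 1
        else:
            break  # stop at the first component that fails the test
    return kept

# Hypothetical eigenvalues of a covariance matrix with six variables.
eigenvalues = [6.0, 2.5, 0.8, 0.4, 0.2, 0.1]
n_components = broken_stick_dimension(eigenvalues)
print("components retained:", n_components)
```

Heuristics like this are fast but crude, which is part of the motivation the abstract gives for comparing several methods on both accuracy and speed.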

MicroScope: ChIP-seq and RNA-seq software analysis suite for gene expression heatmaps

10.1101/034694 ◽

2015 ◽

Cited By ~ 1

Author(s):

Bohdan B. Khomtchouk ◽

James R. Hennessy ◽

Claes Wahlestedt

Keyword(s):

Web Application ◽

Differential Expression Analysis ◽

Dynamic Network ◽

Principal Component ◽

Rna Seq ◽

Software Suite ◽

Software Analysis ◽

Link Type ◽

R Shiny ◽

User Friendly

We propose a user-friendly ChIP-seq and RNA-seq software suite for the interactive visualization and analysis of genomic data, including integrated features to support differential expression analysis, interactive heatmap production, principal component analysis, gene ontology analysis, and dynamic network analysis. MicroScope is hosted online as an R Shiny web application based on the D3 JavaScript library: http://microscopebioinformatics.org/. The methods are implemented in R, and are available as part of the MicroScope project at: https://github.com/Bohdan-Khomtchouk/Microscope.

scCancer: a package for automated processing of single cell RNA-seq data in cancer

10.1101/800490 ◽

2019 ◽

Author(s):

Wenbo Guo ◽

Dongfang Wang ◽

Shicheng Wang ◽

Yiran Shan ◽

Jin Gu

Keyword(s):

Single Cell ◽

Learning Algorithm ◽

R Package ◽

Rna Seq ◽

Cell Level ◽

Quality Control Metrics ◽

Automated Processing ◽

Cellular Phenotypes ◽

User Friendly ◽

Processing Steps

Summary: Molecular heterogeneities bring great challenges for cancer diagnosis and treatment. Recent advances in single cell RNA-sequencing (scRNA-seq) technology make it possible to study cancer transcriptomic heterogeneities at the single cell level. Here, we develop an R package named scCancer, which focuses on processing and analyzing scRNA-seq data for cancer research. Beyond basic data processing steps, this package takes several special considerations for cancer-specific features. Firstly, the package introduces comprehensive quality control metrics. Secondly, it uses a data-driven machine learning algorithm to accurately identify major cancer microenvironment cell populations. Thirdly, it estimates a malignancy score to classify malignant (cancerous) and non-malignant cells. Then, it analyzes intra-tumor heterogeneities by key cellular phenotypes (such as cell cycle and stemness) and gene signatures. Finally, a user-friendly graphic report is generated for all the analyses. Availability: http://lifeome.net/software/sccancer/ Contact: [email protected]

ideal: an R/Bioconductor package for Interactive Differential Expression Analysis

10.1101/2020.01.10.901652 ◽

2020 ◽

Cited By ~ 4

Author(s):

Federico Marini ◽

Jan Linke ◽

Harald Binder

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Web Application ◽

Differential Expression Analysis ◽

Transcriptome Profiling ◽

Data Interpretation ◽

R Package ◽

Rna Seq ◽

Fully Integrated ◽

Bioconductor Project

Background: RNA sequencing (RNA-seq) is an increasingly popular tool for transcriptome profiling. A key point to make the best use of the available data is to provide software tools that are easy to use but still provide flexibility and transparency in the adopted methods. Despite the availability of many packages focused on detecting differential expression, a method to streamline this type of bioinformatics analysis in a comprehensive, accessible, and reproducible way is lacking. Results: We developed the ideal software package, which serves as a web application for interactive and reproducible RNA-seq analysis, while producing a wealth of visualizations to facilitate data interpretation. ideal is implemented in R using the Shiny framework, and is fully integrated with the existing core structures of the Bioconductor project. Users can perform the essential steps of the differential expression analysis workflow in an assisted way, and generate a broad spectrum of publication-ready outputs, including diagnostic and summary visualizations in each module, all the way down to functional analysis. ideal also offers the possibility to seamlessly generate a full HTML report for storing and sharing results together with code for reproducibility. Conclusion: ideal is distributed as an R package in the Bioconductor project (http://bioconductor.org/packages/ideal/), and provides a solution for performing interactive and reproducible analyses of summarized RNA-seq expression data, empowering researchers with many different profiles (life scientists, clinicians, but also experienced bioinformaticians) to make the ideal use of the data at hand.

NewWave: a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA-seq data

10.1101/2021.08.02.453487 ◽

2021 ◽

Author(s):

Federico Agostinis ◽

Chiara Romualdi ◽

Gabriele Sales ◽

Davide Risso

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

R Package ◽

Batch Effect ◽

Supplementary Information ◽

Bioconductor Package ◽

Rna Seq ◽

Sequencing Data ◽

Bioconductor Project ◽

Single Cell Rna Sequencing

Summary: We present NewWave, a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA sequencing data. To achieve scalability, NewWave uses mini-batch optimization and can work with out-of-memory data, enabling users to analyze datasets with millions of cells. Availability and implementation: NewWave is implemented as an open-source R package available through the Bioconductor project at https://bioconductor.org/packages/NewWave/ Supplementary information: Supplementary data are available at Bioinformatics online.

From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline

F1000Research ◽

10.12688/f1000research.8987.1 ◽

2016 ◽

Vol 5 ◽

pp. 1438 ◽

Cited By ~ 9

Author(s):

Yunshun Chen ◽

Aaron T. L. Lun ◽

Gordon K. Smyth

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Mouse Mammary Gland ◽

Complete Analysis ◽

Rna Seq ◽

R Software ◽

Software Packages ◽

Bioconductor Project ◽

Computational Workflow

In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package, and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.
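
One small building block of the data exploration step in workflows of this kind is scaling raw read counts by library size. The sketch below computes plain counts per million (CPM) in Python. It is a simplification written for illustration, not the edgeR implementation: it applies no normalization factors and no prior counts, and the function name, gene names, and counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw counts by each sample's library size.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Returns a dict of the same shape holding CPM values.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total number of counted reads in each sample.
    library_sizes = [sum(values[j] for values in counts.values())
                     for j in range(n_samples)]
    return {
        gene: [1e6 * c / size for c, size in zip(values, library_sizes)]
        for gene, values in counts.items()
    }

# Hypothetical counts for two genes in two samples of different depth.
raw_counts = {"geneA": [10, 40], "geneB": [90, 160]}
cpm = counts_per_million(raw_counts)
for gene, values in cpm.items():
    print(gene, values)
```

Dividing by library size makes samples sequenced to different depths comparable before any exploratory plot or filtering step.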

BingleSeq: A user-friendly R package for Bulk and Single-cell RNA-Seq Data Analysis

10.1101/2020.06.16.148239 ◽

2020 ◽

Author(s):

Daniel Dimitrov ◽

Quan Gu

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Gene Annotation ◽

Differential Expression Analysis ◽

Transcriptome Profiling ◽

R Package ◽

Rna Seq ◽

The Individual ◽

User Friendly

RNA sequencing is a high-throughput sequencing technique considered an indispensable research tool used in a broad range of transcriptome analysis studies. The most common application of RNA Sequencing is Differential Expression analysis and it is used to determine genetic loci with distinct expression across different conditions. On the other hand, an emerging field called single-cell RNA sequencing is used for transcriptome profiling at the individual cell level. The standard protocols for both these types of analyses include the processing of sequencing libraries and result in the generation of count matrices. An obstacle to these analyses and the acquisition of meaningful results is that both require programming expertise. BingleSeq was developed as an intuitive application that provides a user-friendly solution for the analysis of count matrices produced by both Bulk and Single-cell RNA-Seq experiments. This was achieved by building an interactive dashboard-like user interface and incorporating three state-of-the-art software packages for each type of the aforementioned analyses, alongside additional features such as key visualisation techniques, functional gene annotation analysis and rank-based consensus for differential gene analysis results, among others. As a result, BingleSeq puts the best and most widely used packages and tools for RNA-Seq analyses at the fingertips of biologists with no programming experience.
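
The abstract mentions a rank-based consensus across differential expression results but does not describe the scheme. The sketch below shows one simple possibility, averaging each gene's rank across tools, purely as an assumption; it should not be read as BingleSeq's actual method. The function name, the tool labels, and the gene lists are hypothetical.

```python
def rank_consensus(rankings):
    """Order genes by their mean rank across several tools.

    rankings: dict mapping tool name -> list of genes ordered from most
    to least significant. Only genes reported by every tool are kept.
    """
    shared = set.intersection(*(set(genes) for genes in rankings.values()))
    mean_rank = {
        gene: sum(genes.index(gene) + 1 for genes in rankings.values())
              / len(rankings)
        for gene in shared
    }
    # Ties on mean rank are broken alphabetically to keep the output stable.
    return sorted(shared, key=lambda gene: (mean_rank[gene], gene))

# Hypothetical ordered results from three differential expression tools.
tool_rankings = {
    "tool_A": ["g1", "g2", "g3"],
    "tool_B": ["g2", "g1", "g3"],
    "tool_C": ["g1", "g3", "g2"],
}
consensus = rank_consensus(tool_rankings)
print(consensus)
```

Working on ranks rather than p-values sidesteps the fact that different tools report statistics on different scales.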

Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components

10.1101/237883 ◽

2017 ◽

Cited By ~ 2

Author(s):

Min Wang ◽

Steven M. Kornblau ◽

Kevin R. Coombes

Keyword(s):

Principal Components ◽

Myeloid Leukemia ◽

Principal Component ◽

R Package ◽

Biological Data ◽

Data Sets ◽

Proteomics Data ◽

Data Set ◽

Apoptosis Pathway ◽

Biological Interpretation

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises two challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure performs best when considering both accuracy and speed. We applied the method to a proteomics data set from acute myeloid leukemia patients. Proteins in the apoptosis pathway could be explained using six PCs. By clustering the proteins in PC space, we were able to replace the PCs by six “biological components”, three of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.

Peer Review #1 of "BingleSeq: a user-friendly R package for bulk and single-cell RNA-Seq data analysis (v0.1)"

10.7287/peerj.10469v0.1/reviews/1 ◽

2020 ◽

Keyword(s):

Data Analysis ◽

Single Cell ◽

Peer Review ◽

R Package ◽

Rna Seq ◽

User Friendly
