projectR: An R/Bioconductor package for transfer learning via PCA, NMF, correlation, and clustering

Mapping Intimacies ◽

10.1101/726547 ◽

2019 ◽

Cited By ~ 2

Author(s):

Gaurav Sharma ◽

Carlo Colantuoni ◽

Loyal A Goff ◽

Elana J Fertig ◽

Genevieve Stein-O’Brien

Keyword(s):

Single Cell ◽

Transfer Learning ◽

Single Cell Analysis ◽

Ground Truth ◽

Biological Data ◽

Bioconductor Package ◽

Reduction Techniques ◽

Biological Phenomena ◽

Feature Discovery ◽

Dimension Reduction Techniques

Abstract

Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important for large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically-driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset.

Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation, and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis.

Availability: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR.

Contact: [email protected]; [email protected]


projectR: an R/Bioconductor package for transfer learning via PCA, NMF, correlation and clustering

Bioinformatics ◽

10.1093/bioinformatics/btaa183 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3592-3593 ◽

Cited By ~ 2

Author(s):

Gaurav Sharma ◽

Carlo Colantuoni ◽

Loyal A Goff ◽

Elana J Fertig ◽

Genevieve Stein-O’Brien

Keyword(s):

Single Cell ◽

Transfer Learning ◽

Single Cell Analysis ◽

Ground Truth ◽

Biological Data ◽

Supplementary Information ◽

Bioconductor Package ◽

Reduction Techniques ◽

Biological Phenomena ◽

Feature Discovery

Abstract

Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset.

Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis.

Availability and implementation: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR.

Contact: [email protected] or [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
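
To make the idea of relating source-derived features to a target dataset concrete, the following is a minimal Python sketch, not projectR's actual API. It assumes the source features are PCA-style loadings (one weight per gene per feature) with orthonormal columns, so that projection reduces to a weighted sum over the genes shared by both datasets. The function name `project`, the dictionary layout, and the toy gene names are all hypothetical.

```python
def project(loadings, target):
    """Project target samples onto features learned from a source dataset.

    loadings: {gene: [weight for each source feature]}  (assumed PCA-style)
    target:   {gene: [expression value for each target sample]}
    Returns a features x samples list of lists of projected scores.
    """
    # Only genes measured in both datasets can contribute to the projection.
    shared = sorted(set(loadings) & set(target))
    if not shared:
        raise ValueError("source and target share no genes")

    n_features = len(loadings[shared[0]])
    n_samples = len(target[shared[0]])
    scores = [[0.0] * n_samples for _ in range(n_features)]

    # scores[f][s] = sum over shared genes of weight[g][f] * expression[g][s]
    for gene in shared:
        for f in range(n_features):
            weight = loadings[gene][f]
            for s in range(n_samples):
                scores[f][s] += weight * target[gene][s]
    return scores


# Toy data: two source features, two target samples.
source_loadings = {"geneA": [1.0, 0.0], "geneB": [0.0, 1.0]}
target_expression = {
    "geneA": [2.0, 0.0],
    "geneB": [0.0, 3.0],
    "geneC": [5.0, 5.0],  # absent from the source, so it is ignored
}
projected = project(source_loadings, target_expression)
```

The projected scores can then be compared against sample annotations in the target dataset, which is the validation step the abstract describes. Loadings from non-orthogonal factorizations such as NMF would generally call for a regression-based projection instead of this plain weighted sum.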


CellBench: R/Bioconductor software for comparing single-cell RNA-seq analysis methods

Bioinformatics ◽

10.1093/bioinformatics/btz889 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2288-2290 ◽

Cited By ~ 3

Author(s):

Shian Su ◽

Luyi Tian ◽

Xueyi Dong ◽

Peter F Hickey ◽

Saskia Freytag ◽

...

Keyword(s):

Single Cell ◽

Ad Hoc ◽

Performance Metrics ◽

Single Cell Analysis ◽

Ground Truth ◽

Bioinformatic Analysis ◽

Rna Seq ◽

Effective Manner ◽

Cell Gene Expression ◽

The Many

Abstract

Motivation: Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required. Current benchmarks of single-cell methods are mostly implemented with ad-hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking.

Results: The CellBench R software facilitates method comparisons in either a task-centric or combinatorial way to allow pipelines of methods to be evaluated in an effective manner. CellBench automatically runs combinations of methods, provides facilities for measuring running time and delivers output in tabular form which is highly compatible with tidyverse R packages for summary and visualization. Our software has enabled comprehensive benchmarking of single-cell RNA-seq normalization, imputation, clustering, trajectory analysis and data integration methods using various performance metrics obtained from data with available ground truth. CellBench is also amenable to benchmarking other bioinformatics analysis tasks.

Availability and implementation: Available from https://bioconductor.org/packages/CellBench.
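
The combinatorial benchmarking idea can be sketched in a few lines of Python. This is an illustration of the general pattern only, not CellBench's R interface: the function `run_combinations`, the stage dictionaries, and the toy methods are all hypothetical. Each pipeline stage offers several named methods, every combination is run in turn, and the result and running time are collected as rows of a table.

```python
import itertools
import time


def run_combinations(data, stages):
    """Run every combination of one method per stage and time each pipeline.

    stages: list of dicts, one per pipeline stage, mapping name -> function.
    Returns a list of row dicts (a simple stand-in for tabular output).
    """
    rows = []
    # Cartesian product over the (name, function) pairs of each stage.
    for combo in itertools.product(*(stage.items() for stage in stages)):
        start = time.perf_counter()
        result = data
        for _name, method in combo:
            result = method(result)  # feed each stage's output to the next
        elapsed = time.perf_counter() - start
        rows.append({
            "methods": tuple(name for name, _method in combo),
            "result": result,
            "seconds": elapsed,
        })
    return rows


# Toy stages standing in for normalization and summary steps.
normalize = {
    "scale": lambda xs: [x / max(xs) for x in xs],
    "center": lambda xs: [x - sum(xs) / len(xs) for x in xs],
}
summarize = {
    "mean": lambda xs: sum(xs) / len(xs),
    "max": max,
}
table = run_combinations([1.0, 2.0, 3.0], [normalize, summarize])
```

With two methods at each of two stages, the table has four rows, one per pipeline. Adding a method to any stage extends the benchmark without hand-writing each new combination.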


Presto scales Wilcoxon and auROC analyses to millions of observations

10.1101/653253 ◽

2019 ◽

Cited By ~ 6

Author(s):

Ilya Korsunsky ◽

Aparna Nathan ◽

Nghia Millard ◽

Soumya Raychaudhuri

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Sparse Matrices ◽

R Package ◽

Biological Data ◽

Supplementary Information ◽

Wilcoxon Rank Sum Test ◽

Biological Data Analysis ◽

Simple Interface ◽

Operator Curve

Abstract

Summary: The related Wilcoxon rank sum test and area under the receiver operator curve are ubiquitous in high dimensional biological data analysis. Current implementations do not scale readily to the increasingly large datasets generated by novel high-throughput technologies, such as single cell RNAseq. We introduce a simple and scalable implementation of both analyses, available through the R package Presto. Presto scales to big datasets, with functions optimized for both dense and sparse matrices. On a sparse dataset of 1 million observations, 10 groups, and 1,000 features, Presto performed both rank-sum and auROC analyses in only 17 seconds, compared to 6.4 hours with base R functions. Presto also includes functions to seamlessly integrate with the Seurat single cell analysis pipeline and the Bioconductor SingleCellExperiment class. Presto enables the use of robust classical analyses on big data with a simple interface and optimized implementation.

Availability and Implementation: Presto is available as an R package at https://github.com/immunogenomics/[email protected]

Supplementary Information: Vignettes are available with the Presto package.
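
The two analyses are described as related because the area under the ROC curve equals the Mann-Whitney U statistic divided by the number of cross-group pairs, AUC = U / (n1 * n2). The following pure-Python sketch shows that relationship on a single feature. It is a plain reference computation, not Presto's optimized implementation, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def rank_sum_auc(group, rest):
    """Compute the Mann-Whitney U statistic for `group` and the matching auROC."""
    n1, n2 = len(group), len(rest)
    ranks = average_ranks(list(group) + list(rest))
    rank_sum = sum(ranks[:n1])          # Wilcoxon rank sum of the first group
    u = rank_sum - n1 * (n1 + 1) / 2    # U statistic for the first group
    return u, u / (n1 * n2)             # auROC = U / (n1 * n2)


u_stat, auc = rank_sum_auc([3, 4, 5], [1, 2, 4])
```

Here the pooled ranks of the first group are 3, 4.5 and 6, so U is 7.5 and the auROC is 7.5 / 9. Because both quantities come from one ranking pass, an implementation can report them together at little extra cost.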


Single-cell genomics to understand disease pathogenesis

Journal of Human Genetics ◽

10.1038/s10038-020-00844-3 ◽

2020 ◽

Vol 66 (1) ◽

pp. 75-84

Author(s):

Seitaro Nomura

Keyword(s):

Single Cell ◽

Molecular Mechanisms ◽

Single Cell Analysis ◽

Cell Analysis ◽

Disease Pathogenesis ◽

Single Cell Genomics ◽

Molecular Systems ◽

Biological Phenomena ◽

Molecular Behavior ◽

Unbiased Manner

Abstract

Cells are minimal functional units in biological phenomena, and therefore single-cell analysis is needed to understand the molecular behavior leading to cellular function in organisms. In addition, omics analysis technology can be used to identify essential molecular mechanisms in an unbiased manner. Recently, single-cell genomics has unveiled hidden molecular systems leading to disease pathogenesis in patients. In this review, I summarize the recent advances in single-cell genomics for the understanding of disease pathogenesis and discuss future perspectives.


CancerInSilico: An R/Bioconductor package for combining mathematical and statistical modeling to simulate time course bulk and single cell gene expression data in cancer

10.1101/328807 ◽

2018 ◽

Author(s):

Thomas D Sherman ◽

Luciane T Kagohara ◽

Raymon Cao ◽

Raymond Cheng ◽

Matthew Satriano ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Time Course ◽

Ground Truth ◽

Real Data ◽

Cellular Systems ◽

Expression Data ◽

Bioconductor Package ◽

Data Set

Abstract

Bioinformatics techniques to analyze time course bulk and single cell omics data are advancing. The absence of a known ground truth of the dynamics of molecular changes challenges benchmarking their performance on real data. Realistic simulated time-course datasets are essential to assess the performance of time course bioinformatics algorithms. We develop an R/Bioconductor package, CancerInSilico, to simulate bulk and single cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. This package contains a general R infrastructure for running cell-based models and simulating gene expression data based on the model states. We show how to use this package to simulate a gene expression data set and consequently benchmark analysis methods on this data set with a known ground truth. The package is freely available via Bioconductor: http://bioconductor.org/packages/CancerInSilico/
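
The general pattern of simulating expression from a model with known ground truth can be sketched as follows. This is a toy illustration in Python and makes no claim about CancerInSilico's models or interface: the discrete logistic growth model, the single "proliferation gene" whose expression tracks the growth fraction, the Gaussian noise, and every parameter name are assumptions chosen for brevity.

```python
import random


def simulate(n_steps, capacity=1000.0, rate=0.5, n0=10.0, noise_sd=0.1, seed=0):
    """Simulate a time course from a toy logistic growth model.

    Returns (truth, expression):
      truth      - the model's growth fraction at each time step (ground truth)
      expression - noisy expression of a hypothetical proliferation gene
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    cells = n0
    truth, expression = [], []
    for _ in range(n_steps):
        growth_fraction = 1.0 - cells / capacity   # model state
        truth.append(growth_fraction)
        # Observed expression = model state plus measurement noise.
        expression.append(growth_fraction + rng.gauss(0.0, noise_sd))
        cells += rate * cells * growth_fraction    # discrete logistic update
    return truth, expression


def mean_abs_error(estimate, truth):
    """Score an analysis method's estimate against the known ground truth."""
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)


truth, expression = simulate(20)
raw_error = mean_abs_error(expression, truth)
```

Because the growth fraction is known at every step, any method that tries to recover it from the noisy expression values can be scored directly, which is what benchmarking against a known ground truth amounts to.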
