scholarly journals projectR: An R/Bioconductor package for transfer learning via PCA, NMF, correlation, and clustering

2019 ◽  
Author(s):  
Gaurav Sharma ◽  
Carlo Colantuoni ◽  
Loyal A Goff ◽  
Elana J Fertig ◽  
Genevieve Stein-O’Brien

AbstractMotivationDimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically import to large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically-driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset.ResultsWe developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation, and factorization methods. We then demonstrate the utility TL for integrated data analysis with an example for spatial single-cell analysis.AvailabilityprojectR is available on Bioconductor and at https://github.com/genesofeve/[email protected]; [email protected]

2020 ◽  
Vol 36 (11) ◽  
pp. 3592-3593 ◽  
Author(s):  
Gaurav Sharma ◽  
Carlo Colantuoni ◽  
Loyal A Goff ◽  
Elana J Fertig ◽  
Genevieve Stein-O’Brien

Abstract Motivation Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically importent in analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility TL for integrated data analysis with an example for spatial single-cell analysis. Availability and implementation projectR is available on Bioconductor and at https://github.com/genesofeve/projectR. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2288-2290 ◽  
Author(s):  
Shian Su ◽  
Luyi Tian ◽  
Xueyi Dong ◽  
Peter F Hickey ◽  
Saskia Freytag ◽  
...  

Abstract Motivation Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required. Current benchmarks of single-cell methods are mostly implemented with ad-hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking. Results The CellBench R software facilitates method comparisons in either a task-centric or combinatorial way to allow pipelines of methods to be evaluated in an effective manner. CellBench automatically runs combinations of methods, provides facilities for measuring running time and delivers output in tabular form which is highly compatible with tidyverse R packages for summary and visualization. Our software has enabled comprehensive benchmarking of single-cell RNA-seq normalization, imputation, clustering, trajectory analysis and data integration methods using various performance metrics obtained from data with available ground truth. CellBench is also amenable to benchmarking other bioinformatics analysis tasks. Availability and implementation Available from https://bioconductor.org/packages/CellBench.


2019 ◽  
Author(s):  
Ilya Korsunsky ◽  
Aparna Nathan ◽  
Nghia Millard ◽  
Soumya Raychaudhuri

AbstractSummaryThe related Wilcoxon rank sum test and area under the receiver operator curve are ubiquitous in high dimensional biological data analysis. Current implementations do not scale readily to the increasingly large datasets generated by novel high-throughput technologies, such as single cell RNAseq. We introduce a simple and scalable implementation of both analyses, available through the R package Presto. Presto scales to big datasets, with functions optimized for both dense and sparse matrices. On a sparse dataset of 1 million observations, 10 groups, and 1,000 features, Presto performed both rank-sum and auROC analyses in only 17 seconds, compared to 6.4 hours with base R functions. Presto also includes functions to seamlessly integrate with the Seurat single cell analysis pipeline and the Bioconductor SingleCellExperiment class. Presto enables the use of robust classical analyses on big data with a simple interface and optimized implementation.Availability and ImplementationPresto is available as an R package at https://github.com/immunogenomics/[email protected] InformationVignettes are available with the Presto package.


2020 ◽  
Vol 66 (1) ◽  
pp. 75-84
Author(s):  
Seitaro Nomura

AbstractCells are minimal functional units in biological phenomena, and therefore single-cell analysis is needed to understand the molecular behavior leading to cellular function in organisms. In addition, omics analysis technology can be used to identify essential molecular mechanisms in an unbiased manner. Recently, single-cell genomics has unveiled hidden molecular systems leading to disease pathogenesis in patients. In this review, I summarize the recent advances in single-cell genomics for the understanding of disease pathogenesis and discuss future perspectives.


2018 ◽  
Author(s):  
Thomas D Sherman ◽  
Luciane T Kagohara ◽  
Raymon Cao ◽  
Raymond Cheng ◽  
Matthew Satriano ◽  
...  

AbstractBioinformatics techniques to analyze time course bulk and single cell omics data are advancing. The absence of a known ground truth of the dynamics of molecular changes challenges benchmarking their performance on real data. Realistic simulated time-course datasets are essential to assess the performance of time course bioinformatics algorithms. We develop an R/Bioconductor package, CancerInSilico, to simulate bulk and single cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. This package contains a general R infrastructure for running cell-based models and simulating gene expression data based on the model states. We show how to use this package to simulate a gene expression data set and consequently benchmark analysis methods on this data set with a known ground truth. The package is freely available via Bioconductor: http://bioconductor.org/packages/CancerInSilico/


Author(s):  
Alexander Lind ◽  
Falastin Salami ◽  
Anne‐Marie Landtblom ◽  
Lars Palm ◽  
Åke Lernmark ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document