destiny – diffusion maps for large-scale single-cell data in R

10.1101/023309 ◽

2015 ◽

Cited By ~ 6

Author(s):

Philipp Angerer ◽

Laleh Haghverdi ◽

Maren Büttner ◽

Fabian J. Theis ◽

Carsten Marr ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Cellular Reprogramming ◽

Noise Model ◽

Diffusion Maps ◽

Time Resolved ◽

Describing Functions ◽

Link Type ◽

Cell Expression ◽

Cell Data

Abstract:

Summary: Diffusion maps are a spectral method for non-linear dimension reduction and have recently been adapted for the visualization of single-cell expression data. Here we present destiny, an efficient R implementation of the diffusion map algorithm. Our package includes a single-cell-specific noise model allowing for missing and censored values. In contrast to previous implementations, we further present an efficient nearest-neighbour approximation that allows for the processing of hundreds of thousands of cells, and functionality for projecting new data onto existing diffusion maps. As an example, we apply destiny to a recent time-resolved mass cytometry dataset of cellular reprogramming.

Availability and implementation: destiny is an open-source R/Bioconductor package (http://bioconductor.org/packages/destiny), also available at https://www.helmholtz-muenchen.de/icb/destiny. A detailed vignette describing functions and workflows is provided with the package.
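
The core idea behind a diffusion map, as summarised in this abstract, can be illustrated with a minimal Python sketch. It is not the destiny implementation: it assumes a plain Gaussian kernel with a fixed width `sigma`, uses a dense matrix rather than a nearest-neighbour approximation, omits the noise model for missing and censored values, and approximates only the first non-trivial eigenvector by power iteration. The function names `transition_matrix` and `first_diffusion_component` are hypothetical.

```python
import math


def transition_matrix(points, sigma=1.0):
    """Row-normalised Gaussian kernel: P[i][j] = K[i][j] / sum_j K[i][j]."""
    n = len(points)
    kernel = [
        [math.exp(-math.dist(points[i], points[j]) ** 2 / (2.0 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]
    degrees = [sum(row) for row in kernel]
    matrix = [[kernel[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    return matrix, degrees


def first_diffusion_component(points, sigma=1.0, iterations=200):
    """Approximate the first non-trivial right eigenvector of P by power iteration."""
    matrix, degrees = transition_matrix(points, sigma)
    n = len(points)
    total = sum(degrees)
    stationary = [d / total for d in degrees]  # stationary distribution of P

    vector = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        # Project out the trivial constant eigenvector (eigenvalue 1).
        mean = sum(p * v for p, v in zip(stationary, vector))
        vector = [v - mean for v in vector]
        # Apply one diffusion step, then renormalise.
        vector = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in vector))
        vector = [v / norm for v in vector]
    return vector


# Two well-separated groups of one-dimensional "cells" (made-up toy data).
cells = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
component = first_diffusion_component(cells)
print([round(value, 3) for value in component])
```

On this toy input the component takes one sign on the first group and the opposite sign on the second, which is the sense in which the leading diffusion coordinates capture the large-scale structure of the data.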

destiny: diffusion maps for large-scale single-cell data in R

Bioinformatics ◽

10.1093/bioinformatics/btv715 ◽

2015 ◽

Vol 32 (8) ◽

pp. 1241-1243 ◽

Cited By ~ 225

Author(s):

Philipp Angerer ◽

Laleh Haghverdi ◽

Maren Büttner ◽

Fabian J. Theis ◽

Carsten Marr ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Diffusion Maps ◽

Cell Data

SPRING: a kinetic interface for visualizing high dimensional single-cell expression data

10.1101/090332 ◽

2016 ◽

Cited By ~ 10

Author(s):

Caleb Weinreb ◽

Samuel Wolock ◽

Allon Klein

Keyword(s):

Gene Expression ◽

Single Cell ◽

Nearest Neighbor ◽

High Dimensional ◽

K Nearest Neighbor ◽

Link Type ◽

Cell Gene Expression ◽

Graph Layouts ◽

Cell Expression ◽

Cell Data

Abstract:

Motivation: Single-cell gene expression profiling technologies can map the cell states in a tissue or organism. As these technologies become more common, there is a need for computational tools to explore the data they produce. In particular, existing data visualization approaches are imperfect for studying continuous gene expression topologies.

Results: Force-directed layouts of k-nearest-neighbor graphs can visualize continuous gene expression topologies in a manner that preserves high-dimensional relationships and allows manual exploration of different stable two-dimensional representations of the same data. We implemented an interactive web tool, called SPRING, to visualize single-cell data using force-directed graph layouts. SPRING reveals more detailed biological relationships than existing approaches when applied to branching gene expression trajectories from hematopoietic progenitor cells. Visualizations from SPRING are also more reproducible than those of stochastic visualization methods such as tSNE, a state-of-the-art tool.

Availability: https://kleintools.hms.harvard.edu/tools/spring.html, https://github.com/AllonKleinLab/SPRING/
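
The k-nearest-neighbor graph that such a layout starts from can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SPRING's code: it uses Euclidean distance and brute-force search, skips the force-directed layout step entirely, and the helper names `knn_graph` and `undirected_edges` are hypothetical.

```python
import math


def knn_graph(points, k):
    """Map each point index to the indices of its k nearest other points."""
    graph = {}
    for i, p in enumerate(points):
        # Brute-force Euclidean distances to every other point.
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        others.sort()
        graph[i] = [j for _, j in others[:k]]
    return graph


def undirected_edges(graph):
    """Symmetrise the directed neighbour lists into a set of undirected edges."""
    return {tuple(sorted((i, j))) for i, neighbours in graph.items() for j in neighbours}


# Made-up toy coordinates standing in for cells in expression space.
cells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
graph = knn_graph(cells, k=2)
print(sorted(undirected_edges(graph)))
```

The resulting edge set is what a force-directed layout algorithm would take as input: connected cells attract each other and all cells repel, so neighbourhoods in the high-dimensional space stay close in the two-dimensional drawing.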

EpiScanpy: integrated single-cell epigenomic analysis

10.1101/648097 ◽

2019 ◽

Cited By ~ 4

Author(s):

Anna Danese ◽

Maria L. Richter ◽

David S. Fischer ◽

Fabian J. Theis ◽

Maria Colomé-Tatché

Keyword(s):

DNA Methylation ◽

Single Cell ◽

Large Scale ◽

Feature Space ◽

RNA-Seq ◽

Computational Framework ◽

Learning Techniques ◽

Multiple Feature ◽

The Many ◽

Cell Data

Abstract:

Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels with orthogonal epigenetic information.

ESCO: single cell expression simulation incorporating gene co-expression

10.1101/2020.10.20.347211 ◽

2020 ◽

Author(s):

Jinjin Tian ◽

Jiebiao Wang ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

R Package ◽

Brain Cell ◽

Gene Interactions ◽

Cell Type ◽

Imputation Methods ◽

Biological Interest ◽

A Cell ◽

Cell Expression ◽

Cell Data

Abstract:

Motivation: Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single-cell RNA sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single-cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner.

Results: With a focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression while preserving the strengths of available simulators, which perform well for simulating marginal gene expression. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and that the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregation methods are a better choice. These findings are further verified with mouse and human brain cell data.

Availability: The ESCO implementation is available as the R package SplatterESCO (https://github.com/JINJINT/SplatterESCO).
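
The copula idea mentioned in this abstract can be shown with a small Python sketch. It is a generic illustration and not the ESCO or SplatterESCO code: it assumes a Gaussian copula for two genes with correlation `rho` and Poisson marginals with made-up rates, and the function names are hypothetical. The point is that the dependence structure (the copula) is chosen separately from the per-gene count distributions (the marginals).

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, rate, max_count=10_000):
    """Smallest k whose Poisson CDF reaches u (inverse-CDF lookup)."""
    k = 0
    prob = math.exp(-rate)
    cdf = prob
    while cdf < u and k < max_count:
        k += 1
        prob *= rate / k
        cdf += prob
    return k


def sample_coexpressed_counts(n_cells, rho, rate_a, rate_b, seed=0):
    """Draw count pairs whose dependence comes from a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n_cells):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms, then to counts through each marginal's quantile function.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        pairs.append((poisson_quantile(u1, rate_a), poisson_quantile(u2, rate_b)))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


counts = sample_coexpressed_counts(2000, rho=0.8, rate_a=5.0, rate_b=5.0)
gene_a, gene_b = zip(*counts)
print(round(pearson(gene_a, gene_b), 2))
```

Setting `rho` to zero in the same function yields approximately uncorrelated counts with unchanged marginals, which is what makes a copula convenient for benchmarking co-expression recovery.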

Single Cell Viewer (SCV): An interactive visualization data portal for single cell RNA sequence data

10.1101/664789 ◽

2019 ◽

Cited By ~ 2

Author(s):

Shuoguo Wang ◽

Constance Brett ◽

Mohan Bolisetty ◽

Ryan Golhar ◽

Isaac Neuhaus ◽

...

Keyword(s):

Single Cell ◽

Sequence Data ◽

Single Cells ◽

Link Type ◽

Technological Advances ◽

R Shiny ◽

Data Volume ◽

Exploratory Data ◽

Cell Data ◽

Shiny Application

Abstract:

Motivation: Thanks to technological advances made in the last few years, we are now able to study transcriptomes from thousands of single cells. These advances have been applied widely to study various aspects of biology. Nevertheless, comprehending and inferring meaningful biological insights from these large datasets is still a challenge. Although tools are being developed to deal with the data complexity and data volume, we do not yet have effective visualization and comparative analysis tools to realize the full value of these datasets.

Results: To address this gap, we implemented a single-cell data visualization portal called Single Cell Viewer (SCV). SCV is an R Shiny application that offers users rich visualization and exploratory data analysis options for single-cell datasets.

Availability: Source code for the application is available online at GitHub (http://www.github.com/neuhausi/single-cell-viewer) and there is a hosted exploration application using the same example dataset as this publication at http://periscopeapps.org/

GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution

10.1101/567115 ◽

2019 ◽

Author(s):

Magdalena E Strauss ◽

Paul DW Kirk ◽

John E Reid ◽

Lorenz Wernisch

Keyword(s):

Single Cell ◽

Time Course ◽

Gene Clusters ◽

Supplementary Information ◽

Clustering Methods ◽

Link Type ◽

Novel Approach ◽

Broad Array ◽

Recent Method ◽

Cell Data

Abstract:

Motivation: Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters.

Results: The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with nonparametric Bayesian clustering methods, efficient MCMC sampling, and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings.

Availability: An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/

Supplementary information: Supplementary materials are available.

scDIOR: single cell RNA-seq data IO software

BMC Bioinformatics ◽

10.1186/s12859-021-04528-3 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Huijian Feng ◽

Lihui Lin ◽

Jiekai Chen

Keyword(s):

Single Cell ◽

Programming Languages ◽

Large Scale ◽

Developmental Trajectories ◽

Rapid Development ◽

Data Transformation ◽

RNA-Seq ◽

Data Types ◽

User Friendly ◽

Cell Data

Abstract:

Background: Single-cell RNA sequencing is becoming a powerful tool to identify cell states, reconstruct developmental trajectories, and deconvolute spatial expression. The rapid development of computational methods promotes insight into heterogeneous single-cell data. An increasing number of tools have been provided for biological analysts, and two programming languages, R and Python, are widely used among researchers. R and Python are complementary, as many methods are implemented specifically in R or Python. However, the use of different platforms creates data sharing and transformation problems, especially for Scanpy, Seurat, and SingleCellExperiment. Currently, there is no efficient and user-friendly software to perform data transformation of single-cell omics between platforms, so users spend excessive time on data input and output (IO), which significantly reduces the efficiency of data analysis.

Results: We developed scDIOR for single-cell data transformation between R and Python platforms, based on Hierarchical Data Format Version 5 (HDF5). We have created a data IO ecosystem between three R packages (Seurat, SingleCellExperiment, Monocle) and a Python package (Scanpy). Importantly, scDIOR accommodates a variety of data types across programming languages and platforms in an ultrafast way, including single-cell RNA-seq and spatially resolved transcriptomics data, using only a few lines of code in an IDE or command-line interface. For large-scale datasets, users can partially load the needed information, e.g., cell annotation without the gene expression matrices. scDIOR connects the analytical tasks of different platforms, which makes it easy to compare the performance of algorithms between them.

Conclusions: scDIOR contains two modules, dior in R and diopy in Python. scDIOR is a versatile and user-friendly tool that implements single-cell data transformation between R and Python rapidly and stably. The software is freely accessible at https://github.com/JiekaiLab/scDIOR.

The Specious Art of Single-Cell Genomics

10.1101/2021.08.25.457696 ◽

2021 ◽

Author(s):

Tara Chari ◽

Joeyta Banerjee ◽

Lior Pachter

Keyword(s):

Single Cell ◽

Large Scale ◽

Three Dimensions ◽

Large Scale Data ◽

Biological Discovery ◽

Low Dimensional ◽

Supervised Dimension Reduction ◽

Cell Expression ◽

Biological Patterns ◽

Global And Local

Dimensionality reduction is standard practice for filtering noise and identifying relevant dimensions in large-scale data analyses. In biology, single-cell expression studies almost always begin with reduction to two or three dimensions to produce 'all-in-one' visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative analysis of cell relationships. However, there is little theoretical support for this practice. We examine the theoretical and practical implications of low-dimensional embedding of single-cell data, and find extensive distortions incurred on the global and local properties of biological patterns relative to the high-dimensional, ambient space. In lieu of this, we propose semi-supervised dimension reduction to higher dimension, and show that such targeted reduction guided by the metadata associated with single-cell experiments provides useful latent space representations for hypothesis-driven biological discovery.

Agile workflow for interactive analysis of mass cytometry data

10.1101/2020.05.28.120527 ◽

2020 ◽

Author(s):

Julia Casado ◽

Oskari Lehtonen ◽

Ville Rantanen ◽

Katja Kaipio ◽

Luca Pasquini ◽

...

Keyword(s):

Single Cell ◽

Peripheral Blood ◽

Large Scale ◽

Immune Cell ◽

Single Cell Analysis ◽

Supplementary Information ◽

Mass Cytometry ◽

Interactive Analysis ◽

Link Type ◽

Cell Subpopulations

Abstract:

Motivation: Single-cell proteomics technologies, such as mass cytometry, have enabled characterization of cell-to-cell variation and cell populations at single-cell resolution. These large amounts of data, however, require dedicated, interactive tools for translating the data into knowledge.

Results: We present a comprehensive, interactive method called Cyto to streamline analysis of large-scale cytometry data. Cyto is a workflow-based open-source solution that automates the use of state-of-the-art single-cell analysis methods with interactive visualization. We show the utility of Cyto by applying it to mass cytometry data from peripheral blood and high-grade serous ovarian cancer (HGSOC) samples. Our results show that Cyto is able to reliably capture the immune cell subpopulations from peripheral blood as well as cellular compositions of unique immune and cancer cell subpopulations in HGSOC tumor and ascites samples.

Availability: The method is available as a Docker container at https://hub.docker.com/r/anduril/cyto and the user guide and source code are available at https://bitbucket.org/anduril-dev/

Supplementary information: Supplementary material is available and FCS files are hosted at flowrepository.org/id/FR-FCM-Z2LW

Exploring Single-Cell Data with Deep Multitasking Neural Networks

10.1101/237065 ◽

2017 ◽

Cited By ~ 7

Author(s):

Matthew Amodio ◽

David van Dijk ◽

Krishnan Srinivasan ◽

William S Chen ◽

Hussein Mohsen ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Data Analysis ◽

Experimental Design ◽

Single Cell ◽

Large Scale ◽

Dengue Infection ◽

Data Representation ◽

Data Generation ◽

Cell Data

Abstract:

Biomedical researchers are generating high-throughput, high-dimensional single-cell data at a staggering rate. As costs of data generation decrease, experimental design is moving towards measurement of many different single-cell samples in the same dataset. These samples can correspond to different patients, conditions, or treatments. While scalability of methods to datasets of these sizes is a challenge on its own, dealing with large-scale experimental design presents a whole new set of problems, including batch effects and sample comparison issues. Currently, there are no computational tools that can both handle large amounts of data in a scalable manner (many cells) and at the same time deal with many samples (many patients or conditions). Moreover, data analysis currently involves the use of different tools that each operate on their own data representation, not guaranteeing a synchronized analysis pipeline. For instance, data visualization methods can be disjoint and mismatched with the clustering method. For this purpose, we present SAUCIE, a deep neural network that leverages the high degree of parallelization and scalability offered by neural networks, as well as the deep representation of data that can be learned by them, to perform many single-cell data analysis tasks, all on a unified representation.

A well-known limitation of neural networks is their lack of interpretability. Our key contributions here are newly formulated regularizations (penalties) that render features learned in hidden layers of the neural network interpretable. When large multi-patient datasets are fed into SAUCIE, the various hidden layers contain denoised and batch-corrected data, a low-dimensional visualization, unsupervised clustering, as well as other information that can be used to explore the data. We show this capability by analyzing a newly generated 180-sample dataset consisting of T cells from dengue patients in India, measured with mass cytometry. We show that SAUCIE, for the first time, can batch correct and process this 11-million-cell dataset to identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue on the basis of single-cell measurements.
