MOFA+: a probabilistic framework for comprehensive integration of structured single-cell data

10.1101/837104 ◽

2019 ◽

Cited By ~ 8

Author(s):

Ricard Argelaguet ◽

Damien Arnol ◽

Danila Bredikhin ◽

Yonatan Deloro ◽

Britta Velten ◽

...

Keyword(s):

Factor Analysis ◽

Single Cell ◽

Cell Fate ◽

Joint Analysis ◽

Experimental Conditions ◽

Joint Modelling ◽

Stochastic Variational Inference ◽

Technological Advances ◽

Low Dimensional ◽

Cell Data

Abstract Technological advances have enabled the joint analysis of multiple molecular layers at single cell resolution. At the same time, increased experimental throughput has facilitated the study of larger numbers of experimental conditions. While methods for analysing single-cell data that model the resulting structure of either of these dimensions are beginning to emerge, current methods do not account for complex experimental designs that include both multiple views (modalities or assays) and groups (conditions or experiments). Here we present Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of structured single cell multi-modal data. MOFA+ builds upon a Bayesian Factor Analysis framework combined with fast GPU-accelerated stochastic variational inference. Similar to existing factor models, MOFA+ allows for interpreting variation in single-cell datasets by pooling information across cells and features to reconstruct a low-dimensional representation of the data. Uniquely, the model supports flexible group-level sparsity constraints that allow joint modelling of variation across multiple groups and views. To illustrate MOFA+, we applied it to single-cell data sets of different scales and designs, demonstrating practical advantages when analyzing datasets with complex group and/or view structure. In a multi-omics analysis of mouse gastrulation this joint modelling reveals coordinated changes between gene expression and epigenetic variation associated with cell fate commitment.
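
For orientation, a multi-group, multi-view factor model of the kind this abstract describes can be written roughly as below. The notation is an assumption made for illustration and does not appear in the abstract: Y_{gm} is the cells-by-features data matrix for group g and view m, Z_g holds the K factor values for the N_g cells in group g, W_m holds the feature weights of the D_m features in view m, and epsilon_{gm} is residual noise.

```latex
\mathbf{Y}_{gm} = \mathbf{Z}_{g}\,\mathbf{W}_{m}^{\top} + \boldsymbol{\epsilon}_{gm},
\qquad
\mathbf{Y}_{gm} \in \mathbb{R}^{N_g \times D_m},\;
\mathbf{Z}_{g} \in \mathbb{R}^{N_g \times K},\;
\mathbf{W}_{m} \in \mathbb{R}^{D_m \times K},
\qquad g = 1,\dots,G,\; m = 1,\dots,M
```

Under this reading, sparsity constraints on the columns of W_m and Z_g would let a given factor be active in some views or groups and switched off in others, which is one way to interpret the "group-level sparsity constraints" mentioned above.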

SCIM: Universal Single-Cell Matching with Unpaired Feature Sets

10.1101/2020.06.11.146845 ◽

2020 ◽

Author(s):

Stefan G. Stark ◽

Joanna Ficek ◽

Francesco Locatello ◽

Ximena Bonilla ◽

Stéphane Chevrier ◽

...

Keyword(s):

Single Cell ◽

Underlying Structure ◽

Scalable Algorithms ◽

Melanoma Tumor ◽

Bipartite Matching ◽

Technological Advances ◽

Latent Space ◽

Latent Representations ◽

Low Dimensional ◽

Cell Data

Abstract Motivation Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells and thus pairwise correspondences between datasets are lost. Due to the sheer size single-cell datasets can acquire, scalable algorithms that are able to universally match single-cell measurements carried out in one cell to its corresponding sibling in another technology are needed. Results We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an auto-encoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset achieving 93% and 84% cell-matching accuracy for each one of the samples respectively. Availability https://github.com/ratschlab/scim

SCIM: universal single-cell matching with unpaired feature sets

Bioinformatics ◽

10.1093/bioinformatics/btaa843 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i919-i927

Author(s):

Stefan G Stark ◽

Joanna Ficek ◽

Francesco Locatello ◽

Ximena Bonilla ◽

Stéphane Chevrier ◽

...

Keyword(s):

Single Cell ◽

Supplementary Information ◽

Underlying Structure ◽

Scalable Algorithms ◽

Melanoma Tumor ◽

Bipartite Matching ◽

Technological Advances ◽

Latent Representations ◽

Low Dimensional ◽

Cell Data

Abstract Motivation Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells and thus pairwise correspondences between datasets are lost. Due to the sheer size single-cell datasets can acquire, scalable algorithms that are able to universally match single-cell measurements carried out in one cell to its corresponding sibling in another technology are needed. Results We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset achieving 90% and 78% cell-matching accuracy for each one of the samples, respectively. Availability and implementation https://github.com/ratschlab/scim. Supplementary information Supplementary data are available at Bioinformatics online.
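
The matching step described above (pairing cells across technologies in a shared latent space) can be sketched as a minimum-cost one-to-one assignment. This is a toy illustration, not the SCIM implementation: the function name `match_cells`, the Euclidean cost, the hand-made 2-D latent coordinates, and the brute-force search over permutations are all assumptions chosen to keep the example dependency-free. A realistic dataset would need a polynomial-time solver, for example `scipy.optimize.linear_sum_assignment`.

```python
import math
from itertools import permutations


def match_cells(latent_a, latent_b):
    """Return the one-to-one pairing with minimal total Euclidean distance.

    Brute force over all permutations: only feasible for a handful of cells.
    """
    if len(latent_a) != len(latent_b):
        raise ValueError("this sketch assumes equally sized cell sets")
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(len(latent_b))):
        cost = sum(math.dist(latent_a[i], latent_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


# Hypothetical latent codes for three cells measured by two technologies.
latent_rna = [(0.0, 0.1), (1.0, 1.1), (2.0, 0.0)]
latent_cytof = [(2.1, 0.1), (0.1, 0.0), (0.9, 1.0)]

pairs, total_cost = match_cells(latent_rna, latent_cytof)
print(pairs)  # each tuple is (index in latent_rna, index in latent_cytof)
```

The quality of such a pairing depends entirely on how well the two latent spaces are aligned, which is the role the abstract assigns to the adversarially trained autoencoder.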

Mechanistic models of cell-fate transitions from single-cell data

Current Opinion in Systems Biology ◽

10.1016/j.coisb.2021.04.004 ◽

2021 ◽

Author(s):

Gabriel Torregrosa ◽

Jordi Garcia-Ojalvo

Keyword(s):

Single Cell ◽

Cell Fate ◽

Mechanistic Models ◽

Cell Data

Mechanistic models of blood cell fate decisions in the era of single-cell data

Current Opinion in Systems Biology ◽

10.1016/j.coisb.2021.100355 ◽

2021 ◽

pp. 100355

Author(s):

Ingmar Glauche ◽

Carsten Marr

Keyword(s):

Blood Cell ◽

Single Cell ◽

Cell Fate ◽

Mechanistic Models ◽

Cell Fate Decisions ◽

Cell Data

Discovering a sparse set of pairwise discriminating features in high-dimensional data

Bioinformatics ◽

10.1093/bioinformatics/btaa690 ◽

2020 ◽

Author(s):

Samuel Melton ◽

Sharad Ramanathan

Keyword(s):

Single Cell ◽

Dimensional Space ◽

Cell Types ◽

Dimensional Subspace ◽

Supplementary Information ◽

High Dimensional ◽

Technological Advances ◽

Data Points ◽

Low Dimensional ◽

Sparse Set

Abstract Motivation Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation https://github.com/smelton/SMD. Supplementary information Supplementary data are available at Bioinformatics online.

A Bayesian nonparametric semi-supervised model for integration of multiple single-cell experiments

10.1101/2020.01.14.906313 ◽

2020 ◽

Author(s):

Archit Verma ◽

Barbara Engelhardt

Keyword(s):

Single Cell ◽

Latent Variable ◽

Environmental Variability ◽

Simulated Data ◽

Joint Analysis ◽

Variable Model ◽

Manifold Alignment ◽

Multiple Data Sets ◽

Sequencing Platforms ◽

Low Dimensional

Joint analysis of multiple single cell RNA-sequencing (scRNA-seq) datasets is confounded by technical batch effects across experiments, biological or environmental variability across cells, and different capture processes across sequencing platforms. Manifold alignment is a principled, effective tool for integrating multiple data sets and controlling for confounding factors. We demonstrate that the semi-supervised t-distributed Gaussian process latent variable model (sstGPLVM), which projects the data onto a mixture of fixed and latent dimensions, can learn a unified low-dimensional embedding for multiple single cell experiments with minimal assumptions. We show the efficacy of the model as compared with state-of-the-art methods for single cell data integration on simulated data, pancreas cells from four sequencing technologies, induced pluripotent stem cells from male and female donors, and mouse brain cells from both spatial seqFISH+ and traditional scRNA-seq. Code and data are available at https://github.com/architverma1/sc-manifold-alignment

Single Cell Viewer (SCV): An interactive visualization data portal for single cell RNA sequence data

10.1101/664789 ◽

2019 ◽

Cited By ~ 2

Author(s):

Shuoguo Wang ◽

Constance Brett ◽

Mohan Bolisetty ◽

Ryan Golhar ◽

Isaac Neuhaus ◽

...

Keyword(s):

Single Cell ◽

Sequence Data ◽

Single Cells ◽

Link Type ◽

Technological Advances ◽

R Shiny ◽

Data Volume ◽

Exploratory Data ◽

Cell Data ◽

Shiny Application

Abstract Motivation Thanks to technological advances made in the last few years, we are now able to study transcriptomes from thousands of single cells. These advances have been applied widely to study various aspects of biology. Nevertheless, comprehending and inferring meaningful biological insights from these large datasets is still a challenge. Although tools are being developed to deal with the data complexity and data volume, we do not yet have effective visualization and comparative analysis tools to realize the full value of these datasets. Results To address this gap, we implemented a single cell data visualization portal called Single Cell Viewer (SCV). SCV is an R Shiny application that offers users rich visualization and exploratory data analysis options for single cell datasets. Availability Source code for the application is available online at GitHub (http://www.github.com/neuhausi/single-cell-viewer) and there is a hosted exploration application using the same example dataset as this publication at http://periscopeapps.org/[email protected]; [email protected]

Continuous visualization of differences between biological conditions in single-cell data

10.1101/337485 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tyler J. Burns ◽

Garry P. Nolan ◽

Nikolay Samusik

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

Developmental Trajectory ◽

Functional Markers ◽

Mass Cytometry ◽

K Nearest Neighbor ◽

Cell Frequency ◽

Low Dimensional ◽

Marker Shift ◽

Cell Data

In high-dimensional single cell data, comparing changes in functional markers between conditions is typically done across manual or algorithm-derived partitions based on population-defining markers. Visualization of these partitions is commonly done on low-dimensional embeddings (e.g. t-SNE), colored by per-partition changes. Here, we provide an analysis and visualization tool that performs these comparisons across overlapping k-nearest neighbor (KNN) groupings. This allows one to color low-dimensional embeddings by marker changes without hard boundaries imposed by partitioning. We devised an objective optimization of k based on minimizing functional marker KNN imputation error. Proof-of-concept work visualized the exact location of an IL-7 responsive subset in a B cell developmental trajectory on a t-SNE map independent of clustering. Per-condition cell frequency analysis revealed that KNN is sensitive to detecting artifacts due to marker shift, and therefore can also be valuable in a quality control pipeline. Overall, we found that KNN groupings lead to useful multiple condition visualizations and efficiently extract a large amount of information from mass cytometry data. Our software is publicly available through the Bioconductor package Sconify.
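
The per-cell comparison across overlapping KNN groupings can be sketched as follows. This is a simplified illustration and not the Sconify code: the function `knn_marker_change`, the record layout (`pos`, `condition`, `marker`), the condition labels, and the toy data are assumptions, and k is fixed by hand rather than chosen by the imputation-error criterion mentioned above. Each cell counts as its own nearest neighbor in this sketch.

```python
import math
from statistics import mean


def knn_marker_change(cells, k):
    """For each cell, compare a functional marker between conditions
    within that cell's k nearest neighbors (the cell itself included)."""
    changes = []
    for cell in cells:
        neighbours = sorted(cells, key=lambda c: math.dist(cell["pos"], c["pos"]))[:k]
        basal = [c["marker"] for c in neighbours if c["condition"] == "basal"]
        stim = [c["marker"] for c in neighbours if c["condition"] == "stim"]
        # A change is undefined if one condition is absent from the neighborhood.
        changes.append(mean(stim) - mean(basal) if basal and stim else None)
    return changes


def make_cells(centre, stim_marker):
    """Build four toy cells around `centre`, alternating between conditions."""
    cx, cy = centre
    offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    conditions = ["basal", "stim", "basal", "stim"]
    return [
        {"pos": (cx + dx, cy + dy),
         "condition": cond,
         "marker": stim_marker if cond == "stim" else 1.0}
        for (dx, dy), cond in zip(offsets, conditions)
    ]


# One responsive population (marker rises under stimulation) and one that is not.
cells = make_cells((0.0, 0.0), 3.0) + make_cells((10.0, 10.0), 1.0)
changes = knn_marker_change(cells, k=4)
print(changes)
```

The resulting per-cell values are what one would use to color a low-dimensional embedding, so that responsive regions stand out without drawing cluster boundaries first.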

Repeated Decision Stumping Distils Simple Rules from Single Cell Data

10.1101/2020.09.08.288662 ◽

2020 ◽

Author(s):

Ivan A. Croydon Veleslavov ◽

Michael P.H. Stumpf

Keyword(s):

Single Cell ◽

Cell Fate ◽

Predictive Power ◽

Published Data ◽

Gene Products ◽

Computationally Efficient ◽

Key Players ◽

Simple Rules ◽

Unbiased Manner ◽

Cell Data

Abstract Here we introduce repeated decision stumping to distill simple models from single cell data. We develop decision trees of depth one (hence ‘stumps’) to identify, in an inductive manner, gene products involved in driving cell fate transitions. In applications to published data we are able to discover the key players involved in these processes in an unbiased manner without prior knowledge. The approach is computationally efficient, has remarkable predictive power, and yields robust and statistically stable predictors: the same set of candidates is generated by applying the algorithm to different subsamples of the data.
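
A decision stump is a depth-one tree: a single gene and a single threshold. The sketch below shows one plausible reading of "repeated" stumping, in which the best gene is recorded, excluded, and the stump refitted on the remaining genes. That loop, the accuracy criterion, the function names, and the toy expression matrix are assumptions made for illustration and are not taken from the paper.

```python
def fit_stump(X, y, excluded=frozenset()):
    """Return (gene_index, threshold, accuracy) of the best single-gene split.

    X is a list of cells, each a list of expression values; y holds 0/1 labels.
    """
    best = (None, None, -1.0)
    n = len(y)
    for g in range(len(X[0])):
        if g in excluded:
            continue
        values = sorted({row[g] for row in X})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            correct = sum((row[g] > threshold) == bool(label) for row, label in zip(X, y))
            # A stump may predict either class on the high side of the split.
            accuracy = max(correct, n - correct) / n
            if accuracy > best[2]:
                best = (g, threshold, accuracy)
    return best


def repeated_stumping(X, y, rounds):
    """Fit a stump, exclude its gene, and repeat; returns genes in ranked order."""
    excluded, ranking = set(), []
    for _ in range(rounds):
        gene, threshold, accuracy = fit_stump(X, y, excluded)
        if gene is None:
            break
        ranking.append((gene, threshold, accuracy))
        excluded.add(gene)
    return ranking


# Toy data: 6 cells x 3 genes; the first three cells are state 0, the rest state 1.
X = [
    [0.1, 0.5, 0.2],
    [0.2, 0.4, 0.3],
    [0.3, 0.6, 0.8],
    [0.9, 0.5, 0.7],
    [1.0, 0.4, 0.9],
    [1.1, 0.6, 1.0],
]
y = [0, 0, 0, 1, 1, 1]

ranking = repeated_stumping(X, y, rounds=3)
print([gene for gene, _, _ in ranking])
```

In this toy matrix, gene 0 separates the two states perfectly, gene 2 misclassifies one cell, and gene 1 carries no information, so the ranking reflects how well each gene alone predicts the transition.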

Transcriptional and epigenetic control of cell fate decisions in early embryos

Reproduction Fertility and Development ◽

10.1071/rd17403 ◽

2018 ◽

Vol 30 (1) ◽

pp. 73 ◽

Cited By ~ 4

Author(s):

Ramiro Alberio

Keyword(s):

Single Cell ◽

Cell Fate ◽

Developmental Stages ◽

Current Knowledge ◽

Mammalian Embryo ◽

Cell Specification ◽

Cell Fate Decisions ◽

Transcription Profiles ◽

Technological Advances ◽

Early Embryos

Mammalian embryo development is characterised by regulative mechanisms of lineage segregation and cell specification. A combination of carefully orchestrated gene expression networks, signalling pathways and epigenetic marks defines specific developmental stages that can now be resolved at the single-cell level. These new ways to depict developmental processes have the potential to provide answers to unresolved questions on how lineage allocation and cell fate decisions are made during embryogenesis. Over the past few years, a flurry of studies reporting detailed single-cell transcription profiles in early embryos has complemented observations acquired using live cell imaging following gene editing techniques to manipulate specific genes. The adoption of this newly available toolkit is reshaping how researchers are designing experiments and how they view animal development. This review presents an overview of the current knowledge on lineage segregation and cell specification in mammals, and discusses some of the outstanding questions that current technological advances can help scientists address, like never before.
