scholarly journals Single-Cell Data Analysis Using MMD Variational Autoencoder for a More Informative Latent Representation

2019 ◽  
Author(s):  
Chao Zhang

AbstractVariational Autoencoder (VAE) is a generative model from the computer vision community; it learns a latent representation of images and generates new images in an unsupervised way. Recently, Vanilla VAE has been applied to single-cell data analysis, in the hope of harnessing the representation power of latent space to evade the “curse of dimensionality” of the original dataset. However, Vanilla VAE is suffering from the issue of less informative latent space, which raises a question concerning the reliability of Vanilla VAE latent space in representing the high-dimensional single-cell datasets. Therefore I set up such a study to examine this issue from the multiple perspectives.This paper confirms the issue of Vanilla VAE by comparing it with MMD-VAE, a variant of VAE which has claimed to have overcome this issue based on image data, across a series of single-cell RNAseq and mass cytometry datasets. The result indicates that MMD-VAE is superior to Vanilla VAE in retaining the information not only in the latent space but also the reconstruction space, which suggests that MMD-VAE be a better option for single-cell data analysis than Vanilla VAE.

2020 ◽  
Author(s):  
Giovana Ravizzoni Onzi ◽  
Juliano Luiz Faccioni ◽  
Alvaro G. Alvarado ◽  
Paula Andreghetto Bracco ◽  
Harley I. Kornblum ◽  
...  

Outliers are often ignored or even removed from data analysis. In cancer, however, single outlier cells can be of major importance, since they have uncommon characteristics that may confer capacity to invade, metastasize, or resist to therapy. Here we present the Single-Cell OUTlier analysis (SCOUT), a resource for single-cell data analysis focusing on outlier cells, and the SCOUT Selector (SCOUTS), an application to systematically apply SCOUT on a dataset over a wide range of biological markers. Using publicly available datasets of cancer samples obtained from mass cytometry and single-cell RNA-seq platforms, outlier cells for the expression of proteins or RNAs were identified and compared to their non-outlier counterparts among different samples. Our results show that analyzing single-cell data using SCOUT can uncover key information not easily observed in the analysis of the whole population.


2018 ◽  
Author(s):  
Subarna Palit ◽  
Fabian J. Theis ◽  
Christina E. Zielinski

AbstractRecent advances in cytometry have radically altered the fate of single-cell proteomics by allowing a more accurate understanding of complex biological systems. Mass cytometry (CyTOF) provides simultaneous single-cell measurements that are crucial to understand cellular heterogeneity and identify novel cellular subsets. High-dimensional CyTOF data were traditionally analyzed by gating on bivariate dot plots, which are not only laborious given the quadratic increase of complexity with dimension but are also biased through manual gating. This review aims to discuss the impact of new analysis techniques for in-depths insights into the dynamics of immune regulation obtained from static snapshot data and to provide tools to immunologists to address the high dimensionality of their single-cell data.


2021 ◽  
pp. 338872
Author(s):  
Gerjen H. Tinnevelt ◽  
Kristiaan Wouters ◽  
Geert J. Postma ◽  
Rita Folcarelli ◽  
Jeroen J. Jansen

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Prathitha Kar ◽  
Sriram Tiruvadi-Krishnan ◽  
Jaana Männik ◽  
Jaan Männik ◽  
Ariel Amir

Collection of high-throughput data has become prevalent in biology. Large datasets allow the use of statistical constructs such as binning and linear regression to quantify relationships between variables and hypothesize underlying biological mechanisms based on it. We discuss several such examples in relation to single-cell data and cellular growth. In particular, we show instances where what appears to be ordinary use of these statistical methods leads to incorrect conclusions such as growth being non-exponential as opposed to exponential and vice versa. We propose that the data analysis and its interpretation should be done in the context of a generative model, if possible. In this way, the statistical methods can be validated either analytically or against synthetic data generated via the use of the model, leading to a consistent method for inferring biological mechanisms from data. On applying the validated methods of data analysis to infer cellular growth on our experimental data, we find the growth of length in E. coli to be non-exponential. Our analysis shows that in the later stages of the cell cycle the growth rate is faster than exponential.


2018 ◽  
Author(s):  
Tyler J. Burns ◽  
Garry P. Nolan ◽  
Nikolay Samusik

In high-dimensional single cell data, comparing changes in functional markers between conditions is typically done across manual or algorithm-derived partitions based on population-defining markers. Visualizations of these partitions is commonly done on low-dimensional embeddings (eg. t-SNE), colored by per-partition changes. Here, we provide an analysis and visualization tool that performs these comparisons across overlapping k-nearest neighbor (KNN) groupings. This allows one to color low-dimensional embeddings by marker changes without hard boundaries imposed by partitioning. We devised an objective optimization of k based on minimizing functional marker KNN imputation error. Proof-of-concept work visualized the exact location of an IL-7 responsive subset in a B cell developmental trajectory on a t-SNE map independent of clustering. Per-condition cell frequency analysis revealed that KNN is sensitive to detecting artifacts due to marker shift, and therefore can also be valuable in a quality control pipeline. Overall, we found that KNN groupings lead to useful multiple condition visualizations and efficiently extract a large amount of information from mass cytometry data. Our software is publicly available through the Bioconductor package Sconify.


2021 ◽  
Vol 12 ◽  
Author(s):  
Xu Xiao ◽  
Ying Qiao ◽  
Yudi Jiao ◽  
Na Fu ◽  
Wenxian Yang ◽  
...  

Highly multiplexed imaging technology is a powerful tool to facilitate understanding the composition and interactions of cells in tumor microenvironments at subcellular resolution, which is crucial for both basic research and clinical applications. Imaging mass cytometry (IMC), a multiplex imaging method recently introduced, can measure up to 100 markers simultaneously in one tissue section by using a high-resolution laser with a mass cytometer. However, due to its high resolution and large number of channels, how to process and interpret the image data from IMC remains a key challenge to its further applications. Accurate and reliable single cell segmentation is the first and a critical step to process IMC image data. Unfortunately, existing segmentation pipelines either produce inaccurate cell segmentation results or require manual annotation, which is very time consuming. Here, we developed Dice-XMBD1, a Deep learnIng-based Cell sEgmentation algorithm for tissue multiplexed imaging data. In comparison with other state-of-the-art cell segmentation methods currently used for IMC images, Dice-XMBD generates more accurate single cell masks efficiently on IMC images produced with different nuclear, membrane, and cytoplasm markers. All codes and datasets are available at https://github.com/xmuyulab/Dice-XMBD.


2019 ◽  
Author(s):  
Trung Ngo Trong ◽  
Roger Kramer ◽  
Juha Mehtonen ◽  
Gerardo González ◽  
Ville Hautamäki ◽  
...  

ABSTRACTSingle-cell transcriptomics offers a tool to study the diversity of cell phenotypes through snapshots of the abundance of mRNA in individual cells. Often there is additional information available besides the single cell gene expression counts, such as bulk transcriptome data from the same tissue, or quantification of surface protein levels from the same cells. In this study, we propose models based on the Bayesian generative approach, where protein quantification available as CITE-seq counts from the same cells are used to constrain the learning process, thus forming a semi-supervised model. The generative model is based on the deep variational autoencoder (VAE) neural network architecture.


FEBS Journal ◽  
2018 ◽  
Vol 286 (8) ◽  
pp. 1451-1467 ◽  
Author(s):  
Helena Todorov ◽  
Yvan Saeys

Sign in / Sign up

Export Citation Format

Share Document