Parameter tuning is a key part of dimensionality reduction via deep variational autoencoders for single cell RNA transcriptomics

Mapping Intimacies ◽

10.1101/385534 ◽

2018 ◽

Cited By ~ 5

Author(s):

Qiwen Hu ◽

Casey S. Greene

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Large Scale ◽

Dimensional Space ◽

Parameter Tuning ◽

Generative Models ◽

Underlying Structure ◽

Expression Data ◽

Low Dimensional

Single-cell RNA sequencing (scRNA-seq) is a powerful tool to profile the transcriptomes of a large number of individual cells at a high resolution. These data usually contain measurements of gene expression for many genes in thousands or tens of thousands of cells, though some datasets now reach the million-cell mark. Projecting high-dimensional scRNA-seq data into a low-dimensional space aids downstream analysis and data visualization. Many recent preprints accomplish this using variational autoencoders (VAEs), generative models that learn the underlying structure of data by compressing it into a constrained, low-dimensional space. The low-dimensional spaces generated by VAEs have revealed complex patterns and novel biological signals from large-scale gene expression data and drug response predictions. Here, we evaluate a simple VAE approach for gene expression data, Tybalt, by training and measuring its performance on sets of simulated scRNA-seq data. We find a number of counter-intuitive performance features: for example, deeper neural networks can struggle when datasets contain more observations under some parameter configurations. We show that these methods are highly sensitive to parameter tuning: when tuned, the Tybalt model, which was not optimized for scRNA-seq data, outperforms other popular dimension reduction approaches: PCA, ZIFA, UMAP, and t-SNE. On the other hand, without tuning, performance can also be remarkably poor on the same data. Our results should discourage authors and reviewers from relying on self-reported performance comparisons to evaluate the relative value of contributions in this area at this time. Instead, we recommend that attempts to compare or benchmark autoencoder methods for scRNA-seq data be performed by disinterested third parties or by methods developers only on unseen benchmark data that are provided to all participants simultaneously, because the potential for performance differences due to unequal parameter tuning is so high.
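To make the "constrained, low-dimensional space" in the abstract above more concrete, the following is a minimal sketch of the two pieces that most VAEs share: the reparameterized draw from the latent distribution and a loss that adds a Kullback-Leibler penalty to a reconstruction error. It is not the Tybalt implementation. The function names, the squared-error reconstruction term, and the `kl_weight` parameter are illustrative assumptions, with `kl_weight` standing in for the kind of setting whose tuning can change results.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), so the sample is a smooth function of mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def squared_error(x, x_hat):
    """Illustrative reconstruction term (an assumption; other likelihoods are common)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, kl_weight=1.0):
    """Reconstruction error plus a weighted KL penalty; kl_weight is a hypothetical tunable."""
    return squared_error(x, x_hat) + kl_weight * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -1.2], [-0.5, 0.1]  # made-up encoder outputs for one cell
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", z)
    print("loss:", vae_loss([1.0, 0.0, 2.0], [0.8, 0.1, 1.7], mu, log_var, kl_weight=0.5))
```

In a sketch like this, the latent dimensionality, the network depth that would produce `mu` and `log_var`, and `kl_weight` are all free choices, which is one way to see why comparisons between methods can hinge on how carefully each was tuned.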


Sampling from Disentangled Representations of Single-Cell Data Using Generative Adversarial Networks

10.1101/2021.01.15.426872 ◽

2021 ◽

Author(s):

Hengshi Yu ◽

Joshua D. Welch

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Generative Models ◽

Generative Adversarial Networks ◽

Expression Data ◽

Gene Expression Response ◽

Adversarial Networks ◽

Cell Gene Expression ◽

Cell Gene

Deep generative models, including variational autoencoders (VAEs) and generative adversarial networks (GANs), have achieved remarkable successes in generating and manipulating high-dimensional images. VAEs excel at learning disentangled image representations, while GANs excel at generating realistic images. Here, we systematically assess disentanglement and generation performance on single-cell gene expression data and find that these strengths and weaknesses of VAEs and GANs apply to single-cell gene expression data in a similar way. We also develop MichiGAN, a novel neural network that combines the strengths of VAEs and GANs to sample from disentangled representations without sacrificing data generation quality. We learn disentangled representations of two large single-cell RNA-seq datasets [13, 68] and use MichiGAN to sample from these representations. MichiGAN allows us to manipulate semantically distinct aspects of cellular identity and predict single-cell gene expression response to drug treatment.


SCANPY: large-scale single-cell gene expression data analysis

Genome Biology ◽

10.1186/s13059-017-1382-0 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 667

Author(s):

F. Alexander Wolf ◽

Philipp Angerer ◽

Fabian J. Theis

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Single Cell ◽

Gene Expression Data ◽

Large Scale ◽

Expression Data ◽

Gene Expression Data Analysis ◽

Cell Gene Expression ◽

Cell Gene


Scanpy for analysis of large-scale single-cell gene expression data

10.1101/174029 ◽

2017 ◽

Cited By ~ 9

Author(s):

F. Alexander Wolf ◽

Philipp Angerer ◽

Fabian J. Theis

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Large Scale ◽

Expression Data ◽

Cell Gene Expression ◽

Gene Regulatory ◽

Cell Gene

We present Scanpy, a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The Python-based implementation efficiently deals with datasets of more than one million cells and enables easy interfacing of advanced machine learning packages. Code is available from https://github.com/theislab/scanpy.


Interpretable generative deep learning: an illustration with single cell gene expression data

Human Genetics ◽

10.1007/s00439-021-02417-6 ◽

2022 ◽

Author(s):

Martin Treppner ◽

Harald Binder ◽

Moritz Hess

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Latent Variables ◽

Generative Models ◽

Omics Data ◽

Expression Data ◽

Cell Gene Expression ◽

The Relationship ◽

Cell Gene

Deep generative models can learn the underlying structure, such as pathways or gene programs, from omics data. We provide an introduction as well as an overview of such techniques, specifically illustrating their use with single-cell gene expression data. For example, the low-dimensional latent representations offered by various approaches, such as variational auto-encoders, are useful to get a better understanding of the relations between observed gene expressions and experimental factors or phenotypes. Furthermore, by providing a generative model for the latent and observed variables, deep generative models can generate synthetic observations, which allow us to assess the uncertainty in the learned representations. While deep generative models are useful to learn the structure of high-dimensional omics data by efficiently capturing non-linear dependencies between genes, they are sometimes difficult to interpret due to their neural network building blocks. More precisely, it is difficult to understand the relationship between learned latent variables and observed variables, e.g., gene transcript abundances and external phenotypes. Therefore, we also illustrate current approaches that allow us to infer the relationship between learned latent variables and observed variables as well as external phenotypes. Thereby, we render deep learning approaches more interpretable. In an application with single-cell gene expression data, we demonstrate the utility of the discussed methods.


How to build regulatory networks from single-cell gene expression data

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics ◽

10.1145/3388440.3414213 ◽

2020 ◽

Author(s):

Aditya Pratapa ◽

Amogh P. Jalihal ◽

Jeffrey N. Law ◽

Aditya Bharadwaj ◽

T. M. Murali

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Regulatory Networks ◽

Expression Data ◽

Cell Gene Expression ◽

Cell Gene


Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data

Mathematics ◽

10.3390/math9070772 ◽

2021 ◽

Vol 9 (7) ◽

pp. 772

Author(s):

Seonghun Kim ◽

Seockhun Bae ◽

Yinhua Piao ◽

Kyuri Jo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Large Scale ◽

Drug Response ◽

Response Prediction ◽

Biological Data ◽

Expression Data ◽

Convolutional Network ◽

Essential Information ◽

Protein Protein Interaction

Genomic profiles of cancer patients such as gene expression have become a major source to predict responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model detects the local features such as subnetworks of genes that contribute to the drug response by localized filtering. We demonstrated the effectiveness of DrugGCN using biological data showing its high prediction accuracy among the competing methods.


GiniClust: detecting rare cell types from single-cell gene expression data with Gini index

Genome Biology ◽

10.1186/s13059-016-1010-4 ◽

2016 ◽

Vol 17 (1) ◽

Cited By ~ 126

Author(s):

Lan Jiang ◽

Huidong Chen ◽

Luca Pinello ◽

Guo-Cheng Yuan

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Gini Index ◽

Cell Types ◽

Expression Data ◽

Cell Gene Expression ◽

Cell Gene


Gene Discovery Methods from Large-Scale Gene Expression Data

Quantum Bio-Informatics III ◽

10.1142/9789814304061_0040 ◽

2010 ◽

Author(s):

Akifumi Shimizu ◽

Kentaro Yano

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Large Scale ◽

Gene Discovery ◽

Expression Data


Determining Physical Mechanisms of Gene Expression Regulation from Single Cell Gene Expression Data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1005072 ◽

2016 ◽

Vol 12 (8) ◽

pp. e1005072 ◽

Cited By ~ 13

Author(s):

Daphne Ezer ◽

Victoria Moignard ◽

Berthold Göttgens ◽

Boris Adryan

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Gene Expression Regulation ◽

Expression Regulation ◽

Expression Data ◽

Physical Mechanisms ◽

Cell Gene Expression ◽

Cell Gene


Inferring time-lagged causality using the derivative of single-cell expression

10.1101/2021.02.03.429525 ◽

2021 ◽

Author(s):

Huan-Huan Wei ◽

Hui Lu ◽

Hongyu Zhao

Keyword(s):

Gene Expression ◽

Causal Inference ◽

Single Cell ◽

Causal Relationship ◽

Gene Expression Data ◽

Expression Data ◽

Causal Relationships ◽

Expression Levels ◽

Gene Pairs ◽

Time Lagged

Many computational methods have been developed for inferring causality among genes using cross-sectional gene expression data, such as single-cell RNA sequencing (scRNA-seq) data. However, due to the limitations of scRNA-seq technologies, time-lagged causal relationships may be missed by existing methods. In this work, we propose a method, called causal inference with time-lagged information (CITL), to infer time-lagged causal relationships from scRNA-seq data by assessing conditional independence between the changing and current expression levels of genes. CITL estimates the changing expression levels of genes by "RNA velocity". We demonstrate the accuracy and stability of CITL for inferring time-lagged causality on simulation data against other leading approaches. We have applied CITL to real scRNA-seq data and inferred 878 pairs of time-lagged causal relationships, with many of these inferred results supported by the literature.

Author summary: Computational causal inference is a promising way to survey causal relationships between genes efficiently. Though many causal inference methods have been applied to gene expression data, none considers time-lagged causal relationships, in which some genes take time to affect their target genes through several reactions. If relationships between genes are time-lagged, the existing methods' assumptions will be violated, and the relationships will be challenging to recognize. We demonstrate that this is indeed the case through simulation. Therefore, we develop a method for inferring time-lagged causal relationships from single-cell gene expression data. We assume that a time-lagged causal relationship should present a strong association between the cause and the change in the effect. To calculate this association, we first estimate the derivative of gene expression using the information from unspliced transcripts. Then, we use conditional independence tests to search for gene pairs satisfying our assumption. Our results suggest that we could accurately infer time-lagged causal gene pairs validated by published literature. This method may complement gene regulatory analysis and provide candidate gene pairs for further controlled experiments.
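The assumption stated in the author summary, that a time-lagged cause should be associated with the change in its effect, can be illustrated with a small sketch. This is not the CITL implementation: the per-cell values are made up, the velocity estimates are assumed to be given, and partial correlation is used here only as a simple stand-in for a conditional independence test. All function and variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of a third variable zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


if __name__ == "__main__":
    # Made-up values for six cells: current expression of a candidate cause gene,
    # an assumed velocity (rate of change) estimate for a candidate target gene,
    # and current expression of a third gene to condition on.
    cause_current = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    target_velocity = [0.4, 1.1, 1.4, 2.2, 2.4, 3.1]
    other_current = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

    print("marginal association:", pearson(cause_current, target_velocity))
    print("conditional on third gene:",
          partial_correlation(cause_current, target_velocity, other_current))
```

A pair whose association stays strong after conditioning on other genes would be a candidate under this toy criterion; a real analysis would need velocity estimates from spliced and unspliced counts and a proper statistical test.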
