Dhaka: Variational Autoencoder for Unmasking Tumor Heterogeneity from Single Cell Genomic Data

2017 ◽  
Author(s):  
Sabrina Rashid ◽  
Sohrab Shah ◽  
Ziv Bar-Joseph ◽  
Ravi Pandya

Abstract. Motivation: Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells exhibit variations in their gene expression, copy numbers, and mutations even when originating from a single progenitor cell. Single cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. However, extracting features from single cell genomic data in order to infer their evolutionary trajectory remains computationally challenging due to the extremely noisy and sparse nature of the data. Results: Here we describe ‘Dhaka’, a variational autoencoder method which transforms single cell genomic data to a reduced dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data, including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic data and six single cell cancer datasets where the number of cells ranges from 250 to 6000 for each sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features are also able to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data. Availability and Implementation: All the datasets used in the paper are publicly available, and the developed software package is available on GitHub: https://github.com/MicrosoftGenomics/Dhaka
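
As a rough illustration of the kind of objective a variational autoencoder minimizes, the sketch below computes a negative evidence lower bound for a batch of cells. It is not Dhaka's implementation: the squared-error reconstruction term, the diagonal Gaussian encoder, and all function names are assumptions made for this example.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per cell (row)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def negative_elbo(x, x_recon, mu, log_var):
    """Mean over cells of (squared reconstruction error + KL term).

    The squared-error reconstruction term is an assumption for this sketch.
    """
    recon = np.sum((x - x_recon) ** 2, axis=1)
    return float(np.mean(recon + kl_to_standard_normal(mu, log_var)))
```

In a real model, `mu` and `log_var` would come from an encoder network and `x_recon` from a decoder applied to the sampled latent vector; the low-dimensional `mu` is what would serve as the reduced feature space.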

2020 ◽  
Author(s):  
Erica A. K. DePasquale ◽  
Daniel Schnell ◽  
Kashish Chetal ◽  
Nathan Salomonis

Summary: Retention of multiplet captures in single-cell RNA-sequencing (scRNA-seq) data can hinder identification of discrete or transitional cell populations and associated marker genes. To overcome this challenge, we created DoubletDecon to identify and remove doublets, multiplets of two cells, by using a combination of deconvolution to identify putative doublets and analyses of unique gene expression. Here we provide the protocol for running DoubletDecon on scRNA-seq data. For complete details on the use of this protocol, please see DePasquale et al. (2019) (https://doi.org/10.1016/j.celrep.2019.09.082).


Science ◽  
2018 ◽  
Vol 360 (6392) ◽  
pp. eaar5780 ◽  
Author(s):  
James A. Briggs ◽  
Caleb Weinreb ◽  
Daniel E. Wagner ◽  
Sean Megason ◽  
Leonid Peshkin ◽  
...  

Time series of single-cell transcriptome measurements can reveal dynamic features of cell differentiation pathways. From measurements of whole frog embryos spanning zygotic genome activation through early organogenesis, we derived a detailed catalog of cell states in vertebrate development and a map of differentiation across all lineages over time. The inferred map recapitulates most if not all developmental relationships and associates new regulators and marker genes with each cell state. We find that many embryonic cell states appear earlier than previously appreciated. We also assess conflicting models of neural crest development. Incorporating a matched time series of zebrafish development from a companion paper, we reveal conserved and divergent features of vertebrate early developmental gene expression programs.


2018 ◽  
Vol 29 (8) ◽  
pp. 2060-2068 ◽  
Author(s):  
Nikos Karaiskos ◽  
Mahdieh Rahmatollahi ◽  
Anastasiya Boltengagen ◽  
Haiyue Liu ◽  
Martin Hoehne ◽  
...  

Background: Three different cell types constitute the glomerular filter: mesangial cells, endothelial cells, and podocytes. However, to what extent cellular heterogeneity exists within healthy glomerular cell populations remains unknown. Methods: We used nanodroplet-based highly parallel transcriptional profiling to characterize the cellular content of purified wild-type mouse glomeruli. Results: Unsupervised clustering of nearly 13,000 single-cell transcriptomes identified the three known glomerular cell types. We provide a comprehensive online atlas of gene expression in glomerular cells that can be queried and visualized using an interactive and freely available database. Novel marker genes for all glomerular cell types were identified and supported by immunohistochemistry images obtained from the Human Protein Atlas. Subclustering of endothelial cells revealed a subset of endothelium that expressed marker genes related to endothelial proliferation. By comparison, the podocyte population appeared more homogeneous but contained three smaller, previously unknown subpopulations. Conclusions: Our study comprehensively characterized gene expression in individual glomerular cells and sets the stage for the dissection of glomerular function at the single-cell level in health and disease.


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 575-575
Author(s):  
Alexandra M Poos ◽  
Jan-Philipp Mallm ◽  
Stephan M Tirier ◽  
Nicola Casiraghi ◽  
Hana Susak ◽  
...  

Introduction: Multiple myeloma (MM) is a heterogeneous malignancy of clonal plasma cells that accumulate in the bone marrow (BM). Despite new treatment approaches, in most patients resistant subclones are selected by therapy, resulting in the development of refractory disease. While the subclonal architecture in newly diagnosed patients has been investigated in great detail, intra-tumor heterogeneity in relapsed/refractory (RR) MM is poorly characterized. Recent technological and computational advances provide the opportunity to systematically analyze tumor samples at the single-cell (sc) level with high accuracy and throughput. Here, we present a pilot study for an integrative analysis of sc Assay for Transposase-Accessible Chromatin with high-throughput sequencing (scATAC-seq) and scRNA-seq with the aim of comprehensively studying the regulatory landscape, gene expression, and evolution of individual subclones in RRMM patients. Methods: We included 20 RRMM patients with longitudinally collected paired BM samples. scATAC- and scRNA-seq data were generated using the 10X Genomics platform. Pre-processing of the sc-seq data was performed with the CellRanger software (reference genome GRCh38). For downstream analyses, the R packages Seurat and Signac (Satija Lab) as well as Cicero (Trapnell Lab) were used. For all patients, bulk whole genome sequencing (WGS) data were available, which we used for confirmatory studies of intra-tumor heterogeneity. Results: A comprehensive study at the sc level requires extensive quality controls (QC). All scATAC-seq files passed the QC, including the detected number of cells, the number of fragments in peaks, and the ratio of mononucleosomal to nucleosome-free fragments. Yet, unsupervised clustering of the differentially accessible regions resulted in two main clusters, strongly associated with sample processing time. Delay of sample processing by 1-2 days, e.g. due to shipment from participating centers, resulted in a global change of chromatin accessibility, with more than 10,000 regions showing differences compared to directly processed samples. The corresponding scRNA-seq files also consistently failed QC, including detectable genes per cell and the percentage of mitochondrial RNA. We excluded these samples from the study. Analysing scATAC-seq data, we observed distinct clusters before and after treatment of RRMM, indicating clonal adaptation or selection in all samples. Treatment with carfilzomib resulted in highly increased co-accessibility, and >100 genes were differentially accessible upon treatment. These genes are related to the activation of immune cells (including T- and B-cells), cell-cell adhesion, apoptosis and signaling pathways (e.g. NFκB) and include several chaperone proteins (e.g. HSPH1) which were upregulated in the scRNA-seq data upon proteasome inhibition. The power of our comprehensive approach for detection of individual subclones and their evolution is illustrated by a patient who was treated with a MEK inhibitor and achieved complete remission. This patient showed two main clusters in the scATAC-seq data before treatment, suggesting the presence of two subclones. Using copy number profiles based on WGS and scRNA-seq data and performing a trajectory analysis based on scATAC-seq data, we could confirm two different subclones. At relapse, a seemingly independent dominant clone emerged. Upon comprehensive integration of the datasets, one of the initial subclones could be identified as the precursor of this dominant clone. We observed increased accessibility for 108 regions (e.g. JUND, HSPA5, EGR1, FOSB, ETS1, FOXP2) upon MEK inhibition. The most significant differentially accessible region in this clone and its precursor included the gene coding for krüppel-like factor 2 (KLF2). scRNA-seq data showed overexpression of KLF2 in the MEK-inhibitor resistant clone, confirming the KLF2 scATAC-seq data. KLF2 has been reported to play an essential role together with KDM3A and IRF1 for MM cell survival and adhesion to stromal cells in the BM. Conclusions: Our data strongly suggest using only immediately processed samples for single cell technologies. Integrating scATAC- and scRNA-seq together with bulk WGS data showed that detection of individual clones and longitudinal changes in the activity of cis-regulatory regions and gene expression is feasible and informative in RRMM. Disclosures: Goldschmidt: John-Hopkins University: Research Funding; Novartis: Membership on an entity's Board of Directors or advisory committees, Research Funding; Bristol-Myers Squibb: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Mundipharma: Research Funding; Takeda: Membership on an entity's Board of Directors or advisory committees, Research Funding; MSD: Research Funding; Molecular Partners: Research Funding; Dietmar-Hopp-Stiftung: Research Funding; Janssen: Consultancy, Research Funding; Chugai: Honoraria, Research Funding; Janssen: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Sanofi: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Amgen: Consultancy, Research Funding; Celgene: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Adaptive Biotechnology: Membership on an entity's Board of Directors or advisory committees.


2009 ◽  
Vol 2009 ◽  
pp. 1-7 ◽  
Author(s):  
Qihua Tan ◽  
Mads Thomassen ◽  
Kirsten M. Jochumsen ◽  
Ole Mogensen ◽  
Kaare Christensen ◽  
...  

Unlike significance analysis of gene expression, which looks for differentially regulated genes, feature selection in microarray-based prognostic gene expression analysis aims to find a subset of marker genes that are not only differentially expressed but also informative for prediction. Unfortunately, feature selection in the microarray literature is dominated by the simple heuristic univariate gene-filter paradigm, which selects differentially expressed genes according to their statistical significance. We introduce a combinatory feature selection strategy that integrates differential gene expression analysis with the Gram-Schmidt process to identify prognostic genes that are both statistically significant and highly informative for predicting tumour survival outcomes. Empirical application to leukemia and ovarian cancer survival data through within- and cross-study validations shows that the feature space can be largely reduced while achieving improved testing performance.
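
To make the idea of Gram-Schmidt-based selection concrete, here is a minimal sketch of forward selection by orthogonalization: at each step the feature most aligned with the remaining outcome signal is chosen, and its direction is then projected out of both the outcome and the other features, so redundant genes score near zero afterwards. This is a generic illustration, not the authors' procedure: it assumes a continuous outcome vector `y` rather than survival data, assumes the columns of `X` have already been pre-filtered for differential expression, and the function name is hypothetical.

```python
import numpy as np

def gram_schmidt_select(X, y, k, tol=1e-12):
    """Pick up to k columns of X (samples x genes) by Gram-Schmidt forward selection."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                 # center each candidate gene
    r = np.asarray(y, dtype=float)
    r = r - r.mean()                       # residual outcome still to be explained
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(X, axis=0)
        safe = np.where(norms > tol, norms, np.inf)   # exhausted columns score 0
        score = (X.T @ r / safe) ** 2      # squared projection of r on each unit column
        score[selected] = -np.inf          # never pick the same gene twice
        j = int(np.argmax(score))
        if norms[j] <= tol:                # nothing informative is left
            break
        selected.append(j)
        q = X[:, j] / norms[j]
        r = r - (q @ r) * q                # remove the explained part of the outcome
        X = X - np.outer(q, q @ X)         # orthogonalize remaining genes against q
    return selected
```

A univariate filter would rank two perfectly correlated genes equally high; here the second one drops out after the first is selected, which is the sense in which the reduced feature set is more informative per gene.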


Author(s):  
Justin Lakkis ◽  
David Wang ◽  
Yuanchao Zhang ◽  
Gang Hu ◽  
Kui Wang ◽  
...  

Abstract. Recent development of single-cell RNA-seq (scRNA-seq) technologies has led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effect, which is inevitable in studies involving human tissues. Most existing methods remove batch effect in a low-dimensional embedding space. Although this is useful for clustering, batch effect is still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effect. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Popular methods such as Seurat3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effect in the gene expression space, but MNN can only analyze two batches at a time, and it becomes computationally infeasible when the number of batches is large. Here we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data, while correcting batch effect both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC consistently outperforms scVI, DCA, and MNN. With CarDEC denoising, the non-highly variable genes offer as much signal for clustering as the highly variable genes, suggesting that CarDEC substantially boosted information content in scRNA-seq. We also showed that trajectory analysis using CarDEC’s denoised and batch corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effect. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies.


2019 ◽  
Author(s):  
Valentine Svensson ◽  
Lior Pachter

Single cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be of interest for other applications.
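
One way to see why a factor model is interpretable: if the decoder is linear, each latent factor has a loading for every gene, and the genes with the largest absolute loadings can be read off as a candidate gene program. The sketch below shows only that read-out step. The loading matrix `W`, its shape convention, and the function name are assumptions for illustration and are not taken from the authors' software.

```python
import numpy as np

def top_genes_per_factor(W, gene_names, n_top=3):
    """Rank genes by absolute loading for each factor.

    W is assumed to have shape (n_genes, n_factors), as a linear decoder's
    weight matrix would. Returns one list of gene names per factor.
    """
    W = np.asarray(W, dtype=float)
    order = np.argsort(-np.abs(W), axis=0)[:n_top]   # row indices, best first
    return [[gene_names[i] for i in order[:, f]] for f in range(W.shape[1])]
```

With a nonlinear decoder there is no single matrix to inspect this way, which is the trade-off between accuracy and interpretability that the abstract refers to.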


2017 ◽  
Author(s):  
Dongfang Wang ◽  
Jin Gu

Abstract. Single cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing transcriptomic heterogeneity at the single-cell level. Finding an effective low-dimensional representation and visualization of the original data is an important step in studying cell sub-populations and lineages from scRNA-seq data. scRNA-seq data are much noisier than traditional bulk RNA-Seq data: at the single-cell level, transcriptional fluctuations are much larger than in the average of a cell population, and the low amount of RNA transcripts increases the rate of technical dropout events. In this study, we propose VASC (deep Variational Autoencoder for scRNA-seq data), a deep multi-layer generative model, for the unsupervised dimension reduction and visualization of scRNA-seq data. It can explicitly model the dropout events and find nonlinear hierarchical feature representations of the original data. Tested on twenty datasets, VASC shows superior performance in most cases and broader dataset compatibility compared with four state-of-the-art dimension reduction methods. In a case study of pre-implantation embryos, VASC successfully re-establishes the cell dynamics and identifies several candidate marker genes associated with early embryo development.


2016 ◽  
Author(s):  
Caleb Weinreb ◽  
Samuel Wolock ◽  
Allon Klein

Motivation: Single-cell gene expression profiling technologies can map the cell states in a tissue or organism. As these technologies become more common, there is a need for computational tools to explore the data they produce. In particular, existing data visualization approaches are imperfect for studying continuous gene expression topologies. Results: Force-directed layouts of k-nearest-neighbor graphs can visualize continuous gene expression topologies in a manner that preserves high-dimensional relationships and allows manual exploration of different stable two-dimensional representations of the same data. We implemented an interactive web tool, called SPRING, to visualize single-cell data using force-directed graph layouts. SPRING reveals more detailed biological relationships than existing approaches when applied to branching gene expression trajectories from hematopoietic progenitor cells. Visualizations from SPRING are also more reproducible than those of stochastic visualization methods such as tSNE, a state-of-the-art tool. Availability: https://kleintools.hms.harvard.edu/tools/spring.html, https://github.com/AllonKleinLab/SPRING/
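
The general technique named above, a force-directed layout of a k-nearest-neighbor graph, can be sketched in a few lines. This is a toy version under stated assumptions and not SPRING's code: it uses Euclidean distance on the raw matrix, an inverse-distance repulsion between all pairs of cells, a linear spring pull along graph edges, and a fixed number of iterations. The function names and default parameters are hypothetical.

```python
import numpy as np

def knn_edges(X, k):
    """Undirected k-nearest-neighbor edges between rows of X (cells x features)."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                      # a cell is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    return {(min(i, int(j)), max(i, int(j))) for i in range(len(X)) for j in nbrs[i]}

def force_layout(n, edges, iters=200, step=0.05, seed=0):
    """2-D positions from all-pairs repulsion plus spring attraction along edges."""
    rng = np.random.default_rng(seed)
    pos = rng.standard_normal((n, 2))                # random starting positions
    e = np.array(sorted(edges))
    for _ in range(iters):
        diff = pos[:, None, :] - pos[None, :, :]
        dist2 = (diff ** 2).sum(axis=2) + 1e-9
        np.fill_diagonal(dist2, np.inf)              # no self-repulsion
        force = (diff / dist2[:, :, None]).sum(axis=1)   # magnitude ~ 1 / distance
        pull = pos[e[:, 1]] - pos[e[:, 0]]           # springs pull edge endpoints together
        np.add.at(force, e[:, 0], pull)
        np.add.at(force, e[:, 1], -pull)
        pos += step * np.clip(force, -10.0, 10.0)    # clipped step for stability
    return pos
```

Because only graph neighbors attract each other, chains of similar cells tend to stay connected in the plane, which is why this family of layouts suits continuous trajectories. The all-pairs distance computation here scales quadratically with the number of cells, so a real tool would need an approximate neighbor search.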

