SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble

2019 ◽  
Author(s):  
Ruth Huh ◽  
Yuchen Yang ◽  
Yuchao Jiang ◽  
Yin Shen ◽  
Yun Li

ABSTRACT Clustering is an essential step in the analysis of single cell RNA-seq (scRNA-seq) data to shed light on tissue complexity, including the number of cell types and the transcriptomic signatures of each cell type. Due to its importance, novel methods have been developed recently for this purpose. However, different approaches generate varying estimates regarding the number of clusters and the single-cell level cluster assignments. This type of unsupervised clustering is challenging, and it is often hard to gauge which method to use because none of the existing methods outperforms the others across all scenarios. We present SAME-clustering, a mixture model-based approach that takes clustering solutions from multiple methods and selects a maximally diverse subset to produce an improved ensemble solution. We tested SAME-clustering across 15 scRNA-seq datasets generated by different platforms, with the number of clusters varying from 3 to 15 and the number of single cells from 49 to 32,695. Results show that our SAME-clustering ensemble method yields enhanced clustering, in terms of both cluster assignments and number of clusters. The mixture model ensemble clustering is not limited to clustering scRNA-seq data and may be useful to a wide range of clustering applications.

2019 ◽  
Vol 48 (1) ◽  
pp. 86-95 ◽  
Author(s):  
Ruth Huh ◽  
Yuchen Yang ◽  
Yuchao Jiang ◽  
Yin Shen ◽  
Yun Li

Abstract Clustering is an essential step in the analysis of single cell RNA-seq (scRNA-seq) data to shed light on tissue complexity, including the number of cell types and the transcriptomic signatures of each cell type. Due to its importance, novel methods have been developed recently for this purpose. However, different approaches generate varying estimates regarding the number of clusters and the single-cell level cluster assignments. This type of unsupervised clustering is challenging, and it is often hard to gauge which method to use because none of the existing methods outperforms the others across all scenarios. We present SAME-clustering, a mixture model-based approach that takes clustering solutions from multiple methods and selects a maximally diverse subset to produce an improved ensemble solution. We tested SAME-clustering across 15 scRNA-seq datasets generated by different platforms, with the number of clusters varying from 3 to 15 and the number of single cells from 49 to 32,695. Results show that our SAME-clustering ensemble method yields enhanced clustering, in terms of both cluster assignments and number of clusters. The mixture model ensemble clustering is not limited to clustering scRNA-seq data and may be useful to a wide range of clustering applications.
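The general idea of combining several clustering solutions into one ensemble solution can be illustrated with a small sketch. Note that this is not the mixture model used by SAME-clustering: as a stand-in it uses a simpler co-association vote, in which two cells are linked when more than half of the input clusterings place them together. The function name `consensus_clusters`, the `threshold` parameter, and the toy labelings are all assumptions made for illustration.

```python
from itertools import combinations


def consensus_clusters(base_labelings, threshold=0.5):
    """Merge several clusterings of the same cells by co-association voting.

    base_labelings: list of label lists, one per base method, all of equal
    length. Cluster IDs need not agree across methods. Illustrative only;
    this is NOT the SAME-clustering mixture model.
    """
    n_cells = len(base_labelings[0])
    if any(len(labels) != n_cells for labels in base_labelings):
        raise ValueError("all labelings must cover the same cells")

    # Union-find structure over cells.
    parent = list(range(n_cells))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link two cells when a majority of methods co-cluster them.
    for i, j in combinations(range(n_cells), 2):
        votes = sum(1 for labels in base_labelings if labels[i] == labels[j])
        if votes / len(base_labelings) > threshold:
            parent[find(i)] = find(j)

    # Relabel connected components as 0, 1, 2, ... in order of appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n_cells)]


# Three hypothetical base clusterings of six cells.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, different label names
    [0, 0, 1, 1, 2, 2],  # a method that over-splits
]
print(consensus_clusters(base))
```

The pairwise loop is quadratic in the number of cells, so this sketch is only suitable for small examples; it is meant to show how disagreeing label vectors can be reconciled, not how to process tens of thousands of cells.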


2020 ◽  
Author(s):  
Siamak Yousefi ◽  
Hao Chen ◽  
Jesse F. Ingels ◽  
Melinda S. McCarty ◽  
Arthur G. Centeno ◽  
...  

SUMMARY Single cell RNA sequencing has enabled quantification of single cells and identification of different cell types and subtypes, as well as cell functions, in different tissues. Single cell RNA sequence analyses assume that the acquired RNAs correspond to cells; however, RNAs from contamination within the input data are also captured by these assays. The sequencing of background contamination, as well as unwanted cells making their way to the final assay, can confound the correct biological interpretation of single cell transcriptomic data. Here we demonstrate two approaches to deal with background contamination as well as profiling of unwanted cells in the assays. We use three real-life datasets of whole-cell capture and nucleotide single-cell captures generated by Fluidigm and 10x technologies and show that these methods reduce the effect of contamination, strengthen clustering of cells, and improve biological interpretation.


2020 ◽  
Author(s):  
Jeremy Lombardo ◽  
Marzieh Aliaghaei ◽  
Quy Nguyen ◽  
Kai Kessenbrock ◽  
Jered Haun

Abstract Tissues are composed of highly heterogeneous mixtures of cell subtypes, and this diversity is increasingly being characterized using high-throughput single cell analysis methods. However, these efforts are hindered by the fact that tissues must first be dissociated into single cell suspensions that are viable and still accurately represent phenotypes from the original tissue. Current methods for breaking down tissues are inefficient, labor-intensive, subject to high variability, and potentially biased towards cell subtypes that are easier to release. Here, we present a microfluidic platform consisting of three different tissue processing technologies that can perform the complete tissue to single cell workflow, including digestion, disaggregation, and filtration. First, we developed a new microfluidic digestion device that can be loaded with minced tissue specimens quickly and easily, and then use the combination of proteolytic enzyme activity and fluid shear forces to accelerate tissue breakdown. Next, we integrated dissociation and filter technologies into a single device, which enhanced single cell numbers and fully prepared the sample for single cell analysis. The final multi-device platform was then evaluated using a diverse array of tissue types that exhibited a wide range of properties. For murine kidney and mammary tumor, we found that microfluidic processing produced 2.5-fold more single, viable cells. Single cell RNA sequencing (scRNA-seq) further revealed that device processing enriched for endothelial cells, fibroblasts, and basal epithelium, and did not increase stress responses. For murine liver and heart, which are softer tissues containing fragile cell types, processing time could be reduced to 15 min, and even as short as 1 min. We also demonstrated that periodic recovery at defined time intervals produced substantially more hepatocytes and cardiomyocytes than continuous operation, most likely by preventing damage to fragile cell types. 
In future work, we will seek to integrate additional operations such as upstream tissue preparation and downstream microfluidic cell sorting and detection to create powerful point-of-care single cell diagnostic platforms.


2020 ◽  
Author(s):  
Alina Isakova ◽  
Norma Neff ◽  
Stephen R. Quake

ABSTRACT The ability to interrogate total RNA content of single cells would enable better mapping of the transcriptional logic behind emerging cell types and states. However, current RNA-seq methods are unable to simultaneously monitor both short and long, poly(A)+ and poly(A)- transcripts at the single-cell level, and thus deliver only a partial snapshot of the cellular RNAome. Here, we describe Smart-seq-total, a method capable of assaying a broad spectrum of coding and non-coding RNA from a single cell. Built upon the template-switch mechanism, Smart-seq-total bears the key feature of its predecessor, Smart-seq2, namely the ability to capture full-length transcripts with high yield and quality. It also outperforms current poly(A)-independent total RNA-seq protocols by capturing transcripts of a broad size range, thus allowing us to simultaneously analyze protein-coding, long non-coding, microRNA and other non-coding RNA transcripts from single cells. We used Smart-seq-total to analyze the total RNAome of human primary fibroblasts, HEK293T and MCF7 cells, as well as that of induced murine embryonic stem cells differentiated into embryoid bodies. We show that simultaneous measurement of non-coding RNA and mRNA from the same cell enables elucidation of new roles of non-coding RNA throughout essential processes such as cell cycle or lineage commitment. Moreover, we show that cell types can be distinguished based on the abundance of non-coding transcripts alone.


2021 ◽  
Vol 118 (51) ◽  
pp. e2113568118
Author(s):  
Alina Isakova ◽  
Norma Neff ◽  
Stephen R. Quake

The ability to interrogate total RNA content of single cells would enable better mapping of the transcriptional logic behind emerging cell types and states. However, current single-cell RNA-sequencing (RNA-seq) methods are unable to simultaneously monitor all forms of RNA transcripts at the single-cell level, and thus deliver only a partial snapshot of the cellular RNAome. Here we describe Smart-seq-total, a method capable of assaying a broad spectrum of coding and noncoding RNA from a single cell. Smart-seq-total does not require splitting the RNA content of a cell and allows the incorporation of unique molecular identifiers into short and long RNA molecules for absolute quantification. It outperforms current poly(A)-independent total RNA-seq protocols by capturing transcripts of a broad size range, thus enabling simultaneous analysis of protein-coding, long-noncoding, microRNA, and other noncoding RNA transcripts from single cells. We used Smart-seq-total to analyze the total RNAome of human primary fibroblasts, HEK293T, and MCF7 cells, as well as that of induced murine embryonic stem cells differentiated into embryoid bodies. By analyzing the coexpression patterns of both noncoding RNA and mRNA from the same cell, we were able to discover new roles of noncoding RNA throughout essential processes, such as cell cycle and lineage commitment during embryonic development. Moreover, we show that independent classes of short-noncoding RNA can be used to determine cell-type identity.


Author(s):  
Yan Zhang ◽  
Yaru Zhang ◽  
Jun Hu ◽  
Ji Zhang ◽  
Fangjie Guo ◽  
...  

ABSTRACT The most fundamental challenge in current single-cell RNA-seq data analysis is functional interpretation and annotation of cell clusters. The biological pathways in distinct cell types have different activation patterns, which facilitates understanding cell functions in single-cell transcriptomics. However, no effective web tool has been implemented for single-cell transcriptomic data analysis based on prior biological pathway knowledge. Here, we introduce scTPA (http://sctpa.bio-data.cn/sctpa), a web-based platform providing pathway-based analysis of single-cell RNA-seq data in human and mouse. scTPA incorporates four widely used gene set enrichment methods to estimate the pathway activation scores of single cells based on a collection of available biological pathways with different functional and taxonomic classifications. Clustering analysis and cell-type-specific activation pathway identification are provided for the functional interpretation of cell types from a pathway-oriented perspective. An intuitive interface allows users to conveniently visualize and download single-cell pathway signatures. Together, scTPA is a comprehensive tool to identify pathway activation signatures for dissecting single cell heterogeneity.


2017 ◽  
Author(s):  
Junyue Cao ◽  
Jonathan S. Packer ◽  
Vijay Ramani ◽  
Darren A. Cusanovich ◽  
Chau Huynh ◽  
...  

Abstract Conventional methods for profiling the molecular content of biological samples fail to resolve heterogeneity that is present at the level of single cells. In the past few years, single cell RNA sequencing has emerged as a powerful strategy for overcoming this challenge. However, its adoption has been limited by a paucity of methods that are at once simple to implement and cost effective to scale massively. Here, we describe a combinatorial indexing strategy to profile the transcriptomes of large numbers of single cells or single nuclei without requiring the physical isolation of each cell (Single cell Combinatorial Indexing RNA-seq or sci-RNA-seq). We show that sci-RNA-seq can be used to efficiently profile the transcriptomes of tens-of-thousands of single cells per experiment, and demonstrate that we can stratify cell types from these data. Key advantages of sci-RNA-seq over contemporary alternatives such as droplet-based single cell RNA-seq include sublinear cost scaling, a reliance on widely available reagents and equipment, the ability to concurrently process many samples within a single workflow, compatibility with methanol fixation of cells, cell capture based on DNA content rather than cell size, and the flexibility to profile either cells or nuclei. As a demonstration of sci-RNA-seq, we profile the transcriptomes of 42,035 single cells from C. elegans at the L2 stage, effectively 50-fold “shotgun cellular coverage” of the somatic cell composition of this organism at this stage. We identify 27 distinct cell types, including rare cell types such as the two distal tip cells of the developing gonad, estimate consensus expression profiles and define cell-type specific and selective genes. Given that C. elegans is the only organism with a fully mapped cellular lineage, these data represent a rich resource for future methods aimed at defining cell types and states. 
They will advance our understanding of developmental biology, and constitute a major step towards a comprehensive, single-cell molecular atlas of a whole animal.


Author(s):  
Ling-Ling Zheng ◽  
Jing-Hua Xiong ◽  
Wu-Jian Zheng ◽  
Jun-Hao Wang ◽  
Zi-Liang Huang ◽  
...  

Abstract Although long noncoding RNAs (lncRNAs) have significant tissue specificity, their expression and variability in single cells remain unclear. Here, we developed ColorCells (http://rna.sysu.edu.cn/colorcells/), a resource for comparative analysis of lncRNA expression, classification and functions in single-cell RNA-Seq data. ColorCells was applied to 167,913 publicly available scRNA-Seq datasets from six species, and identified a batch of cell-specific lncRNAs. These lncRNAs show surprising levels of expression variability between different cell clusters, and have a cell classification ability comparable to that of known marker genes. Cell-specific lncRNAs have been identified and further validated by in vitro experiments. We found that lncRNAs are typically co-expressed with the mRNAs in the same cell cluster, which can be used to uncover lncRNAs’ functions. Our study emphasizes the need to uncover lncRNAs in all cell types and shows the power of lncRNAs as novel marker genes at single cell resolution.


2020 ◽  
Author(s):  
Edwin Vans ◽  
Ashwini Patil ◽  
Alok Sharma

ABSTRACT Advances in next-generation sequencing (NGS) have made it possible to carry out transcriptomic studies at single-cell resolution and generate vast amounts of single-cell RNA-seq data rapidly. Thus, tools to analyze this data need to evolve as well to improve accuracy and efficiency. We present FEATS, a Python software package that performs clustering on single-cell RNA-seq data. FEATS is capable of performing multiple tasks such as estimating the number of clusters, conducting outlier detection, and integrating data from various experiments. We develop a univariate feature selection based approach for clustering, which involves the selection of top informative features to improve clustering performance. This is motivated by the fact that cell types are often manually determined using the expression of only a few known marker genes. On a variety of single-cell RNA-seq datasets, FEATS gives superior performance compared to the current tools, in terms of adjusted Rand index (ARI) and estimating the number of clusters. In addition to cluster estimation, FEATS also performs outlier detection and data integration while giving an excellent computational performance. Thus, FEATS is a comprehensive clustering tool capable of addressing the challenges during the clustering of single-cell RNA-seq data. The installation instructions and documentation of FEATS are available at https://edwinv87.github.io/feats/.
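For readers unfamiliar with the adjusted Rand index mentioned above, the following is a minimal sketch of how it can be computed from two label vectors using the standard contingency-table formula. It is not taken from FEATS; the function name `adjusted_rand_index` and the toy labels are assumptions for illustration. An ARI of 1 means the two partitions are identical up to relabeling, while values near 0 indicate agreement no better than chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical

    # Contingency counts: items per (true, predicted) cell, row, and column.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    # Number of item pairs that fall together in each grouping.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2         # upper bound on agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Same partition with swapped label names scores as perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# A partition that cuts across the reference scores below zero.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the index depends only on which items are grouped together, it can compare a predicted clustering against reference cell-type annotations even when the cluster IDs are arbitrary.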


2019 ◽  
Author(s):  
Ralph Patrick ◽  
David T. Humphreys ◽  
Vaibhao Janbandhu ◽  
Alicia Oshlack ◽  
Joshua W.K. Ho ◽  
...  

Abstract High-throughput single-cell RNA-seq (scRNA-seq) is a powerful tool for studying gene expression in single cells. Most current scRNA-seq bioinformatics tools focus on analysing overall expression levels, largely ignoring alternative mRNA isoform expression. We present a computational pipeline, Sierra, that readily detects differential transcript usage from data generated by commonly used polyA-captured scRNA-seq technology. We validate Sierra by comparing cardiac scRNA-seq cell-types to bulk RNA-seq of matched populations, finding significant overlap in differential transcripts. Sierra detects differential transcript usage across human peripheral blood mononuclear cells and the Tabula Muris, and 3’UTR shortening in cardiac fibroblasts. Sierra is available at https://github.com/VCCRI/Sierra.

