Parallelized Inference for Single Cell Transcriptomic Clustering with Split Merge Sampling on DPMM Model

2018 ◽  
Author(s):  
Tiehang Duan ◽  
José P. Pinto ◽  
Xiaohui Xie

Motivation: With the development of droplet-based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to understanding many biological processes. While state-of-the-art clustering methods have been applied to the data, they face challenges in the following aspects: (1) the clustering quality still needs to be improved; (2) most models need prior knowledge of the number of clusters, which is not always available; (3) there is a demand for faster computational speed. Results: We propose to tackle these challenges with Parallelized Split Merge Sampling on Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods that perform sampling on each single data point, the split merge mechanism samples on the cluster level, which significantly improves convergence and optimality of the result. The model is highly parallelized and can utilize the computing power of high performance computing (HPC) clusters, enabling massive inference on huge datasets. Experimental results show the model achieves about 7% improvement in clustering accuracy for small datasets and more than 20% improvement for large challenging datasets compared with current widely used models. Meanwhile, the model’s computing speed is significantly faster. Availability: Source code is publicly available on https://github.com/tiehangd/Para_DPMM/tree/master/Para_DPMM_package

2018 ◽  
Vol 35 (6) ◽  
pp. 953-961 ◽  
Author(s):  
Tiehang Duan ◽  
José P Pinto ◽  
Xiaohui Xie

Motivation: With the development of droplet-based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to understanding many biological processes. While state-of-the-art clustering methods have been applied to the data, they face challenges in the following aspects: (i) the clustering quality still needs to be improved; (ii) most models need prior knowledge of the number of clusters, which is not always available; (iii) there is a demand for faster computational speed. Results: We propose to tackle these challenges with Parallelized Split Merge Sampling on Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods that perform sampling on each single data point, the split merge mechanism samples on the cluster level, which significantly improves convergence and optimality of the result. The model is highly parallelized and can utilize the computing power of high performance computing (HPC) clusters, enabling massive inference on huge datasets. Experimental results show the model outperforms current widely used models in both clustering quality and computational speed. Availability and implementation: Source code is publicly available on https://github.com/tiehangd/Para_DPMM/tree/master/Para_DPMM_package. Supplementary information: Supplementary data are available at Bioinformatics online.
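The cluster-level sampling described in the abstract can be illustrated with a minimal sketch of the proposal step of a split-merge move. This is not the Para-DPMM implementation: the function name `propose_split_merge` is hypothetical, the split assigns cells to the two sides uniformly at random (an assumption chosen for brevity), and the accept/reject test that a full sampler would typically apply to each proposal, based on the data likelihood and the Dirichlet process prior, is omitted.

```python
import random
from typing import List, Optional


def propose_split_merge(labels: List[int],
                        rng: Optional[random.Random] = None) -> List[int]:
    """Return a proposed partition after one split or merge move.

    Hypothetical helper for illustration only; the input is not modified.
    """
    rng = rng or random.Random()
    # Pick two distinct anchor cells uniformly at random.
    i, j = rng.sample(range(len(labels)), 2)
    proposal = list(labels)
    if labels[i] == labels[j]:
        # Split: cell j seeds a new cluster, cell i stays put, and every
        # other member of the shared cluster picks a side at random.
        new_label = max(labels) + 1
        for k, lab in enumerate(labels):
            if lab != labels[i] or k == i:
                continue
            if k == j or rng.random() < 0.5:
                proposal[k] = new_label
    else:
        # Merge: every cell in j's cluster joins i's cluster.
        for k, lab in enumerate(labels):
            if lab == labels[j]:
                proposal[k] = labels[i]
    return proposal


# Example: six cells in three clusters; one move changes the cluster count by one.
labels = [0, 0, 0, 1, 1, 2]
proposal = propose_split_merge(labels, random.Random(0))
print(labels, "->", proposal)
```

A single move of this kind reassigns a whole group of cells at once, whereas a per-point sampler changes one cell's label per step, which is the contrast the abstract draws.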


2018 ◽  
Author(s):  
Zhe Sun ◽  
Li Chen ◽  
Hongyi Xin ◽  
Qianhui Huang ◽  
Anthony R Cillo ◽  
...  

The recently developed droplet-based single cell transcriptome sequencing (scRNA-seq) technology makes it feasible to perform a population-scale scRNA-seq study, in which the transcriptome is measured for tens of thousands of single cells from multiple individuals. Despite the advances of many clustering methods, there are few tailored methods for population-scale scRNA-seq studies. Here, we have developed a BAyesian Mixture Model for Single Cell sequencing (BAMM-SC) method to cluster scRNA-seq data from multiple individuals simultaneously. Specifically, BAMM-SC takes raw data as input and can account for data heterogeneity and batch effect among multiple individuals in a unified Bayesian hierarchical model framework. Results from extensive simulations and application of BAMM-SC to in-house scRNA-seq datasets using blood, lung and skin cells from humans or mice demonstrated that BAMM-SC outperformed existing clustering methods with improved clustering accuracy and reduced impact from batch effects. BAMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/~Cwec47/singlecell.html.


2021 ◽  
Vol 23 (1) ◽  
Author(s):  
Bhupinder Pal ◽  
Yunshun Chen ◽  
Michael J. G. Milevskiy ◽  
François Vaillant ◽  
Lexie Prokopuk ◽  
...  

Background: Heterogeneity within the mouse mammary epithelium and potential lineage relationships have been recently explored by single-cell RNA profiling. To further understand how cellular diversity changes during mammary ontogeny, we profiled single cells from nine different developmental stages spanning late embryogenesis, early postnatal, prepuberty, adult, mid-pregnancy, late-pregnancy, and post-involution, as well as the transcriptomes of micro-dissected terminal end buds (TEBs) and subtending ducts during puberty. Methods: The single cell transcriptomes of 132,599 mammary epithelial cells from 9 different developmental stages were determined on the 10x Genomics Chromium platform, and integrative analyses were performed to compare specific time points. Results: The mammary rudiment at E18.5 closely aligned with the basal lineage, while prepubertal epithelial cells exhibited lineage segregation but to a less differentiated state than their adult counterparts. Comparison of micro-dissected TEBs versus ducts showed that luminal cells within TEBs harbored intermediate expression profiles. Ductal basal cells exhibited increased chromatin accessibility of luminal genes compared to their TEB counterparts, suggesting that lineage-specific chromatin is established within the subtending ducts during puberty. An integrative analysis of five stages spanning the pregnancy cycle revealed distinct stage-specific profiles and the presence of cycling basal, mixed-lineage, and 'late' alveolar intermediates in pregnancy. Moreover, a number of intermediates were uncovered along the basal-luminal progenitor cell axis, suggesting a continuum of alveolar-restricted progenitor states. Conclusions: This extended single cell transcriptome atlas of mouse mammary epithelial cells provides the most complete coverage for mammary epithelial cells during morphogenesis to date. Together with chromatin accessibility analysis of TEB structures, it represents a valuable framework for understanding developmental decisions within the mouse mammary gland.


Cell ◽  
2015 ◽  
Vol 161 (5) ◽  
pp. 1175-1186 ◽  
Author(s):  
Yuping Luo ◽  
Volkan Coskun ◽  
Aibing Liang ◽  
Juehua Yu ◽  
Liming Cheng ◽  
...  
