scholarly journals Predicting cell types in single cell mass cytometry data

2018 ◽  
Author(s):  
Tamim Abdelaal ◽  
Vincent van Unen ◽  
Thomas Höllt ◽  
Frits Koning ◽  
Marcel J.T. Reinders ◽  
...  

AbstractMotivationMass cytometry (CyTOF) is a valuable technology for high-dimensional analysis at the single cell level. Identification of different cell populations is an important task during the data analysis. Many clustering tools can perform this task, however, they are time consuming, often involve a manual step, and lack reproducibility when new data is included in the analysis. Learning cell types from an annotated set of cells solves these problems. However, currently available mass cytometry classifiers are either complex, dependent on prior knowledge of the cell type markers during the learning process, or can only identify canonical cell types.ResultsWe propose to use a Linear Discriminant Analysis (LDA) classifier to automatically identify cell populations in CyTOF data. LDA shows comparable results with two state-of-the-art algorithms on four benchmark datasets and also outperforms a non-linear classifier such as the k-nearest neighbour classifier. To illustrate its scalability to large datasets with deeply annotated cell subtypes, we apply LDA to a dataset of ~3.5 million cells representing 57 cell types. LDA has high performance on abundant cell types as well as the majority of rare cell types, and provides accurate estimates of cell type frequencies. Further incorporating a rejection option, based on the estimated posterior probabilities, allows LDA to identify cell types that were not encountered during training. Altogether, reproducible prediction of cell type compositions using LDA opens up possibilities to analyse large cohort studies based on mass cytometry data.AvailabilityImplementation is available on GitHub (https://github.com/tabdelaal/CyTOF-Linear-Classifier)[email protected]

2021 ◽  
Author(s):  
Lijun Cheng ◽  
Pratik Karkhanis ◽  
Birkan Gokbag ◽  
Lang Li

Background :  Single-cell mass cytometry, also known as cytometry by time of flight (CyTOF) is a powerful high-throughput technology that allows analysis of up to 50 protein markers per cell for the quantification and classification of single cells. Traditional manual gating utilized to identify new cell populations has been inadequate, inefficient, unreliable, and difficult to use, and no algorithms to identify both calibration and new cell populations has been well established. Methods :   A deep learning with graphic cluster (DGCyTOF) visualization is developed as a new integrated embedding visualization approach in identifying canonical and new cell types. The DGCyTOF combines deep-learning classification and hierarchical stable-clustering methods to sequentially build a tri-layer construct for known cell types and the identification of new cell types. First, deep classification learning is constructed to distinguish calibration cell populations from all cells by softmax classification assignment under a probability threshold, and graph embedding clustering is then used to identify new cell populations sequentially. In the middle of two-layer, cell labels are automatically adjusted between new and unknown cell populations via a feedback loop using an iteration calibration system to reduce the rate of error in the identification of cell types, and a 3-dimensional (3D) visualization platform is finally developed to display the cell clusters with all cell-population types annotated. Results : Utilizing two benchmark CyTOF databases comprising up to 43 million cells, we compared accuracy and speed in the identification of cell types among DGCyTOF, DeepCyTOF, and other technologies including dimension reduction with clustering, including Principal Component Analysis ( PCA ) , Factor Analysis ( FA ), Independent Component Analysis ( ICA ), Isometric Feature Mapping ( Isomap ), t-distributed Stochastic Neighbor Embedding ( t-SNE ), and Uniform Manifold Approximation and Projection ( UMAP ) with k -means clustering and Gaussian mixture clustering. We observed the DGCyTOF represents a robust complete learning system with high accuracy, speed and visualization by eight measurement criteria. The DGCyTOF displayed F-scores of 0.9921 for CyTOF1 and 0.9992 for CyTOF2 datasets, whereas those scores were only 0.507 and 0.529 for the t-SNE + k-means ; 0.565 and 0.59, for UMAP + k-means . Comparison of DGCyTOF with t-SNE and UMAP visualization in accuracy demonstrated its approximately 35% superiority in predicting cell types. In addition, observation of cell-population distribution was more intuitive in the 3D visualization in DGCyTOF than t-SNE and UMAP visualization. Conclusions :  The DGCyTOF model can automatically assign known labels to single cells with high accuracy using deep-learning classification assembling with traditional graph-clustering and dimension-reduction strategies. Guided by a calibration system, the model seeks optimal accuracy balance among calibration cell populations and unknown cell types, yielding a complete and robust learning system that is highly accurate in the identification of cell populations compared to results using other methods in the analysis of single-cell CyTOF data. Application of the DGCyTOF method to identify cell populations could be extended to the analysis of single-cell RNASeq data and other omics data.


2021 ◽  
Author(s):  
Jinyue Liao ◽  
Hoi Ching Suen ◽  
Shitao Rao ◽  
Alfred Chun Shui Luk ◽  
Ruoyu Zhang ◽  
...  

AbstractSpermatogenesis depends on an orchestrated series of developing events in germ cells and full maturation of the somatic microenvironment. To date, the majority of efforts to study cellular heterogeneity in testis has been focused on single-cell gene expression rather than the chromatin landscape shaping gene expression. To advance our understanding of the regulatory programs underlying testicular cell types, we analyzed single-cell chromatin accessibility profiles in more than 25,000 cells from mouse developing testis. We showed that scATAC-Seq allowed us to deconvolve distinct cell populations and identify cis-regulatory elements (CREs) underlying cell type specification. We identified sets of transcription factors associated with cell type-specific accessibility, revealing novel regulators of cell fate specification and maintenance. Pseudotime reconstruction revealed detailed regulatory dynamics coordinating the sequential developmental progressions of germ cells and somatic cells. This high-resolution data also revealed putative stem cells within the Sertoli and Leydig cell populations. Further, we defined candidate target cell types and genes of several GWAS signals, including those associated with testosterone levels and coronary artery disease. Collectively, our data provide a blueprint of the ‘regulon’ of the mouse male germline and supporting somatic cells.


2019 ◽  
Author(s):  
Meenakshi Venkatasubramanian ◽  
Kashish Chetal ◽  
Gowtham Atluri ◽  
Nathan Salomonis

ABSTRACTThe rapid proliferation of single-cell RNA-Sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity have increased, most existing algorithms require significant user-tuning, are heavily reliant on dimensionality reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface. Here, we describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse-NMF, cluster “fitness”, SVM) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects. Using data from the Human Cell Atlas, we show that the PageRank algorithm effectively down samples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar distinct cell-types and while recovering novel transcriptionally unique cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets.HighlightsICGS2 outperforms alternative approaches in small and ultra-large benchmark datasetsIntegrates multiple solutions for cell-type detection with supervised refinementScales effectively to resolve rare cell-states from ultra-large datasets using PageRank sampling with a low memory footprintIntegrated into AltAnalyze to enable sophisticated and automated downstream analysis


2021 ◽  
Vol 12 ◽  
Author(s):  
Juber Herrera-Uribe ◽  
Jayne E. Wiarda ◽  
Sathesh K. Sivasankaran ◽  
Lance Daharsh ◽  
Haibo Liu ◽  
...  

Pigs are a valuable human biomedical model and an important protein source supporting global food security. The transcriptomes of peripheral blood immune cells in pigs were defined at the bulk cell-type and single cell levels. First, eight cell types were isolated in bulk from peripheral blood mononuclear cells (PBMCs) by cell sorting, representing Myeloid, NK cells and specific populations of T and B-cells. Transcriptomes for each bulk population of cells were generated by RNA-seq with 10,974 expressed genes detected. Pairwise comparisons between cell types revealed specific expression, while enrichment analysis identified 1,885 to 3,591 significantly enriched genes across all 8 cell types. Gene Ontology analysis for the top 25% of significantly enriched genes (SEG) showed high enrichment of biological processes related to the nature of each cell type. Comparison of gene expression indicated highly significant correlations between pig cells and corresponding human PBMC bulk RNA-seq data available in Haemopedia. Second, higher resolution of distinct cell populations was obtained by single-cell RNA-sequencing (scRNA-seq) of PBMC. Seven PBMC samples were partitioned and sequenced that produced 28,810 single cell transcriptomes distributed across 36 clusters and classified into 13 general cell types including plasmacytoid dendritic cells (DC), conventional DCs, monocytes, B-cell, conventional CD4 and CD8 αβ T-cells, NK cells, and γδ T-cells. Signature gene sets from the human Haemopedia data were assessed for relative enrichment in genes expressed in pig cells and integration of pig scRNA-seq with a public human scRNA-seq dataset provided further validation for similarity between human and pig data. The sorted porcine bulk RNAseq dataset informed classification of scRNA-seq PBMC populations; specifically, an integration of the datasets showed that the pig bulk RNAseq data helped define the CD4CD8 double-positive T-cell populations in the scRNA-seq data. Overall, the data provides deep and well-validated transcriptomic data from sorted PBMC populations and the first single-cell transcriptomic data for porcine PBMCs. This resource will be invaluable for annotation of pig genes controlling immunogenetic traits as part of the porcine Functional Annotation of Animal Genomes (FAANG) project, as well as further study of, and development of new reagents for, porcine immunology.


2018 ◽  
Author(s):  
Katerina Boufea ◽  
Sohan Seth ◽  
Nizar N. Batada

AbstractThe power of single cell RNA sequencing (scRNA-seq) stems from its ability to uncover cell type-dependent phenotypes, which rests on the accuracy of cell type identification. However, resolving cell types within and, thus, comparison of scRNA-seq data across conditions is challenging due to technical factors such as sparsity, low number of cells and batch effect. To address these challenges we developed scID (Single Cell IDentification), which uses the framework of Fisher’s Linear Discriminant Analysis to identify transcriptionally related cell types between scRNA-seq datasets. We demonstrate the accuracy and performance of scID relative to existing methods on several published datasets. By increasing power to identify transcriptionally similar cell types across datasets, scID enhances investigator’s ability to extract biological insights from scRNA-seq data.


2019 ◽  
Author(s):  
Christian Feregrino ◽  
Fabio Sacher ◽  
Oren Parnas ◽  
Patrick Tschopp

AbstractBackgroundThrough precise implementation of distinct cell type specification programs, differentially regulated in both space and time, complex patterns emerge during organogenesis. Thanks to its easy experimental accessibility, the developing chicken limb has long served as a paradigm to study vertebrate pattern formation. Through decades’ worth of research, we now have a firm grasp on the molecular mechanisms driving limb formation at the tissue-level. However, to elucidate the dynamic interplay between transcriptional cell type specification programs and pattern formation at its relevant cellular scale, we lack appropriately resolved molecular data at the genome-wide level. Here, making use of droplet-based single-cell RNA-sequencing, we catalogue the developmental emergence of distinct tissue types and their transcriptome dynamics in the distal chicken limb, the so-called autopod, at cellular resolution.ResultsUsing single-cell RNA-sequencing technology, we sequenced a total of 17,628 cells coming from three key developmental stages of chicken autopod patterning. Overall, we identified 23 cell populations with distinct transcriptional profiles. Amongst them were small, albeit essential populations like the apical ectodermal ridge, demonstrating the ability to detect even rare cell types. Moreover, we uncovered the existence of molecularly distinct sub-populations within previously defined compartments of the developing limb, some of which have important signaling functions during autopod pattern formation. Finally, we inferred gene co-expression modules that coincide with distinct tissue types across developmental time, and used them to track patterning-relevant cell populations of the forming digits.ConclusionsWe provide a comprehensive functional genomics resource to study the molecular effectors of chicken limb patterning at cellular resolution. Our single-cell transcriptomic atlas captures all major cell populations of the developing autopod, and highlights the transcriptional complexity in many of its components. Finally, integrating our data-set with other single-cell transcriptomics resources will enable researchers to assess molecular similarities in orthologous cell types across the major tetrapod clades, and provide an extensive candidate gene list to functionally test cell-type-specific drivers of limb morphological diversification.


2019 ◽  
Vol 95 (7) ◽  
pp. 769-781 ◽  
Author(s):  
Tamim Abdelaal ◽  
Vincent Unen ◽  
Thomas Höllt ◽  
Frits Koning ◽  
Marcel J.T. Reinders ◽  
...  

2019 ◽  
Vol 21 (Supplement_6) ◽  
pp. vi63-vi63
Author(s):  
Nalin Leelatian ◽  
Justine Sinnaeve ◽  
Akshitkumar Mistry ◽  
Sierra Barone ◽  
Kirsten Diggins ◽  
...  

Abstract In glioblastoma, changes in signaling, gene sequence, copy number, or transcript expression can define patient subgroups, but these subgroups are not yet associated with differential outcome for most patients with high-risk, IDH wild-type disease. Single cell interrogation of phospho-protein signaling has successfully revealed novel cell types associated with patient outcomes in blood cancers, suggesting that a comparable approach could be used in brain tumors. The goal of this study was to combine a single cell phospho-protein profiling approach with novel, automated computational analysis to identify abnormal glioblastoma cells that stratify patient clinical risk. Effective tissue dissociation strategies and validated antibody panels were created for mass cytometry analyses of resected glioblastoma tissue. These panels simultaneously measured 45 determinants of neural and glioma cell identity, including transcription factors, phospho-proteins, and surface receptors. 28 glioblastoma tumors were stained and analyzed using traditional gating, existing computational tools, and a new risk assessment population identification algorithm (RAPID, https://www.biorxiv.org/content/10.1101/632208v3). RAPID revealed two malignant cell types closely associated with differential patient outcomes. Glioblastoma negative prognostic (GNP) cells were associated with poor survival and defined by phospho-protein signaling in cells with aberrant neural developmental phenotypes. Glioblastoma positive prognostic (GPP) cells were associated with better progression free survival and defined by increased immunogenic signaling. A Cox proportional-hazards regression model was created to assess the influence of GNP and GPP cells on OS and PFS as continuous variables while accounting for other well-known clinical predictors. Each 1% increase of GNP cells was associated with an 7% increase in annual mortality rate (HR=1.07 [95% CI 1.03–1.12], p=0.001). Tumors containing GNP cells also significantly lacked CD45+ immune cell infiltration (Pearson r=-0.8). The signaling events that define these clinically significant glioblastoma cells represent a useful molecular classification, may indicate responsiveness to immunotherapy, and are themselves important targets of opportunity for new therapeutic approaches.


2021 ◽  
Author(s):  
Ju Hee Lee ◽  
Kafi N. Ealey ◽  
Yash Patel ◽  
Navkiran Verma ◽  
Nikita Thakkar ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document