Single-cell entropy to quantify the cellular transcriptome from single-cell RNA-seq data

2019
Author(s): Jingxin Liu, You Song, Jinzhi Lei

We present the use of single-cell entropy (scEntropy) to measure the order of the cellular transcriptome profile from single-cell RNA-seq data. This leads to a method of unsupervised cell type classification through scEntropy followed by a Gaussian mixture model (scEGMM). scEntropy is a straightforward way to define an intrinsic transcriptional state of a cell. scEGMM is a coherent method of cell type classification that involves no parameters and no clustering, yet it performs comparably to existing machine learning-based methods in benchmarking studies and facilitates biological interpretation.

2020, Vol 15 (01), pp. 35-49
Author(s): Jingxin Liu, You Song, Jinzhi Lei

The cell is the basic functional and biological unit of life, and a complex system that contains a huge number of molecular components. How can we quantify the macroscopic state of a cell from microscopic information about these molecular components? Answering this fundamental question would deepen our understanding of the human body. The recent maturation of single-cell RNA sequencing (scRNA-seq) technologies has allowed researchers to obtain information on the transcriptomes of individual cells. Although considerable progress has been made in cell-type clustering over the past few years, there is no strong consensus on how to define a cell state from scRNA-seq data. Here, we present single-cell entropy (scEntropy) as an order parameter for cellular transcriptome profiles from scRNA-seq data. scEntropy is a straightforward parameter for defining the intrinsic transcriptional state of a cell; it provides a quantity with which to measure developmental processes and to distinguish different cell types. The proposed method, scEntropy followed by a Gaussian mixture model (scEGMM), provides a coherent approach to cell-type classification that is simple, involves no parameters or clustering, and performs comparably to existing machine learning-based methods in benchmarking studies. The results of cell-type classification based on scEGMM are robust and easy to interpret biologically.
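The abstracts do not give the scEntropy formula, so the following is only a minimal sketch under stated assumptions. It treats scEntropy as the Shannon entropy of the binned per-gene differences between a cell and a reference profile. The `reference` vector (for example, a mean cell) and `bin_width` are hypothetical choices, not the authors' definitions. The sketch then fits a one-dimensional Gaussian mixture to the per-cell values by expectation-maximization and labels each cell with its most probable component. The number of components `k` is fixed by hand here; the sketch says nothing about how scEGMM itself handles that choice.

```python
import math
from collections import Counter


def sc_entropy(cell, reference, bin_width=1.0):
    """Shannon entropy (nats) of the binned per-gene differences cell - reference.

    ASSUMPTION: a simplified stand-in for scEntropy, not the published formula;
    `reference` and `bin_width` are hypothetical inputs.
    """
    if len(cell) != len(reference):
        raise ValueError("cell and reference must have the same length")
    # Each gene's difference falls into bin floor(diff / bin_width).
    counts = Counter(math.floor((x - r) / bin_width) for x, r in zip(cell, reference))
    n = len(cell)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def _log_weighted_densities(x, weights, means, variances):
    """log(w_j * N(x | mean_j, var_j)) for every mixture component j."""
    return [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm_1d(values, k=2, n_iter=200, min_var=1e-6):
    """Fit a k-component 1D Gaussian mixture by expectation-maximization."""
    xs = sorted(values)
    n = len(xs)
    # Start means at evenly spaced order statistics, with a shared variance and equal weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities, normalized in log space to avoid underflow.
        resp = []
        for x in values:
            logs = _log_weighted_densities(x, weights, means, variances)
            top = max(logs)
            exps = [math.exp(lg - top) for lg in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate each component from its responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj,
                min_var,
            )
    return weights, means, variances


def assign(values, weights, means, variances):
    """Label each value with the index of its most probable component."""
    labels = []
    for x in values:
        logs = _log_weighted_densities(x, weights, means, variances)
        labels.append(max(range(len(logs)), key=logs.__getitem__))
    return labels
```

In use, one would compute `entropies = [sc_entropy(c, ref) for c in cells]` and then `assign(entropies, *fit_gmm_1d(entropies, k=2))`. Reducing each cell to a single scalar is what keeps the mixture fit this small.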


2019
Author(s): Matthew N. Bernstein, Zhongjie Ma, Michael Gleicher, Colin N. Dewey

Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that existing methods do not use. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of healthy, untreated human primary samples from the Sequence Read Archive, which, to the best of our knowledge, is the most diverse curated collection of primary cell data to date. This comprehensive training set enables CellO to run out of the box on diverse cell types, and CellO achieves superior or competitive performance compared with existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer, a web application for exploring CellO’s models across the ontology.

Highlights:
- We present CellO, a tool for hierarchically classifying cell types from single-cell RNA-seq data against the graph-structured Cell Ontology.
- CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive.
- CellO achieves superior or comparable performance relative to existing methods while featuring a more comprehensive pre-packaged training set.
- CellO is built with easily interpretable models, which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology.
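CellO's abstract emphasizes using the graph structure of the Cell Ontology. One generic way to keep hierarchical predictions consistent with such a graph is the "true path" rule. Under this rule, a term is reported only if it and all of its ancestors pass a threshold. The sketch below assumes per-term probabilities are already available. The toy ontology, the probabilities, and the function names are hypothetical, and this is not CellO's own inference procedure.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` in a DAG given as {term: [parent, ...]}."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def consistent_labels(probs, parents, threshold=0.5):
    """Terms whose probability passes `threshold` and whose ancestors all pass too.

    probs:   {term: predicted probability}; terms missing from it count as failing.
    parents: {term: [parent terms]} describing a (hypothetical) ontology DAG.
    """
    passing = {t for t, p in probs.items() if p >= threshold}
    # Keep a term only if its whole ancestor set also passed.
    return {t for t in passing if ancestors(t, parents) <= passing}
```

Filtering after thresholding guarantees that the reported set is closed under ancestors. As a result, a confident "T cell" call is dropped if "lymphocyte" itself fails the threshold.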


2019
Author(s): Kai Yao, Nash D. Rochman, Sean X. Sun

Convolutional neural networks (ConvNets) have been used for both classification and semantic segmentation of cellular images. Here we establish a method for cell type classification using images taken on a benchtop microscope directly from cell culture flasks, eliminating the need for a dedicated imaging platform. Significant flask-to-flask heterogeneity was discovered and overcome to support network generalization to novel data. Cell density was found to be a prominent source of heterogeneity even within the single-cell regime, indicating morphological effects due to diffusion-mediated cell-cell interaction. Expert classification was poor for single-cell images and excellent for multi-cell images, suggesting that experts rely on identifying characteristic phenotypes within subsets of each population rather than on ubiquitous identifiers. Finally, we introduce Self-Label Clustering, an unsupervised clustering method that relies on ConvNet feature extraction and can identify distinct morphological phenotypes within a cell type, some of which are observed to be cell density dependent.

Author summary: K.Y., N.D.R., and S.X.S. designed experiments and computational analysis. K.Y. performed experiments and ConvNet design and training. K.Y., N.D.R., and S.X.S. wrote the paper.


Author(s): Yinghao Cao, Xiaoyue Wang, Gongxin Peng

Currently, most methods rely on manual strategies to annotate cell types after clustering single-cell RNA sequencing (scRNA-seq) data. Such methods are labor-intensive and rely heavily on user expertise, which may lead to inconsistent results. We present SCSA, an automatic tool for annotating cell types from scRNA-seq data. SCSA is based on a score annotation model that combines differentially expressed genes (DEGs) with confidence levels of cell markers drawn from both known and user-defined information. Evaluation against other methods on real scRNA-seq datasets from different sources shows that SCSA can assign cells to the correct types in a fully automated mode with desirable precision.
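The abstract does not state SCSA's scoring formula. As a hypothetical illustration of a score that combines DEGs with marker confidence, the sketch below works on one cluster at a time. For each candidate cell type, it sums log fold change times marker confidence over that type's markers found among the cluster's DEGs, and it reports the highest-scoring type. The data structures and the scoring rule are assumptions made for illustration only.

```python
def score_cell_types(deg_logfc, markers):
    """Score candidate cell types for one cluster (HYPOTHETICAL scoring rule).

    deg_logfc: {gene: log2 fold change} for the cluster's DEGs.
    markers:   {cell_type: {gene: confidence in [0, 1]}} from known or user-defined sources.
    Score = sum of logFC * confidence over markers that are also DEGs.
    """
    return {
        cell_type: sum(deg_logfc[g] * conf for g, conf in gene_conf.items() if g in deg_logfc)
        for cell_type, gene_conf in markers.items()
    }


def annotate(deg_logfc, markers):
    """Return the best-scoring cell type and the full score table."""
    scores = score_cell_types(deg_logfc, markers)
    best = max(scores, key=scores.get)
    return best, scores
```

Weighting by confidence lets a user-defined marker list contribute alongside curated markers without counting every gene equally.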


2020, Vol 11
Author(s): Yinghao Cao, Xiaoyue Wang, Gongxin Peng
Keyword(s): RNA-seq

Author(s): Maria Brbić, Marinka Zitnik, Sheng Wang, Angela O. Pisco, Russ B. Altman, ...

Although tremendous effort has been put into cell type annotation and classification, identifying previously uncharacterized cell types in heterogeneous single-cell RNA-seq data remains a challenge. Here we present MARS, a meta-learning approach for identifying and annotating known as well as novel cell types. MARS overcomes the heterogeneity of cell types by transferring latent cell representations across multiple datasets. MARS uses deep learning to learn a cell embedding function as well as a set of landmarks in the cell embedding space. The method annotates cells by probabilistically defining a cell type based on the nearest landmarks in the embedding space. MARS has a unique ability to discover cell types that have never been seen before and to annotate experiments that have not yet been annotated. We apply MARS to a large aging cell atlas of 23 tissues covering the life span of a mouse. MARS accurately identifies cell types, even when it has never seen them before. Further, the method automatically generates interpretable names for novel cell types. Remarkably, MARS estimates meaningful cell-type-specific signatures of aging and visualizes them as trajectories reflecting temporal relationships of cells in a tissue.
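The landmark-based annotation described above can be illustrated with a soft nearest-landmark rule. This sketch assumes that the cell embedding and one landmark per cell type have already been computed; the deep embedding network is not shown. It converts negative squared Euclidean distances into probabilities with a softmax. The `temperature` parameter and the cell type names are hypothetical, and MARS's actual probabilistic formulation may differ.

```python
import math


def landmark_probabilities(z, landmarks, temperature=1.0):
    """Softmax over negative squared distances from embedding `z` to each landmark."""
    names = list(landmarks)
    scores = [
        -sum((a - b) ** 2 for a, b in zip(z, landmarks[name])) / temperature
        for name in names
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


def annotate_cell(z, landmarks, temperature=1.0):
    """Return the most probable cell type and the full probability table."""
    probs = landmark_probabilities(z, landmarks, temperature)
    return max(probs, key=probs.get), probs
```

A cell midway between two landmarks receives an even split, which is the kind of low-confidence signal a probabilistic assignment exposes and a hard nearest-neighbor rule hides.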


2021, Vol 22 (1)
Author(s): David S. Fischer, Leander Dony, Martin König, Abdul Moeed, Luke Zappia, ...

Single-cell RNA-seq datasets are often first analyzed independently, without harnessing model fits from previous studies, and are then contextualized with public datasets, which requires time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public datasets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate the contribution of datasets by using ontologies for metadata. We propose an adaptation of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
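One natural way to adapt cross-entropy to labels of different coarseness is sketched below. This approach is an assumption, not taken from sfaira's code. The model predicts a distribution over the finest (leaf) cell types, and a coarse label is scored by the total probability of the leaves beneath it. For a leaf label, this reduces to ordinary cross-entropy. The `leaves_under` mapping is a hypothetical stand-in for an ontology lookup.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_cross_entropy(logits, leaf_names, label, leaves_under):
    """-log P(label), where P(label) sums predicted probabilities of the leaves under it.

    leaf_names:   names of the finest cell types, aligned with `logits`.
    leaves_under: {coarse label: set of leaf names}; a leaf label maps to itself.
    """
    allowed = leaves_under.get(label, {label})
    if not any(name in allowed for name in leaf_names):
        raise ValueError(f"label {label!r} covers no known leaf type")
    probs = softmax(logits)
    p = sum(pr for name, pr in zip(leaf_names, probs) if name in allowed)
    return -math.log(max(p, 1e-300))  # guard against underflow to exactly 0
```

A cell annotated only as "T cell" is not penalized for how the model splits probability between CD4 and CD8 subtypes, so coarse and fine annotations can be trained on together.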


2020, Vol 2 (10), pp. 607-618
Author(s): Jian Hu, Xiangjie Li, Gang Hu, Yafei Lyu, Katalin Susztak, ...

2021, Vol 12
Author(s): Wenbo Yu, Ahmed Mahfouz, Marcel J. T. Reinders

The power of single-cell RNA sequencing (scRNA-seq) to detect cell heterogeneity or developmental processes is becoming increasingly evident. The granularity of this knowledge increases further when two batches of scRNA-seq data are combined into a single large dataset. However, this strategy is hampered by technical differences between the batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that this matching can be constrained further, because cells can also be matched on their cell-type identity. We use an autoencoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves (1) cell-cell distances within each of the two batches and (2) cell-cell distances between the two batches when the cells are of the same cell type. The cell-type guidance is unsupervised; that is, a cell type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) on pancreas and mouse cell atlas datasets against six state-of-the-art single-cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared with other approaches, CBA preserves the cluster separation in the original datasets while still aligning the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differentially expressed genes for these preserved clusters.
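The distance-preserving part of the CBA loss, as described in the abstract, can be sketched as a stress-style penalty over two kinds of cell pairs. The first kind is pairs within the same batch; the second is cross-batch pairs that share a cluster label. The sketch makes several assumptions: distances are Euclidean, and the penalty is the mean squared difference between embedded and original distances. It also assumes that cluster labels are already comparable across batches, since the abstract does not say how clusters from the two batches are matched. The autoencoder and any reconstruction term are omitted.

```python
import math
from itertools import combinations


def cluster_guided_stress(original, embedded, batch, cluster):
    """Mean squared mismatch between embedded and original Euclidean distances.

    Pairs kept: (1) both cells in the same batch; (2) cells in different batches
    that carry the same cluster label (ASSUMED comparable across batches).
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(original)), 2):
        if batch[i] == batch[j] or cluster[i] == cluster[j]:
            d_orig = math.dist(original[i], original[j])
            d_emb = math.dist(embedded[i], embedded[j])
            total += (d_emb - d_orig) ** 2
            count += 1
    return total / count if count else 0.0
```

Cross-batch pairs from different clusters contribute nothing, so the embedding is free to move them relative to each other. This freedom is where the cluster guidance constrains the matching less than a plain all-pairs stress would.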


Author(s): Jian Hu, Xiangjie Li, Gang Hu, Yafei Lyu, Katalin Susztak, ...

An important step in single-cell RNA-seq (scRNA-seq) analysis is to cluster cells into different populations or types. Here we describe ItClust, an Iterative Transfer learning algorithm with a neural network for scRNA-seq Clustering. ItClust learns cell type knowledge from well-annotated source data but also leverages information in the target data, making it less dependent on source data quality. Through extensive evaluations using datasets from different species and tissues generated with diverse scRNA-seq protocols, we show that ItClust significantly improves clustering and cell type classification accuracy compared with popular unsupervised clustering and supervised cell type classification algorithms.
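ItClust's transfer idea (learn from annotated source data, then adapt to the target data) can be illustrated with a much simpler, non-neural stand-in. The sketch computes source cell-type centroids, uses them to initialize k-means on the target cells, and lets the target data refine them. This swaps ItClust's neural network for plain k-means. It is only meant to show the source-initialize, target-refine pattern, and all names are hypothetical.

```python
import math


def centroids_from_source(source_x, source_y):
    """Mean expression vector of each annotated cell type in the source data."""
    groups = {}
    for x, y in zip(source_x, source_y):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def refine_on_target(target_x, centroids, n_iter=20):
    """k-means on target cells, initialized at the source centroids.

    Returns (labels, refined centroids). A centroid with no assigned target
    cell keeps its previous value.
    """
    cents = {k: list(v) for k, v in centroids.items()}
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(cents, key=lambda k: math.dist(x, cents[k])) for x in target_x]
        # Update step: move each centroid to the mean of its target members.
        for k in cents:
            members = [x for x, lab in zip(target_x, labels) if lab == k]
            if members:
                cents[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, cents
```

Initializing from source centroids carries the cell type names over to the target clusters. The refinement step then lets target-specific shifts, such as batch effects, move the centroids.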

