Fully-automated cell-type identification with specific markers extracted from single-cell transcriptomic data

2019 ◽  
Author(s):  
Aleksandr Ianevski ◽  
Anil K Giri ◽  
Tero Aittokallio

Abstract: Single-cell transcriptomics enables systematic charting of the cellular composition of complex tissues. Identification of cell populations often relies on unsupervised clustering of cells based on the similarity of their scRNA-seq profiles, followed by manual annotation of cell clusters using established marker genes. However, manual selection of marker genes for cell-type annotation is a laborious and error-prone task, since the selected markers must be specific both to the individual cell clusters and to the various cell types. Here, we developed a computational method, termed ScType, which enables data-driven selection of marker genes based solely on given scRNA-seq data. Using a compendium of 7 scRNA-seq datasets from various human and mouse tissues, we demonstrate how ScType enables unbiased, accurate and fully-automated single-cell type annotation by guaranteeing the specificity of marker genes both across cell clusters and cell types. The widely-applicable method is implemented as an interactive web-tool (https://sctype.fimm.fi), connected with a comprehensive database of specific markers.
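The idea of markers that are specific to one cluster can be illustrated with a minimal sketch. This is not ScType's actual scoring: the functions `cluster_means` and `specific_markers`, the `min_ratio` cutoff, and the toy matrix are all assumptions made for illustration. A gene is kept as a marker for a cluster when its mean expression there is at least `min_ratio` times its highest mean in any other cluster.

```python
import numpy as np

def cluster_means(expr, labels):
    """Return sorted cluster names and a clusters x genes matrix of mean expression.

    expr is a cells x genes array; labels gives one cluster name per cell.
    """
    labels = np.asarray(labels)
    clusters = sorted(set(labels.tolist()))
    means = np.vstack([expr[labels == c].mean(axis=0) for c in clusters])
    return clusters, means

def specific_markers(expr, labels, gene_names, min_ratio=2.0, eps=1e-9):
    """Hypothetical specificity filter (requires at least two clusters)."""
    clusters, means = cluster_means(expr, labels)
    markers = {}
    for i, c in enumerate(clusters):
        # Highest mean expression of each gene in any *other* cluster.
        others = np.delete(means, i, axis=0).max(axis=0)
        ratio = means[i] / (others + eps)
        markers[c] = [g for g, r in zip(gene_names, ratio) if r >= min_ratio]
    return markers

# Toy data: 4 cells x 3 genes; g3 is expressed everywhere and is not specific.
expr = np.array([[10, 0, 5], [8, 1, 5], [0, 9, 5], [1, 11, 5]], dtype=float)
labels = ["A", "A", "B", "B"]
genes = ["g1", "g2", "g3"]
result = specific_markers(expr, labels, genes)
```

In this toy case only `g1` passes for cluster A and only `g2` for cluster B; the uniformly expressed `g3` is rejected for both.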

2018 ◽  
Author(s):  
Douglas Abrams ◽  
Parveen Kumar ◽  
R. Krishna Murthy Karuturi ◽  
Joshy George

Background: The advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies or analytical packages available to guide experimental design and to devise a suitable analysis procedure for cell type identification. Results: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis, and implemented it in an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, a user can choose among a variety of combinations of analysis tools, evaluate the performance of analytical procedures and choose the best one, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying numbers of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment. Conclusions: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either the Seurat or the Simlr algorithm can be used to analyze a single cell dataset for any number of single cells isolated (a minimum of 1000 single cells was tested). However, when marker genes are expected to be upregulated by a fold change of only up to 2, the choice of single cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining the single cell experimental design.
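As a back-of-envelope complement to the sample-size question raised above, the sketch below uses a plain binomial model. It is not SCEED's empirical, simulation-based procedure; the function names, the `power` default, and the step size are assumptions for illustration. It asks how many cells must be profiled so that at least `k_min` cells of a type present at frequency `freq` are captured with a given probability. It is intended for small `k_min`.

```python
from math import comb

def prob_at_least(n_cells, freq, k_min):
    """P(at least k_min cells of the rare type among n_cells), binomial model."""
    p_fewer = sum(
        comb(n_cells, i) * freq**i * (1 - freq) ** (n_cells - i)
        for i in range(k_min)
    )
    return 1.0 - p_fewer

def min_cells(freq, k_min, power=0.95, step=100, max_cells=100_000):
    """Smallest multiple of `step` reaching the target probability, else None."""
    n = step
    while n <= max_cells:
        if prob_at_least(n, freq, k_min) >= power:
            return n
        n += step
    return None

# Example: a cell type making up 1% of the tissue, at least one cell captured.
p_100 = prob_at_least(100, 0.01, 1)
n_needed = min_cells(0.01, 1)
```

This only addresses whether rare cells are sampled at all; whether a clustering algorithm then separates them depends on marker fold change and the other parameters studied in the abstract.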


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Qingnan Liang ◽  
Rachayata Dharmat ◽  
Leah Owen ◽  
Akbar Shakoor ◽  
Yumei Li ◽  
...  

Abstract: Single-cell RNA-seq is a powerful tool in decoding the heterogeneity in complex tissues by generating transcriptomic profiles of the individual cell. Here, we report a single-nuclei RNA-seq (snRNA-seq) transcriptomic study on human retinal tissue, which is composed of multiple cell types with distinct functions. Six samples from three healthy donors are profiled and high-quality RNA-seq data is obtained for 5873 single nuclei. All major retinal cell types are observed and marker genes for each cell type are identified. The gene expression of the macular and peripheral retina is compared to each other at cell-type level. Furthermore, our dataset shows an improved power for prioritizing genes associated with human retinal diseases compared to both mouse single-cell RNA-seq and human bulk RNA-seq results. In conclusion, we demonstrate that obtaining single cell transcriptomes from human frozen tissues can provide insight missed by either human bulk RNA-seq or animal models.


2019 ◽  
Author(s):  
Kelly M. Bakulski ◽  
John F. Dou ◽  
Robert C. Thompson ◽  
Christopher Lee ◽  
Lauren Y. Middleton ◽  
...  

Background: Lead (Pb) exposure is ubiquitous and has permanent developmental effects on childhood intelligence and behavior and adulthood risk of dementia. The hippocampus is a key brain region involved in learning and memory, and its cellular composition is highly heterogeneous. Pb acts on the hippocampus by altering gene expression, but the cell type-specific responses are unknown. Objective: Examine the effects of perinatal Pb treatment on adult hippocampus gene expression, at the level of individual cells, in mice. Methods: In mice perinatally exposed to control water (n=4) or a human physiologically-relevant level (32 ppm in maternal drinking water) of Pb (n=4), two weeks prior to mating through weaning, we tested for gene expression and cellular differences in the hippocampus at 5 months of age. Analysis was performed using single cell RNA-sequencing of 5,258 cells from the hippocampus by 10x Genomics Chromium to 1) test for gene expression differences averaged across all cells by treatment; 2) compare cell cluster composition by treatment; and 3) test for gene expression and pathway differences within cell clusters by treatment. Results: Gene expression patterns revealed 12 cell clusters in the hippocampus, mapping to major expected cell types (e.g. microglia, astrocytes, neurons, oligodendrocytes). Perinatal Pb treatment was associated with 12.4% more oligodendrocytes (P=4.4×10^−21) in adult mice. Across all cells, differential gene expression analysis by Pb treatment revealed cluster marker genes. Within cell clusters, differential gene expression with Pb treatment (q<0.05) was observed in endothelial, microglial, pericyte, and astrocyte cells. Pathways up-regulated with Pb treatment were protein folding in microglia (P=3.4×10^−9) and stress response in oligodendrocytes (P=3.2×10^−5). Conclusion: Bulk tissue analysis may be confounded by changes in cell type composition and may obscure effects within vulnerable cell types.
This study serves as a biological reference for future single cell studies of toxicant or neuronal complications, to ultimately characterize the molecular basis by which Pb influences cognition and behavior.


2020 ◽  
Vol 176 (2) ◽  
pp. 396-409
Author(s):  
Kelly M Bakulski ◽  
John F Dou ◽  
Robert C Thompson ◽  
Christopher Lee ◽  
Lauren Y Middleton ◽  
...  

Abstract: Lead (Pb) exposure is ubiquitous with permanent neurodevelopmental effects. The hippocampus brain region is involved in learning and memory with heterogeneous cellular composition. The hippocampus cell type-specific responses to Pb are unknown. The objective of this study is to examine perinatal Pb treatment effects on adult hippocampus gene expression, at the level of individual cells. In mice perinatally exposed to control water or a human physiologically relevant level (32 ppm in maternal drinking water) of Pb, 2 weeks prior to mating through weaning, we tested for hippocampus gene expression and cellular differences at 5 months of age. We sequenced RNA from 5258 hippocampal cells to (1) test for treatment gene expression differences averaged across all cells, (2) compare cell cluster composition by treatment, and (3) test for treatment gene expression and pathway differences within cell clusters. Gene expression patterns revealed 12 hippocampus cell clusters, mapping to major expected cell types (eg, microglia, astrocytes, neurons, and oligodendrocytes). Perinatal Pb treatment was associated with 12.4% more oligodendrocytes (p = 4.4 × 10^−21) in adult mice. Across all cells, Pb treatment was associated with expression of cell cluster marker genes. Within cell clusters, Pb treatment (q < 0.05) caused differential gene expression in endothelial, microglial, pericyte, and astrocyte cells. Pb treatment upregulated protein folding pathways in microglia (p = 3.4 × 10^−9) and stress response in oligodendrocytes (p = 3.2 × 10^−5). Bulk tissue analysis may be influenced by changes in cell type composition, obscuring effects within vulnerable cell types. This study serves as a biological reference for future single-cell toxicant studies, to ultimately characterize molecular effects on cognition and behavior.


2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and is more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
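The rejection step described above, in which cells are left unlabeled when the classifier is not confident enough for the predicted cell type, can be sketched as follows. This is only an illustration of the general idea and not JIND's code: the function name, the `"Unassigned"` label, and the threshold values are assumptions, and the thresholds here are supplied by hand rather than learned.

```python
import numpy as np

def classify_with_rejection(probs, class_names, thresholds, reject_label="Unassigned"):
    """Assign each cell its most probable type, or reject_label if below threshold.

    probs is a cells x classes array of predicted probabilities;
    thresholds maps each class name to its (hypothetical) confidence cutoff.
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)                      # index of the top class per cell
    conf = probs[np.arange(len(probs)), best]        # probability of that class
    calls = []
    for k, p in zip(best, conf):
        name = class_names[k]
        calls.append(name if p >= thresholds[name] else reject_label)
    return calls

# Toy predictions for three cells over two cell types.
probs = [[0.90, 0.10], [0.55, 0.45], [0.30, 0.70]]
class_names = ["T cell", "B cell"]
thresholds = {"T cell": 0.8, "B cell": 0.6}
calls = classify_with_rejection(probs, class_names, thresholds)
```

The second cell is rejected because its top probability (0.55) falls below the cutoff for its predicted type, which is the behavior a per-class threshold is meant to give for ambiguous cells.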


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254194
Author(s):  
Hong-Tae Park ◽  
Woo Bin Park ◽  
Suji Kim ◽  
Jong-Sung Lim ◽  
Gyoungju Nah ◽  
...  

Mycobacterium avium subsp. paratuberculosis (MAP) is a causative agent of Johne’s disease, which is a chronic and debilitating disease in ruminants. MAP is also considered to be a possible cause of Crohn’s disease in humans. However, few studies have focused on the interactions between MAP and human macrophages to elucidate the pathogenesis of Crohn’s disease. We sought to determine the initial responses of human THP-1 cells against MAP infection using single-cell RNA-seq analysis. Clustering analysis showed that THP-1 cells were divided into seven different clusters in response to phorbol-12-myristate-13-acetate (PMA) treatment. The characteristics of each cluster were investigated by identifying cluster-specific marker genes. From the results, we found that classically differentiated cells express CD14, CD36, and TLR2, and that this cell type showed the most active responses against MAP infection. The responses included the expression of proinflammatory cytokines and chemokines such as CCL4, CCL3, IL1B, IL8, and CCL20. In addition, the Mreg cell type, a novel cell type differentiated from THP-1 cells, was discovered. Thus, it is suggested that different cell types arise even when the same cell line is treated under the same conditions. Overall, analyzing gene expression patterns via scRNA-seq classification allows a more detailed observation of the response to infection by each cell type.


2018 ◽  
Author(s):  
Wennan Chang ◽  
Changlin Wan ◽  
Xiaoyu Lu ◽  
Szu-wei Tu ◽  
Yifan Sun ◽  
...  

Abstract: We developed a novel deconvolution method, namely Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issue of identifiability and robustness in the current tissue data deconvolution problem. ICTD provides substantially new capabilities for omics-data-based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell types and cell type specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD on real and single cell simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Other than the new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell type specific functions. The premise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.


2021 ◽  
Author(s):  
Wenjing Ma ◽  
Sumeet Sharma ◽  
Peng Jin ◽  
Shannon L Gourley ◽  
Zhaohui Qin

The rapid proliferation of single-cell RNA-sequencing (scRNA-seq) datasets has revealed cell heterogeneity at unprecedented scales. Several deconvolution methods have been developed to decompose bulk experiments to reveal cell type contributions. However, these methods lack power in identifying the accurate cell type composition when the reference dataset contains a considerable number of sub-cell types. Here, we present LRcell, an R Bioconductor package (http://bioconductor.org/packages/release/bioc/html/LRcell.html) aiming to identify the specific sub-cell type(s) that drive the changes observed in a bulk RNA-seq differential gene expression experiment. In addition, LRcell provides pre-embedded marker genes computed from putative single-cell RNA-seq experiments as options to execute the analyses.


2021 ◽  
Author(s):  
Risa Karakida Kawaguchi ◽  
Ziqi Tang ◽  
Stephan Fischer ◽  
Rohit Tripathy ◽  
Peter K. Koo ◽  
...  

Background: Single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) measures genome-wide chromatin accessibility for the discovery of cell-type specific regulatory networks. ScATAC-seq combined with single-cell RNA sequencing (scRNA-seq) offers important avenues for ongoing research, such as novel cell-type specific activation of enhancer and transcription factor binding sites as well as chromatin changes specific to cell states. On the other hand, scATAC-seq data is known to be challenging to interpret due to its high number of zeros as well as the heterogeneity derived from different protocols. Because of the stochastic lack of marker gene activities, cell type identification by scATAC-seq remains difficult even at a cluster level. Results: In this study, we exploit reference knowledge obtained from external scATAC-seq or scRNA-seq datasets to define existing cell types and uncover the genomic regions which drive cell-type specific gene regulation. To investigate the robustness of existing cell-typing methods, we collected 7 scATAC-seq datasets targeting mouse brain for a meta-analytic comparison of neuronal cell-type annotation, including a reference atlas generated by the BRAIN Initiative Cell Census Network (BICCN). By comparing the area under the receiver operating characteristics curves (AUROCs) for the three major cell types (inhibitory, excitatory, and non-neuronal cells), cell-typing performance by single markers is found to be highly variable even for known marker genes due to study-specific biases. However, the signal aggregation of a large and redundant marker gene set, optimized via multiple scRNA-seq data, achieves the highest cell-typing performances among 5 existing marker gene sets, from the individual cell to cluster level. That gene set also shows a high consistency with the cluster-specific genes from inhibitory subtypes in two well-annotated datasets, suggesting applicability to rare cell types. 
Next, we demonstrate a comprehensive assessment of scATAC-seq cell typing using exhaustive combinations of the marker gene sets with supervised learning methods including machine learning classifiers and joint clustering methods. Our results show that the combinations using robust marker gene sets systematically ranked at the top, not only with model based prediction using a large reference data but also with a simple summation of expression strengths across markers. To demonstrate the utility of this robust cell typing approach, we trained a deep neural network to predict chromatin accessibility in each subtype using only DNA sequence. Through model interpretation methods, we identify key motifs enriched about robust gene sets for each neuronal subtype. Conclusions: Through the meta-analytic evaluation of scATAC-seq cell-typing methods, we develop a novel method set to exploit the BICCN reference atlas. Our study strongly supports the value of robust marker gene selection as a feature selection tool and cross-dataset comparison between scATAC-seq datasets to improve alignment of scATAC-seq to known biology. With this novel, high quality epigenetic data, genomic analysis of regulatory regions can reveal sequence motifs that drive cell type-specific regulatory programs.
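The contrast drawn above between single-marker scoring and a simple summation across a redundant marker set can be made concrete with a toy sketch. It is not the authors' pipeline: the function names, the binary gene-activity matrix, and the cell labels are assumptions for illustration. The gene symbols are commonly used inhibitory (Gad1, Gad2, Slc32a1) and excitatory (Slc17a7) neuron markers. AUROC is computed directly from pairwise comparisons, with ties counted as one half.

```python
import numpy as np

def aggregate_score(activity, gene_names, marker_set):
    """Sum gene-activity values over the markers in marker_set (cells x genes input)."""
    idx = [i for i, g in enumerate(gene_names) if g in marker_set]
    return activity[:, idx].sum(axis=1)

def auroc(scores, is_positive):
    """Probability that a positive cell outscores a negative one (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    mask = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[mask], scores[~mask]
    diff = pos[:, None] - neg[None, :]               # all positive-negative pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

# Toy sparse activity matrix: 4 cells x 4 genes, with dropout-like zeros.
genes = ["Gad1", "Gad2", "Slc32a1", "Slc17a7"]
activity = np.array([[1, 0, 1, 0],
                     [0, 1, 1, 0],
                     [0, 0, 0, 1],
                     [1, 0, 0, 1]], dtype=float)
is_inhibitory = [True, True, False, False]

single = auroc(aggregate_score(activity, genes, {"Gad1"}), is_inhibitory)
combined = auroc(aggregate_score(activity, genes, {"Gad1", "Gad2", "Slc32a1"}), is_inhibitory)
```

In this toy matrix the single marker is uninformative (AUROC 0.5) because of missing signal in one inhibitory cell and spurious signal in one other cell, while the summed score separates the two groups (AUROC 1.0).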


2020 ◽  
Author(s):  
Xin Shao ◽  
Haihong Yang ◽  
Xiang Zhuang ◽  
Jie Liao ◽  
Yueren Yang ◽  
...  

Abstract: Advances in single-cell RNA sequencing (scRNA-seq) have furthered the simultaneous classification of thousands of cells in a single assay based on transcriptome profiling. In most analysis protocols, single-cell type annotation relies on marker genes or RNA-seq profiles, resulting in poor extrapolation. Here, we introduce scDeepSort (https://github.com/ZJUFanLab/scDeepSort), a reference-free cell-type annotation tool for single-cell transcriptomics that uses a deep learning model with a weighted graph neural network. Using human and mouse scRNA-seq data resources, we demonstrate the feasibility of scDeepSort and its high accuracy in labeling 764,741 cells involving 56 human and 32 mouse tissues. Significantly, scDeepSort outperformed reference-dependent methods in annotating 76 external testing scRNA-seq datasets, including 126,384 cells (85.79%) from ten human tissues and 134,604 cells from 12 mouse tissues (81.30%). scDeepSort accurately revealed cell identities without prior reference knowledge, thus potentially providing new insights into mechanisms underlying biological processes, disease pathogenesis, and disease progression at a single-cell resolution.

