Panoramic stitching of heterogeneous single-cell transcriptomic data

10.1101/371179 ◽

2018 ◽

Cited By ~ 17

Author(s):

Brian Hie ◽

Bryan Bryson ◽

Bonnie Berger

Keyword(s):

Single Cell ◽

Cell Types ◽

Data Sets ◽

Cell Type ◽

Data Set ◽

Wide Range ◽

Data Set Integration ◽

Biological Patterns ◽

Insight Into ◽

Comprehensive Reference

Abstract: Researchers are generating single-cell RNA sequencing (scRNA-seq) profiles of diverse biological systems [1–4] and every cell type in the human body [5]. Leveraging these data to gain unprecedented insight into biology and disease will require assembling heterogeneous cell populations across multiple experiments, laboratories, and technologies. Although methods for scRNA-seq data integration exist [6,7], they often naively merge data sets together even when the data sets have no cell types in common, leading to results that do not correspond to real biological patterns. Here we present Scanorama, an algorithm inspired by panorama stitching that overcomes the limitations of existing methods to enable accurate, heterogeneous scRNA-seq data set integration. Our strategy identifies and merges the shared cell types among all pairs of data sets and is orders of magnitude faster than existing techniques. We use Scanorama to combine 105,476 cells from 26 diverse scRNA-seq experiments across 9 different technologies into a single comprehensive reference, demonstrating how Scanorama can be used to obtain a more complete picture of cellular function across a wide range of scRNA-seq experiments.
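
The abstract does not say how shared cell types are detected between a pair of data sets. One common approach to this kind of matching is mutual nearest neighbors: a cell in one data set is paired with a cell in the other only when each is the other's closest match, so cells with no counterpart stay unpaired instead of being forced together. The sketch below is a minimal illustration under that assumption; the function names and the two-dimensional toy coordinates are hypothetical and are not taken from the Scanorama software.

```python
import math

def nearest(points_from, points_to):
    """For each point in points_from, return the index of its closest point in points_to."""
    return [
        min(range(len(points_to)), key=lambda j: math.dist(p, points_to[j]))
        for p in points_from
    ]

def mutual_nearest_pairs(a, b):
    """Return (i, j) pairs where a[i] and b[j] are each other's nearest neighbor."""
    a_to_b = nearest(a, b)
    b_to_a = nearest(b, a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

# Toy embeddings (hypothetical): the first two cells of each data set are similar,
# while the third cell of each data set has no counterpart in the other.
data_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
data_b = [(0.1, 0.0), (1.1, 0.1), (-20.0, 5.0)]

pairs = mutual_nearest_pairs(data_a, data_b)
print(pairs)
```

In this toy case only the two similar cells are paired; the outlying cell in each data set is left unmatched, which is the behavior the abstract asks for when data sets share only some cell types.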

ICTD: A semi-supervised cell type identification and deconvolution method for multi-omics data

10.1101/426593 ◽

2018 ◽

Cited By ~ 2

Author(s):

Wennan Chang ◽

Changlin Wan ◽

Xiaoyu Lu ◽

Szu-wei Tu ◽

Yifan Sun ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Training Data ◽

Marker Genes ◽

Cell Detection ◽

Omics Data ◽

Deconvolution Method ◽

Cell Type ◽

Data Set ◽

Cell Type Specific

Abstract: We developed a novel deconvolution method, Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issues of identifiability and robustness in current tissue data deconvolution. ICTD provides substantially new capabilities for omics-data-based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell types and cell-type-specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single-cell or bulk-cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias introduced by training data. Application of ICTD to real and single-cell-simulated tissue data validated that the method performs consistently well on tissue data from different species, tissue microenvironments, and experimental platforms. Beyond these new capabilities, ICTD outperformed other state-of-the-art deconvolution methods in prediction accuracy, the resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell-type-specific functions. The promise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.

Probabilistic Harmonization and Annotation of Single-cell Transcriptomics Data with Deep Generative Models

10.1101/532895 ◽

2019 ◽

Cited By ~ 14

Author(s):

Chenling Xu ◽

Romain Lopez ◽

Edouard Mehlman ◽

Jeffrey Regier ◽

Michael I. Jordan ◽

...

Keyword(s):

Single Cell ◽

Probabilistic Approach ◽

Cell Types ◽

Generative Models ◽

Marker Genes ◽

Data Sets ◽

Data Set ◽

Cell State ◽

Transcriptomics Data ◽

Single Data

Abstract: As single-cell transcriptomics becomes a mainstream technology, the natural next step is to integrate the accumulating data in order to achieve a common ontology of cell types and states. However, owing to various nuisance factors of variation, it is not straightforward to compare gene expression levels across data sets or to automatically assign cell type labels in a new data set based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of cohorts of single-cell RNA-seq data sets, while accounting for uncertainty caused by biological and measurement noise. We also introduce single-cell ANnotation using Variational Inference (scANVI), a semi-supervised variant of scVI designed to leverage any available cell state annotations, for instance when only one data set in a cohort is annotated, or when only a few cells in a single data set can be labeled using marker genes. We demonstrate that scVI and scANVI compare favorably to existing methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings such as a hierarchical structure of cell state labels. We further show that, unlike existing methods, scVI and scANVI represent the integrated data sets with a single generative model that can be directly used for any probabilistic decision-making task, using differential expression as our case study. scVI and scANVI are available as open source software and can be readily used to facilitate cell state annotation and help ensure consistency and reproducibility across studies.

CIM-seq

10.21203/rs.3.pex-1365/v1 ◽

2021 ◽

Author(s):

Nathanael Andrews ◽

Martin Enge

Keyword(s):

Single Cell ◽

Single Cells ◽

Likelihood Estimation ◽

Cell Types ◽

Data Sets ◽

Target Tissue ◽

Data Set ◽

Rnaseq Data ◽

The Given ◽

Cell Data

Abstract: CIM-seq is a tool for deconvoluting RNA-seq data from cell multiplets (clusters of two or more cells) in order to identify physically interacting cells in a given tissue. The method requires two RNA-seq data sets from the same tissue: one of single cells to be used as a reference, and one of cell multiplets to be deconvoluted. CIM-seq is compatible with both droplet-based sequencing methods, such as Chromium Single Cell 3′ Kits from 10x Genomics, and plate-based methods, such as Smart-seq2. The pipeline consists of three parts: (1) dissociation of the target tissue, FACS sorting of single cells and multiplets, and conventional scRNA-seq; (2) feature selection and clustering of cell types in the single-cell data set, generating a blueprint of transcriptional profiles in the given tissue; and (3) computational deconvolution of multiplets through maximum likelihood estimation (MLE) to determine the most likely cell type constituents of each multiplet.
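
Step 3 of the pipeline can be illustrated with a small maximum likelihood search. The sketch below is not the CIM-seq implementation: it assumes, purely for illustration, that multiplet counts are independent Poisson draws whose rates are the summed reference profiles of the constituent cell types, and it tries every combination of one or two cell types. The cell type names, profiles, and counts are hypothetical.

```python
import math
from itertools import combinations

def poisson_loglik(counts, rates):
    """Log-likelihood of counts under independent Poisson rates (rates must be > 0)."""
    return sum(k * math.log(r) - r - math.lgamma(k + 1) for k, r in zip(counts, rates))

def most_likely_constituents(multiplet, profiles, max_cells=2):
    """Return the tuple of cell types whose summed profile best explains the multiplet."""
    best_combo, best_ll = None, -math.inf
    for size in range(1, max_cells + 1):
        for combo in combinations(sorted(profiles), size):
            # Expected counts per gene: sum of the reference profiles in this combination.
            rates = [sum(profiles[c][g] for c in combo) for g in range(len(multiplet))]
            ll = poisson_loglik(multiplet, rates)
            if ll > best_ll:
                best_combo, best_ll = combo, ll
    return best_combo

# Hypothetical mean expression of three genes in three reference clusters.
profiles = {
    "stem": [20.0, 1.0, 1.0],
    "goblet": [1.0, 20.0, 1.0],
    "paneth": [1.0, 1.0, 20.0],
}
multiplet = [19, 23, 2]  # hypothetical counts from one sorted multiplet

result = most_likely_constituents(multiplet, profiles)
print(result)
```

The exhaustive search keeps the example short; it grows combinatorially with the number of cell types and the multiplet size, so a practical tool would need a smarter optimizer.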

CellKb Immune: a manually curated database of mammalian immune marker gene sets optimized for rapid cell type identification

10.1101/2020.12.01.389890 ◽

2020 ◽

Author(s):

Ajay Patil ◽

Ashwini Patil

Keyword(s):

Single Cell ◽

Search Algorithm ◽

Marker Gene ◽

Cell Types ◽

Reference Database ◽

Rna Seq ◽

Cell Type ◽

Gene Sets ◽

Leave One Out ◽

Comprehensive Reference

Abstract: Single-cell RNA-seq is widely used to study transcriptional patterns of genes in individual cells. In spite of current advances in technology, assigning cell types in single-cell data sets remains a bottleneck due to the lack of a comprehensive reference database and a fast search method in a single tool. CellKb Immune is a knowledgebase of manually collected, curated and annotated marker gene sets from cell types in the mammalian immune response. Given a list of genes, it finds matching cell types in the literature using a novel rank-based algorithm optimized for rapid searching across marker gene lists of differing lengths. We evaluated the contents and search algorithm of CellKb Immune using a leave-one-out approach. We further used CellKb Immune to annotate previously defined marker gene sets from Immgen to confirm its accuracy and coverage. CellKb Immune provides an easy-to-use database with a fast and reliable method to find matching cell types and annotate cells in single-cell experiments in a single tool. It is available at https://www.cellkb.com/immune.

A Single Cell Transcriptomic Atlas Characterizes Aging Tissues in the Mouse

10.1101/661728 ◽

2019 ◽

Cited By ~ 20

Author(s):

◽

Angela Oliveira Pisco ◽

Aaron McGeever ◽

Nicholas Schaum ◽

Jim Karkanias ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Cell Type ◽

Cellular Processes ◽

Progressive Loss ◽

Age Related ◽

Multiple Cell ◽

Cell Type Specific ◽

Molecular Information ◽

Insight Into

Abstract: Aging is characterized by a progressive loss of physiological integrity, leading to impaired function and increased vulnerability to death [1]. Despite rapid advances over recent years, many of the molecular and cellular processes which underlie progressive loss of healthy physiology are poorly understood [2]. To gain a better insight into these processes we have created a single-cell transcriptomic atlas across the life span of Mus musculus which includes data from 23 tissues and organs. We discovered cell-specific changes occurring across multiple cell types and organs, as well as age-related changes in the cellular composition of different organs. Using single-cell transcriptomic data we were able to assess cell-type-specific manifestations of different hallmarks of aging, such as senescence [3], genomic instability [4] and changes in the organism’s immune system [2]. This Tabula Muris Senis provides a wealth of new molecular information about how the most significant hallmarks of aging are reflected in a broad range of tissues and cell types.

Optimal marker gene selection for cell type discrimination in single cell analyses

Nature Communications ◽

10.1038/s41467-021-21453-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Bianca Dumitrascu ◽

Soledad Villar ◽

Dustin G. Mixon ◽

Barbara E. Engelhardt

Keyword(s):

Single Cell ◽

Gene Selection ◽

Marker Gene ◽

Cell Types ◽

Specific Cell ◽

Cell Type ◽

Computationally Efficient ◽

Data Set ◽

Gene Markers

Abstract: Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single-cell gene expression, in situ hybridization, or single-cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of dissociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single-cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set given a hierarchy of cell types as labels, the markers found by our method improve the recovery of the cell type hierarchy with fewer markers than existing methods, using a computationally efficient and principled optimization.

ADAPTS: Automated Deconvolution Augmentation of Profiles for Tissue Specific cells

10.1101/633958 ◽

2019 ◽

Author(s):

Samuel A Danziger ◽

David L Gibbs ◽

Ilya Shmulevich ◽

Mark McConnell ◽

Matthew WB Trotter ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Immune Cell ◽

De Novo ◽

Cell Types ◽

Expression Data ◽

Cell Type ◽

Data Set ◽

Rnaseq Data

Abstract: Immune cell infiltration of tumors can be an important factor in determining patient outcomes, and immune cell presence can be inferred by deconvolving gene expression data drawn from a heterogeneous mix of cell types. One particularly powerful family of deconvolution techniques uses signature matrices of genes that uniquely identify each cell type, as determined from cell-type-purified gene expression data. Many methods of this type have been recently published, often including new signature matrices appropriate for a single purpose, such as investigating a specific type of tumor. The package ADAPTS helps users make the most of this expanding knowledge base by introducing a framework for cell type deconvolution. ADAPTS implements modular tools for customizing signature matrices for new tissue types by adding custom cell types or building new matrices de novo, including from single-cell RNA-seq data. It includes a common interface to several popular deconvolution algorithms that use a signature matrix to estimate the proportion of cell types present in heterogeneous samples. ADAPTS also implements a novel method for clustering cell types into groups that are hard to distinguish by deconvolution and then re-splitting those clusters using hierarchical deconvolution. We demonstrate that the techniques implemented in ADAPTS improve the ability to reconstruct the cell types present in a single-cell RNA-seq data set in a blind predictive analysis. ADAPTS is currently available for use in R on CRAN and GitHub.
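
The idea of estimating cell type proportions from a signature matrix can be sketched in a few lines. The example below is not one of the algorithms ADAPTS wraps and does not use its R interface: it swaps in a simple multiplicative-update solver for non-negative least squares, written in plain Python, and all names and numbers are hypothetical. It assumes the signature matrix and the mixture are non-negative and that the mixture is a weighted sum of the signature columns.

```python
def deconvolve(signature, mixture, iterations=500):
    """Estimate non-negative cell type proportions x such that signature @ x ~ mixture.

    signature: list of rows, one per gene, each row holding one value per cell type.
    mixture:   list of expression values, one per gene.
    """
    n_genes, n_types = len(signature), len(signature[0])
    x = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(iterations):
        # Current reconstruction of the mixture from the estimated proportions.
        fitted = [sum(signature[g][t] * x[t] for t in range(n_types)) for g in range(n_genes)]
        for t in range(n_types):
            num = sum(signature[g][t] * mixture[g] for g in range(n_genes))
            den = sum(signature[g][t] * fitted[g] for g in range(n_genes))
            # Multiplicative update keeps every entry of x non-negative.
            if den > 0:
                x[t] *= num / den
    total = sum(x)
    return [v / total for v in x]  # rescale so the proportions sum to 1

# Hypothetical signature: three genes (rows) by two cell types (columns).
signature = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
# Synthetic mixture built from 30% of type 0 and 70% of type 1.
mixture = [10.0 * 0.3 + 1.0 * 0.7, 1.0 * 0.3 + 10.0 * 0.7, 5.0 * 0.3 + 5.0 * 0.7]

proportions = deconvolve(signature, mixture)
print([round(p, 3) for p in proportions])
```

When two columns of the signature matrix are nearly identical, estimates like these become unstable, which is the situation the abstract's clustering and re-splitting step is meant to handle.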

Cell type identification from single-cell transcriptomes in melanoma

BMC Medical Genomics ◽

10.1186/s12920-021-01118-3 ◽

2021 ◽

Vol 14 (S5) ◽

Author(s):

Qiuyan Huo ◽

Yu Yin ◽

Fangfang Liu ◽

Yuying Ma ◽

Liming Wang ◽

...

Keyword(s):

Single Cell ◽

Enrichment Analysis ◽

Cell Types ◽

Marker Genes ◽

Cell Type ◽

Computational Framework ◽

Histocompatibility Complex ◽

Differential Modules ◽

Cell Gene ◽

Insight Into

Abstract: Background: Single-cell sequencing approaches allow gene expression to be measured at the single-cell level, providing opportunities and challenges to study the aetiology of complex diseases, including cancer. Methods: Based on single-cell gene and lncRNA expression levels, we proposed a computational framework for cell type identification that fully considers cell dropout characteristics. First, we defined the dropout features of the cells and identified the dropout clusters. Second, we constructed a differential co-expression network and identified differential modules. Finally, we identified cell types based on the differential modules. Results: The method was applied to single-cell melanoma data, and eight cell types were identified. Enrichment analysis of the candidate cell marker genes for the two key cell types showed that both key cell types were closely related to the physiological activities of the major histocompatibility complex (MHC); one key cell type was associated with mitosis-related activities, and the other with pathways related to ten diseases. Conclusions: Through identification and analysis of key melanoma-related cell types, we explored the molecular mechanism of melanoma, providing insight into melanoma research. Moreover, the candidate cell markers for the two key cell types are potential therapeutic targets for melanoma.

A Single-cell Transcriptomic Atlas of the Developing Chicken Limb

10.1101/598227 ◽

2019 ◽

Cited By ~ 1

Author(s):

Christian Feregrino ◽

Fabio Sacher ◽

Oren Parnas ◽

Patrick Tschopp

Keyword(s):

Pattern Formation ◽

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Cell Populations ◽

Cell Type ◽

Data Set ◽

Cellular Resolution ◽

Single Cell Rna Sequencing ◽

Type Specification

Abstract: Background: Through precise implementation of distinct cell type specification programs, differentially regulated in both space and time, complex patterns emerge during organogenesis. Thanks to its easy experimental accessibility, the developing chicken limb has long served as a paradigm to study vertebrate pattern formation. Through decades’ worth of research, we now have a firm grasp on the molecular mechanisms driving limb formation at the tissue level. However, to elucidate the dynamic interplay between transcriptional cell type specification programs and pattern formation at its relevant cellular scale, we lack appropriately resolved molecular data at the genome-wide level. Here, making use of droplet-based single-cell RNA-sequencing, we catalogue the developmental emergence of distinct tissue types and their transcriptome dynamics in the distal chicken limb, the so-called autopod, at cellular resolution. Results: Using single-cell RNA-sequencing technology, we sequenced a total of 17,628 cells coming from three key developmental stages of chicken autopod patterning. Overall, we identified 23 cell populations with distinct transcriptional profiles. Amongst them were small, albeit essential populations like the apical ectodermal ridge, demonstrating the ability to detect even rare cell types. Moreover, we uncovered the existence of molecularly distinct sub-populations within previously defined compartments of the developing limb, some of which have important signaling functions during autopod pattern formation. Finally, we inferred gene co-expression modules that coincide with distinct tissue types across developmental time, and used them to track patterning-relevant cell populations of the forming digits. Conclusions: We provide a comprehensive functional genomics resource to study the molecular effectors of chicken limb patterning at cellular resolution. Our single-cell transcriptomic atlas captures all major cell populations of the developing autopod, and highlights the transcriptional complexity in many of its components. Finally, integrating our data set with other single-cell transcriptomics resources will enable researchers to assess molecular similarities in orthologous cell types across the major tetrapod clades, and provide an extensive candidate gene list to functionally test cell-type-specific drivers of limb morphological diversification.

NS-Forest: A machine learning method for the objective identification of minimum marker gene combinations for cell type determination from single cell RNA sequencing

10.1101/2020.09.23.308932 ◽

2020 ◽

Author(s):

Brian Aevermann ◽

Yun Zhang ◽

Mark Novotny ◽

Trygve Bakken ◽

Jeremy Miller ◽

...

Keyword(s):

Machine Learning ◽

Single Cell ◽

Rna Sequencing ◽

Marker Gene ◽

Cell Types ◽

Biological Research ◽

Marker Genes ◽

Cell Type ◽

Type Identity ◽

Wide Range

Abstract: Single-cell genomics is rapidly advancing our knowledge of cell phenotypic types and states. Driven by single-cell/nucleus RNA sequencing (scRNA-seq) data, comprehensive atlas projects covering a wide range of organisms and tissues are currently underway. As a result, it is critical that the cell transcriptional phenotypes discovered are defined and disseminated in a consistent and concise manner. Molecular biomarkers have historically played an important role in biological research, from defining immune cell types by surface protein expression to defining diseases by molecular drivers. Here we describe a machine learning-based marker gene selection algorithm, NS-Forest version 2.0, which leverages the non-linear attributes of random forest feature selection and a binary expression scoring approach to discover the minimal marker gene expression combinations that precisely capture the cell type identity represented in the complete scRNA-seq transcriptional profiles. The marker genes selected provide a barcode of the necessary and sufficient characteristics for semantic cell type definition and serve as useful tools for downstream biological investigation. The use of NS-Forest to identify marker genes for human brain middle temporal gyrus cell types reveals the importance of cell signaling and non-coding RNAs in neuronal cell type identity.
