De novo prediction of cell-type complexity in single-cell RNA-seq and tumor microenvironments

Jun Woo; Boris J. Winterhoff; Timothy K. Starr; Constantin Aliferis; Jinhua Wang

doi:10.26508/lsa.201900443

Life Science Alliance ◽

2019 ◽

Vol 2 (4) ◽

pp. e201900443 ◽

Cited By ~ 1

Author(s):

Jun Woo ◽

Boris J. Winterhoff ◽

Timothy K. Starr ◽

Constantin Aliferis ◽

Jinhua Wang

Keyword(s):

Single Cell ◽

Model Comparison ◽

De Novo ◽

Nonnegative Matrix ◽

Simulated Data ◽

Cell Type ◽

Pancreatic Cell ◽

Bayesian Model Comparison ◽

Cellular Microenvironments ◽

Cell Data

Recent single-cell transcriptomic studies revealed new insights into cell-type heterogeneities in cellular microenvironments unavailable from bulk studies. A significant drawback of currently available algorithms is the need to use empirical parameters or rely on indirect quality measures to estimate the degree of complexity, i.e., the number of subgroups present in the sample. We fill this gap with a single-cell data analysis procedure allowing for unambiguous assessments of the depth of heterogeneity in subclonal compositions supported by data. Our approach combines nonnegative matrix factorization, which takes advantage of the sparse and nonnegative nature of single-cell RNA count data, with Bayesian model comparison enabling de novo prediction of the depth of heterogeneity. We show that the method predicts the correct number of subgroups using simulated data, primary blood mononuclear cell, and pancreatic cell data. We applied our approach to a collection of single-cell tumor samples and found two qualitatively distinct classes of cell-type heterogeneity in cancer microenvironments.
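
The pairing of nonnegative matrix factorization with a model-selection score can be illustrated with a minimal sketch. This is not the authors' implementation: the factorization uses standard multiplicative updates for the Poisson (KL) objective, and a BIC-style penalty stands in as a crude proxy for the Bayesian evidence described in the abstract. The function names, the toy count matrix, and the candidate ranks are all assumptions made for illustration.

```python
import math
import random

def reconstruct(W, H):
    """Return the matrix product W @ H as nested lists."""
    return [[sum(w_ik * H[k][j] for k, w_ik in enumerate(row))
             for j in range(len(H[0]))] for row in W]

def nmf_poisson(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative count matrix X (genes x cells) as W @ H using
    multiplicative updates for the KL / Poisson objective."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        WH = reconstruct(W, H)
        for k in range(rank):  # update H with W held fixed
            col_sum = sum(W[i][k] for i in range(n)) + eps
            for j in range(m):
                num = sum(W[i][k] * X[i][j] / (WH[i][j] + eps) for i in range(n))
                H[k][j] *= num / col_sum
        WH = reconstruct(W, H)
        for k in range(rank):  # update W with H held fixed
            row_sum = sum(H[k][j] for j in range(m)) + eps
            for i in range(n):
                num = sum(H[k][j] * X[i][j] / (WH[i][j] + eps) for j in range(m))
                W[i][k] *= num / row_sum
    return W, H

def poisson_loglik(X, W, H, eps=1e-9):
    """Poisson log-likelihood of the counts X under mean W @ H."""
    WH = reconstruct(W, H)
    return sum(x * math.log(mu + eps) - mu - math.lgamma(x + 1)
               for row_x, row_mu in zip(X, WH) for x, mu in zip(row_x, row_mu))

def bic_style_score(X, rank):
    """Log-likelihood minus a BIC-style complexity penalty (an assumed
    stand-in for the Bayesian evidence, not the paper's criterion)."""
    n, m = len(X), len(X[0])
    W, H = nmf_poisson(X, rank)
    n_params = rank * (n + m)
    return poisson_loglik(X, W, H) - 0.5 * n_params * math.log(n * m)

# Hypothetical toy data: 6 genes x 8 cells with two cell groups.
X = [
    [20, 18, 22, 19, 1, 0, 2, 1],
    [15, 17, 14, 16, 0, 1, 1, 0],
    [25, 23, 26, 24, 2, 1, 0, 1],
    [1, 0, 2, 1, 30, 28, 31, 29],
    [0, 1, 1, 2, 12, 11, 13, 12],
    [2, 1, 0, 1, 21, 19, 22, 20],
]
scores = {rank: bic_style_score(X, rank) for rank in (1, 2, 3)}
best_rank = max(scores, key=scores.get)
print(scores, best_rank)
```

The rank with the highest score is taken as the estimated number of subgroups. A full Bayesian treatment would replace the penalty term with a marginal likelihood, which is the part this sketch does not attempt.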

GT-TS: Experimental design for maximizing cell type discovery in single-cell data

10.1101/386540 ◽

2018 ◽

Cited By ~ 4

Author(s):

Bianca Dumitrascu ◽

Karen Feng ◽

Barbara E Engelhardt

Keyword(s):

Experimental Design ◽

Single Cell ◽

Simulated Data ◽

Computational Method ◽

Rna Seq ◽

Cell Type ◽

Thompson Sampling ◽

Random Strategy ◽

Cell Data ◽

Type Information

We present the Good-Toulmin-like estimator via Thompson sampling (GT-TS), a computational method for iterative experimental design in multi-tissue single-cell RNA-seq (scRNA-seq) data. Given a budget and a model of cell type information across tissues, GT-TS estimates how many cells should be sampled from each tissue, with the goal of maximizing cell type discovery across samples from multiple iterations. In both real and simulated data, we demonstrate the advantages of GT-TS in data collection planning when compared to a random strategy in the absence of experimental design.
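
For orientation, the classical Good-Toulmin estimator that gives the method its name can be sketched as follows. It predicts how many previously unseen cell types would appear if a further t × n cells were sampled, where n is the number of cells seen so far. This is the textbook estimator, not the "Good-Toulmin like" variant or the Thompson sampling loop of GT-TS; the function name, tissue names, and cell labels are hypothetical.

```python
from collections import Counter

def good_toulmin(cell_type_labels, t):
    """Classical Good-Toulmin estimate of the number of new cell types
    expected in t * n additional cells (n = cells observed so far).
    The estimator is generally considered reliable for 0 <= t <= 1."""
    type_counts = Counter(cell_type_labels)        # cells per observed type
    freq_of_freq = Counter(type_counts.values())   # Phi_i: types seen exactly i times
    return -sum(((-t) ** i) * phi for i, phi in freq_of_freq.items())

# Hypothetical pilot samples from two tissues.
pilot = {
    "tissue_A": ["T", "B", "NK", "NK", "Mono", "Mono", "Mono"],
    "tissue_B": ["Hep", "Hep", "Hep", "Hep", "Kupffer", "Kupffer"],
}
expected_new = {name: good_toulmin(labels, t=1.0) for name, labels in pilot.items()}
print(expected_new)
```

Tissues with many cell types seen only once receive higher estimates, which is the signal an allocation strategy can exploit. In GT-TS this kind of estimate feeds a Thompson sampling procedure rather than a direct comparison of point estimates.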

Mapping single-cell atlases throughout Metazoa unravels cell type evolution

eLife ◽

10.7554/elife.66747 ◽

2021 ◽

Vol 10 ◽

Author(s):

Alexander J Tarashansky ◽

Jacob M Musser ◽

Margarita Khariton ◽

Pengyang Li ◽

Detlev Arendt ◽

...

Keyword(s):

Stem Cell ◽

Single Cell ◽

Cell Types ◽

The Self ◽

Cell Type ◽

Germ Layers ◽

Animal Evolution ◽

Self Assembling ◽

Animal Phyla ◽

Cell Data

Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning mouse to sponge, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.

Axes of inter-sample variability among transcriptional neighborhoods reveal disease associated cell states in single-cell data

10.1101/2021.04.19.440534 ◽

2021 ◽

Author(s):

Yakir A Reshef ◽

Laurie Rumker ◽

Joyce B Kang ◽

Aparna Nathan ◽

Megan B Murray ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Single Cell ◽

Active Tuberculosis ◽

Cell Populations ◽

Cell Type ◽

Accurate Identification ◽

Shared Function ◽

Statistical Approaches ◽

Cell Data ◽

Notch Activation

As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes like clinical phenotypes. Current statistical approaches typically map cells to cell-type clusters and examine sample differences through that lens alone. Here we present covarying neighborhood analysis (CNA), an unbiased method to identify cell populations of interest with greater flexibility and granularity. CNA characterizes dominant axes of variation across samples by identifying groups of very small regions in transcriptional space, termed neighborhoods, that covary in abundance across samples, suggesting shared function or regulation. CNA can then rigorously test for associations between any sample-level attribute and the abundances of these covarying neighborhood groups. We show in simulation that CNA enables more powerful and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, redefines monocyte populations expanded in sepsis, and identifies a previously undiscovered T-cell population associated with progression to active tuberculosis.

ESCO: single cell expression simulation incorporating gene co-expression

10.1101/2020.10.20.347211 ◽

2020 ◽

Author(s):

Jinjin Tian ◽

Jiebiao Wang ◽

Kathryn Roeder

Keyword(s):

Single Cell ◽

R Package ◽

Brain Cell ◽

Gene Interactions ◽

Cell Type ◽

Imputation Methods ◽

Biological Interest ◽

A Cell ◽

Cell Expression ◽

Cell Data

Motivation: Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner.

Results: Therefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the highlights of available simulators, which perform well for simulation of gene expression marginally. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregating methods are a better choice. These findings are further verified with mouse and human brain cell data.

Availability: The ESCO implementation is available as R package SplatterESCO (https://github.com/JINJINT/SplatterESCO).
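
The copula idea can be shown with a small, generic sketch: draw correlated standard normals, map them to uniforms with the normal CDF, then push each uniform through the inverse CDF of the desired count distribution. The code below is a plain Gaussian copula with Poisson marginals for two genes, written in Python for illustration. It does not reproduce ESCO or its R API; the Poisson marginals and every name and parameter value are simplifying assumptions.

```python
import math
import random
from statistics import NormalDist

def poisson_quantile(u, lam):
    """Smallest k with P(Poisson(lam) <= k) >= u, found by summing the pmf."""
    u = min(u, 1.0 - 1e-12)          # guard against u == 1.0
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    while cdf < u and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

def correlated_counts(n_cells, rho, lam_a, lam_b, seed=0):
    """Simulate counts for two genes whose dependence comes from a
    Gaussian copula with latent correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n_cells):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)   # uniform marginals
        pairs.append((poisson_quantile(u1, lam_a), poisson_quantile(u2, lam_b)))
    return pairs

def pearson(pairs):
    """Pearson correlation of the two count columns."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)

coexpressed = correlated_counts(2000, rho=0.8, lam_a=5.0, lam_b=8.0)
independent = correlated_counts(2000, rho=0.0, lam_a=5.0, lam_b=8.0)
print(pearson(coexpressed), pearson(independent))
```

The marginal distribution of each gene is untouched by the choice of rho, which is the property that lets a copula add co-expression on top of an existing marginal simulator.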

#3101 Imbalanced basal ganglia connectivity is associated with motor deficits and apathy in Huntington's disease: first evidence from human in vivo neuroimaging

Journal of Neurology Neurosurgery & Psychiatry ◽

10.1136/jnnp-2021-bnpa.15 ◽

2021 ◽

Vol 92 (8) ◽

pp. A6.1-A6

Author(s):

Akshay Nair ◽

Adeel Razi ◽

Sarah Gregory ◽

Robb Rutledge ◽

Geraint Rees ◽

...

Keyword(s):

Basal Ganglia ◽

Bayesian Model ◽

Empirical Bayes ◽

Model Comparison ◽

De Novo ◽

Effective Connectivity ◽

Indirect Pathway ◽

Direct Pathway ◽

Bayesian Model Comparison

Background: The gating of movement in humans is thought to depend on activity within the cortico-striato-thalamic loops. Within these loops, emerging from the cells of the striatum, run two opponent pathways: the direct and indirect pathways. Both are complex and polysynaptic, but the overall effect of activity within these pathways is to encourage and inhibit movement respectively. In Huntington's disease (HD), the preferential early loss of striatal neurons forming the indirect pathway is thought to lead to disinhibition that gives rise to the characteristic motor features of the condition. But early HD is also specifically associated with apathy, a failure to engage in goal-directed movement. We hypothesised that in HD, motor signs and apathy may be selectively correlated with indirect and direct pathway dysfunction respectively.

Methods: Using a novel technique for estimating dynamic effective connectivity of the basal ganglia, we tested both of these hypotheses in vivo for the first time in a large cohort of patients with prodromal HD (n = 94). We used spectral dynamic causal modelling of resting state fMRI data to model effective connectivity in a model of these cortico-striatal pathways. At the group level, we combined Parametric Empirical Bayes and Bayesian Model Reduction procedures to generate a large number of competing models and compared them using Bayesian model comparison.

Results: With this fully Bayesian approach, associations between clinical measures and connectivity parameters emerge de novo from the data. We found very strong evidence (posterior probability > 0.99) to support both of our hypotheses. First, more severe motor signs in HD were associated with altered connectivity in the indirect pathway; by comparison, loss of goal-directed behaviour, or apathy, was associated with changes in the direct pathway component of our model.

Conclusions: The empirical evidence we provide here is the first in vivo demonstration that imbalanced basal ganglia connectivity may play an important role in the pathogenesis of some of the commonest and most disabling features of HD and may have important implications for therapeutics.
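
As a reminder of how a posterior probability such as "> 0.99" arises in Bayesian model comparison, the sketch below converts log model evidences into posterior model probabilities under a prior over models. It is a generic calculation, not the Parametric Empirical Bayes or Bayesian Model Reduction machinery used in the study, and the log-evidence values are invented for illustration.

```python
import math

def posterior_model_probs(log_evidences, log_priors=None):
    """Posterior probability of each model given its log evidence.
    Uses a numerically stable softmax; the prior defaults to uniform."""
    if log_priors is None:
        log_priors = [0.0] * len(log_evidences)
    scores = [le + lp for le, lp in zip(log_evidences, log_priors)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]   # subtract max to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log evidences for three competing connectivity models.
log_evidences = [-1205.0, -1200.0, -1210.0]
probs = posterior_model_probs(log_evidences)
print(probs)
```

With a uniform prior, a log-evidence advantage of about 5 over the nearest competitor is already enough to push the winning model's posterior probability above 0.99.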

Ensemble learning for classifying single-cell data and projection across reference atlases

Bioinformatics ◽

10.1093/bioinformatics/btaa137 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3585-3587

Author(s):

Lin Wang ◽

Francisca Catalan ◽

Karin Shamardani ◽

Husam Babikir ◽

Aaron Diaz

Keyword(s):

Single Cell ◽

Cell Types ◽

Status Quo ◽

Supplementary Information ◽

Published Data ◽

Supplementary Data ◽

Cell Type ◽

Low Sensitivity ◽

Project Data ◽

Cell Data

Summary: Single-cell data are being generated at an accelerating pace. How best to project data across single-cell atlases is an open problem. We developed a boosted learner that overcomes the greatest challenge with status quo classifiers: low sensitivity, especially when dealing with rare cell types. By comparing novel and published data from distinct scRNA-seq modalities that were acquired from the same tissues, we show that this approach preserves cell-type labels when mapping across diverse platforms.

Availability and implementation: https://github.com/diazlab/ELSA

Supplementary information: Supplementary data are available at Bioinformatics online.

Bayesian Inference for Single-cell Clustering and Imputing

Genomics and Computational Biology ◽

10.18547/gcb.2017.vol3.iss1.e46 ◽

2017 ◽

Vol 3 (1) ◽

pp. 46 ◽

Cited By ~ 25

Author(s):

Elham Azizi ◽

Sandhya Prabhakaran ◽

Ambrose Carr ◽

Dana Pe'er

Keyword(s):

Single Cell ◽

Cell Types ◽

Superior Performance ◽

Underlying Structure ◽

Specific Information ◽

Cell Type ◽

Cell Clustering ◽

Bayesian Probabilistic Model ◽

Cell Type Specific ◽

Cell Data

Single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is noise-prone due to experimental errors and cell type-specific biases. Current computational approaches for analyzing single-cell data involve a global normalization step which introduces incorrect biases and spurious noise and does not resolve missing data (dropouts). This can lead to misleading conclusions in downstream analyses. Moreover, a single normalization removes important cell type-specific information. We propose a data-driven model, BISCUIT, that iteratively normalizes and clusters cells, thereby separating noise from interesting biological signals. BISCUIT is a Bayesian probabilistic model that learns cell-specific parameters to intelligently drive normalization. On both synthetic and real single-cell data, this approach outperforms previous methods that apply global normalization followed by clustering, and it allows easy interpretation and recovery of the underlying structure and cell types.

Optimal transport analysis reveals trajectories in steady-state systems

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009466 ◽

2021 ◽

Vol 17 (12) ◽

pp. e1009466

Author(s):

Stephen Zhang ◽

Anton Afanassiev ◽

Laura Greenstreet ◽

Tetsuya Matsumoto ◽

Geoffrey Schiebinger

Keyword(s):

Single Cell ◽

Optimal Transport ◽

Single Cell Analysis ◽

Simulated Data ◽

Unified Approach ◽

Transport Analysis ◽

Time Courses ◽

Cell Trajectories ◽

Cell Data ◽

Natural Way

Understanding how cells change their identity and behaviour in living systems is an important question in many fields of biology. The problem of inferring cell trajectories from single-cell measurements has been a major topic in the single-cell analysis community, with different methods developed for equilibrium and non-equilibrium systems (e.g. haematopoiesis vs. embryonic development). We show that optimal transport analysis, a technique originally designed for analysing time-courses, may also be applied to infer cellular trajectories from a single snapshot of a population in equilibrium. Therefore, optimal transport provides a unified approach to inferring trajectories that is applicable to both stationary and non-stationary systems. Our method, StationaryOT, is mathematically motivated in a natural way from the hypothesis of Waddington's epigenetic landscape. We implement StationaryOT as a software package and demonstrate its efficacy in applications to simulated data as well as single-cell data from Arabidopsis thaliana root development.
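
The basic building block of optimal transport analysis, an entropy-regularized transport plan between two cell populations, can be computed with Sinkhorn iterations as sketched below. This is a generic illustration and does not reproduce the StationaryOT formulation or its software package; the one-dimensional cell positions, the squared-distance cost, and the regularization strength are assumptions.

```python
import math

def sinkhorn(a, b, cost, epsilon=1.0, n_iter=500):
    """Entropy-regularized optimal transport plan between histograms a and b
    for a given cost matrix, computed by Sinkhorn scaling iterations."""
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]   # Gibbs kernel
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iter):
        # Rescale rows, then columns, to match the target marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]

# Hypothetical cells on a 1-D expression axis at two "states".
source = [0.0, 1.0, 2.0]
target = [0.5, 1.5, 2.5]
a = [1.0 / len(source)] * len(source)
b = [1.0 / len(target)] * len(target)
cost = [[(x - y) ** 2 for y in target] for x in source]

plan = sinkhorn(a, b, cost)
for row in plan:
    print([round(p, 4) for p in row])
```

Each row of the plan describes how the mass of one source cell is distributed over the target cells, so normalizing a row gives a transition distribution that can be read as a step along a trajectory.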

Cell type prioritization in single-cell data

Nature Biotechnology ◽

10.1038/s41587-020-0605-1 ◽

2020 ◽

Cited By ~ 2

Author(s):

Michael A. Skinnider ◽

Jordan W. Squair ◽

Claudia Kathe ◽

Mark A. Anderson ◽

Matthieu Gautier ◽

...

Keyword(s):

Single Cell ◽

Cell Type ◽

Cell Data

Mapping single-cell data to reference atlases by transfer learning

Nature Biotechnology ◽

10.1038/s41587-021-01001-7 ◽

2021 ◽

Cited By ~ 2

Author(s):

Mohammad Lotfollahi ◽

Mohsen Naghipourfar ◽

Malte D. Luecken ◽

Matin Khajavi ◽

Maren Büttner ◽

...

Keyword(s):

Single Cell ◽

Transfer Learning ◽

Learning Strategy ◽

De Novo ◽

Specific Cell ◽

Batch Effects ◽

Raw Data ◽

Computational Resources ◽

Cell Data ◽

Biological State

Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
