Automated population identification and sorting algorithms for high-dimensional single-cell data

2016 ◽  
Author(s):  
Benedict Anchang ◽  
Sylvia K. Plevritis

Cell sorting or gating homogeneous subpopulations from single-cell data enables cell-type-specific characterization, such as cell-type genomic profiling as well as the study of tumor progression. This highlight summarizes recently developed automated gating algorithms that are optimized for both population identification and sorting homogeneous single cells in heterogeneous single-cell data. Data-driven gating strategies identify and/or sort homogeneous subpopulations from a heterogeneous population without relying on expert knowledge, thereby removing human bias and variability. We further describe an optimized cell sorting strategy called CCAST, based on Clustering, Classification and Sorting Trees, which identifies the relevant gating markers, gating hierarchy and partitions that define underlying cell subpopulations. CCAST identifies more homogeneous subpopulations in several applications compared to prior sorting strategies and reveals simultaneous intracellular signaling across different lineage subtypes under different experimental conditions.
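To make the idea of a data-driven gate concrete, the following is a minimal sketch of a single-marker threshold search (a one-level decision tree) that separates two previously clustered groups of cells. It is not the CCAST implementation: the function name `best_gate`, the marker values and the cluster labels are all hypothetical, and a full gating strategy would search over many markers and build a hierarchy of such splits.

```python
from typing import List, Tuple


def best_gate(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) of the best single-marker gate.

    Cells with a marker value above the threshold are assigned to
    cluster 1, all other cells to cluster 0.
    """
    pairs = sorted(zip(values, labels))
    best_thr, best_acc = pairs[0][0], 0.0
    # Candidate thresholds are midpoints between consecutive sorted values.
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        thr = (v1 + v2) / 2
        correct = sum((v > thr) == bool(lab) for v, lab in pairs)
        acc = correct / len(pairs)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc


# Hypothetical marker intensities and cluster labels for six cells.
marker = [0.2, 0.4, 0.5, 2.1, 2.4, 2.8]
cluster = [0, 0, 0, 1, 1, 1]
threshold, accuracy = best_gate(marker, cluster)
```

Here the gate falls between the two groups of intensities, so every cell is assigned to its own cluster.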

2021 ◽  
Author(s):  
Belinda Phipson ◽  
Choon Boon Sim ◽  
Enzo R. Porrello ◽  
Alex W Hewitt ◽  
Joseph Powell ◽  
...  

Single-cell RNA sequencing (scRNA-seq) has rapidly gained popularity over the last few years for profiling the transcriptomes of thousands to millions of single cells. To date, more than a thousand software packages have been developed to analyse scRNA-seq data. These focus predominantly on visualization, dimensionality reduction and cell type identification. Single-cell technology is now being used to analyse experiments with complex designs including biological replication. One question that can be asked of single-cell experiments, and that could not be addressed with bulk RNA-seq data, is whether the cell type proportions differ between two or more experimental conditions. As well as gene expression changes, the relative depletion or enrichment of a particular cell type can be the functional consequence of disease or treatment. However, cell type proportion estimates from scRNA-seq data are variable, and statistical methods that correctly account for different sources of variability are needed to confidently identify statistically significant shifts in cell type composition between experimental conditions. We present propeller, a robust and flexible method that leverages biological replication to find statistically significant differences in cell type proportions between groups. The propeller method is publicly available in the open source speckle R package (https://github.com/Oshlack/speckle).
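The kind of comparison described here can be sketched in a few lines: compute the proportion of a cell type in each biological replicate, then compare the replicate-level proportions between groups. The sketch below is not the propeller method and does not use the speckle API. As stated assumptions, it applies an arcsine square-root transform to the proportions and a plain Welch t-statistic, and all sample data and function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, variance
from typing import Dict, List


def proportions(cell_types: List[str]) -> Dict[str, float]:
    """Fraction of cells of each type within one sample."""
    counts = Counter(cell_types)
    total = sum(counts.values())
    return {ct: n / total for ct, n in counts.items()}


def transformed(samples: List[List[str]], cell_type: str) -> List[float]:
    """Arcsine square-root transformed proportion of one cell type per sample."""
    return [
        math.asin(math.sqrt(proportions(s).get(cell_type, 0.0)))
        for s in samples
    ]


def welch_t(group_a: List[float], group_b: List[float]) -> float:
    """Welch t-statistic for the difference in means (a minus b)."""
    se = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical cell type labels for three replicates per condition.
control = [["T"] * 30 + ["B"] * 70, ["T"] * 35 + ["B"] * 65, ["T"] * 28 + ["B"] * 72]
treated = [["T"] * 55 + ["B"] * 45, ["T"] * 60 + ["B"] * 40, ["T"] * 50 + ["B"] * 50]

t_stat = welch_t(transformed(treated, "T"), transformed(control, "T"))
```

Because the test operates on one value per replicate rather than one value per cell, the variability between replicates enters the standard error directly, which is the point of leveraging biological replication.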


2017 ◽  
Vol 3 (1) ◽  
pp. 46 ◽  
Author(s):  
Elham Azizi ◽  
Sandhya Prabhakaran ◽  
Ambrose Carr ◽  
Dana Pe'er

Single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is noise-prone due to experimental errors and cell type-specific biases. Current computational approaches for analyzing single-cell data involve a global normalization step which introduces incorrect biases and spurious noise and does not resolve missing data (dropouts). This can lead to misleading conclusions in downstream analyses. Moreover, a single normalization removes important cell type-specific information. We propose a data-driven model, BISCUIT, that iteratively normalizes and clusters cells, thereby separating noise from interesting biological signals. BISCUIT is a Bayesian probabilistic model that learns cell-specific parameters to intelligently drive normalization. On both synthetic and real single-cell data, this approach outperforms previous methods that apply global normalization followed by clustering, and it allows easy interpretation and recovery of the underlying structure and cell types.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yafei Lyu ◽  
Randy Zauhar ◽  
Nicholas Dana ◽  
Christianne E. Strang ◽  
Jian Hu ◽  
...  

Age-related macular degeneration (AMD) is a blinding eye disease with no unifying theme for its etiology. We used single-cell RNA sequencing to analyze the transcriptomes of ~93,000 cells from the macula and peripheral retina from two adult human donors and bulk RNA sequencing from fifteen adult human donors with and without AMD. Analysis of our single-cell data identified 267 cell-type-specific genes. Comparison of macula and peripheral retinal regions found no cell-type differences but did identify 50 differentially expressed genes (DEGs) with about 1/3 expressed in cones. Integration of our single-cell data with bulk RNA sequencing data from normal and AMD donors showed compositional changes more pronounced in macula in rods, microglia, endothelium, Müller glia, and astrocytes in the transition from normal to advanced AMD. KEGG pathway analysis of our normal vs. advanced AMD eyes identified enrichment in complement and coagulation pathways, antigen presentation, tissue remodeling, and signaling pathways including PI3K-Akt, NOD-like, Toll-like, and Rap1. These results showcase the use of single-cell RNA sequencing to infer cell-type compositional and cell-type-specific gene expression changes in intact bulk tissue and provide a foundation for investigating molecular mechanisms of retinal disease that lead to new therapeutic targets.


2018 ◽  
Author(s):  
Elior Rahmani ◽  
Regev Schweiger ◽  
Brooke Rhead ◽  
Lindsey A. Criswell ◽  
Lisa F. Barcellos ◽  
...  

High costs and technical limitations of cell sorting and single-cell techniques currently restrict the collection of large-scale, cell-type-specific DNA methylation data. This, in turn, impedes our ability to tackle key biological questions that pertain to variation within a population, such as identification of disease-associated genes at a cell-type-specific resolution. Here, we show mathematically and empirically that cell-type-specific methylation levels of an individual can be learned from its tissue-level bulk data, conceptually emulating the case where the individual has been profiled at single-cell resolution and the signals were then aggregated in each cell population separately. Provided with this unprecedented way to perform powerful large-scale epigenetic studies with cell-type-specific resolution, we revisit previous studies with tissue-level bulk methylation and reveal novel associations with leukocyte composition in blood and with rheumatoid arthritis. For the latter, we further show consistency with validation data collected from sorted leukocyte sub-types. Corresponding software is available from: https://github.com/cozygene/TCA.
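The relationship between bulk and cell-type-specific signals can be illustrated with a small forward-direction sketch: a tissue-level methylation value at one site is treated as the proportion-weighted sum of the cell-type-specific levels. This shows only the mixing assumption, not the TCA software or its estimation procedure, which works in the reverse direction. The function name `bulk_level` and all numbers are hypothetical.

```python
from typing import List


def bulk_level(proportions: List[float], cell_type_levels: List[float]) -> float:
    """Bulk methylation at one site as a proportion-weighted mixture."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("cell type proportions must sum to 1")
    return sum(w * z for w, z in zip(proportions, cell_type_levels))


# Hypothetical individual: three leukocyte sub-types and one CpG site.
weights = [0.6, 0.3, 0.1]   # cell type proportions in the blood sample
levels = [0.2, 0.8, 0.5]    # cell-type-specific methylation levels
bulk = bulk_level(weights, levels)
```

Recovering `levels` for each individual from many bulk observations and known or estimated `weights` is the harder inverse problem that the abstract addresses.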


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Nathan D Lawson ◽  
Rui Li ◽  
Masahiro Shin ◽  
Ann Grosse ◽  
Onur Yukselen ◽  
...  

The zebrafish is ideal for studying embryogenesis and is increasingly applied to model human disease. In these contexts, RNA-sequencing (RNA-seq) provides mechanistic insights by identifying transcriptome changes between experimental conditions. Application of RNA-seq relies on accurate transcript annotation for a genome of interest. Here, we find discrepancies in analysis from RNA-seq datasets quantified using Ensembl and RefSeq zebrafish annotations. These issues were due, in part, to variably annotated 3' untranslated regions and thousands of gene models missing from each annotation. Since these discrepancies could compromise downstream analyses and biological reproducibility, we built a more comprehensive zebrafish transcriptome annotation that addresses these deficiencies. Our annotation improves detection of cell type-specific genes in both bulk and single cell RNA-seq datasets, where it also improves resolution of cell clustering. Thus, we demonstrate that our new transcriptome annotation can outperform existing annotations, providing an important resource for zebrafish researchers.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Elior Rahmani ◽  
Regev Schweiger ◽  
Brooke Rhead ◽  
Lindsey A. Criswell ◽  
Lisa F. Barcellos ◽  
...  

2018 ◽  
Author(s):  
Vera Zywitza ◽  
Aristotelis Misios ◽  
Lena Bunatyan ◽  
Thomas E. Willnow ◽  
Nikolaus Rajewsky

Neural stem cells (NSCs) contribute to plasticity and repair of the adult brain. Niches harboring NSCs are crucial for regulating stem cell self-renewal and differentiation. We used single-cell RNA profiling to generate an unbiased molecular atlas of all cell types in the largest neurogenic niche of the adult mouse brain, the subventricular zone (SVZ). We characterized >20 neural and non-neural cell types and gained insights into the dynamics of neurogenesis by predicting future cell states based on computational analysis of RNA kinetics. Furthermore, we apply our single-cell approach to mice lacking LRP2, an endocytic receptor required for SVZ maintenance. The number of NSCs and proliferating progenitors was significantly reduced. Moreover, Wnt and BMP4 signaling was perturbed. We provide a valuable resource for adult neurogenesis, insights into SVZ neurogenesis regulation by LRP2, and a proof-of-principle demonstrating the power of single-cell RNA-seq in pinpointing neural cell type-specific functions in loss-of-function models.

Highlights:
Unbiased single-cell transcriptomics characterizes adult NSCs and their niche
Cell type-specific signatures and marker genes for 22 SVZ cell types
Free online tool to assess gene expression across 9,804 single cells
Cell type-specific dysfunctions underlying impaired adult neurogenesis


2021 ◽  
Author(s):  
Wancen Mu ◽  
Hirak Sarkar ◽  
Avi Srivastava ◽  
Kwangbom Choi ◽  
Rob Patro ◽  
...  

Motivation: Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial-, or time-dependent AI signals may be dampened or not detected. Results: We introduce a statistical method, airpart, for identifying differential CTS AI from single-cell RNA-sequencing (scRNA-seq) data, or other spatially- or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower RMSE of allelic ratio estimates than existing methods. In real data, airpart identified differential AI patterns across cell states and could be used to define trends of AI signal over spatial or time axes. Availability: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.
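The notion of partitioning cell types by their allelic ratio can be illustrated with a deliberately simplified sketch. It pools allele counts within each cell type to cope with low per-cell counts, then groups cell types whose pooled ratios lie within a tolerance of their neighbour. This greedy grouping is a crude stand-in chosen for brevity: it is neither the Generalized Fused Lasso nor the hierarchical Bayesian model used by airpart, and it does not use the airpart API. All names, counts and the tolerance are hypothetical.

```python
from typing import Dict, List, Tuple


def pooled_ratio(counts: List[Tuple[int, int]]) -> float:
    """Pooled allelic ratio from (allele_1_reads, total_reads) per cell.

    Assumes the cell type has at least one read in total.
    """
    a1 = sum(a for a, _ in counts)
    total = sum(t for _, t in counts)
    return a1 / total


def partition(ratios: Dict[str, float], tol: float) -> List[List[str]]:
    """Group cell types, ordered by ratio, when neighbours differ by <= tol."""
    ordered = sorted(ratios, key=ratios.get)
    groups = [[ordered[0]]]
    for cell_type in ordered[1:]:
        if ratios[cell_type] - ratios[groups[-1][-1]] <= tol:
            groups[-1].append(cell_type)
        else:
            groups.append([cell_type])
    return groups


# Hypothetical allele counts for one gene in three cell types.
cells = {
    "type_A": [(5, 10), (6, 10), (4, 10)],
    "type_B": [(5, 10), (6, 12)],
    "type_C": [(9, 10), (8, 10)],
}
ratios = {ct: pooled_ratio(c) for ct, c in cells.items()}
groups = partition(ratios, tol=0.1)
```

In this toy input, type_A and type_B both have a balanced pooled ratio and end up in one group, while type_C shows imbalance and forms its own group.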


2021 ◽  
Author(s):  
Lei Xiong ◽  
Kang Tian ◽  
Yuzhe Li ◽  
Qiangfeng Cliff Zhang

Single-cell RNA-seq and ATAC-seq analyses have been widely applied to decipher cell-type and regulation complexities. However, experimental conditions often confound biological variations when comparing data from different samples. For integrative single-cell data analysis, we have developed SCALEX, a deep generative framework that maps cells into a generalized, batch-invariant cell-embedding space. We demonstrate that SCALEX accurately and efficiently integrates heterogeneous single-cell data using multiple benchmarks. It outperforms competing methods, especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We demonstrate the advantages of SCALEX by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19, which were assembled from multiple data sources and can keep growing through the inclusion of new incoming data. Analyses based on these atlases revealed the complex cellular landscapes of human and mouse tissues and identified multiple peripheral immune subtypes associated with COVID-19 disease severity.

