Learning for single-cell assignment

Bin Duan; Chenyu Zhu; Guohui Chuai; Chen Tang; Xiaohan Chen; Shaoqi Chen; Shaliu Fu; Gaoyang Li; Qi Liu

doi:10.1126/sciadv.abd0855

Science Advances ◽

10.1126/sciadv.abd0855 ◽

2020 ◽

Vol 6 (44) ◽

pp. eabd0855

Author(s):

Bin Duan ◽

Chenyu Zhu ◽

Guohui Chuai ◽

Chen Tang ◽

Xiaohan Chen ◽

...

Keyword(s):

Single Cell ◽

Marker Gene ◽

Cell Types ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Limited Effectiveness ◽

Benchmark Datasets ◽

Cell Assignment ◽

Single Cell Type ◽

Sequencing Data Analysis

Efficient single-cell assignment without prior marker gene annotations is essential for single-cell sequencing data analysis. Current methods, however, have limited effectiveness for distinct single-cell assignment. They fail to generalize well across tasks because of the inherent heterogeneity of single-cell sequencing datasets and single-cell types. Furthermore, current methods are inefficient at identifying novel cell types that are absent from the reference datasets. To this end, we present scLearn, a learning-based framework that automatically infers the quantitative measurement/similarity and threshold to use for different single-cell assignment tasks, achieving well-generalized assignment performance on different single-cell types. We evaluated scLearn on a comprehensive set of publicly available benchmark datasets. We showed that scLearn outperformed comparable existing methods for single-cell assignment in various respects, demonstrating state-of-the-art effectiveness with reliable and generalized single-cell type identification and categorization.
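The idea of assigning a cell by similarity and rejecting it below a threshold can be illustrated with a minimal sketch. This is not scLearn's implementation: scLearn learns its measurement and threshold from data, whereas the sketch below assumes plain cosine similarity, a hand-picked threshold of 0.8, and made-up reference profiles (the names `assign`, `references`, and the numbers are all hypothetical).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign(cell, references, threshold=0.8):
    """Return the label of the most similar reference profile,
    or "unassigned" when even the best match falls below the threshold.

    The fixed threshold is an assumption for illustration only.
    """
    best_label, best_sim = None, -1.0
    for label, profile in references.items():
        sim = cosine(cell, profile)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= threshold else "unassigned"


# Hypothetical reference profiles over three genes.
references = {"T cell": [5.0, 0.0, 1.0], "B cell": [0.0, 4.0, 1.0]}

known = assign([4.0, 0.5, 1.0], references)   # close to the T cell profile
novel = assign([0.0, 0.0, 3.0], references)   # resembles neither reference
```

The rejection branch is what lets a query cell be flagged as a candidate novel type instead of being forced into the nearest reference label.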

484 Bioturing browser: interactively explore public single cell sequencing data

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0484 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A520-A520

Author(s):

Son Pham ◽

Tri Le ◽

Tan Phan ◽

Minh Pham ◽

Huy Nguyen ◽

...

Keyword(s):

Single Cell ◽

Immune Cell ◽

Expression Profiles ◽

Meta Analysis ◽

Cell Types ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Data Formats ◽

Cancer Types ◽

Cell Data

Background: Single-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis, and therapeutic resistance, which facilitate target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a single immuno-oncology study can produce multi-omics profiles for several thousand individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies, reanalyzing individual datasets, and performing meta-analysis. Methods: N/A. Results: We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform currently hosts a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc., so that they can be instantly accessed and explored in a uniform visualization and analytics interface. Based on this curated database, BioTuring Browser supports searching for similar expression profiles, querying a target across datasets, and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com. Conclusions: N/A.

Integrated analysis of bulk multi omic and single-cell sequencing data confirms the molecular origin of hemodynamic changes in Covid-19 infection explaining coagulopathy and higher geriatric mortality

10.1101/2020.04.26.20081182 ◽

2020 ◽

Author(s):

Shreya Johri ◽

Deepali Jain ◽

Ishaan Gupta

Keyword(s):

Gene Expression ◽

Single Cell ◽

Older Patients ◽

Cell Types ◽

Integrated Analysis ◽

Molecular Evidence ◽

Sequencing Data ◽

Phagocytic Cells ◽

Hemodynamic Changes ◽

Single Cell Sequencing

Besides severe respiratory distress, recent reports in Covid-19 patients have found a strong association between platelet counts and patient survival. Along with hemodynamic changes such as prolonged clotting time, high fibrin degradation products and D-dimers, increased levels of monocytes with disturbed morphology have also been identified. In this study, through an integrated analysis of bulk RNA-sequencing data from Covid-19 patients with data from single-cell sequencing studies on lung tissues, we found that most of the cell types that contributed to the altered gene expression were of hematopoietic origin. We also found that differentially expressed genes in Covid-19 patients formed a significant pool of the genes expressed in phagocytic cells such as monocytes and platelets. Interestingly, while we observed a general enrichment for monocytes in Covid-19 patients, we found that the signal for FCGR3A+ monocytes was depleted. Further, we found evidence that age-associated gene expression changes in monocytes and platelets, associated with inflammation, mirror gene expression changes in Covid-19 patients, suggesting that pro-inflammatory signalling during aging may worsen the infection in older patients. We identified more than 20 genes that change in the same direction between Covid-19 infection and aging cells and that may act as potential therapeutic targets. Of particular interest were IL2RG, GNLY and GZMA expressed in platelets, which facilitate cytokine signalling in monocytes through an interaction with platelets. To understand whether infection can directly manipulate the biology of monocytes and platelets, we hypothesize that these non-ACE2-expressing cells may be infected by the virus through the phagocytic route. We observed that phagocytic cells such as monocytes, T-cells, and platelets have a significantly higher expression of genes that are part of the Covid-19 viral interactome. Hence these cell types may have an active rather than a reactive role in viral pathogenesis, manifesting clinical symptoms such as coagulopathy. Therefore, our results present molecular evidence for pursuing both anti-inflammatory and anticoagulation therapy for better patient management, especially in older patients.

Interpretable Autoencoders Trained on Single Cell Sequencing Data Can Transfer Directly to Data from Unseen Tissues

Cells ◽

10.3390/cells11010085 ◽

2021 ◽

Vol 11 (1) ◽

pp. 85

Author(s):

Julie Sparholt Walbech ◽

Savvas Kinalis ◽

Ole Winther ◽

Finn Cilius Nielsen ◽

Frederik Otzen Bagger

Keyword(s):

Single Cell ◽

Cell Types ◽

Biological Pathways ◽

Training Data ◽

Sequencing Data ◽

Data Simulation ◽

Single Cell Sequencing ◽

Saliency Maps ◽

Unseen Data ◽

Biological Concepts

Autoencoders have been used to model single-cell mRNA-sequencing data with the purpose of denoising, visualization, data simulation, and dimensionality reduction. We, and others, have shown that autoencoders can be explainable models and interpreted in terms of biology. Here, we show that such autoencoders can generalize to the extent that they can transfer directly without additional training. In practice, we can extract biological modules, denoise, and classify data correctly from an autoencoder that was trained on a different dataset and with different cells (a foreign model). We deconvoluted the biological signal encoded in the bottleneck layer of scRNA-models using saliency maps and mapped salient features to biological pathways. Biological concepts could be associated with specific nodes and interpreted in relation to biological pathways. Even in this unsupervised framework, with no prior information about cell types or labels, the specific biological pathways deduced from the model were in line with findings in previous research. It was hypothesized that autoencoders could learn and represent meaningful biology; here, we show with a systematic experiment that this is true and even transcends the training data. This means that carefully trained autoencoders can be used to assist the interpretation of new unseen data.

Accurate and fast cell marker gene identification with COSG

10.1101/2021.06.15.448484 ◽

2021 ◽

Author(s):

Min Dai ◽

Xiaobing Pei ◽

Xiu-Jie Wang

Keyword(s):

Single Cell ◽

Marker Gene ◽

Cell Types ◽

Superior Performance ◽

Gene Identification ◽

Marker Genes ◽

Sequencing Data ◽

Cell Type Specificity ◽

Spatially Resolved ◽

Downstream Analysis

Accurate cell classification is the groundwork for downstream analysis of single-cell sequencing data, yet identifying marker genes that distinguish different cell types remains a major challenge. We developed COSG, a cosine similarity-based method for more accurate and scalable marker gene identification. COSG is applicable to single-cell RNA sequencing data, single-cell ATAC sequencing data and spatially resolved transcriptome data. COSG is fast and scales to ultra-large datasets with millions of cells. Application to both simulated and real experimental datasets demonstrates the superior performance of COSG in both accuracy and efficiency compared with other available methods. Marker genes or genomic regions identified by COSG are more indicative and have greater cell-type specificity.
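To make the cosine-similarity idea concrete, here is a minimal sketch that ranks genes by how closely their expression across cells matches a binary indicator of group membership. It is a simplified illustration, not COSG's actual scoring function or API; the function name `rank_markers`, the data layout, and the toy expression values are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_markers(expr, labels, group):
    """Rank genes as candidate markers for `group`.

    expr   : dict mapping gene name -> list of expression values, one per cell
    labels : list of group labels, one per cell (same order as expr values)
    A gene scores highest when it is expressed in the cells of `group`
    and silent everywhere else.
    """
    indicator = [1.0 if label == group else 0.0 for label in labels]
    scores = {gene: cosine(values, indicator) for gene, values in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: four cells in two hypothetical groups, three hypothetical genes.
labels = ["A", "A", "B", "B"]
expr = {
    "gene1": [5.0, 4.0, 0.0, 0.0],  # specific to group A
    "gene2": [1.0, 1.0, 1.0, 1.0],  # expressed everywhere
    "gene3": [0.0, 0.0, 3.0, 4.0],  # specific to group B
}

markers_a = rank_markers(expr, labels, "A")
markers_b = rank_markers(expr, labels, "B")
```

Because the score depends only on dot products and norms, this kind of ranking can be computed without pairwise statistical tests between groups, which is one plausible reason a cosine-based approach scales well.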

Combination of single cell sequencing data and GWAS summary statistics reveals genetically-influenced liver cell types for primary biliary cholangitis

10.1101/2021.08.18.21262250 ◽

2021 ◽

Author(s):

Bingyu Xiang ◽

Chunyu Deng ◽

Jingjing Li ◽

Shanshan Li ◽

Huifang Zhang ◽

...

Keyword(s):

T Cell ◽

Single Cell ◽

Liver Cell ◽

Cell Types ◽

Liver Cells ◽

Primary Biliary Cholangitis ◽

Summary Statistics ◽

Sequencing Data ◽

Immune Microenvironment ◽

Single Cell Sequencing

Many genome-wide association studies (GWAS) have reported numerous genetic loci significantly associated with primary biliary cholangitis (PBC). However, the effects of genetic determinants on liver cells and their immune microenvironment in PBC remain unclear. We constructed a computational framework that integrates large-scale GWAS summary statistics (N = 13,239) with scRNA-seq data to uncover genetics-modulated liver cell subpopulations for PBC. We found that 29 genes, including ORMDL3, CSNK2B, and DDAH2, were significantly associated with PBC susceptibility. Gene-property analysis revealed that four immune cell types, namely Cst3+ dendritic cells, Chil3+ macrophages, Trbc2+ T cells, and Gzma+ T cells, were significantly enriched for PBC-risk genes. By combining GWAS summary statistics with scRNA-seq data, we identified that cholangiocytes exhibited a notable enrichment for PBC-related genetic association signals. The ORMDL3 gene showed a higher expression proportion in cholangiocytes (22.38%) than in other liver cells. Compared with ORMDL3+ cholangiocytes, ORMDL3- cholangiocytes tended to play important immune-modulatory roles in the etiology of PBC. To the best of our knowledge, this is the first study to integrate human genetic information with single-cell sequencing data to parse genetics-influenced liver cells and their immune microenvironment for PBC risk.

Single-cell transcriptomic analysis of mIHC images via antigen mapping

Science Advances ◽

10.1126/sciadv.abc5464 ◽

2021 ◽

Vol 7 (10) ◽

pp. eabc5464

Author(s):

Kiya W. Govek ◽

Emma C. Troisi ◽

Zhen Miao ◽

Rachael G. Aubin ◽

Steven Woodhouse ◽

...

Keyword(s):

Single Cell ◽

Spatial Patterns ◽

Cell Types ◽

Level Of Detail ◽

Cell Populations ◽

Sequencing Data ◽

Spatially Resolved ◽

Murine Spleen ◽

Single Cell Rna Sequencing ◽

Antibody Panel

Highly multiplexed immunohistochemistry (mIHC) enables the staining and quantification of dozens of antigens in a tissue section with single-cell resolution. However, annotating cell populations that differ little in the profiled antigens or for which the antibody panel does not include specific markers is challenging. To overcome this obstacle, we have developed an approach for enriching mIHC images with single-cell RNA sequencing data, building upon recent experimental procedures for augmenting single-cell transcriptomes with concurrent antigen measurements. Spatially-resolved Transcriptomics via Epitope Anchoring (STvEA) performs transcriptome-guided annotation of highly multiplexed cytometry datasets. It increases the level of detail in histological analyses by enabling the systematic annotation of nuanced cell populations, spatial patterns of transcription, and interactions between cell types. We demonstrate the utility of STvEA by uncovering the architecture of poorly characterized cell types in the murine spleen using published cytometry and mIHC data of this organ.

Single-cell data clustering based on sparse optimization and low-rank matrix factorization

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab098 ◽

2021 ◽

Author(s):

Yinlei Hu ◽

Bin Li ◽

Falai Chen ◽

Kun Qu

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Matrix Factorization ◽

Data Clustering ◽

Cell Types ◽

Low Rank ◽

Sequencing Data ◽

Rank Matrix ◽

Single Cell Rna Sequencing ◽

Low Rank Matrix

Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This issue has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.

Using single-cell cytometry to illustrate integrated multi-perspective evaluation of clustering algorithms using Pareto fronts

Bioinformatics ◽

10.1093/bioinformatics/btab038 ◽

2021 ◽

Author(s):

Givanna H Putri ◽

Irena Koprinska ◽

Thomas M Ashhurst ◽

Nicholas J C King ◽

Mark N Read

Keyword(s):

Single Cell ◽

Performance Metrics ◽

Clustering Algorithms ◽

Latin Hypercube Sampling ◽

Supplementary Information ◽

Sequencing Data ◽

Evaluation Protocol ◽

Benchmark Datasets ◽

Pareto Fronts ◽

Parameter Values

Motivation: Many ‘automated gating’ algorithms now exist to cluster cytometry and single-cell sequencing data into discrete populations. Comparative algorithm evaluations on benchmark datasets rely either on a single performance metric, or a few metrics considered independently of one another. However, single metrics emphasize different aspects of clustering performance and do not rank clustering solutions in the same order. This underlies the lack of consensus between comparative studies regarding optimal clustering algorithms and undermines the translatability of results onto other non-benchmark datasets. Results: We propose the Pareto fronts framework as an integrative evaluation protocol, wherein individual metrics are instead leveraged as complementary perspectives. Judged superior are algorithms that provide the best trade-off between the multiple metrics considered simultaneously. This yields a more comprehensive and complete view of clustering performance. Moreover, by broadly and systematically sampling algorithm parameter values using the Latin Hypercube sampling method, our evaluation protocol minimizes (un)fortunate parameter value selections as confounding factors. Furthermore, it reveals how meticulously each algorithm must be tuned in order to obtain good results, vital knowledge for users with novel data. We exemplify the protocol by conducting a comparative study between three clustering algorithms (ChronoClust, FlowSOM and Phenograph) using four common performance metrics applied across four cytometry benchmark datasets. To our knowledge, this is the first time Pareto fronts have been used to evaluate the performance of clustering algorithms in any application domain. Availability and implementation: Implementation of our Pareto front methodology and all scripts and datasets to reproduce this article are available at https://github.com/ghar1821/ParetoBench. Supplementary information: Supplementary data are available at Bioinformatics online.
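The Pareto front itself can be sketched in a few lines: a clustering run is on the front when no other run is at least as good on every metric and strictly better on at least one. The sketch below is an illustration under stated assumptions, not code from the ParetoBench repository; the run names and metric values are invented, and every metric is assumed to be "higher is better".

```python
def dominates(a, b):
    """True if score tuple `a` is at least as good as `b` on every metric
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(solutions):
    """Return the sorted names of solutions that no other solution dominates.

    solutions : dict mapping run name -> tuple of metric values
    """
    front = []
    for name, score in solutions.items():
        dominated = any(
            dominates(other_score, score)
            for other_name, other_score in solutions.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical clustering runs scored on two metrics.
runs = {
    "run_a": (0.9, 0.6),
    "run_b": (0.7, 0.8),
    "run_c": (0.6, 0.5),  # worse than run_a on both metrics
    "run_d": (0.9, 0.5),  # ties run_a on the first metric, loses on the second
}

front = pareto_front(runs)
```

Neither `run_a` nor `run_b` beats the other on both metrics, so both stay on the front; this is the trade-off view that a single-metric ranking would hide.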
