SCMAG: A Semisupervised Single-Cell Clustering Method Based on Matrix Aggregation Graph Convolutional Neural Network

Computational and Mathematical Methods in Medicine, Vol 2021, pp. 1-6 (2021). DOI: 10.1155/2021/6842752

Author(s): Wenliang Gao, Yuanyuan Li, Chujie Fang, Wei Fan, Haonan Peng

Keyword(s): Neural Network, Convolutional Neural Network, Single Cell, Prior Information, Clustering Algorithm, Cell Types, Clustering Method, Cell Clustering, Semisupervised Clustering, Cell Data

Clustering analysis is one of the most important techniques for single-cell data mining. It is widely used to divide different gene sequences, identify functional genes, and detect new cell types. Although traditional unsupervised clustering methods do not require labeled data, their effectiveness is affected by the distribution of the original data, the choice of hyperparameters, and other factors. In some cases, however, the types of some cells are already known, and making full use of this prior information can yield higher accuracy. In this study, we propose SCMAG (a semisupervised single-cell clustering method based on a matrix aggregation graph convolutional neural network), which takes full account of the prior information available for single-cell data. To evaluate the proposed semisupervised clustering method, we test it on different single-cell datasets and compare it with current semisupervised clustering algorithms in recognizing cell types on various real scRNA-seq datasets; the results show that SCMAG is a more accurate model.
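The abstract does not describe SCMAG's matrix-aggregation architecture, so the sketch below illustrates only the general semisupervised graph-convolution idea it builds on, using a swapped-in technique: a simplified graph convolution (SGC-style), in which cell features are propagated over a symmetrically normalized cell graph and a softmax classifier is then trained only on the cells whose types are known. The k-nearest-neighbour cell graph, the number of propagation hops, the learning settings, and the names `knn_graph` and `sgc_semisupervised` are all hypothetical choices, not details from the paper.

```python
import numpy as np

def knn_graph(X, k=5):
    """Symmetric k-nearest-neighbour adjacency between cells (hypothetical graph choice)."""
    n = len(X)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean distances
    np.fill_diagonal(d, np.inf)                               # exclude self-matches
    nbrs = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                                 # symmetrize

def normalized_adjacency(A):
    """D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def sgc_semisupervised(X, labels, mask, n_classes, k=5, hops=2, lr=0.5, epochs=200):
    """Propagate features over the cell graph, then fit a softmax classifier on labelled cells.

    labels: int array; entries where mask is False are ignored.
    mask:   bool array marking cells whose type is known (the prior information).
    """
    S = normalized_adjacency(knn_graph(X, k))
    H = X.astype(float)
    for _ in range(hops):
        H = S @ H                                       # graph convolution without nonlinearity
    H = (H - H.mean(axis=0)) / (H.std(axis=0) + 1e-8)   # standardize each feature
    W = np.zeros((H.shape[1], n_classes))
    b = np.zeros(n_classes)
    H_lab = H[mask]
    Y = np.eye(n_classes)[labels[mask]]                 # one-hot targets for labelled cells
    for _ in range(epochs):
        grad = (softmax(H_lab @ W + b) - Y) / len(H_lab)  # cross-entropy gradient w.r.t. logits
        W -= lr * H_lab.T @ grad
        b -= lr * grad.sum(axis=0)
    return softmax(H @ W + b).argmax(axis=1)            # predicted type for every cell
```

Only the labelled rows enter the loss, but every cell's features have already been mixed with its neighbours', which is how the known cell types spread to unlabelled cells.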

Identifying cell types from single-cell data based on similarities and dissimilarities between cells

BMC Bioinformatics, Vol 22 (S3) (2021). DOI: 10.1186/s12859-020-03873-z

Author(s): Yuanyuan Li, Ping Luo, Yi Lu, Fang-Xiang Wu

Keyword(s): Gene Expression, Single Cell, Spectral Clustering, Incidence Matrix, Expression Patterns, Cell Types, Clustering Method, Different Types, Cell Data, Spectral Clustering Method

Background: With the development of single-cell sequencing technology, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, clustering cells into types becomes more complex because different cell types overlap and gene expression is unstable. One way to overcome this problem is to group similar, related cells together using clustering analysis methods. Although some methods, such as spectral clustering, perform well at identifying cell types, they consider only the similarities between cells and ignore the influence of dissimilarities on the clustering results. This may limit the performance of most conventional clustering algorithms in identifying clusters, so special methods are needed for high-dimensional, sparse categorical data.

Results: Inspired by the observation that cells of the same type have similar gene expression patterns, whereas cells of different types show dissimilar patterns, we improve the existing spectral clustering method for single-cell data so that it uses both the similarities and the dissimilarities between cells. The method first measures the similarity and dissimilarity among cells, then constructs an incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and finally uses the eigendecomposition of the incidence matrix to reduce dimensionality, applying the K-means algorithm in the low-dimensional space to obtain the clusters. The improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets.

Conclusions: In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
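The pipeline described in Results (similarity and dissimilarity matrices, fusion into an incidence matrix, eigendecomposition, then K-means) can be sketched as follows. The abstract does not give the exact measures or the fusion rule, so this sketch assumes a Gaussian-kernel similarity with a median-distance bandwidth, a max-scaled Euclidean dissimilarity, and a hypothetical fusion `sim - alpha * dis`; the function names and the `alpha` default are illustrative only.

```python
import numpy as np

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's K-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])              # farthest point from the chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def fused_spectral_clustering(X, k, alpha=0.1, sigma=None):
    """Spectral clustering on a fused similarity/dissimilarity matrix (hypothetical fusion rule)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    if sigma is None:
        sigma = np.median(dist[~np.eye(n, dtype=bool)])   # median-distance bandwidth heuristic
    sim = np.exp(-dist ** 2 / (2 * sigma ** 2))          # similarity in (0, 1]
    dis = dist / dist.max()                               # dissimilarity in [0, 1]
    fused = sim - alpha * dis                             # assumed fusion: penalize dissimilar pairs
    _, eigvecs = np.linalg.eigh(fused)                    # eigenvalues come in ascending order
    emb = eigvecs[:, -k:]                                 # eigenvectors of the k largest eigenvalues
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    return kmeans(emb, k)
```

Subtracting a multiple of the dissimilarity lowers the weight between cells that are far apart, which is one simple way to let dissimilarity shape the embedding; the published method may combine the two matrices differently.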

Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data

BMC Bioinformatics, Vol 21 (1) (2020). DOI: 10.1186/s12859-020-03797-8

Author(s): Chunxiang Wang, Xin Gao, Juntao Liu

Keyword(s): Single Cell, Clustering Algorithm, Clustering Algorithms, Data Preprocessing, Cell Types, RNA-Seq, Cell Type, Preprocessing Method, Cell Clustering, Cell Gene Expression

Background: Advances in single-cell RNA-seq technology have created great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods have quite different effects on clustering algorithms. Moreover, no single preprocessing method is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data.

Results: We designed a graph-based algorithm, SC3-e, specifically to identify the best data preprocessing method for SC3, which is currently the most widely used algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq datasets, SC3-e always accurately selected the best data preprocessing method for SC3 and therefore greatly improved the clustering performance of SC3.

Conclusion: The SC3-e algorithm is practically effective at identifying the best data preprocessing method and therefore substantially improves the cell-type clustering performance of SC3. It is expected to play a crucial role in related single-cell clustering studies, such as studies of complex human diseases and the discovery of new cell types.

HiCluster: A Robust Single-Cell Hi-C Clustering Method Based on Convolution and Random Walk

DOI: 10.1101/506717 (2018). Cited by ~2

Author(s): Jingtian Zhou, Jianzhu Ma, Yusi Chen, Chuankai Cheng, Bokan Bao, ...

Keyword(s): Random Walk, Single Cell, Clustering Algorithm, Single Cell Analysis, Single Cells, Genome Structure, Real Data, Cell Types, 3D Genome, Cell Clustering

3D genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe HiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real data as benchmarks, HiCluster significantly improves clustering accuracy when applied to low coverage Hi-C datasets compared to existing methods. After imputation by HiCluster, structures similar to topologically associating domains (TADs) could be identified within single cells, and their consensus boundaries among cells were enriched at the TAD boundaries observed in bulk samples. In summary, HiCluster facilitates visualization and comparison of single-cell 3D genomes.

Bayesian Inference for Single-cell Clustering and Imputing

Genomics and Computational Biology, Vol 3 (1), pp. 46 (2017). DOI: 10.18547/gcb.2017.vol3.iss1.e46. Cited by ~25

Author(s): Elham Azizi, Sandhya Prabhakaran, Ambrose Carr, Dana Pe'er

Keyword(s): Single Cell, Cell Types, Superior Performance, Underlying Structure, Specific Information, Cell Type, Cell Clustering, Bayesian Probabilistic Model, Cell Type Specific, Cell Data

Single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is noise-prone due to experimental errors and cell type-specific biases. Current computational approaches for analyzing single-cell data involve a global normalization step which introduces incorrect biases and spurious noise and does not resolve missing data (dropouts). This can lead to misleading conclusions in downstream analyses. Moreover, a single normalization removes important cell type-specific information. We propose a data-driven model, BISCUIT, that iteratively normalizes and clusters cells, thereby separating noise from interesting biological signals. BISCUIT is a Bayesian probabilistic model that learns cell-specific parameters to intelligently drive normalization. This approach displays superior performance to global normalization followed by clustering in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.

Comparison Between UMAP and t-SNE for Multiplex-Immunofluorescence Derived Single-Cell Data from Tissue Sections

DOI: 10.1101/549659 (2019). Cited by ~1

Author(s): Duoduo Wu, Joe Yeong Poh Sheng, Grace Tan Su-En, Marion Chevrier, Josh Loh Jie Hua, ...

Keyword(s): Single Cell, Clustering Algorithm, Cell Types, Immune Markers, Tissue Samples, Tissue Sections, Reduced Dimensions, Dimensionality Reduction Technique, Cell Data, Worse Prognosis

Using human hepatocellular carcinoma (HCC) tissue samples stained with seven immune markers, including one nuclear counterstain, we compared and evaluated a new dimensionality reduction technique, Uniform Manifold Approximation and Projection (UMAP), as an alternative to t-Distributed Stochastic Neighbor Embedding (t-SNE) for analysing multiplex-immunofluorescence (mIF) derived single-cell data. We used an unsupervised clustering algorithm, FlowSOM, to identify eight major cell types present in human HCC tissues. UMAP and t-SNE were run independently on the dataset to qualitatively compare the distribution of the clustered cell types in both reduced dimensions. Our comparison shows that UMAP has a shorter runtime. Both techniques produce similar arrangements of cell clusters, the key difference being UMAP's extensive characteristic branching. Most interestingly, UMAP's branching was able to highlight biological lineages, especially in identifying potential hybrid tumour cells (HTC). Survival analysis shows that patients with a higher proportion of HTC have a worse prognosis (p-value = 0.019). We conclude that both techniques have similar visualisation capabilities, but UMAP has a clear runtime advantage over t-SNE, making UMAP a highly plausible alternative to t-SNE in mIF data analysis.

Robust single-cell Hi-C clustering by convolution- and random-walk–based imputation

Proceedings of the National Academy of Sciences, Vol 116 (28), pp. 14011-14018 (2019). DOI: 10.1073/pnas.1901423116. Cited by ~14

Author(s): Jingtian Zhou, Jianzhu Ma, Yusi Chen, Chuankai Cheng, Bokan Bao, ...

Keyword(s): Random Walk, Single Cell, Clustering Algorithm, Single Cell Analysis, Single Cells, Genome Structure, Three Dimensional, Cell Types, Cell Clustering, Low Coverage

Three-dimensional genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe scHiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real single-cell Hi-C data as benchmarks, scHiCluster significantly improves clustering accuracy when applied to low coverage datasets compared with existing methods. After imputation by scHiCluster, topologically associating domain (TAD)-like structures (TLSs) can be identified within single cells, and their consensus boundaries were enriched at the TAD boundaries observed in bulk cell Hi-C samples. In summary, scHiCluster facilitates visualization and comparison of single-cell 3D genomes.
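The abstract names two imputation steps, linear convolution and random walk. A minimal sketch, assuming a mean-filter convolution over neighboring bins and a random walk with restart of the form Q ← (1 - r)QB + rI (B being the row-normalized contact graph), might look like the following. The window size, the restart probability `restart`, the added self-loops, and the function names are hypothetical choices rather than scHiCluster's published settings, and the subsequent clustering step is omitted.

```python
import numpy as np

def convolve_contacts(M, pad=1):
    """Mean filter over a (2*pad+1) x (2*pad+1) window (window size is an assumption)."""
    n = M.shape[0]
    P = np.pad(M.astype(float), pad)       # zero-pad the matrix borders
    w = 2 * pad + 1
    out = np.zeros((n, n))
    for di in range(w):
        for dj in range(w):
            out += P[di:di + n, dj:dj + n]  # summing shifted copies gives the window sum
    return out / (w * w)

def random_walk_restart(M, restart=0.5, tol=1e-6, max_iter=1000):
    """Iterate Q <- (1 - r) Q B + r I, with B the row-normalized contact graph."""
    n = M.shape[0]
    A = M + np.eye(n)                       # self-loops keep empty rows well defined (assumption)
    B = A / A.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
    Q = np.eye(n)
    for _ in range(max_iter):
        Q_next = (1 - restart) * Q @ B + restart * np.eye(n)
        if np.abs(Q_next - Q).max() < tol:  # stop once the iteration has converged
            return Q_next
        Q = Q_next
    return Q
```

Each row of Q remains a probability distribution, so bin pairs with no observed contact in a sparse single-cell map (for example, the two ends of a matrix that only links adjacent bins) receive positive imputed values through multi-step walks.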

Improved Spectral Clustering Method for Identifying Cell Types from Single-Cell Data

Intelligent Computing Theories and Application - Lecture Notes in Computer Science, pp. 177-189 (2019). DOI: 10.1007/978-3-030-26969-2_17

Author(s): Yuanyuan Li, Ping Luo, Yi Lu, Fang-Xiang Wu

Keyword(s): Single Cell, Spectral Clustering, Cell Types, Clustering Method, Cell Data, Spectral Clustering Method

Prioritization of cell types responsive to biological perturbations in single-cell data with Augur

Nature Protocols (2021). DOI: 10.1038/s41596-021-00561-x

Author(s): Jordan W. Squair, Michael A. Skinnider, Matthieu Gautier, Leonard J. Foster, Grégoire Courtine

Keyword(s): Single Cell, Cell Types, Cell Data

A robust single cell clustering method based on subspace learning and partial imputation

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2020). DOI: 10.1109/bibm49941.2020.9313478

Author(s): Ruiqing Zheng, Zhenlan Liang, Xiangmao Meng, Yu Tian, Min Li

Keyword(s): Single Cell, Subspace Learning, Clustering Method, Cell Clustering

TPK: a single-cell clustering algorithm based on novel feature selection genes

Journal of Physics: Conference Series, Vol 1738, pp. 012078 (2021). DOI: 10.1088/1742-6596/1738/1/012078

Author(s): Yaxuan Cui, Kunjie Luo, Zheyu Zhang, Saijia Liu

Keyword(s): Feature Selection, Single Cell, Clustering Algorithm, Cell Clustering
