CNLLRR: A Novel Low-Rank Representation Method for Single-cell RNA-seq Data Analysis

bioRxiv ◽

10.1101/818062 ◽

2019 ◽

Author(s):

Na Yu ◽

Jin-Xing Liu ◽

Ying-Lian Gao ◽

Chun-Hou Zheng ◽

Junliang Shang ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Low Rank ◽

Cell Populations ◽

Similarity Matrix ◽

Graph Regularization ◽

Cell Clustering ◽

Low Rank Representation ◽

Negative Laplacian ◽

Cell Data

Abstract

The development of single-cell RNA-sequencing (scRNA-seq) technology has enabled the measurement of gene expression in individual cells, providing an unprecedented opportunity to explore biological mechanisms at the cellular level. However, existing scRNA-seq analysis methods are either susceptible to noise and outliers or ignore the manifold structure inherent in the data. In this paper, a novel method called Cauchy non-negative Laplacian regularized low-rank representation (CNLLRR) is proposed to alleviate these problems. Specifically, CNLLRR uses the Cauchy loss function (CLF) instead of conventional norm constraints on the noise matrix, which enhances the robustness of the method. In addition, a graph regularization term is added to the objective function to capture pairwise geometric relationships between cells. The alternating direction method of multipliers (ADMM) is then adopted to solve the CNLLRR optimization problem. Finally, extensive experiments on scRNA-seq data show that CNLLRR outperforms other state-of-the-art methods in cell clustering, cell visualization and prioritization of gene markers. CNLLRR contributes to understanding the heterogeneity between cell populations in complex biological systems.

Author summary

Analysis of single-cell data can help to further study the heterogeneity and complexity of cell populations. Current analysis methods mainly learn the similarity between cells and then apply a clustering algorithm to the resulting similarity matrix for cell clustering or downstream analysis. Constructing an accurate cell-to-cell similarity matrix is therefore crucial for single-cell data analysis. In this paper, we design a novel Cauchy non-negative Laplacian regularized low-rank representation (CNLLRR) method to obtain a better similarity matrix. Specifically, a Cauchy loss function (CLF) constraint is applied to penalize the noise matrix, which improves the robustness of CNLLRR to noise and outliers. Moreover, a graph regularization term is added to the objective function to effectively encode the local manifold information of the data. Together, these help ensure the quality of the learned cell-to-cell similarity matrix. Finally, single-cell data analysis experiments show that our method is superior to other representative methods.
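
The two ingredients described above can be illustrated with a small sketch. It uses one common form of the Cauchy loss, (c²/2)·log(1 + (e/c)²), whose gradient e/(1 + (e/c)²) is bounded by c/2, so a single gross outlier cannot dominate the fit the way it does under a squared (Frobenius-type) penalty. It also checks the standard identity behind graph regularization, tr(Z L Zᵀ) = ½ Σ w_ij ‖z_i − z_j‖² with L = D − W. The scale c = 1, the toy matrices and all function names are illustrative assumptions, not the CNLLRR implementation or its exact objective.

```python
import numpy as np


def cauchy_loss(e, c=1.0):
    """One common form of the Cauchy loss: (c^2 / 2) * log(1 + (e / c)^2)."""
    return 0.5 * c ** 2 * np.log1p((e / c) ** 2)


def cauchy_grad(e, c=1.0):
    """Derivative of cauchy_loss: e / (1 + (e / c)^2), bounded in magnitude by c / 2."""
    return e / (1.0 + (e / c) ** 2)


def squared_grad(e):
    """Derivative of the squared loss e^2: grows without bound."""
    return 2.0 * e


def laplacian_regularizer(Z, W):
    """tr(Z L Z^T) with L = D - W; columns of Z are cells, W is a symmetric affinity."""
    L = np.diag(W.sum(axis=1)) - W
    return np.trace(Z @ L @ Z.T)


def pairwise_form(Z, W):
    """1/2 * sum_ij w_ij * ||z_i - z_j||^2 over the columns z_i of Z."""
    n = W.shape[0]
    return 0.5 * sum(
        W[i, j] * np.sum((Z[:, i] - Z[:, j]) ** 2) for i in range(n) for j in range(n)
    )


# Influence of a gross outlier residual (e = 10) on each loss.
print("squared-loss gradient:", squared_grad(10.0))   # 20.0
print("Cauchy-loss gradient: ", cauchy_grad(10.0))    # about 0.099

# The Laplacian term penalises differences between cells with high affinity.
rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 5))      # 5 cells, 4-dimensional representations
W = rng.random((5, 5))
W = (W + W.T) / 2                    # make the affinity symmetric
np.fill_diagonal(W, 0.0)
print(np.isclose(laplacian_regularizer(Z, W), pairwise_form(Z, W)))  # True
```

The bounded gradient is what "robust to noise and outliers" means in practice: beyond |e| ≈ c, larger residuals pull on the solution less, not more.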

scGPS: Determining Cell States and Global Fate Potential of Subpopulations

Frontiers in Genetics ◽

10.3389/fgene.2021.666771 ◽

2021 ◽

Vol 12 ◽

Author(s):

Michael Thompson ◽

Maika Matsumoto ◽

Tianqi Ma ◽

Anne Senabouth ◽

Nathan J. Palpant ◽

...

Keyword(s):

Single Cell ◽

Cell Populations ◽

Driver Genes ◽

Mixed Cell ◽

Machine Learning Classification ◽

Cell Clustering ◽

Novel Approach ◽

User Friendly ◽

Cell Data ◽

Selection Of

Finding cell states and their transcriptional relatedness is a key outcome of analysing single-cell data. In developmental biology, determining whether cells are related in a differentiation lineage remains a major challenge, and a seamless analysis pipeline from cell clustering to estimating the probability of transitions between cell clusters is lacking. Here, we present Single Cell Global fate Potential of Subpopulations (scGPS) to characterise transcriptional relationships between cell states. scGPS decomposes mixed cell populations in one or more samples into clusters (SCORE algorithm) and estimates the transition potential between any pair of clusters (scGPS algorithm). SCORE allows the assessment and selection of stable clustering results, a major challenge in clustering analysis. scGPS implements a novel approach that uses machine learning classification to flexibly construct trajectory connections between clusters. scGPS also provides feature selection, using network and modelling approaches, to find biological processes and driver genes that connect cell populations. We applied scGPS in diverse developmental contexts and show superior results compared with a range of clustering and trajectory analysis methods. scGPS identifies the dynamics of cellular plasticity in a fast, memory-efficient and user-friendly workflow. scGPS is implemented in R, with functions optimised in C++, and is publicly available in Bioconductor.
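
The idea of scoring the relatedness of two clusters with a classifier can be shown with a deliberately simplified sketch. A classifier is trained to recognise a source cluster against other cells, it is applied to the cells of a target cluster, and the fraction of target cells assigned to the source is reported. Here a nearest-centroid rule stands in for the machine learning models scGPS actually uses; `transition_score` and the toy data are hypothetical, this is not the scGPS R/Bioconductor API, and the SCORE clustering step is not covered.

```python
import math


def centroid(cells):
    """Mean expression vector of a list of cells (each a list of floats)."""
    n = len(cells)
    return [sum(values) / n for values in zip(*cells)]


def distance(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transition_score(source, others, target):
    """Hypothetical score: fraction of `target` cells that a nearest-centroid
    classifier, trained on `source` versus `others`, labels as `source`."""
    c_source = centroid(source)
    c_other = centroid(others)
    hits = sum(1 for cell in target if distance(cell, c_source) < distance(cell, c_other))
    return hits / len(target)


# Toy 2-gene expression profiles (illustrative values only).
cluster_a = [[5.0, 1.0], [5.5, 1.2], [4.8, 0.9]]
cluster_b = [[1.0, 5.0], [1.2, 5.5], [0.8, 4.9]]
cluster_c = [[4.5, 1.5], [4.9, 1.1], [1.5, 4.0]]  # two A-like cells, one B-like cell

print(transition_score(cluster_a, cluster_b, cluster_c))  # 0.666...
```

A score near 1 suggests that the target cluster is transcriptionally close to the source; a real pipeline would use a proper classifier, held-out cells and many genes.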

High-throughput single cell data analysis – A tutorial

Analytica Chimica Acta ◽

10.1016/j.aca.2021.338872 ◽

2021 ◽

pp. 338872

Author(s):

Gerjen H. Tinnevelt ◽

Kristiaan Wouters ◽

Geert J. Postma ◽

Rita Folcarelli ◽

Jeroen J. Jansen

Keyword(s):

Data Analysis ◽

Single Cell ◽

High Throughput ◽

Cell Data

Multiview Common Subspace Clustering via Coupled Low Rank Representation

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3465056 ◽

2021 ◽

Vol 12 (4) ◽

pp. 1-25

Author(s):

Stanley Ebhohimhen Abhadiomhen ◽

Zhiyang Wang ◽

Xiangjun Shen ◽

Jianping Fan

Keyword(s):

Common Knowledge ◽

Subspace Clustering ◽

Laplacian Matrix ◽

Low Rank ◽

Connected Components ◽

Similarity Matrix ◽

Benchmark Datasets ◽

Low Rank Representation ◽

Low Dimensional ◽

Shared Structure

Multi-view subspace clustering (MVSC) finds a shared structure in latent low-dimensional subspaces of multi-view data to enhance clustering performance. Nonetheless, we observe that most existing MVSC methods neglect the diversity of multi-view data, considering only the common knowledge to find a shared structure, either directly or by merging the similarity matrices learned for each view. In the presence of noise, this predefined shared structure becomes a biased representation of the different views. In this article, we therefore propose an MVSC method based on coupled low-rank representation to address this limitation. Our method first obtains a low-rank representation for each view, constrained to be a linear combination of a view-specific representation and the shared representation, while encouraging sparsity in the view-specific representation. It then uses the k-block diagonal regularizer to learn a manifold recovery matrix for each view from the corresponding low-rank matrix, recovering more of its manifold structure. In this way, the method finds an ideal similarity matrix by approximating the clustering projection matrices obtained from the recovered structures. Applying a rank constraint to the relaxed Laplacian matrix of this similarity matrix gives it exactly k connected components, so it directly encodes the clustering structure and no spectral post-processing of the low-dimensional embedding matrix is needed. The core idea is to introduce dynamic approximation into the low-rank representation, so that the clustering structure and the shared representation guide each other toward cleaner low-rank matrices, which in turn lead to a better clustering structure. Our approach therefore differs notably from existing methods, in which the local manifold structure of the data is captured in advance.
Extensive experiments on six benchmark datasets show that our method outperforms 10 comparable state-of-the-art methods on six evaluation metrics.
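
The rank constraint relies on a standard fact from spectral graph theory: for a non-negative, symmetric similarity matrix S, the multiplicity of the eigenvalue 0 of the Laplacian L = D − S equals the number of connected components of the graph. Requiring rank(L) = n − k therefore forces exactly k components. The sketch below checks this on a toy block-diagonal similarity matrix. The matrix and tolerance are illustrative assumptions, and the code does not implement the coupled low-rank representation method itself.

```python
import numpy as np


def count_components_spectral(S, tol=1e-9):
    """Count connected components of the graph with non-negative, symmetric
    similarity matrix S as the number of (numerically) zero eigenvalues of
    its Laplacian L = D - S."""
    L = np.diag(S.sum(axis=1)) - S
    eigvals = np.linalg.eigvalsh(L)   # L is symmetric, so eigvalsh applies
    return int(np.sum(eigvals < tol))


# Toy similarity matrix with three components: {0, 1}, {2, 3, 4} and {5}.
S = np.zeros((6, 6))
S[0, 1] = S[1, 0] = 0.9
S[2, 3] = S[3, 2] = 0.8
S[3, 4] = S[4, 3] = 0.7
print(count_components_spectral(S))   # 3

n, k = S.shape[0], count_components_spectral(S)
L = np.diag(S.sum(axis=1)) - S
print(np.linalg.matrix_rank(L) == n - k)   # True
```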

Axes of inter-sample variability among transcriptional neighborhoods reveal disease associated cell states in single-cell data

10.1101/2021.04.19.440534 ◽

2021 ◽

Author(s):

Yakir A Reshef ◽

Laurie Rumker ◽

Joyce B Kang ◽

Aparna Nathan ◽

Megan B Murray ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Single Cell ◽

Active Tuberculosis ◽

Cell Populations ◽

Cell Type ◽

Accurate Identification ◽

Shared Function ◽

Statistical Approaches ◽

Cell Data ◽

Notch Activation

As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes like clinical phenotypes. Current statistical approaches typically map cells to cell-type clusters and examine sample differences through that lens alone. Here we present covarying neighborhood analysis (CNA), an unbiased method to identify cell populations of interest with greater flexibility and granularity. CNA characterizes dominant axes of variation across samples by identifying groups of very small regions in transcriptional space, termed neighborhoods, that covary in abundance across samples, suggesting shared function or regulation. CNA can then rigorously test for associations between any sample-level attribute and the abundances of these covarying neighborhood groups. We show in simulation that CNA enables more powerful and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, redefines monocyte populations expanded in sepsis, and identifies a previously undiscovered T-cell population associated with progression to active tuberculosis.
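
The "covarying neighborhoods" idea can be roughly sketched as follows. Start from a samples × neighborhoods matrix of cell counts. Convert each sample's counts to fractions, centre each neighborhood across samples, and take the leading singular vectors as dominant axes of inter-sample variation. Per-sample scores on an axis can then be compared with a sample attribute. The neighborhood definition, the plain Pearson correlation and the simulated counts are assumptions made for illustration. CNA's own neighborhood construction and its association test are not reproduced here.

```python
import numpy as np


def neighborhood_axes(counts, n_axes=2):
    """counts: samples x neighborhoods matrix of cell counts.

    Returns per-sample scores and neighborhood loadings for the leading axes
    of covariation in neighborhood abundance (a PCA computed via SVD)."""
    fractions = counts / counts.sum(axis=1, keepdims=True)  # per-sample abundances
    centred = fractions - fractions.mean(axis=0)            # centre each neighborhood
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    sample_scores = U[:, :n_axes] * s[:n_axes]
    loadings = Vt[:n_axes]
    return sample_scores, loadings


rng = np.random.default_rng(1)
n_samples, n_neighborhoods = 20, 30
phenotype = np.array([0] * 10 + [1] * 10)          # e.g. control vs case (toy)
counts = rng.poisson(50, size=(n_samples, n_neighborhoods)).astype(float)
counts[phenotype == 1, :5] += 60                   # cases expand neighborhoods 0-4 together

scores, loadings = neighborhood_axes(counts)
r = np.corrcoef(scores[:, 0], phenotype)[0, 1]
print(f"|correlation| of axis 1 with phenotype: {abs(r):.2f}")
```

Because neighborhoods 0 to 4 expand together in the simulated cases, they load jointly on the first axis. This is the kind of covarying group that an association test would then link to the phenotype.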

Distinguishing different modes of growth using single-cell data

eLife ◽

10.7554/elife.72565 ◽

2021 ◽

Vol 10 ◽

Author(s):

Prathitha Kar ◽

Sriram Tiruvadi-Krishnan ◽

Jaana Männik ◽

Jaan Männik ◽

Ariel Amir

Keyword(s):

Data Analysis ◽

Single Cell ◽

Statistical Methods ◽

Synthetic Data ◽

Cellular Growth ◽

Biological Mechanisms ◽

E Coli ◽

High Throughput Data ◽

Consistent Method ◽

Cell Data

Collection of high-throughput data has become prevalent in biology. Large datasets allow the use of statistical constructs such as binning and linear regression to quantify relationships between variables and to hypothesize underlying biological mechanisms. We discuss several such examples in relation to single-cell data and cellular growth. In particular, we show instances where seemingly ordinary use of these statistical methods leads to incorrect conclusions, such as inferring non-exponential growth when growth is exponential, and vice versa. We propose that data analysis and its interpretation should be done, where possible, in the context of a generative model. In this way, the statistical methods can be validated either analytically or against synthetic data generated from the model, leading to a consistent method for inferring biological mechanisms from data. Applying the validated methods to our experimental data, we find that the growth of E. coli cell length is non-exponential: in the later stages of the cell cycle, the growth rate is faster than exponential.
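
The proposed workflow validates an estimator on synthetic data from a generative model before applying it to measurements. A minimal sketch of that idea: simulate cells growing exponentially at a known rate, add measurement noise, and check that a chosen estimator (here, a least-squares slope of log length against time) recovers the rate. The model, noise level, time grid and parameter values are illustrative assumptions, not those used in the paper.

```python
import math
import random


def simulate_exponential_growth(rate, l0, times, noise_sd, rng):
    """Synthetic lengths L(t) = l0 * exp(rate * t) with multiplicative
    (log-normal) measurement noise."""
    return [l0 * math.exp(rate * t) * math.exp(rng.gauss(0.0, noise_sd)) for t in times]


def fit_log_linear_rate(times, lengths):
    """Least-squares slope of log(length) versus time; for exponential growth
    this slope estimates the growth rate."""
    logs = [math.log(length) for length in lengths]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var


rng = random.Random(42)
true_rate = 0.02                                  # per minute (illustrative)
times = [float(t) for t in range(0, 60, 2)]       # one measurement every 2 minutes
estimates = []
for _ in range(200):                              # many synthetic cells
    lengths = simulate_exponential_growth(true_rate, 2.0, times, 0.02, rng)
    estimates.append(fit_log_linear_rate(times, lengths))

mean_estimate = sum(estimates) / len(estimates)
print(f"true rate {true_rate}, mean estimate {mean_estimate:.4f}")
```

The same harness can be pointed at an estimator based on binning or at data simulated from a non-exponential model. This shows whether the estimator can tell the two growth modes apart before any conclusion is drawn from real data.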

Low-rank representation with adaptive graph regularization

Neural Networks ◽

10.1016/j.neunet.2018.08.007 ◽

2018 ◽

Vol 108 ◽

pp. 83-96 ◽

Cited By ~ 27

Author(s):

Jie Wen ◽

Xiaozhao Fang ◽

Yong Xu ◽

Chunwei Tian ◽

Lunke Fei

Keyword(s):

Low Rank ◽

Graph Regularization ◽

Low Rank Representation

Computational Methods for Single-Cell Data Analysis

10.1007/978-1-4939-9057-3 ◽

2019 ◽

Keyword(s):

Data Analysis ◽

Single Cell ◽

Computational Methods ◽

Cell Data

Multi-cancer samples clustering via graph regularized low-rank representation method under sparse and symmetric constraints

BMC Bioinformatics ◽

10.1186/s12859-019-3231-5 ◽

2019 ◽

Vol 20 (S22) ◽

Author(s):

Juan Wang ◽

Cong-Hai Lu ◽

Jin-Xing Liu ◽

Ling-Yun Dai ◽

Xiang-Zhen Kong

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Low Rank ◽

Expression Data ◽

Geometrical Structures ◽

Graph Regularization ◽

Raw Data ◽

Clustering Quality ◽

Low Rank Representation

Abstract

Background: Identifying different types of cancer based on gene expression data has become a hotspot in bioinformatics research. Clustering gene expression data from multiple cancers into their respective classes is an important approach to this problem. However, the high dimensionality and small sample sizes of gene expression data, together with noise in the data, make data mining and analysis difficult. Although many effective and feasible methods have been proposed for this problem, they may still have shortcomings.

Results: In this paper, we propose the graph regularized low-rank representation under symmetric and sparse constraints (sgLRR) method, which introduces graph regularization based on manifold learning, together with symmetric and sparse constraints, into traditional low-rank representation (LRR). The symmetric and sparse constraints alleviate the effect of noise in the raw data on the low-rank representation, and the graph regularization preserves the important intrinsic local geometrical structures of the raw data. We apply this method to cluster multi-cancer samples based on gene expression data, which improves clustering quality. First, the gene expression data are decomposed by sgLRR, yielding a lowest-rank representation matrix that is symmetric and sparse. Then, an affinity matrix is constructed from it, and the multi-cancer samples are clustered with a spectral clustering algorithm, normalized cuts (Ncuts).

Conclusions: A series of comparative experiments demonstrates that the sgLRR method, based on low-rank representation, has a clear advantage and performs remarkably well in clustering multi-cancer samples.
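
The final two steps can be sketched as follows: building an affinity matrix from a representation matrix Z, then clustering it with normalized cuts. The symmetrisation W = (|Z| + |Zᵀ|)/2 is a common choice in LRR-based clustering and is an assumption here, since sgLRR's Z is already symmetric. For two clusters, the relaxed normalized cut of Shi and Malik is approximated by the sign of the eigenvector for the second-smallest eigenvalue of the normalized Laplacian. A full multi-class implementation would use k eigenvectors followed by k-means. The toy Z is illustrative only.

```python
import numpy as np


def affinity_from_representation(Z):
    """Symmetric, non-negative affinity matrix from a representation matrix Z."""
    return (np.abs(Z) + np.abs(Z.T)) / 2.0


def two_way_ncut(W):
    """Approximate two-way normalized cut (Shi-Malik relaxation): split nodes by
    the sign of y = D^{-1/2} v, where v is the eigenvector of the normalized
    Laplacian I - D^{-1/2} W D^{-1/2} for its second-smallest eigenvalue."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)    # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]        # map back: y = D^{-1/2} v
    return (fiedler > 0).astype(int)


# Toy representation: samples 0-2 and 3-5 mostly represent each other.
Z = np.array([
    [0.0, 0.8, 0.7, 0.05, 0.0, 0.0],
    [0.8, 0.0, 0.9, 0.0, 0.05, 0.0],
    [0.7, 0.9, 0.0, 0.0, 0.0, 0.05],
    [0.05, 0.0, 0.0, 0.0, 0.6, 0.8],
    [0.0, 0.05, 0.0, 0.6, 0.0, 0.7],
    [0.0, 0.0, 0.05, 0.8, 0.7, 0.0],
])
labels = two_way_ncut(affinity_from_representation(Z))
print(labels)   # samples 0-2 in one cluster, 3-5 in the other (label values may swap)
```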

Low-rank representation with graph regularization for subspace clustering

Soft Computing ◽

10.1007/s00500-015-1869-0 ◽

2015 ◽

Vol 21 (6) ◽

pp. 1569-1581 ◽

Cited By ~ 5

Author(s):

Wu He ◽

Jim X. Chen ◽

Weihua Zhang

Keyword(s):

Subspace Clustering ◽

Low Rank ◽

Graph Regularization ◽

Low Rank Representation

SinNLRR: a robust subspace clustering method for cell type detection by non-negative and low-rank representation

Bioinformatics ◽

10.1093/bioinformatics/btz139 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3642-3650 ◽

Cited By ~ 13

Author(s):

Ruiqing Zheng ◽

Min Li ◽

Zhenlan Liang ◽

Fang-Xiang Wu ◽

Yi Pan ◽

...

Keyword(s):

Single Cell ◽

Low Rank ◽

Supplementary Information ◽

Similarity Matrix ◽

Similarity Learning ◽

Clustering Methods ◽

Cell Type ◽

Gene Markers ◽

Adaptive Penalty ◽

New Perspective

Abstract

Motivation: The development of single-cell RNA-sequencing (scRNA-seq) provides a new perspective for studying biological problems at the single-cell level. A key issue in scRNA-seq analysis is resolving the heterogeneity and diversity of cells, that is, clustering the cells into groups. However, many existing clustering methods were designed for bulk RNA-seq data, so new scRNA-seq clustering methods are urgently needed. Moreover, the high noise level in scRNA-seq data poses many challenges to computational methods.

Results: In this study, we propose a novel scRNA-seq cell type detection method based on similarity learning, called SinNLRR. The method is motivated by the self-expression of cells in the same group. Specifically, we impose non-negative and low-rank structure on the similarity matrix. We apply the alternating direction method of multipliers to solve the optimization problem and propose an adaptive penalty selection method to avoid sensitivity to the parameters. The learned similarity matrix can be combined with spectral clustering, with t-distributed stochastic neighbor embedding for visualization, and with the Laplacian score for prioritizing gene markers. Compared with other scRNA-seq clustering methods, our method achieves more robust and accurate results on different datasets.

Availability and implementation: Our MATLAB implementation of SinNLRR is available at https://github.com/zrq0123/SinNLRR.

Supplementary information: Supplementary data are available at Bioinformatics online.
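
ADMM solvers for nuclear-norm-regularised, non-negative models like this one typically alternate between a few closed-form proximal steps. Two such steps are sketched below. The first is singular value thresholding, the proximal operator of τ‖·‖_*. The second is projection onto the non-negative orthant. The function names, threshold and toy matrix are hypothetical, and SinNLRR's exact splitting, update order and adaptive penalty selection are not reproduced here.

```python
import numpy as np


def singular_value_threshold(A, tau):
    """Proximal operator of tau * ||.||_* : shrink each singular value of A by tau,
    setting those that would become negative to zero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def project_nonnegative(A):
    """Euclidean projection onto the set {Z : Z >= 0}, i.e. elementwise clipping at 0."""
    return np.maximum(A, 0.0)


def numerical_rank(A, tol=1e-8):
    """Number of singular values above an absolute tolerance."""
    return int(np.sum(np.linalg.svd(A, compute_uv=False) > tol))


rng = np.random.default_rng(0)
# Toy 8 x 8 matrix: exact rank-2 signal (singular values 5 and 2) plus small noise.
Q1, _ = np.linalg.qr(rng.standard_normal((8, 2)))
Q2, _ = np.linalg.qr(rng.standard_normal((8, 2)))
signal = Q1 @ np.diag([5.0, 2.0]) @ Q2.T
noisy = signal + 0.01 * rng.standard_normal((8, 8))

Z = singular_value_threshold(noisy, tau=0.1)
print("rank before:", numerical_rank(noisy), "after:", numerical_rank(Z))
Z_nonneg = project_nonnegative(Z)
print("smallest entry after projection:", Z_nonneg.min())
```

The threshold removes the small singular values contributed by noise, which is how the nuclear-norm term drives the similarity matrix toward low rank. The projection then enforces non-negativity, which keeps the learned similarities interpretable as affinities.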
