Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis

Zhenqiu Liu

doi:10.3390/ijms21165797


International Journal of Molecular Sciences ◽

10.3390/ijms21165797 ◽

2020 ◽

Vol 21 (16) ◽

pp. 5797

Author(s):

Zhenqiu Liu

Keyword(s):

Principal Component Analysis ◽

Dimension Reduction ◽

Single Cell ◽

Optimal Solution ◽

Principal Component ◽

Component Analysis ◽

Biological Information ◽

RNA-Seq ◽

Computationally Efficient ◽

Leibler Divergence

Single-cell RNA-seq (scRNA-seq) is a powerful tool for analyzing heterogeneous and functionally diverse cell populations. Visualizing scRNA-seq data can help us effectively extract meaningful biological information and identify novel cell subtypes. Currently, the most popular methods for scRNA-seq visualization are principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). While PCA is an unsupervised dimension reduction technique, t-SNE incorporates cluster information into pairwise probabilities and then minimizes the Kullback–Leibler divergence. Uniform Manifold Approximation and Projection (UMAP) is another recently developed visualization method similar to t-SNE. However, one limitation of UMAP and t-SNE is that they capture only the local structure of the data; the global structure is not faithfully preserved. In this manuscript, we propose a semisupervised principal component analysis (ssPCA) approach for scRNA-seq visualization. The proposed approach incorporates cluster labels into dimension reduction and discovers principal components that maximize both data variance and cluster dependence. ssPCA requires cluster labels as input, so it is most useful for visualizing the clusters produced by scRNA-seq clustering software. Our experiments with simulated and real scRNA-seq data demonstrate that ssPCA preserves both the local and global structure of the data and uncovers transitions and progressions in the data, if they exist. In addition, ssPCA is convex and has a global optimal solution. It is also robust and computationally efficient, making it viable for scRNA-seq cluster visualization.
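The idea of principal components that balance data variance against cluster dependence can be illustrated with a small NumPy sketch. This is not the paper's exact formulation: it assumes a same-cluster indicator kernel for the labels, an HSIC-style dependence term, and a hypothetical trade-off weight `alpha` (the function name `sspca_sketch` is also made up for illustration). Because the objective reduces to a symmetric eigenproblem, the solution is a global optimum, which is consistent with the convexity claim in the abstract.

```python
import numpy as np

def sspca_sketch(X, labels, n_components=2, alpha=0.5):
    """Illustrative label-aware PCA (an assumption, not the published ssPCA).

    X      : (cells x genes) expression matrix
    labels : cluster label per cell
    alpha  : hypothetical weight; 0 gives ordinary PCA, 1 uses labels only
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                      # center each gene

    # Label kernel: 1 where two cells share a cluster, else 0.
    L = (labels[:, None] == labels[None, :]).astype(float)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix

    # Blend the variance term (identity) with the dependence term (H L H).
    M = (1.0 - alpha) * np.eye(n) + alpha * (H @ L @ H)
    Q = Xc.T @ M @ Xc                            # symmetric, genes x genes

    eigvals, eigvecs = np.linalg.eigh(Q)         # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                        # orthonormal loadings
    return Xc @ W, W                             # embedding and loadings
```

Calling `sspca_sketch(X, labels, alpha=0.0)` recovers the ordinary PCA directions (up to sign), which makes it easy to compare the two embeddings for the same clustering.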


Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data

Bioinformatics Research and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-94968-0_32 ◽

2018 ◽

pp. 335-346

Author(s):

Krzysztof Gogolewski ◽

Maciej Sykulski ◽

Neo Christopher Chung ◽

Anna Gambin

Keyword(s):

Principal Component Analysis ◽

Noise Reduction ◽

Single Cell ◽

Principal Component ◽

Component Analysis ◽

RNA-Seq ◽

Robust Principal Component Analysis


Accurate denoising of single-cell RNA-Seq data using unbiased principal component analysis

10.1101/655365 ◽

2019 ◽

Cited By ~ 11

Author(s):

Florian Wagner ◽

Dalia Barkley ◽

Itai Yanai

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

Simulated Data ◽

Principal Component ◽

Cell Aggregation ◽

Component Analysis ◽

RNA-Seq ◽

Highly Expressed Genes ◽

Cell Subpopulations ◽

Aggregation Step

Single-cell RNA-Seq measurements are commonly affected by high levels of technical noise, posing challenges for data analysis and visualization. A diverse array of methods has been proposed to computationally remove noise by sharing information across similar cells or genes; however, their respective accuracies have been difficult to establish. Here, we propose a simple denoising strategy based on principal component analysis (PCA). We show that while PCA performed on raw data is biased towards highly expressed genes, this bias can be mitigated with a cell aggregation step, allowing the recovery of denoised expression values for both highly and lowly expressed genes. We benchmark our resulting ENHANCE algorithm and three previously described methods on simulated data that closely mimic real datasets, showing that ENHANCE provides the best overall denoising accuracy, recovering modules of co-expressed genes and cell subpopulations. Implementations of our algorithm are available at https://github.com/yanailab/enhance.
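As a rough illustration of PCA-based denoising with a cell aggregation step, the sketch below pools each cell's raw counts with those of its nearest neighbors and then reconstructs the data from the top principal components. It is a simplified stand-in, not the ENHANCE implementation: the function name, the `log1p` transform, the Euclidean neighbor search, and the parameters `n_components` and `n_neighbors` are all assumptions made for illustration, and every cell is assumed to have at least one count.

```python
import numpy as np

def pca_denoise_sketch(counts, n_components=2, n_neighbors=3):
    """Simplified aggregate-then-PCA denoising (not the ENHANCE code)."""
    counts = np.asarray(counts, dtype=float)      # cells x genes, raw counts

    # Scale every cell to the median total count, then log-transform.
    totals = counts.sum(axis=1, keepdims=True)
    target = np.median(totals)
    logged = np.log1p(counts / totals * target)

    # Aggregation: pool raw counts over each cell's nearest neighbors.
    # A cell is at distance 0 from itself, so it is normally in its own pool.
    # The dense distance array is fine for a sketch but not for large data.
    d2 = ((logged[:, None, :] - logged[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]
    agg = counts[nn].sum(axis=1)
    agg_log = np.log1p(agg / agg.sum(axis=1, keepdims=True) * target)

    # Low-rank reconstruction from the top principal components.
    mean = agg_log.mean(axis=0)
    U, s, Vt = np.linalg.svd(agg_log - mean, full_matrices=False)
    k = n_components
    recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean

    # Back to the normalized count scale; clip small negative values.
    return np.clip(np.expm1(recon), 0.0, None)
```

Pooling before PCA raises the effective count depth per profile, which is the intuition behind reducing the bias towards highly expressed genes described above.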


Structure-Aware Principal Component Analysis for Single-Cell RNA-seq Data

Journal of Computational Biology ◽

10.1089/cmb.2018.0027 ◽

2018 ◽

Vol 25 (12) ◽

pp. 1365-1373 ◽

Cited By ~ 6

Author(s):

Snehalika Lall ◽

Debajyoti Sinha ◽

Sanghamitra Bandyopadhyay ◽

Debarka Sengupta

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

Principal Component ◽

Component Analysis ◽

RNA-Seq


Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

10.1101/642595 ◽

2019 ◽

Cited By ~ 1

Author(s):

Koki Tsuyuzaki ◽

Hiroyuki Sato ◽

Kenta Sato ◽

Itoshi Nikaido

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

Large Scale ◽

Principal Component ◽

Component Analysis ◽

RNA-Seq ◽

Large Memory ◽

Synthetic Datasets ◽

Selection Of ◽

Memory Efficient

Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but large-scale scRNA-seq datasets require long computational times and a large memory capacity. In this work, we review 21 fast and memory-efficient PCA implementations (10 algorithms) and evaluate their application using 4 real and 18 synthetic datasets. Our benchmarking showed that some PCA algorithms are faster, more memory efficient, and more accurate than others. In consideration of the differences in the computational environments of users and developers, we have also developed guidelines to assist with the selection of appropriate PCA implementations.


Coronavirus Disease Predictor: An RNA-Seq based pipeline for dimension reduction and prediction of COVID-19

Journal of Physics Conference Series ◽

10.1088/1742-6596/2089/1/012025 ◽

2021 ◽

Vol 2089 (1) ◽

pp. 012025

Author(s):

Naiyar Iqbal ◽

Pradeep Kumar

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Dimension Reduction ◽

Soft Computing ◽

Principal Component ◽

Component Analysis ◽

Formal Concept ◽

Genome Wide Association Studies ◽

RNA-Seq ◽

Soft Computing Techniques

SARS-CoV-2, the novel coronavirus behind COVID-19, has taken a heavy toll on human life around the world, and the complexity of the disease has pushed medical specialists to investigate new solutions and diagnostic strategies. Soft computing approaches have taken on a significant role in resolving complex problems, and many communities have adopted these innovations in response to the challenges created by the COVID-19 pandemic. This work aims to perform genome-wide association studies on COVID-19 RNA-Seq data, to identify gene biomarkers, and to apply soft computing techniques to the classification, prediction, and prognosis of the disease. RNA-Seq profiles of samples from both healthy and COVID-19-positive patients were considered. We propose an integrated pipeline that runs from an in-silico bioinformatics phase for omic profile data processing, through dimension reduction using prominent techniques such as formal concept analysis and principal component analysis, to a machine learning phase for disease prediction. In this experimental research, we build an integrated model that uses the Classifier Subset Evaluator (CSE) followed by principal component analysis (PCA) to select the most significant features, and we then apply several well-known classifiers to the selected features for classification and prediction of the disease. In this analysis, the Hoeffding Tree model was the best-performing classifier, with a classification accuracy of 99.21% and a sensitivity and specificity of 99% and 100%, respectively.


Erratum to “Data‐driven dimension reduction in functional principal component analysis identifying the change‐point in functional data”

Statistical Analysis and Data Mining The ASA Data Science Journal ◽

10.1002/sam.11510 ◽

2021 ◽

Keyword(s):

Principal Component Analysis ◽

Dimension Reduction ◽

Functional Data ◽

Change Point ◽

Principal Component ◽

Component Analysis ◽

Functional Principal Component Analysis ◽

Data Driven ◽

Functional Principal Component


Dimension Reduction by Local Principal Component Analysis

Unsupervised Learning ◽

10.7551/mitpress/7011.003.0019 ◽

1999 ◽

Keyword(s):

Principal Component Analysis ◽

Dimension Reduction ◽

Principal Component ◽

Component Analysis


Computationally Efficient Performance-Driven Surrogate Modeling of Microwave Components Using Principal Component Analysis

2020 IEEE/MTT-S International Microwave Symposium (IMS) ◽

10.1109/ims30576.2020.9223805 ◽

2020 ◽

Author(s):

Slawomir Koziel ◽

Anna Pietrenko-Dabrowska ◽

John W. Bandler

Keyword(s):

Principal Component Analysis ◽

Surrogate Modeling ◽

Principal Component ◽

Component Analysis ◽

Computationally Efficient ◽

Efficient Performance ◽

Microwave Components


Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model

Genome Biology ◽

10.1186/s13059-019-1861-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 35

Author(s):

F. William Townes ◽

Stephanie C. Hicks ◽

Martin J. Aryee ◽

Rafael A. Irizarry

Keyword(s):

Feature Selection ◽

Dimension Reduction ◽

Single Cell ◽

Current Practice ◽

Principal Component ◽

Ground Truth ◽

RNA-Seq ◽

Normal Distributions ◽

Multinomial Sampling ◽

Negative Controls

Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
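Feature selection by deviance can be sketched as follows. The code assumes the common binomial approximation to the multinomial model: under the null, each gene has a constant relative abundance across cells, and the per-gene deviance measures how badly that null fits. The function name and the cells-by-genes layout are choices made for this illustration and are not taken from the authors' software.

```python
import numpy as np

def binomial_deviance(counts):
    """Per-gene binomial deviance under a constant-abundance null model.

    counts : (cells x genes) matrix of UMI counts
    returns: one deviance per gene; larger means more informative
    """
    Y = np.asarray(counts, dtype=float)
    n = Y.sum(axis=1, keepdims=True)              # total counts per cell
    pi = Y.sum(axis=0, keepdims=True) / n.sum()   # null abundance per gene
    mu = n * pi                                   # expected counts under null

    # np.where evaluates both branches, so silence log(0) and 0/0 warnings;
    # the masked entries are replaced by 0 (the limit of y*log(y) as y -> 0).
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(Y > 0, Y * np.log(Y / mu), 0.0)
        t2 = np.where(n - Y > 0, (n - Y) * np.log((n - Y) / (n - mu)), 0.0)
    return 2.0 * (t1 + t2).sum(axis=0)
```

Genes would then be ranked by this value and the top ones kept, for example with `np.argsort(binomial_deviance(counts))[::-1][:2000]`; the cutoff of 2000 is an arbitrary illustrative choice.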


High-Dimensional Data Dimension Reduction Based on KECA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.303-306.1101 ◽

2013 ◽

Vol 303-306 ◽

pp. 1101-1104 ◽

Cited By ~ 2

Author(s):

Yong De Hu ◽

Jing Chang Pan ◽

Xin Tan

Keyword(s):

Principal Component Analysis ◽

Dimension Reduction ◽

High Dimensional Data ◽

Principal Component ◽

Good Method ◽

Component Analysis ◽


Rényi Entropy ◽

Kernel Principal Component Analysis ◽

High Dimensional

Kernel entropy component analysis (KECA) reveals the structure of the original data through the kernel matrix. This structure is related to the Rényi entropy of the data, and KECA preserves it by keeping the data's Rényi entropy unchanged. This paper describes the original data using a small number of components for the purpose of dimension reduction. KECA is then applied to the dimension reduction of celestial spectra and compared experimentally with principal component analysis (PCA) and kernel principal component analysis (KPCA). The experimental results show that KECA is a good method for reducing high-dimensional data.
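The link between the kernel matrix and the Rényi entropy can be made concrete with a short sketch. It assumes a Gaussian kernel with a hypothetical bandwidth `sigma` and follows the standard KECA recipe, in which the quadratic Rényi entropy estimate depends on the sum of all kernel matrix entries and each eigenpair contributes λ_i (1ᵀe_i)² to that sum. Components are ranked by this contribution rather than by eigenvalue alone, which is the main difference from kernel PCA. The function name is made up for illustration, and the paper's own settings for celestial spectra are not reproduced here.

```python
import numpy as np

def keca_sketch(X, n_components=2, sigma=1.0):
    """Illustrative kernel entropy component analysis with a Gaussian kernel."""
    X = np.asarray(X, dtype=float)

    # Gaussian kernel matrix (left uncentered, unlike kernel PCA).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / (2.0 * sigma ** 2))

    lam, E = np.linalg.eigh(K)                    # eigenpairs of K
    lam = np.clip(lam, 0.0, None)                 # guard against tiny negatives

    # Contribution of each eigenpair to the sum of all entries of K,
    # which drives the Rényi entropy estimate.
    contrib = lam * E.sum(axis=0) ** 2

    idx = np.argsort(contrib)[::-1][:n_components]
    Z = E[:, idx] * np.sqrt(lam[idx])             # projected data, n x k
    return Z, contrib[idx]
```

Keeping the components with the largest contributions retains most of the entropy estimate in a low-dimensional representation; with all components kept, the contributions sum to the total of the kernel matrix entries.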
