Subpopulation identification for single-cell RNA-sequencing data using functional data analysis

Mapping Intimacies ◽

10.1101/760413 ◽

2019 ◽

Author(s):

Kyungmin Ahn ◽

Hironobu Fujiwara

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Single Cell ◽

Gene Expression Data ◽

Functional Data Analysis ◽

Functional Data ◽

Clustering Algorithms ◽

Expression Data ◽

Clustering Methods ◽

Single Cell RNA Sequencing

Abstract. Background: In single-cell RNA-sequencing (scRNA-seq) data analysis, a number of statistical tools from multivariate data analysis (MDA) have been developed to help analyze gene expression data. The MDA approach typically examines genes as discrete genomic units and ignores the dependency between the data components. In this paper, we propose a functional data analysis (FDA) approach to scRNA-seq data in which each cell is treated as a single function. To avoid the large number of dropouts (zero or near-zero values) and to reduce the high dimensionality of the data, we first perform a principal component analysis (PCA) and assign the PCs to be the amplitude of the function. We then use the index of the PCs, taken directly from the PCA, as the phase component. This approach allows us to apply FDA clustering methods to scRNA-seq data analysis. Results: To demonstrate the robustness of our method, we apply several existing FDA clustering algorithms to the gene expression data to improve the accuracy of cell-type classification relative to conventional MDA clustering methods. The FDA clustering algorithms achieve superior accuracy on simulated data as well as on real human and mouse scRNA-seq data. Conclusions: This new statistical technique enhances classification performance and ultimately improves the understanding of stochastic biological processes. The framework provides a fundamentally different analytical approach to scRNA-seq data that can complement conventional MDA methods. It can be especially effective when current MDA methods cannot detect or uncover the hidden functional nature of gene expression dynamics.
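The construction described in this abstract (PCA first, PC scores as the amplitude, PC index as the phase) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the simulated counts, the function names `cells_as_functions` and `kmeans_curves`, and the choice of three PCs are hypothetical, and plain k-means on the curve values stands in for the FDA clustering algorithms the authors actually use.

```python
import numpy as np

def cells_as_functions(expr, n_pcs=3):
    """Return an (n_cells, n_pcs) array; row i is cell i's curve over PC index 1..n_pcs."""
    centered = expr - expr.mean(axis=0)               # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]                   # PC scores, used as amplitude values

def kmeans_curves(curves, k, n_iter=50):
    """Plain k-means on curve values (a stand-in, not an FDA clustering method)."""
    centers = [curves[0]]                             # deterministic farthest-point start
    while len(centers) < k:
        d = np.min([((curves - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(curves[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(curves), dtype=int)
    for _ in range(n_iter):
        dists = ((curves[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)                 # assign each cell to nearest center
        new_centers = np.array([
            curves[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical simulated data: 60 cells x 200 genes, two cell types of 30 cells;
# the second type over-expresses the first 50 genes.
rng = np.random.default_rng(0)
rates = np.ones((60, 200))
rates[30:, :50] = 10.0
counts = rng.poisson(rates)
log_expr = np.log1p(counts)                           # common log transform for counts

curves = cells_as_functions(log_expr, n_pcs=3)        # amplitude: PC scores per cell
pc_index = np.arange(1, curves.shape[1] + 1)          # phase: PC index 1, 2, 3
labels = kmeans_curves(curves, k=2)
```

Each row of `curves`, read against `pc_index`, is one cell viewed as a short function. A real FDA pipeline would replace `kmeans_curves` with a functional clustering method and would choose the number of PCs from the data.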


Analyzing Time-Course Microarray Data Using Functional Data Analysis - A Review

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1671 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 15

Author(s):

Norma Coffey ◽

John Hinde

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Functional Data Analysis ◽

Functional Data ◽

Time Course ◽

Expression Profiles ◽

Continuous Process ◽

Expression Data ◽

Microarray Gene Expression

Gene expression over time can be viewed as a continuous process and therefore represented as a continuous curve or function. Functional data analysis (FDA) is a statistical methodology used to analyze functional data that has become increasingly popular in the analysis of time-course gene expression data. Several FDA techniques have been applied to gene expression profiles including functional regression analysis (to describe the relationship between expression profiles and other covariate(s)), functional discriminant analysis (to discriminate and classify groups of genes) and functional principal components analysis (for dimension reduction and clustering). This paper reviews the use of FDA and its associated methods to analyze time-course microarray gene expression data.


Classification using functional data analysis for temporal gene expression data

Bioinformatics ◽

10.1093/bioinformatics/bti742 ◽

2005 ◽

Vol 22 (1) ◽

pp. 68-76 ◽

Cited By ~ 84

Author(s):

Xiaoyan Leng ◽

Hans-Georg Müller

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Functional Data Analysis ◽

Functional Data ◽

Expression Data ◽

Temporal Gene Expression


Clustering of time-course gene expression data using functional data analysis

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2007.05.006 ◽

2007 ◽

Vol 31 (4) ◽

pp. 265-274 ◽

Cited By ~ 25

Author(s):

Joon Jin Song ◽

Ho-Jin Lee ◽

Jeffrey S. Morris ◽

Sanghoon Kang

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Functional Data Analysis ◽

Functional Data ◽

Time Course ◽

Expression Data


Current State-of-the-Art of Clustering Methods for Gene Expression Data with RNA-Seq

10.5772/intechopen.94069 ◽

2020 ◽

Author(s):

Ismail Jamail ◽

Ahmed Moussa

Keyword(s):

Gene Expression ◽

Cluster Analysis ◽

Data Analysis ◽

Gene Expression Data ◽

Expression Profile ◽

Clustering Algorithms ◽

Expression Data ◽

RNA Seq ◽

Clustering Methods ◽

Clustering And Classification

The latest developments in high-throughput cDNA sequencing (RNA-seq) have revolutionized gene expression profiling. This analysis aims to compare the expression levels of multiple genes between two or more samples, under specific circumstances or in a specific cell, to give a global picture of cellular function. Thanks to these advances, gene expression data are being generated at high throughput. One of the primary data analysis tasks in gene expression studies involves data-mining techniques such as clustering and classification. Clustering, an unsupervised learning technique, has been widely used as a computational tool to facilitate our understanding of the gene functions and regulation involved in a biological process. Cluster analysis aims to group the large number of genes present in a sample of gene expression profile data such that similar or related genes are in the same cluster and different or unrelated genes are in distinct ones. Classification, on the other hand, can be used to group samples based on their expression profiles. Many clustering and classification algorithms can be applied in gene expression experiments; the most widely used are hierarchical clustering, k-means clustering, and model-based clustering, which relies on a model to determine the number of clusters. The clustering method must be chosen to fit the structure of the data. In this chapter, we present the state of the art of clustering algorithms and statistical approaches for grouping similar gene expression profiles that can be applied to RNA-seq data analysis, together with software tools dedicated to these methods. In addition, we discuss challenges in cluster analysis and compare the performance of eight commonly used clustering methods on four different public datasets from recount2.
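As a rough illustration of the kind of clustering this abstract describes, the following sketch runs k-means, one of the methods named above, on a toy set of gene expression profiles. The profiles, the function name `kmeans`, and the deterministic farthest-point initialization are assumptions chosen for illustration; they are not taken from the chapter.

```python
import math

def kmeans(profiles, k, n_iter=100):
    """Group expression profiles into k clusters by Euclidean distance."""
    # Deterministic start: first profile, then the profile farthest from chosen centers.
    centers = [list(profiles[0])]
    while len(centers) < k:
        far = max(profiles, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: nearest center for each profile.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in profiles]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])        # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Hypothetical toy data: six genes measured in four samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9],  # rising
    [4.0, 3.0, 2.0, 1.0], [4.2, 2.9, 2.1, 1.0], [3.8, 3.1, 1.9, 1.2],  # falling
]
labels = kmeans(profiles, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Genes with similar profiles receive the same label. Hierarchical or model-based clustering would differ mainly in how groups are formed and in how the number of clusters is chosen.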


Optimal classification for time-course gene expression data using functional data analysis

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2008.07.007 ◽

2008 ◽

Vol 32 (6) ◽

pp. 426-432 ◽

Cited By ~ 19

Author(s):

Joon Jin Song ◽

Weiguo Deng ◽

Ho-Jin Lee ◽

Deukwoo Kwon

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Functional Data Analysis ◽

Functional Data ◽

Time Course ◽

Expression Data ◽

Optimal Classification


Clustering Algorithms in Gene Expression: Data Analysis

10.1109/icrito51393.2021.9596549 ◽

2021 ◽

Author(s):

Karuna Ghai ◽

Jaspreet Singh

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Clustering Algorithms ◽

Expression Data ◽

Gene Expression Data Analysis


Identification of Robust Clustering Methods in Gene Expression Data Analysis

Current Bioinformatics ◽

10.2174/1574893611666160610103926 ◽

2017 ◽

Vol 12 (6) ◽

Cited By ~ 2

Author(s):

Md. Bipul Hossen ◽

Md. Siraj-Ud-Doulah

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Expression Data ◽

Clustering Methods ◽

Gene Expression Data Analysis ◽

Robust Clustering


SCANPY: large-scale single-cell gene expression data analysis

Genome Biology ◽

10.1186/s13059-017-1382-0 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 667

Author(s):

F. Alexander Wolf ◽

Philipp Angerer ◽

Fabian J. Theis

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Single Cell ◽

Gene Expression Data ◽

Large Scale ◽

Expression Data ◽

Gene Expression Data Analysis ◽

Cell Gene Expression ◽

Cell Gene


Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures

International Journal of Computer Applications ◽

10.5120/ijca2016909413 ◽

2016 ◽

Vol 139 (13) ◽

pp. 4-8 ◽

Cited By ~ 1

Author(s):

Angela Makolo ◽

Taiwo Adigun

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Clustering Algorithms ◽

Distance Measures ◽

Expression Data ◽

Gene Expression Data Analysis


Evaluating the Effectiveness of Hard, Hierarchical and Fuzzy Clustering Methods for High Dimensional Gene Expression Data Analysis

2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer) ◽

10.1109/icter48817.2019.9023771 ◽

2019 ◽

Author(s):

SPBM Senadheera ◽

AR Weerasinghe ◽

CR Wijesinghe

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Fuzzy Clustering ◽

High Dimensional ◽

Expression Data ◽

Clustering Methods ◽

Gene Expression Data Analysis ◽

Fuzzy Clustering Methods
