Current State-of-the-Art of Clustering Methods for Gene Expression Data with RNA-Seq

Mapping Intimacies

10.5772/intechopen.94069

2020

Author(s): Ismail Jamail, Ahmed Moussa

Keyword(s): Gene Expression, Cluster Analysis, Data Analysis, Gene Expression Data, Expression Profile, Clustering Algorithms, Expression Data, RNA-Seq, Clustering Methods, Clustering and Classification

Recent developments in high-throughput cDNA sequencing (RNA-seq) have revolutionized gene expression profiling. Expression profiling compares the expression levels of multiple genes between two or more samples, under specific conditions or in a specific cell, to give a global picture of cellular function. Thanks to these advances, gene expression data are now generated at high throughput. One of the primary data analysis tasks in gene expression studies involves data-mining techniques such as clustering and classification. Clustering, an unsupervised learning technique, has been widely used as a computational tool to improve our understanding of the gene functions and regulation involved in a biological process. Cluster analysis aims to group the large number of genes in a gene expression profile dataset so that similar or related genes fall in the same cluster and different or unrelated genes fall in distinct ones. Classification, on the other hand, can be used to group samples based on their expression profiles. Many clustering and classification algorithms can be applied in gene expression experiments; the most widely used are hierarchical clustering, k-means clustering and model-based clustering, which relies on a statistical model to determine the number of clusters. The appropriate clustering method depends on the structure of the data. In this chapter, we present the state of the art of clustering algorithms and statistical approaches for grouping similar gene expression profiles that can be applied to RNA-seq data analysis, together with software tools dedicated to these methods. In addition, we discuss challenges in cluster analysis and compare the performance of eight commonly used clustering methods on four different public datasets from recount2.
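The abstract names k-means among the most widely used methods for grouping gene expression profiles. As an illustration only (this is not the chapter's benchmark code), the minimal stdlib-only sketch below log-transforms a small, made-up genes × samples count matrix to log2(CPM + 1) and then clusters the gene profiles with Lloyd's k-means. The helper names `log_cpm` and `kmeans`, and seeding with the first k genes, are assumptions made for reproducibility; real analyses usually rely on library implementations with several random starts.

```python
import math


def log_cpm(counts, prior=1.0):
    """Convert a genes x samples matrix of raw read counts to log2(CPM + prior).

    CPM (counts per million) divides each count by its sample's library size
    (the column sum) and multiplies by 1e6, so samples become comparable.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2(row[j] / lib_sizes[j] * 1e6 + prior) for j in range(n_samples)]
        for row in counts
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; seeds centroids with the first k rows (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its genes.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical counts: 4 genes x 3 samples.
counts = [
    [100, 10, 10],  # gene A: high in sample 1
    [10, 10, 100],  # gene B: high in sample 3
    [120, 12, 8],   # gene C: pattern like gene A
    [8, 15, 110],   # gene D: pattern like gene B
]
print(kmeans(log_cpm(counts), k=2))  # [0, 1, 0, 1]
```

The log transform matters because raw RNA-seq counts are heavily right-skewed; without it, a few highly expressed genes would dominate the Euclidean distances that k-means minimizes.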


Subpopulation identification for single-cell RNA-sequencing data using functional data analysis

10.1101/760413

2019

Author(s): Kyungmin Ahn, Hironobu Fujiwara

Keyword(s): Gene Expression, Data Analysis, Single Cell, Gene Expression Data, Functional Data Analysis, Functional Data, Clustering Algorithms, Expression Data, Clustering Methods, Single-Cell RNA Sequencing

Abstract

Background: In single-cell RNA-sequencing (scRNA-seq) data analysis, a number of statistical tools from multivariate data analysis (MDA) have been developed to help analyze gene expression data. The MDA approach typically treats genes as discrete genomic units and ignores the dependency between data components. In this paper, we propose a functional data analysis (FDA) approach to scRNA-seq data in which each cell is treated as a single function. To mitigate the large number of dropouts (zero or near-zero values) and reduce the high dimensionality of the data, we first perform principal component analysis (PCA) and take the PCs as the amplitude of the function. We then use the index of each PC from the PCA directly as the phase component. This approach allows us to apply FDA clustering methods to scRNA-seq data analysis.

Results: To demonstrate the robustness of our method, we apply several existing FDA clustering algorithms to gene expression data with the aim of improving the accuracy of cell-type classification over conventional MDA clustering methods. The FDA clustering algorithms achieve superior accuracy on simulated data as well as on real data, including human and mouse scRNA-seq data.

Conclusions: This new statistical technique enhances classification performance and ultimately improves the understanding of stochastic biological processes. The framework provides a fundamentally different approach to scRNA-seq data analysis that can complement conventional MDA methods. It can be truly effective when current MDA methods cannot detect or uncover the hidden functional nature of gene expression dynamics.
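The central representational step in this abstract is treating each cell as a function whose phase is the PC index and whose amplitude is the PC score. The sketch below assumes the PC scores have already been computed by any PCA routine and shows only that representation, plus an L2 distance between two such curves approximated with the trapezoidal rule. The function names, the example scores and the choice of L2 distance are illustrative assumptions; this is not the authors' pipeline, and the FDA clustering algorithms they evaluate would operate on curves like these.

```python
def as_function(pc_scores):
    """Represent one cell as a sampled curve: x = PC index (1-based phase),
    y = PC score (amplitude)."""
    return [(index + 1, score) for index, score in enumerate(pc_scores)]


def l2_distance(f, g):
    """Approximate the L2 distance between two curves sampled at the same
    PC indices by integrating their squared difference with the trapezoidal rule."""
    xs = [x for x, _ in f]
    if xs != [x for x, _ in g]:
        raise ValueError("curves must be sampled at the same PC indices")
    sq_diff = [(yf - yg) ** 2 for (_, yf), (_, yg) in zip(f, g)]
    integral = sum(
        (xs[i + 1] - xs[i]) * (sq_diff[i] + sq_diff[i + 1]) / 2
        for i in range(len(xs) - 1)
    )
    return integral ** 0.5


# Hypothetical scores on the first three PCs for three cells.
cells = {
    "cell_1": [4.0, -1.0, 0.5],
    "cell_2": [3.8, -0.9, 0.4],
    "cell_3": [-2.0, 1.5, 0.0],
}
curves = {name: as_function(scores) for name, scores in cells.items()}
print(l2_distance(curves["cell_1"], curves["cell_2"]))  # small: similar curves
print(l2_distance(curves["cell_1"], curves["cell_3"]))  # large: dissimilar curves
```

A pairwise distance matrix built this way could be handed to any distance-based clustering routine; truncating to the leading PCs is what discards most of the dropout noise before the functional view is applied.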


Clustering Algorithms in Gene Expression: Data Analysis

10.1109/icrito51393.2021.9596549

2021

Author(s): Karuna Ghai, Jaspreet Singh

Keyword(s): Gene Expression, Data Analysis, Gene Expression Data, Clustering Algorithms, Expression Data, Gene Expression Data Analysis


IRIS-EDA: An integrated RNA-Seq interpretation system for gene expression data analysis

PLoS Computational Biology

10.1371/journal.pcbi.1006792

2019

Vol 15 (2)

pp. e1006792

Cited By ~ 11

Author(s): Brandon Monier, Adam McDermaid, Cankun Wang, Jing Zhao, Allison Miller, ...

Keyword(s): Gene Expression, Data Analysis, Gene Expression Data, Expression Data, RNA-Seq, Gene Expression Data Analysis, Interpretation System


Identification of Robust Clustering Methods in Gene Expression Data Analysis

Current Bioinformatics

10.2174/1574893611666160610103926

2017

Vol 12 (6)

Cited By ~ 2

Author(s): Md. Bipul Hossen, Md. Siraj-Ud-Doulah

Keyword(s): Gene Expression, Data Analysis, Gene Expression Data, Expression Data, Clustering Methods, Gene Expression Data Analysis, Robust Clustering


Pattern Discovery in Gene Expression Data

Intelligent Data Analysis

10.4018/978-1-59904-982-3.ch003

2009

pp. 45-64

Author(s): Gráinne Kerr, Heather Ruskin, Martin Crane

Keyword(s): Gene Expression, Data Mining, Cluster Analysis, Gene Regulation, Data Analysis, Gene Expression Data, mRNA Levels, Expression Data, Single Experiment, Gene Regulation Networks

Microarray technology provides an opportunity to monitor the mRNA expression levels of thousands of genes simultaneously in a single experiment. The enormous amount of data produced by this high-throughput approach presents a challenge for data analysis: to extract meaningful patterns, to evaluate its quality, and to interpret the results. The most commonly used method for identifying such patterns is cluster analysis. Approaches that are common and sufficient for many data-mining problems, for example hierarchical and k-means clustering, do not adequately address the properties of “typical” gene expression data and fail, in significant ways, to account for its profile. This chapter clarifies some of these issues and provides a framework for evaluating clustering in gene expression analysis. Methods are categorised explicitly in the context of their application to data of this type, providing a basis for reverse engineering of gene regulation networks. Finally, areas for possible future development are highlighted.
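The abstract cites hierarchical clustering as one of the standard approaches. To show the mechanics, here is a minimal agglomerative sketch that uses 1 - Pearson correlation as the distance between gene profiles and average linkage to decide which clusters to merge. Both choices are common in expression analysis but are assumptions here, not the chapter's evaluation framework; the function names and toy profiles are hypothetical, and the naive loop is roughly cubic in the number of genes, so it is for illustration only.

```python
import math


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones. Assumes neither profile is
    constant (flat genes should be filtered out beforehand)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (norm_a * norm_b)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: start from singletons and repeatedly merge
    the two clusters with the smallest mean pairwise distance until
    n_clusters remain. Returns lists of row indices."""
    dist = [[pearson_distance(p, q) for q in profiles] for p in profiles]
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i
        del clusters[j]  # j > i, so index i stays valid
    return clusters


profiles = [
    [1.0, 2.0, 3.0, 4.0],  # gene 0: rising
    [2.0, 4.0, 6.0, 8.0],  # gene 1: rising, larger amplitude
    [4.0, 3.0, 2.0, 1.0],  # gene 2: falling
    [8.0, 6.0, 4.0, 2.0],  # gene 3: falling, larger amplitude
]
print(average_linkage(profiles, n_clusters=2))  # [[0, 1], [2, 3]]
```

Correlation distance groups genes by the shape of their profiles rather than their magnitude, which is why genes 0 and 1 end up together despite a twofold difference in level; a Euclidean distance would not necessarily do so.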


Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures

International Journal of Computer Applications

10.5120/ijca2016909413

2016

Vol 139 (13)

pp. 4-8

Cited By ~ 1

Author(s): Angela Makolo, Taiwo Adigun

Keyword(s): Gene Expression, Data Analysis, Gene Expression Data, Clustering Algorithms, Distance Measures, Expression Data, Gene Expression Data Analysis


Evaluating the Effectiveness of Hard, Hierarchical and Fuzzy Clustering Methods for High Dimensional Gene Expression Data Analysis

2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer)

10.1109/icter48817.2019.9023771

2019

Author(s): SPBM Senadheera, AR Weerasinghe, CR Wijesinghe

Keyword(s): Gene Expression, Data Analysis, Gene Expression Data, Fuzzy Clustering, High Dimensional, Expression Data, Clustering Methods, Gene Expression Data Analysis, Fuzzy Clustering Methods


Graph Theoretic Techniques for Clustering and Biclustering gene expression data.

International Journal of Computer and Communication Technology

10.47893/ijcct.2012.1136

2012

pp. 173-181

Author(s): Prangyaparamita Mohapatra, Tripti Swarnkar

Keyword(s): Gene Expression, Data Mining, Gene Expression Data, Biological Networks, Clustering Algorithms, Expression Data, Microarray Technology, Clustering Methods, Experimental Conditions, Data Set

DNA microarray technology has made it possible to monitor the expression levels of thousands of genes simultaneously, during biological processes and across collections of related samples. However, the large number of genes and the complexity of biological networks greatly increase the challenge of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that data points within a group are more similar to each other than to points in different groups. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have also been proposed specifically for gene expression data. These clustering algorithms have proven useful for identifying biologically relevant groups of genes and samples. A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, applying standard clustering methods to genes yields limited results, because the activity of genes is uncorrelated under a number of experimental conditions. A similar limitation exists when conditions are clustered. For this reason, a number of algorithms that cluster the row and column dimensions of the gene expression matrix simultaneously have been proposed. This simultaneous clustering, usually called biclustering, seeks submatrices defined by a subgroup of genes and a subgroup of conditions (columns), in which the genes exhibit highly correlated activity for every condition.
This type of algorithm has also been proposed and used in other fields, such as information retrieval and data mining. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering gene expression data. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches.
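The abstract describes biclustering as a search for gene × condition submatrices with coherent activity. One widely used way to score a candidate submatrix is the mean squared residue of Cheng and Church, which is zero when every gene in the submatrix follows the same pattern up to an additive shift. The sketch below computes that score for given row and column index sets; it is offered as background and is not necessarily a method covered in this paper. A full biclustering algorithm would search over the index sets rather than take them as input, and the function name and toy matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of the submatrix matrix[rows][cols].

    H(I, J) = mean over (i, j) of (a_ij - a_iJ - a_Ij + a_IJ) ** 2, where a_iJ is
    the row mean, a_Ij the column mean and a_IJ the mean of the whole submatrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows  # equals the mean of all entries
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Hypothetical expression matrix: 3 genes x 4 conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0],  # gene 0
    [2.0, 3.0, 4.0, 0.0],  # gene 1: gene 0 shifted by +1 on conditions 0-2
    [5.0, 6.0, 7.0, 4.0],  # gene 2: gene 0 shifted by +4 on conditions 0-2
]
print(mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2]))     # ~0: coherent
print(mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2, 3]))  # larger
```

Restricting the columns to conditions 0-2 drives the score to (numerically) zero, while adding condition 3, where the genes disagree, raises it; this is the gene-and-condition selection that row-only or column-only clustering cannot express.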


Feature Selection for Gene Expression Data Analysis – A Review

International Journal of Psychosocial Rehabilitation

10.37200/ijpr/v24i5/pr2020695

2020

Vol 24 (5)

pp. 6955-6964

Author(s): Dr. Prema R

Keyword(s): Gene Expression, Feature Selection, Data Analysis, Gene Expression Data, Expression Data, Gene Expression Data Analysis, Selection For


State-of-the-art of Cluster Analysis of Gene Expression Data

ACTA AUTOMATICA SINICA

10.3724/sp.j.1004.2008.00113

2009

Vol 34 (2)

pp. 113-120

Cited By ~ 3

Author(s): Feng YUE

Keyword(s): Gene Expression, Cluster Analysis, Gene Expression Data, State Of The Art, Expression Data
