A Rough Set based Gene Expression Clustering Algorithm

Cancer Gene Expression Data Analysis Using Rough Based Symmetrical Clustering

Handbook of Research on Computational Intelligence for Engineering, Science, and Business ◽

10.4018/978-1-4666-2518-1.ch027 ◽

2013 ◽

pp. 699-715 ◽

Cited By ~ 4

Author(s):

Anasua Sarkar ◽

Ujjwal Maulik

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Gene Expression Data ◽

Rough Set ◽

Clustering Algorithm ◽

Data Sets ◽

Cancer Gene ◽

Expression Data ◽

Gene Expression Data Analysis ◽

Cancer Subtypes

Identifying cancer subtypes is a central goal of cancer gene expression data analysis. Modified symmetry-based clustering is an unsupervised learning technique for detecting symmetrical clusters, whether convex or non-convex. To enable fast, automatic clustering of cancer tissues (samples), the authors of this chapter propose a hybrid approach that combines rough set theory with the modified symmetry-based clustering algorithm. A natural basis for analyzing gene expression data with the symmetry-based algorithm is to group together genes with similar symmetrical patterns of microarray expression. Rough set theory speeds up convergence and provides an automatic, optimal initial classification, which addresses the problem that the number of clusters in gene expression data is not known in advance. For rough-set-theoretic decision rule generation, each cluster is classified using heuristically searched optimal reducts to overcome the problem of overlapping clusters. The rough modified symmetry-based clustering algorithm is compared with a newly implemented rough improved symmetry-based clustering algorithm and with the existing K-Means algorithm on five benchmark cancer gene expression data sets, and the authors report that it performs better in terms of cluster validity. Statistical analyses are also performed to establish the significance of the approach.
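The abstract relies on rough set theory to handle overlapping clusters. As a minimal sketch of the standard rough-set building blocks (not the chapter's algorithm), the code below computes the lower and upper approximations of a target set from indiscernibility classes. The sample identifiers, attribute names, and discretized expression levels are hypothetical toy data.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object ids whose values agree on every attribute in `attributes`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())


def approximations(classes, target):
    """Return (lower, upper) approximations of `target` for the given classes."""
    lower, upper = set(), set()
    for block in classes:
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical toy table: samples described by two discretized gene levels.
table = {
    "s1": {"g1": "high", "g2": "low"},
    "s2": {"g1": "high", "g2": "low"},
    "s3": {"g1": "low", "g2": "low"},
    "s4": {"g1": "low", "g2": "high"},
}
target = {"s1", "s3"}  # hypothetical cluster to approximate

classes = indiscernibility_classes(table, ["g1", "g2"])
lower, upper = approximations(classes, target)
boundary = upper - lower  # samples that cannot be assigned with certainty
```

Here `s1` and `s2` are indistinguishable on the chosen attributes, so both land in the boundary region: they possibly, but not certainly, belong to the target cluster.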

Analysis on Network Clustering Algorithm of Data Mining Methods Based on Rough Set Theory

2011 Fourth International Symposium on Knowledge Acquisition and Modeling ◽

10.1109/kam.2011.85 ◽

2011 ◽

Author(s):

Xiao-rong Ye

Keyword(s):

Data Mining ◽

Set Theory ◽

Rough Set ◽

Clustering Algorithm ◽

Rough Set Theory ◽

Network Clustering ◽

Mining Methods

ENTROPY-BASED CLUSTER VALIDATION AND ESTIMATION OF THE NUMBER OF CLUSTERS IN GENE EXPRESSION DATA

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720012500114 ◽

2012 ◽

Vol 10 (05) ◽

pp. 1250011

Author(s):

NATALIA NOVOSELOVA ◽

IGOR TOM

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Selection Procedure ◽

Biological Knowledge ◽

Consensus Clustering ◽

Expression Data ◽

Cluster Validation ◽

Number Of Clusters ◽

Validity Measure

Many external and internal validity measures have been proposed for estimating the number of clusters in gene expression data, but as a rule they do not consider the stability of the groupings produced by a clustering algorithm. Building on approaches that assess the predictive power or stability of a partitioning, we propose a new cluster validation measure and a selection procedure for determining a suitable number of clusters. The validity measure is based on estimating the "clearness" of the consensus matrix, which is the result of a resampling clustering scheme (consensus clustering). In the proposed selection procedure, a stable clustering result is determined with reference to the validity measure under the null hypothesis that encodes the absence of clusters. The final number of clusters is selected by analyzing the distance between the validity plots for the initial and permuted data sets. We applied the selection procedure to evaluate the clustering results on several data sets. The proposed procedure produced an accurate and robust estimate of the number of clusters, in agreement with biological knowledge and gold standards of cluster quality.
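To make the consensus matrix concrete, the following sketch builds one from a handful of resampled clustering runs. The `crispness` score is a hypothetical stand-in chosen for illustration; the abstract does not define the authors' "clearness" measure, and this is not it. The item names and run labels are toy data.

```python
from itertools import combinations


def consensus_matrix(runs, items):
    """Fraction of runs in which each pair of items was co-clustered,
    counted only over runs where both items were sampled."""
    together = {pair: 0 for pair in combinations(items, 2)}
    sampled = dict.fromkeys(together, 0)
    for labels in runs:
        for a, b in together:
            if a in labels and b in labels:
                sampled[(a, b)] += 1
                if labels[a] == labels[b]:
                    together[(a, b)] += 1
    return {p: together[p] / sampled[p] for p in together if sampled[p]}


def crispness(consensus):
    """Hypothetical stand-in score: 1.0 when every entry is 0 or 1,
    0.0 when every entry is 0.5 (maximally ambiguous)."""
    return sum(abs(2 * m - 1) for m in consensus.values()) / len(consensus)


# Each run clusters a subsample; the dict maps sampled item -> cluster label.
runs = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1, "d": 0},
    {"a": 0, "c": 1, "d": 1},
    {"b": 0, "c": 1, "d": 0},
]
items = ["a", "b", "c", "d"]

consensus = consensus_matrix(runs, items)
score = crispness(consensus)
```

Pairs that are always or never co-clustered push the score toward 1, while pairs that flip between runs (here `b`/`d` and `c`/`d`) pull it down, which is the intuition behind preferring a cluster count with a clean consensus matrix.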

A TRIPARTITE CLUSTERING ANALYSIS ON MICRORNA, GENE AND DISEASE MODEL

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720012400070 ◽

2012 ◽

Vol 10 (01) ◽

pp. 1240007 ◽

Cited By ~ 2

Author(s):

CHENGCHENG SHEN ◽

YING LIU

Keyword(s):

Gene Expression ◽

Clustering Algorithm ◽

Target Genes ◽

Regulation Of Gene Expression ◽

Disease Model ◽

Relational Information ◽

Microrna Gene ◽

Research Findings ◽

Family Based

Alteration of gene expression in response to regulatory molecules or mutations can lead to different diseases. MicroRNAs (miRNAs) have been found to be involved in the regulation of gene expression and in a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes, and the diseases caused by altered expression of these genes, valuable knowledge about the pathogenicity of miRNAs, the genes involved, and related disease classes can be revealed by co-clustering miRNAs, target genes, and diseases simultaneously. Tripartite co-clustering can yield more informative results than traditional co-clustering with only two kinds of members and, by considering members of multiple types, passes hidden relational information along the relation chain. Here we report a spectral co-clustering algorithm for k-partite graphs that finds clusters with heterogeneous members. We use the method to explore potential relationships among miRNAs, genes, and diseases. The clusters obtained from the algorithm have significantly higher density than randomly selected clusters, which means that members of the same cluster are more likely to have common connections. Results also show that miRNAs in the same family, as defined by their hairpin sequences, tend to belong to the same cluster. We also validate the clustering results by checking the correlation of enriched gene functions and disease classes within the same cluster. Finally, the widely studied miR-17-92 cluster and its paralogs are analyzed as a case study, which shows that the genes and diseases co-clustered with these miRNAs are in accordance with current research findings.
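The density comparison mentioned in the abstract can be illustrated with a small sketch. The definition used here (observed miRNA-gene and gene-disease edges divided by the number of possible such pairs inside the cluster) is an assumption made for illustration, since the abstract does not state the authors' formula. All node names are hypothetical.

```python
def cluster_density(mirnas, genes, diseases, edges):
    """Observed cross-type edges inside a cluster divided by possible ones.

    Assumes edges exist only between miRNAs and genes and between genes
    and diseases, stored as (source, target) tuples in `edges`.
    """
    possible = len(mirnas) * len(genes) + len(genes) * len(diseases)
    if possible == 0:
        return 0.0
    present = sum(1 for m in mirnas for g in genes if (m, g) in edges)
    present += sum(1 for g in genes for d in diseases if (g, d) in edges)
    return present / possible


# Hypothetical toy network.
edges = {("miR-a", "G1"), ("miR-a", "G2"), ("G1", "D1"), ("miR-b", "G3")}

density = cluster_density({"miR-a"}, {"G1", "G2"}, {"D1"}, edges)
```

A cluster found by a co-clustering method would then be compared against the same quantity computed for randomly drawn node sets of equal size.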

A Fuzzy Clustering Algorithm for Analysis of Gene Expression Profiles

PRICAI 2004: Trends in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-540-28633-2_117 ◽

2004 ◽

pp. 967-968

Author(s):

Han-Saem Park ◽

Si-Ho Yoo ◽

Sung-Bae Cho

Keyword(s):

Gene Expression ◽

Fuzzy Clustering ◽

Clustering Algorithm ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Fuzzy Clustering Algorithm

A ROUGH SET THEORY APPROACH TO THE ANALYSIS OF GENE EXPRESSION PROFILES

Chemoinformatics for Drug Discovery ◽

10.1002/9781118742785.ch3 ◽

2013 ◽

pp. 51-83

Author(s):

Joachim Petit ◽

Nathalie Meurice ◽

José Luis Medina-Franco ◽

Gerald M. Maggiora

Keyword(s):

Gene Expression ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Theory Approach

Expansion Research on K-means Clustering Algorithm Based on Rough Set

International Journal of Advancements in Computing Technology ◽

10.4156/ijact.vol4.issue10.26 ◽

2012 ◽

Vol 4 (10) ◽

pp. 221-227 ◽

Cited By ~ 2

Author(s):

Xuexia YANG

Keyword(s):

Rough Set ◽

Clustering Algorithm

Multi-cancer samples clustering via graph regularized low-rank representation method under sparse and symmetric constraints

BMC Bioinformatics ◽

10.1186/s12859-019-3231-5 ◽

2019 ◽

Vol 20 (S22) ◽

Author(s):

Juan Wang ◽

Cong-Hai Lu ◽

Jin-Xing Liu ◽

Ling-Yun Dai ◽

Xiang-Zhen Kong

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Low Rank ◽

Expression Data ◽

Geometrical Structures ◽

Graph Regularization ◽

Raw Data ◽

Clustering Quality ◽

Low Rank Representation

Background: Identifying different types of cancer from gene expression data has become a hotspot in bioinformatics research. Clustering gene expression data from multiple cancers into their respective classes is an important way to do this. However, gene expression data are high-dimensional, have few samples, and are noisy, which makes data mining and research difficult. Although many effective and feasible methods exist for this problem, these methods may still have shortcomings.

Results: In this paper, we propose the graph regularized low-rank representation under symmetric and sparse constraints (sgLRR) method, in which we introduce graph regularization based on manifold learning, together with symmetric and sparse constraints, into traditional low-rank representation (LRR). In sgLRR, the symmetric and sparse constraints alleviate the effect of raw data noise on the low-rank representation. In addition, by introducing graph regularization, sgLRR preserves the important intrinsic local geometrical structures of the raw data. We apply this method to cluster multi-cancer samples based on gene expression data, which improves the clustering quality. First, the gene expression data are decomposed by sgLRR, yielding a lowest-rank representation matrix that is symmetric and sparse. Then, an affinity matrix is constructed from it, and the multi-cancer samples are clustered using a spectral clustering algorithm, namely normalized cuts (Ncuts).

Conclusions: A series of comparative experiments demonstrates that the sgLRR method, based on low-rank representation, has a clear advantage and performs remarkably well in clustering multi-cancer samples.
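The last two steps of the pipeline (affinity matrix, then normalized cuts) can be sketched as follows. This is not the sgLRR solver: the representation matrix `z` is hypothetical toy data, and building the affinity as (|Z| + |Zᵀ|) / 2 is an assumption, since the abstract does not say how the affinity is constructed. The normalized cut value itself follows the standard definition, cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V).

```python
def affinity_from_representation(z):
    """Symmetric, non-negative affinity (|Z| + |Z^T|) / 2 with a zero diagonal.
    Assumed construction, not necessarily the one used by sgLRR."""
    n = len(z)
    return [
        [0.0 if i == j else (abs(z[i][j]) + abs(z[j][i])) / 2 for j in range(n)]
        for i in range(n)
    ]


def normalized_cut(w, part_a):
    """Normalized cut value of the bipartition (part_a, remaining nodes)."""
    nodes = range(len(w))
    part_a = set(part_a)
    part_b = set(nodes) - part_a
    cut = sum(w[i][j] for i in part_a for j in part_b)
    assoc_a = sum(w[i][j] for i in part_a for j in nodes)
    assoc_b = sum(w[i][j] for i in part_b for j in nodes)
    return cut / assoc_a + cut / assoc_b


# Hypothetical representation matrix for four samples (two per cancer type).
z = [
    [0.0, 0.85, 0.1, 0.0],
    [0.85, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]

w = affinity_from_representation(z)
ncut_good = normalized_cut(w, {0, 1})  # keeps strongly linked samples together
ncut_bad = normalized_cut(w, {0, 2})   # splits both strongly linked pairs
```

A lower value indicates a better partition. A full Ncuts implementation would search for the minimizing partition through an eigenvector problem rather than by scoring hand-picked candidates as done here.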

Security risk assessment of cyber physical power system based on rough set and gene expression programming

IEEE/CAA Journal of Automatica Sinica ◽

10.1109/jas.2015.7296538 ◽

2015 ◽

Vol 2 (4) ◽

pp. 431-439 ◽

Cited By ~ 10

Keyword(s):

Gene Expression ◽

Risk Assessment ◽

Power System ◽

Rough Set ◽

Gene Expression Programming ◽

Security Risk ◽

Security Risk Assessment
