scholarly journals Robust Fuzzy Cluster Ensemble on Cancer Gene Expression Data

10.29007/38sk ◽  
2019 ◽  
Author(s):  
Yan Yan ◽  
Tin Nguyen ◽  
Bobby Bryant ◽  
Frederick C. Harris

Noise remains a particularly challenging and ubiquitous problem in cancer gene expression data clustering research, which may cause inaccurate results and mislead the underlying biological meanings. A clustering method that is robust to noise is highly desirable. No one clustering method performs best across all data sets despite a vast number of methods available. Cluster ensemble provides an approach to automatically combine results from multiple clustering methods for improving robustness and accuracy. We have proposed a novel noise robust fuzzy cluster ensemble algorithm. It employs an improved fuzzy clustering approach with different initializations as its base clusterings to avoid or alleviate the effects of noise in data sets. Its results show effective improvements over most examined noisy real cancer gene expression data sets when compared with most evaluated benchmark clustering methods: it is the top performer on three of the eight data sets, more than any other methods evaluated, and it performs well on most of the other data sets. Also, our fuzzy cluster ensemble is robust on highly noisy synthetic data sets. Moreover, it is computationally efficient.

Author(s):  
Soumya Raychaudhuri

The most interesting and challenging gene expression data sets to analyze are large multidimensional data sets that contain expression values for many genes across multiple conditions. In these data sets the use of scientific text can be particularly useful, since there are a myriad of genes examined under vastly different conditions, each of which may induce or repress expression of the same gene for different reasons. There is an enormous complexity to the data that we are examining—each gene is associated with dozens if not hundreds of expression values as well as multiple documents built up from vocabularies consisting of thousands of words. In Section 2.4 we reviewed common gene expression strategies, most of which revolve around defining groups of genes based on common profiles. A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present computational methods that leverage the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in gene expression data analysis offers an opportunity to incorporate background functional information about the genes when defining expression clusters. In Chapter 5 we saw how literature- based approaches could help in the analysis of single condition experiments. Here we will apply the strategies introduced in Chapter 6 to assess the coherence of groups of genes to enhance gene expression analysis approaches. The methods proposed here could, in fact, be applied to any multivariate genomics data type. The key concepts discussed in this chapter are listed in the frame box. We begin with a discussion of gene groups and their role in expression analysis; we briefly discuss strategies to assign keywords to groups and strategies to assess their functional coherence. We apply functional coherence measures to gene expression analysis; for examples we focus on a yeast expression data set. We first demonstrate how functional coherence can be used to focus in on the key biologically relevant gene groups derived by clustering methods such as self-organizing maps and k-means clustering.


2013 ◽  
pp. 1626-1641
Author(s):  
Anasua Sarkar ◽  
Ujjwal Maulik

Identification of cancer subtypes is the central goal in the cancer gene expression data analysis. Modified symmetry-based clustering is an unsupervised learning technique for detecting symmetrical convex or non-convex shaped clusters. To enable fast automatic clustering of cancer tissues (samples), in this chapter, the authors propose a rough set based hybrid approach for modified symmetry-based clustering algorithm. A natural basis for analyzing gene expression data using the symmetry-based algorithm is to group together genes with similar symmetrical patterns of microarray expressions. Rough-set theory helps in faster convergence and initial automatic optimal classification, thereby solving the problem of unknown knowledge of number of clusters in gene expression measurement data. For rough-set-theoretic decision rule generation, each cluster is classified using heuristically searched optimal reducts to overcome overlapping cluster problem. The rough modified symmetry-based clustering algorithm is compared with another newly implemented rough-improved symmetry-based clustering algorithm and existing K-Means algorithm over five benchmark cancer gene expression data sets, to demonstrate its superiority in terms of validity. The statistical analyses are also performed to establish the significance of this rough modified symmetry-based clustering approach.


Author(s):  
Vincent S. Tseng ◽  
Ching-Pin Kao

In recent years, clustering analysis has even become a valuable and useful tool for in-silico analysis of microarray or gene expression data. Although a number of clustering methods have been proposed, they are confronted with difficulties in meeting the requirements of automation, high quality, and high efficiency at the same time. In this chapter, we discuss the issue of parameterless clustering technique for gene expression analysis. We introduce two novel, parameterless and efficient clustering methods that fit for analysis of gene expression data. The unique feature of our methods is they incorporate the validation techniques into the clustering process so that high quality results can be obtained. Through experimental evaluation, these methods are shown to outperform other clustering methods greatly in terms of clustering quality, efficiency, and automation on both of synthetic and real data sets.


Author(s):  
Anasua Sarkar ◽  
Ujjwal Maulik

Identification of cancer subtypes is the central goal in the cancer gene expression data analysis. Modified symmetry-based clustering is an unsupervised learning technique for detecting symmetrical convex or non-convex shaped clusters. To enable fast automatic clustering of cancer tissues (samples), in this chapter, the authors propose a rough set based hybrid approach for modified symmetry-based clustering algorithm. A natural basis for analyzing gene expression data using the symmetry-based algorithm is to group together genes with similar symmetrical patterns of microarray expressions. Rough-set theory helps in faster convergence and initial automatic optimal classification, thereby solving the problem of unknown knowledge of number of clusters in gene expression measurement data. For rough-set-theoretic decision rule generation, each cluster is classified using heuristically searched optimal reducts to overcome overlapping cluster problem. The rough modified symmetry-based clustering algorithm is compared with another newly implemented rough-improved symmetry-based clustering algorithm and existing K-Means algorithm over five benchmark cancer gene expression data sets, to demonstrate its superiority in terms of validity. The statistical analyses are also performed to establish the significance of this rough modified symmetry-based clustering approach.


Sign in / Sign up

Export Citation Format

Share Document