Fuzzy Clustering Using the Convex Hull as Geometrical Model

2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Luca Liparulo ◽  
Andrea Proietti ◽  
Massimo Panella

A new approach to fuzzy clustering is proposed in this paper. It aims to relax some constraints imposed by known algorithms by using a generalized geometrical model for clusters that is based on the convex hull computation. A method is also proposed to determine suitable membership functions and hence to represent fuzzy clusters under the adopted geometrical model. The convex hull is used not only at the end of the clustering analysis for geometric data interpretation, but also during the fuzzy data partitioning, within an online sequential procedure, to calculate the membership function. Consequently, a pure fuzzy clustering algorithm is obtained where clusters are fitted to the data distribution by means of the fuzzy membership of patterns to each cluster. The numerical results reported in the paper show the validity and efficacy of the proposed approach with respect to other well-known clustering algorithms.
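The abstract does not give the paper's exact membership function, so the sketch below only illustrates the underlying idea: model a cluster by the convex hull of its points and let membership decay with distance outside the hull. The decay parameter `beta` and the nearest-vertex distance proxy are assumptions for illustration, not the authors' formulation.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def hull_membership(point, cluster_points, beta=1.0):
    """Membership of `point` in a cluster modelled by its convex hull:
    1 inside the hull, exponential decay outside. The distance to the
    nearest hull vertex is used as a cheap proxy for hull distance."""
    if Delaunay(cluster_points).find_simplex(point) >= 0:
        return 1.0                                # inside (or on) the hull
    verts = cluster_points[ConvexHull(cluster_points).vertices]
    d = np.min(np.linalg.norm(verts - point, axis=1))
    return float(np.exp(-beta * d))

# unit square as a toy cluster
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
inside = hull_membership(np.array([0.5, 0.5]), square)
outside = hull_membership(np.array([3.0, 3.0]), square)
```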

1995 ◽  
Vol 05 (02) ◽  
pp. 239-259
Author(s):  
SU HWAN KIM ◽  
SEON WOOK KIM ◽  
TAE WON RHEE

For data analyses, it is very important to combine data with similar attribute values into a categorically homogeneous subset, called a cluster; this technique is called clustering. Crisp clustering algorithms are generally weak against noise, because each datum must be assigned to exactly one cluster. To address this problem, the fuzzy c-means, fuzzy maximum likelihood estimation, and optimal fuzzy clustering algorithms based on fuzzy set theory have been proposed. They, however, require a great deal of processing time because of exhaustive iteration over the data and their memberships. A large memory footprint in particular degrades performance in real-time processing applications, because swapping between main memory and secondary memory takes too much time. To overcome these limitations, an extended fuzzy clustering algorithm based on an unsupervised optimal fuzzy clustering algorithm is proposed in this paper. This algorithm assigns a weight factor to each distinct datum according to its occurrence rate. The proposed extended fuzzy clustering algorithm also considers the degree of importance of each attribute, which determines the characteristics of the data. The worst case is when the whole dataset has a uniform normal distribution, meaning all attributes are equally important. The proposed extended fuzzy clustering algorithm outperforms the unsupervised optimal fuzzy clustering algorithm in terms of memory space and execution time in most cases. For simulation, the proposed algorithm is applied to color image segmentation; automatic target detection and multipeak detection are also considered as applications. These schemes can be applied to any other fuzzy clustering algorithm.
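As a rough illustration of the weighting idea, the fuzzy c-means sketch below lets each distinct datum carry a weight (e.g. its occurrence count) in the centroid update; the paper's exact formulation, including the attribute importances, is not reproduced here.

```python
import numpy as np

def weighted_fcm(X, w, c, m=2.0, n_iter=50, seed=0):
    """Fuzzy c-means where each distinct datum carries a weight w
    (e.g. its occurrence count). Standard FCM updates; the weights
    enter only the centroid update."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1
    for _ in range(n_iter):
        Um = w[:, None] * U ** m                 # weighted fuzzified memberships
        V = Um.T @ X / Um.sum(axis=0)[:, None]   # weighted centroid update
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = d ** (-2 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)        # standard membership update
    return U, V

# four points forming two obvious groups, all with unit weight
X = np.array([[0.0], [0.1], [5.0], [5.1]])
w = np.ones(4)
U, V = weighted_fcm(X, w, c=2)
labels = U.argmax(axis=1)
```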


2021 ◽  
Author(s):  
Qiuyu Song ◽  
Chengmao Wu ◽  
Xiaoping Tian ◽  
Yue Song ◽  
Xiaokang Guo

Abstract The application of fuzzy clustering algorithms to image segmentation is a hot research topic nowadays. Existing fuzzy clustering algorithms have three problems: (1) the parameters of spatial information constraints cannot be selected adaptively; (2) images corrupted by high noise cannot be segmented effectively; (3) it is difficult to balance noise removal against detail preservation. In fuzzy clustering based on an optimization model, the choice of distance metric is very important. Since the Euclidean distance is sensitive to outliers and noise, it is difficult to obtain satisfactory segmentation results, which affects clustering performance. This paper proposes an optimization algorithm based on kernel-based fuzzy local information clustering integrating non-local information (KFLNLI). The algorithm adopts a self-integration method to introduce local and non-local image information, which addresses the common problems of current clustering algorithms. First, the self-integration method solves the problem of selecting spatial constraint parameters: the algorithm uses continuous self-learning iteration to calculate the weight coefficients. Second, the distance metric uses a Gaussian kernel function to induce the distance, further enhancing robustness against noise and adaptivity to different images. Finally, both local and non-local information are introduced to achieve a segmentation effect that eliminates most of the noise while retaining the original details of the image. Experimental results show that the algorithm is superior to existing state-of-the-art fuzzy clustering-related algorithms in the presence of high noise.
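The Gaussian-kernel-induced distance mentioned above has a simple closed form, since K(x, x) = 1 for a Gaussian kernel; a minimal sketch, with `sigma` as an assumed bandwidth parameter rather than a value from the paper:

```python
import numpy as np

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance in the feature space induced by a Gaussian kernel:
    ||phi(x) - phi(v)||^2 = K(x,x) + K(v,v) - 2K(x,v) = 2 * (1 - K(x,v)),
    because K(x,x) = K(v,v) = 1. Bounded by 2, which tempers outliers."""
    K = np.exp(-np.sum((x - v) ** 2) / (2 * sigma ** 2))
    return 2.0 * (1.0 - K)

d2_same = kernel_distance_sq(np.array([0.0, 0.0]), np.array([0.0, 0.0]))
d2_far = kernel_distance_sq(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```

Unlike the plain Euclidean distance, this metric saturates at 2 for distant points, which is the source of the robustness to outliers noted in the abstract.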


Author(s):  
Junjie Wu ◽  
Jian Chen ◽  
Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing the objects into groups (clusters), such that objects in a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), have been well-established. A recent research focus in clustering analysis is to understand the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified some data characteristics that may strongly affect clustering analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions can affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions: 1. What are the systematic differences between the distributions of the resultant clusters produced by different clustering algorithms? 2. How can the distribution of the “true” cluster sizes impact the performance of clustering algorithms? 3. How should one choose an appropriate clustering algorithm in practice? The answers to these questions can guide us toward a better understanding and use of clustering methods. This is noteworthy, since 1) in theory, it has seldom been recognized that there are strong relationships between clustering algorithms and cluster size distributions, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the algorithm boom in the data mining area. This chapter tries to fill this void.
To this end, we carefully select two widely used categories of clustering algorithms, i.e., K-means and Agglomerative Hierarchical Clustering (AHC), as the representative algorithms for illustration. In the chapter, we first show that K-means tends to generate the clusters with a relatively uniform distribution on the cluster sizes. Then we demonstrate that UPGMA, one of the robust AHC methods, acts in an opposite way to K-means; that is, UPGMA tends to generate the clusters with high variation on the cluster sizes. Indeed, the experimental results indicate that the variations of the resultant cluster sizes by K-means and UPGMA, measured by the Coefficient of Variation (CV), are in the specific intervals, say [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we put together K-means and UPGMA for a further comparison, and propose some rules for the better choice of the clustering schemes from the data distribution point of view.
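The Coefficient of Variation (CV) of the resultant cluster sizes, used above to characterize K-means and UPGMA, can be computed directly from the cluster labels; a small sketch, using the sample standard deviation (one common convention; the chapter's exact estimator is not stated here):

```python
import numpy as np

def cluster_size_cv(labels):
    """Coefficient of Variation of cluster sizes: standard deviation of
    the sizes divided by their mean. The chapter reports K-means typically
    in [0.3, 1.0] and UPGMA in [1.0, 2.5] on this measure."""
    sizes = np.bincount(labels).astype(float)
    sizes = sizes[sizes > 0]                 # ignore empty clusters
    return sizes.std(ddof=1) / sizes.mean()

uniform_cv = cluster_size_cv(np.array([0] * 10 + [1] * 10))  # sizes 10, 10
skewed_cv = cluster_size_cv(np.array([0] * 1 + [1] * 9))     # sizes 1, 9
```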


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Ze Dong ◽  
Hao Jia ◽  
Miao Liu

This paper presents a fuzzy clustering method based on a multiobjective genetic algorithm. The ADNSGA2-FCM algorithm was developed to solve the clustering problem by combining the fuzzy c-means algorithm (FCM) with the multiobjective genetic algorithm NSGA-II and introducing an adaptive mechanism. The algorithm does not need the number of clusters to be given in advance. After the number of initial clusters and the center coordinates are given randomly, the optimal solution set is found by the multiobjective evolutionary algorithm. After the optimal number of clusters is determined by a majority-vote method, the Jm value is continuously optimized through the combination of a canonical genetic algorithm and FCM, and finally the best clustering result is obtained. Verification on standard UCI datasets and comparison with existing single-objective and multiobjective clustering algorithms demonstrate the effectiveness of this method.
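The Jm value mentioned above is the standard fuzzy c-means objective, the membership-weighted sum of squared distances between data points and cluster centers; a minimal sketch:

```python
import numpy as np

def fcm_objective(X, U, V, m=2.0):
    """J_m = sum over data k and clusters i of u_ik^m * ||x_k - v_i||^2,
    the objective minimized by FCM-family algorithms (m is the fuzzifier)."""
    d2 = np.sum((X[:, None, :] - V[None, :, :]) ** 2, axis=2)  # squared distances
    return float(np.sum(U ** m * d2))

X = np.array([[0.0], [2.0]])
V = np.array([[0.0], [2.0]])
U_hard = np.array([[1.0, 0.0], [0.0, 1.0]])  # crisp, perfect assignment
U_soft = np.full((2, 2), 0.5)                # maximally fuzzy assignment
```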


2020 ◽  
Vol 28 (4) ◽  
pp. 531-561 ◽  
Author(s):  
Andrew Lensen ◽  
Bing Xue ◽  
Mengjie Zhang

Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
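As a sketch of how a similarity function can drive clustering under a graph-based representation, the code below connects every pair of instances whose similarity exceeds a threshold and takes connected components as clusters; the hand-written `sim` and the `threshold` value stand in for an evolved GP tree and are illustrative assumptions only, not the paper's method.

```python
def graph_clusters(X, sim, threshold):
    """Graph-based clustering with a pluggable similarity function:
    connect pairs with sim > threshold, then label connected components."""
    n = len(X)
    adj = [[j for j in range(n) if j != i and sim(X[i], X[j]) > threshold]
           for i in range(n)]
    labels, cur = [-1] * n, 0
    for s in range(n):
        if labels[s] != -1:
            continue
        stack = [s]
        while stack:                          # DFS over the similarity graph
            i = stack.pop()
            if labels[i] == -1:
                labels[i] = cur
                stack.extend(adj[i])
        cur += 1
    return labels

# a hand-written similarity in place of an evolved one: negative distance
sim = lambda a, b: -abs(a - b)
labels = graph_clusters([0.0, 0.2, 5.0, 5.3], sim, threshold=-1.0)
```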


2013 ◽  
Vol 380-384 ◽  
pp. 1488-1494
Author(s):  
Wang Wei ◽  
Jin Yue Peng

In the research and development of intelligent systems, clustering analysis is a very important problem. Following the direct clustering algorithm that uses similarity measures of Vague sets as its evaluation criteria, the Vague direct clustering method is analyzed here using different similarity measures of Vague sets. The experimental results show that direct clustering based on the similarity of Vague sets is effective, and that direct clustering based on different similarity measures of Vague sets yields essentially the same results, differing only in the number of clustering steps. Different algorithms can therefore be selected according to the conditions of the actual application.


2021 ◽  
Author(s):  
James Anibal ◽  
Alexandre Day ◽  
Erol Bahadiroglu ◽  
Liam O'Neill ◽  
Long Phan ◽  
...  

Data clustering plays a significant role in biomedical sciences, particularly in single-cell data analysis. Researchers use clustering algorithms to group individual cells into populations that can be evaluated across different levels of disease progression, drug response, and other clinical statuses. In many cases, multiple sets of clusters must be generated to assess varying levels of cluster specificity. For example, there are many subtypes of leukocytes (e.g. T cells), whose individual preponderance and phenotype must be assessed for statistical/functional significance. In this report, we introduce a novel hierarchical density clustering algorithm (HAL-x) that uses supervised linkage methods to build a cluster hierarchy on raw single-cell data. With this new approach, HAL-x can quickly predict multiple sets of labels for immense datasets, achieving a considerable improvement in computational efficiency on large datasets compared to existing methods. We also show that cell clusters generated by HAL-x yield near-perfect F1-scores when classifying different clinical statuses based on single-cell profiles. Our hierarchical density clustering algorithm achieves high accuracy in single cell classification in a scalable, tunable and rapid manner. We make HAL-x publicly available at: https://pypi.org/project/hal-x/


2016 ◽  
Vol 54 (3) ◽  
pp. 300 ◽  
Author(s):  
Mai Dinh Sinh ◽  
Le Hung Trinh ◽  
Ngo Thanh Long

This paper proposes a method that combines fuzzy probability and a fuzzy clustering algorithm to classify multispectral satellite images: fuzzy probability is used to calculate the number of clusters and the cluster centroids, and fuzzy clustering is then used to classify land cover on the satellite image. In practice, the initialization of the clusters and of the initial centroids has a great influence on the stability, running time, and results of classification algorithms; unsupervised classification algorithms such as k-Means, c-Means, and Iso-data are quite common for many problems, but their disadvantages are low accuracy and instability, especially when dealing with satellite images. Results of the proposed algorithm show a significant reduction of noise in the clusters in comparison with various clustering algorithms such as k-means and Iso-data.



2021 ◽  
Author(s):  
Lujia Lei ◽  
Chengmao Wu ◽  
Xiaoping Tian

Abstract Clustering algorithms with deep neural networks have attracted wide attention from scholars. A deep fuzzy K-means clustering model with an adaptive loss function and entropy regularization (DFKM) was proposed by combining an auto-encoder with a clustering algorithm. Although DFKM introduces an adaptive loss function and entropy regularization to improve the robustness of the model, its segmentation effect is not ideal under high noise; at the same time, the model does not use a convolutional auto-encoder, which makes it unsuitable for high-dimensional images. Therefore, building on DFKM, this paper focuses on image segmentation: it combines the neighborhood median and mean information of the current pixel, introduces neighborhood information into the membership degrees, extends the Euclidean distance to kernel space by using a kernel function, and proposes a dual-neighborhood-information-constrained deep fuzzy clustering based on a kernel function (KDFKMS). Extensive experimental results show that, compared with DFKM and classical image segmentation algorithms, this algorithm has stronger anti-noise robustness.
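The neighborhood median and mean information of the current pixel can be extracted with a simple sliding window; a sketch of that preprocessing step only (the paper's exact weighting of these statistics is not reproduced):

```python
import numpy as np

def neighborhood_info(img, k=3):
    """Median and mean of each pixel's k x k neighborhood (edge-padded),
    the two local statistics combined with the current pixel's value."""
    p = k // 2
    padded = np.pad(img, p, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(2, 3)), windows.mean(axis=(2, 3))

# a single noisy pixel in a flat region: the median suppresses it entirely
img = np.array([[0, 0, 0], [0, 9, 0], [0, 0, 0]], dtype=float)
med, mean = neighborhood_info(img)
```

The example shows why both statistics are useful: the median removes the impulse completely, while the mean still carries some of its intensity.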

