Fuzzy Clustering Using the Convex Hull as Geometrical Model

2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Luca Liparulo ◽  
Andrea Proietti ◽  
Massimo Panella

A new approach to fuzzy clustering is proposed in this paper. It aims to relax some constraints imposed by known algorithms by using a generalized geometrical model for clusters that is based on the convex hull computation. A method is also proposed to determine suitable membership functions and hence to represent fuzzy clusters under the adopted geometrical model. The convex hull is used not only at the end of the clustering analysis for geometric data interpretation, but also during the fuzzy data partitioning, within an online sequential procedure, to calculate the membership function. Consequently, a pure fuzzy clustering algorithm is obtained where clusters are fitted to the data distribution by means of the fuzzy membership of patterns to each cluster. The numerical results reported in the paper show the validity and efficacy of the proposed approach with respect to other well-known clustering algorithms.
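The abstract does not give the paper's exact membership function, so the sketch below only illustrates the underlying idea: model a cluster by the convex hull of its points and let membership decay with distance outside the hull. The decay parameter `beta` and the nearest-vertex distance proxy are assumptions for illustration, not the authors' formulation.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def hull_membership(point, cluster_points, beta=1.0):
    """Membership of `point` in a cluster modelled by its convex hull:
    1 inside the hull, exponential decay outside. The distance to the
    nearest hull vertex is used as a cheap proxy for hull distance."""
    if Delaunay(cluster_points).find_simplex(point) >= 0:
        return 1.0                                # inside (or on) the hull
    verts = cluster_points[ConvexHull(cluster_points).vertices]
    d = np.min(np.linalg.norm(verts - point, axis=1))
    return float(np.exp(-beta * d))

# unit square as a toy cluster
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
inside = hull_membership(np.array([0.5, 0.5]), square)
outside = hull_membership(np.array([3.0, 3.0]), square)
```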

1995 ◽  
Vol 05 (02) ◽  
pp. 239-259
Author(s):  
SU HWAN KIM ◽  
SEON WOOK KIM ◽  
TAE WON RHEE

For data analyses, it is very important to combine data with similar attribute values into a categorically homogeneous subset, called a cluster; this technique is called clustering. Crisp clustering algorithms are generally weak against noise, because each datum must be assigned to exactly one cluster. To address this problem, the fuzzy c-means, fuzzy maximum likelihood estimation, and optimal fuzzy clustering algorithms based on fuzzy set theory have been proposed. They, however, require a great deal of processing time because of exhaustive iteration over the data and their memberships. A large memory footprint in particular degrades performance in real-time processing applications, because swapping between main memory and secondary memory takes too much time. To overcome these limitations, an extended fuzzy clustering algorithm based on an unsupervised optimal fuzzy clustering algorithm is proposed in this paper. This algorithm assigns a weight factor to each distinct datum according to its occurrence rate. The proposed extended fuzzy clustering algorithm also considers the degree of importance of each attribute, which determines the characteristics of the data. The worst case is when the whole dataset has a uniform normal distribution, meaning all attributes are equally important. The proposed extended fuzzy clustering algorithm outperforms the unsupervised optimal fuzzy clustering algorithm in terms of memory space and execution time in most cases. For simulation, the proposed algorithm is applied to color image segmentation; automatic target detection and multipeak detection are also considered as applications. These schemes can be applied to any other fuzzy clustering algorithm.
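As a rough illustration of the weighting idea, the fuzzy c-means sketch below lets each distinct datum carry a weight (e.g. its occurrence count) in the centroid update; the paper's exact formulation, including the attribute importances, is not reproduced here.

```python
import numpy as np

def weighted_fcm(X, w, c, m=2.0, n_iter=50, seed=0):
    """Fuzzy c-means where each distinct datum carries a weight w
    (e.g. its occurrence count). Standard FCM updates; the weights
    enter only the centroid update."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1
    for _ in range(n_iter):
        Um = w[:, None] * U ** m                 # weighted fuzzified memberships
        V = Um.T @ X / Um.sum(axis=0)[:, None]   # weighted centroid update
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = d ** (-2 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)        # standard membership update
    return U, V

# four points forming two obvious groups, all with unit weight
X = np.array([[0.0], [0.1], [5.0], [5.1]])
w = np.ones(4)
U, V = weighted_fcm(X, w, c=2)
labels = U.argmax(axis=1)
```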


2021 ◽  
Author(s):  
Qiuyu Song ◽  
Chengmao Wu ◽  
Xiaoping Tian ◽  
Yue Song ◽  
Xiaokang Guo

Abstract The application of fuzzy clustering algorithms to image segmentation is a hot research topic nowadays. Existing fuzzy clustering algorithms have three problems: (1) the parameters of spatial information constraints cannot be selected adaptively; (2) images corrupted by high noise cannot be segmented effectively; (3) it is difficult to balance noise removal against detail preservation. In fuzzy clustering based on an optimization model, the choice of distance metric is very important. Since the Euclidean distance is sensitive to outliers and noise, it is difficult to obtain satisfactory segmentation results, which affects clustering performance. This paper proposes an optimization algorithm based on kernel-based fuzzy local information clustering integrating non-local information (KFLNLI). The algorithm adopts a self-integration method to introduce local and non-local image information, which addresses the common problems of current clustering algorithms. First, the self-integration method solves the problem of selecting spatial constraint parameters: the algorithm uses continuous self-learning iteration to calculate the weight coefficients. Second, the distance metric uses a Gaussian kernel function to induce the distance, further enhancing robustness against noise and adaptivity to different images. Finally, both local and non-local information are introduced to achieve a segmentation effect that eliminates most of the noise while retaining the original details of the image. Experimental results show that the algorithm is superior to existing state-of-the-art fuzzy clustering-related algorithms in the presence of high noise.
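The Gaussian-kernel-induced distance mentioned above has a simple closed form, since K(x, x) = 1 for a Gaussian kernel; a minimal sketch, with `sigma` as an assumed bandwidth parameter rather than a value from the paper:

```python
import numpy as np

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance in the feature space induced by a Gaussian kernel:
    ||phi(x) - phi(v)||^2 = K(x,x) + K(v,v) - 2K(x,v) = 2 * (1 - K(x,v)),
    because K(x,x) = K(v,v) = 1. Bounded by 2, which tempers outliers."""
    K = np.exp(-np.sum((x - v) ** 2) / (2 * sigma ** 2))
    return 2.0 * (1.0 - K)

d2_same = kernel_distance_sq(np.array([0.0, 0.0]), np.array([0.0, 0.0]))
d2_far = kernel_distance_sq(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```

Unlike the plain Euclidean distance, this metric saturates at 2 for distant points, which is the source of the robustness to outliers noted in the abstract.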


Author(s):  
Junjie Wu ◽  
Jian Chen ◽  
Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing the objects into groups (clusters), such that objects in a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), have been well-established. A recent research focus in clustering analysis is to understand the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified some data characteristics that may strongly affect clustering analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions can affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions: 1. What are the systematic differences between the distributions of the resultant clusters produced by different clustering algorithms? 2. How can the distribution of the “true” cluster sizes impact the performance of clustering algorithms? 3. How should one choose an appropriate clustering algorithm in practice? The answers to these questions can guide us toward a better understanding and use of clustering methods. This is noteworthy, since 1) in theory, it has seldom been recognized that there are strong relationships between clustering algorithms and cluster size distributions, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the algorithm boom in the data mining area. This chapter tries to fill this void.
To this end, we carefully select two widely used categories of clustering algorithms, i.e., K-means and Agglomerative Hierarchical Clustering (AHC), as the representative algorithms for illustration. In the chapter, we first show that K-means tends to generate the clusters with a relatively uniform distribution on the cluster sizes. Then we demonstrate that UPGMA, one of the robust AHC methods, acts in an opposite way to K-means; that is, UPGMA tends to generate the clusters with high variation on the cluster sizes. Indeed, the experimental results indicate that the variations of the resultant cluster sizes by K-means and UPGMA, measured by the Coefficient of Variation (CV), are in the specific intervals, say [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we put together K-means and UPGMA for a further comparison, and propose some rules for the better choice of the clustering schemes from the data distribution point of view.
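The Coefficient of Variation (CV) of the resultant cluster sizes, used above to characterize K-means and UPGMA, can be computed directly from the cluster labels; a small sketch, using the sample standard deviation (one common convention; the chapter's exact estimator is not stated here):

```python
import numpy as np

def cluster_size_cv(labels):
    """Coefficient of Variation of cluster sizes: standard deviation of
    the sizes divided by their mean. The chapter reports K-means typically
    in [0.3, 1.0] and UPGMA in [1.0, 2.5] on this measure."""
    sizes = np.bincount(labels).astype(float)
    sizes = sizes[sizes > 0]                 # ignore empty clusters
    return sizes.std(ddof=1) / sizes.mean()

uniform_cv = cluster_size_cv(np.array([0] * 10 + [1] * 10))  # sizes 10, 10
skewed_cv = cluster_size_cv(np.array([0] * 1 + [1] * 9))     # sizes 1, 9
```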


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Ze Dong ◽  
Hao Jia ◽  
Miao Liu

This paper presents a fuzzy clustering method based on a multiobjective genetic algorithm. The ADNSGA2-FCM algorithm was developed to solve the clustering problem by combining the fuzzy c-means algorithm (FCM) with the multiobjective genetic algorithm NSGA-II and introducing an adaptive mechanism. The algorithm does not need the number of clusters to be given in advance. After the number of initial clusters and the center coordinates are given randomly, the optimal solution set is found by the multiobjective evolutionary algorithm. After the optimal number of clusters is determined by a majority-vote method, the Jm value is continuously optimized through the combination of a canonical genetic algorithm and FCM, and finally the best clustering result is obtained. Verification on standard UCI datasets and comparison with existing single-objective and multiobjective clustering algorithms demonstrate the effectiveness of this method.
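The Jm value mentioned above is the standard fuzzy c-means objective, the membership-weighted sum of squared distances between data points and cluster centers; a minimal sketch:

```python
import numpy as np

def fcm_objective(X, U, V, m=2.0):
    """J_m = sum over data k and clusters i of u_ik^m * ||x_k - v_i||^2,
    the objective minimized by FCM-family algorithms (m is the fuzzifier)."""
    d2 = np.sum((X[:, None, :] - V[None, :, :]) ** 2, axis=2)  # squared distances
    return float(np.sum(U ** m * d2))

X = np.array([[0.0], [2.0]])
V = np.array([[0.0], [2.0]])
U_hard = np.array([[1.0, 0.0], [0.0, 1.0]])  # crisp, perfect assignment
U_soft = np.full((2, 2), 0.5)                # maximally fuzzy assignment
```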


2020 ◽  
Vol 28 (4) ◽  
pp. 531-561 ◽  
Author(s):  
Andrew Lensen ◽  
Bing Xue ◽  
Mengjie Zhang

Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
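As a sketch of how a similarity function can drive clustering under a graph-based representation, the code below connects every pair of instances whose similarity exceeds a threshold and takes connected components as clusters; the hand-written `sim` and the `threshold` value stand in for an evolved GP tree and are illustrative assumptions only, not the paper's method.

```python
def graph_clusters(X, sim, threshold):
    """Graph-based clustering with a pluggable similarity function:
    connect pairs with sim > threshold, then label connected components."""
    n = len(X)
    adj = [[j for j in range(n) if j != i and sim(X[i], X[j]) > threshold]
           for i in range(n)]
    labels, cur = [-1] * n, 0
    for s in range(n):
        if labels[s] != -1:
            continue
        stack = [s]
        while stack:                          # DFS over the similarity graph
            i = stack.pop()
            if labels[i] == -1:
                labels[i] = cur
                stack.extend(adj[i])
        cur += 1
    return labels

# a hand-written similarity in place of an evolved one: negative distance
sim = lambda a, b: -abs(a - b)
labels = graph_clusters([0.0, 0.2, 5.0, 5.3], sim, threshold=-1.0)
```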


2013 ◽  
Vol 380-384 ◽  
pp. 1488-1494
Author(s):  
Wang Wei ◽  
Jin Yue Peng

In the research and development of intelligent systems, clustering analysis is a very important problem. Following the direct clustering algorithm that uses similarity measures of Vague sets as its evaluation criteria, the Vague direct clustering method is analyzed here using different similarity measures of Vague sets. The experimental results show that direct clustering based on the similarity of Vague sets is effective, and that direct clustering based on different similarity measures of Vague sets yields essentially the same results, differing only in the number of clustering steps. Different algorithms can therefore be selected according to the conditions of the actual application.


2021 ◽  
Author(s):  
James Anibal ◽  
Alexandre Day ◽  
Erol Bahadiroglu ◽  
Liam O'Neill ◽  
Long Phan ◽  
...  

Data clustering plays a significant role in biomedical sciences, particularly in single-cell data analysis. Researchers use clustering algorithms to group individual cells into populations that can be evaluated across different levels of disease progression, drug response, and other clinical statuses. In many cases, multiple sets of clusters must be generated to assess varying levels of cluster specificity. For example, there are many subtypes of leukocytes (e.g. T cells), whose individual preponderance and phenotype must be assessed for statistical/functional significance. In this report, we introduce a novel hierarchical density clustering algorithm (HAL-x) that uses supervised linkage methods to build a cluster hierarchy on raw single-cell data. With this new approach, HAL-x can quickly predict multiple sets of labels for immense datasets, achieving a considerable improvement in computational efficiency on large datasets compared to existing methods. We also show that cell clusters generated by HAL-x yield near-perfect F1-scores when classifying different clinical statuses based on single-cell profiles. Our hierarchical density clustering algorithm achieves high accuracy in single cell classification in a scalable, tunable and rapid manner. We make HAL-x publicly available at: https://pypi.org/project/hal-x/


2016 ◽  
Vol 54 (3) ◽  
pp. 300 ◽  
Author(s):  
Mai Dinh Sinh ◽  
Le Hung Trinh ◽  
Ngo Thanh Long

This paper proposes a method that combines fuzzy probability and a fuzzy clustering algorithm to classify multispectral satellite images: fuzzy probability is used to calculate the number of clusters and the cluster centroids, and fuzzy clustering is then used to classify land cover on the satellite image. In practice, the initialization of the clusters and of the initial centroids has a great influence on the stability, running time, and results of classification algorithms; unsupervised classification algorithms such as k-Means, c-Means, and Iso-data are quite common for many problems, but their disadvantages are low accuracy and instability, especially when dealing with satellite images. Results of the proposed algorithm show a significant reduction of noise in the clusters in comparison with various clustering algorithms such as k-means and Iso-data.



2021 ◽  
Author(s):  
Lujia Lei ◽  
Chengmao Wu ◽  
Xiaoping Tian

Abstract Clustering algorithms with deep neural networks have attracted wide attention from scholars. A deep fuzzy K-means clustering model with an adaptive loss function and entropy regularization (DFKM) was proposed by combining an auto-encoder with a clustering algorithm. Although DFKM introduces an adaptive loss function and entropy regularization to improve the robustness of the model, its segmentation effect is not ideal under high noise; at the same time, the model does not use a convolutional auto-encoder, which makes it unsuitable for high-dimensional images. Therefore, building on DFKM, this paper focuses on image segmentation: it combines the neighborhood median and mean information of the current pixel, introduces neighborhood information into the membership degrees, extends the Euclidean distance to kernel space by using a kernel function, and proposes a dual-neighborhood-information-constrained deep fuzzy clustering based on a kernel function (KDFKMS). Extensive experimental results show that, compared with DFKM and classical image segmentation algorithms, this algorithm has stronger anti-noise robustness.
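The neighborhood median and mean information of the current pixel can be extracted with a simple sliding window; a sketch of that preprocessing step only (the paper's exact weighting of these statistics is not reproduced):

```python
import numpy as np

def neighborhood_info(img, k=3):
    """Median and mean of each pixel's k x k neighborhood (edge-padded),
    the two local statistics combined with the current pixel's value."""
    p = k // 2
    padded = np.pad(img, p, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(2, 3)), windows.mean(axis=(2, 3))

# a single noisy pixel in a flat region: the median suppresses it entirely
img = np.array([[0, 0, 0], [0, 9, 0], [0, 0, 0]], dtype=float)
med, mean = neighborhood_info(img)
```

The example shows why both statistics are useful: the median removes the impulse completely, while the mean still carries some of its intensity.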

