A Fast Algorithm to Initialize Cluster Centroids in Fuzzy Clustering Applications

Information ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 446
Author(s):  
Zeynel Cebeci ◽  
Cagatay Cebeci

The goal of partitioning cluster analysis is to divide a dataset into a predetermined number of homogeneous clusters. The quality of the final clusters produced by a prototype-based partitioning algorithm is highly affected by the initially chosen centroids. In this paper, we propose InoFrep, a novel data-dependent initialization algorithm for improving computational efficiency and robustness in prototype-based hard and fuzzy clustering. InoFrep is a single-pass algorithm that uses the frequency polygon data of the feature with the highest peak count in a dataset. Using the Fuzzy C-means (FCM) clustering algorithm, we empirically compare the performance of InoFrep on one synthetic and six real datasets with that of two common initialization methods: random sampling of data points and K-means++. Our results show that InoFrep significantly reduces the number of iterations and the computing time required by the FCM algorithm. Additionally, it can be applied to large multidimensional datasets because of its short initialization time and its independence from dimensionality, since it works with only the one feature that has the highest number of peaks.
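The abstract only names the ingredients of InoFrep: a frequency polygon, the feature with the most peaks, and a single pass. The following Python sketch illustrates that general idea under stated assumptions and is not the authors' algorithm. The equal-width histogram used as the frequency polygon, the strict local-maximum peak rule, keeping the k tallest peaks, and building each full-dimensional seed as the mean of the points nearest a peak are all choices made here for illustration.

```python
from typing import List, Sequence, Tuple

def frequency_polygon(values: Sequence[float], bins: int) -> Tuple[List[float], List[int]]:
    """Equal-width histogram of one feature: (bin midpoints, bin counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: avoid zero-width bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # max value lands in the last bin
    mids = [lo + (i + 0.5) * width for i in range(bins)]
    return mids, counts

def peak_indices(counts: List[int]) -> List[int]:
    """Bins that are strict local maxima; both ends are compared against 0."""
    p = [0] + counts + [0]
    return [i - 1 for i in range(1, len(p) - 1) if p[i - 1] < p[i] > p[i + 1]]

def peak_based_init(data: List[List[float]], k: int, bins: int = 10) -> List[List[float]]:
    """Seed up to k centroids from the feature whose polygon has the most peaks."""
    best = None
    for j in range(len(data[0])):
        mids, counts = frequency_polygon([row[j] for row in data], bins)
        peaks = peak_indices(counts)
        if best is None or len(peaks) > len(best[1]):
            best = (j, peaks, mids, counts)
    j, peaks, mids, counts = best
    # Keep the k tallest peaks, then order them along the feature axis.
    top = sorted(sorted(peaks, key=lambda i: counts[i], reverse=True)[:k])
    if not top:
        return []
    groups = {i: [] for i in top}
    for row in data:
        # Attach each point to the nearest peak, looking at the selected feature only.
        nearest = min(top, key=lambda i: abs(row[j] - mids[i]))
        groups[nearest].append(row)
    # Each seed is the mean of the full-dimensional points attached to one peak.
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups.values() if g]
```

The seeds would then be passed to FCM in place of randomly sampled points or K-means++ seeds. If the chosen feature has fewer than k peaks, this sketch returns fewer than k seeds, so a practical version would need a fallback.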

Author(s):  
Ke Li ◽  
Yalei Wu ◽  
Shimin Song ◽  
Yi Sun ◽  
Jun Wang ◽  
...  

The measurement of spacecraft electrical characteristics and the associated multi-label classification problems generally involve processing large amounts of unlabeled test data, high-dimensional redundant features, time-consuming computation, and slow identification. In this paper, an offline fuzzy c-means (FCM) clustering algorithm and an online recognition approach based on an approximate weighted proximal support vector machine (WPSVM) are proposed to reduce the feature size and speed up the classification of spacecraft electrical characteristics. In addition, principal component analysis (PCA) of the complex signals is used for feature extraction and selection. A threshold-based contribution approach is further applied to solve the problem of selecting principal components in PCA, which effectively guarantees the validity and consistency of the data. Experimental results indicate that the proposed approach obtains better fault diagnosis results on spacecraft electrical-characteristic data, improves identification accuracy, and shortens computing time.
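The abstract does not define its threshold rule. A common criterion in PCA, assumed here, is the cumulative contribution rate: keep the smallest number of principal components whose share of the total variance (the eigenvalue sum) reaches a threshold. This Python sketch takes the covariance eigenvalues as given; the default threshold of 0.85 is a hypothetical choice.

```python
from typing import List, Sequence

def contribution_rates(eigenvalues: Sequence[float]) -> List[float]:
    """Share of total variance explained by each component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [ev / total for ev in ordered]

def components_to_keep(eigenvalues: Sequence[float], threshold: float = 0.85) -> int:
    """Smallest k whose cumulative contribution rate reaches the threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    rates = contribution_rates(eigenvalues)
    cumulative = 0.0
    for k, rate in enumerate(rates, start=1):
        cumulative += rate
        if cumulative >= threshold - 1e-12:  # tolerate floating-point round-off
            return k
    return len(rates)
```

For eigenvalues [4.0, 2.5, 1.0, 0.3, 0.2], the cumulative contribution rates start 0.5, 0.8125, 0.9375, so a threshold of 0.85 keeps three components.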


2014 ◽  
Vol 926-930 ◽  
pp. 3608-3611 ◽  
Author(s):  
Yi Fan Zhang ◽  
Yong Tao Qian ◽  
Tai Yu Liu ◽  
Shu Yan Wu

This paper first introduces data mining and then focuses on clustering analysis algorithms: it classifies the clustering algorithms and describes the typical algorithms in each category in detail, including a formal description of each algorithm and its advantages and disadvantages. It then introduces data mining algorithms built on cluster analysis. Using a cohesion-based clustering algorithm together with DBSCAN, consumer spending was clustered in two-dimensional space with 2,000 data points per area, giving reasonable clustering results; the resulting hierarchical clustering yields valuable information, combining the practical application of the algorithms with clustering analysis theory.
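DBSCAN, one of the algorithms applied in this paper, can be sketched in a few lines of Python. This is a textbook version with a brute-force O(n²) neighbor search, not the paper's cohesion-based algorithm; `eps` and `min_pts` are illustrative parameters.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1

def region_query(points: List[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: density-reachable, not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                seeds.extend(j_neighbors)
    return labels
```

Points that are never reachable from a core point keep the label `NOISE`, which is how DBSCAN separates outliers from clusters without fixing the number of clusters in advance.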


Author(s):  
K. Varada Rajkumar ◽  
Adimulam Yesubabu ◽  
K. Subrahmanyam

A hard partitioning algorithm assigns a point that is equidistant from two clusters to only one of them, even though such a datum could reasonably belong to several clusters at once. Fuzzy cluster analysis instead assigns membership coefficients to data points that are equidistant between two clusters, so these points can belong to more than one cluster simultaneously. For a subset of the CiteScore dataset, the fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were applied to study data points that lie at equal distances from clusters. Before the analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which gave 0.4371, a value < 0.5, indicating that the data are highly clusterable. The optimal number of clusters was determined with the NbClust package, where nine different indices proposed the three-cluster solution as the best. Further, an appropriate value of the fuzziness parameter m was evaluated by examining the distribution of membership values as m varied from 1 to 2. The coefficient of variation (CV), also known as relative variability, was evaluated to study the spread of the data. The time complexity of the fuzzy clustering (fanny) and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
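To see how the fuzziness parameter m shapes the membership distribution, the standard FCM membership formula u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)) can be evaluated for a single point. This Python sketch assumes Euclidean distance. It shows that a point equidistant from two centroids receives 0.5 in each, and that memberships become crisper as m approaches 1 (m must be strictly greater than 1).

```python
import math
from typing import List, Sequence

def fcm_memberships(x: Sequence[float], centroids: List[Sequence[float]],
                    m: float) -> List[float]:
    """Standard FCM membership of point x in each cluster, for fuzzifier m > 1."""
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    d = [math.dist(x, c) for c in centroids]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # x coincides with centroid(s): share membership among them only
        return [1.0 / len(zeros) if di == 0.0 else 0.0 for di in d]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]
```

With centroids at (0, 0) and (4, 0), the point (1, 0) gets membership 0.9 in the nearer cluster for m = 2 and 0.75 for m = 3, while (2, 0) gets 0.5 in both for any m.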


Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3285 ◽  
Author(s):  
Hang Zhang ◽  
Jian Liu ◽  
Lin Chen ◽  
Ning Chen ◽  
Xiao Yang

Because neighborhood windows have fixed structures, the quality of the spatial information obtained from neighboring pixels may be affected by noise. To compensate for this drawback, a robust fuzzy c-means clustering algorithm with non-neighborhood spatial information (FCM_NNS) is presented. By incorporating non-neighborhood spatial information, FCM_NNS significantly improves robustness to noise. The results indicate that FCM_NNS is very effective and robust on noisy aliasing images. Moreover, a comparison with seven other roughness indexes indicates that the proposed FCM_NNS-based F index can characterize the degree of aliasing in surface images and is highly correlated with surface roughness (R² = 0.9327 for thirty grinding samples).


2020 ◽  
Vol 23 (1) ◽  
pp. 79-89
Author(s):  
Quy Hoang Van ◽  
Huy Tran Van ◽  
Huy Ngo Hoang ◽  
Tuyet Dao Van ◽  
Sergey Ablameyko

The efficient manifold ranking (EMR) algorithm is used quite effectively in content-based image retrieval (CBIR) for large image databases, where images are represented by multiple low-level features describing color, texture, and shape. The EMR algorithm requires a step that determines anchor points of the image database using k-means hard clustering, and the ranking accuracy depends strongly on the selected anchor points. This paper describes a new result based on a modified Fuzzy C-Means (FCM) clustering algorithm for selecting anchor points in large databases, in order to increase the efficiency of manifold ranking, especially for large databases. Experiments have demonstrated the effectiveness of the proposed algorithm for building the anchor graph: the set of anchor points determined by the novel lvdc-FCM algorithm increased the effectiveness of manifold ranking and the quality of the images retrieved by CBIR queries.
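In EMR, the anchor points are cluster centers. The abstract does not describe the modifications in lvdc-FCM, so this Python sketch shows only plain FCM, whose centroids could serve as anchors; it is not the paper's algorithm. The farthest-first seeding, the convergence tolerance, and the 1e-12 distance guard are assumptions made to keep the example deterministic and numerically safe.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

def fcm(data: List[Point], c: int, m: float = 2.0, tol: float = 1e-6,
        max_iter: int = 300) -> Tuple[List[List[float]], List[List[float]]]:
    """Plain fuzzy c-means. Returns (centroids, u), where u[k][i] is the
    membership of point k in cluster i."""
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    # Farthest-first seeding (an assumption made here for determinism).
    centroids = [list(data[0])]
    while len(centroids) < c:
        far = max(data, key=lambda p: min(math.dist(p, v) for v in centroids))
        centroids.append(list(far))
    exponent = 2.0 / (m - 1.0)
    dim = len(data[0])
    u: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        u = []
        for x in data:
            d = [max(math.dist(x, v), 1e-12) for v in centroids]  # guard against d = 0
            u.append([1.0 / sum((di / dj) ** exponent for dj in d) for di in d])
        # Centroid update: mean of the data weighted by u_ki ** m.
        new_centroids = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centroids.append([sum(wk * x[f] for wk, x in zip(w, data)) / total
                                  for f in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # stop once no centroid moves noticeably
            break
    return centroids, u
```

The returned centroids play the role that k-means centers play in standard EMR; the membership matrix `u` is a by-product that hard k-means does not provide.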


2013 ◽  
Vol 475-476 ◽  
pp. 968-971
Author(s):  
Hai Xue Liu ◽  
Rui Jun Yang ◽  
Wen Ju Li ◽  
Wan Jun Yu ◽  
Wei Lu

In this paper, we present an improved text clustering algorithm. It not only retains the self-organizing properties of the SOM network but also overcomes the poor clustering caused by inadequate selection in the K-means algorithm. First, the data are preprocessed into a vector space model for subsequent processing. Then, we analyze the characteristics of the original clustering algorithm and the SOM algorithm, and propose an improved SOM clustering algorithm that overcomes the low stability and poor quality of the original algorithm. The experimental results indicate that the improved algorithm achieves higher accuracy and better stability than the original algorithm.
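The abstract does not spell out the improvement to SOM, so for orientation here is a minimal one-dimensional SOM in Python that could cluster document vectors (for example, rows of the vector space model). The linear decay schedules, the Gaussian neighborhood, and the evenly spread initial weights are assumptions, not the paper's method.

```python
import math
import random
from typing import List, Optional, Sequence

def train_som(data: List[Sequence[float]], n_units: int, epochs: int = 50,
              lr0: float = 0.5, radius0: Optional[float] = None,
              seed: int = 0) -> List[List[float]]:
    """Train a 1-D self-organizing map and return one weight vector per unit."""
    rng = random.Random(seed)
    # Spread the initial weights over the data (assumed initialization).
    weights = [list(data[u * len(data) // n_units]) for u in range(n_units)]
    radius0 = radius0 if radius0 is not None else max(n_units / 2.0, 1.0)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present documents in a new random order each epoch
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            radius = radius0 * (1.0 - frac) + 1e-3  # shrinking neighborhood width
            # Best-matching unit (BMU): the unit whose weights are closest to x.
            bmu = min(range(n_units), key=lambda u: math.dist(weights[u], x))
            for u in range(n_units):
                # Gaussian neighborhood over grid positions, centred on the BMU.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights

def assign(data: List[Sequence[float]], weights: List[List[float]]) -> List[int]:
    """Label each vector with the index of its best-matching unit."""
    return [min(range(len(weights)), key=lambda u: math.dist(weights[u], x))
            for x in data]
```

Documents mapped to the same unit by `assign` form a cluster; with more units than clusters, neighboring units on the map tend to hold similar documents.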


Author(s):  
Sonia Goel ◽  
Meena Tushir

Introduction: Incomplete data sets with missing attributes are a prevalent problem in many research areas. Attributes may be missing for several reasons: human error in tabulating or recording the data, machine failure, errors in data acquisition, or a patient's or customer's refusal to answer a few questions in a questionnaire or survey. Clustering such data sets is therefore a challenge. Objective: In this paper, we present a critical review of various methodologies proposed for handling missing data in clustering. The focus of this paper is the comparison of various imputation-based FCM clustering techniques with the four clustering strategies proposed by Hathaway and Bezdek. Methods: We imputed the missing values in incomplete datasets with various imputation and non-imputation techniques to complete the data sets, and then applied the conventional fuzzy clustering algorithm to obtain the clustering results. Results: Experiments were carried out on various synthetic data sets and real data sets from the UCI repository. To evaluate the performance of the various imputation and non-imputation based FCM clustering algorithms, several performance criteria and statistical tests were considered. Experimental results on various data sets show that linear interpolation based FCM clustering performs significantly better than the other imputation and non-imputation techniques. Conclusion: Clustering performance is data specific; no clustering technique gives good results on all data sets. It depends on both the data type and the percentage of missing attributes in the dataset. Through this study, we have shown that linear interpolation based FCM clustering can be used effectively to cluster incomplete data sets.
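The abstract does not specify along which axis the interpolation runs. As a minimal sketch, assuming values are interpolated down each attribute column in record order, with leading and trailing gaps copied from the nearest observed value, linear-interpolation imputation could look like this in Python (missing entries are `None`):

```python
from typing import List, Optional

def interpolate_column(col: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation over the row index."""
    known = [i for i, v in enumerate(col) if v is not None]
    if not known:
        raise ValueError("column has no observed values")
    out = []
    for i, v in enumerate(col):
        if v is not None:
            out.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # leading gap: copy the first observed value
            out.append(float(col[right]))
        elif right is None:     # trailing gap: copy the last observed value
            out.append(float(col[left]))
        else:                   # interior gap: straight line between neighbors
            t = (i - left) / (right - left)
            out.append(col[left] + t * (col[right] - col[left]))
    return out

def impute_linear(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Impute every attribute column independently, then transpose back to rows."""
    columns = [interpolate_column(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*columns)]
```

The completed rows would then be clustered with conventional FCM.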


2010 ◽  
Vol 40-41 ◽  
pp. 174-182
Author(s):  
Wei Jin Chen ◽  
Huai Lin Dong ◽  
Qing Feng Wu ◽  
Ling Lin

Evaluating clustering validity is important in clustering analysis and is one of its most active research topics. A good validity evaluation should yield a reasonable optimal number of clusters. For fuzzy clustering, this paper surveys widely known fuzzy clustering validity measures based on fuzzy partition, geometric structure, and statistics.
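Two of the best-known fuzzy-partition measures, the partition coefficient (PC) and partition entropy (PE), and the geometry-based Xie-Beni index can be computed directly from a membership matrix. This Python sketch uses the common textbook definitions; the layout u[k][i] (membership of point k in cluster i) and the default m = 2 are assumptions.

```python
import math
from typing import List, Sequence

def partition_coefficient(u: List[Sequence[float]]) -> float:
    """Bezdek's PC: mean of squared memberships; 1 = crisp, 1/c = maximally fuzzy."""
    return sum(uik ** 2 for row in u for uik in row) / len(u)

def partition_entropy(u: List[Sequence[float]]) -> float:
    """PE: mean membership entropy; 0 = crisp, log(c) = maximally fuzzy."""
    return -sum(uik * math.log(uik) for row in u for uik in row if uik > 0) / len(u)

def xie_beni(data: List[Sequence[float]], centroids: List[Sequence[float]],
             u: List[Sequence[float]], m: float = 2.0) -> float:
    """Compactness divided by separation; smaller values indicate better partitions."""
    compact = sum(u[k][i] ** m * math.dist(x, v) ** 2
                  for k, x in enumerate(data) for i, v in enumerate(centroids))
    sep = min(math.dist(a, b) ** 2
              for i, a in enumerate(centroids) for b in centroids[i + 1:])
    return compact / (len(data) * sep)
```

To choose the number of clusters, one would run FCM for several values of c and pick the c that maximizes PC or minimizes PE or Xie-Beni.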


Author(s):  
Pavel Osipov ◽  
Arkady Borisov

Practice of Web Data Mining Methods Application

The recent growth of information on the Internet places high demands on the effectiveness of processing algorithms. This paper discusses several Web Data Mining algorithms that have proved effective in many existing applications. The paper is divided into two logical parts: the first provides a theoretical description of the algorithms, and the second contains examples of their successful use in solving real problems. Algorithms for finding near-duplicate documents are currently used by all the world's leading search engines. The paper describes the following such algorithms: shingles, signature methods, and image-based algorithms. Classification methods such as fuzzy c-means (FCM) clustering and ant colony clustering (Standard Ant Clustering Algorithm, SACA) are also considered. In conclusion, the paper describes the successful application of fuzzy clustering together with the DataEngine software toolkit to improve the efficiency of "BCI Bank", as well as the combined use of ant colony clustering and linear genetic programming to improve the prediction of server load for the high-load Internet portal Monash Institut.
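The shingles method compares documents by the overlap of their contiguous word sequences. This Python sketch computes the resemblance (Jaccard similarity) of two documents' shingle sets; the window w = 3 and the normalization (lowercasing, `\w+` tokens) are assumptions, and production search engines typically hash and sample shingles rather than compare full sets.

```python
import re
from typing import Set, Tuple

def shingles(text: str, w: int = 3) -> Set[Tuple[str, ...]]:
    """Set of contiguous w-word sequences (w-shingles) after simple normalization."""
    words = re.findall(r"\w+", text.lower())
    # A text shorter than w words yields a single shingle containing all its words.
    return {tuple(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}

def resemblance(a: str, b: str, w: int = 3) -> float:
    """Jaccard similarity of the two shingle sets (1.0 = identical sets)."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa | sb)
```

Changing one word ("jumps" to "jumped") in a nine-word sentence breaks three of its seven 3-shingles, so the two versions share 4 of 10 distinct shingles and have a resemblance of 0.4; a near-duplicate detector would compare such scores against a chosen threshold.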


2018 ◽  
Vol 3 (1) ◽  
pp. 3-16 ◽  
Author(s):  
Wenyuan Zhang ◽  
Guoxin Tan ◽  
Ming Lei ◽  
Xiaomei Guo ◽  
Chuanming Sun

Millions of geo-tagged photos are becoming available owing to the widespread use of photo-sharing websites, and they provide valuable information for mining spatial patterns of human activity. In this study, we present a simple and fast density-based spatial clustering algorithm for detecting popular scenic spots using geo-tagged photos collected from Flickr. In this algorithm, a Gaussian kernel is applied to estimate the local density of data points, and a decision graph is used to obtain cluster centers easily. More than 289,000 geo-tagged photos located in five typical Chinese cities are downloaded as case studies, and data pre-processing such as duplicate removal is performed to improve the quality of the clustering results. Finally, the popular tourist attractions of each sample city are successfully detected with this algorithm, and our results are useful for recommending interesting destinations that might not be listed on tourist websites or in mobile guide applications. The proposed solution is robust to different distributions of photos, and it is efficient compared with other popular clustering approaches.
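The description (a Gaussian-kernel local density plus a decision graph for choosing centers) resembles the density-peaks clustering idea of Rodriguez and Laio. This Python sketch is written under that assumption. It picks the `n_centers` points with the largest product ρ·δ instead of reading the decision graph visually, and the cutoff `dc` is a hypothetical parameter. For geo-tagged photos the inputs would be projected coordinates, since Euclidean distance on raw latitude and longitude is only approximate.

```python
import math
from typing import List, Sequence, Tuple

def density_peaks(points: List[Sequence[float]], dc: float,
                  n_centers: int) -> Tuple[List[int], List[int]]:
    """Return (indices of cluster centers, cluster label for every point)."""
    if not 1 <= n_centers <= len(points):
        raise ValueError("n_centers must be between 1 and the number of points")
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: Gaussian kernel with cutoff distance dc, self excluded.
    rho = [sum(math.exp(-(d[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # densest point: use its largest distance
            continue
        # delta_i: distance to the closest point that is denser (earlier in order).
        j = min(order[:rank], key=lambda q: d[i][q])
        delta[i], nearest_higher[i] = d[i][j], j
    # Decision graph: centers stand out with both large rho and large delta.
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:n_centers]
    if order[0] not in centers:  # the densest point has no denser neighbor to follow
        centers[-1] = order[0]
    labels = [-1] * n
    for c, idx in enumerate(centers):
        labels[idx] = c
    for i in order:  # decreasing density: inherit the label of the nearest denser point
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return centers, labels
```

Because every non-center point simply follows its nearest denser neighbor, labels are assigned in one pass after the densities are known, which is what makes this family of methods fast once the pairwise distances are available.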

