An Extended Affinity Propagation Clustering Method Based on Different Data Density Types

2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
XiuLi Zhao ◽  
WeiXiang Xu

The affinity propagation (AP) algorithm is a novel clustering method that does not require users to specify the initial cluster centers in advance. It treats all data points equally as potential exemplars (cluster centers) and forms clusters based entirely on the degree of similarity among the data points. In many cases, however, a data set contains areas of different density, which means that the data are not distributed homogeneously. In such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. Our method has two steps: first, the data set is partitioned into several data density types according to the nearest distances of each data point; then the AP clustering method is applied separately within each data density type to group its data points into clusters. Two experiments are carried out to evaluate the performance of our algorithm: one uses an artificial data set and the other uses a real seismic data set. The experimental results show that our algorithm obtains groups more accurately than OPTICS and the AP clustering algorithm itself.
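The two-step idea described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the nearest distances are turned into density types, so the largest-gap threshold used here is an assumption, as are the function names and the cluster_fn callback (a placeholder for any clustering routine, such as AP). The sketch assumes at least two points.

```python
import math

def nearest_distance(points, i):
    """Distance from points[i] to its closest other point."""
    return min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)

def split_by_density(points):
    """Partition point indices into two density types: (dense, sparse).

    Assumption: the threshold sits at the largest gap in the sorted
    nearest distances; the paper's actual criterion is not given here.
    """
    nd = [nearest_distance(points, i) for i in range(len(points))]
    order = sorted(range(len(points)), key=lambda i: nd[i])
    # Largest jump between consecutive sorted nearest distances.
    gaps = [nd[order[k + 1]] - nd[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return sorted(order[:cut]), sorted(order[cut:])

def cluster_by_density_type(points, cluster_fn):
    """Run cluster_fn separately on each density type.

    cluster_fn is a hypothetical callback: it takes a list of points and
    returns one label per point (for example, an AP implementation).
    The result maps each point index to (density type, label).
    """
    result = {}
    for type_id, idx in enumerate(split_by_density(points)):
        labels = cluster_fn([points[i] for i in idx])
        for i, label in zip(idx, labels):
            result[i] = (type_id, label)
    return result
```

Because labels are paired with the density type, clusters found in a dense region are never merged with clusters found in a sparse one, which is the point of running the clustering separately per type.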

Author(s):  
Ahmed M. Serdah ◽  
Wesam M. Ashour

Traditional clustering algorithms are no longer suitable for use in data mining applications that make use of large-scale data. Many large-scale data clustering algorithms have been proposed in recent years, but most of them do not achieve high-quality clustering. Although Affinity Propagation (AP) is effective and accurate for ordinary data clustering, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that depend on a modified version of the AP algorithm. The proposed methods are designed to ensure both low time complexity and good clustering accuracy. Firstly, the data set is divided into several subsets using one of two methods: random fragmentation or K-means. Secondly, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars. Thirdly, the inverse weighted clustering algorithm is performed on all local cluster exemplars to select well-suited global exemplars for the whole data set. Finally, all the data points are clustered according to the similarity between each data point and the global exemplars. Results show that the proposed clustering method can significantly reduce the clustering time and produce better clustering results, in a way that is more effective and accurate than the AP, KAP, and HAP algorithms.
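The first and last steps of this pipeline are simple enough to sketch. The KAP and inverse weighted clustering steps are not reproduced here, so the global exemplars are passed in as given. The function names, the fixed seed, and the use of negative squared Euclidean distance as the similarity (the measure commonly used with AP) are assumptions of this sketch, not details taken from the paper.

```python
import random

def similarity(a, b):
    """Negative squared Euclidean distance between two points."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def random_fragments(points, n_subsets, seed=0):
    """First step (random fragmentation): shuffle indices, deal them into subsets."""
    idx = list(range(len(points)))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_subsets] for k in range(n_subsets)]

def assign_to_exemplars(points, exemplars):
    """Final step: label each point with the index of its most similar exemplar.

    exemplars is a list of indices into points, assumed to come from the
    intermediate steps that this sketch omits.
    """
    return [max(exemplars, key=lambda e: similarity(p, points[e])) for p in points]
```

The final assignment costs one similarity evaluation per point and exemplar, so it stays cheap even when the full pairwise similarity matrix would be too large to build.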


2008 ◽  
Vol 22 (16) ◽  
pp. 1547-1566 ◽  
Author(s):  
DARONG LAI ◽  
HONGTAO LU

Identifying communities in complex networks has recently attracted considerable attention in different fields. The goal of community identification is to cluster the vertices of a network into groups, which is the same task as clustering in the machine learning and data mining domains. A recently proposed clustering method called affinity propagation shows high performance in clustering data sets into groups, and it does not require the number of clusters to be pre-specified. In this paper, based on a new method for calculating the similarity between pairs of vertices and a method for transforming a given similarity from the likelihood domain to the log domain, we apply the affinity propagation clustering method to identify communities in complex networks. Extensive simulation results demonstrate that the affinity propagation clustering algorithm is very effective for identifying community structures in both computer-generated and real-world network data.
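The abstract does not define the vertex similarity or the transformation, so the following is only an illustration of the general idea. It assumes a Jaccard overlap of closed neighbourhoods as a likelihood-style similarity in [0, 1] and a plain logarithm as the mapping to the log domain. The function names and the eps floor are hypothetical.

```python
import math

def jaccard(adj, u, v):
    """Overlap of the closed neighbourhoods of u and v, a value in [0, 1].

    adj maps each vertex to the set of its neighbours.
    """
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

def log_similarity(adj, u, v, eps=1e-9):
    """Map the [0, 1] similarity to the log domain, giving a value <= 0.

    eps is a floor (an assumption) that keeps the logarithm finite
    when two vertices share no neighbours.
    """
    return math.log(max(jaccard(adj, u, v), eps))
```

For two triangles joined by a single edge, vertices inside the same triangle get a log similarity of 0, while vertices in different triangles get strongly negative values. These are the kind of non-positive similarities that affinity propagation is usually fed.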


Author(s):  
Novendri Isra Asriny ◽  
Muhammad Muhajir ◽  
Devi Andrian

There has been a significant increase in the number of part-time workers in the last 3 years. Data collected from Sakernas BPS showed that the number of part-time workers was 125,443,748 in the second period of 2016. This number rapidly increased in the same period of 2017, 2018, and 2019, to 128,062,746, 131,005,641, and 133,560,880 workers, respectively. Based on the increase in the last 3 years, East Java province has the highest number of part-time workers who use the internet. This research aims to determine the number of part-time workers who use the internet by using k-affinity propagation (K-AP) clustering. The method used to produce the optimal number of cluster points (exemplars) is affinity propagation (AP). Three clusters were used to determine the sum of the smallest value ratio. The results showed that clusters 1, 2, and 3 have 3, 23, and 5 members in Bondowoso, Jombang, and Surabaya districts, respectively.


2011 ◽  
Vol 268-270 ◽  
pp. 811-816
Author(s):  
Yong Zhou ◽  
Yan Xing

Affinity Propagation (AP) is a new clustering algorithm based on the similarity matrix between pairs of data points; messages are exchanged between data points until a clustering result emerges. It is efficient and fast, and it can handle clustering on large data sets. However, traditional Affinity Propagation has many limitations. This paper introduces Affinity Propagation, analyzes its advantages and limitations in depth, and focuses on improvements to the algorithm: improving the similarity matrix, adjusting the preference and the damping factor, and combining it with other algorithms. Finally, it discusses the development of Affinity Propagation.
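The message passing, preference, and damping factor mentioned in this abstract can be made concrete with a small pure-Python sketch of the standard responsibility and availability updates. It is meant for illustration on tiny inputs, not as the implementation discussed in the paper. The function name, default damping, and fixed iteration count are assumptions, and the sketch needs at least two points.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster from a similarity matrix S (a list of lists).

    S[k][k] holds the preference of point k; larger preferences tend to
    produce more exemplars. Returns the exemplar index chosen by each point.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with the best rival.
        for i in range(n):
            total = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(total[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support for k accumulated from the other points.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # sum over all i' != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate with the largest combined message.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]
```

A typical call fills S with negative squared distances and sets every diagonal entry to a common preference. Damping blends each new message with the previous one to suppress oscillation, which is why adjusting it, together with the preference, is a natural target for the improvements the abstract lists.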


2014 ◽  
Vol 687-691 ◽  
pp. 1496-1499
Author(s):  
Yong Lin Leng

Partially missing or blurred attribute values make data incomplete during data collection. Generally, imputation or discarding is used to deal with incomplete data before clustering. In this paper we propose a new similarity metric algorithm based on an incomplete information system. First, the algorithm divides the data set into a complete data set and an incomplete data set. The complete data set is then clustered using the affinity propagation clustering algorithm, and the incomplete data are assigned to the corresponding clusters according to the designed similarity metric. To improve the efficiency of the algorithm, a distributed clustering algorithm based on cloud computing technology is designed. Experiments demonstrate that the proposed algorithm can cluster incomplete big data directly and improve accuracy effectively.
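The split-then-assign flow described here can be sketched as follows. The paper's similarity metric is not given in the abstract, so this sketch substitutes a simple partial similarity computed only over the attributes that are present. Representing missing values as None, the function names, and the exemplars argument (assumed to come from clustering the complete records) are all assumptions.

```python
def split_complete(records):
    """Separate records into complete ones and those with missing values."""
    complete = [r for r in records if None not in r]
    incomplete = [r for r in records if None in r]
    return complete, incomplete

def partial_similarity(record, exemplar):
    """Negative mean squared difference over the attributes present in record.

    A stand-in metric, not the one designed in the paper.
    """
    diffs = [(x - e) ** 2 for x, e in zip(record, exemplar) if x is not None]
    if not diffs:
        raise ValueError("record has no observed attributes")
    return -sum(diffs) / len(diffs)

def assign_incomplete(incomplete, exemplars):
    """Place each incomplete record in the cluster of its most similar exemplar."""
    return [max(range(len(exemplars)),
                key=lambda c: partial_similarity(r, exemplars[c]))
            for r in incomplete]
```

Averaging over the observed attributes keeps records with different numbers of missing values comparable, and no value has to be imputed or discarded before the assignment.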


2013 ◽  
Vol 380-384 ◽  
pp. 1290-1293
Author(s):  
Qing Ju Guo ◽  
Wen Tian Ji ◽  
Sheng Zhong

Many research findings on clustering algorithms have been reported at home and abroad in recent years. Taking the traditional partition-based clustering method, the K-means algorithm, as its starting point, this paper analyzes its advantages and disadvantages and combines it with an ontology-based data set to establish a semantic web model. It improves the existing clustering algorithm under various constraint conditions, with the aim of demonstrating that the improved algorithm has better efficiency and accuracy under the semantic web.

