scholarly journals Clustering by Detecting Density Peaks and Assigning Points by Similarity-First Search Based on Weighted K-Nearest Neighbors Graph

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Qi Diao ◽  
Yaping Dai ◽  
Qichao An ◽  
Weixing Li ◽  
Xiaoxue Feng ◽  
...  

This paper presents an improved clustering algorithm for categorizing data with arbitrary shapes. Most of the conventional clustering approaches work only with round-shaped clusters. This task can be accomplished by quickly searching and finding clustering methods for density peaks (DPC), but in some cases, it is limited by density peaks and allocation strategy. To overcome these limitations, two improvements are proposed in this paper. To describe the clustering center more comprehensively, the definitions of local density and relative distance are fused with multiple distances, including K-nearest neighbors (KNN) and shared-nearest neighbors (SNN). A similarity-first search algorithm is designed to search the most matching cluster centers for noncenter points in a weighted KNN graph. Extensive comparison with several existing DPC methods, e.g., traditional DPC algorithm, density-based spatial clustering of applications with noise (DBSCAN), affinity propagation (AP), FKNN-DPC, and K-means methods, has been carried out. Experiments based on synthetic data and real data show that the proposed clustering algorithm can outperform DPC, DBSCAN, AP, and K-means in terms of the clustering accuracy (ACC), the adjusted mutual information (AMI), and the adjusted Rand index (ARI).

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Yaohui Liu ◽  
Dong Liu ◽  
Fang Yu ◽  
Zhengming Ma

Clustering is widely used in data analysis, and density-based methods are developed rapidly in the recent 10 years. Although the state-of-art density peak clustering algorithms are efficient and can detect arbitrary shape clusters, they are nonsphere type of centroid-based methods essentially. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, the extended core regions are found out as initial clusters. Then, a new local density metric is defined to calculate the density of each object; meanwhile, the density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is classified to one of the initial clusters or noise. Results of experiments on synthetic and real data sets show that RNN-LDH outperforms the current clustering methods based on density peak or reverse nearest neighbors.


2021 ◽  
pp. 1-13
Author(s):  
Li Yihong ◽  
Wang Yunpeng ◽  
Li Tao ◽  
Lan Xiaolong ◽  
Song Han

DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms, which can find arbitrary shapes of clusters, determine the number of clusters, and identify noise samples automatically. However, the performance of DBSCAN is significantly limited as it is quite sensitive to the parameters of eps and MinPts. Eps represents the eps-neighborhood and MinPts stands for a minimum number of points. Additionally, a dataset with large variations in densities will probably trap the DBSCAN because its parameters are fixed. In order to overcome these limitations, we propose a new density-clustering algorithm called GNN-DBSCAN which uses an adaptive Grid to divide the dataset and defines local core samples by using the Nearest Neighbor. With the help of grid, the dataset space will be divided into a finite number of cells. After that, the nearest neighbor lying in every filled cell and adjacent filled cells are defined as the local core samples. Then, GNN-DBSCAN obtains global core samples by enhancing and screening local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, give these global core samples and use dynamic radius based on k-nearest neighbors to cluster the datasets. Dynamic radius can overcome the problems of DBSCAN caused by its fixed parameter eps. Therefore, our method can perform better on dataset with large variations in densities. Experiments on synthetic and real-world datasets were conducted. The results indicate that the average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI) and V-measure of our proposed algorithm outperform the existing algorithm DBSCAN, DPC, ADBSCAN, and HDBSCAN.


2020 ◽  
Vol 49 (3) ◽  
pp. 395-411
Author(s):  
Qiannan Wu ◽  
Qianqian Zhang ◽  
Ruizhi Sun ◽  
Li Li ◽  
Huiyu Mu ◽  
...  

Cluster analysis plays a crucial component in consumer behavior segment. The density peak clustering algorithm (DPC) is a novel density-based clustering method. However, it performs poorly in high-dimension datasets and the local density for boundary points. In addition, its fault tolerance is affected by one-step allocation strategy. To overcome these disadvantages, an adaptive density peak clustering algorithm based on dimensional-free and reverse k-nearest neighbors (ERK-DPC) is proposed in this paper. First, we compute Euler cosine distance to obtain the similarity of sample points in high-dimension datasets. Then, the adaptive local density formula is used to measure the local density of each point. Finally, the reverse k-nearest neighbor idea is added on two-step allocation strategy, which assigns the remaining points accurately and effectively. The proposed clustering algorithm is experiments on several benchmark datasets and real-world datasets. By comparing the benchmarks, the results demonstrate that the ERK-DPC algorithm superior to some state-of- the-art methods.


2020 ◽  
Author(s):  
Xiaoning Yuan ◽  
Hang Yu ◽  
Jun Liang ◽  
Bing Xu

Abstract Recently the density peaks clustering algorithm (dubbed as DPC) attracts lots of attention. The DPC is able to quickly find cluster centers and complete clustering tasks. And the DPC is suitable for many clustering tasks. However, the cutoff distance 𝑑𝑑𝑐𝑐 is depends on human experience which will greatly affect the clustering results. In addition, the selection of cluster centers requires manual participation which will affect the clustering efficiency. In order to solve these problem, we propose a density peaks clustering algorithm based on K nearest neighbors with adaptive merging strategy (dubbed as KNN-ADPC). We propose a clusters merging strategy to automatically aggregate the over-segmented clusters. Additionally, the K nearest neighbors is adopted to divide points more reasonably. The KNN-ADPC only has one parameter and the clustering task can be conducted automatically without human involvement. The experiment results on artificial and real-world datasets prove the higher accuracy of KNN-ADPC compared with DBSCAN, K-means++, DPC and DPC-KNN.


Author(s):  
Xiaoning Yuan ◽  
Hang Yu ◽  
Jun Liang ◽  
Bing Xu

AbstractRecently the density peaks clustering algorithm (DPC) has received a lot of attention from researchers. The DPC algorithm is able to find cluster centers and complete clustering tasks quickly. It is also suitable for different kinds of clustering tasks. However, deciding the cutoff distance $${d}_{c}$$ d c largely depends on human experience which greatly affects clustering results. In addition, the selection of cluster centers requires manual participation which affects the efficiency of the algorithm. In order to solve these problems, we propose a density peaks clustering algorithm based on K nearest neighbors with adaptive merging strategy (KNN-ADPC). A clusters merging strategy is proposed to automatically aggregate over-segmented clusters. Additionally, the K nearest neighbors are adopted to divide data points more reasonably. There is only one parameter in KNN-ADPC algorithm, and the clustering task can be conducted automatically without human involvement. The experiment results on artificial and real-world datasets prove higher accuracy of KNN-ADPC compared with DBSCAN, K-means++, DPC, and DPC-KNN.


Sign in / Sign up

Export Citation Format

Share Document