Clustering by Detecting Density Peaks and Assigning Points by Similarity-First Search Based on Weighted K-Nearest Neighbors Graph

Complexity ◽

10.1155/2020/1731075 ◽

2020 ◽

Vol 2020 ◽

pp. 1-17

Author(s):

Qi Diao ◽

Yaping Dai ◽

Qichao An ◽

Weixing Li ◽

Xiaoxue Feng ◽

...

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

Local Density ◽

Search Algorithm ◽

Real Data ◽

Nearest Neighbors ◽

Adjusted Rand Index ◽

Clustering Methods ◽

K Nearest Neighbors ◽

Density Peaks

This paper presents an improved clustering algorithm for categorizing data with arbitrary shapes. Most conventional clustering approaches work only with round-shaped clusters. Clustering by fast search and find of density peaks (DPC) can handle arbitrary shapes, but in some cases it is limited by how it identifies density peaks and by its allocation strategy. To overcome these limitations, two improvements are proposed in this paper. First, to describe the cluster center more comprehensively, the definitions of local density and relative distance are fused with multiple distances, including K-nearest neighbors (KNN) and shared-nearest neighbors (SNN). Second, a similarity-first search algorithm is designed to find the best-matching cluster center for each noncenter point in a weighted KNN graph. The proposed method is compared extensively with several existing clustering methods, namely the traditional DPC algorithm, density-based spatial clustering of applications with noise (DBSCAN), affinity propagation (AP), FKNN-DPC, and K-means. Experiments on synthetic and real data show that the proposed clustering algorithm can outperform DPC, DBSCAN, AP, and K-means in terms of clustering accuracy (ACC), adjusted mutual information (AMI), and adjusted Rand index (ARI).
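The two quantities that density peak methods rely on, local density and relative distance, can be illustrated with a minimal sketch. This is not the fused KNN/SNN definition from the paper, which the abstract does not spell out. It assumes a simple KNN-based density (the exponential of the negative mean squared distance to the k nearest neighbors) and the usual relative distance (distance to the nearest point of higher density). The function names `knn_density_and_delta` and `pick_centers` and the toy data are hypothetical.

```python
import math


def knn_density_and_delta(points, k):
    """Return (rho, delta) lists for the given points.

    rho[i]   : exp(-mean squared distance to the k nearest neighbors).
               This density form is an assumption made for illustration.
    delta[i] : distance to the nearest point with strictly higher rho;
               if no such point exists, the largest distance from point i.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in nearest) / k))
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, n_centers):
    """Rank points by rho * delta and return the top n_centers indices."""
    score = [r * d for r, d in zip(rho, delta)]
    ranked = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    return ranked[:n_centers]


# Toy data: a tight group (indices 0-4) and a looser group (indices 5-9).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 12), (12, 10), (12, 12), (11, 11)]
rho, delta = knn_density_and_delta(points, k=2)
centers = pick_centers(rho, delta, n_centers=2)
```

In this toy data set the middle point of each group combines a high density with a large relative distance, so it ranks highest. Assigning the remaining points, which the paper does by similarity-first search on a weighted KNN graph, is outside the scope of this sketch.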

A Novel Local Density Hierarchical Clustering Algorithm Based on Reverse Nearest Neighbors

Mathematical Problems in Engineering ◽

10.1155/2019/2959017 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10

Author(s):

Yaohui Liu ◽

Dong Liu ◽

Fang Yu ◽

Zhengming Ma

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Local Density ◽

Clustering Algorithms ◽

Real Data ◽

Nearest Neighbors ◽

Clustering Methods ◽

Density Peak ◽

Hierarchical Clustering Algorithm

Clustering is widely used in data analysis, and density-based methods have developed rapidly over the past 10 years. Although state-of-the-art density peak clustering algorithms are efficient and can detect clusters of arbitrary shape, they are essentially centroid-based methods of a nonspherical type. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, extended core regions are identified as initial clusters. Then, a new local density metric is defined to calculate the density of each object; meanwhile, the density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is assigned to one of the initial clusters or labeled as noise. Experimental results on synthetic and real data sets show that RNN-LDH outperforms current clustering methods based on density peaks or reverse nearest neighbors.
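The reverse nearest neighbor relation itself can be sketched briefly. This shows only the generic definition (point i is a reverse neighbor of j when j is among the k nearest neighbors of i), not the RNN-LDH graph construction or its density metric, which the abstract does not detail. The helper names `knn_indices` and `reverse_knn` and the sample points are hypothetical.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest other points."""
    n = len(points)
    result = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        result.append(order[:k])
    return result


def reverse_knn(points, k):
    """rnn[j] lists every point i that has j among its k nearest neighbors."""
    rnn = [[] for _ in points]
    for i, neighbors in enumerate(knn_indices(points, k)):
        for j in neighbors:
            rnn[j].append(i)
    return rnn


# Three points close together and one far-away point (index 3).
points = [(0, 0), (1, 0), (2.5, 0), (10, 0)]
rnn = reverse_knn(points, k=1)
```

With these points, the isolated point at index 3 has no reverse neighbors, while the point at index 1 has two. Counts like these are the kind of signal that reverse-neighbor methods can use to separate core regions from outliers.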

A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process

Physica A Statistical Mechanics and its Applications ◽

10.1016/j.physa.2019.03.012 ◽

2019 ◽

Vol 523 ◽

pp. 702-713 ◽

Cited By ~ 16

Author(s):

Jianhua Jiang ◽

Yujun Chen ◽

Xianqiu Meng ◽

Limin Wang ◽

Keqin Li

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Density Peaks ◽

Density Peaks Clustering

Density Peaks Clustering Algorithm Based on Weighted k-Nearest Neighbors and Geodesic Distance

IEEE Access ◽

10.1109/access.2020.3021903 ◽

2020 ◽

Vol 8 ◽

pp. 168282-168296

Author(s):

Lina Liu ◽

Donghua Yu

Keyword(s):

Clustering Algorithm ◽

Geodesic Distance ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Density Peaks ◽

Density Peaks Clustering

GNN-DBSCAN: A new density-based algorithm using grid and the nearest neighbor

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211922 ◽

2021 ◽

pp. 1-13

Author(s):

Li Yihong ◽

Wang Yunpeng ◽

Li Tao ◽

Lan Xiaolong ◽

Song Han

Keyword(s):

Mutual Information ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Adjusted Rand Index ◽

K Nearest Neighbors ◽

Normalized Mutual Information ◽

Core Samples ◽

Real World Datasets

DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms. It can find clusters of arbitrary shape, determine the number of clusters, and identify noise samples automatically. However, the performance of DBSCAN is significantly limited because it is quite sensitive to its parameters eps and MinPts: eps is the radius of the eps-neighborhood, and MinPts is the minimum number of points. Additionally, a dataset with large variations in density is likely to defeat DBSCAN because its parameters are fixed. To overcome these limitations, we propose a new density-based clustering algorithm called GNN-DBSCAN, which uses an adaptive grid to divide the dataset and defines local core samples by using the nearest neighbor. With the help of the grid, the dataset space is divided into a finite number of cells. After that, the nearest neighbors lying in every filled cell and its adjacent filled cells are defined as the local core samples. Then, GNN-DBSCAN obtains global core samples by enhancing and screening the local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, given these global core samples, it uses a dynamic radius based on k-nearest neighbors to cluster the dataset. The dynamic radius can overcome the problems of DBSCAN caused by its fixed parameter eps. Therefore, our method can perform better on datasets with large variations in density. Experiments on synthetic and real-world datasets were conducted. The results indicate that the average adjusted Rand index (ARI), normalized mutual information (NMI), adjusted mutual information (AMI), and V-measure of our proposed algorithm exceed those of the existing algorithms DBSCAN, DPC, ADBSCAN, and HDBSCAN.
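The sensitivity to fixed eps and MinPts described above can be illustrated with a small sketch of DBSCAN's core-sample test. It covers only the standard core-sample criterion, not the grid-based GNN-DBSCAN procedure. It assumes that a point counts itself toward MinPts (a common convention), and the function name `core_samples` and the toy data are hypothetical.

```python
import math


def core_samples(points, eps, min_pts):
    """Return indices of points whose eps-neighborhood holds >= min_pts points.

    The neighborhood count includes the point itself (assumed convention).
    """
    cores = []
    for i, p in enumerate(points):
        count = sum(1 for q in points if math.dist(p, q) <= eps)
        if count >= min_pts:
            cores.append(i)
    return cores


# One dense group (spacing 0.1) and one sparse group (spacing 1.0).
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
data = dense + sparse

small_eps_cores = core_samples(data, eps=0.2, min_pts=4)
large_eps_cores = core_samples(data, eps=1.5, min_pts=4)
```

With eps=0.2 only the dense group yields core samples, so the sparse group would be treated as noise. With eps=1.5 both groups yield core samples, but a radius that large risks merging dense clusters that lie close together. A single fixed eps has to trade off these two failure modes, which is the limitation a dynamic radius is meant to address.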

K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset

Scientia Sinica Informationis ◽

10.1360/n112015-00135 ◽

2016 ◽

Vol 46 (2) ◽

pp. 258 ◽

Cited By ~ 9

Author(s):

XIE Juanying ◽

GAO Hongchao ◽

XIE Weixin

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Fast Search ◽

Density Peaks

A Fuzzy Density Peaks Clustering Algorithm Based on Improved DNA Genetic Algorithm and K-Nearest Neighbors

Lecture Notes in Computer Science - Intelligence Science and Big Data Engineering ◽

10.1007/978-3-030-02698-1_42 ◽

2018 ◽

pp. 476-487

Author(s):

Wenqian Zhang ◽

Wenke Zang

Keyword(s):

Genetic Algorithm ◽

Clustering Algorithm ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Density Peaks ◽

Density Peaks Clustering

Adaptive density peak clustering based on dimensional-free and reverse k-nearest neighbors

Information Technology And Control ◽

10.5755/j01.itc.49.3.23405 ◽

2020 ◽

Vol 49 (3) ◽

pp. 395-411

Author(s):

Qiannan Wu ◽

Qianqian Zhang ◽

Ruizhi Sun ◽

Li Li ◽

Huiyu Mu ◽

...

Keyword(s):

High Dimension ◽

Clustering Algorithm ◽

Local Density ◽

Nearest Neighbors ◽

Allocation Strategy ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

Density Peak ◽

Real World Datasets ◽

Density Peak Clustering

Cluster analysis plays a crucial role in consumer behavior segmentation. The density peak clustering algorithm (DPC) is a novel density-based clustering method. However, it performs poorly on high-dimensional datasets and in estimating the local density of boundary points. In addition, its fault tolerance is affected by its one-step allocation strategy. To overcome these disadvantages, an adaptive density peak clustering algorithm based on dimensional-free and reverse k-nearest neighbors (ERK-DPC) is proposed in this paper. First, we compute the Euler cosine distance to obtain the similarity of sample points in high-dimensional datasets. Then, an adaptive local density formula is used to measure the local density of each point. Finally, the reverse k-nearest neighbor idea is added to a two-step allocation strategy, which assigns the remaining points accurately and effectively. The proposed clustering algorithm is evaluated on several benchmark datasets and real-world datasets. Comparisons on these benchmarks demonstrate that the ERK-DPC algorithm is superior to some state-of-the-art methods.

Effective Density Peaks Clustering Algorithm Based on the Layered K-Nearest Neighbors and Subcluster Merging

IEEE Access ◽

10.1109/access.2020.3006069 ◽

2020 ◽

Vol 8 ◽

pp. 123449-123468

Author(s):

Chunhua Ren ◽

Linfu Sun ◽

Yang Yu ◽

Qishi Wu

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbors ◽

Effective Density ◽

K Nearest Neighbors ◽

Density Peaks ◽

Density Peaks Clustering

A Novel Density Peaks Clustering Algorithm Based on K Nearest Neighbors With Adaptive Merging Strategy

10.21203/rs.3.rs-95747/v1 ◽

2020 ◽

Author(s):

Xiaoning Yuan ◽

Hang Yu ◽

Jun Liang ◽

Bing Xu

Keyword(s):

Real World ◽

Clustering Algorithm ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Density Peaks ◽

Density Peaks Clustering ◽

Cutoff Distance ◽

Real World Datasets ◽

Merging Strategy ◽

Selection Of

Recently, the density peaks clustering algorithm (DPC) has attracted a lot of attention. DPC can quickly find cluster centers and complete clustering tasks, and it is suitable for many clustering tasks. However, the cutoff distance $d_c$ depends on human experience, which greatly affects the clustering results. In addition, the selection of cluster centers requires manual participation, which affects the clustering efficiency. To solve these problems, we propose a density peaks clustering algorithm based on K nearest neighbors with an adaptive merging strategy (KNN-ADPC). We propose a cluster merging strategy to automatically aggregate over-segmented clusters. Additionally, K nearest neighbors are adopted to divide points more reasonably. KNN-ADPC has only one parameter, and the clustering task can be conducted automatically without human involvement. Experimental results on artificial and real-world datasets show the higher accuracy of KNN-ADPC compared with DBSCAN, K-means++, DPC, and DPC-KNN.

A novel density peaks clustering algorithm based on K nearest neighbors with adaptive merging strategy

International Journal of Machine Learning and Cybernetics ◽

10.1007/s13042-021-01369-7 ◽

2021 ◽

Author(s):

Xiaoning Yuan ◽

Hang Yu ◽

Jun Liang ◽

Bing Xu

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Density Peaks ◽

Data Points ◽

Density Peaks Clustering ◽

Cutoff Distance ◽

Real World Datasets ◽

Merging Strategy ◽

Selection Of

Recently, the density peaks clustering algorithm (DPC) has received a lot of attention from researchers. The DPC algorithm is able to find cluster centers and complete clustering tasks quickly. It is also suitable for different kinds of clustering tasks. However, deciding the cutoff distance $d_c$ largely depends on human experience, which greatly affects clustering results. In addition, the selection of cluster centers requires manual participation, which affects the efficiency of the algorithm. To solve these problems, we propose a density peaks clustering algorithm based on K nearest neighbors with an adaptive merging strategy (KNN-ADPC). A cluster merging strategy is proposed to automatically aggregate over-segmented clusters. Additionally, the K nearest neighbors are adopted to divide data points more reasonably. There is only one parameter in the KNN-ADPC algorithm, and the clustering task can be conducted automatically without human involvement. Experimental results on artificial and real-world datasets show the higher accuracy of KNN-ADPC compared with DBSCAN, K-means++, DPC, and DPC-KNN.
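The dependence on the cutoff distance can be made concrete with a short sketch. It uses the cutoff-kernel density from the original DPC formulation, where the density of a point is the number of other points closer than $d_c$. It does not implement KNN-ADPC or its merging strategy. The function name `cutoff_density` and the sample points are hypothetical.

```python
import math


def cutoff_density(points, d_c):
    """rho[i] = number of other points strictly closer than d_c to point i."""
    return [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) < d_c)
        for i, p in enumerate(points)
    ]


# Four evenly spaced points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]

rho_small = cutoff_density(points, d_c=0.5)   # cutoff below the point spacing
rho_mid = cutoff_density(points, d_c=1.5)     # cutoff near the point spacing
rho_large = cutoff_density(points, d_c=20.0)  # cutoff above every distance
```

When the cutoff is too small or too large, every point receives the same density, so no density peak stands out. Only the intermediate value separates the interior points from the boundary points and the outlier. This is why a hand-tuned $d_c$ has such a strong influence on the result.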
