High-Dimensional Shared Nearest Neighbor Clustering Algorithm

Author(s):  
Jian Yin ◽  
Xianli Fan ◽  
Yiqun Chen ◽  
Jiangtao Ren
2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
JingDong Tan ◽  
RuJing Wang

Sharing nearest neighbor (SNN) is a novel metric measure of similarity, and it can conquer two hardships: the low similarities between samples and the different densities of classes. At present, there are two popular SNN similarity based clustering methods: JP clustering and SNN density based clustering. Their clustering results highly rely on the weighting value of the single edge, and thus they are very vulnerable. Motivated by the idea of smooth splicing in computing geometry, the authors design a novel SNN similarity based clustering algorithm within the structure of graph theory. Since it inherits complementary intensity-smoothness principle, its generalizing ability surpasses those of the previously mentioned two methods. The experiments on text datasets show its effectiveness.


2021 ◽  
Vol 13 (6) ◽  
pp. 1136
Author(s):  
Yongjun Zhang ◽  
Wangshan Yang ◽  
Xinyi Liu ◽  
Yi Wan ◽  
Xianzhang Zhu ◽  
...  

Efficient building instance segmentation is necessary for many applications such as parallel reconstruction, management and analysis. However, most of the existing instance segmentation methods still suffer from low completeness, low correctness and low quality for building instance segmentation, which are especially obvious for complex building scenes. This paper proposes a novel unsupervised building instance segmentation (UBIS) method of airborne Light Detection and Ranging (LiDAR) point clouds for parallel reconstruction analysis, which combines a clustering algorithm and a novel model consistency evaluation method. The proposed method first divides building point clouds into building instances by the improved kd tree 2D shared nearest neighbor clustering algorithm (Ikd-2DSNN). Then, the geometric feature of the building instance is obtained using the model consistency evaluation method, which is used to determine whether the building instance is a single building instance or a multi-building instance. Finally, for multiple building instances, the improved kd tree 3D shared nearest neighbor clustering algorithm (Ikd-3DSNN) is used to divide multi-building instances again to improve the accuracy of building instance segmentation. Our experimental results demonstrate that the proposed UBIS method obtained good performances for various buildings in different scenes such as high-rise building, podium buildings and a residential area with detached houses. A comparative analysis confirms that the proposed UBIS method performed better than state-of-the-art methods.


Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 2014
Author(s):  
Yi Lv ◽  
Mandan Liu ◽  
Yue Xiang

The clustering analysis algorithm is used to reveal the internal relationships among the data without prior knowledge and to further gather some data with common attributes into a group. In order to solve the problem that the existing algorithms always need prior knowledge, we proposed a fast searching density peak clustering algorithm based on the shared nearest neighbor and adaptive clustering center (DPC-SNNACC) algorithm. It can automatically ascertain the number of knee points in the decision graph according to the characteristics of different datasets, and further determine the number of clustering centers without human intervention. First, an improved calculation method of local density based on the symmetric distance matrix was proposed. Then, the position of knee point was obtained by calculating the change in the difference between decision values. Finally, the experimental and comparative evaluation of several datasets from diverse domains established the viability of the DPC-SNNACC algorithm.


2011 ◽  
Vol 145 ◽  
pp. 189-193 ◽  
Author(s):  
Horng Lin Shieh

In this paper, a hybrid method combining rough set and shared nearest neighbor algorithms is proposed for data clustering with non-globular shapes. The roughk-means algorithm is based on the distances between data and cluster centers. It partitions a data set with globular shapes well, but when the data are non-globular shapes, the results obtained by a roughk-means algorithm are not very satisfactory. In order to resolve this problem, a combined rough set and shared nearest neighbor algorithm is proposed. The proposed algorithm first adopts a shared nearest neighbor algorithm to evaluate the similarity among data, then the lower and upper approximations of a rough set algorithm are used to partition the data set into clusters.


2016 ◽  
Author(s):  
Sawsan Kanj ◽  
Thomas Brüls ◽  
Stéphane Gazut

AbstractWe present a new algorithm to cluster high dimensional sequence data, and its application to the field of metagenomics, which aims to reconstruct individual genomes from a mixture of genomes sampled from an environ-mental site, without any prior knowledge of reference data (genomes) or the shape of clusters. Such problems typically cannot be solved directly with classical approaches seeking to estimate the density of clusters, e.g., using the shared nearest neighbors rule, due to the prohibitive size of contemporary sequence datasets. We explore here a new method based on combining the shared nearest neighbor (SNN) rule with the concept of Locality Sensitive Hashing (LSH). The proposed method, called LSH-SNN, works by randomly splitting the input data into smaller-sized subsets (buckets) and, employing the shared nearest neighbor rule on each of these buckets. Links can be created among neighbors sharing a sufficient number of elements, hence allowing clusters to be grown from linked elements. LSH-SNN can scale up to larger datasets consisting of millions of sequences, while achieving high accuracy across a variety of sample sizes and complexities.


Sign in / Sign up

Export Citation Format

Share Document