Shared Nearest Neighbor clustering in a Locality Sensitive Hashing framework

Mapping Intimacies ◽

10.1101/093898 ◽

2016 ◽

Author(s):

Sawsan Kanj ◽

Thomas Brüls ◽

Stéphane Gazut

Keyword(s):

Nearest Neighbor ◽

Reference Data ◽

Sequence Data ◽

Scale Up ◽

Nearest Neighbors ◽

High Accuracy ◽

Locality Sensitive Hashing ◽

High Dimensional ◽

Nearest Neighbor Rule ◽

Shared Nearest Neighbor

We present a new algorithm to cluster high-dimensional sequence data, and its application to the field of metagenomics, which aims to reconstruct individual genomes from a mixture of genomes sampled from an environmental site, without any prior knowledge of reference data (genomes) or the shape of clusters. Such problems typically cannot be solved directly with classical approaches seeking to estimate the density of clusters, e.g., using the shared nearest neighbors rule, due to the prohibitive size of contemporary sequence datasets. We explore here a new method based on combining the shared nearest neighbor (SNN) rule with the concept of Locality Sensitive Hashing (LSH). The proposed method, called LSH-SNN, works by randomly splitting the input data into smaller subsets (buckets) and applying the shared nearest neighbor rule to each of these buckets. Links can be created among neighbors sharing a sufficient number of elements, hence allowing clusters to be grown from linked elements. LSH-SNN can scale up to larger datasets consisting of millions of sequences, while achieving high accuracy across a variety of sample sizes and complexities.
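
The bucket-then-link idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the data are numeric vectors (the paper works on sequences), uses random-hyperplane hashing as a stand-in LSH family, and the names `lsh_buckets`, `snn_links`, `lsh_snn` and the parameters `n_bits`, `k`, `min_shared` are all hypothetical.

```python
import numpy as np

def lsh_buckets(X, n_bits, rng):
    """Group row indices of X by the sign pattern of random projections.

    Random hyperplanes are an assumed hash family, chosen for brevity.
    """
    planes = rng.normal(size=(X.shape[1], n_bits))
    codes = (X @ planes > 0).astype(int)
    buckets = {}
    for i, code in enumerate(map(tuple, codes)):
        buckets.setdefault(code, []).append(i)
    return [np.array(b) for b in buckets.values()]

def snn_links(X, idx, k, min_shared):
    """Yield pairs of points in one bucket whose k-NN lists overlap enough."""
    if len(idx) <= k:
        return  # bucket too small to define k neighbours per point
    P = X[idx]
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    knn = [set(row.tolist()) for row in np.argsort(d, axis=1)[:, :k]]
    for a in range(len(idx)):
        for b in knn[a]:
            if len(knn[a] & knn[b]) >= min_shared:
                yield int(idx[a]), int(idx[b])

def lsh_snn(X, n_bits=2, k=5, min_shared=3, seed=0):
    """Return one cluster label per row: the root of its linked component."""
    rng = np.random.default_rng(seed)
    parent = list(range(len(X)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for idx in lsh_buckets(X, n_bits, rng):
        for i, j in snn_links(X, idx, k, min_shared):
            parent[find(i)] = find(j)  # grow clusters from linked elements
    return np.array([find(i) for i in range(len(X))])
```

In this sketch only points that fall in the same bucket can be linked, so a single hashing round may split a true cluster; setting `n_bits=0` puts every point in one bucket and reduces the sketch to plain SNN linking.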


Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics ◽

10.3390/math9070779 ◽

2021 ◽

Vol 9 (7) ◽

pp. 779

Author(s):

Ruriko Yoshida

Keyword(s):

Supervised Learning ◽

Phylogenetic Trees ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

High Dimensional ◽

Learning Method ◽

Dimensional Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
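
As a rough illustration of KNN under the tropical metric, the sketch below uses the standard definition d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), which is unchanged when a constant is added to every coordinate of a vector. The function names and the majority-vote rule are assumptions made for this example and are not taken from the paper.

```python
import numpy as np
from collections import Counter

def tropical_distance(u, v):
    """Tropical metric: spread (max minus min) of the coordinate differences."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return diff.max() - diff.min()

def tropical_knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k tropically nearest points."""
    dists = [tropical_distance(x, query) for x in train_X]
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: the query differs from the first training vector only by a
# constant shift, so their tropical distance is zero.
train_X = [[0, 0, 0], [0, 0, 1], [0, 5, 0], [0, 5, 1]]
train_y = ["a", "a", "b", "b"]
label = tropical_knn_predict(train_X, train_y, [1, 1, 1], k=3)
```

The k nearest points selected here are exactly those inside the smallest tropical ball around the query that contains k training vectors.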


A Modified Method for High Dimensional Data Clustering Based on the Combined Approach of Shared Nearest Neighbor Clustering and Unscented Transform

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2018.7405 ◽

2018 ◽

Vol 15 (6) ◽

pp. 2050-2054

Author(s):

M Ravichandran ◽

K. M Subramanian ◽

P Ganesan ◽

R Jothikumar

Keyword(s):

Data Clustering ◽

Nearest Neighbor ◽

High Dimensional Data ◽

High Dimensional ◽

Combined Approach ◽

Unscented Transform ◽

Modified Method ◽

Shared Nearest Neighbor


Nearest neighbors and Voronoi volumes in high-dimensional point processes with various distance functions

Advances in Applied Probability ◽

10.2307/1427088 ◽

1985 ◽

Vol 17 (4) ◽

pp. 794-809 ◽

Cited By ~ 5

Author(s):

Charles M. Newman ◽

Yosef Rinott

Keyword(s):

Asymptotic Behavior ◽

Point Process ◽

Poisson Point Process ◽

Point Processes ◽

Nearest Neighbor ◽

Correlation Coefficients ◽

Nearest Neighbors ◽

Distance Functions ◽

High Dimensional ◽

Voronoi Region

Consider a Poisson point process of density 1 in $\mathbb{R}^d$, centered so that the origin is one of the points. Using $l_p$ distances, $1 \le p \le \infty$, define $N_d$ as the number of other points which have the origin as their nearest neighbor and $\mathrm{Vol}\, V_d$ as the volume of the Voronoi region of the origin. We prove that $N_d \to \mathrm{Poisson}(\lambda = 1)$ and $\mathrm{Vol}\, V_d \to 1$ in distribution as $d \to \infty$, thus extending previous results from the case $p = 2$. More generally, for a variety of exchangeable distributions for $n + 1$ points, $e_0, \dots, e_n$, in $\mathbb{R}^d$ and a variety of distances, we obtain the asymptotic behavior of $N_d^n$, the number of points which have $e_0$ as their nearest neighbor, as $n, d \to \infty$ in one or both of the possible iterated orders. The distributions treated include points distributed on the unit $l_2$ sphere and the distances treated include non-$l_p$ distances related to correlation coefficients.


On the number of reflexive and shared nearest neighbor pairs in one-dimensional uniform data

Probability and Mathematical Statistics ◽

10.19195/0208-4147.38.1.7 ◽

2018 ◽

Vol 38 (1) ◽

pp. 123-137 ◽

Cited By ~ 1

Author(s):

Selim Bahadir ◽

Elvan Ceyhan

Keyword(s):

Random Sample ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

Mass Function ◽

Probability Mass Function ◽

Exact Probability ◽

One Dimensional ◽

Novel Method ◽

Probability Mass ◽

Shared Nearest Neighbor

For a random sample of points in $\mathbb{R}$, we consider the number of pairs whose members are nearest neighbors (NNs) of each other and the number of pairs sharing a common NN. Pairs of the first type are called reflexive NNs, whereas pairs of the latter type are called shared NNs. In this article, we consider the case where the random sample of size $n$ is from the uniform distribution on an interval. We denote the number of reflexive NN pairs and the number of shared NN pairs in the sample by $R_n$ and $Q_n$, respectively. We derive the exact forms of the expected value and the variance for both $R_n$ and $Q_n$, and derive a recurrence relation for $R_n$ which may also be used to compute the exact probability mass function (pmf) of $R_n$. Our approach is a novel method for finding the pmf of $R_n$ and agrees with the results in the literature. We also present SLLN and CLT results for both $R_n$ and $Q_n$ as $n$ goes to infinity.
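
The two counts defined in this abstract can be computed directly for a given sample, which may help make the definitions concrete. The sketch below is a brute-force illustration only; the function names are hypothetical and it does not reproduce the paper's recurrence or exact formulas.

```python
import numpy as np
from itertools import combinations

def nearest_neighbor_indices(x):
    """For each point of a 1-D sample, return the index of its nearest neighbour."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(d, np.inf)  # exclude each point from its own search
    return d.argmin(axis=1)

def count_reflexive_and_shared(x):
    """Return (reflexive NN pairs, shared NN pairs) for the sample x."""
    nn = nearest_neighbor_indices(x)
    n = len(nn)
    # A reflexive pair {i, j} is seen once from i and once from j, hence // 2.
    reflexive = sum(1 for i in range(n) if nn[nn[i]] == i) // 2
    # A shared pair is any two distinct points with the same nearest neighbour.
    shared = sum(1 for i, j in combinations(range(n), 2) if nn[i] == nn[j])
    return reflexive, shared

# Small hand-checkable sample: NN indices are [1, 0, 3, 2, 3].
sample = [0.0, 1.0, 3.0, 3.5, 10.0]
r, q = count_reflexive_and_shared(sample)

# A uniform sample on an interval, the setting studied in the abstract.
rng = np.random.default_rng(0)
r_unif, q_unif = count_reflexive_and_shared(rng.uniform(0.0, 1.0, size=200))
```

In the hand-checkable sample, the pairs (0.0, 1.0) and (3.0, 3.5) are reflexive, while 3.0 and 10.0 share 3.5 as their nearest neighbor.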


Nearest neighbors and Voronoi volumes in high-dimensional point processes with various distance functions

Advances in Applied Probability ◽

10.1017/s000186780001541x ◽

1985 ◽

Vol 17 (04) ◽

pp. 794-809 ◽

Cited By ~ 4

Author(s):

Charles M. Newman ◽

Yosef Rinott

Keyword(s):

Asymptotic Behavior ◽

Point Process ◽

Poisson Point Process ◽

Point Processes ◽

Nearest Neighbor ◽

Correlation Coefficients ◽

Nearest Neighbors ◽

Distance Functions ◽

High Dimensional ◽

Voronoi Region

Consider a Poisson point process of density 1 in $\mathbb{R}^d$, centered so that the origin is one of the points. Using $l_p$ distances, $1 \le p \le \infty$, define $N_d$ as the number of other points which have the origin as their nearest neighbor and $\mathrm{Vol}\, V_d$ as the volume of the Voronoi region of the origin. We prove that $N_d \to \mathrm{Poisson}(\lambda = 1)$ and $\mathrm{Vol}\, V_d \to 1$ in distribution as $d \to \infty$, thus extending previous results from the case $p = 2$. More generally, for a variety of exchangeable distributions for $n + 1$ points, $e_0, \dots, e_n$, in $\mathbb{R}^d$ and a variety of distances, we obtain the asymptotic behavior of $N_d^n$, the number of points which have $e_0$ as their nearest neighbor, as $n, d \to \infty$ in one or both of the possible iterated orders. The distributions treated include points distributed on the unit $l_2$ sphere and the distances treated include non-$l_p$ distances related to correlation coefficients.


A Hybrid Clustering Algorithm Based on Rough Set and Shared Nearest Neighbors

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.145.189 ◽

2011 ◽

Vol 145 ◽

pp. 189-193 ◽

Cited By ~ 3

Author(s):

Horng Lin Shieh

Keyword(s):

Rough Set ◽

Hybrid Method ◽

Data Clustering ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

Nearest Neighbor Algorithm ◽

Data Set ◽

Lower And Upper Approximations ◽

Shared Nearest Neighbor

In this paper, a hybrid method combining rough set and shared nearest neighbor algorithms is proposed for clustering data with non-globular shapes. The rough k-means algorithm is based on the distances between data and cluster centers. It partitions a data set with globular shapes well, but when the data form non-globular shapes, the results obtained by the rough k-means algorithm are not very satisfactory. To resolve this problem, a combined rough set and shared nearest neighbor algorithm is proposed. The proposed algorithm first adopts a shared nearest neighbor algorithm to evaluate the similarity among data, then the lower and upper approximations of a rough set algorithm are used to partition the data set into clusters.


Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

Lecture Notes in Computer Science - Databases Theory and Applications ◽

10.1007/978-3-030-69377-0_6 ◽

2021 ◽

pp. 62-73

Author(s):

Omid Jafari ◽

Parth Nagarkar

Keyword(s):

Experimental Analysis ◽

Nearest Neighbor ◽

Locality Sensitive Hashing ◽

High Dimensional ◽

Approximate Nearest Neighbor ◽

Nearest Neighbor Searches


A Simple Locally Adaptive Nearest Neighbor Rule with Application to Pollution Forecasting

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001403002952 ◽

2003 ◽

Vol 17 (08) ◽

pp. 1369-1382 ◽

Cited By ~ 16

Author(s):

Richard Nock ◽

Marc Sebban ◽

Didier Bernard

Keyword(s):

Fast Algorithm ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

Locally Adaptive ◽

Nearest Neighbor Rule ◽

Neighbor Relationship ◽

Classification Time

In this paper, we propose a thorough investigation of a nearest neighbor rule which we call the "Symmetric Nearest Neighbor (sNN) rule". In essence, it symmetrises the classical nearest neighbor relationship used to select the points that vote for a given instance. Experiments on 29 datasets, most of which are readily available, show that the method significantly outperforms the traditional Nearest Neighbors methods. Experiments on a domain of interest related to tropical pollution normalization also show the greater potential of this method. We finally discuss the reasons for the rule's efficiency, provide methods for speeding up the classification time, and derive from the sNN rule a reliable and fast algorithm to fix the parameter k in the k-NN rule, a longstanding problem in this field.
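
One plausible reading of "symmetrising" the neighbor relationship is sketched below: a training point votes for a query if it is among the query's k nearest training points, or if the query would fall among that training point's own k nearest neighbors. This interpretation, the Euclidean distance, the majority vote, and the name `symmetric_nn_predict` are assumptions made for illustration; the abstract does not spell out the exact rule.

```python
import numpy as np
from collections import Counter

def symmetric_nn_predict(train_X, train_y, query, k=1):
    """Majority vote over forward and reverse k-nearest-neighbour voters.

    Assumes more than k training points; an illustrative reading of the
    sNN idea, not the authors' exact rule.
    """
    X = np.asarray(train_X, dtype=float)
    q = np.asarray(query, dtype=float)
    d_query = np.linalg.norm(X - q, axis=1)

    # Forward voters: the k training points nearest to the query.
    forward = set(np.argsort(d_query, kind="stable")[:k].tolist())

    # Reverse voters: training points for which the query is closer than
    # their current k-th nearest training neighbour.
    d_train = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d_train, np.inf)
    kth = np.sort(d_train, axis=1)[:, k - 1]
    reverse = set(np.flatnonzero(d_query < kth).tolist())

    votes = Counter(train_y[i] for i in forward | reverse)
    return votes.most_common(1)[0][0]

# Toy 1-D example: both "a" points vote for the query at 0.4.
train_X = [[0.0], [1.0], [5.0], [6.0]]
train_y = ["a", "a", "b", "b"]
pred = symmetric_nn_predict(train_X, train_y, [0.4], k=1)
```

With k = 1 the classical rule would consult only the point at 0.0, whereas this symmetrised variant also counts the point at 1.0, because the query lies closer to it than its own nearest training neighbor does.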


LSR‐forest: An locality sensitive hashing‐based approximate k‐nearest neighbor query algorithm on high‐dimensional uncertain data

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.5795 ◽

2020 ◽

Author(s):

Jiagang Wang ◽

Tu Qian ◽

Anbang Yang ◽

Hui Wang ◽

Jiangbo Qian

Keyword(s):

Nearest Neighbor ◽

Uncertain Data ◽

Locality Sensitive Hashing ◽

High Dimensional ◽

K Nearest Neighbor ◽

Nearest Neighbor Query ◽

Query Algorithm
