On approximate k-nearest neighbor searches based on the earth mover’s distance for efficient content-based multimedia information retrieval

2019 ◽  
Vol 16 (2) ◽  
pp. 615-638 ◽  
Author(s):  
Min-Hee Jang ◽  
Sang-Wook Kim ◽  
Woong-Kee Loh ◽  
Jung-Im Won

The Earth Mover's Distance (EMD) is one of the most widely used distance functions for measuring the similarity between two multimedia objects. While it provides good search results, the EMD is too time-consuming to be used in large multimedia databases. To solve this problem, we propose an approximate k-nearest neighbor (k-NN) search method based on the EMD. In the proposed method, the overhead of both disk accesses and EMD computations is reduced significantly thanks to the approximation. First, the proposed method builds an index using the M-tree, a distance-based multi-dimensional index structure, to reduce the disk access overhead. When building the index, we reduce the number of features in the multimedia objects through dimensionality reduction. When performing the k-NN search on the M-tree, we retrieve a small set of candidates from the disk using the index and then perform post-processing on them. Second, the proposed method uses the approximate EMD for index retrieval and post-processing to reduce the computational overhead of the EMD. To compensate for the errors introduced by the approximation, the method provides a way to improve the accuracy of the approximate EMD. We performed extensive experiments to show the efficiency of the proposed method. The results show a significant improvement in performance with only small errors: the proposed method outperforms the previous method by up to 67.3% with only 3.5% error.
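The candidate-then-post-process idea in this abstract can be illustrated with a small sketch. It is not the authors' method: it assumes one-dimensional, equal-mass histograms (where the EMD reduces to a sum of absolute cumulative differences), replaces the M-tree with a plain sort, and uses bin merging as a stand-in for dimensionality reduction. The names `emd_1d`, `coarsen`, `approx_knn`, `factor`, and `candidates_per_k` are hypothetical.

```python
from itertools import accumulate

def emd_1d(p, q):
    """EMD between two equal-mass 1D histograms with unit bin spacing."""
    diffs = (a - b for a, b in zip(p, q))
    # The mass carried across each bin boundary is the running sum of differences.
    return sum(abs(c) for c in accumulate(diffs))

def coarsen(hist, factor):
    """Reduce dimensionality by summing each run of `factor` adjacent bins."""
    return [sum(hist[i:i + factor]) for i in range(0, len(hist), factor)]

def approx_knn(query, database, k, factor=2, candidates_per_k=3):
    """Return indices of the (approximate) k nearest histograms to `query`."""
    q_small = coarsen(query, factor)
    # Filter step: rank all objects by the cheap EMD on reduced histograms.
    ranked = sorted(
        range(len(database)),
        key=lambda i: emd_1d(q_small, coarsen(database[i], factor)),
    )
    shortlist = ranked[:k * candidates_per_k]
    # Post-processing step: re-rank only the shortlist with the full EMD.
    shortlist.sort(key=lambda i: emd_1d(query, database[i]))
    return shortlist[:k]

database = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
query = [1, 0, 0, 0]
nearest = approx_knn(query, database, k=2)
```

The result is approximate because an object ranked poorly on the reduced histograms may be cut from the shortlist before the full EMD is ever computed; a larger `candidates_per_k` trades speed for accuracy.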

2020 ◽  
Author(s):  
Cameron Hargreaves ◽  
Matthew Dyer ◽  
Michael Gaultois ◽  
Vitaliy Kurlin ◽  
Matthew J Rosseinsky

It is a core problem in any field to reliably tell how close two objects are to being the same; once this relation has been established, we can use it to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented as a vector holding the ratio of each element in the material. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately the standard metric (the Euclidean distance) gives little to no variance in the resultant distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric that measures chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resultant distances align more closely with chemical understanding than the Euclidean distance does, as demonstrated on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that, with no supervision, the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical property, with future applications for nearest neighbor search queries in chemical database retrieval systems and supervised ML techniques.
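The computation described here, element ratios combined with absolute distances along a one-dimensional scale, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `composition_emd` is a hypothetical name, and the positions in `PLACEHOLDER_SCALE` are made-up numbers standing in for the modified Pettifor scale, not its real values.

```python
# Made-up positions for illustration only; NOT the real modified Pettifor values.
PLACEHOLDER_SCALE = {"Na": 1, "K": 2, "Cl": 10, "Br": 11}

def composition_emd(comp_a, comp_b, scale):
    """EMD between two compositions given as {element: amount} dicts.

    `scale` maps each element to its position on a 1D scale.
    """
    def normalise(comp):
        total = sum(comp.values())
        return {el: amt / total for el, amt in comp.items()}

    a, b = normalise(comp_a), normalise(comp_b)
    # Walk the elements in scale order, tracking the surplus mass that
    # must be carried to the next position.
    elements = sorted(set(a) | set(b), key=lambda el: scale[el])
    distance = 0.0
    carried = 0.0
    for left, right in zip(elements, elements[1:]):
        carried += a.get(left, 0.0) - b.get(left, 0.0)
        distance += abs(carried) * (scale[right] - scale[left])
    return distance

nacl = {"Na": 1, "Cl": 1}
kcl = {"K": 1, "Cl": 1}
kbr = {"K": 1, "Br": 1}
d_close = composition_emd(nacl, kcl, PLACEHOLDER_SCALE)
d_far = composition_emd(nacl, kbr, PLACEHOLDER_SCALE)
```

Under these placeholder positions, swapping one element for a neighbor on the scale yields a smaller distance than swapping both, which is the kind of graded behavior the abstract contrasts with the Euclidean distance on ratio vectors.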


2007 ◽  
Vol 01 (02) ◽  
pp. 147-170 ◽  
Author(s):  
Kasturi Chatterjee ◽  
Shu-Ching Chen

An efficient access and indexing framework, called Affinity Hybrid Tree (AH-Tree), is proposed, which combines feature and metric spaces in a novel way. The proposed framework helps to organize large image databases and supports popular multimedia retrieval mechanisms such as Content-Based Image Retrieval (CBIR). It is efficient in terms of computational overhead and fairly accurate in producing query results close to human perception. By introducing the high-level semantic image relationship as it is into its index structure, AH-Tree solves the problem of translating the content-similarity measurement into feature-level equivalence, a process that is both painstaking and error-prone. Algorithms for similarity (range and k-nearest neighbor) queries are implemented, and extensive experiments are performed, which produce encouraging results with low I/O and distance computation costs and high precision of query results.

