On approximate k-nearest neighbor searches based on the earth mover’s distance for efficient content-based multimedia information retrieval

A Novel Indexing and Access Mechanism Using Affinity Hybrid Tree for Content-Based Image Retrieval in Multimedia Databases

International Journal of Semantic Computing ◽

10.1142/s1793351x07000093 ◽

2007 ◽

Vol 01 (02) ◽

pp. 147-170 ◽

Cited By ~ 5

Author(s):

KASTURI CHATTERJEE ◽

SHU-CHING CHEN

Keyword(s):

Image Retrieval ◽

Metric Spaces ◽

Nearest Neighbor ◽

Human Perception ◽

Multimedia Databases ◽

Multimedia Retrieval ◽

Index Structure ◽

Content Based Image Retrieval ◽

K Nearest Neighbor ◽

Computational Overhead

An efficient access and indexing framework, the Affinity Hybrid Tree (AH-Tree), is proposed that combines feature and metric spaces in a novel way. The framework helps organize large image databases and supports popular multimedia retrieval mechanisms such as Content-Based Image Retrieval (CBIR). It has low computational overhead and produces query results that are fairly close to human perception. Because the AH-Tree embeds high-level semantic image relationships directly in its index structure, it avoids translating content-similarity measurements into feature-level equivalence, a step that is both painstaking and error-prone. Algorithms for similarity queries (range and k-nearest neighbor) are implemented, and extensive experiments produce encouraging results, with low I/O and distance-computation counts and high precision of query results.

Efficient similarity search using the Earth Mover's Distance for large multimedia databases

2008 IEEE 24th International Conference on Data Engineering ◽

10.1109/icde.2008.4497439 ◽

2008 ◽

Cited By ~ 18

Author(s):

Ira Assent ◽

Marc Wichterich ◽

Tobias Meisen ◽

Thomas Seidl

Keyword(s):

Similarity Search ◽

Multimedia Databases ◽

Earth Mover's Distance

The Earth Mover’s Distance as a Metric for the Space of Inorganic Compositions

10.26434/chemrxiv.12777566 ◽

2020 ◽

Author(s):

Cameron Hargreaves ◽

Matthew Dyer ◽

Michael Gaultois ◽

Vitaliy Kurlin ◽

Matthew J Rosseinsky

Keyword(s):

Euclidean Distance ◽

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Inorganic Crystal Structure Database ◽

Earth Mover's Distance ◽

Chemical Similarity ◽

Neighbor Search ◽

Binary Compounds

It is a core problem in any field to reliably tell how close two objects are to being the same. Once this relation has been established, we can use it to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented as a vector holding the ratio of each element in the material. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately the standard metric (the Euclidean distance) gives little to no variance in the resultant distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric that enables chemical similarity to be measured in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resultant distances align more closely with chemical understanding than the Euclidean distance does, which is demonstrated on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that, with no supervision, the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical property, with future applications for nearest neighbor search queries in chemical database retrieval systems and supervised ML techniques.
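The computation described in this abstract (element ratios moved along a one-dimensional element scale) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `composition_emd` and the table `PLACEHOLDER_SCALE` are hypothetical, and the scale positions are made-up numbers chosen only for illustration; they are not values from the modified Pettifor scale. The sketch relies on the standard fact that, on a line, the EMD between two distributions of equal total mass equals the accumulated surplus carried between neighboring positions.

```python
# Made-up positions for illustration only; NOT the modified Pettifor scale.
PLACEHOLDER_SCALE = {"Na": 1.0, "K": 2.0, "Cl": 10.0, "Br": 11.0}


def composition_emd(comp_a, comp_b, position=PLACEHOLDER_SCALE):
    """Return the 1D EMD between two compositions given as {element: amount}.

    Amounts are normalized to ratios that sum to 1, and the ground distance
    between two elements is the absolute difference of their scale positions.
    """
    def normalize(comp):
        total = sum(comp.values())
        return {element: amount / total for element, amount in comp.items()}

    ratios_a, ratios_b = normalize(comp_a), normalize(comp_b)
    # Walk through every element that occurs in either composition, left to right.
    elements = sorted(set(ratios_a) | set(ratios_b), key=lambda el: position[el])

    distance = 0.0
    carried = 0.0  # surplus mass that still has to be moved to the right
    for left, right in zip(elements, elements[1:]):
        carried += ratios_a.get(left, 0.0) - ratios_b.get(left, 0.0)
        distance += abs(carried) * (position[right] - position[left])
    return distance


nacl = {"Na": 1, "Cl": 1}
kcl = {"K": 1, "Cl": 1}
kbr = {"K": 1, "Br": 1}

print(composition_emd(nacl, nacl))
print(composition_emd(nacl, kcl))
print(composition_emd(nacl, kbr))
```

With these placeholder positions, swapping one neighboring element (NaCl to KCl) costs less than swapping both (NaCl to KBr), which is the kind of graded behavior the abstract contrasts with the Euclidean distance.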

Efficient Filter Approximation Using the Earth Mover's Distance in Very Large Multimedia Databases with Feature Signatures

Proceedings of the 23rd ACM International Conference on Information and Knowledge Management - CIKM '14 ◽

10.1145/2661829.2661877 ◽

2014 ◽

Cited By ~ 13

Author(s):

Merih Seran Uysal ◽

Christian Beecks ◽

Jochen Schmücking ◽

Thomas Seidl

Keyword(s):

Multimedia Databases ◽

Earth Mover's Distance ◽

Filter Approximation

Exploring multimedia databases via optimization-based relevance feedback and the earth mover's distance

Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09 ◽

10.1145/1645953.1646187 ◽

2009 ◽

Cited By ~ 1

Author(s):

Marc Wichterich ◽

Christian Beecks ◽

Martin Sundermeyer ◽

Thomas Seidl

Keyword(s):

Relevance Feedback ◽

Multimedia Databases ◽

Earth Mover's Distance

The Earth Mover's Distance under transformation sets

Proceedings of the Seventh IEEE International Conference on Computer Vision ◽

10.1109/iccv.1999.790393 ◽

1999 ◽

Cited By ~ 53

Author(s):

S. Cohen ◽

L. Guibas

Keyword(s):

Earth Mover's Distance

On the earth mover's distance as a performance metric for sparse support recovery

2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP) ◽

10.1109/globalsip.2016.7906065 ◽

2016 ◽

Cited By ~ 1

Author(s):

A. Lavrenko ◽

F. Romer ◽

G. Del Galdo ◽

R. Thoma

Keyword(s):

Earth Mover's Distance ◽

Performance Metric ◽

Support Recovery
