A neural algorithm for a fundamental computing problem

Sanjoy Dasgupta; Charles F. Stevens; Saket Navlakha

doi:10.1126/science.aam9868

A neural algorithm for a fundamental computing problem

Science ◽

10.1126/science.aam9868 ◽

2017 ◽

Vol 358 (6364) ◽

pp. 793-796 ◽

Cited By ~ 53

Author(s):

Sanjoy Dasgupta ◽

Charles F. Stevens ◽

Saket Navlakha

Keyword(s):

Large Scale ◽

Activity Patterns ◽

Fruit Fly ◽

Locality Sensitive Hashing ◽

Sensory Function ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Similar Images ◽

Traditional Approaches ◽

Similarity Searches

Similarity search—for example, identifying similar images in a database or similar documents on the web—is a fundamental computing problem faced by large-scale information retrieval systems. We discovered that the fruit fly olfactory circuit solves this problem with a variant of a computer science algorithm (called locality-sensitive hashing). The fly circuit assigns similar neural activity patterns to similar odors, so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly algorithm, however, uses three computational strategies that depart from traditional approaches. These strategies can be translated to improve the performance of computational similarity searches. This perspective helps illuminate the logic supporting an important sensory function and provides a conceptually new algorithm for solving a fundamental computational problem.
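For readers unfamiliar with locality-sensitive hashing, the following is a minimal sketch of one traditional variant (random-hyperplane hashing), not the fly-inspired algorithm described in the abstract. The function names, the dimensions, and the example vectors are all assumptions chosen for illustration.

```python
import random

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of positions where two signatures differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical inputs: a base vector, a slightly perturbed copy, and its negation.
planes = make_hyperplanes(dim=4, num_bits=16, seed=42)
base = [1.0, 0.5, -0.2, 0.8]
near = [1.1, 0.45, -0.25, 0.75]
far = [-x for x in base]

sig_base = lsh_signature(base, planes)
sig_near = lsh_signature(near, planes)
sig_far = lsh_signature(far, planes)

print("base vs near:", hamming(sig_base, sig_near))
print("base vs far: ", hamming(sig_base, sig_far))
```

Vectors separated by a small angle tend to agree on most bits, while the negated vector disagrees on every bit, so the Hamming distance between signatures can stand in for a more expensive similarity computation.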

A neural algorithm for a fundamental computing problem

10.1101/180471 ◽

2017 ◽

Author(s):

Sanjoy Dasgupta ◽

Charles F. Stevens ◽

Saket Navlakha

Keyword(s):

Similarity Search ◽

Large Scale ◽

Activity Patterns ◽

Locality Sensitive Hashing ◽

Sensory Function ◽

Information Retrieval Systems ◽

Novel Variant ◽

Benchmark Datasets ◽

Similar Images ◽

Traditional Approaches

Similarity search, such as identifying similar images in a database or similar documents on the Web, is a fundamental computing problem faced by many large-scale information retrieval systems. We discovered that the fly’s olfactory circuit solves this problem using a novel variant of a traditional computer science algorithm (called locality-sensitive hashing). The fly’s circuit assigns similar neural activity patterns to similar input stimuli (odors), so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly’s algorithm, however, uses three new computational ingredients that depart from traditional approaches. We show that these ingredients can be translated to improve the performance of similarity search compared to traditional algorithms when evaluated on several benchmark datasets. Overall, this perspective helps illuminate the logic supporting an important sensory function (olfaction), and it provides a conceptually new algorithm for solving a fundamental computational problem.

IR Research and Innovation in Commercial Online Systems: An Exploratory Survey

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais731 ◽

2013 ◽

Author(s):

Sherry Koshman ◽

Edie Rasmussen

Keyword(s):

Information Retrieval ◽

Large Scale ◽

Interactive Systems ◽

Information Industry ◽

Online Systems ◽

Research And Innovation ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Operational Systems ◽

Mcgill University

From the 1994 CAIS Conference: The Information Industry in Transition, McGill University, Montreal, Quebec, May 25 - 27, 1994. "Conventional" information retrieval systems (IRS), originating in the research of the 1950s and 1960s, are based on keyword matching and the application of Boolean operators to produce a set of retrieved documents from a database. In the ensuing years, research in information retrieval has identified a number of innovations (for example, automatic weighting of terms, ranked output, and relevance feedback) which have the potential to significantly enhance the performance of IRS, though commercial vendors have been slow to incorporate these changes into their systems. This was the situation in 1988 which led Radecki, in a special issue of Information Processing & Management, to examine the potential for improvements in conventional Boolean retrieval systems, and explore the reasons why these improvements had not been implemented in operational systems. Over the last five years, this position has begun to change as commercial vendors such as Dialog, Dow Jones, West Publishing, and Mead have implemented new, non-Boolean features in their systems, including natural language input, weighted keyword terms, and document ranking. This paper identifies some of the significant findings of IR research and compares them to the implementation of non-Boolean features in such systems. The preliminary survey of new features in commercial systems suggests the need for new methods of evaluation, including the development of evaluation measures appropriate to large-scale, interactive systems.
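The contrast the abstract draws between Boolean matching and ranked output with weighted terms can be sketched as follows. This is only an illustration under stated assumptions: the three toy documents, the function names, and the simple tf-idf weighting are hypothetical and are not taken from any of the commercial systems named above.

```python
import math
from collections import Counter

# Hypothetical toy collection, whitespace-tokenized.
docs = {
    "d1": "boolean retrieval systems match keywords",
    "d2": "ranked output orders documents by weighted terms",
    "d3": "relevance feedback improves ranked retrieval",
}
tokenized = {doc_id: text.split() for doc_id, text in docs.items()}

def boolean_and(query_terms):
    """Conventional Boolean AND: the unordered set of documents containing every term."""
    return {d for d, toks in tokenized.items() if all(t in toks for t in query_terms)}

def idf(term):
    """Inverse document frequency; terms absent from the collection get weight 0."""
    df = sum(1 for toks in tokenized.values() if term in toks)
    return math.log(len(tokenized) / df) if df else 0.0

def ranked(query_terms):
    """Ranked output: score by summed tf-idf weights, best first, zero scores dropped."""
    scores = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        score = sum(tf[t] * idf(t) for t in query_terms)
        if score > 0:
            scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)

query = ["ranked", "retrieval"]
print("Boolean AND:", boolean_and(query))
print("Ranked:     ", ranked(query))
```

The Boolean query returns only the document containing both terms, while the ranked version also surfaces partial matches and orders them by weight, which is the kind of behavior that calls for different evaluation measures.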

A novel locality-sensitive hashing algorithm for similarity searches on large-scale hyperspectral data

Remote Sensing Letters ◽

10.1080/2150704x.2016.1207255 ◽

2016 ◽

Vol 7 (10) ◽

pp. 965-974 ◽

Cited By ~ 6

Author(s):

Yuan Zhou ◽

Chun Liu ◽

Nan Li ◽

Minzhen Li

Keyword(s):

Large Scale ◽

Hyperspectral Data ◽

Locality Sensitive Hashing ◽

Hashing Algorithm ◽

Similarity Searches

Large-scale image search with text for information retrieval

Journal of Innovations in Engineering Education ◽

10.3126/jiee.v4i1.35390 ◽

2021 ◽

Vol 4 (1) ◽

pp. 87-89

Author(s):

Janardan Bhatta

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Large Scale ◽

Image Feature ◽

Image Search ◽

Search Results ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Text Features ◽

Text Query

Searching images in a large database is a major requirement in Information Retrieval Systems. Returning image search results for a text query is a challenging task. In this paper, we leverage the power of Computer Vision and Natural Language Processing on distributed machines to lower the latency of search results. Image pixel features are computed using a contrastive loss function for image search. Text features are computed using an attention mechanism for text search. These features are aligned while preserving the information in each text and image feature. Previously, the approach had been tested only on multilingual models. We tested it on an image-text dataset, where it enabled search over any form of text or images with high accuracy.

Managing tail latency in large scale information retrieval systems

ACM SIGIR Forum ◽

10.1145/3451964.3451982 ◽

2020 ◽

Vol 54 (1) ◽

pp. 1-2

Author(s):

Joel M. Mackenzie

Keyword(s):

Information Retrieval ◽

User Experience ◽

Large Scale ◽

Response Times ◽

Smart Devices ◽

Worst Case ◽

Retrieval Systems ◽

Trade Offs ◽

Efficiency And Effectiveness ◽

Information Retrieval Systems

As both the availability of internet access and the prominence of smart devices continue to increase, data is being generated at a rate faster than ever before. This massive increase in data production comes with many challenges, including efficiency concerns for the storage and retrieval of such large-scale data. However, users have grown to expect the sub-second response times that are common in most modern search engines, creating a problem: how can such large amounts of data continue to be served efficiently enough to satisfy end users? This dissertation investigates several issues regarding tail latency in large-scale information retrieval systems. Tail latency corresponds to the high percentile latency that is observed from a system; in the case of search, this latency typically corresponds to how long it takes for a query to be processed. In particular, keeping tail latency as low as possible translates to a good experience for all users, as tail latency is directly related to the worst-case latency and hence, the worst possible user experience. The key idea in targeting tail latency is to move from questions such as "what is the median latency of our search engine?" to questions which more accurately capture user experience such as "how many queries take more than 200 ms to return answers?" or "what is the worst case latency that a user may be subject to, and how often might it occur?" While various strategies exist for efficiently processing queries over large textual corpora, prior research has focused almost entirely on improvements to the average processing time or cost of search systems. As a first contribution, we examine some state-of-the-art retrieval algorithms for two popular index organizations, and discuss the trade-offs between them, paying special attention to the notion of tail latency. This research uncovers a number of observations that are subsequently leveraged for improved search efficiency and effectiveness.
We then propose and solve a new problem, which involves processing a number of related query variations together, known as multi-queries, to yield higher quality search results. We experiment with a number of algorithmic approaches to efficiently process these multi-queries, and report on the cost, efficiency, and effectiveness trade-offs present with each. Finally, we examine how predictive models can be used to improve the tail latency and end-to-end cost of a commonly used multi-stage retrieval architecture without impacting result effectiveness. By combining ideas from numerous areas of information retrieval, we propose a prediction framework which can be used for training and evaluating several efficiency/effectiveness trade-off parameters, resulting in improved trade-offs between cost, result quality, and tail latency.
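The shift from median latency to tail latency that the abstract describes can be made concrete with a small sketch. The latency samples, the nearest-rank percentile definition, and the function name below are assumptions for illustration; only the 200 ms threshold comes from the abstract's example question.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical query latencies in milliseconds; one query is a slow outlier.
latencies_ms = [12, 15, 14, 18, 20, 16, 13, 17, 450, 19]

median_ms = statistics.median(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
slow_queries = sum(1 for t in latencies_ms if t > 200)

print("median latency (ms):", median_ms)
print("99th percentile (ms):", p99_ms)
print("queries over 200 ms:", slow_queries)
```

In this toy sample the median looks healthy while the high percentile exposes the slow query, which is why the two questions lead to different engineering priorities.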

Energy Efficiency in Large Scale Information Retrieval Systems

ACM SIGIR Forum ◽

10.1145/3190580.3190605 ◽

2018 ◽

Vol 51 (3) ◽

pp. 159-160

Author(s):

Matteo Catena

Keyword(s):

Energy Efficiency ◽

Information Retrieval ◽

Large Scale ◽

Retrieval Systems ◽

Information Retrieval Systems

Fast Retrieval Algorithm for Earth Mover's Distance Using EMD Lower Bounds and a Skipping Algorithm

Advances in Multimedia ◽

10.1155/2011/421820 ◽

2011 ◽

Vol 2011 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Masami Shishibori ◽

Daichi Koizumi ◽

Kenji Kita

Keyword(s):

High Speed ◽

Large Scale ◽

Retrieval Algorithm ◽

Efficient Computation ◽

Earth Mover's Distance ◽

Retrieval Systems ◽

Large Databases ◽

Information Retrieval Systems ◽

Feature Based

The earth mover's distance (EMD) is a measure of the distance between two distributions, and it has been widely used in multimedia information retrieval systems, in particular, in content-based image retrieval systems. When the EMD is applied to image problems based on color or texture, the EMD reflects the human perceptual similarities. However, its computations are too expensive to use in large-scale databases. In order to achieve efficient computation of the EMD during query processing, we have developed “fastEMD,” a library for high-speed feature-based similarity retrievals in large databases. This paper introduces techniques that are used in the implementation of the fastEMD and performs extensive experiments to demonstrate its efficiency.
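To illustrate the general idea of pairing the EMD with a cheap lower bound so that expensive computations can be skipped, here is a minimal sketch for the one-dimensional case. It is not the fastEMD library or its actual techniques: the function names, the restriction to equal-mass histograms on evenly spaced bins that sum to 1, and the centroid-based bound are assumptions chosen to keep the example short.

```python
from itertools import accumulate

def emd_1d(p, q):
    """EMD between two equal-mass histograms on the same evenly spaced bins.

    With unit ground distance between adjacent bins, the EMD equals the sum
    of absolute differences between the two cumulative sums.
    """
    if len(p) != len(q) or abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal length and equal mass")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

def centroid_lower_bound(p, q):
    """Distance between histogram centroids; never exceeds the EMD when mass sums to 1."""
    cp = sum(i * w for i, w in enumerate(p))
    cq = sum(i * w for i, w in enumerate(q))
    return abs(cp - cq)

def nearest(query, database):
    """Linear scan that skips the exact EMD whenever the lower bound cannot beat the best."""
    best_id, best_dist, exact_calls = None, float("inf"), 0
    for item_id, hist in database.items():
        if centroid_lower_bound(query, hist) >= best_dist:
            continue  # bound already rules this item out
        exact_calls += 1
        dist = emd_1d(query, hist)
        if dist < best_dist:
            best_id, best_dist = item_id, dist
    return best_id, best_dist, exact_calls

# Hypothetical normalized histograms (each sums to 1).
query = [0.5, 0.5, 0.0, 0.0]
database = {"a": [0.5, 0.5, 0.0, 0.0], "b": [0.0, 0.0, 0.5, 0.5]}
print(nearest(query, database))
```

The bound is exact for a pure shift of mass but can be loose otherwise: histograms with identical centroids get a bound of zero even when their EMD is positive, so such a filter reduces the number of exact computations without ever replacing them.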

Enhancing Clustering and Labeling for Large-Scale Information Retrieval Systems

10.17918/etd-6655 ◽

2021 ◽

Author(s):

Xuemei Gong

Keyword(s):

Information Retrieval ◽

Large Scale ◽

Retrieval Systems ◽

Information Retrieval Systems

Challenges in building large-scale information retrieval systems

Proceedings of the Second ACM International Conference on Web Search and Data Mining - WSDM '09 ◽

10.1145/1498759.1498761 ◽

2009 ◽

Cited By ~ 92

Author(s):

Jeffrey Dean

Keyword(s):

Information Retrieval ◽

Large Scale ◽

Retrieval Systems ◽

Information Retrieval Systems

Relevance Predictability in Information Retrieval Systems

Methods of Information in Medicine ◽

10.1055/s-0038-1636254 ◽

1967 ◽

Vol 06 (02) ◽

pp. 45-51 ◽

Cited By ~ 6

Author(s):

A. Kent ◽

J. Belzer ◽

M. Kurfeerst ◽

E. D. Dym ◽

D. L. Shirey ◽

...

Keyword(s):

Information Retrieval ◽

Experimental Conditions ◽

Treatment Groups ◽

Retrieval Systems ◽

Significant Difference ◽

Information Retrieval Systems ◽

High Predictability ◽

Intermediate Response ◽

Quantitative Indicators ◽

Level Of Processing

An experiment is described which attempts to derive quantitative indicators regarding the potential relevance predictability of the intermediate stimuli used to represent documents in information retrieval systems. In effect, since the decision to peruse an entire document is often predicated upon the examination of one »level of processing« of the document (e.g., the citation and/or abstract), it became interesting to analyze the properties of what constitutes »relevance«. However, prior to such an analysis, an even more elementary step had to be made, namely, to determine what portions of a document should be examined.
An evaluation of the ability of intermediate response products (IRPs), functioning as cues to the information content of full documents, to predict the relevance determination that would be subsequently made on these documents by motivated users of information retrieval systems, was made under controlled experimental conditions. The hypothesis that there might be other intermediate response products (selected extracts from the document, i.e., first paragraph, last paragraph, and the combination of first and last paragraph), that would be as representative of the full document as the traditional IRPs (citation and abstract) was tested systematically. The results showed that:
1. there is no significant difference among the several IRP treatment groups on the number of cue evaluations of relevancy which match the subsequent user relevancy decision on the document;
2. first and last paragraph combinations have consistently predicted relevancy to a higher degree than the other IRPs;
3. abstracts were undistinguished as predictors; and
4. the apparent high predictability rating for citations was not substantive.
Some of these results are quite different than would be expected from previous work with unmotivated subjects.
