A neural algorithm for a fundamental computing problem

Mapping Intimacies

10.1101/180471

2017

Author(s):

Sanjoy Dasgupta

Charles F. Stevens

Saket Navlakha

Keyword(s):

Similarity Search

Large Scale

Activity Patterns

Locality Sensitive Hashing

Sensory Function

Information Retrieval Systems

Novel Variant

Benchmark Datasets

Similar Images

Traditional Approaches

Similarity search, such as identifying similar images in a database or similar documents on the Web, is a fundamental computing problem faced by many large-scale information retrieval systems. We discovered that the fly’s olfactory circuit solves this problem using a novel variant of a traditional computer science algorithm (called locality-sensitive hashing). The fly’s circuit assigns similar neural activity patterns to similar input stimuli (odors), so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly’s algorithm, however, uses three new computational ingredients that depart from traditional approaches. We show that these ingredients can be translated to improve the performance of similarity search compared to traditional algorithms when evaluated on several benchmark datasets. Overall, this perspective helps illuminate the logic supporting an important sensory function (olfaction), and it provides a conceptually new algorithm for solving a fundamental computational problem.
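
The abstract positions the fly circuit as a variant of locality-sensitive hashing (LSH). For comparison, here is a minimal Python sketch of one classical LSH family, random-hyperplane hashing (SimHash) for cosine similarity: each bit records which side of a random hyperplane a vector falls on, so two vectors separated by angle θ disagree on any given bit with probability θ/π. The function names and sizes are illustrative assumptions; this is a standard textbook family, not the specific baseline evaluated in the paper.

```python
import random


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """Draw n_bits random hyperplane normals with i.i.d. standard Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(x: list[float], planes: list[list[float]]) -> tuple[int, ...]:
    """One bit per hyperplane: 1 if x lies on its non-negative side, else 0."""
    return tuple(int(sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0.0) for p in planes)


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of positions where two codes differ."""
    return sum(a_i != b_i for a_i, b_i in zip(a, b))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=64, seed=42)
    query = [1.0, 0.9, 0.0, 0.1]
    near = [0.9, 1.0, 0.1, 0.0]   # small angle to the query
    far = [-1.0, 0.0, 1.0, -0.5]  # large angle to the query
    code = simhash(query, planes)
    print(hamming(code, simhash(near, planes)), hamming(code, simhash(far, planes)))
```

Because similar vectors differ on few bits, items can be bucketed by their codes so that only bucket-mates need an exact comparison.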

A neural algorithm for a fundamental computing problem

Science

10.1126/science.aam9868

2017

Vol 358 (6364)

pp. 793-796

Cited By ~ 53

Author(s):

Sanjoy Dasgupta

Charles F. Stevens

Saket Navlakha

Keyword(s):

Large Scale

Activity Patterns

Fruit Fly

Locality Sensitive Hashing

Sensory Function

Retrieval Systems

Information Retrieval Systems

Similar Images

Traditional Approaches

Similarity Searches

Similarity search—for example, identifying similar images in a database or similar documents on the web—is a fundamental computing problem faced by large-scale information retrieval systems. We discovered that the fruit fly olfactory circuit solves this problem with a variant of a computer science algorithm (called locality-sensitive hashing). The fly circuit assigns similar neural activity patterns to similar odors, so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly algorithm, however, uses three computational strategies that depart from traditional approaches. These strategies can be translated to improve the performance of computational similarity searches. This perspective helps illuminate the logic supporting an important sensory function and provides a conceptually new algorithm for solving a fundamental computational problem.
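
The abstract does not name the three strategies. As described in the full paper, they are, roughly: expanding the input into many more output units, wiring those units to the input with sparse binary random projections rather than the dense Gaussian projections of classical LSH, and keeping only the most active units through winner-take-all; the input is also mean-centered first. A minimal Python sketch under those assumptions follows. The sizes (50 inputs, 2000 units, 6 inputs per unit, top 5%) are illustrative choices loosely modeled on the fly circuit, and the function names are hypothetical.

```python
import random


def make_projection(d: int, m: int, per_unit: int, seed: int = 0) -> list[list[int]]:
    """Sparse binary random projection, stored as index lists.

    Row j lists the `per_unit` input channels that output unit j sums; this is
    equivalent to an m x d 0/1 matrix with `per_unit` ones per row.
    """
    rng = random.Random(seed)
    return [rng.sample(range(d), per_unit) for _ in range(m)]


def fly_hash(x: list[float], proj: list[list[int]], k: int) -> frozenset[int]:
    """Return the tag: indices of the k most active output units."""
    # 1. Center the input so the tag reflects the odor's profile, not its overall level.
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    # 2. Expand: each of the m >> d output units sums a small random subset of inputs.
    activity = [sum(centered[i] for i in idx) for idx in proj]
    # 3. Winner-take-all: keep only the k most active units; all others are silenced.
    winners = sorted(range(len(activity)), key=activity.__getitem__, reverse=True)[:k]
    return frozenset(winners)


def tag_overlap(a: frozenset[int], b: frozenset[int]) -> float:
    """Fraction of shared active units between two tags of equal size."""
    return len(a & b) / len(a)


if __name__ == "__main__":
    rng = random.Random(1)
    proj = make_projection(d=50, m=2000, per_unit=6, seed=0)  # 40x expansion
    odor = [rng.random() for _ in range(50)]
    similar = [v + rng.gauss(0.0, 0.05) for v in odor]        # slightly perturbed odor
    unrelated = [rng.random() for _ in range(50)]
    k = 100                                                   # top 5% of 2000 units
    t0, t1, t2 = (fly_hash(o, proj, k) for o in (odor, similar, unrelated))
    print(tag_overlap(t0, t1), tag_overlap(t0, t2))
```

Similar odors should share most of their winning units, while unrelated odors overlap at roughly the chance level of k/m.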

Content-Based Similarity Search in Large-Scale DNA Data Storage Systems

10.1101/2020.05.25.115477

2020

Author(s):

Callista Bee

Yuan-Jyue Chen

David Ward

Xiaomeng Liu

Georg Seelig

...

Keyword(s):

Data Storage

DNA Sequences

Similarity Search

Large Scale

Storage Systems

Unique Identifier

Query Image

Synthetic DNA

DNA Database

Similar Images

Synthetic DNA has the potential to store the world’s continuously growing amount of data in an extremely dense and durable medium. Current proposals for DNA-based digital storage systems include the ability to retrieve individual files by their unique identifier, but not by their content. Here, we demonstrate content-based retrieval from a DNA database by learning a mapping from images to DNA sequences such that an encoded query image will retrieve visually similar images from the database via DNA hybridization. We encoded and synthesized a database of 1.6 million images and queried it with a variety of images, showing that each query retrieves a sample of the database in which visually similar images appear at a rate much greater than chance. We compare our results with several algorithms for similarity search in electronic systems, and demonstrate that our molecular approach is competitive with state-of-the-art electronics.

One Sentence Summary: Learned encodings enable content-based image similarity search from a database of 1.6 million images encoded in synthetic DNA.

Locality Sensitive Hashing for Similarity Search Using MapReduce on Large Scale Data

Language Processing and Intelligent Information Systems - Lecture Notes in Computer Science

10.1007/978-3-642-38634-3_19

2013

pp. 171-178

Cited By ~ 11

Author(s):

Radosław Szmit

Keyword(s):

Similarity Search

Large Scale

Locality Sensitive Hashing

Large Scale Data

Scale Data

Recent Progress in Machine Learning-based Prediction of Peptide Activity for Drug Discovery

Current Topics in Medicinal Chemistry

10.2174/1568026619666190122151634

2019

Vol 19 (1)

pp. 4-16

Cited By ~ 6

Author(s):

Qihui Wu

Hanzhong Ke

Dongli Li

Qi Wang

Jiansong Fang

...

Keyword(s):

Machine Learning

Drug Discovery

Large Scale

Recent Progress

High Specificity

Learning Approaches

Anticancer Peptides

The Past

Traditional Approaches

Large Scale Screening

Over the past decades, peptides have received increasing attention as therapeutic candidates in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs) and anti-inflammatory peptides (AIPs). Peptides are considered capable of regulating various complex diseases that were previously untreatable. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared to small-molecule organic drugs, peptide-based therapy exhibits high specificity and minimal toxicity. Thus, peptides are widely recruited in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document the recent progress in machine learning-based prediction of peptide activity, which will be of great benefit to the discovery of potential active AMPs, ACPs and AIPs.

The Sense of Smell and Aging: What We Have Learned From Population Studies

Innovation in Aging

10.1093/geroni/igaa057.2947

2020

Vol 4 (Supplement_1)

pp. 810-811

Author(s):

Jayant Pinto

Keyword(s):

Mental Health

Older Adults

Social Life

Large Scale

Population Studies

Vital Role

Sensory Function

Physical And Mental Health

Sense Of Smell

Health Aging

Decline of the sense of smell with age has a marked impact on older adults, substantially reducing quality of life. Olfactory dysfunction impairs nutrition, decreases the ability to experience pleasure, and results in depression, among other burdens. Large-scale population studies have identified impaired olfaction as a key health indicator that predicts the development of decreased physical and mental health, reduced physical activity, weight loss, mild cognitive impairment and dementia, and mortality itself. These data have been generated via analyses of several aging cohorts, including the National Social Life, Health, and Aging Project (NSHAP); the Beaver Dam cohort; the Atherosclerosis Risk in Communities project; the Rush Memory and Aging Project; the Health, Aging, and Body Composition project; and the Washington Heights/Inwood Columbia Aging Project, among others. In this presentation, we will review the close connection between olfaction, health, and aging, including insights from these studies. We will also discuss emerging data from NSHAP on the effects of sensory function on cognition, mental health, and social interaction, which demonstrate that sensory function plays a vital role in the lives of older adults. Part of a symposium sponsored by the Sensory Health Interest Group.

Large-scale Semantic Parsing without Question-Answer Pairs

Transactions of the Association for Computational Linguistics

10.1162/tacl_a_00190

2014

Vol 2

pp. 377-392

Cited By ~ 40

Author(s):

Siva Reddy

Mirella Lapata

Mark Steedman

Keyword(s):

Natural Language

Large Scale

Graph Matching

State Of The Art

The State

Semantic Parsing

Matching Problem

Weak Supervision

Benchmark Datasets

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.

Incremental Community Detection on Large Complex Attributed Network

ACM Transactions on Knowledge Discovery from Data

10.1145/3451216

2021

Vol 15 (6)

pp. 1-20

Author(s):

Zhe Chen

Aixin Sun

Xiaokui Xiao

Keyword(s):

Community Detection

Large Scale

Network Data

Topological Information

Community Membership

Attributed Network

Benchmark Datasets

Modularity Maximization

Large Scale Networks

Community detection on network data is a fundamental task with many applications in industry. Network data in industry can be very large, with incomplete and complex attributes, and, more importantly, it keeps growing. This calls for a community detection technique that can handle both attribute and topological information on large-scale networks and that is also incremental. In this article, we propose inc-AGGMMR, an incremental community detection framework that effectively addresses the challenges of scalability, mixed attributes, incomplete values, and network evolution. By constructing an augmented graph, we map attributes into the network by introducing attribute centers and belongingness edges. The communities are then detected by modularity maximization. During this process, we adjust the weights of belongingness edges to balance the contributions of attribute and topological information to community detection. The weight-adjustment mechanism enables incremental updates of the community membership of all vertices. We evaluate inc-AGGMMR on five benchmark datasets against eight strong baselines. We also provide a case study that incrementally detects communities on a PayPal payment network containing users and their transactions. The results demonstrate inc-AGGMMR’s effectiveness and practicability.
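
The augmented-graph idea can be illustrated in a few lines. The Python sketch below assumes a simple construction in which each attribute value becomes an attribute-center node joined to the vertices holding it by belongingness edges of one fixed weight `w_attr`, and it only scores a given partition with Newman modularity. The search for the maximizing partition, and inc-AGGMMR's weight-adjustment and incremental-update steps, are not shown; all names are hypothetical.

```python
from collections import defaultdict


def augment(edges, attrs, w_attr):
    """Weighted undirected graph {frozenset({u, v}): weight} with attribute centers.

    Each (attribute, value) pair becomes an extra node ("attr", name, value); every
    vertex holding that value gets a belongingness edge of weight w_attr to it.
    Missing values (None) simply contribute no edge.
    """
    graph = defaultdict(float)
    for u, v in edges:
        graph[frozenset((u, v))] += 1.0
    for vertex, values in attrs.items():
        for name, value in values.items():
            if value is not None:
                graph[frozenset((vertex, ("attr", name, value)))] += w_attr
    return dict(graph)


def modularity(graph, community):
    """Newman modularity Q = sum_c [L_c / m - (D_c / 2m)^2] for a weighted graph.

    L_c is the total weight of edges inside community c, D_c the total weighted
    degree of its nodes, and m the total edge weight. Assumes no self-loops.
    """
    m = sum(graph.values())
    internal = defaultdict(float)
    degree = defaultdict(float)
    for edge, w in graph.items():
        u, v = tuple(edge)
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]  # two triangles + bridge
    attrs = {0: {"country": "US"}, 1: {"country": "US"}, 2: {"country": "US"},
             3: {"country": "CA"}, 4: {"country": "CA"}, 5: {"country": None}}
    part = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B",
            ("attr", "country", "US"): "A", ("attr", "country", "CA"): "B"}
    print(modularity(augment(edges, {}, 1.0), part))     # topology only: 5/14 ≈ 0.357
    print(modularity(augment(edges, attrs, 1.0), part))  # with attribute centers: ≈ 0.413
```

Raising `w_attr` lets attribute agreement pull harder on the partition; lowering it leaves topology in charge, which is the balance the abstract describes adjusting.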

Astrid

Proceedings of the VLDB Endowment

10.14778/3436905.3436907

2020

Vol 14 (4)

pp. 471-484

Author(s):

Suraj Shetiya

Saravanan Thirumuruganathan

Nick Koudas

Gautam Das

Keyword(s):

Deep Learning

Objective Function

Pattern Matching

Language Processing

Language Model

Language Models

Selectivity Estimation

Statistical Correlations

Benchmark Datasets

Traditional Approaches

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries, followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates, resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep learning based approaches leverage techniques from natural language processing, such as embeddings, to encode the strings and use them to train a model. While this is an improvement over traditional approaches, there is still considerable room for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is aware of both the query type (prefix, substring, and suffix) and selectivity. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100, respectively. Our approach would ensure that the embedding for 'ab' is closer to that of 'abc' than to that of 'abd'. Second, we describe how neural language models can be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it can be used for estimating the selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.
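
To make the estimation target concrete: for a predicate such as LIKE 'ab%', the selectivity is the fraction of rows that match, and the optimizer's cardinality estimate is that fraction times the table size. The brute-force scan below, a Python sketch with a hypothetical `selectivity` function, computes the exact value that summaries such as pruned tries, or learned estimators such as Astrid, try to approximate without scanning the data. It is only the ground-truth computation, not Astrid's embedding or language-model estimator.

```python
def selectivity(column: list[str], pattern: str, kind: str) -> float:
    """Exact fraction of rows matching a LIKE-style string predicate.

    kind is 'prefix' (LIKE 'p%'), 'substring' (LIKE '%p%') or 'suffix' (LIKE '%p').
    """
    matchers = {
        "prefix": lambda s: s.startswith(pattern),
        "substring": lambda s: pattern in s,
        "suffix": lambda s: s.endswith(pattern),
    }
    if kind not in matchers:
        raise ValueError(f"unknown predicate kind: {kind!r}")
    if not column:
        return 0.0
    match = matchers[kind]
    return sum(1 for s in column if match(s)) / len(column)


if __name__ == "__main__":
    column = ["abc", "abd", "xab", "ab", "cab"]
    for kind in ("prefix", "substring", "suffix"):
        print(kind, selectivity(column, "ab", kind))
```

The same pattern can have very different selectivities depending on the query type ('ab' matches 3 of 5 rows as a prefix but all 5 as a substring), which is why the abstract argues for query-type-aware estimation.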

Accelerating large scale centroid-based clustering with locality sensitive hashing

2016 IEEE 32nd International Conference on Data Engineering (ICDE)

10.1109/icde.2016.7498278

2016

Cited By ~ 1

Author(s):

Ryan McConville

Xin Cao

Weiru Liu

Paul Miller

Keyword(s):

Large Scale

Locality Sensitive Hashing

Efficient Heuristics for Large-Scale Vehicle Routing Problems Using Particle Swarm Optimization

International Journal of Green Computing

10.4018/jgc.2012070103

2012

Vol 3 (2)

pp. 34-50

Author(s):

A. Chandramouli

L. Vivek Srinivasan

T. T. Narendran

Keyword(s):

Particle Swarm Optimization

Vehicle Routing

Large Scale

Particle Swarm

Computational Effort

Swarm Optimization

Routing Problem

Customer Base

Benchmark Datasets

Problem Instances

This paper addresses the Capacitated Vehicle Routing Problem (CVRP) with a homogeneous fleet of vehicles serving a large customer base. The authors propose a multi-phase heuristic that clusters the nodes based on proximity, orients them along a route, and allots vehicles. For the final phase, determining the route for each vehicle, they developed a Particle Swarm Optimization (PSO) approach. Benchmark datasets as well as hypothetical datasets were used for computational trials. The proposed heuristic is found to perform exceedingly well even for large problem instances, both in solution quality and in computational effort.
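
The abstract says PSO determines each vehicle's route but not how a continuous swarm is mapped to a visiting order. A common choice is random-key encoding: each particle is a real vector, and customers are visited in ascending order of their keys. The Python sketch below assumes that encoding and the textbook inertia-weight velocity update; the parameter values (w, c1, c2, swarm size, iterations) and all function names are illustrative assumptions, and capacity handling and the clustering and vehicle-allotment phases are omitted. It is not the authors' heuristic.

```python
import math
import random


def tour_length(order, depot, points):
    """Length of the closed route depot -> points[order[0]] -> ... -> depot."""
    path = [depot] + [points[i] for i in order] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def decode(keys):
    """Random-key decoding: visit customers in ascending order of their key."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def pso_route(depot, points, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Order one vehicle's customers with a random-key particle swarm."""
    rng = random.Random(seed)
    n = len(points)

    def cost(keys):
        return tour_length(decode(keys), depot, points)

    pos = [[rng.random() for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best keys seen by each particle
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # best keys seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return decode(gbest), gbest_cost


if __name__ == "__main__":
    depot = (0.0, 0.0)
    customers = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.7, 0.7)]
    route, length = pso_route(depot, customers, seed=3)
    print(route, round(length, 3))
```

Because decoding depends only on the relative order of the keys, the swarm searches a continuous space while every position still maps to a valid visiting order.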
