LEAP: A Generalization of the Landau-Vishkin Algorithm with Custom Gap Penalties

Mapping Intimacies ◽

10.1101/133157 ◽

2017 ◽

Author(s):

Hongyi Xin ◽

Jeremie Kim ◽

Sunny Nahar ◽

Can Alkan ◽

Onur Mutlu

Keyword(s):

State Of The Art ◽

String Matching ◽

The State ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Matching Problem ◽

De Bruijn Sequence ◽

Scoring Schemes ◽

Bit Vector ◽

Selection Of

Abstract. Motivation: Approximate string matching is a pivotal problem in computer science. It serves as an integral component of many string algorithms, most notably DNA read mapping and alignment. The improved Landau-Vishkin (LV) algorithm offers an improved dynamic programming strategy over the banded Smith-Waterman algorithm but supports only a limited selection of scoring schemes. In this paper, we propose the Leaping Toad problem, a generalization of the approximate string matching problem, as well as LEAP, a generalization of the Landau-Vishkin algorithm that solves the Leaping Toad problem under a broader selection of scoring schemes. Results: We benchmarked LEAP against 3 state-of-the-art approximate string matching implementations. We show that when using a bit-vectorized de Bruijn sequence based optimization, LEAP is up to 7.4x faster than the state-of-the-art bit-vector Levenshtein distance implementation and up to 32x faster than the state-of-the-art affine-gap-penalty parallel Needleman-Wunsch implementation. Availability: We provide an implementation of LEAP in C++ at github.com/CMU-SAFARI/[email protected], [email protected] or [email protected]
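For orientation, the diagonal-wise idea behind the Landau-Vishkin algorithm that LEAP generalizes can be sketched as follows. This is a minimal illustration assuming unit costs for substitutions, insertions and deletions; it is not the LEAP implementation, and it includes neither custom gap penalties nor the de Bruijn bit-vector optimization mentioned in the abstract. The function name `lv_edit_distance` is hypothetical.

```python
def lv_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance computed diagonal by diagonal (Landau-Vishkin style).

    Illustrative sketch only: for each error count e and each diagonal d = j - i,
    it records the furthest row i of `a` reachable with at most e edits.
    """
    m, n = len(a), len(b)
    target = n - m  # the diagonal that contains the final cell (m, n)

    def extend(i: int, d: int) -> int:
        # Slide along diagonal d for free while characters match.
        while i < m and i + d < n and a[i] == b[i + d]:
            i += 1
        return i

    # furthest[d] = furthest row reached on diagonal d with the current error count
    furthest = {0: extend(0, 0)}
    errors = 0
    while furthest.get(target, -1) < m:
        errors += 1
        nxt = {}
        for d in range(-errors, errors + 1):
            if d < -m or d > n:
                continue  # diagonal lies outside the alignment matrix
            best = -1
            if d in furthest:        # substitution: stay on d, advance one row
                best = max(best, furthest[d] + 1)
            if d + 1 in furthest:    # skip a character of `a`: come from diagonal d + 1
                best = max(best, furthest[d + 1] + 1)
            if d - 1 in furthest:    # skip a character of `b`: come from diagonal d - 1
                best = max(best, furthest[d - 1])
            if best < 0:
                continue
            best = min(best, m, n - d)  # stay inside the matrix
            nxt[d] = extend(best, d)
        furthest = nxt
    return errors
```

The loop stops at the first error count for which the target diagonal reaches the last row, so the work grows with the number of errors rather than with the full size of the dynamic programming table.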


Large-scale Semantic Parsing without Question-Answer Pairs

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00190 ◽

2014 ◽

Vol 2 ◽

pp. 377-392 ◽

Cited By ~ 40

Author(s):

Siva Reddy ◽

Mirella Lapata ◽

Mark Steedman

Keyword(s):

Natural Language ◽

Large Scale ◽

Graph Matching ◽

State Of The Art ◽

The State ◽

Semantic Parsing ◽

Matching Problem ◽

Weak Supervision ◽

Benchmark Datasets

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


A fast bit-vector algorithm for approximate string matching based on dynamic programming

Combinatorial Pattern Matching - Lecture Notes in Computer Science ◽

10.1007/bfb0030777 ◽

1998 ◽

pp. 1-13 ◽

Cited By ~ 13

Author(s):

Gene Myers

Keyword(s):

Dynamic Programming ◽

String Matching ◽

Approximate String Matching ◽

Bit Vector


3D Printing in Heterogeneous Catalysis—The State of the Art

Materials ◽

10.3390/ma13204534 ◽

2020 ◽

Vol 13 (20) ◽

pp. 4534 ◽

Cited By ~ 1

Author(s):

Elżbieta Bogdan ◽

Piotr Michorczyk

Keyword(s):

Additive Manufacturing ◽

Heterogeneous Catalysis ◽

3D Printing ◽

Chemical Synthesis ◽

State Of The Art ◽

Three Dimensional ◽

Production Method ◽

The State ◽

Monolithic Catalysts ◽

Selection Of

This paper describes the process of additive manufacturing and a selection of three-dimensional (3D) printing methods which have applications in chemical synthesis, specifically for the production of monolithic catalysts. A review was conducted on reference literature for 3D printing applications in the field of catalysis. It was proven that 3D printing is a promising production method for catalysts.


Design and evaluation of a biologically-inspired cloud elasticity framework

Cluster Computing ◽

10.1007/s10586-020-03073-7 ◽

2020 ◽

Vol 23 (4) ◽

pp. 3095-3117

Author(s):

Amjad Ullah ◽

Jingpeng Li ◽

Amir Hussain

Keyword(s):

Fuzzy System ◽

State Of The Art ◽

The State ◽

Effective Management ◽

Biologically Inspired ◽

Computational Overhead ◽

Cloud Elasticity ◽

Computational Resources ◽

Auto Scaling ◽

Selection Of

Abstract. Elasticity in the cloud is essential to the effective management of computational resources, as it enables readjustment at runtime to meet application demands. Over the years, researchers and practitioners have proposed many auto-scaling solutions using techniques ranging from simple if-then-else rules to sophisticated optimisation, control theory and machine learning based methods. However, despite an extensive range of existing elasticity research, implementing an efficient scaling technique that satisfies actual demands remains a challenge. The existing methods suffer from issues such as: (1) a lack of adaptability and static scaling behaviour in completely fixed approaches; (2) additional computational overhead, an inability to cope with sudden changes in workload behaviour, and a preference for adaptability over reliability at runtime in fully dynamic approaches; and (3) a failure to consider uncertainty when designing auto-scaling solutions. In this paper, we aim to address these issues using a holistic biologically-inspired feedback switch controller. This method utilises multiple controllers and a switching mechanism, implemented using a fuzzy system, that selects a suitable controller at runtime. The fuzzy system also facilitates the design of qualitative elasticity rules. Furthermore, to help avoid oscillatory behaviour (a problem commonly associated with switch methodologies), this paper integrates a biologically-inspired computational model of action selection. Lastly, we identify seven different kinds of real workload patterns and use them to evaluate the performance of the proposed method against state-of-the-art approaches. The computational results demonstrate that the proposed method achieves better performance than the state-of-the-art approaches without incurring any additional cost.


A fast bit-vector algorithm for approximate string matching based on dynamic programming

Journal of the ACM ◽

10.1145/316542.316550 ◽

1999 ◽

Vol 46 (3) ◽

pp. 395-415 ◽

Cited By ~ 228

Author(s):

Gene Myers

Keyword(s):

Dynamic Programming ◽

String Matching ◽

Approximate String Matching ◽

Bit Vector


Generating Multiple Diverse Responses with Multi-Mapping and Posterior Mapping Selection

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/683 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chaotao Chen ◽

Jinhua Peng ◽

Fan Wang ◽

Jun Xu ◽

Hua Wu

Keyword(s):

State Of The Art ◽

The State ◽

Target Response ◽

Empirical Results ◽

Art Methods ◽

Mapping Mechanism ◽

The One ◽

Selection Of

In human conversation an input post is open to multiple potential responses, which is typically regarded as a one-to-many problem. Promising approaches mainly incorporate multiple latent mechanisms to build the one-to-many relationship. However, without accurate selection of the latent mechanism corresponding to the target response during training, these methods suffer from a rough optimization of latent mechanisms. In this paper, we propose a multi-mapping mechanism to better capture the one-to-many relationship, where multiple mapping modules are employed as latent mechanisms to model the semantic mappings from an input post to its diverse responses. For accurate optimization of latent mechanisms, a posterior mapping selection module is designed to select the corresponding mapping module according to the target response for further optimization. We also introduce an auxiliary matching loss to facilitate the optimization of posterior mapping selection. Empirical results demonstrate the superiority of our model in generating multiple diverse and informative responses over the state-of-the-art methods.


Studi Perbandingan Algoritma Pencarian String dalam Metode Approximate String Matching untuk Identifikasi Kesalahan Pengetikan Teks

Jurnal Buana Informatika ◽

10.24002/jbi.v7i2.491 ◽

2016 ◽

Vol 7 (2) ◽

Cited By ~ 1

Author(s):

Yeny Rochmawati ◽

Retno Kusumaningrum

Keyword(s):

Hamming Distance ◽

String Matching ◽

Mean Average Precision ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Average Precision ◽

Relevance Judgments ◽

Typing Error ◽

The Mean ◽

Distance Hamming

Abstract. Typing errors that turn standard words into non-standard words are often caused by misspelling. This can be addressed by developing a system that identifies typing errors. Approximate string matching is a widely used method for identifying typing errors, and it can rely on several string search algorithms, namely Levenshtein Distance, Hamming Distance, Damerau Levenshtein Distance and Jaro Winkler Distance. However, no study has compared the performance of the four algorithms for Indonesian. Therefore, this research compares the performance of the four algorithms to identify which is the most accurate and precise in string search across various typing errors. Evaluation uses users' relevance judgments, which produce the mean average precision (MAP) used to determine the best algorithm. The results show that the Jaro Winkler Distance algorithm is the best at word checking, with a MAP value of 0.87 when identifying the typing errors of 50 incorrect words.

Keywords: Typing errors, Levenshtein, Hamming, Damerau Levenshtein, Jaro Winkler
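Two of the distances named in this abstract, and the MAP metric used for evaluation, can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and normalizing average precision by the number of relevant items is one common convention that the abstract does not confirm.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (x != y),  # substitute or match
                           prev[j] + 1,             # delete x
                           cur[j - 1] + 1))         # insert y
        prev = cur
    return prev[-1]


def average_precision(ranked: list, relevant: set) -> float:
    """Mean of precision@rank over the ranks where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, 1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: list) -> float:
    """Average of average_precision over (ranked suggestions, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

In a typo-checking setting, each run would pair the suggestions an algorithm ranks for one misspelled word with the set of suggestions that users judged relevant.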


Approximate Chinese String Matching Techniques Based on Pinyin Input Method

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1017 ◽

2014 ◽

Vol 513-517 ◽

pp. 1017-1020

Author(s):

Bing Liu ◽

Dan Han ◽

Shuang Zhang

Keyword(s):

Computer Science ◽

Rapid Development ◽

String Matching ◽

Approximate String Matching ◽

Chinese Characters ◽

Matching Problem ◽

Input Method ◽

Large Size ◽

Matching Techniques ◽

Research And Design

String matching is one of the most typical problems in computer science. Previous studies mainly focused on the exact string matching problem. However, with the rapid development of computers and the Internet and the continual emergence of new issues, researching and designing efficient approximate string matching algorithms has gained significant theoretical value and practical importance. Approximate string matching, also called string matching that allows errors, aims to find a pattern string in a text or database while allowing k differences between the pattern and its occurrences in the text. Although a number of algorithms have been proposed for approximate string matching, few studies focus on large alphabets; most address small or medium-sized alphabets. For large alphabets, especially Chinese characters and Asian phonetics, there are fewer efficient algorithms. For these reasons, this paper focuses on the approximate Chinese string matching problem based on the pinyin input method.
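The k-differences formulation described above can be illustrated with the classic column-wise dynamic programming search. This is a generic sketch, not the pinyin-based technique of the paper; the function name `k_difference_matches` is hypothetical, and the code works on any Unicode string, Chinese characters included, only because Python compares code points directly.

```python
def k_difference_matches(pattern: str, text: str, k: int) -> list:
    """Return every end position j such that some substring of text ending at
    text[:j] matches pattern with at most k insertions, deletions or substitutions.
    """
    m = len(pattern)
    # col[i] = fewest edits to match pattern[:i] against a substring ending here.
    col = list(range(m + 1))
    ends = [0] if m <= k else []  # the empty substring can match a very short pattern
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position at no cost
        for i in range(1, m + 1):
            old = col[i]  # value from the previous text position
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitute
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

The search costs time proportional to the pattern length times the text length regardless of alphabet size, which is the baseline that alphabet-aware methods aim to improve on.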


A Graph Theoretic Model to Solve the Approximate String Matching Problem Allowing for Translocations

Lecture Notes in Computer Science - Combinatorial Algorithms ◽

10.1007/978-3-642-35926-2_20 ◽

2012 ◽

pp. 169-181

Author(s):

Pritom Ahmed ◽

A. S. M. Shohidull Islam ◽

M. Sohel Rahman

Keyword(s):

String Matching ◽

Approximate String Matching ◽

Theoretic Model ◽

Matching Problem ◽

Graph Theoretic


A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

PLoS ONE ◽

10.1371/journal.pone.0186251 ◽

2017 ◽

Vol 12 (10) ◽

pp. e0186251 ◽

Cited By ~ 11

Author(s):

ThienLuan Ho ◽

Seung-Rohk Oh ◽

HyunJin Kim

Keyword(s):

Graphics Processing Units ◽

String Matching ◽

Levenshtein Distance ◽

Approximate String Matching ◽

Graphics Processing
