scholarly journals LEAP: A Generalization of the Landau-Vishkin Algorithm with Custom Gap Penalties

2017 ◽  
Author(s):  
Hongyi Xin ◽  
Jeremie Kim ◽  
Sunny Nahar ◽  
Can Alkan ◽  
Onur Mutlu

AbstractMotivationApproximate String Matching is a pivotal problem in the field of computer science. It serves as an integral component for many string algorithms, most notably, DNA read mapping and alignment. The improved LV algorithm proposes an improved dynamic programming strategy over the banded Smith-Waterman algorithm but suffers from support of a limited selection of scoring schemes. In this paper, we propose the Leaping Toad problem, a generalization of the approximate string matching problem, as well as LEAP, a generalization of the Landau-Vishkin’s algorithm that solves the Leaping Toad problem under a broader selection of scoring schemes.ResultsWe benchmarked LEAP against 3 state-of-the-art approximate string matching implementations. We show that when using a bit-vectorized de Bruijn sequence based optimization, LEAP is up to 7.4x faster than the state-of-the-art bit-vector Levenshtein distance implementation and up to 32x faster than the state-of-the-art affine-gap-penalty parallel Needleman Wunsch Implementation.AvailabilityWe provide an implementation of LEAP in C++ at github.com/CMU-SAFARI/[email protected], [email protected] or [email protected]

Author(s):  
Siva Reddy ◽  
Mirella Lapata ◽  
Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


Materials ◽  
2020 ◽  
Vol 13 (20) ◽  
pp. 4534 ◽  
Author(s):  
Elżbieta Bogdan ◽  
Piotr Michorczyk

This paper describes the process of additive manufacturing and a selection of three-dimensional (3D) printing methods which have applications in chemical synthesis, specifically for the production of monolithic catalysts. A review was conducted on reference literature for 3D printing applications in the field of catalysis. It was proven that 3D printing is a promising production method for catalysts.


2020 ◽  
Vol 23 (4) ◽  
pp. 3095-3117
Author(s):  
Amjad Ullah ◽  
Jingpeng Li ◽  
Amir Hussain

Abstract The elasticity in cloud is essential to the effective management of computational resources as it enables readjustment at runtime to meet application demands. Over the years, researchers and practitioners have proposed many auto-scaling solutions using versatile techniques ranging from simple if-then-else based rules to sophisticated optimisation, control theory and machine learning based methods. However, despite an extensive range of existing elasticity research, the aim of implementing an efficient scaling technique that satisfies the actual demands is still a challenge to achieve. The existing methods suffer from issues like: (1) the lack of adaptability and static scaling behaviour whilst considering completely fixed approaches; (2) the burden of additional computational overhead, the inability to cope with the sudden changes in the workload behaviour and the preference of adaptability over reliability at runtime whilst considering the fully dynamic approaches; and (3) the lack of considering uncertainty aspects while designing auto-scaling solutions. In this paper, we aim to address these issues using a holistic biologically-inspired feedback switch controller. This method utilises multiple controllers and a switching mechanism, implemented using fuzzy system, that realises the selection of suitable controller at runtime. The fuzzy system also facilitates the design of qualitative elasticity rules. Furthermore, to improve the possibility of avoiding the oscillatory behaviour (a problem commonly associated with switch methodologies), this paper integrates a biologically-inspired computational model of action selection. Lastly, we identify seven different kinds of real workload patterns and utilise them to evaluate the performance of the proposed method against the state-of-the-art approaches. The obtained computational results demonstrate that the proposed method results in achieving better performance without incurring any additional cost in comparison to the state-of-the-art approaches.


Author(s):  
Chaotao Chen ◽  
Jinhua Peng ◽  
Fan Wang ◽  
Jun Xu ◽  
Hua Wu

In human conversation an input post is open to multiple potential responses, which is typically regarded as a one-to-many problem. Promising approaches mainly incorporate multiple latent mechanisms to build the one-to-many relationship. However, without accurate selection of the latent mechanism corresponding to the target response during training, these methods suffer from a rough optimization of latent mechanisms. In this paper, we propose a multi-mapping mechanism to better capture the one-to-many relationship, where multiple mapping modules are employed as latent mechanisms to model the semantic mappings from an input post to its diverse responses. For accurate optimization of latent mechanisms, a posterior mapping selection module is designed to select the corresponding mapping module according to the target response for further optimization. We also introduce an auxiliary matching loss to facilitate the optimization of posterior mapping selection. Empirical results demonstrate the superiority of our model in generating multiple diverse and informative responses over the state-of-the-art methods.


2016 ◽  
Vol 7 (2) ◽  
Author(s):  
Yeny Rochmawati ◽  
Retno Kusumaningrum

Abstract. Error typing resulting in the change of standard words into non-standard words are often caused by misspelling. This can be addressed by developing a system to identify errors in typing. Approximate string matching is one method that is widely implemented to identify error typing by using several string search algorithms, i.e. Levenshtein Distance, Hamming Distance, Damerau Levenshtein Distance and Jaro Winkler Distance. However, there is no study that compares the performance of the four algorithms.  Therefore, this research aims to compare the performance between the four algorithms in order to identify which algorithm is the most accurate and precise in the search string based on various errors typing. Evaluation is performed by using users’ relevance judgments which produce the mean average precision (MAP) to determine the best algorithm. The result shows that Jaro Winkler Distance algorithm is the best in word-checking with 0.87 of MAP value when identifying the typing error of 50 incorrect words.Keywords: Errors typing, Levenshtein, Hamming, Damerau Levenshtein, Jaro Winkler Abstrak. Kesalahan pengetikan mengakibatkan kata baku berubah menjadi kata tidak baku karena ejaan yang digunakan tidak sesuai. Hal tersebut dapat ditangani dengan mengembangkan sistem untuk mengidentifikasi kesalahan pengetikan. Metode approximate string matching merupakan salah satu metode yang banyak diterapkan untuk mengidentifikasi kesalahan pengetikan dengan berbagai jenis algoritma pencarian string yaitu Levenshtein Distance, Hamming Distance, Damerau Levenshtein Distance dan Jaro Winkler Distance. Akan tetapi studi perbandingan kinerja dari keempat algoritma tersebut untuk Bahasa Indonesia belum pernah dilakukan. Oleh karena itu penelitian ini bertujuan untuk melakukan studi perbandingan kinerja dari keempat algoritma tersebut sehingga dapat diketahui algoritma mana yang lebih akurat dan tepat dalam pencarian string berdasarkan kesalahan penulisan yang bervariasi. Evaluasi yang dilakukan menggunakan user relevance judgement yang menghasilkan nilai mean average precision (MAP) untuk menentukan algoritma yang terbaik. Hasil penelitian terhadap 50 kata salah menunjukkan bahwa algoritma Jaro Winkler Distance terbaik dalam melakukan pengecekan kata dengan nilai MAP sebesar 0,87.Kata Kunci: Kesalahan pengetikan, Levenshtein, Hamming, Damerau Levenshtein, Jaro Winkler


2014 ◽  
Vol 513-517 ◽  
pp. 1017-1020
Author(s):  
Bing Liu ◽  
Dan Han ◽  
Shuang Zhang

String matching is one of the most typical problems in computer science. Previous studies mainly focused on accurate string matching problem. However, with the rapid development of the computer and Internet as well as the continuously rising of new issues, people find that it has very important theoretical value and practical meaning to research and design efficient approximate string matching algorithms. Approximate string matching is also called string matching that allows errors, which mainly aims to find the pattern string in the text and database and allows k differences between the pattern string and its occurring forms in the text. For the problem of approximate string matching, though a number of algorithms have been proposed, there are fewer studies which focus on large size of alphabet . Most of experts are interested in small or middle size of alphabet . For large size of , especially for Chinese characters and Asian phonetics, there are fewer efficient algorithms. For the above reasons, this paper focuses on the approximate Chinese strings matching problem based on the pinyin input method.


Sign in / Sign up

Export Citation Format

Share Document