A Composite Boyer-Moore Algorithm for the String Matching Problem

Author(s):  
Zhengda Xiong
Author(s):  
Yangjun Chen

In computer engineering, a number of programming tasks involve the so-called tree matching problem (Cole & Hariharan, 1997) as a crucial step, including the design of interpreters for nonprocedural programming languages, automatic implementation of abstract data types, code optimization in compilers, symbolic computation, context searching in structure editors, and automatic theorem proving. Recently, it has been shown that this problem can be transformed in linear time into another problem, the so-called subset matching problem (Cole & Hariharan, 2002, 2003), which is to find all occurrences of a pattern string p of length m in a text string t of length n, where each pattern and text position is a set of characters drawn from some alphabet S. The pattern is said to occur at text position i if the set p[j] is a subset of the set t[i + j - 1] for all j (1 ≤ j ≤ m). This generalizes ordinary string matching and is of interest because an efficient algorithm for it implies an efficient solution to the tree matching problem. In addition, as shown by Indyk (1997), this problem can be used to solve general string matching and counting matching (Muthukrishnan, 1997; Muthukrishnan & Palem, 1994), and it enables the design of efficient algorithms for several geometric pattern matching problems. In this article, we propose a new algorithm for this problem, which needs only O(n + m) time when the size of S is small and O(n + m·n^0.5) time on average in the general case.
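The occurrence condition in the abstract above can be made concrete with a small brute-force check. The sketch below is only an illustration of the definition, running in O(n·m) set comparisons; it is not the algorithm the article proposes, and the function name `subset_match_naive` is a hypothetical one chosen for this example.

```python
from typing import List, Sequence, Set


def subset_match_naive(pattern: Sequence[Set[str]],
                       text: Sequence[Set[str]]) -> List[int]:
    """Return the 1-based text positions i at which the pattern occurs,
    i.e. pattern[j] is a subset of text[i + j - 1] for every j in 1..m.
    Brute force for illustration only (hypothetical helper)."""
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):          # 0-based candidate position
        # "<=" on Python sets is the subset test
        if all(pattern[j] <= text[start + j] for j in range(m)):
            hits.append(start + 1)          # report 1-based, as in the abstract
    return hits


text = [{"a", "b"}, {"b"}, {"a", "c"}, {"b", "c"}]
pattern = [{"a"}, {"b"}]
print(subset_match_naive(pattern, text))
```

When every position holds a singleton set, the same function reduces to ordinary exact string matching, which is the sense in which subset matching generalizes it.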


1995 ◽  
Vol 2 (46) ◽  
Author(s):  
Dany Breslauer ◽  
Livio Colussi ◽  
Laura Toniolo

In this paper we study the exact comparison complexity of the string prefix-matching problem in the deterministic sequential comparison model with equality tests. We derive almost tight lower and upper bounds on the number of symbol comparisons required in the worst case by on-line prefix-matching algorithms for any fixed pattern and variable text. Unlike previous results on the comparison complexity of string-matching and prefix-matching algorithms, our bounds are almost tight for any particular pattern. We also consider the special case where the pattern and the text are the same string. This problem, which we call the string self-prefix problem, is similar to the pattern preprocessing step of the Knuth-Morris-Pratt string-matching algorithm that is used in several comparison efficient string-matching and prefix-matching algorithms, including in our new algorithm. We obtain roughly tight lower and upper bounds on the number of symbol comparisons required in the worst case by on-line self-prefix algorithms. Our algorithms can be implemented in linear time and space in the standard uniform-cost random-access-machine model.
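For readers unfamiliar with the Knuth-Morris-Pratt preprocessing step mentioned above, the following is a minimal sketch of the textbook failure (prefix) function. It is the standard construction, shown only as background; it is not the authors' comparison-optimal algorithm, and the name `prefix_function` is an assumption of this example.

```python
from typing import List


def prefix_function(s: str) -> List[int]:
    """Textbook KMP preprocessing: pi[i] is the length of the longest
    proper prefix of s[0..i] that is also a suffix of s[0..i]."""
    pi = [0] * len(s)
    k = 0                                   # length of the current matched prefix
    for i in range(1, len(s)):
        # on a mismatch, fall back to the next shorter candidate prefix
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


print(prefix_function("abacaba"))
```

The function compares the string against itself using equality tests only, which is the comparison model the abstract works in.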


2014 ◽  
Vol 513-517 ◽  
pp. 1017-1020
Author(s):  
Bing Liu ◽  
Dan Han ◽  
Shuang Zhang

String matching is one of the most typical problems in computer science. Previous studies mainly focused on the exact string matching problem. However, with the rapid development of computers and the Internet and the continual emergence of new issues, researching and designing efficient approximate string matching algorithms has taken on significant theoretical value and practical meaning. Approximate string matching, also called string matching that allows errors, aims to find the pattern string in a text or database while allowing k differences between the pattern string and its occurrences in the text. Although a number of algorithms have been proposed for approximate string matching, few studies focus on large alphabets; most researchers are interested in small or medium-sized alphabets. For large alphabets, especially for Chinese characters and Asian phonetics, there are fewer efficient algorithms. For these reasons, this paper focuses on the approximate Chinese string matching problem based on the pinyin input method.
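The notion of matching with at most k differences can be illustrated with the classic dynamic-programming formulation, in which a match may begin at any text position. This is a minimal sketch of that general technique, assuming unit-cost insertions, deletions, and substitutions; it is not the pinyin-based method of the paper, and `approx_find` is a hypothetical name.

```python
from typing import List


def approx_find(pattern: str, text: str, k: int) -> List[int]:
    """Return the 0-based text indices at which some substring ending
    there is within edit distance k of the pattern (hypothetical helper)."""
    m = len(pattern)
    # col[i] = fewest edits to align pattern[:i] with a substring ending at
    # the current text position; col[0] stays 0 so a match can start anywhere
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]                  # value from the previous column, row i-1
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_find("abc", "xxabcxx", 0))
```

The running time is O(n·m) regardless of alphabet size, which is one reason large alphabets call for the more specialized methods the abstract refers to.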


2013 ◽  
Vol 45 (2) ◽  
pp. 1-42 ◽  
Author(s):  
Simone Faro ◽  
Thierry Lecroq

2020 ◽  
Vol 32 (2) ◽  
pp. 135-148
Author(s):  
Yuliya Alekseevna Susanina ◽  
Anna Nikitichna Yaveyn ◽  
Semyon Vyacheslavovich Grigorev

Author(s):  
A. Amir ◽  
M. Farach

String matching is a basic theoretical problem in computer science, but has been useful in implementing various text editing tasks. The explosion of multimedia requires an appropriate generalization of string matching to higher dimensions. The first natural generalization is that of seeking the occurrences of a pattern in a text where both pattern and text are rectangles. The last few years saw tremendous activity in two dimensional pattern matching algorithms. We naturally had to limit the amount of information that entered this chapter. We chose to concentrate on serial deterministic algorithms for some of the basic issues of two dimensional matching. Throughout this chapter we define our problems in terms of squares rather than rectangles; however, all results presented easily generalize to rectangles. The Exact Two Dimensional Matching Problem is defined as follows: . . . INPUT: Text array T[n x n] and pattern array P[m x m]. OUTPUT: All locations [i, j] in T where there is an occurrence of P, i.e. T[i+k, j+l] = P[k+1, l+1] for 0 ≤ k, l ≤ m-1. . . . A natural way of solving any generalized problem is by reducing it to a special case whose solution is known. It is therefore not surprising that most solutions to the two dimensional exact matching problem use exact string matching algorithms in one way or another. In this section, we present an algorithm for two dimensional matching which relies on reducing a matrix of characters into a one dimensional array. Let P'[1 . . . m] be a pattern which is derived from P by setting P'[i] = P[i,1]P[i,2]...P[i,m], that is, the ith character of P' is the ith row of P. Let T_i[1 . . . n - m + 1], for 1 ≤ i ≤ n, be a set of arrays such that T_i[j] = T[i, j]T[i, j+1]...T[i, j+m-1]. Clearly, P occurs at T[i, j] iff P' occurs at T_i[j].
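The row-reduction idea described above can be sketched in a few lines. The code below is an illustrative, unoptimized rendering under stated assumptions: each length-m row segment is compared as a whole tuple, and the one dimensional search is a plain scan rather than an efficient string matcher. The name `match_2d` is hypothetical and does not come from the chapter.

```python
from typing import List, Sequence, Tuple


def match_2d(T: Sequence[Sequence[str]],
             P: Sequence[Sequence[str]]) -> List[Tuple[int, int]]:
    """Return 1-based locations (i, j) where the m x m pattern P occurs
    in the n x n text T, by treating each pattern row as one symbol."""
    n, m = len(T), len(P)
    p_rows = [tuple(row) for row in P]      # P'[i] = i-th row of P
    hits = []
    for j in range(n - m + 1):
        # column of "super-characters": entry i is T[i][j .. j+m-1]
        col = [tuple(T[i][j:j + m]) for i in range(n)]
        # naive one dimensional search for P' down this column
        for i in range(n - m + 1):
            if col[i:i + m] == p_rows:
                hits.append((i + 1, j + 1))
    return hits


T = ["abab", "baba", "abab", "baba"]
P = ["ab", "ba"]
print(sorted(match_2d(T, P)))
```

Replacing the inner scan with a linear-time matcher such as Knuth-Morris-Pratt is the natural next step, in line with the abstract's remark that most two dimensional solutions build on exact string matching.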

