Two Efficient Algorithms for Linear Time Suffix Array Construction

Ge Nong;  Sen Zhang;  Wai Hong Chan

doi:10.1109/tc.2010.188

Designing efficient algorithms for querying large corpora

Oslo Studies in Language

10.5617/osla.8504

2021

Vol 11 (2)

pp. 283-302

Author(s):

Paul Meurer

Keyword(s):

Regular Expression

Linear Time

Suffix Array

Efficient Algorithms

Regular Expressions

Efficient Treatment

Suffix Arrays

Regular Expression Matching

Finite State

Query System

I describe several new efficient algorithms for querying large annotated corpora. The search algorithms, as implemented in several popular corpus search engines, are less than optimal in two respects: regular expression string matching in the lexicon is done in linear time, and regular expressions over corpus positions are evaluated starting at those corpus positions that match the constraints of the initial edges of the corresponding network. To address these shortcomings, I have developed an algorithm for regular expression matching on suffix arrays that allows fast lexicon lookup, and a technique for running finite state automata from the edges with the lowest corpus counts. Implementing the lexicon as a suffix array also lends itself to an elegant and efficient treatment of multi-valued and set-valued attributes. The described techniques have been implemented in a fully functional corpus management system and are also used in a treebank query system.
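
As a rough illustration of why a suffix-array lexicon allows fast lookup, the sketch below handles only the simplest case, a literal substring query; it does not implement the regular-expression matching described in the abstract. It assumes the lexicon words are joined with a hypothetical separator `SEP` that never occurs inside a word. The function names are illustrative, and the naive sort stands in for a linear-time suffix array construction.

```python
SEP = "\x00"  # hypothetical separator, assumed never to occur inside a word


def build_lexicon_sa(words):
    """Concatenate the lexicon and build its suffix array (naive sort, for illustration)."""
    text = SEP.join(words) + SEP
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # owner[p] is the index of the word that text position p belongs to
    owner = []
    for wid, w in enumerate(words):
        owner.extend([wid] * (len(w) + 1))  # +1 covers the trailing separator
    return text, sa, owner


def _bound(text, sa, pattern, upper):
    """Binary search over the suffix array on length-m prefixes of the suffixes."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def words_containing(pattern, text, sa, owner, words):
    """Return the sorted lexicon words that contain pattern as a substring."""
    lo = _bound(text, sa, pattern, upper=False)  # first suffix >= pattern
    hi = _bound(text, sa, pattern, upper=True)   # first suffix whose prefix > pattern
    return sorted({words[owner[sa[k]]] for k in range(lo, hi)})
```

Because all suffixes beginning with the pattern form one contiguous block of the suffix array, two binary searches (O(m log n) character comparisons) replace a scan over every word in the lexicon.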

Diagnosability and Diagnostic Algorithm for Pancake Graph under the Comparison Model

Journal of Interconnection Networks

10.1142/s021926591550005x

2015

Vol 15 (01n02)

pp. 1550005

Author(s):

WENJUN LIU

CHENG-KUAN LIN

Keyword(s):

Fault Diagnosis

Interconnection Networks

Linear Time

Hamiltonian Path

Diagnostic Algorithm

Efficient Algorithms

Comparison Model

Diagnosis Model

Fault diagnosis is important for the reliability of interconnection networks. This paper addresses the fault diagnosis of the n-dimensional pancake graph Pn under the comparison diagnosis model. Using the concept of local diagnosability, we first prove that the diagnosability of Pn is n − 1, and that Pn has the strong local diagnosability property even with n − 3 faulty edges. Furthermore, we present efficient algorithms to locate extended star and Hamiltonian path structures in Pn. According to the works of Li et al. and Lai, these extended star and Hamiltonian path structures can be used to identify all faulty vertices in linear time, provided the number of faulty vertices is no more than n − 1.
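
For readers unfamiliar with the pancake graph, the sketch below builds it from its standard definition: the vertices of Pn are the n! permutations of {1, ..., n}, and two vertices are adjacent when one is obtained from the other by reversing a prefix of length 2 to n. It only constructs the graph and does not implement the diagnosis algorithms; the function names are hypothetical.

```python
from itertools import permutations


def pancake_neighbors(perm):
    """Neighbors of a vertex of P_n: reverse the first i symbols, for 2 <= i <= n."""
    return [perm[:i][::-1] + perm[i:] for i in range(2, len(perm) + 1)]


def pancake_graph(n):
    """Adjacency lists of the n-dimensional pancake graph P_n (vertices are tuples)."""
    return {v: pancake_neighbors(v) for v in permutations(range(1, n + 1))}
```

Every vertex has exactly n − 1 distinct neighbors (each prefix reversal puts a different symbol in front), the same value as the diagnosability stated above, and the graph is undirected because a prefix reversal is its own inverse.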

EFFICIENT PATH-CONSISTENCY PROPAGATION

International Journal of Artificial Intelligence Tools

10.1142/s0218213098000081

1998

Vol 07 (02)

pp. 121-142

Cited By ~ 15

Author(s):

ASSEF CHMEISS

PHILIPPE JEGOU

Keyword(s):

Linear Time

Natural Generalization

Efficient Algorithms

New Approach

Constraint Networks

Arc Consistency

Linear Time Algorithms

Recently, efficient algorithms have been proposed to achieve arc- and path-consistency in constraint networks. For arc-consistency, for example, there are linear-time algorithms (in the size of the problem) that are efficient in practice (e.g. AC-6 and AC-7). The best path-consistency algorithm proposed is PC-{5|6}, a natural generalization of AC-6 to path-consistency. Although its theoretical complexity is the best, experiments clearly show that it is not very efficient in practice. In this paper, we propose two algorithms, one for arc-consistency, AC-8, and one for path-consistency, PC-8. Both are based on the same principle: exploit minimal supports as AC-6 and PC-{5|6} do, but without recording them. While this approach is of limited interest for AC-8, we show that for path-consistency it significantly outperforms existing algorithms.
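
To make the notion of arc-consistency concrete, the following sketch implements AC-3, a classic and simpler arc-consistency algorithm; it is not AC-8 or PC-8 from this paper. It assumes a binary constraint network represented as a dictionary of domains plus a dictionary mapping each directed arc (x, y) to a predicate, with every constraint listed in both directions; this representation is an assumption of the sketch.

```python
from collections import deque


def ac3(domains, constraints):
    """Make every arc consistent, pruning `domains` in place.

    domains: variable -> set of values
    constraints: (x, y) -> predicate(value_of_x, value_of_y)
    Returns False if some domain becomes empty, True otherwise.
    """
    queue = deque(constraints)  # start with every directed arc
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # values of x with no support in the current domain of y
        removed = {a for a in domains[x] if not any(allowed(a, b) for b in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False
            # arcs pointing at x must be rechecked, except the one from y
            queue.extend((z, w) for (z, w) in constraints if w == x and z != y)
    return True
```

For the constraint X < Y over {1, 2, 3}, AC-3 removes 3 from the domain of X and 1 from the domain of Y. The search for a support `b` is repeated from scratch on every revision, which is the kind of redundant work that support-based algorithms such as AC-6 are designed to avoid.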

EFFICIENT ALGORITHMS FOR TRAVELLING SALESMAN PROBLEMS ARISING IN WAREHOUSE ORDER PICKING

The ANZIAM Journal

10.1017/s1446181115000140

2015

Vol 57 (2)

pp. 166-174

Cited By ~ 2

Author(s):

H. CHARKHGARD

M. SAVELSBERGH

Keyword(s):

Spanning Tree

Minimum Spanning Tree

Linear Time

Optimal Solution

Efficient Algorithms

Order Picking

Travelling Salesman

Routing Problems

Parallel Lines

We investigate two routing problems that arise when order pickers traverse an aisle in a warehouse. The routing problems can be viewed as Euclidean travelling salesman problems with points on two parallel lines. We show that if the order picker traverses only a section of the aisle and then returns, then an optimal solution can be found in linear time, and if the order picker traverses the entire aisle, then an optimal solution can be found in quadratic time. Moreover, we show how to approximate the routing cost in linear time by computing a minimum spanning tree for the points on the parallel lines.
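
The sketch below shows the quantity behind the MST-based estimate. Points are given as (x, side) pairs on two parallel lines a hypothetical distance `width` apart, and Prim's algorithm computes the cost of their Euclidean minimum spanning tree. This simple version runs in O(n²) time rather than the linear time stated in the abstract, and the point representation is an assumption. For Euclidean points, the MST cost is a lower bound on the length of any closed tour through all points, and twice the MST cost is an upper bound on the optimal tour; how the paper derives its cost approximation from the MST is not reproduced here.

```python
import math


def mst_cost(points, width):
    """Euclidean MST cost (Prim's algorithm, O(n^2)) for points on two parallel lines.

    points: list of (x, side) pairs, side in {0, 1}
    width: distance between the two lines
    """
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # add the closest point not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        xu, su = points[u]
        for v in range(n):
            if not in_tree[v]:
                xv, sv = points[v]
                d = math.hypot(xu - xv, width * (su != sv))  # vertical gap only across lines
                if d < best[v]:
                    best[v] = d
    return total
```

For the points (0, 0), (3, 0) and (0, 1) with `width` 4, the MST uses the edges of length 3 and 4 (cost 7), while the only closed tour has length 12, which lies between 7 and 14.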

Linear-time computation of minimal absent words using suffix array

BMC Bioinformatics

10.1186/s12859-014-0388-9

2014

Vol 15 (1)

Cited By ~ 20

Author(s):

Carl Barton

Alice Heliou

Laurent Mouchard

Solon P Pissis

Keyword(s):

Linear Time

Suffix Array

Absent Words

RECONSTRUCTING A SUFFIX ARRAY

International Journal of Foundations of Computer Science

10.1142/s0129054106004418

2006

Vol 17 (06)

pp. 1281-1295

Cited By ~ 7

Author(s):

FRANTISEK FRANEK

WILLIAM F. SMYTH

Keyword(s):

Data Compression

Data Structures

Suffix Tree

Suffix Array

Efficient Algorithms

Lexicographical Order

Time And Space

For certain problems (for example, computing repetitions and repeats, data compression applications) it is not necessary that the suffixes of a string represented in a suffix tree or suffix array should occur in lexicographical order (lexorder). It thus becomes of interest to study possible alternate orderings of the suffixes in these data structures, that may be easier to construct or more efficient to use. In this paper we consider the "reconstruction" of a suffix array based on a given reordering of the alphabet, and we describe simple time- and space-efficient algorithms that accomplish it.
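
A naive way to obtain the suffix array under a given reordering of the alphabet is to sort all suffixes again using the new character ranks. The sketch below does exactly that, in O(n² log n) worst-case time; it is only a baseline for comparison, not the reconstruction algorithm described in the paper, which starts from an existing suffix array. The `order` parameter (a string listing the alphabet in the desired order) is an assumption of this sketch.

```python
def suffix_array(text, order):
    """Suffix array of `text` when characters are ranked by their position in `order`."""
    rank = {c: r for r, c in enumerate(order)}
    keys = [rank[c] for c in text]  # re-encode the text under the new alphabet order
    # Python lists compare lexicographically, and a proper prefix sorts first
    return sorted(range(len(text)), key=lambda i: keys[i:])
```

For "abab", the usual order `"ab"` gives [2, 0, 3, 1], whereas the reversed order `"ba"` gives [3, 1, 2, 0]: the same suffixes, differently arranged.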

Linear Time Suffix Array Construction Using D-Critical Substrings

Combinatorial Pattern Matching - Lecture Notes in Computer Science

10.1007/978-3-642-02441-2_6

2009

pp. 54-67

Cited By ~ 6

Author(s):

Ge Nong

Sen Zhang

Wai Hong Chan

Keyword(s):

Linear Time

Suffix Array

Space-efficient algorithms for computing the convex hull of a simple polygonal line in linear time

Computational Geometry

10.1016/j.comgeo.2005.11.005

2006

Vol 34 (2)

pp. 75-82

Cited By ~ 16

Author(s):

Hervé Brönnimann

Timothy M. Chan

Keyword(s):

Convex Hull

Linear Time

Polygonal Line

Efficient Algorithms

THE VIRTUAL SUFFIX TREE

International Journal of Foundations of Computer Science

10.1142/s0129054109007066

2009

Vol 20 (06)

pp. 1109-1133

Cited By ~ 2

Author(s):

JIE LIN

YUE JIANG

DON ADJEROH

Keyword(s):

Suffix Tree

Linear Time

Suffix Array

Intermediate Step

Suffix Trees

String Length

Space Requirement

Suffix Arrays

Tree Construction

Efficient Data

We introduce the VST (virtual suffix tree), an efficient data structure for suffix trees and suffix arrays. Starting from the suffix array, we construct the suffix tree, from which we derive the virtual suffix tree. Later, we remove the intermediate step of suffix tree construction and build the VST directly from the suffix array. The VST provides the same functionality as the suffix tree, including suffix links, but with a much smaller space requirement. It retains linear-time construction even for large alphabets Σ, requires O(n) space (where n is the string length), and allows a pattern of length m to be searched in O(m log |Σ|) time, the same time needed for a suffix tree. Given the VST, we show an algorithm that computes all the suffix links in linear time, independent of Σ. The VST requires less space than other recently proposed data structures for suffix trees and suffix arrays, such as the enhanced suffix array [1] and the linearized suffix tree [17]. On average, the space requirement (including that for suffix arrays and suffix links) is 13.8n bytes for the regular VST and 12.05n bytes in its compact form.

Space-Efficient Algorithms for Computing the Convex Hull of a Simple Polygonal Line in Linear Time

LATIN 2004: Theoretical Informatics - Lecture Notes in Computer Science

10.1007/978-3-540-24698-5_20

2004

pp. 162-171

Cited By ~ 1

Author(s):

Hervé Brönnimann

Timothy M. Chan

Keyword(s):

Convex Hull

Linear Time

Polygonal Line

Efficient Algorithms