scholarly journals Sliding Suffix Tree

Algorithms ◽  
2018 ◽  
Vol 11 (8) ◽  
pp. 118 ◽  
Author(s):  
Andrej Brodnik ◽  
Matevž Jekovec

We consider a sliding window W over a stream of characters from some alphabet of constant size. We want to look up a pattern in the current sliding window content and obtain all positions of the matches. We present an indexed version of the sliding window, based on a suffix tree. The data structure of size Θ(|W|) has optimal time queries Θ(m+occ) and amortized constant time updates, where m is the length of the query string and occ is its number of occurrences.

2018 ◽  
Author(s):  
Felipe A. Louza ◽  
Guilherme P. Telles ◽  
Simon Gog

Strings are prevalent in Computer Science and algorithms for their efficient processing are fundamental in various applications. The results introduced in this work contribute with theoretical improvements and practical advances in building full-text indexes. Our first contribution is an in-place algorithm that computes the Burrows-Wheeler transform and the longest common prefix (LCP) array. Our second contribution is the construction of the suffix array augmented with the LCP array in optimal time and space for strings from constant size alphabets. Our third contribution is a set of algorithms to construct full-text indexes for string collections in optimal theoretical bounds. This work is an extended abstract of the Ph.D. thesis of the first author.


2015 ◽  
Vol 733 ◽  
pp. 867-870
Author(s):  
Zhen Zhong Jin ◽  
Zheng Huang ◽  
Hua Zhang

The suffix tree is a useful data structure constructed for indexing strings. However, when it comes to large datasets of discrete contents, most existing algorithms become very inefficient. Discrete datasets are need to be indexed in many fields like record analysis, data analyze in sensor network, association analysis etc. This paper presents an algorithm, STD, which stands for Suffix Tree for Discrete contents, that performs very efficiently with discrete input datasets. It imports several wonderful intermediate data structures for discrete strings; we also take care of the situation that the discrete input strings have similar characteristics. Moreover, STD keeps the advantages of existing implementations which are for successive input strings. Experiments were taken to evaluate the performance and shown that the method works well.


1999 ◽  
Vol 09 (01) ◽  
pp. 29-38
Author(s):  
MIKHAIL J. ATALLAH

We give a result that implies an improvement by a factor of log log  n in the hypercube bounds for the geometric problems of batched planar point location, trapezoidal decomposition, and polygon triangulation. The improvements are achieved through a better solution to the multisearch problem on a hypercube, a parallel search problem where the elements in the data structure S to be searched are totally ordered, but where it is not possible to compare in constant time any two given queries q and q′. Whereas the previous best solution to this problem took O( log  n( log log  n)3) time on an n-processor hypercube, the solution given here takes O( log  n( log log  n)2) time on an n-processor hypercube. The hypercube model for which we claim our bounds is the standard one, SIMD, with O(1) memory registers per processor, and with one-port communication. Each register can store O( log  n) bits, so that a processor knows its ID.


Author(s):  
Pooya Davoodi ◽  
Gonzalo Navarro ◽  
Rajeev Raman ◽  
S. Srinivasa Rao

We consider the problem of encoding range minimum queries (RMQs): given an array A [1.. n ] of distinct totally ordered values, to pre-process A and create a data structure that can answer the query RMQ( i , j ), which returns the index containing the smallest element in A [ i .. j ], without access to the array A at query time. We give a data structure whose space usage is 2 n + o ( n ) bits, which is asymptotically optimal for worst-case data, and answers RMQs in O (1) worst-case time. This matches the previous result of Fischer and Heun, but is obtained in a more natural way. Furthermore, our result can encode the RMQs of a random array A in 1.919 n + o ( n ) bits in expectation, which is not known to hold for Fischer and Heun’s result. We then generalize our result to the encoding range top-2 query (RT2Q) problem, which is like the encoding RMQ problem except that the query RT2Q( i , j ) returns the indices of both the smallest and second smallest elements of A [ i .. j ]. We introduce a data structure using 3.272 n + o ( n ) bits that answers RT2Qs in constant time, and also give lower bounds on the effective entropy of the RT2Q problem.


Author(s):  
Federico D’Ambrosio ◽  
Hans L. Bodlaender ◽  
Gerard T. Barkema

AbstractIn this paper, we consider several efficient data structures for the problem of sampling from a dynamically changing discrete probability distribution, where some prior information is known on the distribution of the rates, in particular the maximum and minimum rate, and where the number of possible outcomes N is large. We consider three basic data structures, the Acceptance–Rejection method, the Complete Binary Tree and the Alias method. These can be used as building blocks in a multi-level data structure, where at each of the levels, one of the basic data structures can be used, with the top level selecting a group of events, and the bottom level selecting an element from a group. Depending on assumptions on the distribution of the rates of outcomes, different combinations of the basic structures can be used. We prove that for particular data structures the expected time of sampling and update is constant when the rate distribution follows certain conditions. We show that for any distribution, combining a tree structure with the Acceptance–Rejection method, we have an expected time of sampling and update of $$O\left( \log \log {r_{max}}/{r_{min}}\right) $$ O log log r max / r min is possible, where $$r_{max}$$ r max is the maximum rate and $$r_{min}$$ r min the minimum rate. We also discuss an implementation of a Two Levels Acceptance–Rejection data structure, that allows expected constant time for sampling, and amortized constant time for updates, assuming that $$r_{max}$$ r max and $$r_{min}$$ r min are known and the number of events is sufficiently large. We also present an experimental verification, highlighting the limits given by the constraints of a real-life setting.


Algorithms ◽  
2019 ◽  
Vol 12 (5) ◽  
pp. 91
Author(s):  
Giuseppe Lancia ◽  
Marcello Dalpasso

We describe a simple data structure for storing subsets of { 0 , … , N − 1 } , with N a given integer, which has optimal time performance for all the main set operations, whereas previous data structures are non-optimal for at least one such operation. We report on the comparison of a Java implementation of our structure with other structures of the standard Java Collections.


Sign in / Sign up

Export Citation Format

Share Document