Sliding Suffix Tree

Andrej Brodnik; Matevž Jekovec

doi:10.3390/a11080118

Algorithms ◽

10.3390/a11080118 ◽

2018 ◽

Vol 11 (8) ◽

pp. 118 ◽

Cited By ~ 1

Author(s):

Andrej Brodnik ◽

Matevž Jekovec

Keyword(s):

Data Structure ◽

Suffix Tree ◽

Sliding Window ◽

Constant Time ◽

Optimal Time ◽

Constant Size ◽

Query String

We consider a sliding window W over a stream of characters from some alphabet of constant size. We want to look up a pattern in the current sliding window content and obtain all positions of the matches. We present an indexed version of the sliding window, based on a suffix tree. The data structure of size Θ(|W|) has optimal time queries Θ(m+occ) and amortized constant time updates, where m is the length of the query string and occ is its number of occurrences.
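To make the query semantics concrete, here is a minimal Python sketch of a sliding window that reports all match positions. It is a naive baseline, not the suffix-tree index the abstract describes: each query rescans the window, so it costs O(|W|·m) rather than Θ(m+occ). The class name `NaiveSlidingWindow` and its methods are hypothetical, chosen for illustration only.

```python
from collections import deque


class NaiveSlidingWindow:
    """Naive baseline for pattern lookup in a sliding window (no suffix tree)."""

    def __init__(self, size):
        # deque(maxlen=...) discards the oldest character automatically.
        self.window = deque(maxlen=size)
        self.dropped = 0  # number of characters that have left the window

    def append(self, ch):
        """Slide the window by one character of the stream."""
        if len(self.window) == self.window.maxlen:
            self.dropped += 1
        self.window.append(ch)

    def find_all(self, pattern):
        """Return stream positions of every occurrence of pattern in the window."""
        text = "".join(self.window)
        m = len(pattern)
        if m == 0:
            return []
        return [
            self.dropped + i
            for i in range(len(text) - m + 1)
            if text.startswith(pattern, i)
        ]
```

Feeding the stream "abcabcab" into a window of size 5 leaves "abcab" in the window, and `find_all("ab")` then reports stream positions 3 and 6. An indexed structure would expose the same two operations while avoiding the rescan.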

Engineering Augmented Suffix Sorting Algorithms

10.5753/ctd.2018.3652 ◽

2018 ◽

Author(s):

Felipe A. Louza ◽

Guilherme P. Telles ◽

Simon Gog

Keyword(s):

Computer Science ◽

Full Text ◽

Suffix Array ◽

Optimal Time ◽

Time And Space ◽

Sorting Algorithms ◽

Constant Size ◽

Common Prefix ◽

Efficient Processing ◽

Burrows Wheeler Transform

Strings are prevalent in Computer Science, and algorithms for their efficient processing are fundamental in various applications. The results introduced in this work contribute theoretical improvements and practical advances in building full-text indexes. Our first contribution is an in-place algorithm that computes the Burrows-Wheeler transform and the longest common prefix (LCP) array. Our second contribution is the construction of the suffix array augmented with the LCP array in optimal time and space for strings from constant size alphabets. Our third contribution is a set of algorithms to construct full-text indexes for string collections in optimal theoretical bounds. This work is an extended abstract of the Ph.D. thesis of the first author.

GeoTree: A Data Structure for Constant Time Geospatial Search Enabling a Real-Time Property Index

Lecture Notes in Networks and Systems - Intelligent Computing ◽

10.1007/978-3-030-80126-7_12 ◽

2021 ◽

pp. 152-165

Author(s):

Robert Miller ◽

Phil Maguire

Keyword(s):

Data Structure ◽

Real Time ◽

Constant Time ◽

Time Property ◽

Property Index

Suffix Tree Constructing Algorithm for Datasets with Discrete Contents

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.733.867 ◽

2015 ◽

Vol 733 ◽

pp. 867-870

Author(s):

Zhen Zhong Jin ◽

Zheng Huang ◽

Hua Zhang

Keyword(s):

Data Structure ◽

Sensor Network ◽

Association Analysis ◽

Data Structures ◽

Suffix Tree ◽

Analysis Data ◽

Large Datasets ◽

Intermediate Data ◽

Input Strings

The suffix tree is a useful data structure for indexing strings. However, on large datasets of discrete contents, most existing algorithms become very inefficient. Discrete datasets need to be indexed in many fields, such as record analysis, data analysis in sensor networks, and association analysis. This paper presents an algorithm, STD (Suffix Tree for Discrete contents), that performs very efficiently on discrete input datasets. It introduces several intermediate data structures for discrete strings and also handles the case in which the discrete input strings have similar characteristics. Moreover, STD keeps the advantages of existing implementations designed for successive input strings. Experiments were conducted to evaluate the performance, and they show that the method works well.

AN IMPROVED HYPERCUBE BOUND FOR MULTISEARCHING AND ITS APPLICATIONS

International Journal of Computational Geometry & Applications ◽

10.1142/s0218195999000030 ◽

1999 ◽

Vol 09 (01) ◽

pp. 29-38

Author(s):

MIKHAIL J. ATALLAH

Keyword(s):

Data Structure ◽

Constant Time ◽

Search Problem ◽

Parallel Search ◽

Point Location ◽

Planar Point ◽

Geometric Problems ◽

Trapezoidal Decomposition ◽

Planar Point Location ◽

Hypercube Model

We give a result that implies an improvement by a factor of log log n in the hypercube bounds for the geometric problems of batched planar point location, trapezoidal decomposition, and polygon triangulation. The improvements are achieved through a better solution to the multisearch problem on a hypercube, a parallel search problem where the elements in the data structure S to be searched are totally ordered, but where it is not possible to compare in constant time any two given queries q and q′. Whereas the previous best solution to this problem took O(log n (log log n)³) time on an n-processor hypercube, the solution given here takes O(log n (log log n)²) time on an n-processor hypercube. The hypercube model for which we claim our bounds is the standard one, SIMD, with O(1) memory registers per processor, and with one-port communication. Each register can store O(log n) bits, so that a processor knows its ID.

A New Keyphrases Extraction Method Based on Suffix Tree Data Structure for Arabic Documents Clustering

International Journal of Database Management Systems ◽

10.5121/ijdms.2013.5602 ◽

2013 ◽

Vol 5 (6) ◽

pp. 17-33 ◽

Cited By ~ 5

Author(s):

Issam SAHMOUDI ◽

Hanane FROUD ◽

Abdelmonaime LACHKAR

Keyword(s):

Data Structure ◽

Extraction Method ◽

Suffix Tree ◽

Tree Data ◽

Tree Data Structure

Encoding range minima and range top-2 queries

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2013.0131 ◽

2014 ◽

Vol 372 (2016) ◽

pp. 20130131 ◽

Cited By ~ 9

Author(s):

Pooya Davoodi ◽

Gonzalo Navarro ◽

Rajeev Raman ◽

S. Srinivasa Rao

Keyword(s):

Data Structure ◽

Lower Bounds ◽

Query Time ◽

Constant Time ◽

Worst Case ◽

Asymptotically Optimal ◽

Random Array ◽

Case Data ◽

Natural Way

We consider the problem of encoding range minimum queries (RMQs): given an array A[1..n] of distinct totally ordered values, to pre-process A and create a data structure that can answer the query RMQ(i, j), which returns the index containing the smallest element in A[i..j], without access to the array A at query time. We give a data structure whose space usage is 2n + o(n) bits, which is asymptotically optimal for worst-case data, and answers RMQs in O(1) worst-case time. This matches the previous result of Fischer and Heun, but is obtained in a more natural way. Furthermore, our result can encode the RMQs of a random array A in 1.919n + o(n) bits in expectation, which is not known to hold for Fischer and Heun’s result. We then generalize our result to the encoding range top-2 query (RT2Q) problem, which is like the encoding RMQ problem except that the query RT2Q(i, j) returns the indices of both the smallest and second smallest elements of A[i..j]. We introduce a data structure using 3.272n + o(n) bits that answers RT2Qs in constant time, and also give lower bounds on the effective entropy of the RT2Q problem.
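The two query types can be pinned down with a short Python sketch. These are naive reference definitions that scan the array, not the succinct encodings from the abstract. The function names `rmq` and `rt2q` and the 0-based inclusive indexing are assumptions made for illustration (the abstract indexes from 1).

```python
def rmq(a, i, j):
    """Index of the smallest element in a[i..j] (0-based, inclusive)."""
    return min(range(i, j + 1), key=a.__getitem__)


def rt2q(a, i, j):
    """Indices of the smallest and second smallest elements in a[i..j].

    Requires j > i so that the range holds at least two elements.
    """
    first, second = sorted(range(i, j + 1), key=a.__getitem__)[:2]
    return first, second
```

As a small illustration of why an encoding can be smaller than the array itself, the arrays `[2, 1, 3]` and `[3, 1, 2]` give the same answer to every RMQ, so an RMQ encoding does not need to distinguish them. Their RT2Q answers over the whole range differ, though: `(1, 0)` versus `(1, 2)`. This is consistent with the RT2Q encoding needing more bits.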

Dynamic sampling from a discrete probability distribution with a known distribution of rates

Computational Statistics ◽

10.1007/s00180-021-01159-3 ◽

2021 ◽

Author(s):

Federico D’Ambrosio ◽

Hans L. Bodlaender ◽

Gerard T. Barkema

Keyword(s):

Data Structure ◽

Probability Distribution ◽

Data Structures ◽

Basic Data ◽

Constant Time ◽

Discrete Probability ◽

Minimum Rate ◽

Discrete Probability Distribution ◽

Expected Time ◽

Rejection Method

In this paper, we consider several efficient data structures for the problem of sampling from a dynamically changing discrete probability distribution, where some prior information is known on the distribution of the rates, in particular the maximum and minimum rate, and where the number of possible outcomes N is large. We consider three basic data structures, the Acceptance–Rejection method, the Complete Binary Tree and the Alias method. These can be used as building blocks in a multi-level data structure, where at each of the levels, one of the basic data structures can be used, with the top level selecting a group of events, and the bottom level selecting an element from a group. Depending on assumptions on the distribution of the rates of outcomes, different combinations of the basic structures can be used. We prove that for particular data structures the expected time of sampling and update is constant when the rate distribution follows certain conditions. We show that for any distribution, by combining a tree structure with the Acceptance–Rejection method, an expected sampling and update time of O(log log (r_max / r_min)) is possible, where r_max is the maximum rate and r_min the minimum rate. We also discuss an implementation of a Two Levels Acceptance–Rejection data structure that allows expected constant time for sampling and amortized constant time for updates, assuming that r_max and r_min are known and the number of events is sufficiently large. We also present an experimental verification, highlighting the limits given by the constraints of a real-life setting.
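As a rough illustration of the simplest building block named above, the following Python sketch samples an index by plain acceptance-rejection. It is a single-level textbook version, not the multi-level or Two Levels structure from the paper. The function name `sample_index` and its parameters are hypothetical, and it assumes every rate lies in [0, r_max] with at least one positive rate.

```python
import random


def sample_index(rates, r_max, rng=random):
    """Draw index i with probability rates[i] / sum(rates).

    Assumes 0 <= rates[i] <= r_max for all i and at least one positive rate.
    """
    n = len(rates)
    while True:
        i = rng.randrange(n)                 # propose an outcome uniformly
        if rng.random() * r_max < rates[i]:  # accept with prob. rates[i] / r_max
            return i
```

Updating a rate is a single assignment to `rates[i]`, as long as the new value stays within the assumed bound. The expected number of proposals per sample is r_max divided by the mean rate, which is at most r_max / r_min; this dependence on the rate ratio is what the multi-level constructions in the abstract aim to reduce.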

FASTSET: A Fast Data Structure for the Representation of Sets of Integers

Algorithms ◽

10.3390/a12050091 ◽

2019 ◽

Vol 12 (5) ◽

pp. 91

Author(s):

Giuseppe Lancia ◽

Marcello Dalpasso

Keyword(s):

Data Structure ◽

Data Structures ◽

Optimal Time ◽

Time Performance ◽

Set Operations

We describe a simple data structure for storing subsets of {0, …, N − 1}, with N a given integer, which has optimal time performance for all the main set operations, whereas previous data structures are non-optimal for at least one such operation. We report on the comparison of a Java implementation of our structure with other structures of the standard Java Collections.
