inverted index compression
Recently Published Documents


Total documents: 19 (five years: 1)
H-index: 6 (five years: 0)

Entropy ◽  
2021 ◽  
Vol 23 (3) ◽  
pp. 296
Author(s):  
Andrzej Chmielowiec ◽  
Paweł Litwin

This article deals with the compression of binary sequences with a given number of ones, which can also be regarded as lists of indexes of a given length. The first part of the article shows that the entropy H of random n-element binary sequences with exactly k elements equal to one satisfies the inequalities k·log₂(0.48·n/k) < H < k·log₂(2.72·n/k). Based on this result, we propose a simple coding using fixed-length words. Its main application is the compression of random binary sequences with a large disproportion between the number of zeros and the number of ones. Importantly, the proposed solution allows much faster decompression than Golomb-Rice coding, at the cost of a relatively small decrease in compression efficiency. The proposed algorithm can be particularly useful for database applications in which decompression speed is much more important than the degree of index list compression.
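The inequalities quoted in this abstract can be checked numerically. The sketch below is not taken from the article: it assumes that every n-element sequence with exactly k ones is equally likely, so that H = log₂ C(n, k), and the function names `index_list_entropy` and `entropy_bounds` are hypothetical.

```python
import math


def index_list_entropy(n: int, k: int) -> float:
    """Entropy in bits of an n-element binary sequence with exactly k ones.

    Assumption (not stated in the abstract): all C(n, k) such sequences
    are equally likely, so H = log2(C(n, k)).
    """
    return math.log2(math.comb(n, k))


def entropy_bounds(n: int, k: int) -> tuple[float, float]:
    """Lower and upper bounds on H as quoted in the abstract."""
    lower = k * math.log2(0.48 * n / k)
    upper = k * math.log2(2.72 * n / k)
    return lower, upper


if __name__ == "__main__":
    # Sparse sequences: few ones relative to the sequence length.
    for n, k in [(1000, 10), (10**5, 100), (10**5, 1000)]:
        lower, upper = entropy_bounds(n, k)
        h = index_list_entropy(n, k)
        print(f"n={n}, k={k}: {lower:.1f} < H={h:.1f} < {upper:.1f}")
```

Under this assumption, dividing H by k gives a rough per-index bit budget against which a fixed-length or Golomb-Rice code can be compared.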


2020 ◽  
Vol 53 (6) ◽  
pp. 1-36
Author(s):  
Giulio Ermanno Pibiri ◽  
Rossano Venturini

Author(s):  
Guiduo Duan ◽  
Xiaotong Wang ◽  
Tianxi Huang ◽  
Jürgen Kurths

Association rule (AR) mining in complex scenes has attracted extensive attention from researchers in recent years. Typically, researchers have focused on individual algorithms and ignored general methods for improving the performance of AR mining. Luna et al. presented a general data structure, Speeding-Up AR Structure with Inverted Index Compression (SAII), which can be used in most existing algorithms to improve their performance [IEEE Trans. Cybern. 46(12) (2016) 3059–3072]. However, we found that this algorithm spends a lot of time re-ordering data because it uses a one-to-one comparison method in this process, which is the main reason that the speeding-up structure is difficult to build for much larger amounts of data. To overcome these problems, this paper proposes an improved speeding-up AR algorithm based on group similarity and the Apache Spark framework to further reduce memory requirements and runtime. Our simulation results on a police business big dataset show that the improved approach performs well and is more suitable for a big data environment.


2019 ◽  
Vol 13 (2) ◽  
pp. 343-356 ◽  
Author(s):  
Xingshen Song ◽  
Yuexiang Yang ◽  
Yu Jiang ◽  
Kun Jiang

Author(s):  
Giulio Ermanno Pibiri ◽  
Rossano Venturini

Author(s):  
V. Glory ◽  
S. Domnic

The inverted index is used in most Information Retrieval Systems (IRS) to achieve fast query response times. Compression schemes are applied to the inverted index to improve the efficiency of the IRS. In this chapter, the authors study and analyze various compression techniques used for indexing. They also present a new compression technique based on FastPFOR, called New FastPFOR. The storage structure and integer representation of the proposed method can improve its performance in both compression and decompression. A study of existing work shows that recent methods provide good results either in compression or in decoding, but not in both; as a result, their decompression performance falls short. To achieve better decompression performance, the authors propose New FastPFOR in this chapter. To evaluate the proposed method, they experiment with TREC collections. The results show that the proposed method achieves better decompression performance than existing techniques.
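To make the idea of compressing a posting list concrete, here is a minimal sketch of gap (delta) encoding followed by variable-byte coding. This is a classic, simpler scheme chosen for illustration; it is not FastPFOR and not the New FastPFOR method proposed in the chapter. The function names `encode_postings` and `decode_postings` are hypothetical, and document IDs are assumed to be non-negative and strictly increasing.

```python
def encode_postings(doc_ids: list[int]) -> bytes:
    """Delta-encode sorted document IDs, then variable-byte encode each gap.

    Each gap is written 7 bits at a time, least significant group first;
    the high bit of a byte is set when more bytes of the same gap follow.
    """
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev  # small when IDs are dense
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # continuation byte
            gap >>= 7
        out.append(gap)  # final byte of this gap (high bit clear)
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    """Invert encode_postings: rebuild gaps, then prefix-sum them into IDs."""
    doc_ids = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to this gap
        else:
            prev += value
            doc_ids.append(prev)
            value = 0
            shift = 0
    return doc_ids


if __name__ == "__main__":
    postings = [3, 7, 8, 150, 100000]
    packed = encode_postings(postings)
    print(len(packed), "bytes for", len(postings), "document IDs")
    print(decode_postings(packed) == postings)
```

Byte-aligned codes like this one trade some compression ratio for simple, fast decoding, which is the same trade-off the abstract discusses for block-based schemes.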


Author(s):  
Giulio Ermanno Pibiri ◽  
Rossano Venturini

2016 ◽  
Vol 46 (12) ◽  
pp. 3059-3072 ◽  
Author(s):  
Jose Maria Luna ◽  
Alberto Cano ◽  
Mykola Pechenizkiy ◽  
Sebastian Ventura
