Similarity Join: Latest Research Papers

Top-k Tree Similarity Join

10.1145/3459637.3482304

2021

Author(s): Jianhua Wang, Jianye Yang, Wenjie Zhang

Keyword(s): Similarity Join, Tree Similarity

Efficient Set Similarity Join on Multi-Attribute Data Using Lightweight Filters

Journal of Information and Data Management

10.5753/jidm.2021.1969

2021

Vol 12 (3)

Author(s): Leonardo Andrade Ribeiro, Felipe Ferreira Borges, Diego Oliveira

Keyword(s): Data Structure, Processing Time, Cost Model, Similarity Join, Attribute Data, Join Algorithms, Filtering Technique, Alternative Approaches, Similarity Joins, Single Set

We consider the problem of efficiently answering set similarity joins on multi-attribute data. Traditional set similarity join algorithms assume string data represented by a single set and thus miss the opportunity to exploit predicates over multiple attributes to reduce the number of similarity computations. In this article, we present a framework that enhances existing algorithms with additional filters for multi-attribute data. We then instantiate this framework with a lightweight filtering technique based on a simple yet effective data structure, for which we evaluate both exact and probabilistic implementations. In this context, we devise a cost model to identify the attribute ordering that best reduces processing time. We also investigate alternative approaches and introduce a new algorithm that combines key ideas from previous work. Finally, we present a thorough experimental evaluation, which demonstrates that our main proposal is efficient and significantly outperforms competing algorithms.
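
The abstract describes placing extra attribute-level filters in front of a set similarity join. The Python sketch below illustrates only that general idea, under assumptions that do not come from the paper: each record maps attribute names to token sets, a pair joins only if its Jaccard similarity reaches a per-attribute threshold on every attribute, and the only filter is the standard length (size) filter. The names `multi_attribute_self_join`, `size_compatible`, `jaccard`, and the sample data are hypothetical. This is not the authors' lightweight data structure, cost model, or new algorithm; the attribute order is simply passed in rather than chosen by a cost model.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical record layout: attribute name -> set of tokens.
Record = Dict[str, FrozenSet[str]]


def jaccard(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Jaccard similarity |x & y| / |x | y|; two empty sets count as identical."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def size_compatible(x: FrozenSet[str], y: FrozenSet[str], t: float) -> bool:
    """Length filter: J(x, y) >= t implies t*|x| <= |y| and t*|y| <= |x|."""
    return t * len(x) <= len(y) and t * len(y) <= len(x)


def multi_attribute_self_join(
    records: Sequence[Record],
    thresholds: Dict[str, float],
    order: Sequence[str],
) -> Tuple[List[Tuple[int, int]], int]:
    """Return (pairs, verified): index pairs (i, j), i < j, meeting every
    per-attribute threshold, plus how many pairs needed exact verification."""
    pairs: List[Tuple[int, int]] = []
    verified = 0
    for i, j in combinations(range(len(records)), 2):
        r, s = records[i], records[j]
        # Filter phase: cheap size checks, attribute by attribute in `order`;
        # all() stops at the first attribute that rules the pair out.
        if not all(size_compatible(r[a], s[a], thresholds[a]) for a in order):
            continue
        # Verification phase: exact Jaccard on each attribute, same order.
        verified += 1
        if all(jaccard(r[a], s[a]) >= thresholds[a] for a in order):
            pairs.append((i, j))
    return pairs, verified


# Usage on hypothetical toy data.
people = [
    {"name": frozenset({"john", "smith"}), "addr": frozenset({"12", "main", "st"})},
    {"name": frozenset({"john", "smith"}), "addr": frozenset({"12", "main", "street"})},
    {"name": frozenset({"jon", "smith", "jr"}), "addr": frozenset({"12", "main", "st"})},
    {"name": frozenset({"mary"}),
     "addr": frozenset({"4", "oak", "ave", "apt", "9", "springfield", "il"})},
]
pairs, verified = multi_attribute_self_join(
    people, thresholds={"name": 0.5, "addr": 0.5}, order=["name", "addr"]
)
print(pairs, verified)  # [(0, 1)] 3
```

On this toy input, three of the six candidate pairs are discarded by size checks alone, so only three reach exact Jaccard verification. A practical implementation would generate candidates with an index (for example, prefix filtering) rather than enumerating all pairs.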

Accelerating Progressive Set Similarity Join with the CPU-GPU Architecture

Big Data Research

10.1016/j.bdr.2021.100267

2021

pp. 100267

Author(s): Lining Yu, Tiezheng Nie, Derong Shen, Yue Kou

Keyword(s): Similarity Join, GPU Architecture

Sub-trajectory Similarity Join with Obfuscation

33rd International Conference on Scientific and Statistical Database Management

10.1145/3468791.3468822

2021

Author(s): Yanchuan Chang, Jianzhong Qi, Egemen Tanin, Xingjun Ma, Hanan Samet

Keyword(s): Similarity Join, Trajectory Similarity

PPIS-JOIN: A Novel Privacy-Preserving Image Similarity Join Method

Neural Processing Letters

10.1007/s11063-021-10537-3

2021

Author(s): Chengyuan Zhang, Fangxin Xie, Hao Yu, Jianfeng Zhang, Lei Zhu, ...

Keyword(s): Privacy Preserving, Image Similarity, Similarity Join

Efficient Spatio-Textual Similarity Join Processing on NUMA Systems

2021 22nd IEEE International Conference on Mobile Data Management (MDM)

10.1109/mdm52706.2021.00022

2021

Author(s): Saransh Gautam, Suprio Ray, Bradford G. Nickerson

Keyword(s): Similarity Join

HySet: A hybrid framework for exact set similarity join using a GPU

Parallel Computing

10.1016/j.parco.2021.102790

2021

pp. 102790

Author(s): Christos Bellas, Anastasios Gounaris

Keyword(s): Similarity Join, Hybrid Framework

Dynamic Set Similarity Join: An Update Log based Approach

IEEE Transactions on Knowledge and Data Engineering

10.1109/tkde.2021.3126631

2021

pp. 1-1

Author(s): Chengcheng Yang, Lisi Chen, Hao Wang, Shuo Shang, Rui Mao, ...

Keyword(s): Similarity Join

NUMA-Aware Spatio-Textual Similarity Join

Proceedings of the 28th International Conference on Advances in Geographic Information Systems

10.1145/3397536.3422227

2020

Author(s): Saransh Gautam, Suprio Ray, Bradford G. Nickerson

Keyword(s): Similarity Join

Heterogeneous CPU-GPU Epsilon Grid Joins: Static and Dynamic Work Partitioning Strategies

Data Science and Engineering

10.1007/s41019-020-00145-x

2020

Author(s): Benoit Gallet, Michael Gowanlock

Keyword(s): Execution Time, Performance Metrics, Performance Model, Similarity Join, Load Imbalance, Test Platform, Static Partitioning, Different Characteristics, Static And Dynamic Work, Partitioning Strategy

Given two datasets (or tables) $A$ and $B$ and a search distance $\epsilon$, the distance similarity join, denoted as $A \ltimes_\epsilon B$, finds all pairs of points $(p_a, p_b)$ such that $p_a \in A$, $p_b \in B$, and the distance between $p_a$ and $p_b$ is at most $\epsilon$. If $A = B$, the similarity join is equivalent to a similarity self-join, denoted as $A \bowtie_\epsilon A$. In this paper, we propose Heterogeneous Epsilon Grid Joins (HEGJoin), a heterogeneous CPU-GPU distance similarity join algorithm. Efficiently partitioning the work between the CPU and the GPU is challenging: the work partitioning strategy must consider the different characteristics and computational throughput of the two processors, as well as the data-dependent nature of the similarity join, which affects the overall execution time (e.g., the number of queries, their distribution, and the dimensionality). In addition to HEGJoin, we design one dynamic and two static work partitioning strategies, and we propose a performance model for each static strategy to distribute the work between the processors. We evaluate all three partitioning methods using the execution time and the load imbalance between the CPU and GPU as performance metrics. HEGJoin achieves speedups of up to $5.46\times$ ($3.97\times$) over the GPU-only (CPU-only) algorithms on our first test platform, and up to $1.97\times$ ($12.07\times$) over the GPU-only (CPU-only) algorithms on our second test platform.
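
The join definition above can be made concrete with a uniform grid whose cell side equals $\epsilon$: if two points are within distance $\epsilon$, their cell coordinates differ by at most one in every dimension, so each point of $A$ only needs to be compared with the points of $B$ in its own cell and the $3^d - 1$ adjacent cells. That neighbor count grows exponentially with the dimensionality $d$, one reason dimensionality affects execution time. The Python sketch below assumes Euclidean distance and runs single-threaded on the CPU; the function names are hypothetical, and it does not model HEGJoin's CPU-GPU work partitioning or its performance models.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def cell_of(p: Point, eps: float) -> Cell:
    """Integer coordinates of the grid cell (side length eps) containing p."""
    return tuple(math.floor(c / eps) for c in p)


def epsilon_grid_join(A: Sequence[Point], B: Sequence[Point], eps: float) -> List[Tuple[int, int]]:
    """All index pairs (i, j) with Euclidean distance(A[i], B[j]) <= eps."""
    # Index B: bucket every point into its grid cell.
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for j, q in enumerate(B):
        grid[cell_of(q, eps)].append(j)

    result: List[Tuple[int, int]] = []
    for i, p in enumerate(A):
        base = cell_of(p, eps)
        # Any match lies in the same cell or one of its 3^d - 1 neighbors.
        for offset in product((-1, 0, 1), repeat=len(p)):
            neighbor = tuple(b + o for b, o in zip(base, offset))
            # .get() does not insert missing keys into the defaultdict.
            for j in grid.get(neighbor, ()):
                # Refine: the grid only yields candidates; check the real distance.
                if math.dist(p, B[j]) <= eps:
                    result.append((i, j))
    return sorted(result)


A = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
B = [(0.5, 0.0), (1.2, 1.0), (9.0, 9.0)]
pairs = epsilon_grid_join(A, B, eps=1.0)
print(pairs)  # [(0, 0), (1, 1)]
```

Passing the same list as both arguments computes the self-join $A \bowtie_\epsilon A$, which includes each point paired with itself.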
