Optimized Distributed Subgraph Matching Algorithm Based on Partition Replication

Ling Yuan; Jiali Bin; Peng Pan

doi:10.3390/electronics9010184

Electronics ◽

10.3390/electronics9010184 ◽

2020 ◽

Vol 9 (1) ◽

pp. 184

Author(s):

Ling Yuan ◽

Jiali Bin ◽

Peng Pan

Keyword(s):

Large Scale ◽

High Efficiency ◽

Search Space ◽

Dynamic Graph ◽

Matching Algorithm ◽

Query Graph ◽

Subgraph Matching ◽

Match Algorithm ◽

Large Scale Data ◽

Scale Data

With the explosive growth of data scale, subgraph matching over massive graph data struggles to remain efficient. Meanwhile, the graph indexes used in existing subgraph matching algorithms are difficult to update and maintain on dynamic graphs. We propose a distributed subgraph matching algorithm based on Partition Replica (denoted PR-Match) to handle the partitioning and storage of large-scale data graphs. The PR-Match algorithm first splits the query graph into sub-queries, then assigns the sub-queries to the nodes for subgraph matching, and finally merges the matching results. Within PR-Match, we propose a heuristic rule based on predicted cost to select the optimal merging plan, which greatly reduces the cost of merging. To accelerate sub-query matching, we propose a vertex code based on the vertex neighbor label signature, which greatly reduces the search space for each sub-query. Because the vertex code is maintained incrementally, it avoids the difficulty that feature-based graph indexes have in keeping up with dynamic graphs. Extensive experiments on real and synthetic datasets demonstrate the high efficiency and strong scalability of the PR-Match algorithm when handling large-scale data graphs.
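The idea of pruning candidates with a neighbor label signature can be illustrated with a minimal Python sketch. This is not the PR-Match vertex code itself, whose exact encoding the abstract does not give. It assumes an undirected, vertex-labeled graph, a one-bit-per-label bitmask as the signature, and hypothetical helper names (`neighbor_signature`, `filter_candidates`, `add_edge`). It handles edge insertions only; deletions would need per-label counts rather than single bits.

```python
# Illustrative sketch only: a bitmask "neighbor label signature" used to
# prune candidate vertices. All names and the encoding are assumptions.

def neighbor_signature(adj, labels, label_bit):
    """Return {vertex: mask}; a bit is set if some neighbor carries that label."""
    sig = {}
    for v, neighbors in adj.items():
        mask = 0
        for u in neighbors:
            mask |= 1 << label_bit[labels[u]]
        sig[v] = mask
    return sig

def filter_candidates(q_sig, q_labels, g_sig, g_labels):
    """Keep data vertices with the same label whose signature covers the query's."""
    result = {}
    for qv in q_sig:
        result[qv] = [
            gv for gv in sorted(g_sig)
            if g_labels[gv] == q_labels[qv]
            and (q_sig[qv] & g_sig[gv]) == q_sig[qv]
        ]
    return result

def add_edge(adj, labels, sig, label_bit, u, v):
    """Insert edge u-v and update only the two affected signatures."""
    adj[u].add(v)
    adj[v].add(u)
    sig[u] |= 1 << label_bit[labels[v]]
    sig[v] |= 1 << label_bit[labels[u]]

label_bit = {"A": 0, "B": 1, "C": 2}

# Data graph: 1(A)-2(B), 1(A)-3(C), 4(A)-5(B)
g_labels = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B"}
g_adj = {1: {2, 3}, 2: {1}, 3: {1}, 4: {5}, 5: {4}}

# Query graph: a(A)-b(B), a(A)-c(C)
q_labels = {"a": "A", "b": "B", "c": "C"}
q_adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}

g_sig = neighbor_signature(g_adj, g_labels, label_bit)
q_sig = neighbor_signature(q_adj, q_labels, label_bit)

before = filter_candidates(q_sig, q_labels, g_sig, g_labels)
print(before)  # {'a': [1], 'b': [2, 5], 'c': [3]}

# A graph update: vertex 4 gains a C-labeled neighbor, so it becomes a candidate.
add_edge(g_adj, g_labels, g_sig, label_bit, 4, 3)
after = filter_candidates(q_sig, q_labels, g_sig, g_labels)
print(after["a"])  # [1, 4]
```

The signature test is a necessary condition only: a surviving candidate still has to pass full subgraph matching, but vertices such as 4 (before the update) are discarded without any search.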

Subgraph-Indexed Sequential Subdivision for Continuous Subgraph Matching on Dynamic Knowledge Graph

Complexity ◽

10.1155/2020/8871756 ◽

2020 ◽

Vol 2020 ◽

pp. 1-18

Author(s):

Yunhao Sun ◽

Guanyu Li ◽

Mengmeng Guan ◽

Bo Ning

Keyword(s):

Empirical Studies ◽

Search Space ◽

Knowledge Graph ◽

Dynamic Graph ◽

Matching Problem ◽

Flow Graph ◽

Query Graph ◽

Subgraph Matching ◽

Wide Range ◽

Multiple Edges

Continuous subgraph matching on dynamic graphs has become a popular research topic in graph analysis, with a wide range of applications including information retrieval and community detection. Specifically, given a query graph q, an initial graph G_0, and a graph update stream ΔG_i, the problem of continuous subgraph matching is to sequentially find all subgraphs of G_i (= G_0 ⊕ ΔG_i) that are isomorphic to q and cover ΔG_i. Because a knowledge graph is a directed labeled multigraph that can have multiple edges between a pair of vertices, it brings new challenges when the problem is posed on dynamic knowledge graphs. One challenge is that the multigraph nature of the knowledge graph increases the complexity of candidate calculation, which must combine complex topological and attributed structures. Another challenge is that the isomorphic subgraphs covering a given region are searched for in a huge space of seed candidates, so much time is spent on unpromising candidates. To address these challenges, a method of subgraph-indexed sequential subdivision is proposed to accelerate continuous subgraph matching on dynamic knowledge graphs. First, a flow graph index is proposed to arrange the search space of seed candidates in the topological knowledge graph, and an adjacent index is designed to accelerate the identification of candidate activation states in the attributed knowledge graph. Second, the sequential subdivision of the flow graph index and a transition state model are employed to incrementally conduct subgraph matching and to maintain the regional influence of changed candidates, respectively. Finally, extensive empirical studies on real and synthetic graphs demonstrate that these techniques outperform state-of-the-art algorithms.

E-Commerce data classification in the cloud environment based on bayesian algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189421 ◽

2020 ◽

pp. 1-8

Author(s):

Bing Xu

Keyword(s):

Large Scale ◽

High Efficiency ◽

Feature Selection Method ◽

Data Classification ◽

Classification Algorithm ◽

Testing Time ◽

Bayesian Algorithm ◽

Large Scale Data ◽

Distributed Platform ◽

Scale Data

E-commerce transactions generate a large amount of data, and classifying it effectively is one of the current research hotspots. An improved feature selection method is proposed based on the characteristics of the Bayesian classification algorithm. Because training and testing on modern large-scale data take a long time on a single computer, a data classification algorithm based on Naive Bayes is designed and implemented on the Hadoop distributed platform. The experimental results show that the improved algorithm effectively improves classification accuracy, and that the parallel Bayesian data classification algorithm is highly efficient and suitable for processing and analyzing massive data.
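Why Naive Bayes training parallelizes well can be seen in a minimal sketch: training reduces to counting, and counts from separate data splits can simply be summed. The Python code below is not the paper's implementation. It does not use Hadoop and does not include the improved feature selection method. The map and reduce steps are simulated in one process, and the function names, the toy records, and the Laplace (add-one) smoothing are assumptions made for illustration.

```python
# Illustrative sketch only: Naive Bayes training expressed as map and reduce
# steps over labeled records, run in-process. All names and data are assumed.
import math
from collections import Counter
from functools import reduce

def map_counts(record):
    """Map step: turn one (label, tokens) record into partial counts."""
    label, tokens = record
    counts = Counter()
    counts[("class", label)] += 1
    for token in tokens:
        counts[("token", label, token)] += 1
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial count tables by summing them."""
    return left + right

def train(records):
    """Combine the per-record counts into one model (a Counter)."""
    return reduce(reduce_counts, map(map_counts, records), Counter())

def classify(counts, tokens):
    """Pick the class with the highest log-posterior, using add-one smoothing."""
    classes = sorted(k[1] for k in counts if k[0] == "class")
    vocab = {k[2] for k in counts if k[0] == "token"}
    n_records = sum(counts[("class", c)] for c in classes)
    best, best_score = None, -math.inf
    for c in classes:
        total = sum(n for k, n in counts.items()
                    if k[0] == "token" and k[1] == c)
        score = math.log(counts[("class", c)] / n_records)
        for token in tokens:
            score += math.log((counts[("token", c, token)] + 1)
                              / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

records = [
    ("electronics", ["phone", "charger"]),
    ("electronics", ["laptop", "charger"]),
    ("clothing", ["shirt", "cotton"]),
    ("clothing", ["jeans", "cotton"]),
]

model = train(records)
print(classify(model, ["charger", "phone"]))  # electronics
print(classify(model, ["cotton"]))            # clothing
```

Because `reduce_counts` is associative and commutative, the records could be split across workers in any way and the merged model would be identical, which is the property a distributed platform relies on.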

Large-Scale Data Learning Method for Anomaly Detection using Machine Learning for Monitoring Vibration in Vehicle Equipment

IEEJ Transactions on Industry Applications ◽

10.1541/ieejias.140.480 ◽

2020 ◽

Vol 140 (6) ◽

pp. 480-487

Author(s):

Minoru Kondo

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Large Scale ◽

Learning Method ◽

Large Scale Data ◽

Scale Data

Faculty Opinions recommendation of Comparative assessment of large-scale data sets of protein-protein interactions.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1006598.82257 ◽

2002 ◽

Author(s):

Rob Russell

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Comparative Assessment ◽

Data Sets ◽

Protein Protein Interactions ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets

ProGen: Provenance database generator for large-scale data set

Journal of Computer Applications ◽

10.3724/sp.j.1087.2008.02737 ◽

2009 ◽

Vol 28 (11) ◽

pp. 2737-2740

Author(s):

Xiao ZHANG ◽

Shan WANG ◽

Na LIAN

Keyword(s):

Large Scale ◽

Data Set ◽

Large Scale Data ◽

Scale Data

Construction of integrated particle rendering environment for large scale data visualization

Impact ◽

10.21820/23987073.2018.11.9 ◽

2018 ◽

Vol 2018 (11) ◽

pp. 9-11

Author(s):

Koji Koyamada

Keyword(s):

Data Visualization ◽

Large Scale ◽

Large Scale Data ◽

Scale Data

COMMUNITY-CURATED DATA RESOURCES AND LARGE-SCALE DATA-MODEL SYNTHESES: THE CHILDREN OF COHMAP

10.1130/abs/2016am-286533 ◽

2016 ◽

Author(s):

John W. Williams ◽

Simon Goring ◽

Eric Grimm ◽

Jason McLachlan

Keyword(s):

Data Model ◽

Large Scale ◽

Large Scale Data ◽

Scale Data

Local and global approaches of affinity propagation clustering for large scale data

Journal of Zhejiang University SCIENCE A ◽

10.1631/jzus.a0720058 ◽

2008 ◽

Vol 9 (10) ◽

pp. 1373-1381 ◽

Cited By ~ 28

Author(s):

Ding-yin Xia ◽

Fei Wu ◽

Xu-qing Zhang ◽

Yue-ting Zhuang

Keyword(s):

Large Scale ◽

Affinity Propagation ◽

Large Scale Data ◽

Affinity Propagation Clustering ◽

Scale Data

Towards Large-Scale Data Annotation of Audio from Wearables: Validating Zooniverse Annotations of Infant Vocalization Types

2021 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt48900.2021.9383511 ◽

2021 ◽

Author(s):

Chiara Semenzin ◽

Lisa Hamrick ◽

Amanda Seidl ◽

Bridgette Kelleher ◽

Alejandrina Cristia

Keyword(s):

Large Scale ◽

Data Annotation ◽

Large Scale Data ◽

Infant Vocalization ◽

Scale Data

A Framework for International Collaboration on ITER Using Large-Scale Data Transfer to Enable Near-Real-Time Analysis

Fusion Science & Technology ◽

10.1080/15361055.2020.1851073 ◽

2021 ◽

Vol 77 (2) ◽

pp. 98-108

Author(s):

R. M. Churchill ◽

C. S. Chang ◽

J. Choi ◽

J. Wong ◽

S. Klasky ◽

...

Keyword(s):

Real Time ◽

International Collaboration ◽

Large Scale ◽

Data Transfer ◽

Time Analysis ◽

Real Time Analysis ◽

Large Scale Data ◽

Scale Data
