Empowering Fast Incremental Computation over Large Scale Dynamic Graphs

Author(s):  
Charith Wickramaarachchi ◽  
Charalampos Chelmis ◽  
Viktor K. Prasanna
Author(s):  
Luis M. Vaquero ◽  
Felix Cuadrado ◽  
Dionysios Logothetis ◽  
Claudio Martella

2021 ◽  
Vol 37 (2) ◽  
pp. 107-122
Author(s):  
Anh-Cang Phan ◽  
Thanh-Ngoan Trieu ◽  
Thuong-Cang Phan

In the era of information explosion, Big data is receiving increased attention because of its important implications for the growth, profitability, and survival of modern organizations. However, it also poses many challenges in the way data is processed and queried over time. A join is one of the most common operations in data queries. In particular, a recursive join is a join type used to query hierarchical data, but it is considerably more complex and costly. The evaluation of a recursive join in MapReduce consists of a number of iterations, each made up of two tasks: a join task and an incremental computation task. These tasks are expensive and reduce query performance on large datasets because they generate a large amount of intermediate data that is transmitted over the network. In this study, we therefore propose a simple but efficient approach for Big recursive joins that halves the number of required iterations in the Spark environment. This improvement significantly reduces the number of required tasks as well as the amount of intermediate data generated and transferred over the network. Our experimental results show that the improved recursive join is more efficient and faster than a traditional one on large-scale datasets.
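The iteration-halving idea can be illustrated on a small transitive-closure query. The sketch below is plain Python, not Spark, and it is not taken from the paper: it assumes one possible way of halving the rounds, namely precomputing the two-hop pairs once and then extending the newest results by one or two hops per round. The function names `join`, `closure_one_hop`, and `closure_two_hop` are hypothetical.

```python
def join(left, right):
    """Join on left.dst == right.src and project to (left.src, right.dst)."""
    by_src = {}
    for src, dst in right:
        by_src.setdefault(src, set()).add(dst)
    return {(a, c) for a, b in left for c in by_src.get(b, ())}


def closure_one_hop(edges):
    """Baseline: every round extends the newest pairs by a single edge."""
    edges = set(edges)
    result, delta, rounds = set(edges), set(edges), 0
    while delta:
        # Join task, followed by the incremental step that keeps only new pairs.
        delta = join(delta, edges) - result
        result |= delta
        rounds += 1
    return result, rounds


def closure_two_hop(edges):
    """Assumed variant: every round extends the newest pairs by one or two edges."""
    edges = set(edges)
    step = edges | join(edges, edges)  # one extra join, paid once up front
    result, delta, rounds = set(step), set(step), 0
    while delta:
        delta = join(delta, step) - result
        result |= delta
        rounds += 1
    return result, rounds


if __name__ == "__main__":
    chain = [(i, i + 1) for i in range(8)]  # 0 -> 1 -> ... -> 8
    slow, slow_rounds = closure_one_hop(chain)
    fast, fast_rounds = closure_two_hop(chain)
    print(slow == fast, slow_rounds, fast_rounds)
```

Both functions return the same closure; the second needs roughly half as many rounds on long paths, at the cost of one additional join before the loop starts. In a distributed setting each round corresponds to a shuffle of intermediate data, which is why fewer rounds can matter more than the extra up-front join.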


IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 78471-78482
Author(s):  
Xiaohuan Shan ◽  
Guangxiang Wang ◽  
Linlin Ding ◽  
Baoyan Song ◽  
Yan Xu

PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248764
Author(s):  
Angelo Furno ◽  
Nour-Eddin El Faouzi ◽  
Rajesh Sharma ◽  
Eugenio Zimeo

Betweenness Centrality (BC) has proven to be a fundamental metric in many domains for identifying the components (nodes) of a system modelled as a graph that are most traversed by information flows and are therefore critical to the proper functioning of the system itself. In the transportation domain, the metric has mainly been adopted to discover topological bottlenecks of the physical infrastructure composed of roads or railways. The adoption of this metric to study the evolution of transportation networks, taking into account the dynamic conditions of traffic as well, is in its infancy, mainly because of the high computation time needed to compute BC in large dynamic graphs. This paper explores the adoption of dynamic BC, i.e., BC computed on dynamic large-scale graphs that model road networks and the related vehicular traffic, and proposes a fast algorithm for ahead monitoring of transportation networks that computes approximated BC values under time constraints. The experimental analysis shows that, with a bounded and tolerable approximation, the algorithm computes BC on very large dynamically weighted graphs in a significantly shorter time than exact computation. Moreover, since the proposed algorithm can be tuned for an ideal trade-off between performance and accuracy, our solution paves the way to quasi real-time monitoring of highly dynamic networks, providing anticipated information about possibly congested or vulnerable areas. Such knowledge can be exploited by travel assistance services or intelligent traffic control systems to perform informed re-routing and therefore enhance network resilience in smart cities.
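To give a feel for the performance/accuracy trade-off mentioned above, the sketch below estimates BC by running Brandes-style accumulation from a random sample of source nodes and rescaling the result. This is the generic pivot-sampling estimator on an unweighted graph, used here purely as an illustration; it is not the algorithm proposed in the paper, which targets dynamically weighted road networks. The names `dependencies_from`, `approx_betweenness`, and `num_pivots` are hypothetical.

```python
import random
from collections import deque


def dependencies_from(adj, s):
    """Brandes dependency scores for shortest paths that start at s (unweighted)."""
    sigma = {v: 0 for v in adj}   # number of shortest paths from s to v
    sigma[s] = 1
    dist = {s: 0}
    preds = {v: [] for v in adj}  # predecessors of v on shortest paths from s
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):     # accumulate from the farthest nodes back to s
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0                # the source itself is not an intermediate node
    return delta


def approx_betweenness(adj, num_pivots, seed=0):
    """Estimate BC from num_pivots sampled sources; exact when all nodes are used."""
    rng = random.Random(seed)
    nodes = list(adj)
    pivots = rng.sample(nodes, min(num_pivots, len(nodes)))
    scale = len(nodes) / len(pivots)
    bc = {v: 0.0 for v in nodes}
    for s in pivots:
        for v, d in dependencies_from(adj, s).items():
            bc[v] += scale * d
    return bc


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(approx_betweenness(path, num_pivots=5))
    print(approx_betweenness(path, num_pivots=2))
```

Fewer pivots mean proportionally fewer traversals and a noisier estimate, which is the kind of tunable knob the abstract refers to. Scores here count ordered source-target pairs, so on an undirected graph they are twice the usual unordered-pair values.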


Author(s):  
Chun Jiang Zhu ◽  
Tan Zhu ◽  
Kam-Yiu Lam ◽  
Song Han ◽  
Jinbo Bi

We consider the problem of clustering graph nodes over large-scale dynamic graphs, such as citation networks, images and web networks, when graph updates such as node/edge insertions/deletions are observed distributively. We propose communication-efficient algorithms for two well-established communication models, namely the message passing and the blackboard models. Given a graph with n nodes that is observed at s remote sites over time [1,t], the two proposed algorithms have communication costs Õ(ns) and Õ(n + s) (Õ hides a polylogarithmic factor), almost matching their lower bounds, Ω(ns) and Ω(n + s), in the message passing and the blackboard models, respectively. More importantly, we prove that at each time point in [1,t] our algorithms generate clustering quality nearly as good as that of centralizing all updates up to that time and then applying a standard centralized clustering algorithm. We conducted extensive experiments on both synthetic and real-life datasets, which confirmed the communication efficiency of our approach over baseline algorithms while achieving comparable clustering results.

