Distributed Query Processing: Latest Research Papers

Modularis

Proceedings of the VLDB Endowment ◽

10.14778/3484224.3484229 ◽

2021 ◽

Vol 14 (13) ◽

pp. 3308-3321

Author(s):

Dimitrios Koutsoukos ◽

Ingo Müller ◽

Renato Marroquín ◽

Ana Klimovic ◽

Gustavo Alonso

Keyword(s):

Data Analytics ◽

Modular Design ◽

Processing System ◽

Building Blocks ◽

Short Term ◽

Distributed Query Processing ◽

Distributed Query ◽

Code Changes ◽

Hardware Platforms ◽

Cluster Database

The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasingly required due to the speed at which algorithms and hardware evolve. To address this limitation, we present Modularis, an execution layer for data analytics based on sub-operators, i.e., composable building blocks resembling traditional database operators but at a finer granularity. To demonstrate the feasibility and advantages of our approach, we use Modularis to build a distributed query processing system supporting relational queries running on an RDMA cluster, a serverless cloud platform, and a smart storage engine. Modularis requires minimal code changes to execute queries across these three diverse hardware platforms, showing that the sub-operator approach reduces the amount and complexity of the code to maintain. In fact, changes in the platform affect only those sub-operators that depend on the underlying hardware (in our use cases, mainly the sub-operators related to network communication). We show the end-to-end performance of Modularis by comparing it with a framework for SQL processing (Presto), a commercial cluster database (SingleStore), as well as Query-as-a-Service systems (Athena, BigQuery). Modularis outperforms all these systems, proving that the design and architectural advantages of a modular design can be achieved without degrading performance. We also compare Modularis with a hand-optimized implementation of a join for RDMA clusters. We show that Modularis has the advantage of being easily extensible to a wider range of join variants and group by queries, all of which are not supported in the hand-tuned join.
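
As a rough illustration of the sub-operator idea described in this abstract, the sketch below composes a distributed group-by sum from small building blocks and swaps only the exchange step between two simulated platforms. It is not Modularis code: every function name here (`scan`, `hash_partition`, `in_memory_exchange`, `serialized_exchange`, `local_sum`, `distributed_group_sum`) is a hypothetical stand-in, and the "platforms" are simulated inside one process.

```python
import pickle
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]
Outboxes = List[List[List[Row]]]  # outboxes[sender][receiver] -> rows


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Sub-operator: produce the rows held by one worker."""
    yield from rows


def hash_partition(rows: Iterable[Row], n: int) -> List[List[Row]]:
    """Sub-operator: split rows into n partitions by a deterministic key hash."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for key, value in rows:
        parts[zlib.crc32(key.encode()) % n].append((key, value))
    return parts


def in_memory_exchange(outboxes: Outboxes) -> List[List[Row]]:
    """Platform-dependent sub-operator (assumed): shuffle via shared memory."""
    n = len(outboxes)
    return [[row for sender in outboxes for row in sender[j]] for j in range(n)]


def serialized_exchange(outboxes: Outboxes) -> List[List[Row]]:
    """Platform-dependent sub-operator (assumed): shuffle via serialized
    messages, mimicking a network or storage hop with a pickle round trip."""
    n = len(outboxes)
    inboxes: List[List[Row]] = [[] for _ in range(n)]
    for sender in outboxes:
        for j, part in enumerate(sender):
            inboxes[j].extend(pickle.loads(pickle.dumps(part)))
    return inboxes


def local_sum(rows: Iterable[Row]) -> Dict[str, int]:
    """Sub-operator: sum values per key on one worker."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def distributed_group_sum(
    worker_inputs: List[List[Row]],
    exchange: Callable[[Outboxes], List[List[Row]]],
) -> Dict[str, int]:
    """Compose the sub-operators; only `exchange` varies per platform."""
    n = len(worker_inputs)
    outboxes = [hash_partition(scan(rows), n) for rows in worker_inputs]
    result: Dict[str, int] = {}
    for inbox in exchange(outboxes):
        # Each key lands in exactly one inbox, so the updates never collide.
        result.update(local_sum(inbox))
    return result


inputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
print(distributed_group_sum(inputs, in_memory_exchange))
print(distributed_group_sum(inputs, serialized_exchange))
```

Both calls run the same plan; only the function passed as `exchange` changes, which mirrors the abstract's claim that platform changes mainly touch the communication-related sub-operators.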


Compliant Geo-distributed Query Processing

Proceedings of the 2021 International Conference on Management of Data ◽

10.1145/3448016.3453687 ◽

2021 ◽

Author(s):

Kaustubh Beedkar ◽

Jorge-Arnulfo Quiané-Ruiz ◽

Volker Markl

Keyword(s):

Query Processing ◽

Distributed Query Processing ◽

Distributed Query


Distributed arrays: an algebra for generic distributed query processing

Distributed and Parallel Databases ◽

10.1007/s10619-021-07325-2 ◽

2021 ◽

Author(s):

Ralf Hartmut Güting ◽

Thomas Behr ◽

Jan Kristof Nidzwetzki

Keyword(s):

Query Processing ◽

Main Memory ◽

Basic Algebra ◽

Distributed Data ◽

Data Types ◽

Distributed Query Processing ◽

Data Set ◽

Distributed Query ◽

Basic Engine ◽

Distributed Arrays

We propose a simple model for distributed query processing based on the concept of a distributed array. Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra. The arrays considered are one-dimensional and just serve to model a partitioned and distributed data set. Distributed arrays rest on a given set of data types and operations called the basic algebra, implemented by some piece of software called the basic engine. It provides a complete environment for query processing on a single machine. We assume this environment is extensible by types and operations. Operations on distributed arrays are implemented by one basic engine called the master, which controls a set of basic engines called the workers. It maps operations on distributed arrays to the respective operations on their fields executed by workers. The distributed algebra is completely generic: any type or operation added in the extensible basic engine will be immediately available for distributed query processing. To demonstrate the use of the distributed algebra as a language for distributed query processing, we describe a fairly complex algorithm for distributed density-based similarity clustering. The algorithm is a novel contribution by itself. Its complete implementation is shown in terms of the distributed algebra and the basic algebra. As a basic engine the Secondo system is used, a rich environment for extensible query processing, providing useful tools such as main memory M-trees, graphs, or a DBScan implementation.
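
The master/worker mapping described in this abstract can be pictured with a small sketch, given here under loud assumptions: a thread pool stands in for the worker engines, a Python list stands in for the fields of a one-dimensional distributed array, and the names `DArray` and `dmap` are illustrative only and are not Secondo syntax.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass
class DArray(Generic[T]):
    """Hypothetical one-dimensional distributed array: one field per partition."""

    fields: List[T]
    workers: int = 2  # assumed number of simulated worker engines

    def dmap(self, op: Callable[[T], U]) -> "DArray[U]":
        """Master side: apply a single-machine operation to every field.

        Any callable works, which loosely mirrors the generic design: an
        operation available on one machine becomes usable on the array.
        """
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # pool.map returns results in field order, so field i of the
            # result corresponds to field i of the input.
            return DArray(list(pool.map(op, self.fields)), self.workers)


# A partitioned data set: three fields holding slices of one list of numbers.
data = DArray([[1, 2, 3], [4, 5], [6]])
partial_sums = data.dmap(sum)   # per-field operation, run by the workers
total = sum(partial_sums.fields)  # final combination on the master
print(partial_sums.fields, total)
```

The design point the sketch tries to capture is that `dmap` knows nothing about the operation it distributes; it only routes each field to a worker and collects the results in order.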


Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2021.0121237 ◽

2021 ◽

Vol 12 (12) ◽

Author(s):

Kiranmai MVSV ◽

D Haritha

Keyword(s):

Machine Learning ◽

Query Processing ◽

Large Scale ◽

Distributed Query Processing ◽

Distributed Query


Parallel-Based Techniques for Managing and Analyzing the Performance on Semantic Graph

Parallel Processing Letters ◽

10.1142/s0129626420500073 ◽

2020 ◽

Vol 30 (02) ◽

pp. 2050007

Author(s):

Abdulelah Algosaibi ◽

Khaled Ragab ◽

Saleh Albahli

Keyword(s):

Real World ◽

Linked Data ◽

Open Data ◽

Distributed Data ◽

Distributed Query Processing ◽

Semantic Graph ◽

Distributed Query ◽

Significant Performance ◽

Time Latency ◽

Federated Queries

In recent years, the rapid generation of data has advanced the evolution of linked data. Modern data are globally distributed over semantically linked graphs. The distributed nature of data over the semantic graph has raised new demands for investigating how to improve performance on semantic graphs. In this work, we analyzed time latency as an important factor to be further investigated and improved. We evaluated parallel computing on these distributed data in order to make better use of parallelism. A federation framework based on a multi-threaded environment supporting federated SPARQL queries was introduced. In our experiments, we show the achievability and effectiveness of our model on a set of real-world queries over the real-world Linked Open Data cloud. A significant performance improvement was observed. Further, we highlight shortcomings that could open an avenue for research on federated queries. Keywords: Semantic web; distributed query processing; query federation; linked data; join methods.


Distributed Query Processing in the Edge Assisted IoT Data Monitoring System

IEEE Internet of Things Journal ◽

10.1109/jiot.2020.3026988 ◽

2020 ◽

pp. 1-1

Author(s):

Zhipeng Cai ◽

Tuo Shi

Keyword(s):

Query Processing ◽

Monitoring System ◽

Data Monitoring ◽

Distributed Query Processing ◽

Distributed Query


Prospective Data Model and Distributed Query Processing for Mobile Sensing Data Streams

Lecture Notes in Computer Science - Multiple-Aspect Analysis of Semantic Trajectories ◽

10.1007/978-3-030-38081-6_6 ◽

2020 ◽

pp. 66-82

Author(s):

Mariem Brahem ◽

Karine Zeitouni ◽

Laurent Yeh ◽

Hafsa El Hafyani

Keyword(s):

Query Processing ◽

Data Streams ◽

Data Model ◽

Mobile Sensing ◽

Distributed Query Processing ◽

Prospective Data ◽

Sensing Data ◽

Distributed Query


Effective Visual Big Data Processing with Machine Learning Methodologies

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1316.10812s19 ◽

2019 ◽

Vol 8 (12S) ◽

pp. 1148-1152

Keyword(s):

Machine Learning ◽

Experimental Study ◽

Big Data ◽

Query Processing ◽

Processing Technique ◽

Time Data ◽

Distributed Query Processing ◽

Distributed Query ◽

Real Time Data ◽

Online Business

The growth of heavy web usage has made business processes more difficult to manage. Analyzing the unorganized and enormous volume of online business data is not feasible with traditional frameworks. Recent innovations have advanced analysis methods so that large amounts of data can be analyzed using Big Data techniques. To improve the adaptability and precision of analyzing business strategies, the approach is implemented on Hadoop with parallel processing. This paper presents an experimental study on IBM real-time data of one lakh (100,000) records to demonstrate the efficiency of the proposed Hadoop-based distributed query processing technique.
