distributed query processing
Recently Published Documents


TOTAL DOCUMENTS

136
(FIVE YEARS 4)

H-INDEX

17
(FIVE YEARS 0)

2021 ◽  
Vol 14 (13) ◽  
pp. 3308-3321
Author(s):  
Dimitrios Koutsoukos ◽  
Ingo Müller ◽  
Renato Marroquín ◽  
Ana Klimovic ◽  
Gustavo Alonso

The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasingly required due to the speed at which algorithms and hardware evolve. To address this limitation, we present Modularis, an execution layer for data analytics based on sub-operators, i.e., composable building blocks resembling traditional database operators but at a finer granularity. To demonstrate the feasibility and advantages of our approach, we use Modularis to build a distributed query processing system supporting relational queries running on an RDMA cluster, a serverless cloud platform, and a smart storage engine. Modularis requires minimal code changes to execute queries across these three diverse hardware platforms, showing that the sub-operator approach reduces the amount and complexity of the code to maintain. In fact, changes in the platform affect only those sub-operators that depend on the underlying hardware (in our use cases, mainly the sub-operators related to network communication). We show the end-to-end performance of Modularis by comparing it with a framework for SQL processing (Presto), a commercial cluster database (SingleStore), as well as Query-as-a-Service systems (Athena, BigQuery). Modularis outperforms all these systems, proving that the design and architectural advantages of a modular design can be achieved without degrading performance. We also compare Modularis with a hand-optimized implementation of a join for RDMA clusters. We show that Modularis has the advantage of being easily extensible to a wider range of join variants and group-by queries, none of which are supported in the hand-tuned join.
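
The sub-operator idea described in this abstract can be illustrated with a small sketch. This is not the Modularis API: every name below (`scan`, `local_exchange`, `build`, `probe`, `partitioned_join`) is hypothetical, and the "network" step is replaced by an in-memory partitioner. The sketch only shows how a join might be composed from finer-grained pieces so that the platform-dependent piece (the exchange) can be swapped without touching the rest.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); an assumption for this sketch


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Sub-operator: produce rows from an in-memory source."""
    yield from rows


def local_exchange(rows: Iterable[Row], num_partitions: int) -> List[List[Row]]:
    """Sub-operator standing in for the hardware-dependent data exchange.

    A real system would ship each partition over the network; here the
    rows are only hash-partitioned in memory.
    """
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[0]) % num_partitions].append(row)
    return parts


def build(rows: Iterable[Row]) -> Dict[int, List[str]]:
    """Sub-operator: build a hash table on the join key."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, payload in rows:
        table[key].append(payload)
    return table


def probe(table: Dict[int, List[str]], rows: Iterable[Row]) -> Iterator[Tuple[int, str, str]]:
    """Sub-operator: probe the hash table and emit matching pairs."""
    for key, payload in rows:
        for match in table.get(key, []):
            yield (key, match, payload)


def partitioned_join(
    left: Iterable[Row],
    right: Iterable[Row],
    exchange: Callable[[Iterable[Row], int], List[List[Row]]] = local_exchange,
    num_partitions: int = 4,
) -> List[Tuple[int, str, str]]:
    """Compose the sub-operators into a partitioned hash join.

    Only `exchange` would need to change when moving to another platform.
    """
    left_parts = exchange(scan(left), num_partitions)
    right_parts = exchange(scan(right), num_partitions)
    out: List[Tuple[int, str, str]] = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(probe(build(lp), rp))
    return out


left_rows = [(1, "a"), (2, "b"), (3, "c")]
right_rows = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
join_result = sorted(partitioned_join(left_rows, right_rows))
```

Passing a different `exchange` callable (for example, one that places all rows in a single partition) leaves `build` and `probe` untouched, which mirrors the claim that platform changes affect mainly the communication-related sub-operators.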


Author(s):  
Ralf Hartmut Güting ◽  
Thomas Behr ◽  
Jan Kristof Nidzwetzki

We propose a simple model for distributed query processing based on the concept of a distributed array. Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra. The arrays considered are one-dimensional and just serve to model a partitioned and distributed data set. Distributed arrays rest on a given set of data types and operations called the basic algebra, implemented by some piece of software called the basic engine. It provides a complete environment for query processing on a single machine. We assume this environment is extensible by types and operations. Operations on distributed arrays are implemented by one basic engine called the master, which controls a set of basic engines called the workers. It maps operations on distributed arrays to the respective operations on their fields executed by workers. The distributed algebra is completely generic: any type or operation added in the extensible basic engine will be immediately available for distributed query processing. To demonstrate the use of the distributed algebra as a language for distributed query processing, we describe a fairly complex algorithm for distributed density-based similarity clustering. The algorithm is a novel contribution by itself. Its complete implementation is shown in terms of the distributed algebra and the basic algebra. As a basic engine the Secondo system is used, a rich environment for extensible query processing, providing useful tools such as main memory M-trees, graphs, or a DBScan implementation.
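
A minimal sketch of the distributed-array idea follows, under loud assumptions: the class and method names (`DistributedArray`, `map_fields`, `collect`, `partition`) are hypothetical and are not Secondo operators, and a local thread pool stands in for worker engines on separate machines. It only illustrates a master mapping one operation onto every field of a one-dimensional array in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")
R = TypeVar("R")


class DistributedArray(Generic[T]):
    """One-dimensional array whose fields model partitions of a data set.

    Assumption: fields live in local memory; in a real system each field
    would be stored on, and processed by, a worker machine.
    """

    def __init__(self, fields: Iterable[T], num_workers: int = 2) -> None:
        self.fields: List[T] = list(fields)
        self.num_workers = num_workers

    def map_fields(self, op: Callable[[T], U]) -> "DistributedArray[U]":
        """Master role: apply `op` to every field in parallel on the workers."""
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            new_fields = list(pool.map(op, self.fields))  # keeps field order
        return DistributedArray(new_fields, self.num_workers)

    def collect(self, combine: Callable[[R, T], R], initial: R) -> R:
        """Fold the fields into one result on the master."""
        result = initial
        for field in self.fields:
            result = combine(result, field)
        return result


def partition(values: Iterable[int], num_fields: int) -> DistributedArray[List[int]]:
    """Distribute values round-robin over `num_fields` fields."""
    fields: List[List[int]] = [[] for _ in range(num_fields)]
    for i, value in enumerate(values):
        fields[i % num_fields].append(value)
    return DistributedArray(fields)


numbers = partition(range(10), num_fields=3)
squared = numbers.map_fields(lambda field: [x * x for x in field])
total = squared.collect(lambda acc, field: acc + sum(field), 0)
```

Because `map_fields` accepts any callable, an operation added to the single-machine side is usable on the array without further changes, which is the genericity property the abstract describes.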


2020 ◽  
Vol 30 (02) ◽  
pp. 2050007
Author(s):  
Abdulelah Algosaibi ◽  
Khaled Ragab ◽  
Saleh Albahli

In recent years, the rapid generation of data has driven the evolution of linked data. Modern data are globally distributed over semantically linked graphs. This distribution raises new demands for improving query performance on semantic graphs. In this work, we analyze latency as an important factor to be investigated and improved. We evaluate parallel computing on these distributed data in order to make better use of parallelism. We introduce a federation framework, based on a multi-threaded environment, that supports federated SPARQL queries. In our experiments, we show the feasibility and effectiveness of our model on a set of real-world queries over the real-world Linked Open Data cloud. A significant performance improvement was observed. Further, we highlight shortcomings that could open an avenue for research on federated queries. Keywords: Semantic web; distributed query processing; query federation; linked data; join methods.
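
The multi-threaded federation approach mentioned in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' framework: endpoints are modeled as plain Python callables returning lists of variable bindings (no real SPARQL endpoint or library is involved), and `federated_join` is a hypothetical helper that fetches sub-query results concurrently and then hash-joins them on a shared variable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Binding = Dict[str, object]          # one solution: variable name -> value
Endpoint = Callable[[], List[Binding]]  # stand-in for a remote sub-query


def federated_join(sources: List[Endpoint], join_var: str) -> List[Binding]:
    """Run every sub-query in its own thread, then join on `join_var`.

    Assumes `sources` is non-empty and every binding contains `join_var`.
    """
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(source) for source in sources]
        results = [future.result() for future in futures]

    joined = results[0]
    for bindings in results[1:]:
        # Build a hash index on the join variable for this source.
        index: Dict[object, List[Binding]] = {}
        for binding in bindings:
            index.setdefault(binding[join_var], []).append(binding)
        joined = [
            {**left, **right}
            for left in joined
            for right in index.get(left[join_var], [])
        ]
    return joined


def endpoint_a() -> List[Binding]:
    # Hypothetical data, standing in for one remote dataset.
    return [{"person": "alice", "city": "Oslo"}, {"person": "bob", "city": "Rome"}]


def endpoint_b() -> List[Binding]:
    # Hypothetical data, standing in for a second remote dataset.
    return [{"person": "alice", "age": 30}]


merged = federated_join([endpoint_a, endpoint_b], join_var="person")
```

Issuing the sub-queries concurrently means the total wait is bounded by the slowest endpoint rather than the sum of all endpoint latencies, which is the kind of latency reduction the abstract targets.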


The growth of heavy web usage has made business processes more difficult to manage. Analyzing the huge volume of unstructured online business data is not feasible with traditional systems. Recent advances have produced analysis methods that process large volumes of data using Big Data techniques. To improve the adaptability and precision of analyzing business strategies, the proposed approach is implemented on Hadoop with parallel processing. This paper presents an experimental study on real-time IBM data of one lakh (100,000) records to demonstrate the efficiency of the proposed Hadoop-based distributed query processing technique.

