Spout: a transparent distributed execution engine for Java applets

Interest has grown in running computing experiments on shared-nothing clusters of commodity machines, where multiple systems run in parallel and work closely together toward the same goal. In our experience, the distributed execution engine MapReduce frequently handles the primary input/output workload for such clusters. There are numerous distributed file systems, viz. NTFS, ReFS, FAT, and FAT32 on Windows and Linux; we studied them and implemented a few distributed file systems. Distributed file systems (DFS) have been found to work very well on many small files, but some do not produce the expected output on large files. We implemented benchmark tests on each distributed file system for small and large files, and present the analysis in this paper. We also describe the implementation issues we encountered with the various DFS.


DtCraft: A distributed execution engine for compute-intensive applications

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2017
DOI: 10.1109/iccad.2017.8203853
Cited By ~ 2
Author(s): Tsung-Wei Huang, Chun-Xun Lin, Martin D. F. Wong
Keyword(s): Execution Engine, Distributed Execution


Rumble

Proceedings of the VLDB Endowment, 2020, Vol 14 (4), pp. 498-506
DOI: 10.14778/3436905.3436910
Cited By ~ 1
Author(s): Ingo Müller, Ghislain Fourny, Stefan Irimescu, Can Berker Cikis, Gustavo Alonso
Keyword(s): Data Model, Data Sets, Complex Data, Recursive Structure, Tabular Data, Impedance Mismatch, Execution Engine, Complex Data Sets, Distributed Execution, Data Independence

This paper introduces Rumble, a query execution engine for large, heterogeneous, and nested collections of JSON objects built on top of Apache Spark. While data sets of this type are increasingly widespread, most existing tools are built around a tabular data model, creating an impedance mismatch for both the engine and the query interface. In contrast, Rumble uses JSONiq, a standardized language specifically designed for querying JSON documents. The key challenge in the design and implementation of Rumble is mapping the recursive structure of JSON documents and JSONiq queries onto Spark's execution primitives based on tabular data frames. Our solution is to translate a JSONiq expression into a tree of iterators that dynamically switch between local and distributed execution modes depending on the nesting level. By overcoming the impedance mismatch in the engine, Rumble frees the user from solving the same problem for every single query, thus increasing their productivity considerably. As we show in extensive experiments, Rumble is able to scale to large and complex data sets in the terabyte range with a similar or better performance than other engines. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does on highly structured tables.
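The idea of an iterator tree whose nodes switch between local and distributed execution can be illustrated with a small sketch. This is not Rumble's code or API: the class names (`RuntimeIterator`, `Collection`, `Literal`, `Filter`), the `run` helper, and the sample data are all hypothetical, and "distributed" mode is only simulated with a list of partitions so that the example runs without Spark.

```python
from typing import Any, Callable, Iterator, List

# Hypothetical sketch: "distributed" data is simulated as a list of partitions.
Partitions = List[List[Any]]


class RuntimeIterator:
    """A node of the iterator tree; every node can at least run locally."""

    def is_distributed(self) -> bool:
        return False

    def local(self) -> Iterator[Any]:
        raise NotImplementedError

    def partitions(self) -> Partitions:
        raise NotImplementedError("this node only supports local execution")


class Collection(RuntimeIterator):
    """Leaf for a large top-level collection, held as partitions."""

    def __init__(self, parts: Partitions) -> None:
        self.parts = parts

    def is_distributed(self) -> bool:
        return True

    def partitions(self) -> Partitions:
        return self.parts

    def local(self) -> Iterator[Any]:
        for part in self.parts:
            yield from part


class Literal(RuntimeIterator):
    """Leaf for a small in-memory sequence, such as a nested array."""

    def __init__(self, items: List[Any]) -> None:
        self.items = items

    def local(self) -> Iterator[Any]:
        return iter(self.items)


class Filter(RuntimeIterator):
    """Keeps matching items; inherits its execution mode from its child."""

    def __init__(self, child: RuntimeIterator,
                 predicate: Callable[[Any], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def is_distributed(self) -> bool:
        return self.child.is_distributed()

    def partitions(self) -> Partitions:
        # Per-partition work, standing in for a distributed transformation.
        return [[x for x in part if self.predicate(x)]
                for part in self.child.partitions()]

    def local(self) -> Iterator[Any]:
        return (x for x in self.child.local() if self.predicate(x))


def run(node: RuntimeIterator) -> List[Any]:
    """Pick the execution mode at run time, based on the node itself."""
    if node.is_distributed():
        return [x for part in node.partitions() for x in part]
    return list(node.local())


def has_two_expensive_items(order: dict) -> bool:
    # Nested level: the inner filter sees a small array and runs locally.
    expensive = Filter(Literal(order["items"]), lambda price: price > 40)
    return len(run(expensive)) == 2


orders = Collection([[{"id": 1, "items": [5, 50]}],
                     [{"id": 2, "items": [70, 80]}]])
# Top level: the outer filter runs per partition ("distributed").
result = run(Filter(orders, has_two_expensive_items))
print(result)
```

In this sketch the outer `Filter` takes the partitioned path because its child is a `Collection`, while the inner `Filter` over each order's nested `items` array takes the local path. The query selects only the order with `id` 2.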


DtCraft: A High-Performance Distributed Execution Engine at Scale

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019, Vol 38 (6), pp. 1070-1083
DOI: 10.1109/tcad.2018.2834422
Cited By ~ 1
Author(s): Tsung-Wei Huang, Chun-Xun Lin, Martin D. F. Wong
Keyword(s): High Performance, Execution Engine, Distributed Execution


Lightweight Distributed Execution Engine for Large-Scale Spatial Join Query Processing

2015 IEEE International Congress on Big Data, 2015
DOI: 10.1109/bigdatacongress.2015.30
Cited By ~ 5
Author(s): Jianting Zhang, Simin You, Le Gruenwald
Keyword(s): Query Processing, Large Scale, Spatial Join, Execution Engine, Distributed Execution


J-CO: A Platform-Independent Framework for Managing Geo-Referenced JSON Data Sets

Electronics, 2021, Vol 10 (5), pp. 621
DOI: 10.3390/electronics10050621
Author(s): Giuseppe Psaila, Paolo Fosci
Keyword(s): Query Language, Open Data, Internet Technology, Data Sets, Specific Storage, Current State, Execution Engine, Share Data, Cloud Servers, Computational Resources

Internet technology and mobile technology have enabled producing and diffusing massive data sets concerning almost every aspect of day-to-day life. Remarkable examples are social media and apps for volunteered information production, as well as Open Data portals on which public administrations publish authoritative and (often) geo-referenced data sets. In this context, JSON has become the most popular standard for representing and exchanging possibly geo-referenced data sets over the Internet. Analysts wishing to manage, integrate, and cross-analyze such data sets need a framework that allows them to access possibly remote storage systems for JSON data sets, and to retrieve and query data sets by means of a single query language (independent of the specific storage technology), exploiting possibly remote computational resources (such as cloud servers) while comfortably working on their PC in their office, more or less unaware of the real location of resources. In this paper, we present the current state of the J-CO Framework, a platform-independent and analyst-oriented software framework to manipulate and cross-analyze possibly geo-tagged JSON data sets. The paper presents the general approach behind the J-CO Framework, illustrating the query language by means of a simple, yet non-trivial, example of geographical cross-analysis. The paper also presents the novel features introduced by the re-engineered version of the execution engine and the most recent components, i.e., the storage service for large single JSON documents and the user interface that allows analysts to comfortably share data sets and computational resources with other analysts possibly working in different parts of the world. Finally, the paper reports the results of an experimental campaign, which show that the execution engine performs in a more than satisfactory way, demonstrating that our framework can be used by analysts to process JSON data sets.
