A Holistic Approach for High-level Programming of Next-generation Data-intensive Applications Targeting Distributed Heterogeneous Computing Environment

2016
Vol 97
pp. 131-134
Author(s):
Emanuele Carlini
Patrizio Dazzi
Matteo Mordacchini

Author(s):
Robert Searles
Michela Taufer
Sunita Chandrasekaran
Stephen Herbein
Travis Johnston

Informatics
2020
Vol 7 (3)
pp. 29
Author(s):
Davy Preuveneers
Wouter Joosen

This paper presents the architecture, implementation, and evaluation of a middleware support layer for NoSQL storage systems. Our middleware automatically selects performance and scalability tactics based on application-specific workloads. Enterprises are turning to NoSQL storage technologies for their data-intensive computing and analytics applications. Comprehensive benchmarks of different Big Data platforms can help drive decisions about which solutions to adopt. However, selecting the best-performing technology, configuring the deployment for scalability, and tuning parameters for optimal service delivery remain challenging tasks, especially when application workloads evolve over time. Our middleware solves this problem at runtime by monitoring data growth, changes in the read-write-query mix, and other system metrics that are indicative of sub-optimal performance. It employs supervised machine learning on historic and current monitoring information, together with the corresponding configurations, to select the best combinations of high-level tactics and adapt NoSQL systems to evolving workloads. This work has been driven by two real-world case studies with different QoS requirements. The evaluation demonstrates that our middleware can adapt to unseen workloads of data-intensive applications and automate the configuration of different families of NoSQL systems at runtime to optimize the performance and scalability of such applications.
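To make the tactic-selection step concrete, the sketch below shows how supervised learning could map monitored workload features to a high-level reconfiguration tactic. It is a minimal illustration of the idea only: the feature names, tactic labels, and training data are hypothetical and are not taken from the authors' middleware.

```python
# Minimal sketch, assuming hypothetical features and tactic labels:
# a supervised model maps monitored workload snapshots to the high-level
# tactic that performed best under similar conditions in the past.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Historic monitoring snapshots: [data_size_gb, read_ratio, write_ratio,
# query_ratio, p99_latency_ms] -> best-performing tactic observed offline.
history = np.array([
    [ 50, 0.80, 0.15, 0.05, 12.0],
    [500, 0.30, 0.60, 0.10, 85.0],
    [900, 0.20, 0.20, 0.60, 40.0],
])
best_tactic = ["read_cache", "write_sharding", "secondary_index"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(history, best_tactic)

def select_tactic(snapshot):
    """Pick a reconfiguration tactic for the current workload snapshot."""
    return model.predict(np.asarray(snapshot).reshape(1, -1))[0]

# At runtime the middleware would poll metrics and adapt, e.g.:
print(select_tactic([600, 0.25, 0.55, 0.20, 90.0]))
```

A real deployment would keep retraining or updating such a model as new monitoring data and configuration outcomes accumulate, which is what lets it follow workloads that evolve over time.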


2009
Vol 17 (1-2)
pp. 113-134
Author(s):
Ana Lucia Varbanescu
Alexander S. van Amesfoort
Tim Cornwell
Ger van Diepen
Rob van Nieuwpoort
...

The performance potential of the Cell/B.E., as well as its availability, has attracted a lot of attention from various high-performance computing (HPC) fields. While computation-intensive kernels have proved to be exceptionally well suited for running on the Cell, irregular data-intensive applications are usually considered poor matches. In this paper, we present our complete solution for enabling such a data-intensive application to run efficiently on the Cell/B.E. processor. Specifically, we target radioastronomy data gridding and degridding, two closely related imaging filters based on convolutional resampling. Our solution is based on building a high-level application model, used to evaluate parallelization alternatives. Next, we choose the alternative with the best performance potential, and we gradually exploit this potential by applying platform-specific and application-specific optimizations. After several iterations, our target application shows a speed-up factor between 10 and 20 on a dual-Cell blade when compared with the original application running on a commodity machine. Given these results, and based on our empirical observations, we pinpoint a set of ten guidelines for parallelizing similar applications on the Cell/B.E. Finally, we conclude that the Cell/B.E. can provide high performance for data-intensive applications, at the price of increased programming effort and with significant aid from aggressive application-specific optimizations.
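For readers unfamiliar with convolutional resampling, the following NumPy sketch shows the core gridding kernel in its simplest serial form; the grid size, kernel support, and data layout are illustrative assumptions, and degridding is the reverse operation (reading samples back from the grid). The scattered, data-dependent writes into the grid are what make the kernel irregular and data-intensive.

```python
# A simplified sketch of convolutional gridding, written in plain NumPy
# for clarity; all sizes and sample values here are illustrative.
import numpy as np

def grid(vis, u, v, kernel, grid_size=256):
    """Accumulate visibility samples onto a regular grid, smearing each
    sample over the support of a small convolution kernel."""
    g = np.zeros((grid_size, grid_size), dtype=complex)
    half = kernel.shape[0] // 2
    for s in range(len(vis)):
        iu, iv = int(round(u[s])), int(round(v[s]))
        # scattered write: add kernel * sample around grid point (iu, iv)
        g[iv - half:iv + half + 1, iu - half:iu + half + 1] += kernel * vis[s]
    return g

# Example: three samples and a 5x5 tapered kernel
k = np.outer(np.hanning(5), np.hanning(5))
vis = np.array([1 + 1j, 0.5j, 2.0])
u = np.array([100.0, 120.5, 130.2])
v = np.array([80.0, 90.1, 60.7])
image_plane_grid = grid(vis, u, v, k)
```

Because consecutive samples touch unrelated grid neighbourhoods, the memory access pattern is hard to predict, which is precisely what makes such kernels challenging on the Cell's software-managed local stores.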


Author(s):  
Zhiming Zhao
Paola Grosso
Jeroen van der Ham
Cees Th. A.M. de Laat

Moving large quantities of data between distributed parties is a frequently invoked process in data-intensive applications, such as collaborative digital media development. These transfers often place high quality requirements on the network services, especially when they involve user interaction or require real-time processing of large volumes of data. The best-effort services provided by IP-routed networks give only limited guarantees on delivery performance. Advanced networks such as hybrid networks make it feasible for high-level applications, such as workflows, to request network paths and service provisioning. However, the quality of network services has so far rarely been considered in composing and executing workflow processes; applications tune execution quality by selecting optimal software services and computing resources while neglecting the network components. In this chapter, the authors provide an overview of this research domain and introduce a system called NEtWork QoS Planner (NEWQoSPlanner) that supports the inclusion of network services in high-level workflow applications.
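The following Python sketch illustrates the chapter's central idea of treating the network as a first-class, selectable resource during workflow planning; the path catalogue, requirement fields, and selection rule are hypothetical and do not reflect NEWQoSPlanner's actual interface.

```python
# Illustrative sketch: filter candidate network paths against a workflow
# task's QoS requirements instead of assuming best-effort IP delivery.
# All path data and thresholds below are made-up examples.
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    bandwidth_gbps: float  # provisionable bandwidth
    latency_ms: float      # round-trip latency

def plan_transfer(paths, min_bw_gbps, max_latency_ms):
    """Return the most modest path that still satisfies the task's
    network QoS requirements, or None if no path qualifies."""
    feasible = [p for p in paths
                if p.bandwidth_gbps >= min_bw_gbps
                and p.latency_ms <= max_latency_ms]
    return min(feasible, key=lambda p: p.bandwidth_gbps, default=None)

candidates = [
    Path("best-effort-IP", 1.0, 40.0),
    Path("lightpath-A", 10.0, 8.0),
    Path("lightpath-B", 40.0, 5.0),
]
# A real-time media transfer needing 8 Gb/s and under 10 ms latency:
print(plan_transfer(candidates, 8.0, 10.0))  # -> lightpath-A
```

Picking the smallest feasible path mirrors the planning goal of meeting the transfer's requirements without over-provisioning the hybrid network.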


2019
Vol 12 (7)
pp. 3001-3015
Author(s):
Shahbaz Memon
Dorothée Vallot
Thomas Zwinger
Jan Åström
Helmut Neukirchen
...  

Abstract. Scientific computing applications involving complex simulations and data-intensive processing are often composed of multiple tasks forming a workflow of computing jobs. Scientific communities running such applications on computing resources often find it cumbersome to manage and monitor the execution of these tasks and their associated data. These workflow implementations usually add overhead by introducing unnecessary input/output (I/O) for coupling the models and can lead to sub-optimal CPU utilization. Furthermore, running these workflow implementations in different environments requires significant adaptation efforts, which can hinder the reproducibility of the underlying science. High-level scientific workflow management systems (WMS) can be used to automate and simplify complex task structures by providing tooling for the composition and execution of workflows, even across distributed and heterogeneous computing environments. The WMS approach allows users to focus on the underlying high-level workflow and avoid low-level pitfalls that would lead to non-optimal resource usage, while still allowing the workflow to remain portable between different computing environments. As a case study, we apply the UNICORE workflow management system to couple a glacier flow model and a calving model in a workflow with many tasks and dependencies, ranging from pre-processing and data management to repetitive executions on heterogeneous high-performance computing (HPC) resources. Using the UNICORE workflow management system, the composition, management, and execution of the glacier modelling workflow become easier with respect to usage, monitoring, maintenance, reusability, portability, and reproducibility in different environments and by different user groups. Last but not least, the workflow helps to speed up the runs by reducing model-coupling I/O overhead, and it optimizes CPU utilization by avoiding idle CPU cores and running the models in a distributed way on the HPC cluster that best fits the characteristics of each model.
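As a rough illustration of the case study's control flow, the sketch below expresses the coupled flow-calving loop as plain Python stand-ins; the task names and bodies are placeholders, not UNICORE's workflow description language, and the real system runs each task as a separate HPC job with managed file staging in between.

```python
# Schematic sketch of the coupled glacier workflow loop; every task body
# here is a placeholder for a job the WMS would submit and monitor.
def preprocess_geometry():
    return {"cycle": 0}                      # initial mesh (placeholder)

def run_flow_model(mesh):
    return {"velocity": 1.0, **mesh}         # continuum ice-flow step

def run_calving_model(state):
    return {"calved": True, **state}         # discrete calving step

def remesh(state):
    state["cycle"] += 1                      # feed calving results back
    return state

def run_coupled_simulation(n_cycles=3):
    """One WMS-managed loop: flow model -> calving model -> remesh.
    Letting the WMS stage files between steps avoids the ad hoc
    coupling I/O the abstract identifies as overhead."""
    mesh = preprocess_geometry()
    for _ in range(n_cycles):
        velocity = run_flow_model(mesh)
        geometry = run_calving_model(velocity)
        mesh = remesh(geometry)
    return mesh

print(run_coupled_simulation())
```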


Author(s):  
Dana Petcu
Horacio González-Vélez
Bogdan Nicolae
Juan Miguel García-Gómez
Elies Fuster-Garcia
...  

Author(s):  
Robert Searles
Stephen Herbein
Travis Johnston
Michela Taufer
Sunita Chandrasekaran
