A Holistic Approach for High-level Programming of Next-generation Data-intensive Applications Targeting Distributed Heterogeneous Computing Environment

2016
Vol 97
pp. 131-134
Author(s):
Emanuele Carlini
Patrizio Dazzi
Matteo Mordacchini

Author(s):
Robert Searles
Michela Taufer
Sunita Chandrasekaran
Stephen Herbein
Travis Johnston

Informatics
2020
Vol 7 (3)
pp. 29
Author(s):
Davy Preuveneers
Wouter Joosen

This paper presents the architecture, implementation, and evaluation of a middleware support layer for NoSQL storage systems. Our middleware automatically selects performance and scalability tactics based on application-specific workloads. Enterprises are turning to NoSQL storage technologies for their data-intensive computing and analytics applications. Comprehensive benchmarks of different Big Data platforms can help drive decisions about which solutions to adopt. However, selecting the best-performing technology, configuring the deployment for scalability, and tuning parameters for optimal service delivery remain challenging tasks, especially when application workloads evolve over time. Our middleware solves this problem at runtime by monitoring data growth, changes in the read-write-query mix, and other system metrics that are indicative of sub-optimal performance. It employs supervised machine learning on historic and current monitoring information, together with the corresponding configurations, to select the best combinations of high-level tactics and adapt NoSQL systems to evolving workloads. This work has been driven by two real-world case studies with different QoS requirements. The evaluation demonstrates that our middleware can adapt to unseen workloads of data-intensive applications and automate the configuration of different families of NoSQL systems at runtime to optimize the performance and scalability of such applications.
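To make the tactic-selection step concrete, the sketch below shows how supervised learning could map monitored workload features to a high-level reconfiguration tactic. It is a minimal illustration of the idea only: the feature names, tactic labels, and training data are hypothetical and are not taken from the authors' middleware.

```python
# Minimal sketch, assuming hypothetical features and tactic labels:
# a supervised model maps monitored workload snapshots to the high-level
# tactic that performed best under similar conditions in the past.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Historic monitoring snapshots: [data_size_gb, read_ratio, write_ratio,
# query_ratio, p99_latency_ms] -> best-performing tactic observed offline.
history = np.array([
    [ 50, 0.80, 0.15, 0.05, 12.0],
    [500, 0.30, 0.60, 0.10, 85.0],
    [900, 0.20, 0.20, 0.60, 40.0],
])
best_tactic = ["read_cache", "write_sharding", "secondary_index"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(history, best_tactic)

def select_tactic(snapshot):
    """Pick a reconfiguration tactic for the current workload snapshot."""
    return model.predict(np.asarray(snapshot).reshape(1, -1))[0]

# At runtime the middleware would poll metrics and adapt, e.g.:
print(select_tactic([600, 0.25, 0.55, 0.20, 90.0]))
```

A real deployment would keep retraining or updating such a model as new monitoring data and configuration outcomes accumulate, which is what lets it follow workloads that evolve over time.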


2009
Vol 17 (1-2)
pp. 113-134
Author(s):
Ana Lucia Varbanescu
Alexander S. van Amesfoort
Tim Cornwell
Ger van Diepen
Rob van Nieuwpoort
...

The performance potential of the Cell/B.E., as well as its availability, has attracted a lot of attention from various high-performance computing (HPC) fields. While computation-intensive kernels have proved to be exceptionally well suited for running on the Cell, irregular data-intensive applications are usually considered poor matches. In this paper, we present our complete solution for enabling such a data-intensive application to run efficiently on the Cell/B.E. processor. Specifically, we target radioastronomy data gridding and degridding, two closely related imaging filters based on convolutional resampling. Our solution is based on building a high-level application model, used to evaluate parallelization alternatives. Next, we choose the alternative with the best performance potential, and we gradually exploit this potential by applying platform-specific and application-specific optimizations. After several iterations, our target application shows a speed-up factor between 10 and 20 on a dual-Cell blade when compared with the original application running on a commodity machine. Given these results, and based on our empirical observations, we pinpoint a set of ten guidelines for parallelizing similar applications on the Cell/B.E. Finally, we conclude that the Cell/B.E. can provide high performance for data-intensive applications, at the price of increased programming effort and with significant aid from aggressive application-specific optimizations.
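For readers unfamiliar with convolutional resampling, the following NumPy sketch shows the core gridding kernel in its simplest serial form; the grid size, kernel support, and data layout are illustrative assumptions, and degridding is the reverse operation (reading samples back from the grid). The scattered, data-dependent writes into the grid are what make the kernel irregular and data-intensive.

```python
# A simplified sketch of convolutional gridding, written in plain NumPy
# for clarity; all sizes and sample values here are illustrative.
import numpy as np

def grid(vis, u, v, kernel, grid_size=256):
    """Accumulate visibility samples onto a regular grid, smearing each
    sample over the support of a small convolution kernel."""
    g = np.zeros((grid_size, grid_size), dtype=complex)
    half = kernel.shape[0] // 2
    for s in range(len(vis)):
        iu, iv = int(round(u[s])), int(round(v[s]))
        # scattered write: add kernel * sample around grid point (iu, iv)
        g[iv - half:iv + half + 1, iu - half:iu + half + 1] += kernel * vis[s]
    return g

# Example: three samples and a 5x5 tapered kernel
k = np.outer(np.hanning(5), np.hanning(5))
vis = np.array([1 + 1j, 0.5j, 2.0])
u = np.array([100.0, 120.5, 130.2])
v = np.array([80.0, 90.1, 60.7])
image_plane_grid = grid(vis, u, v, k)
```

Because consecutive samples touch unrelated grid neighbourhoods, the memory access pattern is hard to predict, which is precisely what makes such kernels challenging on the Cell's software-managed local stores.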


Author(s):  
Zhiming Zhao
Paola Grosso
Jeroen van der Ham
Cees Th. A.M. de Laat

Moving large quantities of data between distributed parties is a frequently invoked process in data-intensive applications, such as collaborative digital media development. These transfers often place high quality requirements on the network services, especially when they involve user interaction or require real-time processing of large volumes of data. The best-effort services provided by IP-routed networks give only limited guarantees on delivery performance. Advanced networks such as hybrid networks make it feasible for high-level applications, such as workflows, to request network paths and service provisioning. However, the quality of network services has so far rarely been considered in composing and executing workflow processes; applications tune execution quality by selecting optimal software services and computing resources while neglecting the network components. In this chapter, the authors provide an overview of this research domain and introduce a system called NEtWork QoS Planner (NEWQoSPlanner) that supports the inclusion of network services in high-level workflow applications.
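The following Python sketch illustrates the chapter's central idea of treating the network as a first-class, selectable resource during workflow planning; the path catalogue, requirement fields, and selection rule are hypothetical and do not reflect NEWQoSPlanner's actual interface.

```python
# Illustrative sketch: filter candidate network paths against a workflow
# task's QoS requirements instead of assuming best-effort IP delivery.
# All path data and thresholds below are made-up examples.
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    bandwidth_gbps: float  # provisionable bandwidth
    latency_ms: float      # round-trip latency

def plan_transfer(paths, min_bw_gbps, max_latency_ms):
    """Return the most modest path that still satisfies the task's
    network QoS requirements, or None if no path qualifies."""
    feasible = [p for p in paths
                if p.bandwidth_gbps >= min_bw_gbps
                and p.latency_ms <= max_latency_ms]
    return min(feasible, key=lambda p: p.bandwidth_gbps, default=None)

candidates = [
    Path("best-effort-IP", 1.0, 40.0),
    Path("lightpath-A", 10.0, 8.0),
    Path("lightpath-B", 40.0, 5.0),
]
# A real-time media transfer needing 8 Gb/s and under 10 ms latency:
print(plan_transfer(candidates, 8.0, 10.0))  # -> lightpath-A
```

Picking the smallest feasible path mirrors the planning goal of meeting the transfer's requirements without over-provisioning the hybrid network.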


2019
Vol 12 (7)
pp. 3001-3015
Author(s):
Shahbaz Memon
Dorothée Vallot
Thomas Zwinger
Jan Åström
Helmut Neukirchen
...  

Abstract. Scientific computing applications involving complex simulations and data-intensive processing are often composed of multiple tasks forming a workflow of computing jobs. Scientific communities running such applications on computing resources often find it cumbersome to manage and monitor the execution of these tasks and their associated data. These workflow implementations usually add overhead by introducing unnecessary input/output (I/O) for coupling the models and can lead to sub-optimal CPU utilization. Furthermore, running these workflow implementations in different environments requires significant adaptation efforts, which can hinder the reproducibility of the underlying science. High-level scientific workflow management systems (WMS) can be used to automate and simplify complex task structures by providing tooling for the composition and execution of workflows, even across distributed and heterogeneous computing environments. The WMS approach allows users to focus on the underlying high-level workflow and avoid low-level pitfalls that would lead to non-optimal resource usage, while still allowing the workflow to remain portable between different computing environments. As a case study, we apply the UNICORE workflow management system to couple a glacier flow model and a calving model in a workflow with many tasks and dependencies, ranging from pre-processing and data management to repetitive executions on heterogeneous high-performance computing (HPC) resources. Using the UNICORE workflow management system, the composition, management, and execution of the glacier modelling workflow become easier with respect to usage, monitoring, maintenance, reusability, portability, and reproducibility in different environments and by different user groups. Last but not least, the workflow helps to speed up the runs by reducing model-coupling I/O overhead, and it optimizes CPU utilization by avoiding idle CPU cores and running the models in a distributed way on the HPC cluster that best fits the characteristics of each model.
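As a rough illustration of the case study's control flow, the sketch below expresses the coupled flow-calving loop as plain Python stand-ins; the task names and bodies are placeholders, not UNICORE's workflow description language, and the real system runs each task as a separate HPC job with managed file staging in between.

```python
# Schematic sketch of the coupled glacier workflow loop; every task body
# here is a placeholder for a job the WMS would submit and monitor.
def preprocess_geometry():
    return {"cycle": 0}                      # initial mesh (placeholder)

def run_flow_model(mesh):
    return {"velocity": 1.0, **mesh}         # continuum ice-flow step

def run_calving_model(state):
    return {"calved": True, **state}         # discrete calving step

def remesh(state):
    state["cycle"] += 1                      # feed calving results back
    return state

def run_coupled_simulation(n_cycles=3):
    """One WMS-managed loop: flow model -> calving model -> remesh.
    Letting the WMS stage files between steps avoids the ad hoc
    coupling I/O the abstract identifies as overhead."""
    mesh = preprocess_geometry()
    for _ in range(n_cycles):
        velocity = run_flow_model(mesh)
        geometry = run_calving_model(velocity)
        mesh = remesh(geometry)
    return mesh

print(run_coupled_simulation())
```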


Author(s):  
Dana Petcu
Horacio González-Vélez
Bogdan Nicolae
Juan Miguel García-Gómez
Elies Fuster-Garcia
...  

Author(s):  
Robert Searles
Stephen Herbein
Travis Johnston
Michela Taufer
Sunita Chandrasekaran
