Comparison of dataflow control techniques in distributed data-intensive systems

1988 ◽  
Vol 16 (1) ◽  
pp. 157-166 ◽  
Author(s):  
W. Alexander ◽  
G. Copeland
2012 ◽  
Vol 25 (12) ◽  
pp. 1784-1797 ◽  
Author(s):  
Yan Ma ◽  
Lizhe Wang ◽  
Dingsheng Liu ◽  
Tao Yuan ◽  
Peng Liu ◽  
...  

2011 ◽  
Vol 55-57 ◽  
pp. 1053-1057
Author(s):  
Gui De Zheng ◽  
Ming Chen

The next generation of scientific experiments and studies is being carried out by large collaborations of researchers distributed around the world, engaged in the analysis of huge collections of data generated by scientific instruments. Grid computing has emerged as an enabler for such collaborations, as it helps communities share resources to achieve common objectives. This paper defines the problem of scheduling distributed data-intensive applications onto Grid resources and presents a formal resource and application model for the problem.


2013 ◽  
Vol 29 (3) ◽  
pp. 739-750 ◽  
Author(s):  
Lizhe Wang ◽  
Jie Tao ◽  
Rajiv Ranjan ◽  
Holger Marten ◽  
Achim Streit ◽  
...  

Author(s):  
Rosa Filgueira ◽  
Amrey Krause ◽  
Malcolm Atkinson ◽  
Iraklis Klampanos ◽  
Alexander Moreno

This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.
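The idea of one abstract streaming workflow with several interchangeable mappings can be illustrated with a minimal sketch in plain Python. This is not the dispel4py API: the processing elements `square` and `keep_even`, the `PIPELINE` list, and the functions `run_sequential` and `run_threaded` are hypothetical names chosen for illustration, and only the standard library is assumed.

```python
import queue
import threading

# Hypothetical processing elements (PEs): each takes one item and
# returns a list of zero or more output items.
def square(x):
    return [x * x]

def keep_even(x):
    return [x] if x % 2 == 0 else []

# The abstract workflow is an ordered chain of PEs; it says nothing
# about how or where it will be executed.
PIPELINE = [square, keep_even]

def run_sequential(source, pipeline):
    """Sequential mapping: push each item through every PE in turn."""
    results = []
    for item in source:
        batch = [item]
        for pe in pipeline:
            batch = [out for x in batch for out in pe(x)]
        results.extend(batch)
    return results

_DONE = object()  # sentinel marking the end of the stream

def _worker(pe, inbox, outbox):
    """Apply one PE to every item on inbox, forwarding outputs to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        for out in pe(item):
            outbox.put(out)

def run_threaded(source, pipeline):
    """Multi-threaded mapping: one thread per PE, linked by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(pipeline) + 1)]
    threads = [
        threading.Thread(target=_worker, args=(pe, queues[i], queues[i + 1]))
        for i, pe in enumerate(pipeline)
    ]
    for t in threads:
        t.start()
    for item in source:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Calling `run_sequential(range(6), PIPELINE)` and `run_threaded(range(6), PIPELINE)` yields the same stream of results, which mirrors in miniature the claim that the workflow description stays fixed while the enactment platform changes.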

