DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines

Author(s):  
Mihai Budiu ◽  
Daniel Delling ◽  
Renato F. Werneck

2020 ◽
Vol 34 (04) ◽  
pp. 3817-3824
Author(s):  
Aritra Dutta ◽  
El Houcine Bergou ◽  
Ahmed M. Abdelmoniem ◽  
Chen-Yu Ho ◽  
Atal Narayan Sahu ◽  
...  

Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there is a discrepancy between theory and practice: while the theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model, many practical implementations operate on the gradients of each layer individually. In this paper, we prove that layer-wise compression is, in theory, better, because its convergence rate is upper bounded by that of entire-model compression for a wide range of biased and unbiased compression methods. However, despite this theoretical bound, our experimental study of six well-known methods shows that convergence, in practice, may or may not be better, depending on the model being trained and the compression ratio. Our findings suggest that it would be advantageous for deep learning frameworks to support both layer-wise and entire-model compression.
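To make the distinction concrete, the following is a minimal NumPy sketch, not the authors' implementation, of the two compression granularities using top-k sparsification as the example compressor. The function names and the choice of top-k are illustrative assumptions; the abstract covers a wider range of biased and unbiased methods.

```python
import numpy as np

def top_k_sparsify(x, ratio):
    """Keep the `ratio` fraction of entries with largest magnitude; zero the rest.

    Illustrative compressor only; the paper studies several compression methods.
    """
    k = max(1, int(ratio * x.size))
    flat = x.ravel()
    out = np.zeros_like(flat)
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = flat[idx]
    return out.reshape(x.shape)

def compress_entire_model(grads, ratio):
    """Entire-model compression: concatenate all layers' gradients, sparsify once."""
    shapes = [g.shape for g in grads]
    sizes = [g.size for g in grads]
    stacked = np.concatenate([g.ravel() for g in grads])
    sparse = top_k_sparsify(stacked, ratio)
    splits = np.cumsum(sizes)[:-1]  # boundaries to split the flat vector per layer
    return [s.reshape(sh) for s, sh in zip(np.split(sparse, splits), shapes)]

def compress_layer_wise(grads, ratio):
    """Layer-wise compression: sparsify each layer's gradient independently."""
    return [top_k_sparsify(g, ratio) for g in grads]
```

The practical difference this exposes: entire-model compression selects the top entries from the global magnitude distribution, so one layer with large gradients can crowd out all others, while layer-wise compression guarantees each layer retains a fixed fraction of its own coordinates.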


Author(s):  
Lauritz Thamsen ◽  
Jossekin Beilharz ◽  
Vinh Thuy Tran ◽  
Sasho Nedelkoski ◽  
Odej Kao

1997 ◽  
Vol 95 (1) ◽  
pp. 13
Author(s):  
Martin Schütz ◽  
Roland Lindh

1999 ◽  
Vol 7 (1) ◽  
pp. 1-19
Author(s):  
Xiaodong Zhang ◽  
Lin Sun

Shared-memory and data-parallel programming models are two important paradigms for scientific applications. Both models provide high-level program abstractions and simple, uniform views of network structures. These common features significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns differ significantly between the two models, due to their programming constraints and due to the different, complex structures of the interconnection networks and systems that support them. We performed this experimental study to present implications and comparisons of execution patterns on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared-memory model on the KSR-1 and the data-parallel model on the CM-5. Our objectives are to examine the execution pattern changes required to transform an implementation between the two models; to study memory access patterns; to address scalability issues; and to investigate the relative costs and advantages/disadvantages of using the two models for scientific computations. Our results indicate that, as the systems and problems are scaled, the EM program tends to become computation-intensive on the KSR-1 shared-memory system and memory-demanding on the CM-5 data-parallel system. The EM program, a highly data-parallel program, performed extremely well, while the linear system solver, a highly control-structured program, suffered significantly in the data-parallel model on the CM-5. Our study provides further evidence that matching the execution patterns of algorithms to parallel architectures achieves better performance.
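The paradigm contrast can be sketched in modern terms with one Jacobi sweep for a linear system Ax = b, written in both styles. This is a minimal Python/NumPy illustration under assumed names; the study's actual codes targeted the KSR-1 and CM-5, and Python threads serialize on the GIL, so this shows the programming models, not their performance.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def jacobi_sweep_shared_memory(A, b, x, n_workers=4):
    """Shared-memory style: workers update disjoint row blocks of one shared array."""
    n = len(b)
    x_new = np.empty_like(x)

    def update_rows(lo, hi):
        for i in range(lo, hi):  # each worker owns a contiguous block of rows
            s = A[i] @ x - A[i, i] * x[i]      # off-diagonal contribution of row i
            x_new[i] = (b[i] - s) / A[i, i]

    bounds = np.linspace(0, n, n_workers + 1, dtype=int)  # block boundaries
    with ThreadPoolExecutor(n_workers) as pool:
        # list(...) forces completion and surfaces worker exceptions
        list(pool.map(lambda w: update_rows(bounds[w], bounds[w + 1]),
                      range(n_workers)))
    return x_new

def jacobi_sweep_data_parallel(A, b, x):
    """Data-parallel style: one whole-array expression, no explicit loops or threads."""
    d = np.diag(A)
    return (b - (A @ x - d * x)) / d
```

The shared-memory version makes the work decomposition and per-row control flow explicit, while the data-parallel version expresses the same sweep as whole-array operations, mirroring the control-structured versus data-parallel contrast the study observed between the solver and the EM program.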


2020 ◽  
Vol 12 (1) ◽  
pp. 125
Author(s):  
Ittetsu Taniguchi ◽  
Hiroyuki Tomiyama ◽  
Lin Meng ◽  
Yang Liu
