DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines

Author(s):  
Mihai Budiu ◽  
Daniel Delling ◽  
Renato F. Werneck

2020 ◽
Vol 34 (04) ◽  
pp. 3817-3824
Author(s):  
Aritra Dutta ◽  
El Houcine Bergou ◽  
Ahmed M. Abdelmoniem ◽  
Chen-Yu Ho ◽  
Atal Narayan Sahu ◽  
...  

Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there is a discrepancy between theory and practice: while the theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model, many practical implementations operate on the gradients of each layer individually. In this paper, we prove that layer-wise compression is, in theory, better, because its convergence rate is upper bounded by that of entire-model compression for a wide range of biased and unbiased compression methods. However, despite this theoretical bound, our experimental study of six well-known methods shows that convergence, in practice, may or may not be better, depending on the model being trained and the compression ratio. Our findings suggest that it would be advantageous for deep learning frameworks to support both layer-wise and entire-model compression.
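To make the distinction concrete, the following is a minimal NumPy sketch, not the authors' implementation, of the two compression granularities using top-k sparsification as the example compressor. The function names and the choice of top-k are illustrative assumptions; the abstract covers a wider range of biased and unbiased methods.

```python
import numpy as np

def top_k_sparsify(x, ratio):
    """Keep the `ratio` fraction of entries with largest magnitude; zero the rest.

    Illustrative compressor only; the paper studies several compression methods.
    """
    k = max(1, int(ratio * x.size))
    flat = x.ravel()
    out = np.zeros_like(flat)
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = flat[idx]
    return out.reshape(x.shape)

def compress_entire_model(grads, ratio):
    """Entire-model compression: concatenate all layers' gradients, sparsify once."""
    shapes = [g.shape for g in grads]
    sizes = [g.size for g in grads]
    stacked = np.concatenate([g.ravel() for g in grads])
    sparse = top_k_sparsify(stacked, ratio)
    splits = np.cumsum(sizes)[:-1]  # boundaries to split the flat vector per layer
    return [s.reshape(sh) for s, sh in zip(np.split(sparse, splits), shapes)]

def compress_layer_wise(grads, ratio):
    """Layer-wise compression: sparsify each layer's gradient independently."""
    return [top_k_sparsify(g, ratio) for g in grads]
```

The practical difference this exposes: entire-model compression selects the top entries from the global magnitude distribution, so one layer with large gradients can crowd out all others, while layer-wise compression guarantees each layer retains a fixed fraction of its own coordinates.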


Author(s):  
Lauritz Thamsen ◽  
Jossekin Beilharz ◽  
Vinh Thuy Tran ◽  
Sasho Nedelkoski ◽  
Odej Kao

1997 ◽  
Vol 95 (1) ◽  
pp. 13
Author(s):  
Martin Schütz ◽  
Roland Lindh

1999 ◽  
Vol 7 (1) ◽  
pp. 1-19
Author(s):  
Xiaodong Zhang ◽  
Lin Sun

Shared-memory and data-parallel programming models are two important paradigms for scientific applications. Both models provide high-level program abstractions and simple, uniform views of network structures. These common features significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns differ significantly between the two models, due to their programming constraints and due to the different, complex structures of the interconnection networks and systems that support them. We performed this experimental study to present implications and comparisons of execution patterns on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared-memory model on the KSR-1 and the data-parallel model on the CM-5. Our objectives are to examine the execution pattern changes required to transform an implementation between the two models; to study memory access patterns; to address scalability issues; and to investigate the relative costs and advantages/disadvantages of using the two models for scientific computations. Our results indicate that, as the systems and problems are scaled, the EM program tends to become computation-intensive on the KSR-1 shared-memory system and memory-demanding on the CM-5 data-parallel system. The EM program, a highly data-parallel program, performed extremely well, while the linear system solver, a highly control-structured program, suffered significantly in the data-parallel model on the CM-5. Our study provides further evidence that matching the execution patterns of algorithms to parallel architectures achieves better performance.
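The paradigm contrast can be sketched in modern terms with one Jacobi sweep for a linear system Ax = b, written in both styles. This is a minimal Python/NumPy illustration under assumed names; the study's actual codes targeted the KSR-1 and CM-5, and Python threads serialize on the GIL, so this shows the programming models, not their performance.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def jacobi_sweep_shared_memory(A, b, x, n_workers=4):
    """Shared-memory style: workers update disjoint row blocks of one shared array."""
    n = len(b)
    x_new = np.empty_like(x)

    def update_rows(lo, hi):
        for i in range(lo, hi):  # each worker owns a contiguous block of rows
            s = A[i] @ x - A[i, i] * x[i]      # off-diagonal contribution of row i
            x_new[i] = (b[i] - s) / A[i, i]

    bounds = np.linspace(0, n, n_workers + 1, dtype=int)  # block boundaries
    with ThreadPoolExecutor(n_workers) as pool:
        # list(...) forces completion and surfaces worker exceptions
        list(pool.map(lambda w: update_rows(bounds[w], bounds[w + 1]),
                      range(n_workers)))
    return x_new

def jacobi_sweep_data_parallel(A, b, x):
    """Data-parallel style: one whole-array expression, no explicit loops or threads."""
    d = np.diag(A)
    return (b - (A @ x - d * x)) / d
```

The shared-memory version makes the work decomposition and per-row control flow explicit, while the data-parallel version expresses the same sweep as whole-array operations, mirroring the control-structured versus data-parallel contrast the study observed between the solver and the EM program.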


2020 ◽  
Vol 12 (1) ◽  
pp. 125
Author(s):  
Ittetsu Taniguchi ◽  
Hiroyuki Tomiyama ◽  
Lin Meng ◽  
Yang Liu
