A Model for Weak Scaling to Many GPUs at the Basis of the Linpack Benchmark

Author(s):  
David Rohr ◽  
Jan De Cuveland ◽  
Volker Lindenstruth
Keyword(s):  
2018 ◽  
Vol 228 ◽  
pp. 03008
Author(s):  
Xuehua Liu ◽  
Liping Ding ◽  
Yanfeng Li ◽  
Guangxuan Chen ◽  
Jin Du

Register pressure is a well-known problem for compilers because of the mismatch between the unbounded number of pseudo registers and the finite number of hard registers. Excessive register pressure may result in register spilling, which in turn leads to performance degradation. Many compiler optimizations, especially loop optimizations, suffer from register spilling. To reduce register pressure and thereby improve the effectiveness of the compiler, this research takes register pressure into account during the loop unrolling transformation. In addition, a register-pressure-aware transformation can reduce the performance overhead of some fine-grained randomization transformations that can be used to defend against ROP attacks. Experiments showed a peak improvement of about 3.6% and an average improvement of about 1% for the SPEC CPU 2006 benchmarks, and a peak improvement of about 3% and an average improvement of about 1% for the LINPACK benchmark.


2009 ◽  
Vol 17 (1-2) ◽  
pp. 43-57 ◽  
Author(s):  
Michael Kistler ◽  
John Gunnels ◽  
Daniel Brokenshire ◽  
Brad Benton

In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™ architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs). The SPEs can be used as computational accelerators to augment the main PowerPC processor. The added computational capability of the SPEs results in a peak double precision floating point capability of 108.8 GFLOPS. We explain how we modified the standard open source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels and also explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM memory. Our implementation of Linpack also supports clusters of QS22s, and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.


Author(s):  
R. F. Barrett ◽  
T. H. F. Chan ◽  
E. F. D'Azevedo ◽  
E. F. Jaeger ◽  
K. Wong ◽  
...  

2011 ◽  
pp. 1033-1036 ◽  
Author(s):  
Jack Dongarra ◽  
Piotr Luszczek ◽  
Paul Feautrier ◽  
Field G. Zee ◽  
Ernie Chan ◽  
...  

2009 ◽  
Vol 2009 ◽  
pp. 1-9
Author(s):  
Manuel Saldaña ◽  
Emanuel Ramalho ◽  
Paul Chow

High-performance reconfigurable computers (HPRCs) provide a mix of standard processors and FPGAs to collectively accelerate applications. This introduces new design challenges, such as the need for portable programming models across HPRCs and system-level verification tools. To address the need for cosimulating a complete heterogeneous application using both software and hardware in an HPRC, we have created a tool called the Message-passing Simulation Framework (MSF). We have used it to simulate and develop an interface enabling an MPI-based approach to exchange data between X86 processors and hardware engines inside FPGAs. The MSF can also be used as an application development tool that enables multiple FPGAs in simulation to exchange messages amongst themselves and with X86 processors. As an example, we simulate a LINPACK benchmark hardware core using an Intel-FSB-Xilinx-FPGA platform to quickly prototype the hardware, to test the communications, and to verify the benchmark results.


2003 ◽  
Vol 15 (9) ◽  
pp. 803-820 ◽  
Author(s):  
Jack J. Dongarra ◽  
Piotr Luszczek ◽  
Antoine Petitet

2009 ◽  
Vol 53 (5) ◽  
pp. 9:1-9:11 ◽  
Author(s):  
M. Kistler ◽  
J. Gunnels ◽  
D. Brokenshire ◽  
B. Benton
