Programming the Linpack Benchmark for the IBM PowerXCell 8i Processor

2009 ◽  
Vol 17 (1-2) ◽  
pp. 43-57 ◽  
Author(s):  
Michael Kistler ◽  
John Gunnels ◽  
Daniel Brokenshire ◽  
Brad Benton

In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™ architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs). The SPEs can be used as computational accelerators to augment the main PowerPC processor. The added computational capability of the SPEs results in a peak double precision floating point capability of 108.8 GFLOPS. We explain how we modified the standard open source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels and also explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM memory. Our implementation of Linpack also supports clusters of QS22s, and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.
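The figures quoted in this abstract can be related to one another with a little arithmetic. The sketch below is illustrative only: it assumes that the 108.8 GFLOPS peak applies to a single PowerXCell 8i processor (so a two-processor QS22 blade would peak at twice that), and the variable names are hypothetical rather than taken from the paper.

```python
# Figures quoted in the abstract.
PEAK_PER_PROCESSOR_GFLOPS = 108.8   # ASSUMPTION: peak is per processor, not per blade
PROCESSORS_PER_BLADE = 2            # a QS22 incorporates two PowerXCell 8i processors
BLADE_LINPACK_GFLOPS = 170.7        # measured on one QS22
CLUSTER_BLADES = 84
CLUSTER_LINPACK_GFLOPS = 11.1e3     # 11.1 TFLOPS expressed in GFLOPS


def efficiency(achieved_gflops, peak_gflops):
    """Fraction of theoretical peak that the benchmark actually reached."""
    return achieved_gflops / peak_gflops


blade_peak = PEAK_PER_PROCESSOR_GFLOPS * PROCESSORS_PER_BLADE
cluster_peak = blade_peak * CLUSTER_BLADES

blade_eff = efficiency(BLADE_LINPACK_GFLOPS, blade_peak)
cluster_eff = efficiency(CLUSTER_LINPACK_GFLOPS, cluster_peak)

print(f"blade peak:   {blade_peak:.1f} GFLOPS, efficiency {blade_eff:.1%}")
print(f"cluster peak: {cluster_peak:.1f} GFLOPS, efficiency {cluster_eff:.1%}")
```

Under that assumption, the single-blade result sits at a little under four fifths of peak, and the 84-blade result at roughly three fifths.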

2018 ◽  
Vol 2018 ◽  
pp. 1-12
Author(s):  
Wenqi Chen ◽  
Hui Tian ◽  
Chin-Chen Chang ◽  
Fulin Nan ◽  
Jing Lu

Cloud storage, one of the core services of cloud computing, provides an effective way to solve the problems of storage and management caused by high-speed data growth. Thus, a growing number of organizations and individuals tend to store their data in the cloud. However, due to the separation of data ownership and management, it is difficult for users to check the integrity of data in the traditional way. Therefore, many researchers focus on developing protocols that can remotely check the integrity of data in the cloud. In this paper, we propose a novel public auditing protocol based on the adjacency-hash table, in which dynamic auditing and data updating are more efficient than in state-of-the-art protocols. Moreover, with such an authentication structure, computation and communication costs can be reduced effectively. The security analysis and performance evaluation based on comprehensive experiments demonstrate that our protocol can achieve all the desired properties and outperform state-of-the-art protocols in computing overheads for updating and verification.


Integrated-optics devices in lithium niobate have reached a significant maturity in recent years, and several complex devices have been demonstrated. In addition to performing modulation of light in fibre-optic transmission systems, lithium niobate devices currently offer the only components for photonic switching. Thus lithium niobate devices can be used as spatial, temporal and wavelength switches in high-speed and low-speed systems. In these systems electronic signals control the lithium niobate switches, which process the optical information and which are optically interfaced to optical fibres. Hence I am not concerned with all-optical switching. Examples of applications are multiplexing and demultiplexing of high-speed data streams, bit-by-bit or word-by-word switching in, for example, time-space-time stages or in access couplers in high-speed bus systems. Switch arrays, generally operating at lower speeds (below 1 GHz), can be used for network rearrangement, digital crossconnect, protection switching and generally in situations where the frequency and code transparency of the devices can be used to advantage. The status of lithium niobate devices for switching is reviewed, and performance limitations (including those imposed by polarization properties) and trade-offs are discussed, emphasizing time- and space-switching devices and applications.


Nanophotonics ◽  
2020 ◽  
Vol 9 (15) ◽  
pp. 4579-4588
Author(s):  
Chenghao Feng ◽  
Zhoufeng Ying ◽  
Zheng Zhao ◽  
Jiaqi Gu ◽  
David Z. Pan ◽  
...  

Integrated photonics offers attractive solutions for realizing combinational logic for high-performance computing. The integrated photonic chips can be further optimized using multiplexing techniques such as wavelength-division multiplexing (WDM). In this paper, we propose a WDM-based electronic–photonic switching network (EPSN) to realize the functions of the binary decoder and the multiplexer, which are fundamental elements in microprocessors for data transportation and processing. We experimentally demonstrate its practicality by implementing a 3–8 (three inputs, eight outputs) switching network operating at 20 Gb/s. Detailed performance analysis and performance enhancement techniques are also given in this paper.
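For readers unfamiliar with the two logic functions named here, the following sketch shows what a 3-to-8 binary decoder and an 8-to-1 multiplexer compute. It models only the Boolean behaviour; it says nothing about how the EPSN realizes these functions optically, and the function names are hypothetical.

```python
def decode_3_to_8(a2, a1, a0):
    """3-to-8 binary decoder: exactly one of eight outputs is 1 (one-hot)."""
    index = (a2 << 2) | (a1 << 1) | a0
    return [1 if i == index else 0 for i in range(8)]


def mux_8_to_1(data, s2, s1, s0):
    """8-to-1 multiplexer built from the decoder: pass the selected input."""
    select = decode_3_to_8(s2, s1, s0)
    # AND each data bit with its select line, then OR the results together.
    return int(any(d and s for d, s in zip(data, select)))


if __name__ == "__main__":
    print(decode_3_to_8(1, 0, 1))
    print(mux_8_to_1([0, 1, 0, 0, 0, 0, 0, 0], 0, 0, 1))
```

Building the multiplexer on top of the decoder mirrors the usual textbook construction, in which the select lines are decoded first and then gate the data inputs.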


Pipelining overlaps the execution of multiple instructions so that hardware units are used more efficiently and total execution time is reduced. This paper presents the design and implementation of a 6-stage pipelined architecture for a high-performance 64-bit Microprocessor without Interlocked Pipeline Stages (MIPS) based Reduced Instruction Set Computing (RISC) processor. In this work, a pre-fetching unit, a forwarding unit, a branch and jump prediction unit, and a hazard unit are combined to reduce hazards. A low-power unit is used to minimize power consumption. Cache memories, other devices, and especially balanced pipeline stages optimize the speed. A DDR4 SDRAM (Double Data Rate type 4 Synchronous Dynamic Random Access Memory) controller is employed in this pipeline to achieve high-speed data transfers and to manage the entire system efficiently. Low-power, low-delay flip-flops are used in the pipeline registers, which implicitly enhances the performance of the system. The proposed method provides better results compared to the existing models. The simulation and synthesis results of the proposed architecture are evaluated with Xilinx 14.7 software, and supporting graphs are plotted with MATLAB.
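The benefit of overlapping instructions can be illustrated with the standard textbook cycle count for an idealized pipeline. This is a generic sketch, not the paper's model: it assumes one cycle per stage and equal stage latencies, and the `stall_cycles` parameter is a hypothetical stand-in for the hazard penalties that forwarding and prediction units aim to reduce.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages, stall_cycles=0):
    """Fill the pipeline once, then retire one instruction per cycle.

    stall_cycles is the total number of bubbles caused by hazards
    (an assumed input, not a figure from the paper).
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


def speedup(n_instructions, n_stages, stall_cycles=0):
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages, stall_cycles
    )


if __name__ == "__main__":
    # A 6-stage pipeline on 1000 instructions, without and with stalls.
    print(speedup(1000, 6))
    print(speedup(1000, 6, stall_cycles=200))
```

For long instruction streams the ideal speedup approaches the number of stages, and every stall cycle pulls it further below that bound, which is why hazard reduction matters.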

