Retargeting sequential image-processing programs for data parallel execution

2005 ◽  
Vol 31 (2) ◽  
pp. 116-136 ◽  
Author(s):  
L.B. Baumstark ◽  
L.M. Wills
2014 ◽  
Vol 519-520 ◽  
pp. 719-723
Author(s):  
Guang Wang

A data-parallel implementation of geometric operations is proposed, and the conclusions are proved. The computational complexity of the data-parallel implementation scheme presented in this paper is O(M+N). The scheme can be used to improve the efficiency of geometric operations and can readily meet the real-time requirements of digital image processing.


1999 ◽  
Vol 7 (1) ◽  
pp. 1-19
Author(s):  
Xiaodong Zhang ◽  
Lin Sun

Shared‐memory and data‐parallel programming models are two important paradigms for scientific applications. Both models provide high‐level program abstractions, and simple and uniform views of network structures. The common features of the two models significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns differ significantly between the two models because of their programming constraints and because of the different and complex structures of the interconnection networks and systems that support them. We performed this experimental study to present implications and comparisons of execution patterns on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared‐memory model on the KSR‐1 and the data‐parallel model on the CM‐5. Our objectives are to examine the execution pattern changes required for an implementation transformation between the two models; to study memory access patterns; to address scalability issues; and to investigate the relative costs, advantages, and disadvantages of using the two models for scientific computations. Our results indicate that the EM program tends to become computation‐intensive in the KSR‐1 shared‐memory system, and memory‐demanding in the CM‐5 data‐parallel system, when the systems and the problems are scaled. The EM program, a highly data‐parallel program, performed extremely well, and the linear system solver, a highly control‐structured program, suffered significantly in the data‐parallel model on the CM‐5. Our study provides further evidence that matching the execution patterns of algorithms to parallel architectures achieves better performance.


2008 ◽  
Vol 18 (01) ◽  
pp. 23-37 ◽  
Author(s):  
CLEMENS GRELCK ◽  
STEFFEN KUTHE ◽  
SVEN-BODO SCHOLZ

We propose a novel execution model for the implicitly parallel execution of data parallel programs in the presence of general I/O operations. This model is called hybrid because it combines the advantages of the two standard execution models, fork/join and SPMD. Based on program analysis, the hybrid model adapts itself to one or the other at the granularity of individual instructions. We outline compilation techniques that systematically derive the organization of parallel code from data flow characteristics, aiming to reduce execution mode switches in general and synchronization/communication requirements in particular. Experiments based on a prototype implementation show the effectiveness of the hybrid execution model for reducing parallel overhead.
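
For readers unfamiliar with the two baseline execution models named in this abstract, the following is a minimal sketch of each, written with Python threads. It is not the authors' hybrid model or their compilation technique; the function names `fork_join_map`, `spmd_map`, and `square`, and the choice of four workers, are hypothetical and chosen only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Stand-in for the per-element work of a data parallel operation."""
    return x * x


def fork_join_map(data, workers=4):
    """Fork/join: a master forks tasks for the parallel region, then joins."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map forks one task per element; leaving the with-block joins.
        return list(pool.map(square, data))


def spmd_map(data, workers=4):
    """SPMD: every worker runs the same program on a slice chosen by its rank."""
    result = [None] * len(data)

    def program(rank):
        # Each worker handles indices rank, rank + workers, rank + 2*workers, ...
        for i in range(rank, len(data), workers):
            result[i] = square(data[i])

    threads = [threading.Thread(target=program, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

In the fork/join version only the master thread exists outside the parallel region, whereas in the SPMD version all workers execute the whole program and differ only by rank.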


2017 ◽  
Vol 10 (13) ◽  
pp. 180
Author(s):  
Maheswari R ◽  
Pattabiraman V ◽  
Sharmila P

Objective: The prospective need for SIMD (Single Instruction, Multiple Data) applications such as video and image processing on a single system requires greater flexibility in computation to deliver high-quality real-time data. This paper analyzes an FPGA (Field Programmable Gate Array) based high-performance Reconfigurable OpenRISC1200 (ROR) soft-core processor for SIMD. Methods: The ROR1200 improves performance through data-level parallelism, executing SIMD instructions simultaneously in HPRC (High Performance Reconfigurable Computing) at reduced resource utilization through an RRF (Reconfigurable Register File) with multiple core functionalities. This work analyzes the functionality of the reconfigurable architecture by illustrating the implementation of two image processing operations: image convolution and image quality improvement. The MAC (Multiply-Accumulate) unit of the ROR1200 is used to perform image convolution, and the execution unit with HPRC is used for image quality improvement. Result: With parallel execution on multiple cores, the proposed processor improves image quality by doubling the frame rate up to 60 fps (frames per second) with a peak power consumption of 400 mW. The processor achieves a computational cost of 12 ms with a refresh rate of 60 Hz and a MAC critical path delay of 1.29 ns. Conclusion: This FPGA-based processor is a feasible solution for portable embedded SIMD-based applications that need high performance at reduced power consumption.
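
The abstract mentions image convolution carried out on a multiply-accumulate unit. As a rough software illustration only, and not the ROR1200 implementation, the sketch below writes a small 2D convolution as explicit multiply-accumulate steps. The function name `convolve2d` and the "valid" border handling are assumptions made for this example.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution expressed as multiply-accumulate steps.

    `image` and `kernel` are lists of equal-length rows of numbers.
    Output pixels are mutually independent, which is what makes the
    operation a candidate for data-level parallelism.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # Flip the kernel in both axes so this is a convolution, not a correlation.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for y in range(ih - kh + 1):
        out_row = []
        for x in range(iw - kw + 1):
            acc = 0  # accumulator, playing the role of the MAC register
            for j in range(kh):
                for i in range(kw):
                    # One multiply-accumulate step per kernel tap.
                    acc += image[y + j][x + i] * flipped[j][i]
            out_row.append(acc)
        out.append(out_row)
    return out
```

A 3x3 image convolved with a 2x2 kernel yields a 2x2 result under this border handling; a SIMD unit would compute several of the inner multiply-accumulate steps, or several output pixels, at once.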


1996 ◽  
Vol 84 (7) ◽  
pp. 947-968 ◽  
Author(s):  
W.E. Alexander ◽  
D.S. Reeves ◽  
C.S. Gloster
