Retargeting sequential image-processing programs for data parallel execution

A data-parallel implementation of geometric operations is proposed, and its main results are proved. The analysis shows that the computational complexity of the data-parallel implementation scheme presented in this paper is O(M+N). The scheme can be used to improve the efficiency of geometric operations and can easily meet the real-time requirements of digital image processing.
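
The abstract does not describe how the scheme reaches O(M+N), so the following is only a generic illustration, not the paper's method. It is a minimal Python sketch, assuming nearest-neighbour sampling and a zero fill value, of a geometric operation written as an inverse mapping. Each output pixel is computed from the source image alone, so output rows can be processed in any order or concurrently. The helper names (`geometric_transform`, `rotate90_cw`, `translate`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Image = List[List[int]]
# Hypothetical type: maps an output coordinate (row, col) to a source coordinate.
InverseMap = Callable[[int, int], Tuple[int, int]]


def transform_row(src: Image, inv: InverseMap, row: int, width: int, fill: int) -> List[int]:
    """Compute one output row; each pixel depends only on the source image."""
    src_h, src_w = len(src), len(src[0])
    out = []
    for col in range(width):
        r, c = inv(row, col)
        # Nearest-neighbour sampling; out-of-range coordinates get the fill value.
        if 0 <= r < src_h and 0 <= c < src_w:
            out.append(src[r][c])
        else:
            out.append(fill)
    return out


def geometric_transform(src: Image, inv: InverseMap, height: int, width: int,
                        fill: int = 0, workers: int = 4) -> Image:
    """Apply a geometric operation by inverse mapping, one task per output row.

    No output pixel depends on another, so rows may be computed in any order
    or concurrently; the thread pool only illustrates that freedom.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = pool.map(lambda r: transform_row(src, inv, r, width, fill), range(height))
        return list(rows)


def rotate90_cw(src: Image) -> Image:
    """Rotate clockwise by 90 degrees: output (i, j) comes from source (H-1-j, i)."""
    h, w = len(src), len(src[0])
    return geometric_transform(src, lambda i, j: (h - 1 - j, i), height=w, width=h)


def translate(src: Image, dy: int, dx: int) -> Image:
    """Shift the image by (dy, dx); uncovered pixels are filled with 0."""
    h, w = len(src), len(src[0])
    return geometric_transform(src, lambda i, j: (i - dy, j - dx), height=h, width=w)
```

For example, `rotate90_cw([[1, 2, 3], [4, 5, 6]])` returns `[[4, 1], [5, 2], [6, 3]]`. The thread pool only makes the independence of rows explicit; in CPython, pure-Python threads will not give a real speedup, and a data-parallel machine would instead assign pixels or rows to its processing elements.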

Comparative Evaluation and Case Studies of Shared-Memory and Data-Parallel Execution Patterns

Scientific Programming, 1999, Vol 7 (1), pp. 1-19. DOI: 10.1155/1999/468372

Author(s): Xiaodong Zhang, Lin Sun

Keyword(s): Linear System, Shared Memory, Interconnection Networks, Parallel Execution, Parallel Model, Scientific Applications, Data Parallel, High Level, Access Patterns, Structured Program

Shared-memory and data-parallel programming models are two important paradigms for scientific applications. Both models provide high-level program abstractions and simple, uniform views of network structures. These common features significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns differ significantly between the two models, both because of their programming constraints and because of the different and complex structures of the interconnection networks and systems that support them. We performed this experimental study to present the implications of, and comparisons between, the execution patterns of the two models on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared-memory model on the KSR-1 and the data-parallel model on the CM-5. Our objectives are to examine the changes in execution patterns required to transform an implementation from one model to the other; to study memory access patterns; to address scalability issues; and to investigate the relative costs, advantages, and disadvantages of using the two models for scientific computations. Our results indicate that, as the systems and problems are scaled, the EM program tends to become computation-intensive on the KSR-1 shared-memory system and memory-demanding on the CM-5 data-parallel system. In the data-parallel model on the CM-5, the EM program, a highly data-parallel program, performed extremely well, whereas the linear system solver, a highly control-structured program, suffered significantly. Our study provides further evidence that matching the execution patterns of algorithms to parallel architectures can achieve better performance.
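
To make the contrast between the two programming models concrete, here is a minimal Python sketch on a hypothetical stand-in computation (repeated neighbour averaging on a 1-D array), not the EM program or linear solver from the study. In the data-parallel version each step is a single whole-array expression; in the shared-memory version the programmer partitions the array among threads and synchronizes them with a barrier. The function names are illustrative assumptions.

```python
import threading
from typing import List


def smooth_data_parallel(u: List[float], steps: int) -> List[float]:
    """Data-parallel style: each step is one whole-array operation.

    Interior points become the average of their two neighbours; boundary
    values stay fixed. The comprehension plays the role of a single array
    statement, with no explicit threads or synchronization.
    Assumes len(u) >= 2.
    """
    for _ in range(steps):
        u = [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]
    return u


def smooth_shared_memory(u: List[float], steps: int, nthreads: int = 3) -> List[float]:
    """Shared-memory style: threads update disjoint slices of shared arrays.

    The programmer partitions the index space and inserts barriers so that
    no thread starts step k+1 before every thread has finished step k.
    """
    cur = list(u)
    nxt = list(u)  # boundary values are copied once and never overwritten
    n = len(u)
    barrier = threading.Barrier(nthreads)

    def worker(tid: int) -> None:
        nonlocal cur, nxt
        # Static block partition of the interior points 1 .. n-2.
        lo = 1 + tid * (n - 2) // nthreads
        hi = 1 + (tid + 1) * (n - 2) // nthreads
        for _ in range(steps):
            for i in range(lo, hi):
                nxt[i] = (cur[i - 1] + cur[i + 1]) / 2
            # Exactly one thread receives 0 and swaps the shared buffers.
            if barrier.wait() == 0:
                cur, nxt = nxt, cur
            barrier.wait()  # all threads see the swapped buffers before continuing

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cur
```

Both functions return the same values; the shared-memory version exposes the partitioning and synchronization that the data-parallel version leaves implicit, which hints at why the two models can show different execution and overhead patterns.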

Extracting an explicitly data-parallel representation of image-processing programs

10th Working Conference on Reverse Engineering, 2003 (WCRE 2003), Proceedings, 2004. DOI: 10.1109/wcre.2003.1287234. Cited by ~1

Author(s): L. Baumstark, M. Guler, L. Wills

Keyword(s): Image Processing, Data Parallel

A Hybrid Shared Memory Execution Model for a Data Parallel Language with I/O

Parallel Processing Letters, 2008, Vol 18 (01), pp. 23-37. DOI: 10.1142/s012962640800320x. Cited by ~1

Author(s): Clemens Grelck, Steffen Kuthe, Sven-Bodo Scholz

Keyword(s): Program Analysis, Flow Characteristics, Parallel Execution, Parallel Language, Data Parallel, Execution Model, Execution Mode, Parallel Code, Execution Models, Compilation Techniques

We propose a novel execution model for the implicitly parallel execution of data-parallel programs in the presence of general I/O operations. The model is called hybrid because it combines the advantages of the two standard execution models, fork/join and SPMD. Based on program analysis, the hybrid model adapts itself to one or the other at the granularity of individual instructions. We outline compilation techniques that systematically derive the organization of parallel code from data flow characteristics, aiming to reduce execution mode switches in general and synchronization/communication requirements in particular. Experiments based on a prototype implementation show that the hybrid execution model is effective at reducing parallel overhead.
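
The abstract contrasts the fork/join and SPMD execution models. The following minimal Python sketch shows how the two differ for an assumed toy program with one I/O statement between two data-parallel operations; Python threads stand in for processors, and a list named `log` stands in for an output stream. The hybrid model's per-instruction switching and the compilation techniques are not reproduced, and all names are hypothetical.

```python
import threading
from typing import Callable, List


def chunk(n: int, tid: int, p: int) -> range:
    """Indices owned by worker tid under a static block distribution."""
    return range(tid * n // p, (tid + 1) * n // p)


def fork_join(data: List[int], p: int, log: List[str]) -> List[int]:
    """Fork/join: the master runs sequential code and I/O itself and forks
    p workers only for each data-parallel operation."""
    def par_map(f: Callable[[int], int], src: List[int]) -> List[int]:
        dst = [0] * len(src)

        def work(tid: int) -> None:
            for i in chunk(len(src), tid, p):
                dst[i] = f(src[i])

        ts = [threading.Thread(target=work, args=(t,)) for t in range(p)]
        for t in ts:  # fork
            t.start()
        for t in ts:  # join
            t.join()
        return dst

    a = par_map(lambda x: x * x, data)
    log.append(f"sum={sum(a)}")  # I/O executed once, by the master
    return par_map(lambda v: v + 1, a)


def spmd(data: List[int], p: int, log: List[str]) -> List[int]:
    """SPMD: every worker runs the whole program on its own chunk; global
    steps need barriers, and I/O must be guarded so that it happens once."""
    n = len(data)
    a = [0] * n
    b = [0] * n
    barrier = threading.Barrier(p)

    def program(tid: int) -> None:
        mine = chunk(n, tid, p)
        for i in mine:
            a[i] = data[i] * data[i]
        barrier.wait()  # all of a must be ready before it is summed
        if tid == 0:
            log.append(f"sum={sum(a)}")  # only one worker performs the I/O
        # No barrier needed here: b[i] only reads a[i], which is already final.
        for i in mine:
            b[i] = a[i] + 1

    ts = [threading.Thread(target=program, args=(t,)) for t in range(p)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return b
```

Both versions produce the same result and a single log entry. Fork/join pays a fork and a join for every data-parallel operation, while SPMD avoids that cost but must guard sequential code such as I/O; this sketch only shows the trade-off the hybrid model is described as balancing.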

A Framework for Distributed Data-Parallel Execution in the Kepler Scientific Workflow System

Procedia Computer Science, 2012, Vol 9, pp. 1620-1629. DOI: 10.1016/j.procs.2012.04.178. Cited by ~13

Author(s): Jianwu Wang, Daniel Crawl, Ilkay Altintas

Keyword(s): Scientific Workflow, Parallel Execution, Distributed Data, Data Parallel, Workflow System

Parallel Image Processing with the Block Data Parallel Architecture

IBM Journal of Research and Development, 2000, Vol 44 (5), pp. 681-702. DOI: 10.1147/rd.445.0681. Cited by ~4

Author(s): W. E. Alexander, D. S. Reeves, C. S. Gloster

Keyword(s): Image Processing, Parallel Architecture, Data Parallel, Parallel Image Processing, Parallel Image, Block Data

Reconfigurable FPGA Based Soft-Core Processor for SIMD Applications

Asian Journal of Pharmaceutical and Clinical Research, 2017, Vol 10 (13), p. 180. DOI: 10.22159/ajpcr.2017.v10s1.19632

Author(s): Maheswari R, Pattabiraman V, Sharmila P

Keyword(s): Image Processing, Image Quality, Reconfigurable Computing, High Performance, Critical Path, Computational Cost, Parallel Execution, Soft Core, Frame Rate, Time Data

Objective: The anticipated need for SIMD (Single Instruction, Multiple Data) applications such as video and image processing in a single system requires greater computational flexibility to deliver high-quality real-time data. This paper analyzes a high-performance, FPGA (Field Programmable Gate Array) based Reconfigurable OpenRISC1200 (ROR) soft-core processor for SIMD.

Methods: The ROR1200 improves performance through data-level parallelism, executing SIMD instructions simultaneously in HPRC (High Performance Reconfigurable Computing) with reduced resource utilization through an RRF (Reconfigurable Register File) with multiple core functionalities. This work analyzes the functionality of the reconfigurable architecture by implementing two image processing operations: image convolution and image quality improvement. The MAC (Multiply-Accumulate) unit of the ROR1200 is used to perform image convolution, and the execution unit with HPRC is used for image quality improvement.

Result: With parallel execution on multiple cores, the proposed processor improves image quality by doubling the frame rate to up to 60 fps (frames per second), with a peak power consumption of 400 mW. The processor thus achieves a computational cost of 12 ms at a refresh rate of 60 Hz, with a MAC critical path delay of 1.29 ns.

Conclusion: This FPGA-based processor is a feasible solution for portable embedded SIMD applications that need high performance at reduced power consumption.
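
The abstract states that the MAC unit performs image convolution. The following minimal Python sketch, not tied to the ROR1200 hardware, shows how a 2-D convolution decomposes entirely into multiply-accumulate steps. The "valid" output size (no padding), the kernel flip of true convolution, and the names `mac` and `convolve2d` are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[int]]


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def convolve2d(img: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution built only from MAC steps.

    The kernel is flipped, as in true convolution. Each output pixel needs
    kh * kw MAC operations and is independent of every other output pixel,
    which is the data-level parallelism a SIMD unit can exploit.
    """
    ih, iw = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for m in range(kh):
                for n in range(kw):
                    acc = mac(acc, img[y + m][x + n], kernel[kh - 1 - m][kw - 1 - n])
            row.append(acc)
        out.append(row)
    return out
```

For instance, convolving a 3x3 image with a 3x3 kernel of ones reduces to nine MAC steps that sum the whole image; on SIMD hardware, several such output pixels would be accumulated in parallel.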

Parallel image processing with the block data parallel architecture

Proceedings of the IEEE, 1996, Vol 84 (7), pp. 947-968. DOI: 10.1109/5.503297. Cited by ~7

Author(s): W. E. Alexander, D. S. Reeves, C. S. Gloster

Keyword(s): Image Processing, Parallel Architecture, Data Parallel, Parallel Image Processing, Parallel Image, Block Data

What's your neural function, visual narrative conjunction? Grammar, meaning, and fluency in sequential image processing

Cognitive Research: Principles and Implications, 2017, Vol 2 (1). DOI: 10.1186/s41235-017-0064-5. Cited by ~2

Author(s): Neil Cohn, Marta Kutas

Keyword(s): Image Processing, Neural Function, Visual Narrative, Sequential Image
