VLIW-in-the-large: a model for fine grain parallelism exploitation on distributed memory multiprocessors

MICROTHREADING: A MODEL FOR DISTRIBUTED INSTRUCTION-LEVEL CONCURRENCY

Parallel Processing Letters, Vol 16 (02), pp. 209-228, 2006
DOI: 10.1142/s0129626406002587
Cited by ~12

Author(s): Chris Jesshope

Keyword(s): Distributed Memory; Chip Multiprocessors; Dynamic Scheduling; Register File; Support Structures; Promising Candidate; Data Parallel; Fine Grain; Distributed Instruction

This paper analyses the micro-threaded model of concurrency, comparing it with both data-level and instruction-level concurrency. The model is fine-grained and provides synchronisation in a distributed register file, making it a promising candidate for scalable chip multiprocessors. The micro-threaded model was first proposed in 1996 as a means of tolerating high latencies in data-parallel, distributed-memory multiprocessors. This paper explores the model's potential to provide the simultaneous issue of instructions required for chip multiprocessors, and discusses scalability with regard to both the support structures that implement the model and the communication needed to support it. The model supports deterministic distribution of code fragments and dynamic scheduling of instructions from within those fragments. The hardware also recognises different classes of variables from the register specifiers, which allows it to manage locality and optimise communication so that it is both efficient and scalable.
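
The abstract's central mechanism, synchronisation through the register file combined with dynamic scheduling of instructions from code fragments, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the design described in the paper: it assumes each register carries an empty/full state, that reading an empty register suspends the reading thread, and that a write wakes every thread suspended on that register. The names `SyncRegisterFile` and `run` and the two-opcode instruction format are hypothetical.

```python
from collections import deque


class SyncRegisterFile:
    """Toy register file in which every register is either EMPTY or FULL.

    Assumption for illustration: reading an EMPTY register suspends the
    reading thread, and writing the register wakes all threads waiting on it.
    """

    def __init__(self, size):
        self.values = [None] * size               # None encodes EMPTY
        self.waiters = [[] for _ in range(size)]  # suspended thread ids per register

    def read(self, reg, thread_id):
        """Return (True, value) if the register is FULL; otherwise suspend the thread."""
        if self.values[reg] is None:
            self.waiters[reg].append(thread_id)
            return False, None
        return True, self.values[reg]

    def write(self, reg, value):
        """Fill the register and return the threads that may now be rescheduled."""
        self.values[reg] = value
        woken, self.waiters[reg] = self.waiters[reg], []
        return woken


def run(threads, regfile):
    """Interleave instructions from hypothetical micro-threads.

    Each thread is a list of ('const', dst, value) or ('add', dst, src1, src2)
    instructions. A thread whose source register is EMPTY is parked on that
    register instead of busy-waiting, and re-enters the ready queue when written.
    Returns the order in which instructions completed as (thread_id, opcode).
    """
    pcs = {tid: 0 for tid in threads}
    ready = deque(threads)                    # runnable thread ids
    trace = []
    while ready:
        tid = ready.popleft()
        prog = threads[tid]
        if pcs[tid] >= len(prog):
            continue                          # thread has finished
        op = prog[pcs[tid]]
        if op[0] == 'const':
            _, dst, val = op
            ready.extend(regfile.write(dst, val))
        else:
            _, dst, a, b = op
            ok_a, va = regfile.read(a, tid)
            if not ok_a:
                continue                      # suspended on register a
            ok_b, vb = regfile.read(b, tid)
            if not ok_b:
                continue                      # suspended on register b
            ready.extend(regfile.write(dst, va + vb))
        trace.append((tid, op[0]))
        pcs[tid] += 1
        ready.append(tid)                     # still runnable: schedule its next instruction
    return trace


# The consumer is scheduled first but cannot complete until the producer fills r0 and r1.
rf = SyncRegisterFile(3)
threads = {
    'consumer': [('add', 2, 0, 1)],
    'producer': [('const', 0, 3), ('const', 1, 4)],
}
print(run(threads, rf))   # [('producer', 'const'), ('producer', 'const'), ('consumer', 'add')]
print(rf.values[2])       # 7
```

In this sketch the consumer never spins: it is parked on the empty register and only returns to the ready queue when a write makes progress possible, which is the sense in which the register file itself acts as the synchronisation point.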

Task migration and fine grain parallelism on distributed memory architectures

Lecture Notes in Computer Science - Parallel Computing Technologies, pp. 226-240, 1997
DOI: 10.1007/3-540-63371-5_24
Cited by ~1

Author(s): Yvon Jégou

Keyword(s): Distributed Memory; Task Migration; Fine Grain; Memory Architectures

Performance analysis of enhanced fine-grain multithreaded distributed-memory systems

2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat.No.01CH37236), 2002
DOI: 10.1109/icsmc.2001.973067
Cited by ~2

Author(s): W.M. Zuberek

Keyword(s): Performance Analysis; Distributed Memory; Memory Systems; Fine Grain

A comparative performance study of a fine-grain multi-threading model on distributed memory machines

Conference Proceedings of the 2000 IEEE International Performance, Computing, and Communications Conference (Cat. No.00CH37086), 2002
DOI: 10.1109/pccc.2000.830367
Cited by ~1

Author(s): P. Kakulavarapu; C.J. Morrone; K.B. Theobald; J.N. Amaral; G.R. Gao

Keyword(s): Distributed Memory; Performance Study; Comparative Performance; Distributed Memory Machines; Fine Grain

OpenMP Issues Arising in the Development of Parallel BLAS and LAPACK Libraries

Scientific Programming, Vol 11 (2), pp. 95-104, 2003
DOI: 10.1155/2003/278167
Cited by ~2

Author(s): C. Addison; Y. Ren; M. van Waveren

Keyword(s): Shared Memory; Linear Algebra; Distributed Memory; Parallel Computations; Dense Linear Algebra; Fine Grain; Parallel Implementations; Work Distribution; Multiple Array; Parallel Library

Dense linear algebra libraries need to cope efficiently with a range of input problem sizes and shapes. This means that parallel implementations have to exploit parallelism wherever it is present. While OpenMP allows relatively fine-grain parallelism to be exploited in a shared memory environment, it currently lacks features that make it easy to partition computation over multiple array indices or to overlap sequential and parallel computations. The inherently flexible nature of shared memory paradigms such as OpenMP poses other difficulties when it becomes necessary to optimise performance across successive parallel library calls. Notions borrowed from distributed memory paradigms, such as explicit data distributions, help address some of these problems, but the focus on data rather than work distribution appears misplaced in an SMP context.
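
To make "partitioning computation over multiple array indices" concrete, here is a minimal sketch of the index arithmetic such a work distribution needs: each worker in a pr × pc grid receives one rectangular tile of an m × n iteration space. It is written in Python purely for illustration; `block_range` and `partition_2d` are hypothetical helpers, not part of OpenMP or of the authors' library. In an OpenMP code, each thread would instead compute its own tile from its thread number (for example from `omp_get_thread_num()`) and loop over it.

```python
def block_range(n, parts, k):
    """Half-open range [lo, hi) of block k when 0..n-1 is split into `parts` near-equal blocks."""
    base, extra = divmod(n, parts)
    lo = k * base + min(k, extra)           # the first `extra` blocks get one extra index
    hi = lo + base + (1 if k < extra else 0)
    return lo, hi


def partition_2d(m, n, pr, pc):
    """Give each of pr * pc workers one rectangular tile of an m x n iteration space.

    Worker t sits at grid position (t // pc, t % pc) and owns the row block and
    column block at that position. Returns {t: ((row_lo, row_hi), (col_lo, col_hi))}.
    """
    tiles = {}
    for t in range(pr * pc):
        r, c = divmod(t, pc)
        tiles[t] = (block_range(m, pr, r), block_range(n, pc, c))
    return tiles


for t, (rows, cols) in partition_2d(5, 4, 2, 2).items():
    print(t, rows, cols)
# 0 (0, 3) (0, 2)
# 1 (0, 3) (2, 4)
# 2 (3, 5) (0, 2)
# 3 (3, 5) (2, 4)
```

Leftover rows or columns are handed one each to the first blocks, so tile sizes along each index differ by at most one; this is one simple choice among many (cyclic or block-cyclic distributions are common alternatives).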

DYNAMIC LOAD BALANCERS FOR A MULTITHREADED MULTIPROCESSOR SYSTEM

Parallel Processing Letters, Vol 11 (01), pp. 169-184, 2001
DOI: 10.1142/s0129626401000506
Cited by ~3

Author(s): Prasad Kakulavarapu; Olivier C. Maquelin; José Nelson Amaral; Guang R. Gao

Keyword(s): Load Balancing; Dynamic Load; Distributed Memory; Divide And Conquer; Performance Ratio; Multiprocessor System; Fine Grain; Price Performance; Execution Model; Load Balancer

Designing multiprocessor systems that deliver a reasonable price-performance ratio using off-the-shelf processor and compiler technologies is a major challenge. For an important class of applications, it is critical to exploit fine-grain parallelism to achieve reasonable performance. In such parallel systems it is essential to manage communication latencies, bandwidth, and synchronization overheads efficiently. In this paper we study load balancing strategies for the runtime system of a multi-threaded system. EARTH (Efficient Architecture for Running Threads) is a multi-threaded programming and execution model that supports fine-grain, non-preemptive threads in a distributed memory environment. We describe the design and implementation of a set of dynamic load balancing algorithms, and study their performance in divide-and-conquer, regular, and irregular applications. Our experimental study on the IBM SP-2 distributed memory multiprocessor indicates that a randomized load balancer performs as well as, and often better than, history-based load balancers.
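
As a rough illustration of what a randomized load balancer does, the sketch below simulates a divide-and-conquer computation in which every spawned task is sent to a uniformly random node, and compares it with a baseline that keeps every task on the node that spawned it. This is a toy model under stated assumptions, not the EARTH runtime or its actual balancers: it ignores communication cost and task duration and only counts tasks per node. The names `place_tasks`, `randomized` and `local_only` are hypothetical.

```python
import random
from collections import Counter


def place_tasks(depth, nodes, policy, seed=0):
    """Simulate a binary divide-and-conquer task tree of the given depth.

    The root task starts on node 0; every task above the leaf level spawns two
    subtasks, and `policy(parent_node, nodes, rng)` picks the node each one runs on.
    Returns a Counter mapping node -> number of tasks executed there.
    """
    rng = random.Random(seed)
    load = Counter()
    pending = [(0, 0)]                      # (node, depth) of tasks not yet run
    while pending:
        node, d = pending.pop()
        load[node] += 1                     # count the task on its assigned node
        if d < depth:
            for _ in range(2):              # divide: spawn two subtasks
                pending.append((policy(node, nodes, rng), d + 1))
    return load


def randomized(parent, nodes, rng):
    """Randomized balancer: send each new task to a uniformly random node."""
    return rng.randrange(nodes)


def local_only(parent, nodes, rng):
    """No balancing: every task stays on the node that spawned it."""
    return parent


print(place_tasks(10, 8, local_only))       # Counter({0: 2047})
print(sorted(place_tasks(10, 8, randomized).items()))
```

The randomized policy needs no knowledge of other nodes' load; a history-based policy would instead consult recorded load information before choosing a node.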

Reducing overhead in implementing fine-grain parallel data-structures of a dataflow language on off-the-shelf distributed-memory parallel computers

Proceedings of the Thirtieth Hawaii International Conference on System Sciences, 2002
DOI: 10.1109/hicss.1997.667262

Author(s): S. Kusakabe; T. Nagai; K. Inenaga; M. Amamiya

Keyword(s): Data Structures; Distributed Memory; Parallel Computers; Fine Grain; Parallel Data

Address generation of dataflow fine-grain parallel data-structures on a distributed-memory computer

Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique, 2002
DOI: 10.1109/pact.1996.552658

Author(s): S. Kusakabe; T. Nagai; K. Inenaga; M. Amamiya

Keyword(s): Data Structures; Distributed Memory; Fine Grain; Parallel Data; Address Generation

Aspects of microanalysis in a transmission electron microscope

Proceedings, annual meeting, Electron Microscopy Society of America, Vol 36 (1), pp. 510-511, 1978
DOI: 10.1017/s0424820100109690

Author(s): R. Sinclair; B.E. Jacobson

Keyword(s): Grain Size; Electron Beam; Electron Microscope; Chemical Analysis; Transmission Electron Microscope; Vapor Deposition; Critical Currents; X Ray; Fine Grain; Transmission Electron

INTRODUCTION

The prospect of performing chemical analysis of thin specimens at any desired level of resolution is particularly appealing to the materials scientist. Commercial TEM-based systems are now available which virtually provide this capability. The purpose of this contribution is to illustrate its application to problems which would have been intractable until recently, pointing out some current limitations.

X-RAY ANALYSIS

In an attempt to fabricate superconducting materials with high critical currents and temperature, thin Nb3Sn films have been prepared by electron beam vapor deposition [1]. Fine-grained material is desirable and may be achieved by codeposition with small amounts of Al2O3. Figure 1 shows the STEM microstructure, with large (~200 Å dia) voids present at the grain boundaries. Higher-quality TEM micrographs (e.g. Fig. 2) reveal the presence of small voids within the grains, which are absent in pure Nb3Sn prepared under identical conditions. The X-ray spectrum from large (~1 μm dia) or small (~100 Å dia) areas within the grains indicates only small amounts of Al (Fig. 3).

Superior 2X2 Slides

Proceedings, annual meeting, Electron Microscopy Society of America, Vol 33, pp. 158-159, 1975
DOI: 10.1017/s0424820100114414

Author(s): Harry Schaefer; Bruce Wetzel

Keyword(s): New York; High Resolution; Black And White; Fine Grain; White Film

High-resolution 24mm X 36mm positive transparencies can be made from original black and white negatives produced by SEM, TEM, and photomicrography with ease, convenience, and little expense. The resulting 2in X 2in slides are superior to 3¼in X 4in lantern slides for storage, transport, and sturdiness, and projection equipment is more readily available. By mating a 35mm camera directly to an enlarger lens board (Fig. 1), one combines many advantages of both. The negative is positioned and illuminated with the enlarger and then focussed and photographed with the camera on a fine-grain black and white film. Specifically, a Durst Laborator 138 S 5in by 7in enlarger with 240/200 condensers and a 500 watt Opale bulb (Ehrenreich Photo-Optical Industries, Inc., New York, NY) is rotated to the horizontal and adjusted for comfortable eye-level viewing.
