An Overview on High Performance Issues of Parallel Architectures

In most distributed-memory MIMD multiprocessors, processors are connected by a point-to-point interconnection network, usually modeled by a graph in which processors are nodes and communication links are edges. Since interprocessor communication is frequently a serious bottleneck, several architectures have been proposed that enhance point-to-point topologies with multiple bus systems to improve communication efficiency. In this paper we study parallel architectures in which communication relies solely on buses. These architectures can exploit the power of bus technologies, providing a simple and efficient way to interconnect many more processors. We present the hyperpath, hypergrid, hyperring, and hypertorus architectures, which are the bus-based versions of widely used point-to-point interconnection networks. Using (hyper)graph-theoretic concepts to model interprocessor communication in such networks, we give optimal algorithms for broadcasting a message from one processor to all the others. To derive high-performance communication patterns, we developed a new tool called simplification. The idea is to construct a graph, called the representative graph, from the original hyper-topology, so that communication schemes are easy to describe and perform on the representative graph and carry over to the hyper-topology. Simplification also allows us to partially reuse known communication algorithms for ordinary point-to-point networks.
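
The hypergraph view of a bus network can be illustrated with a minimal Python sketch. It is not the paper's optimal broadcast algorithm: the abstract does not state the communication model, so the sketch assumes a simple all-port model in which, in each round, every newly informed processor writes the message on all of its buses and every processor on a bus hears it. The function name `broadcast_rounds` and the three-bus chain in the usage note are hypothetical and are not the paper's definition of a hyperpath.

```python
from collections import defaultdict


def broadcast_rounds(buses, source):
    """Simulate a broadcast on a bus network modeled as a hypergraph.

    buses  : list of sets; each set holds the processors attached to one bus
             (one hyperedge per bus).
    source : the processor that initially holds the message.

    Assumed model (not taken from the paper): in each round, every processor
    informed in the previous round writes on all of its buses, and all
    processors on those buses receive the message.

    Returns a dict mapping each reachable processor to the round in which
    it is informed (the source maps to round 0).
    """
    # Index: processor -> list of bus ids it is attached to.
    member_of = defaultdict(list)
    for bus_id, members in enumerate(buses):
        for p in members:
            member_of[p].append(bus_id)

    informed = {source: 0}
    frontier = [source]
    used_buses = set()  # a bus only needs to carry the message once
    round_no = 0
    while frontier:
        round_no += 1
        next_frontier = []
        for p in frontier:
            for b in member_of[p]:
                if b in used_buses:
                    continue
                used_buses.add(b)
                for q in buses[b]:
                    if q not in informed:
                        informed[q] = round_no
                        next_frontier.append(q)
        frontier = next_frontier
    return informed


# Hypothetical chain of three buses sharing one processor pairwise.
chain = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}]
rounds = broadcast_rounds(chain, 0)
print(max(rounds.values()))  # number of rounds needed from processor 0
```

Under this assumed model the number of rounds from a source equals its eccentricity in the hypergraph, which makes the sketch useful as a baseline when comparing against more restrictive models (for example, one bus write per processor per round).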

High performance in tree-based parallel architectures

EUROMICRO 97 Proceedings of the 23rd EUROMICRO Conference New Frontiers of Information Technology (Cat No 97TB100167) EURMIC-97, DOI: 10.1109/eurmic.1997.617358, 2002

Author(s): F. Ancona, S. Rovetta, R. Zumino

Keyword(s): High Performance, Parallel Architectures

High performance domain decomposition methods on massively parallel architectures with freefem++

Journal of Numerical Mathematics, DOI: 10.1515/jnum-2012-0015, 2012, Vol 20 (3-4), Cited By ~ 5

Author(s): P. Jolivet, V. Dolean, F. Hecht, F. Nataf, C. Prud’Homme, ...

Keyword(s): Domain Decomposition, High Performance, Decomposition Methods, Parallel Architectures, Massively Parallel, Domain Decomposition Methods, Massively Parallel Architectures

Report on the workshop on design & performance issues in parallel architectures

ACM SIGMETRICS Performance Evaluation Review, DOI: 10.1145/25286.25287, 1987, Vol 14 (3-4), pp. 16-32, Cited By ~ 1

Author(s): Satish K Tripathi, Steve Kaisler, Sharat Chandran, Ashok K Agrawala

Keyword(s): Parallel Architectures, Design Performance, Performance Issues

Deep-submicron Placement Minimizing Crosstalk

VLSI Design, DOI: 10.1155/2001/46394, 2001, Vol 12 (1), pp. 1-12

Author(s): Jun Dong Cho, Jin Youn Cho

Keyword(s): High Performance, Optimization Technique, Deep Submicron, Multi Objective Optimization, Crosstalk Noise, Performance Constraints, Placement Optimization, Number Of Layers, Physical Attributes, Performance Issues

Placement of multiple dies on an MCM or high-performance VLSI substrate is a nontrivial task in which multiple criteria must be considered simultaneously to obtain a true multi-objective optimization. Unfortunately, the exact physical attributes of a design are not known at the placement step; they become known only once the entire design process has been carried out. When performance issues are considered, crosstalk noise constraints in the form of net separation and via constraints become important. In this paper, several performance constraints are taken into account simultaneously to improve performance and wirability estimation during placement for MCMs. A graph-based wirability estimation, combined with a genetic placement optimization technique, is proposed to minimize crosstalk, crossings, wirelength, and the number of layers. Our work is significant because it is the first attempt to bring crosstalk and other performance issues into the placement domain.

Design Space Exploration of High-Performance Parallel Architectures

Journal of Integrated Circuits and Systems, DOI: 10.29292/jics.v3i1.279, 2008, Vol 3 (1), pp. 32-38

Author(s): Enric Musoll, Mario Nemirovsky

Keyword(s): Power Efficiency, High Performance, Design Space Exploration, Parallel Architecture, Parallel Architectures, Power Performance, Power Budget, Performance Goal, Power Efficient, On Chip

High-performance single-threaded processors achieve their performance goal partly by relying, among other architectural techniques, on speculation and large on-chip caches. The hardware that supports these techniques usually occupies a large portion of the overall processor real estate, and it therefore consumes a significant amount of power that is sometimes not optimally used for useful work. In this work, we study the intuitive claim that architectures with hardware support for threads are more power efficient than a more traditional single-threaded superscalar architecture. To this end, we have created a model of the power, performance, and area of several parallel architectures. The model shows that, compared to a single-threaded superscalar architecture, a parallel architecture can be designed so that (a) it requires less area and power (to reach the same performance), or (b) it achieves better power efficiency and less area (for the same power budget), or (c) it has higher performance and better power efficiency (for the same area constraint).

Parallel Architectures for MEDLINE Search

Encyclopedia of Healthcare Information Systems, DOI: 10.4018/978-1-59904-889-5.ch130, 2008, pp. 1048-1055

Author(s): Rajendra V. Boppana, Suresh Chalasani, Bob Badgett, Jacqueline A. Pugh

Keyword(s): High Performance Computing, High Performance, Response Times, Low Cost, Parallel Architecture, Fast Response, Parallel Architectures, Medline Search, Medline Database, Performance Computing

In this article, we describe a parallel architecture for the MEDLINE database, integrated with search refinement tools, to facilitate accurate and fast responses to user search requests. The proposed architecture, to be developed by the authors, will use low-cost, high-performance computing clusters consisting of Linux-based personal computers and workstations (i) to provide subsecond response times for individual searches and (ii) to support several concurrent queries from search refinement programs such as SUMSearch.
