Improving Fine-Grained Irregular Shared-Memory Benchmarks by Data Reordering

Author(s):  
Y.C. Hu ◽  
A. Cox ◽  
W. Zwaenepoel
Author(s):  
Vladimir Vlassov ◽  
Oscar Sierra Merino ◽  
Csaba Andras Moritz ◽  
Konstantin Popov

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-28
Author(s):  
Dan Iorga ◽  
Alastair F. Donaldson ◽  
Tyler Sorensen ◽  
John Wickerson

Heterogeneous CPU/FPGA devices, in which a CPU and an FPGA can execute together while sharing memory, are becoming popular in several computing sectors. In this paper, we study the shared-memory semantics of these devices, with a view to providing a firm foundation for reasoning about the programs that run on them. Our focus is on Intel platforms that combine an Intel FPGA with a multicore Xeon CPU. We describe the weak-memory behaviours that are allowed (and observable) on these devices when CPU threads and an FPGA thread access common memory locations in a fine-grained manner through multiple channels. Some of these behaviours are familiar from well-studied CPU and GPU concurrency; others are weaker still. We encode these behaviours in two formal memory models: one operational, one axiomatic. We develop executable implementations of both models, using the CBMC bounded model-checking tool for our operational model and the Alloy modelling language for our axiomatic model. Using these, we cross-check our models against each other via a translator that converts Alloy-generated executions into queries for the CBMC model. We also validate our models against actual hardware by translating 583 Alloy-generated executions into litmus tests that we run on CPU/FPGA devices; when doing this, we avoid the prohibitive cost of synthesising a hardware design per litmus test by creating our own 'litmus-test processor' in hardware. We expect that our models will be useful for low-level programmers, compiler writers, and designers of analysis tools. Indeed, as a demonstration of the utility of our work, we use our operational model to reason about a producer/consumer buffer implemented across the CPU and the FPGA. When the buffer uses insufficient synchronisation -- a situation that our model is able to detect -- we observe that its performance improves at the cost of occasional data corruption.
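The producer/consumer case study above hinges on the order in which data writes become visible relative to the index that publishes them. As a loose illustration of that handshake (not the paper's CPU/FPGA code; the class and function names below are invented for this sketch, and in Python the GIL supplies the ordering that real hardware would need fences or atomics for), a single-producer/single-consumer ring buffer might look like:

```python
import threading

class SPSCRingBuffer:
    """Single-producer/single-consumer ring buffer (illustrative sketch).

    The producer writes the slot *before* advancing `tail`; the consumer
    reads the slot before advancing `head`. On a weakly ordered device
    this write-then-publish order must be enforced explicitly.
    """

    def __init__(self, capacity):
        self.capacity = capacity + 1          # one slot kept empty to tell full from empty
        self.slots = [None] * self.capacity
        self.head = 0                         # next slot to consume
        self.tail = 0                         # next slot to fill

    def try_push(self, item):
        nxt = (self.tail + 1) % self.capacity
        if nxt == self.head:                  # buffer full
            return False
        self.slots[self.tail] = item          # write the data first...
        self.tail = nxt                       # ...then publish it
        return True

    def try_pop(self):
        if self.head == self.tail:            # buffer empty
            return None
        item = self.slots[self.head]
        self.head = (self.head + 1) % self.capacity
        return item

def run_demo(n=1000):
    """Move n integers through the buffer on two threads, in order."""
    buf = SPSCRingBuffer(capacity=8)
    received = []

    def producer():
        for i in range(n):
            while not buf.try_push(i):
                pass                          # spin until space is free

    def consumer():
        while len(received) < n:
            item = buf.try_pop()
            if item is not None:
                received.append(item)

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return received
```

On a weakly ordered CPU/FPGA pairing, letting the hardware reorder the data write and the `tail` update is exactly the kind of insufficient synchronisation the abstract describes: faster, but with occasional corruption.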


2000 ◽  
Vol 10 (01) ◽  
pp. 111-132 ◽  
Author(s):  
VOON-YEE VEE ◽  
WEN-JING HSU

In the past decade, many synchronous algorithms have been proposed for parallel discrete-event simulations. However, the actual performance of these algorithms has been far from ideal, especially when event granularity is small. Barring the case of low parallelism in the given simulation models, one of the main reasons for low speedups is uneven load distribution among processors. To remedy this, both static and dynamic load balancing approaches have been proposed. Nevertheless, static schemes based on partitioning of logical processes (LPs) are often subject to the dynamic behavior of the specific simulation models and are therefore application dependent; dynamic load balancing schemes, on the other hand, often suffer from loss of locality and hence cache misses, which can severely penalize fine-grained event processing. In this paper, we present several new locality-preserving load balancing mechanisms for synchronous simulations on shared-memory multiprocessors. We focus on the type of synchronous simulations where the number of LPs to be processed within a cycle decreases monotonically. We show both theoretically and empirically that some of these mechanisms incur very low overhead. The mechanisms have been implemented using MIT's Cilk and tested with a number of simulation applications. The results confirm that one of the new mechanisms is indeed more efficient and scalable than common existing approaches.
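As a rough sketch of the locality-preserving idea (based only on the abstract; `contiguous_partition` is an illustrative helper, not the paper's mechanism): handing each worker a contiguous block of the surviving LPs means LPs that were neighbours in one cycle tend to stay on the same processor in the next, unlike round-robin redistribution.

```python
def contiguous_partition(active_lps, num_workers):
    """Split the active LP list into contiguous, near-equal chunks.

    Contiguity keeps each worker's LPs adjacent in memory, which helps
    preserve cache locality across cycles as the active set shrinks
    monotonically. Chunk sizes differ by at most one.
    """
    base, extra = divmod(len(active_lps), num_workers)
    chunks, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder
        chunks.append(active_lps[start:start + size])
        start += size
    return chunks
```

Re-running the partition each cycle on the (smaller) surviving list rebalances the load while keeping each worker's share contiguous.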


Author(s):  
Thomas Pani ◽  
Georg Weissenbacher ◽  
Florian Zuleger

We present a thread-modular proof method for complexity and resource bound analysis of concurrent, shared-memory programs. To this end, we lift Jones’ rely-guarantee reasoning to assumptions and commitments capable of expressing bounds. The compositionality (thread-modularity) of this framework allows us to reason about parameterized programs, i.e., programs that execute arbitrarily many concurrent threads. We automate reasoning in our logic by reducing bound analysis of concurrent programs to the sequential case. As an application, we automatically infer time complexity for a family of fine-grained concurrent algorithms, lock-free data structures, to our knowledge for the first time.


Author(s):  
Richard S. Chemock

One of the most common tasks in a typical analysis lab is the recording of images. Many analytical techniques (TEM, SEM, and metallography, for example) produce images as their primary output. Until recently, the most common method of recording images was on film. Current PS/2 systems offer very large capacity data storage devices and high resolution displays, making it practical to work with analytical images on PS/2s, thereby sidestepping the traditional film and darkroom steps. This change in operational mode offers many benefits: cost savings, throughput, and archiving and searching capabilities, as well as direct incorporation of the image data into reports.

The conventional way to record images involves film, either sheet film (with its associated wet chemistry) for TEM or Polaroid film for SEM and light microscopy. Although film is inconvenient, it has the highest quality of all available image recording techniques. The fine-grained film used for TEM has a resolution that would exceed a 4096x4096x16-bit digital image.


Author(s):  
Steven D. Toteda

Zirconia oxygen sensors, in such applications as power plants and automobiles, generally utilize platinum electrodes for the catalytic reaction of dissociating O2 at the surface. The microstructure of the platinum electrode defines the resulting electrical response. The electrode must be porous enough to allow the oxygen to reach the zirconia surface while still remaining electrically continuous. At low sintering temperatures, the platinum is highly porous and fine grained. The platinum particles sinter together as the firing temperatures are increased. As the sintering temperatures are raised even further, the surface of the platinum begins to facet with lower-energy surfaces. These microstructural changes can be seen in Figures 1 and 2, but the goal of the work is to characterize the microstructure by its fractal dimension and then relate the fractal dimension to the electrical response. The sensors were fabricated from zirconia powder stabilized in the cubic phase with 8 mol% yttria. Each substrate was sintered for 14 hours at 1200°C. The resulting zirconia pellets, 13 mm in diameter and 2 mm in thickness, were roughly 97 to 98 percent of theoretical density. The Engelhard #6082 platinum paste was applied to the zirconia disks after they were mechanically polished (diamond). The electrodes were then sintered at temperatures ranging from 600°C to 1000°C. Each sensor was tested to determine the impedance response from 1 Hz to 5,000 Hz. These frequencies correspond to the electrode response at the test temperature of 600°C.
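Fractal dimensions of micrographs like these are commonly estimated by box counting on a binarised image. As a hedged sketch (the abstract does not state the method used; `fractal_dimension` and its box sizes are assumptions for illustration), the dimension is the slope of log N(s) against log(1/s), where N(s) is the number of boxes of side s that contain any filled pixel:

```python
import math

def box_count(grid, box):
    """Count boxes of side `box` containing at least one filled cell
    of a binary 2D grid (list of lists of 0/1)."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(0, rows, box):
        for c in range(0, cols, box):
            if any(grid[rr][cc]
                   for rr in range(r, min(r + box, rows))
                   for cc in range(c, min(c + box, cols))):
                count += 1
    return count

def fractal_dimension(grid, sizes=(1, 2, 4, 8)):
    """Estimate the box-counting dimension as the slope of
    log N(s) versus log(1/s), via ordinary least squares."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(grid, s)) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope
```

A fully filled region gives an estimate of 2 and an isolated point gives 0; a porous, fine-grained electrode surface would fall between the two.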

