Throughput Regulation in Shared Memory Multicore Processors

Author(s):  
X. Chen ◽  
H. Xiao ◽  
Y. Wardi ◽  
S. Yalamanchili
2010 ◽  
Vol 39 ◽  
pp. 689-743 ◽  
Author(s):  
E. Burns ◽  
S. Lemons ◽  
W. Ruml ◽  
R. Zhou

To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals.
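As an illustration of the abstraction idea in the abstract above, the following minimal sketch groups grid states into coarse blocks, so that a duplicate check only consults the set belonging to one block. It is single-threaded and is not the PBNF algorithm itself. The grid domain, the `BLOCK` size, and the names `abstract`, `partition`, and `is_duplicate` are assumptions made for illustration.

```python
from collections import defaultdict

BLOCK = 4  # hypothetical edge length of one abstract block, in grid cells


def abstract(state):
    """Map a concrete grid state (x, y) to the coordinates of its abstract block."""
    x, y = state
    return (x // BLOCK, y // BLOCK)


def partition(states):
    """Group concrete states by abstract block; each block keeps its own closed set."""
    blocks = defaultdict(set)
    for s in states:
        blocks[abstract(s)].add(s)
    return blocks


def is_duplicate(blocks, state):
    """Check for a duplicate by looking only at the block the state maps to."""
    return state in blocks.get(abstract(state), set())


if __name__ == "__main__":
    seen = partition([(0, 0), (3, 3), (4, 0), (9, 9)])
    print(sorted(seen.keys()))
    print(is_duplicate(seen, (4, 0)), is_duplicate(seen, (5, 0)))
```

Because each state maps to exactly one block, a thread that holds exclusive access to a block could in principle test and update that block's set without taking a lock per state, which is the property the abstract refers to.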


Author(s):  
Martin J. Chorley ◽  
David W. Walker ◽  
Martyn F. Guest

Hybrid programming, whereby shared-memory and message-passing programming techniques are combined within a single parallel application, has often been discussed as a method for increasing code performance on clusters of symmetric multiprocessors (SMPs). This paper examines whether the hybrid model brings any performance benefits for clusters based on multicore processors. A molecular dynamics application has been parallelized using both MPI and hybrid MPI/OpenMP programming models. The performance of this application has been examined on two high-end multicore clusters using both Infiniband and Gigabit Ethernet interconnects. The hybrid model has been found to perform well on the higher-latency Gigabit Ethernet connection, but offers no performance benefit on low-latency Infiniband interconnects. The changes in performance are attributed to the differing communication profiles of the hybrid and MPI codes.


2014 ◽  
Vol 22 (1) ◽  
pp. 21-35 ◽  
Author(s):  
Pascual Pérez-Carro ◽  
Francisco Grimaldo ◽  
Miguel Lozano ◽  
Juan M. Orduña

Multiagent platforms need to be evaluated with a focus on the underlying computer architecture so that developers can exploit the parallelism available in multicore processors. This paper presents the characterization of Jason, a well-known Java-based multiagent platform, when executed on distributed shared memory architectures. Since this kind of architecture is already present in current multicore processors, this should be the first step for the characterization of this platform on distributed systems. To this end, we run a set of benchmarks recently proposed for evaluating multiagent platforms. The results show that Jason can be used to program CPU-intensive multiagent applications without losing the Java scalability over multicore processors. However, Jason's performance for communication-intensive applications depends on the traffic pattern generated by the agents, the layout of the cores, and the selected execution mode (i.e., synchronous or asynchronous).


2010 ◽  
Vol 2010 ◽  
pp. 1-22 ◽  
Author(s):  
Shoaib Akram ◽  
Alexandros Papakonstantinou ◽  
Rakesh Kumar ◽  
Deming Chen

Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads or even different execution phases of the same workload may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. Subsequently, we describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations which help boost the performance of multiprogramming and multithreaded workloads. We provide experimental results that show that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60% with our configurable bus architecture. Similar gains can also be achieved for multithreaded applications, as shown by further experiments. Finally, we present the sensitivity of the proposed interconnect architecture's performance to shared memory bandwidth availability.


2021 ◽  
Vol 13 (10) ◽  
pp. 1963
Author(s):  
Pasquale Imperatore ◽  
Eugenio Sansosti

Within the framework of multi-temporal Synthetic Aperture Radar (SAR) interferometric processing, image coregistration is a fundamental operation that might be extremely time-consuming. This paper explores the possibility of addressing fast and accurate SAR image geometric coregistration, with sub-pixel accuracy and in the presence of a complex 3-D object scene, by exploiting the parallelism offered by shared-memory architectures. An efficient and scalable processor is proposed by designing a parallel algorithm incorporating thread-level parallelism for solving the inherent computationally intensive problem. The adopted functional scheme is first mathematically framed and then investigated in detail in terms of its computational structures. Subsequently, a parallel version of the algorithm is designed, according to a fork-join model, by suitably taking into account the granularity of the decomposition, load-balancing, and different scheduling strategies. The developed parallel algorithm implements parallelism at the thread-level by using OpenMP (Open Multi-Processing) and it is specifically targeted at shared-memory multiprocessors. The parallel performance of the implemented multithreading-based SAR image coregistration prototype processor is experimentally investigated and quantitatively assessed by processing high-resolution X-band COSMO-SkyMed SAR data and using two different multicore architectures. The effectiveness of the developed multithreaded prototype solution in fully benefitting from the computing power offered by multicore processors has successfully been demonstrated via a suitable experimental performance analysis conducted in terms of parallel speedup and efficiency. The demonstrated scalable performance and portability of the developed parallel processor confirm its potential for operational use in the interferometric SAR data processing at large scales.
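The abstract above assesses performance in terms of parallel speedup and efficiency. A minimal sketch of the standard definitions, speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p, is given below. The function names and the timing values are hypothetical and are not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Parallel speedup: serial run time divided by parallel run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, threads):
    """Parallel efficiency: speedup divided by the number of threads used."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return speedup(t_serial, t_parallel) / threads


if __name__ == "__main__":
    # Hypothetical timings: 120 s on one thread, 20 s on eight threads.
    print(speedup(120.0, 20.0))
    print(efficiency(120.0, 20.0, 8))
```

An efficiency close to 1 indicates that added threads are being used almost fully; values well below 1 point to overheads such as load imbalance or synchronization.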


2017 ◽  
Vol 25 (7) ◽  
pp. 2095-2108 ◽  
Author(s):  
Chenchen Fu ◽  
Yingchao Zhao ◽  
Minming Li ◽  
Chun Jason Xue

2009 ◽  
Vol 28 (9) ◽  
pp. 2303-2305
Author(s):  
Xiao-gang WANG ◽  
Xiao-juan WU ◽  
Xin ZHOU ◽  
Xiao-yan ZHANG

1990 ◽  
Author(s):  
Yehuda Afek ◽  
Hagit Attiya ◽  
Danny Dolev ◽  
Eli Gafni ◽  
Michael Merritt