Throughput Regulation in Shared Memory Multicore Processors

Best-First Heuristic Search for Multicore Machines

Journal of Artificial Intelligence Research, DOI 10.1613/jair.3094, 2010, Vol 39, pp. 689-743, Cited By ~ 16

Author(s): E. Burns, S. Lemons, W. Ruml, R. Zhou

Keyword(s): State Space, Shared Memory, Temporal Logic, Heuristic Search, Multicore Processors, The State, New Method, Parallel Search, Empirical Comparison, Best First Search

To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals.
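The idea of using an abstraction to partition the state space can be illustrated with a small sketch. This is not the authors' PBNF implementation: the grid domain, the `BLOCK` size, and the names `abstract`, `duplicate_detection_scopes`, and `can_run_concurrently` are all assumptions made for illustration. The sketch only shows why two threads working on blocks whose successor sets do not overlap would not need to lock each other's duplicate-detection tables.

```python
from collections import defaultdict

BLOCK = 4  # hypothetical abstraction granularity: 4x4 grid cells per block


def abstract(state):
    """Map a concrete grid state (x, y) to the id of its abstract block."""
    x, y = state
    return (x // BLOCK, y // BLOCK)


def successors(state, width, height):
    """4-connected moves on an obstacle-free grid (assumed domain)."""
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def duplicate_detection_scopes(width, height):
    """For each block, collect the blocks that its states' successors can land in."""
    scope = defaultdict(set)
    for x in range(width):
        for y in range(height):
            block = abstract((x, y))
            scope[block].add(block)
            for succ in successors((x, y), width, height):
                scope[block].add(abstract(succ))
    return scope


def can_run_concurrently(a, b, scope):
    """Two blocks can be expanded without shared locking if their scopes are disjoint."""
    return scope[a].isdisjoint(scope[b])


scope = duplicate_detection_scopes(16, 16)
print(can_run_concurrently((0, 0), (3, 3), scope))  # far-apart blocks
print(can_run_concurrently((0, 0), (1, 0), scope))  # adjacent blocks
```

In this toy setting, a scheduler would hand a thread only a block whose scope is disjoint from the scopes of blocks already in use, so duplicate checks inside a block touch data that no other thread is writing.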

Hybrid Message-Passing and Shared-Memory Programming in a Molecular Dynamics Application On Multicore Clusters

The International Journal of High Performance Computing Applications, DOI 10.1177/1094342009106188, 2009, Vol 23 (3), pp. 196-211, Cited By ~ 4

Author(s): Martin J. Chorley, David W. Walker, Martyn F. Guest

Keyword(s): Molecular Dynamics, Shared Memory, Hybrid Model, Message Passing, Multicore Processors, Parallel Application, Gigabit Ethernet, Code Performance, Programming Techniques, Multicore Clusters

Hybrid programming, whereby shared-memory and message-passing programming techniques are combined within a single parallel application, has often been discussed as a method for increasing code performance on clusters of symmetric multiprocessors (SMPs). This paper examines whether the hybrid model brings any performance benefits for clusters based on multicore processors. A molecular dynamics application has been parallelized using both MPI and hybrid MPI/OpenMP programming models. The performance of this application has been examined on two high-end multicore clusters using both Infiniband and Gigabit Ethernet interconnects. The hybrid model has been found to perform well on the higher-latency Gigabit Ethernet connection, but offers no performance benefit on low-latency Infiniband interconnects. The changes in performance are attributed to the differing communication profiles of the hybrid and MPI codes.

Characterization of the Jason Multiagent Platform on Multicore Processors

Scientific Programming, DOI 10.1155/2014/576907, 2014, Vol 22 (1), pp. 21-35, Cited By ~ 2

Author(s): Pascual Pérez-Carro, Francisco Grimaldo, Miguel Lozano, Juan M. Orduña

Keyword(s): Distributed Systems, Computer Architecture, Shared Memory, Multicore Processors, Distributed Shared Memory, Traffic Pattern, Execution Mode, Memory Architectures, Shared Memory Architectures

Multiagent platforms need to be evaluated focusing on the underlying computer architecture in order to allow developers to exploit the parallelism available in multicore processors. This paper presents the characterization of Jason, a well-known Java-based multiagent platform, when executed on distributed shared memory architectures. Since this kind of architecture is already present in current multicore processors, this should be the first step for the characterization of this platform on distributed systems. To this end, we propose the execution of a set of benchmarks recently proposed for evaluating multiagent platforms. The results obtained show that Jason can be used to program CPU-intensive multiagent applications without losing the Java scalability over multicore processors. However, Jason's performance for communication-intensive applications depends on the traffic pattern generated by the agents, the layout of the cores, and the selected execution mode (i.e., synchronous or asynchronous).

A Workload-Adaptive and Reconfigurable Bus Architecture for Multicore Processors

International Journal of Reconfigurable Computing, DOI 10.1155/2010/205852, 2010, Vol 2010, pp. 1-22, Cited By ~ 5

Author(s): Shoaib Akram, Alexandros Papakonstantinou, Rakesh Kumar, Deming Chen

Keyword(s): Shared Memory, Interconnection Networks, Interconnection Network, Multicore Processors, Scale Up, Cost Effective, Reconfigurable Logic, Multithreaded Workloads, Adaptive Policies, Bus Architecture

Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads or even different execution phases of the same workload may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. Subsequently, we describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations which help boost the performance of multiprogramming and multithreaded workloads. We provide experimental results that show that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60% with our configurable bus architecture. Similar gains can also be achieved for multithreaded applications, as shown by further experiments. Finally, we present the performance sensitivity of the proposed interconnect architecture to shared memory bandwidth availability.

Workload adaptive shared memory multicore processors with reconfigurable interconnects

2009 IEEE 7th Symposium on Application Specific Processors, DOI 10.1109/sasp.2009.5226329, 2009, Cited By ~ 2

Author(s): Shoaib Akram, Rakesh Kumar, Deming Chen

Keyword(s): Shared Memory, Multicore Processors

Multithreading Based Parallel Processing for Image Geometric Coregistration in SAR Interferometry

Remote Sensing, DOI 10.3390/rs13101963, 2021, Vol 13 (10), pp. 1963

Author(s): Pasquale Imperatore, Eugenio Sansosti

Keyword(s): Parallel Algorithm, Shared Memory, Multicore Processors, Sar Interferometry, Sar Image, Multicore Architectures, Image Coregistration, Functional Scheme, Sar Data, Computationally Intensive

Within the framework of multi-temporal Synthetic Aperture Radar (SAR) interferometric processing, image coregistration is a fundamental operation that might be extremely time-consuming. This paper explores the possibility of addressing fast and accurate SAR image geometric coregistration, with sub-pixel accuracy and in the presence of a complex 3-D object scene, by exploiting the parallelism offered by shared-memory architectures. An efficient and scalable processor is proposed by designing a parallel algorithm incorporating thread-level parallelism for solving the inherent computationally intensive problem. The adopted functional scheme is first mathematically framed and then investigated in detail in terms of its computational structures. Subsequently, a parallel version of the algorithm is designed, according to a fork-join model, by suitably taking into account the granularity of the decomposition, load-balancing, and different scheduling strategies. The developed parallel algorithm implements parallelism at the thread-level by using OpenMP (Open Multi-Processing) and it is specifically targeted at shared-memory multiprocessors. The parallel performance of the implemented multithreading-based SAR image coregistration prototype processor is experimentally investigated and quantitatively assessed by processing high-resolution X-band COSMO-SkyMed SAR data and using two different multicore architectures. The effectiveness of the developed multithreaded prototype solution in fully benefitting from the computing power offered by multicore processors has successfully been demonstrated via a suitable experimental performance analysis conducted in terms of parallel speedup and efficiency. The demonstrated scalable performance and portability of the developed parallel processor confirm its potential for operational use in the interferometric SAR data processing at large scales.
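The fork-join structure and the speedup and efficiency metrics mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' processor: the paper uses OpenMP, whereas this sketch uses Python threads only to show the shape of the decomposition (and CPython's global interpreter lock means it would not show real speedup for CPU-bound work). The per-row operation in `process_rows` is a hypothetical placeholder, and all function names are assumptions. The metrics follow the standard definitions, speedup = T_serial / T_parallel and efficiency = speedup / number of threads.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_chunks):
    """Static decomposition: contiguous row ranges of near-equal size."""
    base, extra = divmod(n_rows, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def process_rows(image, lo, hi):
    """Placeholder per-row work (hypothetical): rotate each row by one sample."""
    return [row[1:] + row[:1] for row in image[lo:hi]]


def fork_join(image, n_threads):
    """Fork one task per row range, then join the partial results in order."""
    ranges = split_rows(len(image), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = pool.map(lambda r: process_rows(image, r[0], r[1]), ranges)
        return [row for part in parts for row in part]


def speedup(t_serial, t_parallel):
    """Standard parallel speedup: serial time divided by parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_threads):
    """Standard parallel efficiency: speedup per thread."""
    return speedup(t_serial, t_parallel) / n_threads
```

With static chunks like these, load balance depends on every row costing about the same; when row costs vary, a finer granularity or a dynamic scheduling strategy is the usual alternative, which is the kind of trade-off the abstract refers to.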