An O(log₂N) Fully-Balanced Resampling Algorithm for Particle Filters on Distributed Memory Architectures

Alessandro Varsi; Simon Maskell; Paul G. Spirakis

doi:10.3390/a14120342

Algorithms ◽

10.3390/a14120342 ◽

2021 ◽

Vol 14 (12) ◽

pp. 342

Author(s):

Alessandro Varsi ◽

Simon Maskell ◽

Paul G. Spirakis

Keyword(s):

Parallel Computing ◽

Shared Memory ◽

Time Complexity ◽

Distributed Memory ◽

Particle Filters ◽

Dynamic Models ◽

State Of The Art ◽

Novel Approach ◽

Non Gaussian ◽

Memory Architectures

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear, non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications becomes increasingly long. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize when using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂N)²) time on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log₂N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log₂N)²) approach.
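To make the redistribution step concrete, here is a minimal sequential sketch in Python with NumPy; it is not the paper's parallel O(log₂N) algorithm. It assumes systematic resampling is used to turn normalized weights into per-particle copy counts (`ncopies`), and the function names `systematic_ncopies` and `redistribute` are hypothetical. Redistribution then writes `ncopies[i]` copies of particle `i` into the new population; the exclusive prefix sum of `ncopies` tells each particle where its copies start, which is the quantity a parallel implementation could obtain with a prefix-sum (scan) operation.

```python
import numpy as np

def systematic_ncopies(weights, rng):
    """Copies per particle under systematic resampling (hypothetical helper, sequential sketch)."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    cdf = np.minimum(np.cumsum(w / w.sum()), 1.0)
    cdf[-1] = 1.0                        # guard against round-off so the counts sum to n
    u = rng.random()                     # one shared offset in [0, 1)
    # Number of systematic points (u + k) / n lying below each CDF value.
    below = np.ceil(n * cdf - u).astype(int)
    return np.diff(below, prepend=0)     # per-particle copy counts, non-negative, summing to n

def redistribute(particles, ncopies):
    """Redistribution: write ncopies[i] copies of particles[i] into a new array."""
    ncopies = np.asarray(ncopies, dtype=int)
    # Exclusive prefix sum: index of the first output slot for each particle.
    starts = np.cumsum(ncopies) - ncopies
    out = np.empty((int(ncopies.sum()),) + particles.shape[1:], dtype=particles.dtype)
    for i, (s, c) in enumerate(zip(starts, ncopies)):
        out[s:s + c] = particles[i]
    return out
```

On a single machine, `np.repeat(particles, ncopies)` gives the same result as `redistribute`; the explicit loop is kept only to expose the per-particle offsets, whose uneven spacing is what makes splitting the copying work evenly across processors non-trivial.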

Teaching tools for parallel processing

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee0502219m ◽

2005 ◽

Vol 18 (2) ◽

pp. 219-224

Author(s):

Emina Milovanovic ◽

Natalija Stojanovic

Keyword(s):

Parallel Computing ◽

Parallel Processing ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Cost Effective ◽

Parallel Computers ◽

Free Software ◽

Teaching Tools ◽

Network Of Workstations

Because many universities lack the funds to purchase expensive parallel computers, cost-effective alternatives are needed to teach students about parallel processing. Free software is available to support the three major paradigms of parallel computing. Parallaxis is a sophisticated SIMD simulator that runs on a variety of platforms. The jBACI shared memory simulator supports the MIMD model of computing with a common shared memory. PVM and MPI allow students to treat a network of workstations as a message-passing MIMD multicomputer with distributed memory. Each of these software tools can be used in a variety of courses to give students experience with parallel algorithms.

Towards Structured Parallel Computing on Architecture-Independent Parallel Algorithm Design for Distributed-Memory Architectures

Journal of Computer and System Sciences ◽

10.1006/jcss.1996.0053 ◽

1996 ◽

Vol 53 (1) ◽

pp. 112-128

Author(s):

Feng Gao

Keyword(s):

Parallel Computing ◽

Parallel Algorithm ◽

Distributed Memory ◽

Algorithm Design ◽

Parallel Algorithm Design ◽

Memory Architectures

Extending OpenMP for NUMA Machines

Scientific Programming ◽

10.1155/2000/464182 ◽

2000 ◽

Vol 8 (3) ◽

pp. 163-181 ◽

Cited By ~ 16

Author(s):

John Bircsak ◽

Peter Craig ◽

RaeLyn Crowell ◽

Zarka Cvetanovic ◽

Jonathan Harris ◽

...

Keyword(s):

Shared Memory ◽

High Performance ◽

Distributed Memory ◽

Parallel Programs ◽

Compiler Optimizations ◽

High Performance Fortran ◽

Efficient Code ◽

Memory Architectures ◽

Shared Memory Architectures ◽

Fast Access

This paper describes extensions to OpenMP that implement data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which have characteristics of both shared-memory and distributed-memory architectures, requires that a programmer control the placement of data in memory and the placement of computations that operate on that data. Optimal performance is obtained when computations occur on processors that have fast access to the data needed by those computations. OpenMP, which was designed for shared-memory architectures, does not by itself address these issues. The extensions to OpenMP Fortran presented here are mainly taken from High Performance Fortran. The paper describes some of the techniques that the Compaq Fortran compiler uses to generate efficient code based on these extensions. It also describes some additional compiler optimizations, and concludes with some preliminary results.

Parallel Array Classes and Lightweight Sharing Mechanisms

Scientific Programming ◽

10.1155/1993/393409 ◽

1993 ◽

Vol 2 (4) ◽

pp. 203-216

Author(s):

Steve W. Otto

Keyword(s):

Finite Element Method ◽

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Programming Model ◽

Memory Usage ◽

Particle In Cell ◽

Parallel Array ◽

Memory Architectures ◽

Shared Memory Architectures

We discuss a set of parallel array classes, MetaMP, for distributed-memory architectures. The classes are implemented in C++ and interface to the PVM or Intel NX message-passing systems. An array class implements a partitioned array as a set of objects distributed across the nodes – a "collective" object. Object methods hide the low-level message-passing and implement meaningful array operations. These include transparent guard strips (or sharing regions) that support finite-difference stencils, reductions and multibroadcasts for support of pivoting and row operations, and interpolation/contraction operations for support of multigrid algorithms. The concept of guard strips is generalized to an object implementation of lightweight sharing mechanisms for finite element method (FEM) and particle-in-cell (PIC) algorithms. The sharing is accomplished through the mechanism of weak memory coherence and can be efficiently implemented. The price of the efficient implementation is memory usage and the need to explicitly specify the coherence operations. An intriguing feature of this programming model is that it maps well to both distributed-memory and shared-memory architectures.
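The guard-strip idea can be illustrated without MetaMP itself. The following Python/NumPy sketch is a stand-in under stated assumptions, not MetaMP's C++ interface: a 1-D array is split into blocks (one per hypothetical node), each block carries one guard cell on each side, `exchange_guards` copies neighbouring boundary values into those cells (the step a message-passing library such as PVM would perform), and a 3-point stencil then runs on each block using only local data. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating guard strips (ghost cells); not MetaMP's API.

def partition_with_guards(a, nparts):
    """Split a 1-D array into nparts blocks, each padded with one zero guard cell per side."""
    return [np.pad(block, 1) for block in np.array_split(np.asarray(a, dtype=float), nparts)]

def exchange_guards(blocks):
    """Fill each block's guard cells from its neighbours' boundary values.

    In a distributed-memory code this copy would be a message exchange between
    nodes; here all blocks share one address space. Guards at the global edges stay 0.
    """
    last = len(blocks) - 1
    for k, b in enumerate(blocks):
        b[0] = blocks[k - 1][-2] if k > 0 else 0.0     # left neighbour's last interior cell
        b[-1] = blocks[k + 1][1] if k < last else 0.0  # right neighbour's first interior cell

def stencil(blocks):
    """Apply a 3-point average to every interior cell using only block-local data."""
    return np.concatenate([(b[:-2] + b[1:-1] + b[2:]) / 3.0 for b in blocks])
```

After `exchange_guards`, the concatenated result matches applying the same stencil to the undivided, zero-padded array, which is the property guard strips are meant to preserve while letting each node read only its own memory.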

Parallel simulation

10.1093/oso/9780198803195.003.0007 ◽

2017 ◽

Author(s):

Michael P. Allen ◽

Dominic J. Tildesley

Keyword(s):

Shared Memory ◽

Message Passing ◽

High Performance ◽

Distributed Memory ◽

Nested Loops ◽

Code Domain ◽

Basic Approaches ◽

Effective Use ◽

Memory Architectures ◽

Performance Computing

Parallelization is essential for the effective use of modern high-performance computing facilities. This chapter summarizes some of the basic approaches that are commonly used in molecular simulation programs. The underlying shared-memory and distributed-memory architectures are explained. The concept of program threads and their use in parallelizing nested loops on a shared memory machine is described. Parallel tempering using message passing on a distributed memory machine is discussed and illustrated with an example code. Domain decomposition, and the implementation of constraints on parallel computers, are also explained.
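The exchange step at the heart of parallel tempering can be sketched as follows. This is a sequential Python illustration, not the chapter's example code: replicas at inverse temperatures β are held in plain lists, whereas on a distributed memory machine each replica would typically live on its own process and the energies would be exchanged by message passing. Swaps between neighbouring temperatures are accepted with the standard Metropolis probability min(1, exp[(β_i - β_j)(E_i - E_j)]); the function names are hypothetical.

```python
import math
import random

def swap_probability(beta_i, beta_j, e_i, e_j):
    """Metropolis acceptance probability for exchanging the configurations of two replicas."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swaps(betas, energies, configs, rng):
    """Try one exchange between each adjacent pair of temperatures.

    Temperatures stay fixed; accepted swaps exchange the configurations
    (and their energies) between slots i and i + 1 in place.
    """
    for i in range(len(betas) - 1):
        p = swap_probability(betas[i], betas[i + 1], energies[i], energies[i + 1])
        if rng.random() < p:
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
            configs[i], configs[i + 1] = configs[i + 1], configs[i]
```

Only the two energies are needed to decide a swap, so in a message-passing version neighbouring processes can agree on the outcome by exchanging a single number each, and then swap either configurations or temperature labels.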

A study of performance on SMP and distributed memory architectures using a shared memory programming model

Proceedings of the second ACM workshop on Role-based access control - RBAC '97 ◽

10.1145/509593.509637 ◽

1997 ◽

Cited By ~ 2

Author(s):

Eugene D. Brooks ◽

Karen H. Warren

Keyword(s):

Shared Memory ◽

Distributed Memory ◽

Programming Model ◽

Memory Architectures

A novel approach to the real-time modelling of large urban drainage systems

Water Science & Technology ◽

10.2166/wst.1997.0638 ◽

1997 ◽

Vol 36 (8-9) ◽

pp. 19-24 ◽

Cited By ~ 6

Author(s):

Richard Norreys ◽

Ian Cluckie

Keyword(s):

Real Time ◽

Dynamic Models ◽

Drainage System ◽

Radar Data ◽

Urban Drainage ◽

Real Time Control ◽

Empirical Modelling ◽

Time Control ◽

Novel Approach ◽

Non Linear Systems

Conventional urban drainage system (UDS) models are mechanistic; although appropriate for design purposes, they are less well suited to real-time control because they are slow running, difficult to calibrate, difficult to re-calibrate in real time and have trouble handling noisy data. At Salford University, a novel hybrid of dynamic and empirical modelling has been developed that combines the speed of empirical models with the ability of mechanistic/dynamic models to simulate complex and non-linear systems. This paper details the ‘knowledge acquisition module’ software and how it has been applied to construct a model of a large urban drainage system. The paper goes on to detail how the model has been linked with real-time radar data inputs from the MARS C-band radar.

A Formal Definition of Logic Topology for All-to-One Reduces in Distributed Memory Parallel Computing

2009 International Conference on Intelligent Human-Machine Systems and Cybernetics ◽

10.1109/ihmsc.2009.241 ◽

2009 ◽

Cited By ~ 1

Author(s):

Yuqing Xiong

Keyword(s):

Parallel Computing ◽

Distributed Memory ◽

Formal Definition ◽

Logic Topology ◽

Definition Of

Deep Learning for Transient Image Reconstruction from ToF Data

Sensors ◽

10.3390/s21061962 ◽

2021 ◽

Vol 21 (6) ◽

pp. 1962

Author(s):

Enrico Buratto ◽

Adriano Simonetto ◽

Gianluca Agresti ◽

Henrik Schäfer ◽

Pietro Zanuttigh

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Light Response ◽

Real Data ◽

Depth Image ◽

Learning Approach ◽

Multiple Reflections ◽

Noisy Input ◽

Novel Approach ◽

Incoming Light

In this work, we propose a novel approach for correcting multi-path interference (MPI) in Time-of-Flight (ToF) cameras by estimating the direct and global components of the incoming light. MPI is an error source linked to the multiple reflections of light inside a scene; each sensor pixel receives information coming from different light paths, which generally leads to an overestimation of depth. We introduce a novel deep learning approach that estimates the structure of the time-dependent scene impulse response and from it recovers a depth image with a reduced amount of MPI. The model consists of two main blocks: a predictive model that learns a compact encoded representation of the backscattering vector from the noisy input data, and a fixed backscattering model that translates the encoded representation into the high-dimensional light response. Experimental results on real data show the effectiveness of the proposed approach, which achieves state-of-the-art performance.

Implementing actor-based primitives on distributed-memory architectures

ACM SIGPLAN OOPS Messenger ◽

10.1145/127070.127078 ◽

1991 ◽

Vol 2 (2) ◽

pp. 45-49 ◽

Cited By ~ 1

Author(s):

Michele Di Santo ◽

Giulio Iannello

Keyword(s):

Distributed Memory ◽

Memory Architectures