Scalable group-based checkpoint/restart for large-scale message-passing systems

Hybrid Graph Neural Networks for Crowd Counting

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 34 (07), pp. 11693-11700, 2020. DOI: 10.1609/aaai.v34i07.6839. Cited by ~2.

Author(s): Ao Luo, Fan Yang, Xin Li, Dong Nie, Zhicheng Jiao, ...

Keyword(s): Network Architecture, Message Passing, Large Scale, State Of The Art, Density Variation, Feature Maps, Crowd Counting, Multi Scale, Crowd Density, Graph Neural Networks

Crowd counting is an important yet challenging task due to large variations in scale and density. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from an auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture remains a challenging problem. In this paper, we present a novel network structure called the Hybrid Graph Neural Network (HyGnn), which aims to address this problem by interweaving the multi-scale features for crowd density and for its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn builds a hybrid graph that represents the task-specific feature maps of different scales as nodes and two types of relations as edges: (i) multi-scale relations, capturing feature dependencies across scales, and (ii) mutually beneficial relations, building bridges for cooperation between counting and localization. Through message passing, HyGnn can thus capture and distill richer relations between nodes to obtain more powerful representations, yielding robust and accurate results. HyGnn performs well on four challenging datasets, ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming state-of-the-art algorithms by a large margin.
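
To make "message passing over a graph" concrete, the sketch below performs one synchronous round of generic message passing on a tiny graph whose nodes hold feature vectors (here, hypothetically, two scales each for counting and localization). It is not HyGnn's update rule, which the abstract does not specify: the mean aggregation, the mixing weight alpha, the node names and the function message_passing_step are all illustrative assumptions.

```python
from typing import Dict, List

Vector = List[float]


def message_passing_step(
    features: Dict[str, Vector],
    edges: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, Vector]:
    """One synchronous round of simplified message passing (hypothetical).

    Each node averages its neighbours' feature vectors (the "message") and
    blends that average with its own state:
        h_new = (1 - alpha) * h_old + alpha * mean(neighbour states)
    All updates read the *old* states, so node order does not matter.
    """
    updated: Dict[str, Vector] = {}
    for node, h in features.items():
        neighbours = edges.get(node, [])
        if not neighbours:
            # Isolated nodes keep their state unchanged.
            updated[node] = list(h)
            continue
        dim = len(h)
        # Aggregate: element-wise mean of neighbour feature vectors.
        msg = [sum(features[n][i] for n in neighbours) / len(neighbours) for i in range(dim)]
        # Update: convex combination of old state and aggregated message.
        updated[node] = [(1 - alpha) * h[i] + alpha * msg[i] for i in range(dim)]
    return updated


# Hypothetical hybrid graph: edges link scales within a task (multi-scale
# relations) and the same scale across tasks (cross-task relations).
features = {
    "count_s1": [1.0, 0.0],
    "count_s2": [0.0, 1.0],
    "loc_s1": [1.0, 1.0],
    "loc_s2": [0.0, 0.0],
}
edges = {
    "count_s1": ["count_s2", "loc_s1"],
    "count_s2": ["count_s1", "loc_s2"],
    "loc_s1": ["loc_s2", "count_s1"],
    "loc_s2": ["loc_s1", "count_s2"],
}

out = message_passing_step(features, edges)
print(out["count_s1"])  # prints [0.75, 0.5]
```

Stacking several such rounds lets information from distant nodes (for example, a coarse localization scale) reach a given counting node; learned networks replace the fixed mean and alpha with trainable functions.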

A lightweight approach to performance portability with targetDP

The International Journal of High Performance Computing Applications, Vol 32 (2), pp. 288-301, 2016. DOI: 10.1177/1094342016682071.

Author(s): Alan Gray, Kevin Stratford

Keyword(s): Particle Physics, Message Passing, Graphics Processing Units, High Performance, Large Scale, Message Passing Interface, Graphics Processing Unit, Processing Unit, Performance Portability, Graphics Processing

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. Performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer that allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of this pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed) and for a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is shown to achieve portable performance, as assessed with the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) for use on systems containing multiple nodes; we demonstrate this by providing scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.
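
The Roofline model mentioned above bounds a kernel's attainable performance by the smaller of two ceilings: the device's peak compute rate, and its memory bandwidth multiplied by the kernel's arithmetic intensity (floating-point operations per byte of memory traffic). A minimal sketch of that bound follows; the device figures are hypothetical and are not taken from the paper.

```python
def roofline_gflops(arith_intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Attainable performance (GFLOP/s) under the Roofline model.

    arith_intensity: floating-point operations per byte of memory traffic.
    peak_gflops:     peak compute rate of the device (GFLOP/s).
    bandwidth_gbs:   sustained memory bandwidth (GB/s).
    """
    return min(peak_gflops, arith_intensity * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical device: 1000 GFLOP/s peak compute, 200 GB/s memory bandwidth.
PEAK, BW = 1000.0, 200.0

# A low-intensity, stencil-like kernel (0.5 flop/byte) is memory-bound ...
print(roofline_gflops(0.5, PEAK, BW))   # prints 100.0
# ... while a high-intensity kernel (10 flop/byte) reaches the compute ceiling.
print(roofline_gflops(10.0, PEAK, BW))  # prints 1000.0
print(ridge_point(PEAK, BW))            # prints 5.0
```

Expressing a measured kernel rate as a fraction of this bound on each architecture is one way to judge whether a single code base is performing well everywhere, independently of the raw speed of each device.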

Automatic Vertebra Labeling in Large-Scale Medical Images Using Deep Image-to-Image Network with Message Passing and Sparsity Regularization

Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics - Advances in Computer Vision and Pattern Recognition, pp. 179-197, 2019. DOI: 10.1007/978-3-030-13969-8_9.

Author(s): Dong Yang, Tao Xiong, Daguang Xu

Keyword(s): Message Passing, Large Scale, Medical Images, Sparsity Regularization, Deep Image, Image Network

Parallel Object-Oriented Computation Applied to a Finite Element Problem

Scientific Programming, Vol 2 (4), pp. 133-144, 1993. DOI: 10.1155/1993/859092. Cited by ~2.

Author(s): Jon B. Weissman, Andrew S. Grimshaw, R.D. Ferraro

Keyword(s): Finite Element, Message Passing, Large Scale, Processing System, Object Oriented, Data Parallel, Programming Tools, Comparable Performance, Oriented Parallel, Performance Results

The conventional wisdom in the scientific computing community is that the best way to solve large-scale, numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style with low-level message-passing primitives. This approach inevitably leads to nonportable codes and long development times, and it restricts parallel programming to expert programmers. We believe that these problems are not inherent to parallel computing but result from the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher-level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code that we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat version and a hand-coded parallel Fortran version.

ABySS 2.0: Resource-Efficient Assembly of Large Genomes using a Bloom Filter

DOI: 10.1101/068338, 2016. Cited by ~4.

Author(s): Shaun D Jackman, Benjamin P Vandervalk, Hamid Mohamadi, Justin Chu, Sarah Yeo, ...

Keyword(s): Human Genome, DNA Sequences, Message Passing, Large Scale, De Novo, Bloom Filter, Genomic Variation, De Bruijn Graph, Single Individual, Probabilistic Data Structure

The de novo assembly of DNA sequences is fundamental to genomics research. It is the first of many steps towards elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species and between or within individuals, critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically. Coupled with established and planned large-scale personalized medicine initiatives to sequence genomes in the thousands and even millions, this makes the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes timely.

With ABySS 1.0, we originally showed that assembling the human genome from short 50 bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). Here we present its redesign, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements.

We present assembly benchmarks on the human Genome in a Bottle 250 bp Illumina paired-end and 6 kbp mate-pair libraries from a single individual, yielding an NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using less than 35 GB of RAM, a modest memory requirement by today’s standards that is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics’ Chromium data to further improve the scaffold contiguity of this assembly to 42 (15) Mbp.
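
To illustrate how a Bloom filter can stand in for an explicit de Bruijn graph, the sketch below inserts every k-mer of a few reads into a bit array and then finds a k-mer's successors by querying its four possible one-base extensions, so edges are never stored. This is a simplified illustration, not ABySS 2.0's implementation: the filter size, the number of hash functions, the salted SHA-256 hashing, the omission of reverse complements, and the helper names are all assumptions. Because a Bloom filter can return false positives (never false negatives), a queried edge may occasionally be spurious.

```python
import hashlib
from typing import Iterable, List


class BloomFilter:
    """A minimal Bloom filter over strings (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item: str) -> Iterable[int]:
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # May report a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(seq: str, k: int) -> List[str]:
    """All length-k substrings of seq, in order."""
    return [seq[i : i + k] for i in range(len(seq) - k + 1)]


def successors(bf: BloomFilter, kmer: str) -> List[str]:
    """Implicit de Bruijn graph edges: test every one-base extension."""
    suffix = kmer[1:]
    return [suffix + b for b in "ACGT" if suffix + b in bf]


k = 4
reads = ["ACGTAC", "GTACGG"]
bf = BloomFilter(num_bits=1 << 16, num_hashes=3)
for read in reads:
    for km in kmers(read, k):
        bf.add(km)

print(successors(bf, "ACGT"))  # contains "CGTA"; others only via false positives
```

The memory saving comes from storing a few bits per k-mer instead of the k-mer strings and adjacency lists; the price is the false-positive rate, which is controlled by the bits-per-element ratio and the number of hash functions.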

Parallelization of the Lattice Boltzmann Method in Simulating Buoyancy-Driven Convection Heat Transfer

Heat Transfer, Volume 2, 2004. DOI: 10.1115/imece2004-61871.

Author(s): Anoosheh Niavarani-Kheirier, Masoud Darbandi, Gerry E. Schneider

Keyword(s): Lattice Boltzmann Method, Lattice Boltzmann, Message Passing, Large Scale, Message Passing Interface, Parallel Machines, Convection Heat Transfer, Wide Range, Buoyancy Driven Convection, Boltzmann Method

The main objective of this work is to use the Lattice Boltzmann Method (LBM) to simulate buoyancy-driven flow based on the hybrid thermal lattice Boltzmann equation (HTLBE). After the required formulations are derived, they are validated over a wide range of Rayleigh numbers in the buoyancy-driven square cavity problem. The performance of the method is investigated on parallel machines using the Message Passing Interface (MPI) library and a domain decomposition technique to solve computationally large problems. The results show that the code solves large-scale problems efficiently, with excellent speedup.
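
Domain decomposition of the kind described above typically splits the lattice into contiguous slabs, one per MPI process, each padded with ghost (halo) rows that are refreshed from neighbouring processes after every time step. The sketch below computes such a one-dimensional slab partition in plain Python. The even-split rule, the single ghost layer and the function names are assumptions for illustration, not details from the paper, and the halo exchange itself (which would use MPI send/receive calls) is only described in comments.

```python
from typing import Optional, Tuple


def slab_partition(n_rows: int, n_ranks: int, rank: int) -> Tuple[int, int]:
    """Half-open range [start, stop) of lattice rows owned by `rank`.

    Rows are split as evenly as possible; the first (n_rows % n_ranks)
    ranks each receive one extra row.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def neighbours(n_ranks: int, rank: int) -> Tuple[Optional[int], Optional[int]]:
    """Ranks supplying the lower and upper ghost rows (None at a cavity wall)."""
    lower = rank - 1 if rank > 0 else None
    upper = rank + 1 if rank < n_ranks - 1 else None
    return lower, upper


# Example: a cavity with 10 lattice rows split across 3 processes.
for r in range(3):
    print(r, slab_partition(10, 3, r), neighbours(3, r))
# prints:
# 0 (0, 4) (None, 1)
# 1 (4, 7) (0, 2)
# 2 (7, 10) (1, None)
#
# After each collide-and-stream step, every rank would send its first and
# last owned rows to its lower/upper neighbour and receive their boundary
# rows into its own ghost layers before starting the next step.
```

Because each process only communicates its two boundary rows per step while computing on its whole slab, the communication-to-computation ratio falls as the lattice grows, which is why this layout tends to scale well for large problems.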

Interpretive MPI for Parallel Computing

Volume 3: 28th Computers and Information in Engineering Conference, Parts A and B, 2008. DOI: 10.1115/detc2008-49996.

Author(s): Yu-Cheng Chou, Harry H. Cheng

Keyword(s): Parallel Computing, Programming Languages, Message Passing, Large Scale, Message Passing Interface, Rapid Development, Web Based, Heterogeneous Platforms, C Programs, Computation Speedup

Message Passing Interface (MPI) is a standardized library specification designed for message-passing parallel programming on large-scale distributed systems. A number of MPI libraries have been implemented to allow users to develop portable programs in the scientific programming languages Fortran, C and C++. Ch is an embeddable C/C++ interpreter that provides an interpretive environment for C/C++-based scripts and programs. Combining Ch with any MPI C/C++ library enables rapid development of MPI C/C++ programs without compilation. In this article, the method of interfacing Ch scripts with MPI C implementations is introduced, using the MPICH2 C library as an example. The MPICH2-based Ch MPI package lets users interpretively run MPI C programs based on the MPICH2 C library. Running MPI programs through the MPICH2-based Ch MPI package across heterogeneous platforms consisting of Linux and Windows machines is illustrated. Comparisons of bandwidth, latency, and parallel computation speedup between C MPI, Ch MPI, and MPI for Python in an Ethernet-based environment of identical Linux machines are presented. A Web-based example demonstrates the use of Ch and MPICH2 in C-based CGI scripting to facilitate the development of Web-based applications for parallel computing.
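
Latency and bandwidth comparisons like those above are commonly derived from ping-pong timings with the linear cost model t(n) = alpha + n / beta, where n is the message size, alpha the latency and beta the bandwidth, and t is the one-way time (half the measured round trip). The sketch below fits that model by ordinary least squares. It is a generic illustration, not the benchmark procedure used in the paper, and the timings are synthetic values generated from assumed parameters rather than measured results.

```python
from typing import List, Tuple


def fit_latency_bandwidth(sizes: List[float], times: List[float]) -> Tuple[float, float]:
    """Least-squares fit of t = alpha + n / beta.

    sizes: message sizes in bytes; times: one-way transfer times in seconds.
    Returns (alpha, beta): latency in seconds and bandwidth in bytes/second.
    """
    m = len(sizes)
    mean_n = sum(sizes) / m
    mean_t = sum(times) / m
    cov = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes, times))
    var = sum((n - mean_n) ** 2 for n in sizes)
    slope = cov / var            # seconds per byte, i.e. 1 / beta
    alpha = mean_t - slope * mean_n
    return alpha, 1.0 / slope


# Synthetic timings from an assumed alpha = 50 us and beta = 100 MB/s.
sizes = [0.0, 1e3, 1e4, 1e5, 1e6]
times = [50e-6 + n / 100e6 for n in sizes]
alpha, beta = fit_latency_bandwidth(sizes, times)
print(f"latency ~ {alpha * 1e6:.1f} us, bandwidth ~ {beta / 1e6:.1f} MB/s")
# prints: latency ~ 50.0 us, bandwidth ~ 100.0 MB/s
```

With real measurements the points scatter around the line, and an interpreted layer such as Ch would be expected to show up mainly as added per-call overhead, that is, in alpha rather than in beta.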

Comparing Message Passing Interface and MapReduce for large-scale parallel ranking and selection

2015 Winter Simulation Conference (WSC), 2015. DOI: 10.1109/wsc.2015.7408542. Cited by ~2.

Author(s): Eric C. Ni, Dragos F. Ciocan, Shane G. Henderson, Susan R. Hunter

Keyword(s): Message Passing, Large Scale, Message Passing Interface, Ranking And Selection

Analysis of the Calculation of a Plasma Sheath Using the Parallel SO-DGTD Method

International Journal of Antennas and Propagation, Vol 2019, pp. 1-9, 2019. DOI: 10.1155/2019/7160913.

Author(s): Qian Yang, Bing Wei, Linqian Li, Debiao Ge

Keyword(s): Discontinuous Galerkin, Cross Section, Time Domain, Message Passing, High Speed, Large Scale, Message Passing Interface, Shift Operator, Plasma Sheath, Blunt Cone

The plasma sheath is a popular topic in computational electromagnetics, and plasma cases are more resource-intensive than non-plasma cases. In this paper, a parallel shift-operator discontinuous Galerkin time-domain (SO-DGTD) method using the MPI (Message Passing Interface) library is proposed to solve large-scale plasma problems. To demonstrate the algorithm, a plasma sheath model of a high-speed blunt cone was established based on results from multiphysics software, and the algorithm was used to extract the radar cross-section (RCS) of the model at different incident angles.

Model Order Reduction of Large-Scale Finite Element Systems in an MPI Parallelized Environment for Usage in Multibody Simulation

Archive of Mechanical Engineering, Vol 63 (4), pp. 475-494, 2016. DOI: 10.1515/meceng-2016-0027. Cited by ~1.

Author(s): Thomas Volzer, Peter Eberhard

Keyword(s): Finite Element, Model Reduction, Message Passing, Large Scale, Message Passing Interface, Block Size, Reduction Process, Element Model, Multibody Simulation, Elastic Bodies

The use of elastic bodies in multibody simulation has become increasingly important in recent years. To include elastic bodies described by finite element models in multibody simulations, the dimension of the system of ordinary differential equations must be reduced by projection. For this purpose, this work uses the modal reduction method, a method based on component mode synthesis (CMS), and a moment-matching method. Because the non-reduced systems keep growing in size, computing the projection matrix demands large computational resources and cannot be done on ordinary serial computers with the available memory. In this paper, the model reduction software Morembs++ is presented, which uses a parallelization concept based on the Message Passing Interface to meet the memory demand and reduce the runtime of the model reduction process. Additionally, the behaviour of the Block-Krylov-Schur eigensolver, implemented in the Anasazi package of the Trilinos project, is analysed with regard to the choice of the size of the Krylov basis, the block size and the number of blocks. Furthermore, an iterative solver is considered within the CMS-based method.
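
The reduction by projection described above can be written compactly. With a projection matrix V of size n x r (r much smaller than n), whose columns for modal reduction are selected eigenvectors, the full coordinates are approximated as x = V q, and the mass and stiffness matrices are reduced by a Galerkin projection, M_r = V^T M V and K_r = V^T K V. The pure-Python sketch below applies that projection to a tiny hypothetical 3-DOF spring chain with an assumed one-column basis. It is not Morembs++ code; a real implementation would use distributed sparse linear algebra, such as the Trilinos packages mentioned in the abstract, rather than nested lists.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an (m x p) and a (p x q) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def project(full: Matrix, v: Matrix) -> Matrix:
    """Galerkin projection V^T A V of an n x n matrix onto r basis vectors."""
    return matmul(matmul(transpose(v), full), v)


# Hypothetical 3-DOF spring chain (fixed at one end): stiffness K, unit mass M.
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
M = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Assumed reduction basis with r = 1 (an approximate first mode shape).
# In modal reduction the columns of V would be computed eigenvectors.
V = [[1.0], [2.0], [3.0]]

K_r = project(K, V)
M_r = project(M, V)
print(K_r, M_r)  # prints [[3.0]] [[14.0]]
```

The reduced system is only r x r. With a single basis vector, the reduced eigenvalue K_r / M_r = 3/14 is the Rayleigh quotient of V, which bounds the smallest eigenvalue of the full system from above; better basis vectors tighten that bound, which is why the quality of the projection matrix matters so much.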
