Orchestrating Multiple Data-Parallel Kernels on Multiple Devices

Author(s): Janghaeng Lee, Mehrzad Samadi, Scott Mahlke

2021
Author(s): Rui Huang

Current trends in autonomous driving combine on-vehicle and roadside smart devices to perform collaborative data sensing and computing, so as to achieve comprehensive and stable decision making. The integrated system is usually known as C-V2X. However, several challenges have significantly hindered the development and adoption of such systems, for example the difficulty of accessing the multiple data protocols of multiple devices at the bottom layer, and the centralized deployment of computing power. Therefore, this work proposes a novel framework for the design of C-V2X systems. First, a highly aggregated architecture is designed that fully integrates multiple traffic data resources. Then a multilevel information fusion model is designed around the multiple sensors used in vehicle-road coordination; the model can adapt to different detection environments, detection mechanisms, and time frames. Finally, a lightweight and efficient identity-based authentication method is given that realizes bidirectional authentication between end devices and edge gateways.
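The abstract leaves the authentication scheme itself unspecified. As a rough illustration of what bidirectional (mutual) authentication between an end device and an edge gateway can look like, here is a minimal challenge-response sketch over identity-derived keys; the key-generation center, the `MASTER` secret, and all function names are our assumptions, not the paper's method.

```python
import hmac
import hashlib
import secrets

# Hypothetical sketch: MASTER models the secret held by an assumed
# key-generation center (KGC) that derives per-identity keys. A real
# identity-based scheme would use public parameters instead.
MASTER = b"demo-master-secret"

def identity_key(identity: str) -> bytes:
    """KGC derives a per-identity key and provisions it to that party."""
    return hmac.new(MASTER, identity.encode(), hashlib.sha256).digest()

def respond(key: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of `key` by MACing a fresh challenge."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

def mutual_auth(device_key: bytes, gateway_key: bytes) -> bool:
    """Bidirectional authentication: each side challenges the other."""
    # Gateway challenges the device ...
    c1 = secrets.token_bytes(16)
    ok1 = hmac.compare_digest(respond(device_key, c1), respond(gateway_key, c1))
    # ... and the device challenges the gateway.
    c2 = secrets.token_bytes(16)
    ok2 = hmac.compare_digest(respond(gateway_key, c2), respond(device_key, c2))
    return ok1 and ok2
```

Authentication succeeds only when both sides hold the same identity-derived key, e.g. `mutual_auth(identity_key("obu-01"), identity_key("obu-01"))`.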


2014 · Vol E97.D (11) · pp. 2827-2834
Author(s): Ittetsu TANIGUCHI, Junya KAIDA, Takuji HIEDA, Yuko HARA-AZUMI, Hiroyuki TOMIYAMA

2013 · Vol E96.D (10) · pp. 2268-2271
Author(s): Junya KAIDA, Yuko HARA-AZUMI, Takuji HIEDA, Ittetsu TANIGUCHI, Hiroyuki TOMIYAMA, ...

1997 · Vol 6 (1) · pp. 3-27
Author(s): Corinne Ancourt, Fabien Coelho, François Irigoin, Ronan Keryell

High Performance Fortran (HPF) was developed to support data parallel programming for single-instruction multiple-data (SIMD) and multiple-instruction multiple-data (MIMD) machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to allocate arrays in the local memories, to assign computations to elementary processors, and to migrate data between processors when required. We show here that linear algebra is a powerful framework to encode HPF directives and to synthesize distributed code with space-efficient array allocation, tight loop bounds, and vectorized communications for INDEPENDENT loops. The generated code includes traditional optimizations such as guard elimination, message vectorization and aggregation, and overlap analysis. The systematic use of an affine framework makes it possible to prove the compilation scheme correct.
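As a concrete illustration of the index arithmetic such a compiler must synthesize, the sketch below assumes a one-dimensional BLOCK distribution over `p` processors (the function names are ours, not HPF's) and computes owners, local indices, and tight local loop bounds under the owner-computes rule.

```python
from math import ceil

# Sketch of HPF-style BLOCK distribution arithmetic: a global array of
# n elements is split into contiguous blocks, one per processor.

def block_size(n, p):
    """Elements held by each processor (last block may be shorter)."""
    return ceil(n / p)

def owner(i, n, p):
    """Processor that stores global element i."""
    return i // block_size(n, p)

def to_local(i, n, p):
    """Local index of global element i on its owning processor."""
    return i % block_size(n, p)

def local_bounds(pid, n, p, lo, hi):
    """Tight global bounds of the iterations in [lo, hi) that pid owns."""
    b = block_size(n, p)
    return max(lo, pid * b), min(hi, (pid + 1) * b)
```

For `n = 10, p = 4` the block size is 3, so element 7 lives on processor 2 at local index 1, and processor 3's local loop over `[0, 10)` runs only over the single iteration 9.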


1997 · Vol 07 (02) · pp. 145-156
Author(s): Manish Gupta, Edith Schonberg

For a program with sufficient parallelism, reducing synchronization costs is an important objective for achieving efficient execution. This paper presents a novel methodology for reducing synchronization costs of programs compiled for SPMD execution. This methodology combines data flow analysis with communication analysis to determine the ordering between production and consumption of data on different processors, which helps in identifying redundant synchronization. The resulting framework is more powerful than any previously presented, as it provides the first algorithm that can eliminate synchronization messages even from computations that need communication. We show that several commonly occurring computation patterns, such as reductions and stencil computations with a reciprocal producer-consumer relationship between processors, lend themselves well to this optimization, an observation that is confirmed by an examination of some HPF benchmark programs. Our framework also recognizes situations where the synchronization needs of multiple data transfers can be satisfied by a single synchronization message. While applicable to all shared memory machines as well, this analysis is especially useful for those with a flexible cache-coherence protocol, as it identifies efficient ways of moving data directly from producers to consumers, often without any extra synchronization.
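The core observation, that a communication message can itself supply the ordering a separate synchronization message would otherwise enforce, can be sketched as a reachability check. The event encoding below (program-order and send/receive edges between named events) is our own illustration, not the paper's representation.

```python
from collections import defaultdict

def happens_before(edges):
    """Transitive closure of an ordering relation over event ids."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    changed = True
    while changed:
        changed = False
        for a in list(succ):
            for b in list(succ[a]):
                new = succ[b] - succ[a]
                if new:
                    succ[a] |= new
                    changed = True
    return succ

def redundant_syncs(dependences, order_edges):
    """A sync message for a cross-processor dependence (w, r) is redundant
    when w already happens-before r via program order plus messages."""
    hb = happens_before(order_edges)
    return [d for d in dependences if d[1] in hb[d[0]]]
```

In a stencil exchange, processor 0 writes the halo (`w0`), sends it (`s0` to `r1`), and processor 1 then uses it (`u1`); the dependence `(w0, u1)` is covered by the message, so no extra synchronization is needed.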


1992 · Vol 21 (5) · pp. 363-386
Author(s): Bradley K. Seevers, Michael J. Quinn, Philip J. Hatcher

1995 · Vol 4 (3) · pp. 193-201
Author(s): Dan Williams, Luc Bauwens

This article describes the porting and optimization of an explicit, time-dependent computational fluid dynamics code on an 8,192-node MasPar MP-1. The MasPar is a very fine-grained, single-instruction, multiple-data (SIMD) parallel computer. The code uses the flux-corrected transport algorithm. We describe the techniques used to port and optimize the code, and the behavior of a test problem. The test problem used to benchmark the flux-corrected transport code on the MasPar was a two-dimensional exploding shock with periodic boundary conditions. We discuss the performance that our code achieved on the MasPar, and compare its performance on the MasPar with its performance on other architectures. The comparisons show that the performance of the code on the MasPar is slightly better than on a CRAY Y-MP for a functionally equivalent, optimized two-dimensional code.
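The article does not reproduce the MasPar code itself. As a rough illustration of the flux-corrected transport idea it uses (a low-order upwind step plus a limited antidiffusive correction, Zalesak-style), a minimal one-dimensional sketch for constant-speed advection with periodic boundaries might look like this; the 2-D exploding-shock code is of course far more involved.

```python
def fct_step(u, c):
    """One FCT step for u_t + a*u_x = 0 with Courant number c in (0, 1]."""
    n = len(u)
    ip = lambda i: (i + 1) % n  # periodic right neighbor
    # Low-order (upwind) and high-order (Lax-Wendroff) fluxes at i+1/2,
    # with the factor dt/dx already folded into c.
    fl = [c * u[i] for i in range(n)]
    fh = [c * (u[i] + 0.5 * (1 - c) * (u[ip(i)] - u[i])) for i in range(n)]
    # Transported-diffused (monotone low-order) solution.
    utd = [u[i] - (fl[i] - fl[i - 1]) for i in range(n)]
    # Antidiffusive fluxes, limited so that no new extrema appear.
    a = [fh[i] - fl[i] for i in range(n)]
    umax = [max(u[i - 1], u[i], u[ip(i)], utd[i - 1], utd[i], utd[ip(i)])
            for i in range(n)]
    umin = [min(u[i - 1], u[i], u[ip(i)], utd[i - 1], utd[i], utd[ip(i)])
            for i in range(n)]
    rp, rm = [0.0] * n, [0.0] * n
    for i in range(n):
        pp = max(0.0, a[i - 1]) - min(0.0, a[i])   # total inflow into cell i
        pm = max(0.0, a[i]) - min(0.0, a[i - 1])   # total outflow from cell i
        rp[i] = min(1.0, (umax[i] - utd[i]) / pp) if pp > 0 else 0.0
        rm[i] = min(1.0, (utd[i] - umin[i]) / pm) if pm > 0 else 0.0
    cc = [min(rp[ip(i)], rm[i]) if a[i] >= 0 else min(rp[i], rm[ip(i)])
          for i in range(n)]
    return [utd[i] - (cc[i] * a[i] - cc[i - 1] * a[i - 1]) for i in range(n)]
```

The conservative flux form telescopes over a periodic domain, so the total of `u` is preserved to rounding, and the limiter keeps the solution within the initial bounds.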


1993 · Vol 28 (1) · pp. 44-47
Author(s): Bradley K. Seevers, Michael J. Quinn, Philip J. Hatcher

1994 · Vol 3 (3) · pp. 169-186
Author(s): Matt Rosing, Robert Schnabel

The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication are implicit communication based on shared memory and explicit communication based on messages. None of these models by itself seems sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. We give an overview of the language and discuss some of the critical implementation details.
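To make the SPMD-with-explicit-messages model among those surveyed concrete, the following sketch runs the same function on every "rank" (here simulated with stdlib threads) and communicates only through explicit messages (one queue per rank); it is our illustration of the model, not the article's language.

```python
import threading
from queue import Queue

def spmd_sum(values):
    """SPMD sketch: each rank holds one value; a ring exchange of partial
    sums leaves every rank with the global total (an allreduce)."""
    p = len(values)
    inbox = [Queue() for _ in range(p)]  # explicit message channels
    result = [None] * p

    def rank(r):
        # Every rank executes this same program, parameterized by r.
        acc = carry = values[r]
        for _ in range(p - 1):
            inbox[(r + 1) % p].put(carry)  # explicit send to right neighbor
            carry = inbox[r].get()         # explicit receive from the left
            acc += carry
        result[r] = acc

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

After `p - 1` rounds each rank has accumulated every other rank's value, so `spmd_sum([1, 2, 3, 4])` yields the total on all four ranks.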

