A Compiler-assisted locality aware CTA Mapping Scheme

Mapping Intimacies

10.29007/55pq

2019

Author(s):

Lifeng Liu

Meilin Liu

Chongjun Wang

Keyword(s):

General Purpose

Data Locality

Data Reuse

Mapping Algorithm

Performance Improvements

Many Core

General Purpose Gpu

And Control

Level Parallelism

Mapping Scheme

General-purpose GPU (GPGPU) computing is an effective many-core architecture that can yield high throughput for many scientific applications with thread-level parallelism. However, several challenges still limit further performance improvements and make GPU programming difficult for programmers who lack knowledge of GPU hardware architecture. In this paper, we design a compiler-assisted, locality-aware CTA (cooperative thread array) mapping scheme for GPUs that takes advantage of inter-CTA data reuse in GPU kernels. Using data reuse analysis based on the polyhedral model, we can detect inter-CTA data reuse patterns in GPU kernels and control the CTA mapping pattern to improve data locality on each SM (streaming multiprocessor). The compiler-assisted, locality-aware CTA mapping scheme can also be combined with a programmable warp scheduler to further improve performance. The experimental results show that our CTA mapping algorithm improves the overall performance of the input GPU programs by 23.3% on average, and by 56.7% when combined with the programmable warp scheduler.
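The following Python sketch compares two CTA-to-SM assignments for a tiled matrix multiply, to make the idea of inter-CTA reuse concrete. It is not the authors' polyhedral analysis or mapping algorithm. It relies on two simplifying assumptions. First, the default scheduler dispatches CTA i to SM i mod (number of SMs). Second, CTA (bx, by) reads row tile by of matrix A. The function names are hypothetical. Counting the distinct A tiles that each SM must fetch shows why clustering CTAs that share data can reduce redundant traffic into each SM's cache.

```python
from collections import defaultdict


def round_robin_mapping(grid_x, grid_y, num_sms):
    """Baseline model (assumption): linear CTA id i is dispatched to SM i % num_sms."""
    return {cta: cta % num_sms for cta in range(grid_x * grid_y)}


def row_clustered_mapping(grid_x, grid_y, num_sms):
    """Locality-aware sketch: every CTA in grid row `by` goes to the same SM,
    so CTAs that read the same row tile of A share that SM's cache."""
    mapping = {}
    for by in range(grid_y):
        for bx in range(grid_x):
            mapping[by * grid_x + bx] = by % num_sms
    return mapping


def distinct_a_tiles_per_sm(mapping, grid_x):
    """Count the distinct A row tiles each SM must fetch, assuming CTA (bx, by)
    of a tiled matrix multiply reads row tile `by` of A."""
    tiles = defaultdict(set)
    for cta, sm in mapping.items():
        tiles[sm].add(cta // grid_x)  # cta // grid_x recovers blockIdx.y
    return {sm: len(t) for sm, t in sorted(tiles.items())}


if __name__ == "__main__":
    gx, gy, sms = 4, 4, 4
    print(distinct_a_tiles_per_sm(round_robin_mapping(gx, gy, sms), gx))
    # {0: 4, 1: 4, 2: 4, 3: 4}
    print(distinct_a_tiles_per_sm(row_clustered_mapping(gx, gy, sms), gx))
    # {0: 1, 1: 1, 2: 1, 3: 1}
```

With a 4-by-4 grid on 4 SMs, the round-robin model makes every SM fetch all four row tiles, while the clustered mapping needs only one tile per SM. How a compiler enforces such a mapping on a real GPU is outside the scope of this sketch.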

Improving ILP via Fused In-Order Superscalar and VLIW Instruction Dispatch Methods

Journal of Circuits, Systems and Computers

10.1142/s0218126619500208

2018

Vol 28 (02)

pp. 1950020

Cited By ~ 1

Author(s):

Yumin Hou

Xu Wang

Jiawei Fu

Junping Ma

Hu He

...

Keyword(s):

Prediction Method

Digital Signal

General Purpose

Performance Comparison

Instruction Level Parallelism

Superscalar Processor

Performance Improvements

General Purpose Processor

Evaluation Board

Level Parallelism

To expand the digital signal processing capability of a General Purpose Processor (GPP), we propose a fused microarchitecture that improves Instruction Level Parallelism (ILP) by supporting both in-order superscalar and very long instruction word (VLIW) dispatch methods in a single pipeline. The design is based on the ARMv7-A&R Instruction Set Architecture (ISA). To provide a performance comparison, we first design an in-order superscalar processor, because ARM GPPs always adopt superscalar approaches. We then extend this processor with a VLIW dispatch method to realize the fused microarchitecture. Both designs are evaluated on a Xilinx 7-series FPGA (XC7K325T-2FFG900C) using the Xilinx Vivado design suite. The results show that, compared with the superscalar processor, the processor working in VLIW mode improves performance by 15% and 8% when running the EEMBC and DSPstone benchmarks, respectively. We also run the two benchmarks on the ARM Cortex-A9 processor integrated in the Zynq-7000 AP SoC device on the Xilinx ZC706 evaluation board. The processor in VLIW mode shows 44% and 30% performance improvements over the ARM Cortex-A9. The fused microarchitecture adopts a combined bimodal and PAp branch prediction method, which achieves 93.7% prediction accuracy with limited hardware overhead.
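The abstract names a combined bimodal and PAp branch predictor but does not say how the two components are combined. The minimal Python sketch below assumes a McFarling-style tournament chooser: a per-PC 2-bit counter that selects whichever component has recently been right more often. The table sizes, the 4-bit history length, and all class and function names are hypothetical and are not taken from the paper. The bimodal component keeps one 2-bit saturating counter per PC. PAp (per-address history, per-address pattern tables) lets each branch index its own pattern table using its own recent outcomes.

```python
class BimodalPredictor:
    """2-bit saturating counters indexed by low PC bits (0-1: not taken, 2-3: taken)."""

    def __init__(self, entries=512):
        self.entries = entries
        self.table = [1] * entries  # start weakly not-taken

    def predict(self, pc):
        return self.table[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


class PApPredictor:
    """Per-address history registers, each indexing its own pattern table of 2-bit counters."""

    def __init__(self, entries=64, hist_bits=4):
        self.entries = entries
        self.mask = (1 << hist_bits) - 1
        self.history = [0] * entries
        self.patterns = [[1] * (1 << hist_bits) for _ in range(entries)]

    def predict(self, pc):
        i = pc % self.entries
        return self.patterns[i][self.history[i]] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        h = self.history[i]
        c = self.patterns[i][h]
        self.patterns[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        self.history[i] = ((h << 1) | int(taken)) & self.mask  # shift in newest outcome


class CombinedPredictor:
    """Hypothetical tournament chooser: a 2-bit counter per PC selects
    PAp (counter >= 2) or bimodal (counter < 2)."""

    def __init__(self, entries=512):
        self.entries = entries
        self.bimodal = BimodalPredictor(entries)
        self.pap = PApPredictor()
        self.chooser = [2] * entries

    def predict(self, pc):
        if self.chooser[pc % self.entries] >= 2:
            return self.pap.predict(pc)
        return self.bimodal.predict(pc)

    def update(self, pc, taken):
        i = pc % self.entries
        bimodal_ok = self.bimodal.predict(pc) == taken
        pap_ok = self.pap.predict(pc) == taken
        # Train the chooser only when exactly one component was right.
        if pap_ok and not bimodal_ok:
            self.chooser[i] = min(3, self.chooser[i] + 1)
        elif bimodal_ok and not pap_ok:
            self.chooser[i] = max(0, self.chooser[i] - 1)
        self.bimodal.update(pc, taken)
        self.pap.update(pc, taken)


def accuracy(predictor, pc, outcomes, skip=0):
    """Feed a branch outcome trace; return the hit rate after `skip` warm-up outcomes."""
    hits = 0
    for n, taken in enumerate(outcomes):
        if n >= skip and predictor.predict(pc) == taken:
            hits += 1
        predictor.update(pc, taken)
    return hits / (len(outcomes) - skip)


if __name__ == "__main__":
    # A loop branch with trip count 4: taken, taken, taken, not taken, repeated.
    trace = [True, True, True, False] * 250
    print(accuracy(BimodalPredictor(), 0x40, trace, skip=600))  # 0.75
    print(accuracy(CombinedPredictor(), 0x40, trace, skip=600))  # 1.0
```

On this loop branch, the bimodal counter settles on "taken" and misses every loop exit. The per-address history lets PAp, and therefore the chooser, predict the exit as well. The 93.7% figure in the abstract comes from the authors' benchmarks, not from a toy trace like this one.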

Research on highly parallel embedded control system design and implementation method

Impact

10.21820/23987073.2019.10.44

2019

Vol 2019 (10)

pp. 44-46

Author(s):

Masato Edahiro

Masaki Gondo

Keyword(s):

Computer Architecture

Intelligent Systems

Large Scale

General Purpose

Heterogeneous Structure

Single Chip

Powertrain Control

Processing Power

Hardware Description

Many Core

The pace of technological advancement is ever-increasing, and intelligent systems, such as those found in robots and vehicles, have become larger and more complex. These intelligent systems have a heterogeneous structure, comprising a mixture of modules, such as artificial intelligence (AI) and powertrain control modules, that carry out large-scale numerical calculation and real-time periodic processing. Information technology expert Professor Masato Edahiro, from the Graduate School of Informatics at Nagoya University in Japan, explains that concurrent advances in semiconductor research have led to the miniaturisation of semiconductors, allowing a greater number of processors to be mounted on a single chip and increasing potential processing power. 'In addition to general-purpose processors such as CPUs, a mixture of multiple types of accelerators such as GPGPU and FPGA has evolved, producing a more complex and heterogeneous computer architecture,' he says. Edahiro and his partners have been working on eMBP, a model-based parallelizer (MBP) that offers a mapping system as an efficient way of automatically generating parallel code for multi- and many-core systems. Once the hardware description is written, eMBP bridges the gap between software and hardware. Hardware vendors gain an efficient ecosystem, and software vendors no longer need to adapt their code for each particular platform.

Introducing multi-level parallelism, at coarse, fine and instruction level to enhance the performance of iterative solvers for large sparse linear systems on Multi- and Many-core architecture

2020 IEEE/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar)

10.1109/llvmhpchipar51896.2020.00014

2020

Author(s):

Jean-Marc Gratien

Keyword(s):

Linear Systems

Iterative Solvers

Sparse Linear Systems

Multi Level

Many Core

Level Parallelism

GPIOCP: Timing-accurate general purpose I/O controller for many-core real-time systems

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017

10.23919/date.2017.7927099

2017

Cited By ~ 1

Author(s):

Zhe Jiang

Neil C. Audsley

Keyword(s):

Real Time

General Purpose

Real Time Systems

Many Core

Time Systems

Water management and the nation's resources: the Federal Board of Material Improvements (Junta Federal de Mejoras Materiales), Ciudad Juárez, Chihuahua, 1931-1936

región y sociedad

10.22198/rys.2017.70.a340

2017

Vol 29 (70)

Author(s):

María Del Carmen Zetina Rodríguez

Rutilio García Pereyra

Efraín Rangel Guzmán

Keyword(s):

Water Management

Water Resources

General Purpose

Economic Resources

Public Works

Background Information

Ciudad Juarez

Historical Archives

Loss Of Autonomy

And Control

The government constituted the Federal Board of Material Improvements (Junta Federal de Mejoras Materiales) to manage and control the economic resources and the construction of public works at Mexico's borders and ports. The general purpose of this research was to analyze how this agency was established and operated in Ciudad Juárez, in the context of the centralization/federalization of the country's water resources, from 1931 to 1936; to this end, the historical archives were reviewed. One of the study's limitations was the lack of background information about the management of water resources in this town. The study broadens the scarce existing knowledge about how the boards functioned at the borders. Among the findings, in the municipality of Juárez the loss of autonomy over water management was accompanied by a material and economic dispossession in which several government institutions and agencies participated.

AGENTS for cooperating expert systems in concurrent engineering design

Artificial Intelligence for Engineering Design, Analysis and Manufacturing

10.1017/s0890060400000858

1993

Vol 7 (3)

pp. 145-158

Cited By ~ 3

Author(s):

Guo Q. Huang

John A. Brandon

Keyword(s):

Expert Systems

Engineering Design

Concurrent Engineering

General Purpose

Main Theme

Computer Tools

System A

And Control

Domain Independent

A main theme of concurrent engineering is effective communication between the relevant disciplines. Any computer tool for concurrent engineering must provide sufficient constructs and strategies for this purpose. This paper describes the AGENTS system, a domain-independent, general-purpose object-oriented Prolog language for cooperating expert systems in concurrent engineering design. Emphasis is placed on demonstrating the use of the AGENTS constructs for distributed knowledge representation and the cooperation strategies for communication, collaboration, conflict resolution, and control. A simple case study is presented to illustrate the balance between simplicity and flexibility.

Ultra-Fast Digital Tomosynthesis Reconstruction Using General-Purpose GPU Programming for Image-Guided Radiation Therapy

Technology in Cancer Research & Treatment

10.7785/tcrt.2012.500206

2011

Vol 10 (4)

pp. 295-306

Cited By ~ 22

Author(s):

Justin C. Park

Sung Ho Park

Jin Sung Kim

Youngyih Han

Min Kook Cho

...

Keyword(s):

Radiation Therapy

General Purpose

Gpu Programming

Digital Tomosynthesis

Image Guided Radiation Therapy

Image Guided

General Purpose Gpu

A general purpose selective signaling and control system

Electrical Engineering

10.1109/ee.1958.6445128

1958

Vol 77 (6)

pp. 486-491

Author(s):

W. V. K. Large

H. J. Michael

Keyword(s):

Control System

General Purpose

And Control

DO-178C Certification of General-Purpose GPU Software: Review of Existing Methods and Future Directions

10.1109/dasc52595.2021.9594412

2021

Author(s):

Matina Maria Trompouki

Leonidas Kosmidis

Keyword(s):

General Purpose

Software Review

Future Directions

General Purpose Gpu

Benchmarking a Many-Core Neuromorphic Platform With an MPI-Based DNA Sequence Matching Algorithm

Electronics

10.3390/electronics8111342

2019

Vol 8 (11)

pp. 1342

Author(s):

Gianvito Urgese

Francesco Barchi

Emanuele Parisi

Evelina Forno

Andrea Acquaviva

...

Keyword(s):

Dna Sequence

Parallel Architecture

General Purpose

Sequence Matching

Matching Algorithm

Computing Platform

Globally Asynchronous Locally Synchronous

Efficient Communication

The Many

Many Core

SpiNNaker is a neuromorphic globally asynchronous locally synchronous (GALS) multi-core architecture designed for simulating spiking neural networks (SNNs) in real time. Several studies have shown that neuromorphic platforms allow flexible and efficient simulation of SNNs by exploiting a communication infrastructure optimised for transmitting small packets across the many cores of the platform. However, the effectiveness of neuromorphic platforms in executing massively parallel general-purpose algorithms, while promising, remains to be explored. In this paper, we present a parallel DNA sequence matching algorithm, implemented with the MPI programming paradigm and ported to the SpiNNaker platform. In our implementation, all cores available on the board are configured to execute an optimised version of the Boyer-Moore (BM) algorithm in parallel. Using this application, we benchmarked the SpiNNaker platform in terms of scalability and synchronisation latency. Experimental results indicate that the SpiNNaker parallel architecture achieves a linear performance increase with the number of cores used and shows better scalability than a general-purpose multi-core computing platform.
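The abstract does not say how the text is divided among cores or which Boyer-Moore optimisations are used. This sketch therefore makes several assumptions about the decomposition:

- Each core receives a contiguous block of candidate start positions.
- Each block is extended by `len(pattern) - 1` characters, so that a match straddling a block boundary is found exactly once.
- The Boyer-Moore-Horspool variant (bad-character shifts only) stands in for full Boyer-Moore.
- A plain loop simulates the MPI scatter/gather instead of calling an MPI library.

All function names are hypothetical.

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: compare right to left, then shift by the bad-character
    rule applied to the text character under the pattern's last position."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Distance from the last occurrence of c in pattern[:-1] to the pattern's end.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


def partition(n, m, num_cores):
    """Give each simulated core a contiguous block of candidate start positions,
    extended by m - 1 characters so boundary-straddling matches are found once."""
    chunk = -(-n // num_cores)  # ceiling division
    for rank in range(num_cores):
        start = rank * chunk
        if start >= n:
            break
        yield rank, start, min(n, start + chunk + m - 1)


def parallel_search(text, pattern, num_cores):
    """Simulated scatter/search/gather: each rank searches its own slice,
    and local offsets are shifted back to global positions."""
    hits = []
    for _rank, start, end in partition(len(text), len(pattern), num_cores):
        hits.extend(start + p for p in horspool_search(text[start:end], pattern))
    return sorted(hits)


if __name__ == "__main__":
    genome = "ACGTACGTTACGACGT"
    print(parallel_search(genome, "ACG", num_cores=3))  # [0, 4, 9, 12]
```

Each block is searched independently, and only the global offsets are gathered. Per-core work therefore shrinks roughly in proportion to the number of cores, which is the kind of scaling the SpiNNaker experiments measure.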
