NUMA-BTDM: A Thread Mapping Algorithm for Balanced Data Locality on NUMA Systems

Author(s):  
Iulia Stirb
2017 ◽  
Vol 2017 ◽  
pp. 1-8

Author(s):  
Thomas Mezmur Birhanu ◽  
Zhetao Li ◽  
Hiroo Sekiya ◽  
Nobuyoshi Komuro ◽  
Young-June Choi

This paper proposes a thread scheduling mechanism designed for heterogeneously configured multicore systems. Our approach considers CPU utilization to map each running thread to the core that can deliver the capacity it actually needs. The paper also introduces a mapping algorithm that maps threads to cores in O(N log M) time, where N is the number of cores and M is the number of core types. In addition, we introduce a method for profiling heterogeneous architectures based on the discrepancy between the performance of individual cores. Our heterogeneity-aware scheduler speeds up processing by 52.62% and saves 2.22% of power compared to the CFS scheduler, the default in Linux systems.
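The abstract does not spell out the mapping algorithm, but the stated O(N log M) bound is consistent with a per-decision binary search over the M core types. A minimal sketch under that assumption (the function name and the sorted-capacity representation are illustrative, not from the paper):

```c
#include <stddef.h>

/* Hypothetical sketch: pick the weakest core type whose capacity still
 * covers a thread's measured CPU utilization. Core-type capacities are
 * assumed sorted ascending, so each lookup is a binary search over the
 * M core types, i.e. O(log M) per decision. */
static size_t pick_core_type(const double *capacity, size_t m, double demand)
{
    size_t lo = 0, hi = m;          /* search window [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (capacity[mid] < demand)
            lo = mid + 1;           /* too weak, search to the right */
        else
            hi = mid;               /* strong enough, try something weaker */
    }
    return lo < m ? lo : m - 1;     /* fall back to the strongest type */
}
```

For example, with capacities {0.25, 0.5, 1.0}, a thread demanding 0.4 of a core lands on type 1, while a demand the platform cannot satisfy falls back to the strongest type.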


10.29007/55pq ◽  
2019 ◽  
Author(s):  
Lifeng Liu ◽  
Meilin Liu ◽  
Chongjun Wang

General purpose GPU (GPGPU) is an effective many-core architecture that can yield high throughput for many scientific applications with thread-level parallelism. However, several challenges still limit further performance improvements and make GPU programming challenging for programmers who lack knowledge of the GPU hardware architecture. In this paper, we design a compiler-assisted locality-aware CTA (cooperative thread array) mapping scheme for GPUs that takes advantage of inter-CTA data reuse in GPU kernels. Using data reuse analysis based on the polyhedron model, we detect inter-CTA data reuse patterns in GPU kernels and control the CTA mapping pattern to improve data locality on each SM. The compiler-assisted locality-aware CTA mapping scheme can also be combined with the programmable warp scheduler to further improve performance. The experimental results show that our CTA mapping algorithm improves the overall performance of the input GPU programs by 23.3% on average, and by 56.7% when combined with the programmable warp scheduler.
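The abstract describes steering CTAs that reuse each other's data onto the same SM rather than spreading them round-robin. A hedged sketch of that idea as a plain index mapping (the function, the `group` parameter, and grouping along the x dimension are illustrative assumptions, not the paper's actual scheme):

```c
/* Hypothetical locality-aware CTA-to-SM assignment: a cluster of
 * `group` consecutive CTAs along the reuse dimension is assigned to
 * one SM, so data they share stays resident in that SM's cache,
 * instead of the default round-robin spread across all SMs. */
static unsigned cta_to_sm(unsigned cta_x, unsigned group, unsigned num_sms)
{
    return (cta_x / group) % num_sms;   /* whole cluster -> one SM */
}
```

With `group = 1` this degenerates to plain round-robin; larger groups trade SM load balance for intra-SM data reuse, which is the tension a compiler-assisted scheme can resolve per kernel.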


Computers ◽  
2018 ◽  
Vol 7 (4) ◽  
pp. 66
Author(s):  
Iulia Știrb

The paper presents a Non-Uniform Memory Access (NUMA)-aware compiler optimization for task-level parallel code. The optimization is based on the Non-Uniform Memory Access—Balanced Task and Loop Parallelism (NUMA-BTLP) algorithm (Ştirb, 2018). The algorithm determines the type of each thread in the source code based on a static analysis of the code. After assigning a type to each thread, NUMA-BTLP (Ştirb, 2018) calls the NUMA-BTDM mapping algorithm (Ştirb, 2016), which uses the PThreads routine pthread_setaffinity_np to set the CPU affinities of the threads (i.e., thread-to-core associations) based on their type. The algorithms improve thread mapping on NUMA systems by mapping threads that share data to the same core(s), allowing fast access to data in the L1 cache. The paper shows that PThreads-based task-level parallel code optimized at compile time by NUMA-BTLP (Ştirb, 2018) and NUMA-BTDM (Ştirb, 2016) runs time- and energy-efficiently on NUMA systems. The results show that energy consumption improves by up to 5% at the same execution time for one of the tested real benchmarks, and by up to 15% for another benchmark running in an infinite loop. The algorithms can be used in real-time control systems such as client/server-based applications that require efficient access to shared resources. Most often, task parallelism is used in the implementation of the server and loop parallelism in the client.
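The affinity call the abstract names, pthread_setaffinity_np, is a GNU extension that pins a thread to a CPU set. A minimal sketch of the mechanics (which core each thread type gets is NUMA-BTDM's decision; the helper names below are illustrative):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Build a CPU set containing exactly one core. */
static cpu_set_t one_core_set(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);            /* allow only this core */
    return set;
}

/* Pin the calling thread to `core`, as NUMA-BTDM does per thread type.
 * Returns 0 on success, an errno value on failure. */
static int pin_self_to(int core)
{
    cpu_set_t set = one_core_set(core);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}
```

On Linux, compile with `-pthread`; the same call also accepts a handle from pthread_create, which is how a compiler pass could pin each created thread rather than only the caller.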


2013 ◽  
Vol 33 (1) ◽  
pp. 76-79
Author(s):  
Jiamin LIU ◽  
Huiyan WANG ◽  
Xiaoli ZHOU ◽  
Fulin LUO

2011 ◽  
Vol 33 (10) ◽  
pp. 2347-2352
Author(s):  
Bo Lü ◽  
Fan Yang ◽  
Zhen-kai Wang ◽  
Jian-ya Chen ◽  
Yun-jie Liu
