CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU

The Scientific World JOURNAL ◽

10.1155/2015/848416 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10

Author(s):

Jianliang Ma ◽

Jinglei Meng ◽

Tianzhou Chen ◽

Minghui Wu

Keyword(s):

Scheduling Algorithm ◽

Global Memory ◽

Request Sequence ◽

Thread Level Parallelism ◽

Level Parallelism ◽

Memory Request ◽

Request Service

Ultra-high thread-level parallelism in modern GPUs typically generates numerous memory requests simultaneously, so plenty of memory requests are always waiting at each bank of the shared LLC (the L2 cache in this paper) and at global memory. For global memory, various schedulers have already been developed to adjust the request sequence, but we find that little work has focused on the service sequence at the shared LLC. Our measurements show that in a large number of GPU applications, requests are always queued at the LLC banks waiting for service, which provides an opportunity to optimize the service order at the LLC. By adjusting the service order of GPU memory requests, we can improve the schedulability of the SMs. We therefore propose a critical-aware shared LLC request scheduling algorithm (CaLRS) in this paper. How the priority of a memory request is represented is critical to CaLRS: we represent the criticality of each warp by the number of memory requests that originate from the same warp and have not yet been serviced when they arrive at the shared LLC bank. Experiments show that, by promoting the scheduling priority of memory requests with high criticality, the proposed scheme effectively boosts SM schedulability and indirectly improves GPU performance.
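
The abstract names the core mechanism (tag each request with its warp's count of unserviced requests when it arrives at an LLC bank, then promote high-criticality requests) but leaves the details open. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, the unserviced-request count is assumed to arrive with each request, and because the abstract does not say whether a larger or a smaller count means higher criticality, the direction is a constructor flag that defaults to treating fewer outstanding requests as more critical (that warp is closer to receiving all its data). Ties are broken in arrival (FIFO) order.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(order=True)
class _Entry:
    # Only (key, seq) take part in comparisons; the smallest tuple is served first.
    key: int
    seq: int
    warp_id: int = field(compare=False)
    addr: int = field(compare=False)


class CriticalAwareBankQueue:
    """Hypothetical request queue for one shared-LLC (L2) bank."""

    def __init__(self, fewer_is_more_critical: bool = True) -> None:
        # ASSUMPTION: the abstract does not fix the direction of the criticality metric.
        self._fewer_first = fewer_is_more_critical
        self._heap: List[_Entry] = []
        self._seq = itertools.count()  # arrival order, used as the FIFO tie-breaker

    def enqueue(self, warp_id: int, addr: int, unserviced_in_warp: int) -> None:
        """Record a request with its warp's unserviced-request count at arrival time."""
        key = unserviced_in_warp if self._fewer_first else -unserviced_in_warp
        heapq.heappush(self._heap, _Entry(key, next(self._seq), warp_id, addr))

    def service_next(self) -> Optional[Tuple[int, int]]:
        """Remove and return (warp_id, addr) of the most critical queued request."""
        if not self._heap:
            return None
        entry = heapq.heappop(self._heap)
        return entry.warp_id, entry.addr

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    bank = CriticalAwareBankQueue()
    bank.enqueue(warp_id=0, addr=0x100, unserviced_in_warp=8)
    bank.enqueue(warp_id=1, addr=0x200, unserviced_in_warp=1)
    bank.enqueue(warp_id=2, addr=0x300, unserviced_in_warp=1)
    print([bank.service_next()[0] for _ in range(len(bank))])  # [1, 2, 0]
```

A plain FIFO bank queue would service warp 0 first; under the default assumption the sketch instead drains the two requests whose warps have only one unserviced request left, so those warps can become schedulable on their SMs sooner.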

Comparative Study of Heterogeneous Multicore Scheduling Algorithms on Media Codecs

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/05355 ◽

2021 ◽

Vol 23 (06) ◽

pp. 840-849

Author(s):

Nagendra Kumar Jamadagni ◽

Aniruddh M ◽

Dr. Govinda Raju M ◽

Dr. Usha Rani K. R ◽

...

Keyword(s):

Scheduling Algorithm ◽

Scheduling Algorithms ◽

Optimal Scheduling ◽

Multicore Architecture ◽

Heterogeneous Multicore ◽

Multicore Scheduling ◽

Fast Pace ◽

Streaming Services ◽

Thread Level Parallelism ◽

Level Parallelism

All modern computers and smartphones come with multicore CPUs. Their multicore architecture is generally heterogeneous in order to maximize computational throughput. These multicore systems exploit thread-level parallelism to deliver higher performance, but they depend on good scheduling algorithms that maximize CPU utilization and minimize wasted and idle cycles. With the rise of streaming services and the growing multimedia capabilities of smartphones, efficient heterogeneous cores capable of processing multimedia at a fast pace are necessary, and these cores must in turn be managed by efficient scheduling algorithms. This paper compares several available heterogeneous multicore scheduling algorithms and determines which performs best for various codecs.

An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09 ◽

10.1145/1555754.1555775 ◽

2009 ◽

Cited By ~ 256

Author(s):

Sunpyo Hong ◽

Hyesoon Kim

Keyword(s):

Analytical Model ◽

Thread Level Parallelism ◽

Level Parallelism ◽

Gpu Architecture ◽

With Memory

Thread partitioning and value prediction for exploiting speculative thread-level parallelism

IEEE Transactions on Computers ◽

10.1109/tc.2004.1261823 ◽

2004 ◽

Vol 53 (2) ◽

pp. 114-125 ◽

Cited By ~ 11

Author(s):

P. Marcuello ◽

A. Gonzalez ◽

J. Tubella

Keyword(s):

Value Prediction ◽

Thread Level Parallelism ◽

Thread Partitioning ◽

Level Parallelism

GPU Performance vs. Thread-Level Parallelism

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3177964 ◽

2018 ◽

Vol 15 (1) ◽

pp. 1-21 ◽

Cited By ~ 4

Author(s):

Zhen Lin ◽

Michael Mantor ◽

Huiyang Zhou

Keyword(s):

Thread Level Parallelism ◽

Level Parallelism

Exploiting Thread-Level Parallelism of Irregular LDPC Decoder with Simultaneous Multi-threading Technique

Lecture Notes in Computer Science - Advanced Parallel Processing Technologies ◽

10.1007/978-3-540-76837-1_70 ◽

2007 ◽

pp. 650-657 ◽

Cited By ~ 1

Author(s):

Xing Fang ◽

Dong Wang ◽

Shuming Chen

Keyword(s):

Ldpc Decoder ◽

Thread Level Parallelism ◽

Level Parallelism

Executing linear algebra kernels in heterogeneous distributed infrastructures with PyCOMPSs

Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles ◽

10.2516/ogst/2018047 ◽

2018 ◽

Vol 73 ◽

pp. 47 ◽

Cited By ~ 3

Author(s):

Ramon Amela ◽

Cristian Ramon-Cortes ◽

Jorge Ejarque ◽

Javier Conejero ◽

Rosa M. Badia

Keyword(s):

Programming Languages ◽

Linear Algebra ◽

Programming Model ◽

Xeon Phi ◽

Scientific Communities ◽

Heterogeneous Architectures ◽

Parallel Programming Model ◽

Significant Performance ◽

Thread Level Parallelism ◽

Level Parallelism

Python is a popular programming language due to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be achieved with only a few lines of Python.
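
The task-based model described above can be illustrated, independently of PyCOMPSs, with Python's standard `concurrent.futures` module. The sketch below is not the PyCOMPSs API: it expresses a blocked matrix multiplication, a typical linear algebra kernel, as independent block-product tasks submitted to a thread pool, and the function names, block size, and worker count are illustrative choices. Under CPython's global interpreter lock, pure-Python arithmetic like this shows the task decomposition rather than real speedup; a task runtime would typically dispatch such tasks to separate workers, and each task could itself run a multithreaded kernel, which is the combination of task-level and thread-level parallelism the abstract refers to.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def split_blocks(m: Matrix, bs: int) -> Dict[Tuple[int, int], Matrix]:
    """Split an n x n matrix into (n/bs) x (n/bs) blocks of size bs x bs."""
    nb = len(m) // bs
    return {
        (i, j): [row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
        for i in range(nb)
        for j in range(nb)
    }


def block_matmul(a: Matrix, b: Matrix) -> Matrix:
    """One task: dense product of two blocks."""
    n, k, p = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(p)] for i in range(n)]


def block_add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two equally sized blocks."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def tiled_matmul(a: Matrix, b: Matrix, bs: int, workers: int = 4) -> Matrix:
    """Compute a @ b by submitting every block product as an independent task."""
    n = len(a)
    if n % bs:
        raise ValueError("this sketch assumes n is a multiple of the block size")
    nb = n // bs
    A, B = split_blocks(a, bs), split_blocks(b, bs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each partial product A[i,k] * B[k,j] is a task with no dependencies.
        futures = {
            (i, j, k): pool.submit(block_matmul, A[(i, k)], B[(k, j)])
            for i in range(nb) for j in range(nb) for k in range(nb)
        }
        # Reduce the partial products of each output block C[i,j].
        C = {}
        for i in range(nb):
            for j in range(nb):
                acc = futures[(i, j, 0)].result()
                for k in range(1, nb):
                    acc = block_add(acc, futures[(i, j, k)].result())
                C[(i, j)] = acc
    # Reassemble the output blocks into a full n x n matrix.
    out = [[0] * n for _ in range(n)]
    for (i, j), blk in C.items():
        for r, row in enumerate(blk):
            out[i * bs + r][j * bs:(j + 1) * bs] = row
    return out
```

Because every block product is independent, the scheduler is free to run them in any order; only the per-block reduction waits on their results.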

A study of Thread Level Parallelism on mobile devices

2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) ◽

10.1109/ispass.2014.6844468 ◽

2014 ◽

Cited By ~ 13

Author(s):

Cao Gao ◽

Anthony Gutierrez ◽

Ronald G. Dreslinski ◽

Trevor Mudge ◽

Krisztian Flautner ◽

...

Keyword(s):

Mobile Devices ◽

Thread Level Parallelism ◽

Level Parallelism

Exploring thread-level parallelism based on cost-driven model for irregular programs

2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC) ◽

10.1109/icspcc.2017.8242607 ◽

2017 ◽

Author(s):

Yuancheng Li ◽

Bin Liu

Keyword(s):

Thread Level Parallelism ◽

Level Parallelism ◽

Irregular Programs

Runtime support for integrating precomputation and thread-level parallelism on simultaneous multithreaded processors

10.1145/1066650.1066667 ◽

2004 ◽

Cited By ~ 3

Author(s):

Tanping Wang ◽

Filip Blagojevic ◽

Dimitrios S. Nikolopoulos

Keyword(s):

Multithreaded Processors ◽

Runtime Support ◽

Thread Level Parallelism ◽

Level Parallelism

Potential thread-level-parallelism exploration with superblock reordering

Computing ◽

10.1007/s00607-014-0387-8 ◽

2014 ◽

Vol 96 (6) ◽

pp. 545-564 ◽

Cited By ~ 7

Author(s):

John Ye ◽

Hui Yan ◽

Honglun Hou ◽

Tianzhou Chen

Keyword(s):

Thread Level Parallelism ◽

Level Parallelism
