CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU

2015 ◽  
Vol 2015 ◽  
pp. 1-10
Author(s):  
Jianliang Ma ◽  
Jinglei Meng ◽  
Tianzhou Chen ◽  
Minghui Wu

The ultra-high thread-level parallelism of modern GPUs usually generates numerous simultaneous memory requests, so there are always many memory requests waiting at each bank of the shared LLC (L2 in this paper) and of global memory. For global memory, various schedulers have already been developed to adjust the request sequence, but we find that little work has focused on the service sequence at the shared LLC. Our measurements show that, in a large number of GPU applications, requests consistently queue at the LLC banks for service, which provides an opportunity to optimize the service order at the LLC. By adjusting the order in which GPU memory requests are serviced, we can improve the schedulability of the SM. We therefore propose a critical-aware shared LLC request scheduling algorithm (CaLRS). How the priority of a memory request is represented is central to CaLRS: we use the number of memory requests that originate from the same warp but have not yet been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme effectively boosts SM schedulability by promoting the scheduling priority of memory requests with high criticality, and thereby indirectly improves GPU performance.
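The criticality measure described in this abstract can be illustrated with a small Python model of a single LLC bank. This is only a sketch under stated assumptions, not the authors' implementation: the names `Request`, `LLCBank`, `note_issued`, `arrive`, and `service_next` are hypothetical, a larger unserviced count is assumed to mean higher criticality, ties are assumed to fall back to arrival order, and the per-warp counter is kept inside the bank object for simplicity.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count


@dataclass
class Request:
    warp_id: int
    address: int
    criticality: int = 0  # tagged when the request arrives at the bank
    arrival: int = 0      # arrival order, used only to break ties


class LLCBank:
    """Toy model of one shared-LLC bank queue (hypothetical, for illustration)."""

    def __init__(self):
        self.queue = []              # requests waiting at this bank
        self.unserviced = Counter()  # warp_id -> requests not yet serviced
        self._clock = count()

    def note_issued(self, warp_id, n=1):
        # Record that a warp has issued n memory requests.
        self.unserviced[warp_id] += n

    def arrive(self, req):
        # Assumption: criticality is the warp's unserviced-request count
        # observed at the moment the request reaches the bank.
        req.criticality = self.unserviced[req.warp_id]
        req.arrival = next(self._clock)
        self.queue.append(req)

    def service_next(self):
        # Pick the highest-criticality request; earlier arrival wins ties.
        if not self.queue:
            return None
        best = max(self.queue, key=lambda r: (r.criticality, -r.arrival))
        self.queue.remove(best)
        self.unserviced[best.warp_id] -= 1
        return best
```

In this model, a request from a warp with three outstanding requests is serviced before an earlier-arriving request from a warp with only one, which is the reordering effect the abstract attributes to promoting high-criticality requests.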

2021 ◽  
Vol 23 (06) ◽  
pp. 840-849
Author(s):  
Nagendra Kumar Jamadagni ◽  
Aniruddh M ◽  
Dr. Govinda Raju M ◽  
Dr. Usha Rani K. R ◽  
...  

All modern computers and smartphones come with multi-core CPUs. The multi-core architecture is generally heterogeneous in order to maximize computational throughput. These multi-core systems exploit thread-level parallelism to deliver higher performance, but they depend on good scheduling algorithms that maximize CPU utilization and minimize wasted and idle cycles. With the rise of streaming services and the growing multimedia capabilities of smartphones, efficient heterogeneous cores capable of fast multimedia processing are necessary, as are efficient scheduling algorithms to achieve this. This paper compares several available heterogeneous multi-core scheduling algorithms and determines which is best suited to various codecs.


2018 ◽  
Vol 15 (1) ◽  
pp. 1-21 ◽  
Author(s):  
Zhen Lin ◽  
Michael Mantor ◽  
Huiyang Zhou

Author(s):  
Ramon Amela ◽  
Cristian Ramon-Cortes ◽  
Jorge Ejarque ◽  
Javier Conejero ◽  
Rosa M. Badia

Python is a popular programming language owing to the simplicity of its syntax, and it still achieves good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions for combining task-based parallelism with thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.
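The task-based style mentioned in this abstract can be sketched with Python's standard library. The example below does not use PyCOMPSs; it substitutes `concurrent.futures.ThreadPoolExecutor` to show the general idea of submitting independent tasks and collecting their results. The function names `multiply_block` and `parallel_matmul`, the row-block decomposition, and the default parameters are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_block(a_rows, b):
    """Task: multiply a block of rows of A by the full matrix B."""
    b_cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def parallel_matmul(a, b, block_size=2, workers=4):
    """Split A into row blocks and run one task per block."""
    blocks = [a[i:i + block_size] for i in range(0, len(a), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each submit() returns a future; tasks are independent of each other.
        futures = [pool.submit(multiply_block, blk, b) for blk in blocks]
        result = []
        for fut in futures:  # collect in submission order to keep row order
            result.extend(fut.result())
    return result
```

Calling `parallel_matmul(a, b)` on two nested-list matrices returns their product as a nested list. With pure-Python arithmetic, threads mainly illustrate the programming model rather than deliver a speedup.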


Author(s):  
Cao Gao ◽  
Anthony Gutierrez ◽  
Ronald G. Dreslinski ◽  
Trevor Mudge ◽  
Krisztian Flautner ◽  
...  

Computing ◽  
2014 ◽  
Vol 96 (6) ◽  
pp. 545-564 ◽  
Author(s):  
John Ye ◽  
Hui Yan ◽  
Honglun Hou ◽  
Tianzhou Chen
