Energy-Efficient Run-Time Mapping and Thread Partitioning of Concurrent OpenCL Applications on CPU-GPU MPSoCs

Energy-efficient Run-time Detection of Malware-infected Executables and Dynamic Libraries on Mobile Devices

2009 Software Technologies for Future Dependable Distributed Systems ◽

10.1109/stfssd.2009.17 ◽

2009 ◽

Cited By ~ 1

Author(s):

Jong-seok Lee ◽

Tae-Hyung Kim ◽

Jong Kim

Keyword(s):

Mobile Devices ◽

Energy Efficient ◽

Run Time

Low-Complexity Run-time Management of Concurrent Workloads for Energy-Efficient Multi-Core Systems

Journal of Low Power Electronics and Applications ◽

10.3390/jlpea10030025 ◽

2020 ◽

Vol 10 (3) ◽

pp. 25

Author(s):

Ali Aalsaud ◽

Fei Xia ◽

Ashur Rafiev ◽

Rishad Shafik ◽

Alexander Romanovsky ◽

...

Keyword(s):

Time Management ◽

Energy Efficient ◽

Low Cost ◽

Low Complexity ◽

Optimization Approach ◽

System Configuration ◽

Efficient System ◽

Time Control ◽

Time Optimization ◽

Run Time

Contemporary embedded systems may execute multiple applications, potentially concurrently, on heterogeneous platforms, with different system workloads (CPU-intensive, memory-intensive, or both) leading to different power signatures. This makes finding the most energy-efficient system configuration for each type of workload scenario extremely challenging. This paper proposes a novel run-time optimization approach aiming for maximum power-normalized performance under such circumstances. Based on experiments with PARSEC applications on Odroid XU-3 and Intel Core i7 platforms, we model power-normalized performance (in terms of instructions per second (IPS) per watt) through multivariate linear regression (MLR). We derive run-time control methods to exploit the models in different ways, trading off optimization results against control overheads. We demonstrate low-cost and low-complexity run-time algorithms that continuously adapt system configuration to improve the IPS/Watt by up to 139% compared to existing approaches.
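As a rough illustration of the modelling step described in this abstract, the sketch below fits a multivariate linear regression of IPS/Watt against two configuration parameters and then picks the configuration with the highest predicted value. The abstract does not list the paper's predictors or measurements, so the feature choice (frequency, active cores), the sample values, and the function names `fit_mlr`, `predict`, and `best_config` are all assumptions made for illustration only.

```python
from itertools import product


def fit_mlr(rows, targets):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n = len(X[0])
    # Augmented normal-equation system [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coeffs, features):
    """Evaluate the fitted linear model for one configuration."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], features))


def best_config(coeffs, configs):
    """Return the configuration with the highest predicted IPS/Watt."""
    return max(configs, key=lambda cfg: predict(coeffs, cfg))


# Hypothetical samples: (frequency in GHz, active cores) -> IPS/Watt (made-up units).
samples = [(0.6, 1), (0.6, 4), (1.0, 2), (1.0, 3), (1.4, 1), (1.4, 4)]
ips_per_watt = [2.0, 2.9, 2.1, 2.4, 1.6, 2.5]

coeffs = fit_mlr(samples, ips_per_watt)
candidates = list(product([0.6, 1.0, 1.4], [1, 2, 3, 4]))
best = best_config(coeffs, candidates)
print("coefficients:", [round(c, 3) for c in coeffs])
print("predicted best configuration:", best)
```

A purely linear model always places its optimum at a corner of the configuration space, so a practical controller would presumably refit or switch models per workload type, as the abstract suggests.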

Geyser: Energy-Efficient MIPS CPU Core with Fine-Grained Run-Time Power Gating

Handbook of Energy-Aware and Green Computing, Volume 1 ◽

10.1201/b11643-8 ◽

2012 ◽

pp. 67-84

Keyword(s):

Energy Efficient ◽

Power Gating ◽

Fine Grained ◽

Run Time

Run-time communication bypassing for energy-efficient, low-latency per-core DVFS on Network-on-Chip

23rd IEEE International SOC Conference ◽

10.1109/socc.2010.5784674 ◽

2010 ◽

Cited By ~ 2

Author(s):

Liang Guang ◽

Ethiopia Nigussie ◽

Hannu Tenhunen

Keyword(s):

Energy Efficient ◽

Network On Chip ◽

Low Latency ◽

Run Time ◽

On Chip

Energy-Efficient Hybrid Unicore Architecture In Future Embedded Chip-Multiprocessor

10.32920/17303795 ◽

2021 ◽

Author(s):

Akram Hadeed

Keyword(s):

Power Consumption ◽

Power Management ◽

Energy Efficient ◽

Access Time ◽

Hybrid Architecture ◽

Operational Parameters ◽

Power Budget ◽

Voltage Frequency ◽

Run Time ◽

On Chip

Technology scaling has enabled an increasing number of cores to be placed on a single chip in the form of chip multiprocessors (CMPs), with continually shrinking transistor sizes improving performance. In this context, power consumption has become the main constraint in designing CMPs. Uncore components now take an increasing portion of the on-chip power budget; designing power management techniques for them, particularly for memory and network-on-chip (NoC) systems, has therefore become an important problem. Consequently, considerable attention has been directed toward power management of CMP components, particularly shared caches and uncore interconnect structures, to overcome the challenges of a limited chip power budget.

This work aims to design an energy-efficient uncore architecture by exploiting heterogeneity in components (cache cells) and operational parameters (voltage/frequency). To minimize the impact on system performance, a run-time approach is investigated to assess the proposed method. An architecture is proposed in which the cache layer contains heterogeneous cache banks, all placed in one voltage/frequency domain. Average memory access time (AMAT) is used to monitor performance at run time. The size and type of the last-level cache (LLC) and the voltage/frequency of the uncore domain are adjusted according to the calculated AMAT, which indicates the system's demand on the uncore.

The proposed hybrid architecture was implemented, evaluated, and compared with a baseline model in which only SRAM banks were used in the last-level cache. Experimental results on the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite show that the proposed architecture yields up to a 40% reduction in overall chip energy-delay product, with a marginal average performance degradation of 1.2% relative to the baseline. The best energy saving was 55%, and the worst degradation was only 15%.
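To make the AMAT-driven adjustment concrete, the sketch below uses the standard textbook definition AMAT = hit time + miss rate × miss penalty and a simple two-threshold rule that steps the uncore voltage/frequency level up or down. The abstract does not give the actual control policy, so the threshold values, the number of levels, and the names `amat` and `next_level` are hypothetical and do not describe the author's implementation.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same time unit as its inputs."""
    return hit_time + miss_rate * miss_penalty


def next_level(level, measured_amat, low, high, max_level):
    """Pick the next uncore V/F level from the measured AMAT.

    A high AMAT is read as high demand on the uncore, so the level is raised;
    a low AMAT is read as slack, so the level is lowered to save energy.
    The thresholds `low` and `high` are assumed tuning parameters.
    """
    if measured_amat > high and level < max_level:
        return level + 1
    if measured_amat < low and level > 0:
        return level - 1
    return level


# Hypothetical monitoring interval: 10-cycle hit time, 25% miss rate, 100-cycle penalty.
measured = amat(hit_time=10, miss_rate=0.25, miss_penalty=100)
level = next_level(level=1, measured_amat=measured, low=20, high=30, max_level=3)
print("AMAT:", measured, "cycles; next V/F level:", level)
```

The same rule could in principle select among LLC sizes or bank types instead of V/F levels; the gap between the two thresholds keeps the controller from oscillating between adjacent levels.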

Run-Time Exploitation of Application Dynamism for Energy-Efficient Exascale Computing (READEX)

2015 IEEE 18th International Conference on Computational Science and Engineering ◽

10.1109/cse.2015.55 ◽

2015 ◽

Cited By ~ 12

Author(s):

Yury Oleynik ◽

Michael Gerndt ◽

Joseph Schuchart ◽

Per Gunnar Kjeldsberg ◽

Wolfgang E. Nagel

Keyword(s):

Energy Efficient ◽

Exascale Computing ◽

Run Time

Energy efficient on-chip power delivery with run-time voltage regulator clustering

2016 IEEE International Symposium on Circuits and Systems (ISCAS) ◽

10.1109/iscas.2016.7527464 ◽

2016 ◽

Cited By ~ 5

Author(s):

Divya Pathak ◽

Mohammad Hossein Hajkazemi ◽

Mohammad Khavari Tavana ◽

Houman Homayoun ◽

Ioannis Savidis

Keyword(s):

Energy Efficient ◽

Power Delivery ◽

Voltage Regulator ◽

Run Time ◽

On Chip
