Just-in-Time Instruction Set Extension - Feasibility and Limitations for an FPGA-Based Reconfigurable ASIP Architecture

On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors

International Journal of Reconfigurable Computing ◽

10.1155/2012/418315 ◽

2012 ◽

Vol 2012 ◽

pp. 1-21 ◽

Cited By ~ 3

Author(s):

Mariusz Grad ◽

Christian Plessl

Keyword(s):

Scientific Computing ◽

Just In Time ◽

Use Case ◽

Instruction Set ◽

Embedded Computing ◽

Custom Instruction ◽

Reconfigurable Processors ◽

Instruction Set Extension ◽

Instruction Identification ◽

Instruction Set Processors

Reconfigurable instruction set processors make it possible to tailor the instruction set of a CPU to a particular application. While this customization process could be performed at runtime to adapt the CPU to the currently executing workload, this use case has hardly been investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation between the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes on custom instruction identification and hardware generation. These overheads are compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times on the order of days to amortize the overheads.
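The amortization argument in this abstract can be illustrated with a simple break-even model: a one-time customization overhead pays off once the time saved by the speedup exceeds it. The sketch below is an assumption for illustration only, not the authors' evaluation method, and the function name `break_even_minutes` is hypothetical. The break-even figures quoted in the abstract come from the authors' measurements and need not match this simplified formula when it is applied to averaged values.

```python
def break_even_minutes(overhead_min: float, speedup: float) -> float:
    """Original (unaccelerated) runtime at which a one-time overhead is amortized.

    Simplified model (an assumption, not taken from the paper):
      accelerated runtime = runtime / speedup
      time saved          = runtime * (1 - 1 / speedup)
    Break-even is reached when the time saved equals the overhead.
    """
    if speedup <= 1.0:
        raise ValueError("a speedup of 1.0 or less never amortizes the overhead")
    return overhead_min / (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    # Inputs are the averaged figures quoted for the embedded benchmarks.
    print(break_even_minutes(50.0, 5.0))
```

Under this model the required runtime grows sharply as the speedup approaches 1, which is consistent with the abstract's observation that small speedups need very long execution times to pay off.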


Efficiently parallelizing instruction set simulation of embedded multi-core processors using region-based just-in-time dynamic binary translation

Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems - LCTES '12 ◽

10.1145/2248418.2248422 ◽

2012 ◽

Cited By ~ 5

Author(s):

Stephen Kyle ◽

Igor Böhm ◽

Björn Franke ◽

Hugh Leather ◽

Nigel Topham

Keyword(s):

Just In Time ◽

Instruction Set ◽

Binary Translation ◽

Time Dynamic ◽

Dynamic Binary Translation


Instruction-set-extension exploration using decomposable heuristic search

19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design (VLSID'06) ◽

10.1109/vlsid.2006.106 ◽

2006 ◽

Author(s):

S. Das ◽

P.P. Chakrabarti ◽

P. Dasgupta

Keyword(s):

Heuristic Search ◽

Instruction Set ◽

Instruction Set Extension


An exploration of mechanisms for dynamic cryptographic instruction set extension

Journal of Cryptographic Engineering ◽

10.1007/s13389-011-0025-8 ◽

2012 ◽

Vol 2 (1) ◽

pp. 1-18 ◽

Cited By ~ 1

Author(s):

P. Grabher ◽

J. Großschädl ◽

S. Hoerder ◽

K. Järvinen ◽

D. Page ◽

...

Keyword(s):

Instruction Set ◽

Instruction Set Extension


Instruction set extension for high throughput disparity estimation in stereo image processing

ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors ◽

10.1109/asap.2011.6043265 ◽

2011 ◽

Cited By ~ 12

Author(s):

Christian Banz ◽

Carsten Dolar ◽

Fabian Cholewa ◽

Holger Blume

Keyword(s):

Image Processing ◽

High Throughput ◽

Disparity Estimation ◽

Stereo Image ◽

Instruction Set ◽

Instruction Set Extension ◽

Stereo Image Processing


GNSS-ISE: Instruction Set Extension for GNSS Baseband Processing

Sensors ◽

10.3390/s20020465 ◽

2020 ◽

Vol 20 (2) ◽

pp. 465 ◽

Cited By ~ 1

Author(s):

Krzysztof Marcinek ◽

Witold A. Pleskacz

Keyword(s):

Flash Memory ◽

Low Cost ◽

Satellite System ◽

Microprocessor System ◽

Instruction Set ◽

Advantages And Disadvantages ◽

Wide Range ◽

Instruction Set Extension ◽

Baseband Processing ◽

Global Navigation Satellite

This work presents the results of research toward designing an instruction set extension dedicated to Global Navigation Satellite System (GNSS) baseband processing. The paper describes the state-of-the-art techniques of GNSS receiver implementation. Their advantages and disadvantages are discussed. Against this background, a new versatile instruction set extension for GNSS baseband processing is presented. The authors introduce improved mechanisms for instruction set generation focused on multi-channel processing. The analytical approach used by the authors leads to the introduction of a GNSS-instruction set extension (ISE) for GNSS baseband processing. The developed GNSS-ISE is simulated extensively using PC software and field-programmable gate array (FPGA) emulation. Finally, the developed GNSS-ISE is incorporated into the first-in-the-world, according to the authors’ best knowledge, integrated, multi-frequency, and multi-constellation microcontroller with embedded flash memory. Additionally, this microcontroller may serve as an application processor, which is a unique feature. The presented results show the feasibility of implementing the GNSS-ISE into an embedded microprocessor system and its capability of performing baseband processing. The developed GNSS-ISE can be implemented in a wide range of applications including smart IoT (internet of things) devices or remote sensors, fostering the adaptation of multi-frequency and multi-constellation GNSS receivers to the low-cost consumer mass-market.


Lightweight Cryptographic Instruction Set Extension on Xtensa Processor

2020 IEEE International Symposium on Circuits and Systems (ISCAS) ◽

10.1109/iscas45731.2020.9180579 ◽

2020 ◽

Author(s):

Gabriel H. Eisenkraemer ◽

Fernando G. Moraes ◽

Leonardo L. de Oliveira ◽

Everton Carara

Keyword(s):

Instruction Set ◽

Instruction Set Extension


Hardware Acceleration for RLNC: A Case Study Based on the Xtensa Processor with the Tensilica Instruction-Set Extension

Electronics ◽

10.3390/electronics7090180 ◽

2018 ◽

Vol 7 (9) ◽

pp. 180 ◽

Cited By ~ 2

Author(s):

Javier Acevedo ◽

Robert Scheffel ◽

Simon Wunderlich ◽

Mattis Hasler ◽

Sreekrishna Pandi ◽

...

Keyword(s):

Hardware Acceleration ◽

Code Word ◽

Instruction Set ◽

Linear Network ◽

Galois Fields ◽

Linear Network Coding ◽

Multiple Data ◽

Instruction Set Extension ◽

Energy Constrained

Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations for accelerating matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions operating on consecutive memory locations, as well as the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding with and without the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF(2^8) and GF(2^16) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately four hundredfold compared to a benchmark C code implementation without TIE. We also find that the unicore Xtensa LX5 with SIMD has seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25% compared to utilizing only SIMD. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.
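To make the multiply-accumulate operation mentioned in this abstract concrete, the following is a minimal software sketch of GF(2^8) multiplication and of RLNC encoding as a sequence of MAC steps. It is not the authors' TIE implementation. The reduction polynomial 0x11D and the names `gf256_mul`, `gf256_mac`, and `rlnc_encode` are assumptions chosen for illustration; the abstract does not state which polynomial the paper uses.

```python
from typing import List, Sequence


def gf256_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two GF(2^8) elements using shift-and-XOR with modular reduction.

    `poly` is an assumed irreducible polynomial (x^8 + x^4 + x^3 + x^2 + 1).
    """
    result = 0
    while b:
        if b & 1:          # add (XOR) the current multiple of a
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree 8 reached: reduce modulo poly
            a ^= poly
        b >>= 1
    return result


def gf256_mac(acc: int, a: int, b: int) -> int:
    """Multiply-accumulate in GF(2^8): addition in the field is XOR."""
    return acc ^ gf256_mul(a, b)


def rlnc_encode(coeffs: Sequence[int], packets: Sequence[Sequence[int]]) -> List[int]:
    """Form one coded packet as the GF(2^8) linear combination of source packets."""
    coded = [0] * len(packets[0])
    for coeff, packet in zip(coeffs, packets):
        for i, symbol in enumerate(packet):
            coded[i] = gf256_mac(coded[i], coeff, symbol)
    return coded


if __name__ == "__main__":
    # Two source packets of two symbols each, combined with coefficients 1 and 1.
    print(rlnc_encode([1, 1], [[1, 2], [3, 4]]))
```

The inner loop performs one MAC per symbol per source packet, which is the kind of operation a dedicated MAC or SIMD instruction would execute in hardware instead of the bit-serial loop shown here.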
