Just-in-Time Instruction Set Extension - Feasibility and Limitations for an FPGA-Based Reconfigurable ASIP Architecture

Author(s):  
Mariusz Grad ◽  
Christian Plessl
2012 ◽  
Vol 2012 ◽  
pp. 1-21 ◽  
Author(s):  
Mariusz Grad ◽  
Christian Plessl

Reconfigurable instruction set processors provide the possibility of tailor the instruction set of a CPU to a particular application. While this customization process could be performed during runtime in order to adapt the CPU to the currently executed workload, this use case has been hardly investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation of the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes for custom instruction identification and hardware generation. These overheads will be compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times in the order of days to amortize the overheads.


2012 ◽  
Vol 2 (1) ◽  
pp. 1-18 ◽  
Author(s):  
P. Grabher ◽  
J. Großschädl ◽  
S. Hoerder ◽  
K. Järvinen ◽  
D. Page ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 465 ◽  
Author(s):  
Krzysztof Marcinek ◽  
Witold A. Pleskacz

This work presents the results of research toward designing an instruction set extension dedicated to Global Navigation Satellite System (GNSS) baseband processing. The paper describes the state-of-the-art techniques of GNSS receiver implementation. Their advantages and disadvantages are discussed. Against this background, a new versatile instruction set extension for GNSS baseband processing is presented. The authors introduce improved mechanisms for instruction set generation focused on multi-channel processing. The analytical approach used by the authors leads to the introduction of a GNSS-instruction set extension (ISE) for GNSS baseband processing. The developed GNSS-ISE is simulated extensively using PC software and field-programmable gate array (FPGA) emulation. Finally, the developed GNSS-ISE is incorporated into the first-in-the-world, according to the authors’ best knowledge, integrated, multi-frequency, and multi-constellation microcontroller with embedded flash memory. Additionally, this microcontroller may serve as an application processor, which is a unique feature. The presented results show the feasibility of implementing the GNSS-ISE into an embedded microprocessor system and its capability of performing baseband processing. The developed GNSS-ISE can be implemented in a wide range of applications including smart IoT (internet of things) devices or remote sensors, fostering the adaptation of multi-frequency and multi-constellation GNSS receivers to the low-cost consumer mass-market.


Author(s):  
Gabriel H. Eisenkraemer ◽  
Fernando G. Moraes ◽  
Leonardo L. de Oliveira ◽  
Everton Carara

Electronics ◽  
2018 ◽  
Vol 7 (9) ◽  
pp. 180 ◽  
Author(s):  
Javier Acevedo ◽  
Robert Scheffel ◽  
Simon Wunderlich ◽  
Mattis Hasler ◽  
Sreekrishna Pandi ◽  
...  

Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations for accelerating matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions operating on consecutive memory locations, as well as the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding without and with the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF ( 2 8 ) and GF ( 2 16 ) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately four hundred fold compared to a benchmark C code implementation without TIE. We also find that the unicore Xtensa LX5 with SIMD has seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25% compared to utilizing only SIMD. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.


Sign in / Sign up

Export Citation Format

Share Document