Optimized Memory Allocation and Power Minimization for FPGA-Based Image Processing

Power Efficient Rapid Design Space Exploration of Integrated Scheduling and Module Selection in High Level Synthesis

10.32920/ryerson.14644968.v1 ◽

2021 ◽

Author(s):

Pallabi Sarkar

Keyword(s):

Power Consumption ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

High Level Synthesis ◽

Optimal Solutions ◽

Power Efficient ◽

Pi Method ◽

High Level

High-level synthesis (HLS), also called electronic system level (ESL) synthesis, requires scheduling algorithms that reach optimal or near-optimal solutions quickly and accurately. This thesis presents a novel power-efficient scheduling approach, the ‘PI’ method, that reduces the final power consumption of the solution at the expense of a minimal number of latency clock cycles. The approach is based on the ‘Priority Indicator (PI)’ metric and the ‘Intersect Matrix’ topology, which tend to escape locally optimal solutions and thereby reach global ones. Applying the approach distributes the allocated hardware functional units evenly, which yields power-efficient scheduling solutions. The two main novel aspects of the thesis are: a) the ‘Intersect Matrix’ topology and its associated algorithm, which checks for precedence violations during scheduling; and b) the PI method, which uses the Priority Indicator metric to choose the highest-priority node during each iteration of the scheduling optimization process. The proposed approach is compared with an existing design space exploration method for qualitative assessment using a proposed ‘Quality Cost Factor (Q-metric)’. The Q-metric combines the latency and power consumption of the solution found and expresses the quality of the final solution in terms of cost for both the proposed and the existing approach. On the DSP benchmarks, the proposed approach achieves an average improvement of approximately 12% in the quality of the final solution and an average reduction of 59% in runtime compared with a current scheduling approach.
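
The abstract does not spell out the Intersect Matrix algorithm, so the following is not the thesis's method. It is a minimal Python sketch of what a precedence-violation check during scheduling can look like, assuming a plain boolean dependency matrix. The names `precedence_violations`, `dep`, `start`, and `latency`, and the four-operation example graph, are all hypothetical.

```python
def precedence_violations(dep, start, latency):
    """Return (producer, consumer) pairs whose consumer starts too early.

    dep[i][j] is True when operation j consumes the result of operation i.
    start[i] is the control step in which operation i begins.
    latency[i] is the number of control steps operation i occupies.
    """
    n = len(dep)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if dep[i][j] and start[j] < start[i] + latency[i]
    ]


# Hypothetical four-operation graph: 0 -> 2, 1 -> 2, 2 -> 3.
dep = [
    [False, False, True, False],
    [False, False, True, False],
    [False, False, False, True],
    [False, False, False, False],
]
latency = [1, 2, 1, 1]  # assumed: operation 1 takes two control steps

good = precedence_violations(dep, [0, 0, 2, 3], latency)
bad = precedence_violations(dep, [0, 0, 1, 2], latency)
print(good)  # no violations
print(bad)   # operation 2 starts before operation 1 has finished
```

A scheduler could run a check of this kind after each tentative move and reject any move that returns a non-empty list.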

High-level synthesis for medical image processing on Systems on Chip: A case study

2016 26th International Conference on Field Programmable Logic and Applications (FPL) ◽

10.1109/fpl.2016.7577390 ◽

2016 ◽

Cited By ~ 2

Author(s):

Fraser D Robinson ◽

Louise H Crockett ◽

William H Nailon ◽

Robert W Stewart

Keyword(s):

Image Processing ◽

Medical Image ◽

Medical Image Processing ◽

High Level Synthesis ◽

Systems On Chip ◽

On Chip ◽

High Level

Ultracompact and low-power-consumption silicon thermo-optic switch for high-speed data

Nanophotonics ◽

10.1515/nanoph-2020-0496 ◽

2020 ◽

Vol 10 (2) ◽

pp. 937-945

Author(s):

Ruihuan Zhang ◽

Yu He ◽

Yong Zhang ◽

Shaohua An ◽

Qingming Zhu ◽

...

Keyword(s):

Power Consumption ◽

Low Power ◽

High Speed ◽

High Performance ◽

Pulse Amplitude ◽

Telecommunication Networks ◽

Low Power Consumption ◽

Power Efficient ◽

High Speed Data ◽

On Chip

Ultracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data rate ever reported.
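
The figures quoted in this abstract can be related with some simple arithmetic. The Python sketch below assumes the resonance shift is linear in heater power (an assumption, not a statement from the paper) and uses the fact that a four-level PAM signal carries two bits per symbol. All variable names are illustrative.

```python
import math

# Values quoted in the abstract.
switching_power_mw = 0.15          # mW
tuning_efficiency_nm_per_mw = 7.71  # nm/mW
raw_rate_gbps = 124.0              # Gb/s, PAM-4
footprint_um = (60, 16)            # μm × μm

# Assumption: the resonance shift scales linearly with heater power.
shift_nm = switching_power_mw * tuning_efficiency_nm_per_mw

# Four amplitude levels encode log2(4) = 2 bits per symbol.
bits_per_symbol = math.log2(4)
symbol_rate_gbaud = raw_rate_gbps / bits_per_symbol

footprint_um2 = footprint_um[0] * footprint_um[1]

print(f"resonance shift at switching power: {shift_nm:.4f} nm")
print(f"PAM-4 symbol rate: {symbol_rate_gbaud:.0f} GBaud")
print(f"footprint: {footprint_um2} μm^2")
```

Under these assumptions the switching power corresponds to a resonance shift of roughly 1.16 nm, and the 124 Gb/s raw rate corresponds to 62 GBaud.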

Dependency Graph-based High-level Synthesis for Maximum Instruction Parallelism

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3468875 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1-15

Author(s):

Zhenghua Gu ◽

Wenqing Wan ◽

Jundong Xie ◽

Chang Wu

Keyword(s):

Performance Optimization ◽

Directed Acyclic Graph ◽

Scheduling Algorithm ◽

Dependency Graph ◽

High Level Synthesis ◽

Limiting Factor ◽

Circuit Performance ◽

State Transition Graph ◽

High Level ◽

Basic Blocks

Performance optimization is an important goal for high-level synthesis (HLS). Existing HLS scheduling algorithms are all based on the Control and Data Flow Graph (CDFG) and schedule basic blocks in sequential order. Our study shows that the sequential scheduling order of basic blocks is a major limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, the DG is a directed acyclic graph, so no loop-breaking heuristic is needed for scheduling. Second, the DG can be used to identify the exact instruction parallelism. Our experiment shows that the DG can lead to a 76% increase in instruction parallelism over the CDFG. Based on the DG, we propose a bottom-up scheduling algorithm that achieves much higher instruction parallelism than existing algorithms. A hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high-parallelism scheduling. Our experimental results show that our DG-based HLS algorithm outperforms the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× on circuit latency, respectively.
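
To illustrate why an acyclic dependency graph needs no loop-breaking step, here is a minimal Python sketch of as-soon-as-possible (ASAP) scheduling over a DAG using Kahn's topological ordering. This is a textbook technique, not the bottom-up algorithm the article proposes. It assumes unit-latency operations, and the function names and the five-operation example are hypothetical.

```python
from collections import deque


def asap_levels(num_ops, edges):
    """Assign each operation the earliest step allowed by its dependencies.

    edges is a list of (producer, consumer) pairs; every operation is
    assumed to take exactly one step.
    """
    succs = [[] for _ in range(num_ops)]
    indeg = [0] * num_ops
    for src, dst in edges:
        succs[src].append(dst)
        indeg[dst] += 1

    level = [0] * num_ops
    ready = deque(i for i in range(num_ops) if indeg[i] == 0)
    visited = 0
    while ready:
        op = ready.popleft()
        visited += 1
        for nxt in succs[op]:
            # A consumer cannot start before its latest producer finishes.
            level[nxt] = max(level[nxt], level[op] + 1)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)

    if visited != num_ops:
        raise ValueError("graph has a cycle; ASAP levels are undefined")
    return level


def average_parallelism(levels):
    """Operations per step if every operation runs at its ASAP level."""
    return len(levels) / (max(levels) + 1)


# Hypothetical graph: 0 -> 2, 1 -> 2, 2 -> 4, 3 -> 4.
edges = [(0, 2), (1, 2), (2, 4), (3, 4)]
levels = asap_levels(5, edges)
print(levels)
print(average_parallelism(levels))
```

The cycle check at the end shows what goes wrong on a graph with back edges: some operations never become ready, which is the situation a loop-breaking heuristic has to resolve.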

A New ALU Design using PNS-FCR: Static CMOS Logic for Microprocessors

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1215.0886s219 ◽

2019 ◽

Vol 8 (6S2) ◽

pp. 876-882

Keyword(s):

Power Consumption ◽

Integrated Circuit ◽

Design Methodology ◽

Total Power ◽

Data Path ◽

Power Efficient ◽

Efficient Data ◽

Main Component ◽

Static Logic ◽

Important Design

The arithmetic logic unit (ALU) is the main component of a processor, and power is the most important design consideration in an integrated circuit. Of all the ALU components, the data path is the active one, and it consumes a larger share of the total power, so modern microprocessors need power-efficient data paths. To reduce power consumption in microprocessors, the ALU in this paper is designed using PNS-FCR static CMOS logic. The static technique does not need a clock, which leads to lower power consumption. The ALU with PNS-FCR static logic is implemented using the Mentor Graphics tool. The power consumption of the ALU is compared with and without FCR. An 8-bit ALU is designed in Mentor Graphics with 130 nm technology. The proposed design methodology gives lower power consumption.

AN EFFICIENT FPGA OVERLAY FOR COLOR TRANSFORMATION FUNCTION USING HIGH LEVEL SYNTHESIS

INFORMATION TECHNOLOGY IN INDUSTRY ◽

10.17762/itii.v9i1.130 ◽

2021 ◽

Vol 9 (1) ◽

pp. 280-287

Author(s):

Minal Deshmukh, Prasad Khandekar, Nishikant Sadafale

Keyword(s):

Image Processing ◽

Real Time ◽

Transformation Function ◽

High Level Synthesis ◽

Smart Devices ◽

Real Time Image Processing ◽

Real Time Image ◽

Color Transformation ◽

High Level ◽

Time Image

Image processing is highly desirable in commercial, industrial, and medical applications. Processor-based architectures are ill-suited to real-time applications because image processing algorithms are computationally intensive. To reduce latency and overcome the performance limits imposed by the limited memory and fixed clock frequency of processor-based architectures, FPGAs can be used in smart devices to implement real-time image processing applications. To speed up real-time image processing, custom overlays (hardware libraries of programmable logic circuits) can be designed to run on the FPGA fabric. The IP core generated by HLS (high-level synthesis) can be implemented on a reconfigurable platform, which allows effective utilization of channel bandwidth and storage. In this paper we present an FPGA overlay design for a color transformation function using Xilinx’s Python productivity board, the PYNQ-Z2, to gain performance over a traditional processor. A performance comparison of the custom overlay on the FPGA and on a processor-based platform shows that FPGA execution yields the minimum computation time.
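
The abstract does not say which color transformation the overlay implements. As one plausible example, the Python sketch below shows a software reference for an RGB-to-grayscale conversion using integer weights and a shift, the kind of fixed-point arithmetic that maps easily to FPGA logic. The weights 77, 150, and 29 approximate the common luma coefficients 0.299, 0.587, and 0.114 scaled by 256. The function names and the 2 × 2 test image are assumptions for illustration.

```python
def rgb_to_gray(pixel):
    """Convert one 8-bit (r, g, b) pixel to an 8-bit gray value.

    Uses integer weights that sum to 256, so the divide becomes a
    right shift by 8 and no floating point is needed.
    """
    r, g, b = pixel
    return (77 * r + 150 * g + 29 * b) >> 8


def transform_image(image):
    """Apply the per-pixel transform to a row-major list of rows."""
    return [[rgb_to_gray(p) for p in row] for row in image]


# Hypothetical 2 x 2 image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = transform_image(image)
print(gray)
```

A processor runs this loop one pixel at a time, whereas a hardware implementation can pipeline the multiply-add so that one pixel completes per clock cycle, which is the sort of gain the performance comparison in the abstract refers to.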

An image processing library for C-based high-level synthesis

2014 24th International Conference on Field Programmable Logic and Applications (FPL) ◽

10.1109/fpl.2014.6927424 ◽

2014 ◽

Cited By ~ 8

Author(s):

Moritz Schmid ◽

Nicolas Apelt ◽

Frank Hannig ◽

Jurgen Teich

Keyword(s):

Image Processing ◽

High Level Synthesis ◽

High Level

Communication-centric high level synthesis metrics for low vertical channel density 3-dimensional Networks-on-Chip

7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC) ◽

10.1109/recosoc.2012.6322897 ◽

2012 ◽

Cited By ~ 2

Author(s):

Haoyuan Ying ◽

Thomas Hollstein ◽

Klaus Hofmann

Keyword(s):

Vertical Channel ◽

High Level Synthesis ◽

Networks On Chip ◽

3 Dimensional ◽

Channel Density ◽

On Chip ◽

High Level

Bus Optimization for Low Power in High-Level Synthesis

Journal of Circuits System and Computers ◽

10.1142/s0218126603000829 ◽

2003 ◽

Vol 12 (01) ◽

pp. 1-17

Author(s):

Sungpack Hong ◽

Taewhan Kim

Keyword(s):

Optimal Solution ◽

Minimum Cost ◽

Maximum Flow ◽

High Level Synthesis ◽

Benchmark Problems ◽

Timing Constraints ◽

Power Efficient ◽

High Level ◽

The Impact ◽

Operation Scheduling

Sub-micron feature sizes have caused a considerable portion of power to be dissipated on the buses, drawing increased attention to power savings at the behavioral level and the RT level of design. This paper addresses the problem of minimizing the power dissipated in the switching of the buses in the high-level synthesis of data-dominated behavioral descriptions. Unlike previous approaches, in which the minimization of the power consumed in buses is not considered until operation scheduling is completed, our approach integrates the bus binding problem into scheduling to exploit the impact of scheduling on the reduction of bus power more fully and effectively. We accomplish this by formulating the problem as a flow problem in a network and devising an efficient algorithm that iteratively finds maximum-flow, minimum-cost solutions in the network. Experimental results on a number of benchmark problems show that, given resource and global timing constraints, our designs are 19.8% more power-efficient than the designs produced by a random-move based solution, and 15.5% more power-efficient than the designs produced by a clock-step based optimal solution.
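
To see why the order of transfers matters for bus power, the Python sketch below counts bit toggles between consecutive words on a bus, a common proxy for switching activity. It is not the paper's network-flow formulation. The function name `bus_toggles` and the two 4-bit transfer sequences are hypothetical.

```python
def bus_toggles(words):
    """Count bit flips between consecutive words driven onto one bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# The same four 4-bit values transferred in two different orders.
order_a = [0b0000, 0b1111, 0b0001, 0b1110]
order_b = [0b0000, 0b0001, 0b1111, 0b1110]

print(bus_toggles(order_a))
print(bus_toggles(order_b))
```

The second order moves the same data with fewer than half the toggles of the first, which suggests why deciding the schedule and the bus binding together can save power that a post-scheduling bus optimization cannot recover.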
