scholarly journals Optimized Memory Allocation and Power Minimization for FPGA-Based Image Processing

2019 ◽  
Vol 5 (1) ◽  
pp. 7 ◽  
Author(s):  
Paulo Garcia ◽  
Deepayan Bhowmik ◽  
Robert Stewart ◽  
Greg Michaelson ◽  
Andrew Wallace

Memory is the biggest limiting factor to the widespread use of FPGAs for high-level image processing, which require complete frame(s) to be stored in situ. Since FPGAs have limited on-chip memory capabilities, efficient use of such resources is essential to meet performance, size and power constraints. In this paper, we investigate allocation of on-chip memory resources in order to minimize resource usage and power consumption, contributing to the realization of power-efficient high-level image processing fully contained on FPGAs. We propose methods for generating memory architectures, from both Hardware Description Languages and High Level Synthesis designs, which minimize memory usage and power consumption. Based on a formalization of on-chip memory configuration options and a power model, we demonstrate how our partitioning algorithms can outperform traditional strategies. Compared to commercial FPGA synthesis and High Level Synthesis tools, our results show that the proposed algorithms can result in up to 60% higher utilization efficiency, increasing the sizes and/or number of frames that can be accommodated, and reduce frame buffers’ dynamic power consumption by up to approximately 70%. In our experiments using Optical Flow and MeanShift Tracking, representative high-level algorithms, data show that partitioning algorithms can reduce total power by up to 25% and 30%, respectively, without impacting performance.

2021 ◽  
Author(s):  
Pallabi Sarkar

High level Synthesis (HLS) or Electronic System Level (ESL) synthesis requires scheduling algorithms that have strong capability to reach optimal/near-optimal solutions with significant rapidity and greater accuracy. A novel power efficient scheduling approach using ‘PI’ method has been presented in this thesis that reduces the final power consumption of the solution at the expenditure of minimal latency clock cycles. The proposed scheduling approach is based on ‘Priority indicator (PI)’ metric and ‘Intersect Matrix’ topology methods that have a tendency to escape local optimal solutions and thereby reach global solutions. Application of the proposed approach results in even distribution of allocated hardware functional units thereby yielding power efficient scheduling solutions. The two main novel and significant aspects of the thesis are: a) Introduction of ‘Intersect Matrix’ topology with its associated algorithm which is used to check for precedence violation during scheduling b) Introduction of PI method using Priority indicator metric that assists in choosing the highest priority node during each iteration of the scheduling optimization process. Comparative analysis of the proposed approach has been done with an existing design space exploration method for qualitative assessment using proposed ‘Quality Cost Factor (Q- metric)’. This Q-metric is a combination of latency and power consumption values for the solution found, which dictates the quality of the final solutions found in terms of cost for both the proposed and existing approaches. An average improvement of approximately 12 % in quality of final solution and average reduction of 59 % in runtime has been achieved by the proposed approach compared to a current scheduling approach for the DSP benchmarks.


2021 ◽  
Author(s):  
Pallabi Sarkar

High level Synthesis (HLS) or Electronic System Level (ESL) synthesis requires scheduling algorithms that have strong capability to reach optimal/near-optimal solutions with significant rapidity and greater accuracy. A novel power efficient scheduling approach using ‘PI’ method has been presented in this thesis that reduces the final power consumption of the solution at the expenditure of minimal latency clock cycles. The proposed scheduling approach is based on ‘Priority indicator (PI)’ metric and ‘Intersect Matrix’ topology methods that have a tendency to escape local optimal solutions and thereby reach global solutions. Application of the proposed approach results in even distribution of allocated hardware functional units thereby yielding power efficient scheduling solutions. The two main novel and significant aspects of the thesis are: a) Introduction of ‘Intersect Matrix’ topology with its associated algorithm which is used to check for precedence violation during scheduling b) Introduction of PI method using Priority indicator metric that assists in choosing the highest priority node during each iteration of the scheduling optimization process. Comparative analysis of the proposed approach has been done with an existing design space exploration method for qualitative assessment using proposed ‘Quality Cost Factor (Q- metric)’. This Q-metric is a combination of latency and power consumption values for the solution found, which dictates the quality of the final solutions found in terms of cost for both the proposed and existing approaches. An average improvement of approximately 12 % in quality of final solution and average reduction of 59 % in runtime has been achieved by the proposed approach compared to a current scheduling approach for the DSP benchmarks.


Nanophotonics ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 937-945
Author(s):  
Ruihuan Zhang ◽  
Yu He ◽  
Yong Zhang ◽  
Shaohua An ◽  
Qingming Zhu ◽  
...  

AbstractUltracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data ever reported.


2021 ◽  
Vol 14 (4) ◽  
pp. 1-15
Author(s):  
Zhenghua Gu ◽  
Wenqing Wan ◽  
Jundong Xie ◽  
Chang Wu

Performance optimization is an important goal for High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on Control and Data Flow Graph (CDFG) and will schedule basic blocks in sequential order. Our study shows that the sequential scheduling order of basic blocks is a big limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, DG is a directed acyclic graph. Thus, no loop breaking heuristic is needed for scheduling. Second, DG can be used to identify the exact instruction parallelism. Our experiment shows that DG can lead to 76% instruction parallelism increase over CDFG. Based on DG, we propose a bottom-up scheduling algorithm to achieve much higher instruction parallelism than existing algorithms. Hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high parallelism scheduling. Our experimental results show that our DG-based HLS algorithm can outperform the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× on circuit latency, respectively.


Arithmetic Logic Unit (ALU) is the main component in the processors. Most important design consideration in integrated circuit is power. In all the components of ALU data path is the active one and it consumes more percent of power in the total power. In the modern microprocessors it is important to have power efficient data paths. To reduce the power consumption in microprocessors the ALU is designed using PNS-FCR static CMOS logic. In this paper static CMOS logic is used to reduce power consumption. Static technique does not need any clock. So it leads to less power consumption. For the implementation of the ALU with the PNS-FCR static logic mentor graphics tool is used. The power consumption of ALU is compared with and without using FCR. An 8-bit ALU is designed in mentor graphics with 130nm technology. The proposed design methodology gives less power consumption


2021 ◽  
Vol 9 (1) ◽  
pp. 280-287
Author(s):  
Minal Deshmukh, Prasad Khandekar, Nishikant Sadafale

Image Processing is a significantly desirable in commercial, industrial, and medical applications. Processor based architectures are inappropriate for real time applications as Image processing algorithms are quite intensive in terms of computations. To reduce latency and limitation in performance due to limited amount of memory and fixed clock frequency for synthesis in processor-based architecture, FPGA can be used in smart devices for implementing real time image processing applications. To increase speed of real time image processing custom overlays (Hardware Library of programmable logic circuit) can be designed to run on FPGA fabric. The IP core generated by the HLS (High Level Synthesis) can be implemented on a reconfigurable platform which allows effective utilization of channel bandwidth and storage. In this paper we have presented FPGA overlay design for color transformation function using Xilinx’s python productivity board PYNQ-Z2 to get benefit in performance over a traditional processor. Performance comparison of custom overlay on FPGA and Processor based platform shows FPGA execution yields minimum computation time.


2003 ◽  
Vol 12 (01) ◽  
pp. 1-17
Author(s):  
Sungpack Hong ◽  
Taewhan Kim

Sub-micron feature sizes have resulted in a considerable portion of power to be dissipated on the buses, causing an increased attention on savings for power at the behavioral level and the RT level of design. This paper addresses the problem of minimizing power dissipated in the switching of the buses in the high-level synthesis of data-dominated behavioral descriptions. Unlike the previous approaches in which the minimization of the power consumed in buses has not been considered until operation scheduling is completed, our approach integrates the bus binding problem into scheduling to exploit the impact of scheduling on the reduction of power dissipated on the buses more fully and effectively. We accomplish this by formulating the problem into a flow problem in a network, and devising an efficient algorithm which iteratively finds the maximum flow of minimum cost solutions in the network. Experimental results on a number of benchmark problems show that given resource and global timing constraints our designs are 19.8% power-efficient over the designs produced by a random-move based solution, and 15.5% power-efficient over the designs by a clock-step based optimal solution.


Sign in / Sign up

Export Citation Format

Share Document