Area Optimisation for Field-Programmable Gate Arrays in SystemC Hardware Compilation

International Journal of Reconfigurable Computing ◽

10.1155/2008/674340 ◽

2008 ◽

Vol 2008 ◽

pp. 1-14

Author(s):

Johan Ditmar ◽

Steve McKeever ◽

Alex Wilson

Keyword(s):

Clock Cycle ◽

Gate Arrays ◽

Field Programmable ◽

Separate Block ◽

Programmable Gate Arrays ◽

Source Level ◽

High Level ◽

Specific Implementation ◽

Mapping Arrays ◽

Language Construct

This paper discusses a pair of synthesis algorithms that optimise a SystemC design to minimise area when targeting FPGAs. Each can significantly improve the synthesis of a high-level language construct, allowing a designer to concentrate more on the algorithm description and less on hardware-specific implementation details. The first algorithm is a source-level transformation implementing function exlining, in which a separate block of hardware implements a function and is shared between multiple calls to that function. The second is a novel algorithm for mapping arrays to memories, which assigns array accesses to memory ports such that no port is ever accessed more than once in a clock cycle. This algorithm assigns accesses to read-only or write-only ports and to read-write ports concurrently, solving the assignment problem more efficiently, and for a wider range of memories, than existing methods. Both optimisations operate on a high-level program representation and have been implemented in a commercial SystemC compiler. Experiments show that in suitable circumstances these techniques result in significant reductions in logic utilisation for FPGAs.
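
The port-assignment constraint can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which works on a high-level program representation inside a commercial SystemC compiler; it only handles the accesses of a single clock cycle, and `assign_ports` and the access and port labels are hypothetical. Because a read-only port can serve only reads and a write-only port only writes, filling the dedicated ports first and giving the flexible read-write ports whatever remains finds an assignment whenever one exists for that cycle.

```python
def assign_ports(accesses, ports):
    """Assign one clock cycle's array accesses to memory ports (hypothetical helper).

    accesses: list of (access_name, kind) with kind 'R' or 'W'.
    ports:    list of (port_name, kind) with kind 'R' (read-only),
              'W' (write-only) or 'RW' (read-write).
    Returns {access_name: port_name}, or None if the accesses cannot be
    served with each port used at most once in this cycle.
    """
    free = {"R": [], "W": [], "RW": []}
    for port_name, kind in ports:
        free[kind].append(port_name)

    assignment = {}
    leftovers = []
    # Dedicated ports can serve only one kind of access, so use them first.
    for name, kind in accesses:
        if free[kind]:
            assignment[name] = free[kind].pop(0)
        else:
            leftovers.append(name)

    # Remaining reads and writes compete for the flexible read-write ports.
    for name in leftovers:
        if not free["RW"]:
            return None  # more accesses than usable ports in this cycle
        assignment[name] = free["RW"].pop(0)
    return assignment
```

For a memory with one read-write port `p0` and one read-only port `p1`, two reads in the same cycle fit (`a[i]` on `p1`, `a[j]` on `p0`), while a third access makes the cycle infeasible, so a scheduler would have to move it to another cycle.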

AN EVOLUTIONARY ALGORITHM FOR THE ALLOCATION PROBLEM IN HIGH-LEVEL SYNTHESIS

Journal of Circuits System and Computers ◽

10.1142/s0218126605002362 ◽

2005 ◽

Vol 14 (02) ◽

pp. 347-366 ◽

Cited By ~ 3

Author(s):

HAIDAR M. HARMANANI ◽

RONY SALIBA

Keyword(s):

Evolutionary Algorithm ◽

Allocation Problem ◽

High Level Synthesis ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Functional Units ◽

High Level ◽

The Cost ◽

Short Time

This paper presents an evolutionary algorithm to solve the datapath allocation problem in high-level synthesis. The method performs allocation of functional units, registers, and multiplexers in addition to controller synthesis with the objective of minimizing the cost of hardware resources. The system handles multicycle functional units as well as structural pipelining. The proposed method was implemented using C++ on a Linux workstation. We tested our method on a set of high-level synthesis benchmarks, all yielding good solutions in a short time. An integration path to Field Programmable Gate Arrays (FPGAs) is provided through VHDL.
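
To give a feel for how an evolutionary search can drive allocation, the sketch below evolves a binding of operations to functional units for a single operation type. It is a hypothetical simplification, not the authors' C++ implementation: registers, multiplexers, multicycle units and controller synthesis are omitted, the schedule is assumed fixed, and the function names, population size, mutation rate and repair rule are illustrative choices. Cost is the number of distinct units, and a repair step keeps every individual legal by moving any operation that shares a unit with another operation in the same control step.

```python
import random


def repair(binding, steps):
    """Make a binding legal: operations in the same control step must use
    different functional units. A conflicting operation moves to the
    lowest-numbered unit still free in its step."""
    fixed = list(binding)
    for step in set(steps):
        used = set()
        for op, s in enumerate(steps):
            if s != step:
                continue
            if fixed[op] in used:
                fixed[op] = min(u for u in range(len(steps)) if u not in used)
            used.add(fixed[op])
    return fixed


def cost(binding):
    """Hardware cost: number of distinct functional units allocated."""
    return len(set(binding))


def evolve(steps, pop_size=20, generations=60, seed=1):
    """Evolve a binding (unit index per operation) that minimises cost."""
    rng = random.Random(seed)
    n = len(steps)
    pop = [repair([rng.randrange(n) for _ in range(n)], steps)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        children = pop[:2]  # elitism: carry over the two best bindings
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=cost)
            p2 = min(rng.sample(pop, 3), key=cost)
            # Uniform crossover.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            if rng.random() < 0.3:  # mutation: rebind one operation
                child[rng.randrange(n)] = rng.randrange(n)
            children.append(repair(child, steps))
        pop = children
    return min(pop, key=cost)
```

For six operations scheduled in control steps `[0, 0, 0, 1, 1, 2]`, at least three units are needed because step 0 runs three operations concurrently; `evolve([0, 0, 0, 1, 1, 2])` returns a legal binding whose cost can be checked against that bound.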

A Gracefully Degrading and Energy-Efficient FPGA Programming using LabVIEW

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v5.i3.pp165-175 ◽

2016 ◽

Vol 5 (3) ◽

pp. 165

Author(s):

B. Naresh Kumar Reddy ◽

N. Suresh ◽

J.V.N. Ramesh

Keyword(s):

Complex Signal ◽

Graphical Language ◽

Gate Arrays ◽

Signal Flow Graphs ◽

Field Programmable ◽

Signal Processing Algorithms ◽

Programmable Gate Arrays ◽

Labview Fpga ◽

High Level ◽

Flow Graphs

Programming Field Programmable Gate Arrays (FPGAs) has long been the domain of engineers with VHDL or Verilog expertise. FPGAs have caught the attention of algorithm developers and communication researchers who want to use them to instantiate systems or implement DSP algorithms. These efforts, however, are often stifled by the complexity of programming FPGAs. RTL programming in either VHDL or Verilog generally does not offer the high level of abstraction needed to represent signal flow graphs and complex signal processing algorithms. This paper describes programming FPGAs in a graphical language, rather than in Verilog or VHDL, using LabVIEW and the features of the LabVIEW FPGA environment.

Image and video processing platform for field programmable gate arrays using a high-level synthesis

IET Computers & Digital Techniques ◽

10.1049/iet-cdt.2011.0156 ◽

2012 ◽

Vol 6 (6) ◽

pp. 414-425 ◽

Cited By ~ 7

Author(s):

C. Desmouliers ◽

F.M. Vallina ◽

S. Aslan ◽

J. Saniie ◽

E. Oruklu

Keyword(s):

Video Processing ◽

Field Programmable Gate Arrays ◽

High Level Synthesis ◽

Gate Arrays ◽

Image And Video Processing ◽

Field Programmable ◽

Programmable Gate Arrays ◽

High Level ◽

Processing Platform

Combining Multiple Optimized FPGA-based Pulsar Search Modules Using OpenCL

Journal of Astronomical Instrumentation ◽

10.1142/s2251171719500089 ◽

2019 ◽

Vol 08 (03) ◽

pp. 1950008 ◽

Cited By ~ 1

Author(s):

Haomiao Wang ◽

Prabu Thiagaraj ◽

Oliver Sinnen

Keyword(s):

High Speed ◽

Design Space ◽

Hardware Accelerators ◽

Gate Arrays ◽

Fast Prototyping ◽

Multiple Input ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Square Kilometer Array ◽

High Level

Field-Programmable Gate Arrays (FPGAs) are widely used as hardware accelerators in the central signal processing design of the Square Kilometer Array (SKA). The frequency-domain acceleration search (FDAS) module is an important part of the SKA1-MID pulsar search engine. To develop for hardware that has yet to be finalized, to support cross-discipline interoperability and to achieve fast prototyping, OpenCL is employed as a high-level FPGA synthesis approach to create the sub-modules of FDAS. The FT convolution, the harmonic summing and some other minor sub-modules are elements of the FDAS module that have previously been well optimized separately. In this paper, we explore the design space of combining these well-optimized designs, dealing with the ensuing need for trade-offs and compromise. Pipeline computing is employed to handle multiple input arrays at high speed. The hardware target is multiple high-end FPGAs that process the combined FDAS module. The results show interesting consequences: the best individual solutions are not necessarily the best solutions for the speed of a pipeline in which FPGA resources and memory bandwidth must be shared. By applying multiple buffering techniques to the pipeline, the combined FDAS module achieves up to a 2× speedup over implementations without pipeline computing. We perform an extensive experimental evaluation on multiple high-end FPGA cards hosted in a workstation and compare the results with a technologically comparable mid-range GPU.
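
The effect of pipelining with buffering between stages can be estimated with a simple, idealised timing model. This Python sketch assumes that buffers fully decouple the stages and that each stage has a fixed per-array processing time; it ignores the shared FPGA resources and memory bandwidth that the abstract identifies as limiting factors, and `total_time` and the stage times are hypothetical.

```python
def total_time(stage_times, n_inputs, pipelined):
    """Idealised time to push n_inputs arrays through a chain of stages.

    Without pipelining, each array passes through every stage before the
    next array starts. With buffering between stages, the stages work on
    different arrays at the same time, so once the pipeline has filled,
    the slowest stage sets the rate at which arrays complete.
    """
    if n_inputs <= 0:
        return 0
    if not pipelined:
        return n_inputs * sum(stage_times)
    # Fill time for the first array, then one array per slowest-stage time.
    return sum(stage_times) + (n_inputs - 1) * max(stage_times)
```

With two balanced stages of 5 time units and 100 input arrays, the sequential schedule takes 1000 units and the pipelined one 505, close to the 2× ceiling that two overlapping stages allow; unbalanced stages reduce the gain, because the slowest stage sets the throughput.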

Zi-CAM: A Power and Resource Efficient Binary Content-Addressable Memory on FPGAs

Electronics ◽

10.3390/electronics8050584 ◽

2019 ◽

Vol 8 (5) ◽

pp. 584 ◽

Cited By ~ 3

Author(s):

Muhammad Irfan ◽

Zahid Ullah ◽

Ray C. C. Cheung

Keyword(s):

Clock Cycle ◽

Random Access ◽

Packet Classification ◽

Switching Activity ◽

Content Addressable Memory ◽

Gate Arrays ◽

Lut Block ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Configurable Hardware

Content-addressable memory (CAM) is a type of associative memory that returns the address of a given search input in one clock cycle. Many designs are available to emulate CAM functionality inside reconfigurable hardware such as field-programmable gate arrays (FPGAs), using static random-access memory (SRAM) and flip-flops. FPGA-based CAMs are becoming popular due to the rapid growth of software-defined networks (SDNs), which use CAM for packet classification. Emulated CAM designs consume considerable dynamic power owing to the high switching activity and computation involved in finding the address of the search key. In this paper, we present a power- and resource-efficient binary CAM architecture, Zi-CAM, which consumes less power and uses fewer resources than available SRAM-based CAM architectures on FPGAs. Zi-CAM consists of two main blocks: the RAM block (RB) is activated when the input search word contains a sequence of repeating zeros; otherwise, the lookup table (LUT) block (LB) is activated. Zi-CAM is implemented on a Xilinx Virtex-6 FPGA with a size of 64 × 36, improving power consumption and hardware cost by 30% and 32%, respectively, compared with available FPGA-based CAMs.
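
For readers unfamiliar with SRAM-based CAM emulation, here is a minimal Python model of one common scheme: the search word is split into chunks, each chunk addresses a small table holding a bit vector of the entries that match that chunk, and the vectors are ANDed so that the surviving bits mark full matches. This is a generic illustration of the kind of SRAM-based CAM the abstract compares against, not Zi-CAM's RAM/LUT scheme; the function names and the chunk width are hypothetical.

```python
def build_tables(stored, width, chunk):
    """Precompute one match-vector table per chunk of the search word.

    tables[c][v] has bit i set when stored entry i holds value v in chunk c.
    In hardware, each table would be a small RAM addressed by that chunk.
    """
    if width <= 0 or width % chunk:
        raise ValueError("chunk must evenly divide a positive width")
    mask = (1 << chunk) - 1
    tables = [[0] * (1 << chunk) for _ in range(width // chunk)]
    for i, word in enumerate(stored):
        for c, table in enumerate(tables):
            table[(word >> (c * chunk)) & mask] |= 1 << i
    return tables


def search(tables, key, chunk):
    """Look up every chunk of the key and AND the resulting match vectors."""
    mask = (1 << chunk) - 1
    match = -1  # all ones: every entry is a candidate until a chunk rules it out
    for c, table in enumerate(tables):
        match &= table[(key >> (c * chunk)) & mask]
    return match  # bit i set when entry i equals key


def first_match(match):
    """Priority encoder: lowest matching address, or None if nothing matched."""
    return (match & -match).bit_length() - 1 if match else None
```

With entries `[0x12, 0x34, 0x12, 0xFF]` and 4-bit chunks, searching for `0x12` yields the match vector `0b0101` (entries 0 and 2), and the priority encoder reports address 0. Narrower chunks need smaller tables but more of them, which is the memory-versus-logic trade-off such designs tune.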

An Efficient approach for Design and Testing of FPGA Programming using LabVIEW

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v4.i3.pp192-200 ◽

2015 ◽

Vol 4 (3) ◽

pp. 192 ◽

Cited By ~ 1

Author(s):

Naresh Kumar Reddy ◽

N. Suresh

Keyword(s):

Complex Signal ◽

Gate Arrays ◽

Signal Flow Graphs ◽

Field Programmable ◽

Signal Processing Algorithms ◽

Programmable Gate Arrays ◽

Labview Fpga ◽

High Level ◽

Flow Graphs ◽

Design And Testing

Programming Field Programmable Gate Arrays (FPGAs) has long been the domain of engineers with VHDL or Verilog expertise. FPGAs have caught the attention of algorithm developers and communication researchers who want to use them to instantiate systems or implement DSP algorithms. These efforts, however, are often stifled by the complexity of programming FPGAs. RTL programming in either VHDL or Verilog generally does not offer the high level of abstraction needed to represent signal flow graphs and complex signal processing algorithms. This paper describes programming FPGAs in a graphical language, rather than in Verilog or VHDL, using LabVIEW and the features of the LabVIEW FPGA environment.

Comparison of Different Design Alternatives for Hardware-in-the-Loop of Power Converters

Electronics ◽

10.3390/electronics10080926 ◽

2021 ◽

Vol 10 (8) ◽

pp. 926

Author(s):

Elyas Zamiri ◽

Alberto Sanchez ◽

Marina Yushkova ◽

Maria Sofia Martínez-García ◽

Angel de Castro

Keyword(s):

Ad Hoc ◽

Power Converters ◽

General Purpose ◽

Hardware In The Loop ◽

Gate Arrays ◽

Design Alternatives ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Hardware Description ◽

High Level

This paper aims to compare different design alternatives of hardware-in-the-loop (HIL) for emulating power converters in Field Programmable Gate Arrays (FPGAs). It proposes various numerical formats (fixed and floating-point) and different approaches (pure VHSIC Hardware Description Language (VHDL), Intellectual Properties (IPs), automated MATLAB HDL code, and High-Level Synthesis (HLS)) to design power converters. Although the proposed models are simple power electronics HIL systems, the idea can be extended to any HIL system. This study compares the design effort of different coding methods and numerical formats considering possible synthesis tools (Precision and Vivado), and it comprises an analytical discussion in terms of area and speed. The different models are synthesized as ad-hoc modules in general-purpose FPGAs, but also using the NI myRIO device as an example of a commercial tool capable of implementing HIL models. The comparison confirms that the optimum design alternative must be chosen based on the application (complexity, frequency, etc.) and designers’ constraints, such as available area, coding expertise, and design effort.
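
To make the fixed- versus floating-point comparison concrete, the Python sketch below simulates an ideal synchronous buck converter with forward-Euler integration, once in floating point and once in a Q-format fixed point that mimics integer FPGA arithmetic. All component values, the time step, the switching pattern and the 20 fractional bits are hypothetical choices for illustration; the study's actual models, formats and tools are not reproduced here.

```python
FRAC = 20        # fractional bits of the fixed-point format (hypothetical choice)
ONE = 1 << FRAC


def to_fix(x):
    """Quantise a real value to the fixed-point format."""
    return round(x * ONE)


def fmul(a, b):
    """Fixed-point multiply: full-precision product, then truncate back."""
    return (a * b) >> FRAC


def simulate(steps, fixed, vin=12.0, L=100e-6, C=100e-6, R=10.0, dt=1e-6,
             period=50, duty=0.5):
    """Forward-Euler model of an ideal synchronous buck converter.

    di/dt = (s*vin - v)/L and dv/dt = (i - v/R)/C, where the switch state s
    is 1 for the first duty*period steps of each switching period, else 0.
    Returns the final capacitor (output) voltage as a float.
    """
    on_steps = int(period * duty)
    if fixed:
        k_l, k_c, g = to_fix(dt / L), to_fix(dt / C), to_fix(1 / R)
        vin_f = to_fix(vin)
        i = v = 0
        for n in range(steps):
            vs = vin_f if n % period < on_steps else 0
            # Both updates use the previous i and v (explicit Euler).
            i, v = i + fmul(k_l, vs - v), v + fmul(k_c, i - fmul(g, v))
        return v / ONE
    i = v = 0.0
    for n in range(steps):
        vs = vin if n % period < on_steps else 0.0
        i, v = i + dt / L * (vs - v), v + dt / C * (i - v / R)
    return v
```

Running `simulate(20000, False)` and `simulate(20000, True)` gives output voltages that settle near the ideal duty-cycle value of 6 V and agree closely; lowering `FRAC` would allow narrower hardware multipliers at the cost of larger quantisation error, one of the trade-offs between numerical formats.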

A novel addressing algorithm of radix-2 FFT using single-bank dual-port memory

Circuit World ◽

10.1108/cw-06-2020-0108 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Zeynep Kaya ◽

Erol Seke

Keyword(s):

Design Methodology ◽

Clock Cycle ◽

Memory Location ◽

Content Type ◽

Gate Arrays ◽

Memory Block ◽

Fft Processor ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Single Block

Purpose: This paper aims to present a single-block memory-based FFT processor design with a conflict-free addressing scheme for field-programmable gate arrays (FPGAs) with dual-port block memories. The study targets a single-block, dual-port memory-based N-point radix-2 FFT design that uses the minimum number of memory locations and spends the minimum number of clock cycles.

Design/methodology/approach: A new memory-based fast Fourier transform (FFT) design that uses a dual-port memory block is proposed. Dual-port memory allows the design to perform two memory reads and writes in a single clock cycle. This approach simultaneously achieves a low clock-cycle count and the smallest memory, apart from some small overhead for exceptional address changes. The methodology is to read from a memory location while writing to it, eliminating the need for excess memory and additional clock cycles.

Findings: To obtain the minimum memory size and the simplest architecture, a radix-2 FFT and a single memory block are used. The saving in clock pulses offers little advantage for FFTs with small N but becomes important for FFTs with large N. With the developed algorithm, N memory locations are used, and the number of clock pulses spent on all FFT stages is (N/2 + 1)·log₂N.

Originality/value: This is an original paper, which has not been submitted, in whole or in part, anywhere else simultaneously.
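
The single-memory idea can be illustrated with a standard in-place radix-2 decimation-in-time FFT in Python: every butterfly reads two words from one N-word memory and writes its two results back to the same two addresses, so no second buffer is needed. This is the textbook algorithm, not the authors' conflict-free addressing scheme for dual-port block RAM, and `fft_in_place` and `cycles` are hypothetical names; `cycles` simply evaluates the clock-cycle count quoted in the abstract.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_in_place(mem):
    """Radix-2 decimation-in-time FFT computed inside a single N-word memory."""
    n = len(mem)
    bits = n.bit_length() - 1
    assert n == 1 << bits, "N must be a power of two"
    for i in range(n):  # reorder inputs into bit-reversed address order
        j = bit_reverse(i, bits)
        if j > i:
            mem[i], mem[j] = mem[j], mem[i]
    span = 1
    while span < n:  # log2(N) stages
        for start in range(0, n, 2 * span):
            for k in range(span):  # N/2 butterflies per stage in total
                a, b = start + k, start + k + span
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                x, y = mem[a], mem[b] * w      # read two words
                mem[a], mem[b] = x + y, x - y  # write back to the same addresses
        span *= 2
    return mem


def cycles(n):
    """Clock-cycle count (N/2 + 1) * log2(N) stated in the abstract."""
    return (n // 2 + 1) * (n.bit_length() - 1)
```

Each of the log₂N stages performs N/2 butterflies; for N = 1024, `cycles(1024)` evaluates (512 + 1) × 10 = 5130.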

A High-Level Synthesis Scheduling and Binding Heuristic for FPGA Fault Tolerance

International Journal of Reconfigurable Computing ◽

10.1155/2017/5419767 ◽

2017 ◽

Vol 2017 ◽

pp. 1-17 ◽

Cited By ~ 1

Author(s):

David Wilson ◽

Aniruddha Shastri ◽

Greg Stitt

Keyword(s):

Fault Tolerance ◽

High Energy ◽

High Level Synthesis ◽

Computing Systems ◽

Gate Arrays ◽

Resource Requirements ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Modular Redundancy ◽

High Level

Computing systems with field-programmable gate arrays (FPGAs) often achieve fault tolerance in high-energy radiation environments via triple-modular redundancy (TMR) and configuration scrubbing. Although effective, TMR suffers from a 3x area overhead, which can be prohibitive for many embedded usage scenarios. Furthermore, this overhead is often worsened because TMR often has to be applied to existing register-transfer-level (RTL) code that designers created without considering the triplicated resource requirements. Although a designer could redesign the RTL code to reduce resources, modifying RTL schedules and resource allocations is a time-consuming and error-prone process. In this paper, we present a more transparent high-level synthesis approach that uses scheduling and binding to provide attractive tradeoffs between area, performance, and redundancy, while focusing on FPGA implementation considerations, such as resource realization costs, to produce more efficient architectures. Compared to TMR applied to existing RTL, our approach shows resource savings up to 80% with average resource savings of 34% and an average clock degradation of 6%. Compared to the previous approach, our approach shows resource savings up to 74% with average resource savings of 19% and an average heuristic execution time improvement of 96x.
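
The 3x overhead of TMR comes from triplicating a module and voting on its outputs. A minimal Python model of a bitwise 2-of-3 majority voter, with hypothetical helper names, shows how a single corrupted copy is masked:

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value held by at
    least two of the three redundant copies."""
    return (a & b) | (a & c) | (b & c)


def mismatch(a: int, b: int, c: int) -> int:
    """Bit mask of positions where the three copies do not all agree,
    e.g. to flag a copy whose configuration may need scrubbing."""
    return (a ^ b) | (a ^ c)
```

For copies `0b1010`, `0b1010` and `0b0110` (the third has two flipped bits), `majority` returns `0b1010` and `mismatch` returns `0b1100`, marking the bits where the copies disagree.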

Power optimisation using intelligent Clock gating dedicated for block RAM cascading technique in FPGA design.

10.21203/rs.3.rs-878601/v1 ◽

2021 ◽

Author(s):

Gurwinder Singh ◽

Munish Rattan ◽

Gurjot Kaur Walia

Keyword(s):

Digital Circuits ◽

Current Trend ◽

Critical Energy ◽

Clock Gating ◽

Fpga Design ◽

Gate Arrays ◽

Field Programmable ◽

Chip Size ◽

Programmable Gate Arrays ◽

High Level

The current trend of reducing chip size while increasing the number of circuits per chip has made battery consumption and energy efficiency critical concerns, driving growth in the emerging low-power electronics sector. Our paper is committed to optimizing power by eliminating cascading in block RAM, which accounts for a dominant share of the power dissipated in systems on chip (SoCs). High-level synthesis (HLS) allows hardware designers to focus on the logic of a design rather than worry about low-level, cycle-by-cycle details, and it provides the ability to explore the design space quickly and to trade off resource utilization against performance. Field Programmable Gate Arrays (FPGAs) have made significant progress in speed and capacity, creating a platform for implementing digital circuits. In FPGA design, synthesis tools apply various optimization strategies: they take the RTL representation of a design together with its timing constraints and generate an equivalent netlist. Today, the Xilinx Vivado Design Suite is widely used as the synthesis tool for FPGA design. In some cases, however, Vivado is unable to meet the designer's required timing and power constraints. Therefore, the primary goal of this paper is to optimize power under design constraints in the Xilinx Vivado software.
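
A simple way to see why gating a cascaded block RAM can save power is to count how many BRAM primitives are enabled per access. The Python sketch below assumes a deep memory built from `n_blocks` primitives stacked by address and treats dynamic power as roughly proportional to the number of enabled blocks; both the model and the names are hypothetical, and it illustrates generic address-decoded enable gating rather than the specific clock-gating technique of the paper.

```python
def block_activity(addresses, block_depth, n_blocks, gated):
    """Count how often each block of a cascaded BRAM memory is enabled.

    The memory is n_blocks primitives of block_depth words stacked by address.
    Ungated: every block in the cascade is enabled on every access.
    Gated: the upper address bits select the single block holding the word,
    and only that block's enable (or clock) is asserted.
    """
    counts = [0] * n_blocks
    for addr in addresses:
        block = addr // block_depth  # upper address bits pick the block
        if not 0 <= block < n_blocks:
            raise ValueError(f"address {addr} out of range")
        if gated:
            counts[block] += 1
        else:
            for b in range(n_blocks):
                counts[b] += 1
    return counts
```

For a 4096-word memory built from four 1024-word blocks, a sweep over all addresses enables each block 4096 times without gating but only 1024 times with gating, a fourfold reduction in block activations under this model.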
