High-throughput architectures for high-resolution video coding: hardwired oriented algorithms and VLSI architectures

As the number of pixels per frame increases in new high-definition video coding standards such as HEVC and VP9, pel decimation appears as a viable means of increasing the energy efficiency of Sum of Absolute Differences (SAD) calculation. First, we analyze the quality costs of pel decimation using video coding software. Then we present and evaluate two VLSI architectures to compute the SAD of 4×4 pixel blocks: one that can be configured with 1:1, 2:1 or 4:1 sampling ratios, and a non-configurable one that serves as the baseline in comparisons. The architectures were synthesized for 90nm, 65nm and 45nm standard cell libraries, assuming both nominal and Low-Vdd/High-Vt (LH) cases, for maximum throughput and for a given target throughput. The impacts of both subsampling and LH on delay, power and energy efficiency are analyzed. In a total of 24 syntheses, the 45nm/LH configurable SAD architecture achieved the highest energy efficiency for target throughput when operating with 4:1 pel decimation, spending only 2.05pJ for each 4×4 block. This corresponds to about 13.65 times less energy than the 90nm/nominal configurable architecture operating in full sampling mode at maximum throughput, and about 14.77 times less than the 90nm/nominal non-configurable synthesis for target throughput. Aside from the improvements achieved by using LH, pel decimation alone was responsible for energy reductions of 40% and 60% when choosing 2:1 and 4:1 subsampling ratios, respectively, in the configurable architecture. Finally, it is shown that the configurable architecture is more energy-efficient than the non-configurable one.
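
As a rough software illustration of the idea in the abstract above, the sketch below computes the SAD of two 4×4 blocks at 1:1, 2:1 and 4:1 sampling ratios. The function name `sad_4x4` and the specific sampling patterns (a checkerboard for 2:1, every other row and column for 4:1) are assumptions made for illustration; the abstract does not state which pixels the hardware keeps.

```python
def sad_4x4(cur, ref, ratio=1):
    """Sum of absolute differences of two 4x4 blocks with pel decimation.

    ratio=1 uses all 16 pixels, ratio=2 uses 8, ratio=4 uses 4.
    The sampling patterns are assumed, not taken from the paper.
    """
    if ratio == 1:
        keep = lambda r, c: True                        # full sampling
    elif ratio == 2:
        keep = lambda r, c: (r + c) % 2 == 0            # assumed checkerboard
    elif ratio == 4:
        keep = lambda r, c: r % 2 == 0 and c % 2 == 0   # assumed 2x2 grid
    else:
        raise ValueError("ratio must be 1, 2 or 4")

    return sum(
        abs(cur[r][c] - ref[r][c])
        for r in range(4)
        for c in range(4)
        if keep(r, c)
    )


# Two flat blocks that differ by 3 at every pixel.
cur_block = [[10] * 4 for _ in range(4)]
ref_block = [[7] * 4 for _ in range(4)]

full = sad_4x4(cur_block, ref_block, 1)     # 16 pixels compared
half = sad_4x4(cur_block, ref_block, 2)     # 8 pixels compared
quarter = sad_4x4(cur_block, ref_block, 4)  # 4 pixels compared
```

Fewer absolute differences are accumulated as the ratio grows, which is where the energy saving comes from; the price is that the decimated SAD only approximates the full one on non-uniform blocks.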


Sample-Level Filtering Order for High-Throughput and Memory-Aware H.264 Deblocking Filter

ISRN Signal Processing, DOI 10.5402/2012/805346, 2012, Vol 2012, pp. 1-6

Author(s): Guilherme Correa, Luciano Agostini, Luis A. da Silva Cruz

Keyword(s): Video Coding, High Throughput, Memory Usage, Deblocking Filter, Memory Space, Data Dependencies, Block Level, Processing Order, Filter Process, Level Order

This paper presents a new sample-level filtering order for the Deblocking Filter process of the H.264/AVC video coding standard, to be used instead of the traditional block-level order presented in previous works. This processing order exploits the parallelism of the filtering process better by reducing data dependencies in comparison to other works. The proposed sample-level order allows four independent samples to be filtered in parallel, so that a complete macroblock is filtered in fewer cycles and with less memory space than in the related works. The proposed filtering order can be applied to the Deblocking Filter present in a conventional H.264/AVC encoder or decoder and to the H.264/SVC interlayer Deblocking Filter. When compared to the original H.264/AVC filter and to the best related work found in the literature, the proposed scheme reduces the number of clock cycles by 72% and 25% and the memory usage by 75% and 43%, respectively.


Iterative and Fully Pipelined High Throughput Efficient Architectures of AES in FPGA and ASIC

Journal of Circuits, Systems and Computers, DOI 10.1142/s0218126616500493, 2016, Vol 25 (05), pp. 1650049, Cited By ~ 9

Author(s): Vijay K. Sharma, Saurabh Kumar, K. K. Mahapatra

Keyword(s): High Throughput, Unit Area, Critical Path, Advanced Encryption Standard, VLSI Architectures, AES Algorithm, Composite Field, Low Energy Consumption, Field Programmable, Hardware Efficiency

This paper presents high-throughput iterative and pipelined VLSI architectures of the Advanced Encryption Standard (AES) algorithm based on composite field arithmetic in polynomial basis. A logical rearrangement has been performed in the byte substitution (S-box) module to reduce the number of gates in the critical path. In addition, the GF(2^4) inversion module has been optimized separately. The ASIC implementation of our S-box has comparatively low power and energy consumption. The iterative and pipelined implementations of AES in field programmable gate array (FPGA) and ASIC using the proposed S-box have high hardware efficiency in terms of throughput per unit area (slices in FPGA).
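
To make the GF(2^4) inversion step mentioned above more concrete, here is a minimal software sketch of multiplication and inversion in that field. The irreducible polynomial x^4 + x + 1 and the helper names `gf16_mul` and `gf16_inv` are assumptions chosen for illustration; the abstract does not specify the polynomial or the gate-level structure the authors use.

```python
def gf16_mul(a, b, poly=0b10011):
    """Multiply two GF(2^4) elements modulo x^4 + x + 1 (assumed polynomial)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a      # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1
        if a & 0x10:         # degree reached 4: reduce modulo the polynomial
            a ^= poly
    return result


def gf16_inv(a):
    """Multiplicative inverse in GF(2^4), with 0 mapped to 0.

    Every nonzero element satisfies a^15 = 1, so a^14 is its inverse.
    a^14 is built as a^8 * a^4 * a^2 by repeated squaring.
    """
    a2 = gf16_mul(a, a)
    a4 = gf16_mul(a2, a2)
    a8 = gf16_mul(a4, a4)
    return gf16_mul(gf16_mul(a8, a4), a2)


# Inverse table for all 16 field elements.
inverse_table = [gf16_inv(a) for a in range(16)]
```

In hardware this function is typically flattened into a small block of XOR and AND gates rather than computed by repeated multiplication; the sketch only shows the arithmetic that such a block has to reproduce.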
