An Efficient Hardware Implementation of Residual Data Binarization in HEVC CABAC Encoder

Dinh-Lam Tran; Xuan-Tu Tran; Duy-Hieu Bui; Cong-Kha Pham

doi:10.3390/electronics9040684

Electronics ◽

10.3390/electronics9040684 ◽

2020 ◽

Vol 9 (4) ◽

pp. 684

Author(s):

Dinh-Lam Tran ◽

Xuan-Tu Tran ◽

Duy-Hieu Bui ◽

Cong-Kha Pham

Keyword(s):

Power Consumption ◽

Power Efficiency ◽

High Performance ◽

Hardware Implementation ◽

Video Quality ◽

Clock Cycle ◽

Work Load ◽

Video Data ◽

Low Area ◽

Binary Arithmetic

HEVC-standardized encoders employ CABAC (context-based adaptive binary arithmetic coding) to achieve the high compression ratios and video quality that modern real-time, high-quality video services require. The binarizer is one of the three main blocks in a CABAC architecture; it generates the binary symbols (bins) that feed the binary arithmetic encoder (BAE). Residual video data occupies an average of 75% of the CABAC workload, so the binarizer's performance on it contributes significantly to the overall performance of the whole CABAC design. This paper proposes an efficient hardware implementation of a binarizer for CABAC that focuses on low area cost and low power consumption while still providing enough bins for a high-throughput CABAC. On average, the proposed design can process up to 3.5 residual syntax elements (SEs) per clock cycle at a maximum frequency of 500 MHz, with an area cost of 9.45 Kgates (6.41 Kgates for the binarizer core) and a power consumption of 0.239 mW (0.184 mW for the binarizer core) in NanGate 45 nm technology. The proposal achieves a high overhead-efficiency of 1.293 Mbins/Kgate/mW, much better than other related high-performance designs. In addition, the design achieves a high power-efficiency of 8288 Mbins/mW, an important factor for handheld applications.
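
The efficiency figures quoted in this abstract can be related to one another with simple arithmetic. The sketch below is a reader's cross-check, not the authors' method. It assumes that "Mbins" means Mbins per second, that power-efficiency is throughput divided by power, that overhead-efficiency is throughput divided by area and power, and that both refer to the binarizer core (0.184 mW, 6.41 Kgates). The abstract states none of these definitions, and the variable names are illustrative.

```python
# Values reported in the abstract (binarizer core).
POWER_EFFICIENCY_MBINS_PER_MW = 8288.0
CORE_POWER_MW = 0.184
CORE_AREA_KGATES = 6.41

# Assumption: power-efficiency = throughput / power,
# so throughput = power-efficiency * power.
implied_throughput_mbins = POWER_EFFICIENCY_MBINS_PER_MW * CORE_POWER_MW

# Assumption: overhead-efficiency = throughput / (area * power).
implied_overhead_efficiency = implied_throughput_mbins / (
    CORE_AREA_KGATES * CORE_POWER_MW
)

print(f"implied throughput: {implied_throughput_mbins:.0f} Mbins/s")
print(f"implied overhead-efficiency: {implied_overhead_efficiency:.0f} per Kgate per mW")
```

Under these assumptions the overhead-efficiency comes out at about 1293 per Kgate per mW, which shares its digits with the reported 1.293 figure. The abstract alone does not settle which unit prefix the authors intended for that number.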

Microprocessors KOMDIV for High Performance Embedded Systems

INFORMATION TECHNOLOGY IN INDUSTRY ◽

10.17762/itii.v7i3.71 ◽

2021 ◽

Vol 7 (3) ◽

Author(s):

S.G. Bobkov

Keyword(s):

Embedded Systems ◽

Power Consumption ◽

High Performance ◽

Clock Cycle ◽

Embedded Computing ◽

Computing Systems ◽

Processor Performance

The problems of creating high-performance embedded computing systems based on KOMDIV microprocessors are considered. Processor performance depends on three characteristics: clock cycle, clock cycles per instruction, and instruction count. For KOMDIV microprocessors, these characteristics are optimized using the performance/power consumption parameter and the requirements of embedded systems.
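
The three characteristics named in this abstract combine in the standard CPU-time relation: execution time = instruction count × clock cycles per instruction × clock cycle time. The sketch below illustrates that relation only. The function name and the numeric values are hypothetical and are not KOMDIV data.

```python
def cpu_time_seconds(instruction_count, cycles_per_instruction, clock_cycle_seconds):
    """Execution time as the product of the three performance characteristics."""
    return instruction_count * cycles_per_instruction * clock_cycle_seconds


# Hypothetical workload: 1e9 instructions, CPI of 1.5, 1 ns clock cycle (1 GHz).
baseline = cpu_time_seconds(1_000_000_000, 1.5, 1e-9)

# Lowering CPI to 1.2 at the same clock shortens execution time proportionally.
improved = cpu_time_seconds(1_000_000_000, 1.2, 1e-9)

print(f"baseline: {baseline:.2f} s, improved: {improved:.2f} s")
```

Because the three factors multiply, an improvement in any one of them helps only if it does not worsen the others by a larger factor.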

Rack Server Solution in Data Center

Volume 1: Thermal Management ◽

10.1115/ipack2015-48258 ◽

2015 ◽

Cited By ~ 2

Author(s):

Sheng Kang ◽

Guofeng Chen ◽

Chun Wang ◽

Ruiquan Ding ◽

Jiajun Zhang ◽

...

Keyword(s):

Power Consumption ◽

Low Power ◽

Power Supply ◽

Data Center ◽

Power Efficiency ◽

High Performance ◽

High Efficiency ◽

High Growth ◽

General Purpose ◽

Power Supplies

With the advent of big data and cloud computing solutions, enterprise demand for servers is increasing. Growth is especially high for Intel-based x86 server platforms. Today's data centers are in constant pursuit of high-performance, high-availability computing solutions coupled with low power consumption, low heat generation, and the ability to manage all of this through advanced telemetry data gathering. This paper showcases one such solution: an updated rack and server architecture that promises such improvements. The ability to manage server and data center power consumption and cooling more completely is critical to managing data center costs effectively and reducing the PUE of the data center. Traditional Intel-based 1U and 2U form factor servers have existed in the data center for decades. These general-purpose x86 server designs by the major OEMs are, for all practical purposes, very similar in their power consumption and thermal output. Power supplies and thermal designs for servers have not in the past been optimized for high efficiency. In addition, IT managers need more information about servers in order to optimize data center cooling and power use. An improved server/rack design needs to be built to take advantage of more efficient power supplies or PDUs and of more efficient means of cooling server compute resources than traditional internal server fans. This is the constant pursuit of corporations looking for new ways to improve efficiency and gain a competitive advantage. A new way to optimize power consumption and improve cooling is a complete redesign of the traditional server rack. Extracting the internal server power supplies and server fans and centralizing them within the rack aims to achieve this goal. This type of design reaches an entirely new low power target by using centralized, high-efficiency PDUs that power all servers within the rack. Cooling is improved by using large, efficient rack-based fans to provide airflow to all servers. Opening up the server design also allows greater airflow across server components for improved cooling. The centralized power supply breaks through the traditional server power limits. Rack-based PDUs can adjust the power efficiency to a more optimal point, and this is combined with the use of online and offline modes within a single power supply. Cold backup lets data center power reach optimal power efficiency. In addition, unifying the mechanical structure and thermal definitions within the rack solution for server cooling and PSU information allows IT to collect all server power and thermal information centrally, which makes it easier to analyze and process.

Low Latency Network-on-Chip Router Microarchitecture Using Request Masking Technique

International Journal of Reconfigurable Computing ◽

10.1155/2015/570836 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 14

Author(s):

Alireza Monemi ◽

Chia Yee Ooi ◽

Muhammad Nadzir Marsono

Keyword(s):

High Performance ◽

Clock Cycle ◽

Network On Chip ◽

Operating Frequency ◽

Low Latency ◽

Core System ◽

Low Area ◽

Area Overhead ◽

Logic Cells ◽

On Chip

Network-on-Chip (NoC) is fast emerging as an on-chip communication alternative for many-core System-on-Chips (SoCs). However, designing a high-performance, low-latency NoC with low area overhead has remained a challenge. In this paper, we present a two-clock-cycle latency NoC microarchitecture. An efficient request masking technique is proposed to combine virtual channel (VC) allocation with switch allocation nonspeculatively. Our proposed NoC architecture is optimized in terms of area overhead, operating frequency, and quality-of-service (QoS). We evaluate our NoC against CONNECT, an open source low-latency NoC design targeted at field-programmable gate arrays (FPGAs). The experimental results on several FPGA devices show that our NoC router outperforms CONNECT with a 50% reduction in logic cell (LC) utilization, while it works with 100% and 35%~20% higher operating frequency compared to the one- and two-clock-cycle latency CONNECT NoC routers, respectively. Moreover, the proposed NoC router achieves 2.3 times better performance compared to CONNECT.

HARDWARE IMPLEMENTATION OF AES ENCRYPTION AND DECRYPTION FOR LOW AREA & POWER CONSUMPTION

International Journal of Research in Engineering and Technology ◽

10.15623/ijret.2014.0305088 ◽

2014 ◽

Vol 03 (05) ◽

pp. 480-484 ◽

Cited By ~ 2

Author(s):

Pritamkumar N. Khose

Keyword(s):

Power Consumption ◽

Hardware Implementation ◽

Low Area ◽

Encryption And Decryption

Dual Die Package Design Strategy and Performance

Advances in Electronic Packaging, Parts A, B, and C ◽

10.1115/ipack2005-73391 ◽

2005 ◽

Cited By ~ 1

Author(s):

Mahadevan Suryakumar ◽

Lu-Vong T. Phan ◽

Mathew Ma ◽

Wajahat Ahmed

Keyword(s):

Power Efficiency ◽

High Speed ◽

High Performance ◽

Clock Cycle ◽

Average Power ◽

Cost Effective ◽

Design Strategy ◽

Leakage Power ◽

Memory Accesses ◽

And Performance

The alarming growth in power has presented numerous packaging challenges for high performance processors. The average power consumed by a processor is the sum of dynamic and leakage power. The dynamic power is proportional to V^2, while the leakage current (and therefore the leakage power) is proportional to V^b, where V is the voltage and b>1 for modern processes. This means that lowering the voltage reduces the energy consumed per clock cycle but also reduces the maximum frequency at which the processor can operate. Since reducing voltage reduces power faster than it reduces frequency, integrating more cores into the processor would result in better performance/power efficiency but would generate more memory accesses, driving a need for larger caches and high speed signaling [1]. In addition, the design goal of a unified package pinout for both single core and multicore product flavors adds a further constraint on creating a cost effective package solution for both market segments. This paper discusses the design strategy and performance of a dual die package that optimizes package performance for cost.
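
The two proportionalities stated in this abstract (dynamic power ∝ V^2, leakage power ∝ V^b) can be turned into a small numeric illustration. The sketch below is not taken from the paper. It assumes the clock frequency is held fixed, and the leakage exponent (b = 3) and the leakage share of total power (30%) are hypothetical values chosen only to make the example concrete.

```python
def relative_power(voltage_ratio, leakage_exponent, leakage_fraction):
    """Total power after voltage scaling, relative to power at nominal voltage.

    voltage_ratio    -- scaled voltage divided by nominal voltage
    leakage_exponent -- the exponent b in leakage power ~ V^b (hypothetical here)
    leakage_fraction -- share of nominal power due to leakage (hypothetical here)
    """
    dynamic_fraction = 1.0 - leakage_fraction
    dynamic = dynamic_fraction * voltage_ratio ** 2
    leakage = leakage_fraction * voltage_ratio ** leakage_exponent
    return dynamic + leakage


# Hypothetical case: a 10% voltage reduction with b = 3 and 30% leakage share.
scaled = relative_power(voltage_ratio=0.9, leakage_exponent=3.0, leakage_fraction=0.3)
print(f"relative power at 90% voltage: {scaled:.3f}")
```

With these illustrative inputs a 10% voltage reduction lowers total power by roughly 21%, which is the kind of better-than-linear saving the abstract relies on when it argues for more cores at lower voltage.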

Resource Allocation in Cloud Computing for Energy Efficiency

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8356.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 1170-1174

Keyword(s):

Resource Allocation ◽

Cloud Computing ◽

Power Consumption ◽

High Performance ◽

High Reliability ◽

Low Cost ◽

Work Load ◽

Cloud Infrastructure ◽

Multiple Resources ◽

Resource Requirements

Cloud computing is a paradigm in which virtualized computer systems deliver services, processing, storage, network, and other fundamental computing resources. Cloud computing enables low cost, device and location independence, high reliability, scalability, and sustainability. This paper describes the present state of cloud computing research by examining the literature and identifying current study trends. We analyzed resource allocation methods and concluded that they are typically designed for high performance that supports the peak resource requirements. Several analyses indicate that the power consumption of data centers and cloud systems has increased several times over. There is a lack of research that addresses the challenges of managing multiple resources with the objective of allocating enough resources to each workload while optimizing power consumption. This paper surveys various types of resource allocation algorithms that improve the cloud infrastructure.

A SINGLE FORMULA AND ITS IMPLEMENTATION IN FPGA FOR ELLIPTIC CURVE POINT ADDITION USING AFFINE REPRESENTATION

Journal of Circuits System and Computers ◽

10.1142/s0218126610006153 ◽

2010 ◽

Vol 19 (02) ◽

pp. 425-433 ◽

Cited By ~ 2

Author(s):

M. MORALES-SANDOVAL ◽

C. FEREGRINO-URIBE ◽

R. CUMPLIDO ◽

I. ALGREDO-BADILLO

Keyword(s):

Elliptic Curve ◽

Elliptic Curve Cryptography ◽

High Performance ◽

Hardware Implementation ◽

Side Channel ◽

Low Area ◽

Hardware Implementations ◽

New Formulation ◽

Single Formula ◽

New Formula

A formula for point addition in elliptic curves using affine representation and its implementation in FPGA is presented. The use of this new formula in hardware implementations of scalar multiplications for elliptic curve cryptography has the main advantages of: (i) reducing area for the implementations of elliptic curve point addition, and (ii) increasing the resistance to side channel attacks of the hardware implementation itself. Hardware implementation of scalar multiplication for elliptic curve cryptography using this new formulation requires low area resources while keeping high performance compared to implementations using projective coordinates, which are usually considered faster than the affine coordinates.

Efficient Instruction and Data Caching for High Performance Embedded Processors

Jornada de Jóvenes Investigadores del I3A ◽

10.26754/jji-i3a.201201788 ◽

1970 ◽

pp. 9

Author(s):

A. Ferrerón Labari ◽

D. Suárez Gracia ◽

V. Viñals Yúfera

Keyword(s):

Embedded Systems ◽

Power Consumption ◽

Low Power ◽

Interconnection Networks ◽

High Performance ◽

Critical Issue ◽

Content Management ◽

Structure Design ◽

Portable Devices ◽

On Chip

In recent years, embedded systems have evolved to offer capabilities that could previously be found only in high performance systems. Portable devices already have on-chip multiprocessors (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful multi-level on-chip cache memory hierarchy. As most of these systems are battery-powered, power consumption becomes a critical issue. Achieving high performance and low power consumption is a highly complex challenge for which some proposals have already been made. Suarez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low Power NUCA), which reduces access latency by taking advantage of the properties of NUCA (Non-Uniform Cache Architectures). The key points are decoupling the functionality and using three specialized on-chip networks. This structure has proven efficient for data hierarchies, achieving good performance and reducing energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, which conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to completely re-evaluate our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.

Low Power Wide Fan-in Domino OR Gate Using CN-MOSFETs

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327909666190207163639 ◽

2020 ◽

Vol 10 (1) ◽

pp. 55-62

Author(s):

Deepika Bansal ◽

Bal Chand Nagar ◽

Brahamdeo Prasad Singh ◽

Ajay Kumar

Keyword(s):

Power Consumption ◽

High Performance ◽

Dynamic Logic ◽

Clock Frequency ◽

Charge Sharing ◽

Benchmark Circuit ◽

Domino Circuit ◽

Power Delay Product ◽

Domino Circuits ◽

Or Gate

Background & Objective: In this paper, a modified pseudo domino configuration is proposed to improve the leakage power consumption and Power Delay Product (PDP) of dynamic logic using Carbon Nanotube MOSFETs (CN-MOSFETs). The simulations of the proposed and published domino circuits are verified using the Synopsys HSPICE simulator with the 32 nm CN-MOSFET technology provided by Stanford. Methods: The simulation results of the proposed technique are validated on a wide fan-in domino OR gate as a benchmark circuit at a 500 MHz clock frequency. Results: The proposed configuration is suitable for cascading high performance wide fan-in circuits without any charge sharing. Conclusion: The performance analysis of an 8-input OR gate demonstrates that the proposed circuit lowers static and dynamic power consumption by up to 62% and 40%, respectively, and improves PDP by 60% compared to the standard domino circuit.

Ultracompact and low-power-consumption silicon thermo-optic switch for high-speed data

Nanophotonics ◽

10.1515/nanoph-2020-0496 ◽

2020 ◽

Vol 10 (2) ◽

pp. 937-945

Author(s):

Ruihuan Zhang ◽

Yu He ◽

Yong Zhang ◽

Shaohua An ◽

Qingming Zhu ◽

...

Keyword(s):

Power Consumption ◽

Low Power ◽

High Speed ◽

High Performance ◽

Pulse Amplitude ◽

Telecommunication Networks ◽

Low Power Consumption ◽

Power Efficient ◽

High Speed Data ◽

On Chip

Ultracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data rate ever reported.
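
The headline numbers in this abstract imply a few derived quantities. The sketch below computes them as a reader's illustration, not as results from the paper. It assumes that the resonance shift scales linearly with heater power, so that shift = tuning efficiency × switching power, and it uses the fact that a four-level PAM signal carries 2 bits per symbol. The variable names are illustrative.

```python
# Values reported in the abstract.
SWITCHING_POWER_MW = 0.15
TUNING_EFFICIENCY_NM_PER_MW = 7.71
FOOTPRINT_UM = (60.0, 16.0)
RAW_DATA_RATE_GBPS = 124.0

# A four-level PAM symbol encodes log2(4) = 2 bits.
PAM4_BITS_PER_SYMBOL = 2

# Assumption: linear tuning, so the shift at the switching power is a simple product.
resonance_shift_nm = SWITCHING_POWER_MW * TUNING_EFFICIENCY_NM_PER_MW

footprint_area_um2 = FOOTPRINT_UM[0] * FOOTPRINT_UM[1]
symbol_rate_gbaud = RAW_DATA_RATE_GBPS / PAM4_BITS_PER_SYMBOL

print(f"implied resonance shift: {resonance_shift_nm:.2f} nm")
print(f"footprint area: {footprint_area_um2:.0f} um^2")
print(f"symbol rate: {symbol_rate_gbaud:.0f} GBaud")
```

Under the linearity assumption, the switching power corresponds to a resonance shift of a little over 1 nm, and the 124 Gb/s raw rate corresponds to 62 GBaud.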
