scholarly journals Low Power Built-In Self-Test Schemes for Array and Booth Multipliers

VLSI Design ◽  
2001 ◽  
Vol 12 (3) ◽  
pp. 431-448 ◽  
Author(s):  
D. Bakalist ◽  
X. Kavousianos ◽  
H. T. Vergos ◽  
D. Nikolos ◽  
G. Ph. Alexiou

Recent trends in IC technology have given rise to a new requirement, that of low power dissipation during testing, that Built-In Self-Test (BIST) structures must target along with the traditional requirements. To this end, by exploiting the inherent properties of Carry Save, Carry Propagate and modified Booth multipliers, in this paper we propose new power-efficient BIST structures for them. The proposed BIST schemes are derived by: (a) properly assigning the Test Pattern Generator (TPG) outputs to the multiplier inputs, (b) modifying the TPG circuits and (c) reducing the test set length. Our results indicate that the total power dissipated during testing can be reduced from 29.3% to 54.9%, while the average power per test vector applied can be reduced from 5.8% to 36.5% and the peak power dissipation can be reduced from 15.5% to 50.2% depending on the implementation of the basic cells and the size of the multiplier. The test application time is also significantly reduced, while the introduced BIST schemes implementation area is small.

2008 ◽  
Vol 17 (02) ◽  
pp. 183-190 ◽  
Author(s):  
S. RAMAKRISHNAN ◽  
K. T. LAU

In this paper, a newly improved dynamic current mode logic (I-DyCML) is proposed to achieve low power dissipation. The principle used in I-DyCML is the reduction of the leakage current by turning the part of the circuit to "standby mode", when not in use, while achieving lower dynamic power during the active mode. HSpice simulations show that I-DyCML saves up to 15–30% of the total power dissipation when compared to Dynamic Current mode logic.


2014 ◽  
Vol 4 (3) ◽  
pp. 9-13
Author(s):  
M. Balaji ◽  
◽  
B. Keerthana ◽  
K. Varun ◽  
◽  
...  

2021 ◽  
Vol 17 (2) ◽  
pp. 1-27
Author(s):  
Morteza Hosseini ◽  
Tinoosh Mohsenin

This article presents a low-power, programmable, domain-specific manycore accelerator, Binarized neural Network Manycore Accelerator (BiNMAC), which adopts and efficiently executes binary precision weight/activation neural network models. Such networks have compact models in which weights are constrained to only 1 bit and can be packed several in one memory entry that minimizes memory footprint to its finest. Packing weights also facilitates executing single instruction, multiple data with simple circuitry that allows maximizing performance and efficiency. The proposed BiNMAC has light-weight cores that support domain-specific instructions, and a router-based memory access architecture that helps with efficient implementation of layers in binary precision weight/activation neural networks of proper size. With only 3.73% and 1.98% area and average power overhead, respectively, novel instructions such as Combined Population-Count-XNOR , Patch-Select , and Bit-based Accumulation are added to the instruction set architecture of the BiNMAC, each of which replaces execution cycles of frequently used functions with 1 clock cycle that otherwise would have taken 54, 4, and 3 clock cycles, respectively. Additionally, customized logic is added to every core to transpose 16×16-bit blocks of memory on a bit-level basis, that expedites reshaping intermediate data to be well-aligned for bitwise operations. A 64-cluster architecture of the BiNMAC is fully placed and routed in 65-nm TSMC CMOS technology, where a single cluster occupies an area of 0.53 mm 2 with an average power of 232 mW at 1-GHz clock frequency and 1.1 V. The 64-cluster architecture takes 36.5 mm 2 area and, if fully exploited, consumes a total power of 16.4 W and can perform 1,360 Giga Operations Per Second (GOPS) while providing full programmability. To demonstrate its scalability, four binarized case studies including ResNet-20 and LeNet-5 for high-performance image classification, as well as a ConvNet and a multilayer perceptron for low-power physiological applications were implemented on BiNMAC. The implementation results indicate that the population-count instruction alone can expedite the performance by approximately 5×. When other new instructions are added to a RISC machine with existing population-count instruction, the performance is increased by 58% on average. To compare the performance of the BiNMAC with other commercial-off-the-shelf platforms, the case studies with their double-precision floating-point models are also implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU). The results indicate that, within a margin of ∼2.1%--9.5% accuracy loss, BiNMAC on average outperforms the TX2 GPU by approximately 1.9× (or 7.5× with fabrication technology scaled) in energy consumption for image classification applications. On low power settings and within a margin of ∼3.7%--5.5% accuracy loss compared to ARM Cortex-A57 CPU implementation, BiNMAC is roughly ∼9.7×--17.2× (or 38.8×--68.8× with fabrication technology scaled) more energy efficient for physiological applications while meeting the application deadline.


2015 ◽  
Vol 43 (7) ◽  
pp. 430
Author(s):  
Tomofumi KISE ◽  
Hitoshi SHIMIZU ◽  
Hideyuki NASU

2016 ◽  
Vol 37 (1) ◽  
pp. 33-37
Author(s):  
李辉 LI Hui ◽  
都继瑶 DU Ji-yao ◽  
曲轶 QU Yi ◽  
张晶 ZHANG Jing ◽  
李再金 LI Zai-jin ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document