A Novel Low-Area Point Multiplication Architecture for Elliptic-Curve Cryptography

Muhammad Rashid; Mohammad Mazyad Hazzazi ; Sikandar Zulqarnain Khan; Adel R. Alharbi ; Asher Sajid ; Amer Aljaedi

doi:10.3390/electronics10212698

A Novel Low-Area Point Multiplication Architecture for Elliptic-Curve Cryptography

Electronics ◽

10.3390/electronics10212698 ◽

2021 ◽

Vol 10 (21) ◽

pp. 2698

Author(s):

Muhammad Rashid ◽

Mohammad Mazyad Hazzazi ◽

Sikandar Zulqarnain Khan ◽

Adel R. Alharbi ◽

Asher Sajid ◽

...

Keyword(s):

Elliptic Curve ◽

Elliptic Curve Cryptography ◽

State Of The Art ◽

Critical Path ◽

Clock Frequency ◽

Description Language ◽

Point Multiplication ◽

Low Area ◽

Field Programmable ◽

Hardware Description

This paper presents a Point Multiplication (PM) architecture for Elliptic-Curve Cryptography (ECC) over GF(2^163) that optimizes hardware resources and latency at the same time. Hardware resources are reduced by using a bit-serial (traditional schoolbook) multiplication method. Latency is optimized by shortening the critical path with pipeline registers. To cope with the pipelining, we propose rescheduling the point addition and point doubling instructions required to compute a PM operation in ECC. The proposed architecture over GF(2^163) is modeled in Verilog Hardware Description Language (HDL) using Vivado Design Suite. To provide a fair performance evaluation, we synthesize our design on various FPGA (field-programmable gate array) devices: Virtex-4, Virtex-5, Virtex-6, Virtex-7, Spartan-7, Artix-7, and Kintex-7. The lowest area (433 FPGA slices) is achieved on Spartan-7. The highest speed is realized on Virtex-7, where our design achieves a 391 MHz clock frequency and requires 416 μs for one PM computation (latency). For power, the lowest values are achieved on the Artix-7 (56 μW) and Kintex-7 (61 μW) devices. A throughput-over-area ratio of 4.89 is reached on Virtex-7. Our design outperforms most recent state-of-the-art solutions in terms of area, at the cost of higher latency.
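
The bit-serial (schoolbook) multiplication mentioned in the abstract can be illustrated with a small software model. This is a minimal sketch, not the authors' Verilog design: the function name is hypothetical, and the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is an assumption (it is the NIST polynomial for this field, but the abstract does not state which polynomial is used). Field elements are held as Python integers whose bits are polynomial coefficients.

```python
M = 163
# Assumption: NIST reduction polynomial for GF(2^163), x^163 + x^7 + x^6 + x^3 + 1.
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_mul_bit_serial(a: int, b: int, m: int = M, f: int = F) -> int:
    """Multiply a*b in GF(2^m), consuming one bit of b per loop iteration.

    Both inputs are assumed to be already reduced (below 2^m).
    """
    acc = 0
    for i in reversed(range(m)):   # MSB of b first; one iteration models one clock cycle
        acc <<= 1                  # acc = acc * x
        if (acc >> m) & 1:         # degree reached m, so subtract (XOR) f
            acc ^= f
        if (b >> i) & 1:           # conditionally add (XOR) the multiplicand
            acc ^= a
    return acc
```

Each loop iteration needs only a shift, a conditional reduction and a conditional XOR, which is why this style trades a long cycle count (m iterations per product) for a small datapath.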

Elliptic-Curve Crypto Processor for RFID Applications

Applied Sciences ◽

10.3390/app11157079 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7079

Author(s):

Muhammad Rashid ◽

Sajjad Shaukat Jamal ◽

Sikandar Zulqarnain Khan ◽

Adel R. Alharbi ◽

Amer Aljaedi ◽

...

Keyword(s):

Elliptic Curve ◽

Radio Frequency Identification ◽

Initial Point ◽

Building Blocks ◽

Low Latency ◽

Clock Frequency ◽

Low Area ◽

Field Programmable ◽

Frequency Identification ◽

Rfid Applications

This work presents an Elliptic-curve Point Multiplication (ECP) architecture over GF(2^163) with a focus on low latency and low area for radio-frequency-identification (RFID) applications. To achieve low latency, we reduce the number of clock cycles by using (i) three-shift buffers in the datapath to load the Elliptic-curve parameters as well as an initial point, and (ii) input/output interfaces of identical size in all building blocks of the architecture. Low area is preserved by reusing the squaring and multiplication hardware for the inversion computation. Finally, an efficient controller is used to control the inferred logic. The proposed ECP architecture is modeled in Verilog, and synthesis results are given for three different 7-series FPGA (Field Programmable Gate Array) devices, i.e., Kintex-7, Artix-7, and Virtex-7. The performance of the architecture is reported with a schoolbook multiplier implemented in two different logic styles, i.e., combinational and sequential. On Kintex-7, the combinational implementation of the schoolbook multiplier yields power-optimized values, i.e., 161 μW, at the expense of (i) hardware resources, i.e., 3561 look-up-tables and 1527 flip-flops, (ii) clock frequency, i.e., 227 MHz, and (iii) latency, i.e., 11.57 μs. On the same Kintex-7 device, the sequential implementation of the schoolbook multiplier provides (i) 2.88 μs latency, (ii) 1786 look-up-tables and 1855 flip-flops, (iii) 647 μW power, and (iv) 909 MHz clock frequency. The reported area, latency and power results therefore make the proposed ECP architecture well-suited for RFID applications.
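
Reusing the squaring and multiplication units for inversion can be illustrated with exponentiation by Fermat's little theorem, a^(-1) = a^(2^m - 2), which needs nothing beyond squarings and multiplications. The abstract does not name the inversion algorithm, so this is a sketch under that assumption; hardware designs often use the Itoh-Tsujii variant, which reaches the same exponent with fewer multiplications. The function names and the reduction polynomial are also assumptions.

```python
M = 163
# Assumption: NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1.
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_mul(a: int, b: int, m: int = M, f: int = F) -> int:
    """Schoolbook multiplication in GF(2^m) on reduced inputs."""
    acc = 0
    for i in reversed(range(m)):
        acc <<= 1
        if (acc >> m) & 1:
            acc ^= f
        if (b >> i) & 1:
            acc ^= a
    return acc


def gf_sqr(a: int, m: int = M, f: int = F) -> int:
    """Squaring, modeled here as a multiplication of a by itself."""
    return gf_mul(a, a, m, f)


def gf_inv(a: int, m: int = M, f: int = F) -> int:
    """Invert a nonzero element as a^(2^m - 2) = a^2 * a^4 * ... * a^(2^(m-1))."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^m)")
    result, t = 1, a
    for _ in range(m - 1):
        t = gf_sqr(t, m, f)            # t = a^(2^i)
        result = gf_mul(result, t, m, f)
    return result
```

No dedicated inverter is required in this scheme, which matches the area argument in the abstract; the cost is m - 1 squarings and m - 1 multiplications per inversion in this naive form.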

High Performance and Low Latency ECC Processor for Cryptography

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-v2-i3-304 ◽

2021 ◽

pp. 24-30

Author(s):

Mohan Rao Thokala

Keyword(s):

Elliptic Curve ◽

Elliptic Curve Cryptography ◽

Field Programmable Gate Array ◽

High Performance ◽

Low Latency ◽

Data Dependency ◽

Drastic Reduction ◽

Point Multiplication ◽

Field Programmable ◽

Xilinx Fpga

An elliptic curve cryptography (ECC) processor is implemented for point multiplication (PM) on a field programmable gate array (FPGA). A segmented pipelined full-precision multiplier is used to reduce latency, and data dependency is avoided by modifying the Lopez-Dahab Montgomery PM algorithm, which results in a drastic reduction in the number of clock cycles required. The proposed ECC processor is implemented on the Xilinx FPGA families Virtex-4, Virtex-5, and Virtex-7. The single-multiplier and three-multiplier designs each show the fastest performance compared with reported work. Our three-multiplier ECC processor implementation takes the lowest number of clock cycles among FPGA-based designs.
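
The control flow of a Montgomery ladder, on which the Lopez-Dahab Montgomery PM algorithm is built, can be sketched generically. This is an illustration only: it does not reproduce the Lopez-Dahab projective formulas or the modification described in the abstract, and the function name and the callable parameters `add` and `double` are hypothetical. The group operations are passed in so the ladder can be exercised on any additive group.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def montgomery_ladder(k: int, p: T,
                      add: Callable[[T, T], T],
                      double: Callable[[T], T]) -> T:
    """Compute k*P for k >= 1 using one add and one double per remaining key bit."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    r0, r1 = p, double(p)                      # invariant: r1 - r0 == P
    for i in reversed(range(k.bit_length() - 1)):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), double(r1)
        else:
            r0, r1 = double(r0), add(r0, r1)
    return r0
```

As a usage example, with `add = lambda a, b: (a + b) % n` and `double = lambda a: (2 * a) % n` the ladder returns `(k * p) % n`. Every key bit triggers exactly one addition and one doubling, and the two operations in an iteration do not depend on each other, which is what makes the ladder attractive for parallel multiplier designs.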

AREEBA: An Area Efficient Binary Huff-Curve Architecture

Electronics ◽

10.3390/electronics10121490 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1490

Author(s):

Asher Sajid ◽

Muhammad Rashid ◽

Sajjad Shaukat Jamal ◽

Malik Imran ◽

Saud S. Alotaibi ◽

...

Keyword(s):

Elliptic Curve Cryptography ◽

Power Analysis ◽

State Of The Art ◽

Power Analysis Attacks ◽

Low Area ◽

Gate Arrays ◽

Asymmetric Cryptography ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Computational Resources

Elliptic curve cryptography is the most widely employed class of asymmetric cryptography algorithms. However, it is exposed to simple power analysis attacks because point doubling and point addition are not computed with unified formulas. Unified cryptosystems such as Binary Edwards, Hessian and Huff curves provide resistance against power analysis attacks. Furthermore, Huff curves are more secure than Edwards and Hessian curves but require more computational resources. Therefore, this article provides a low-area hardware architecture for point multiplication on Binary Huff curves over GF(2^163) and GF(2^233). To achieve this, a segmented least significant digit multiplier for polynomial multiplications is proposed. To provide a realistic and reasonable comparison with state-of-the-art solutions, the proposed architecture is modeled in Verilog and synthesized for different field programmable gate arrays. For Virtex-4, Virtex-5, Virtex-6, and Virtex-7 devices, the utilized hardware resources in terms of slices over GF(2^163) are 5302, 2412, 2982 and 3508, respectively. The corresponding values over GF(2^233) are 11,557, 10,065, 4370 and 4261, respectively. The reported low area values make this work suitable for area-constrained applications.
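
The idea behind a least significant digit (LSD) multiplier can be sketched in software: one operand is consumed d bits at a time, starting from the low end, so a product takes about m/d iterations instead of m. This sketch shows only the plain digit-serial scheme; the segmentation proposed in the article is not described in the abstract and is not modeled. The function names, the digit size d = 8 and the NIST reduction polynomial are assumptions.

```python
M = 163
# Assumption: NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1.
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product of a and b over GF(2), unreduced."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_mod(c: int, m: int = M, f: int = F) -> int:
    """Reduce polynomial c modulo f by clearing bits from the top down to bit m."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= f << (i - m)
    return c


def gf2m_mul_lsd(a: int, b: int, d: int = 8, m: int = M, f: int = F) -> int:
    """Digit-serial multiply: consume d bits of b per iteration, low digit first."""
    acc = 0
    mask = (1 << d) - 1
    for _ in range(-(-m // d)):            # ceil(m / d) iterations
        acc ^= clmul(a, b & mask)          # add a * (current digit), unreduced
        a = reduce_mod(a << d, m, f)       # a = a * x^d mod f for the next digit
        b >>= d
    return reduce_mod(acc, m, f)
```

A larger d lowers the iteration count at the price of a wider digit-by-operand partial product, which is the area versus latency knob that digit-serial designs expose.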

SYNTHESIS AND FPGA–IMPLEMENTATION BASED NEURAL TECHNIQUE OF A NONLINEAR ADC MODEL

International Journal of Computing ◽

10.47839/ijc.4.1.321 ◽

2014 ◽

pp. 27-33

Author(s):

Mounir Bouhedda ◽

Mokhtar Attari

Keyword(s):

Integrated Circuit ◽

Field Programmable Gate Array ◽

High Speed ◽

Hardware Description Language ◽

Description Language ◽

Analog To Digital ◽

Field Programmable ◽

Hardware Description ◽

Nonlinear Analog ◽

Very High

The aim of this paper is to introduce a new architecture that uses Artificial Neural Networks (ANN) to design a 6-bit nonlinear Analog to Digital Converter (ADC). A study was conducted to synthesise an optimal ANN with a view to FPGA (Field Programmable Gate Array) implementation using Very High-speed Integrated Circuit Hardware Description Language (VHDL). Simulation and test results are presented to show the efficiency of the designed ANN.

Efficient Lightweight Hardware Structures of Point Multiplication on Binary Edwards Curves for Elliptic Curve Cryptosystems

Journal of Circuits System and Computers ◽

10.1142/s0218126619501494 ◽

2019 ◽

Vol 28 (09) ◽

pp. 1950149

Author(s):

Bahram Rashidi ◽

Mohammad Abedini

Keyword(s):

High Speed ◽

Critical Path ◽

Low Cost ◽

Path Delay ◽

Point Multiplication ◽

Low Area ◽

Elliptic Curve Cryptosystems ◽

Edwards Curves ◽

Special Cases ◽

Field Multiplication

This paper presents efficient lightweight hardware implementations of the complete point multiplication on binary Edwards curves (BECs). The implementations are based on general and special cases of binary Edwards curves. The complete differential addition formulas have the cost of [Formula: see text] and [Formula: see text] for general and special cases of BECs, respectively, where [Formula: see text] and [Formula: see text] denote the costs of a field multiplication, a field squaring and a field multiplication by a constant, respectively. In the general case of BECs, the structure is implemented based on 3 concurrent multipliers. In the special case of BECs, two structures employing 3 and 2 field multipliers are proposed for achieving the highest degree of parallelization and utilization of resources, respectively. The field multipliers are implemented based on the proposed efficient digit–digit polynomial basis multiplier, in which both input operands are processed at the digit level. This property reduces hardware consumption and critical path delay. In this structure, the number of clock cycles and input words changes as the input digit size is varied from low to high. Therefore, the multiplier is flexible enough for different cryptographic considerations such as low-area and high-speed implementations. The point multiplication computation requires field inversion; therefore, we use a low-cost Extended Euclidean Algorithm (EEA) based inversion to implement this field operation. Implementation results of the proposed architectures on a Virtex-5 XC5VLX110 FPGA are reported for two fields, [Formula: see text] and [Formula: see text]. The results show improvements in terms of area and efficiency for the proposed structures compared to previous works.

A 4-Stage Pipelined Architecture for Point Multiplication of Binary Huff Curves

Journal of Circuits System and Computers ◽

10.1142/s0218126620501790 ◽

2020 ◽

Vol 29 (11) ◽

pp. 2050179 ◽

Cited By ~ 2

Author(s):

Muhammad Rashid ◽

Malik Imran ◽

Atif Raza Jafri ◽

Zahid Mehmood

Keyword(s):

Critical Path ◽

Mathematical Formulation ◽

Area Ratio ◽

Design Tool ◽

Clock Frequency ◽

Point Multiplication ◽

Area Reduction ◽

Pipelined Architecture ◽

Two Factors ◽

Time Required

This work proposes a 4-stage pipelined architecture to achieve an optimized throughput over area ratio for point multiplication (PM) computation in binary Huff curve (BHC) cryptography. The original mathematical formulation of BHC is revisited with the objective of reducing the required area. Consequently, a simplified formulation of BHC is obtained with a 43% reduction in hardware resources. The throughput is improved first by reducing the critical path and second by minimizing the number of clock cycles (CCs) required to compute one PM. The critical path is reduced through the placement of pipeline registers, whereas the number of required CCs is minimized through an efficient scheduling of computations. These two factors, i.e., the area reduction and the throughput optimizations, maximize the throughput over area ratio. The proposed pipelined architecture is implemented over the [Formula: see text] field, using standard NIST curve parameters. The architecture is modeled in Verilog and synthesized using the Xilinx (ISE 14.7) design tool on a Virtex 7 FPGA. The implementation results show a 17% improvement in clock frequency, a 13% reduction in the time required to compute one PM and a 2.6% improvement in throughput/area when compared with the most recent state of the art solutions.

New Hardware Architecture for Self-Organizing Map Used for Color Vector Quantization

Journal of Circuits System and Computers ◽

10.1142/s0218126620500024 ◽

2019 ◽

Vol 29 (01) ◽

pp. 2050002

Author(s):

Khaled Ben Khalifa ◽

Ahmed Ghazi Blaiech ◽

Mehdi Abadi ◽

Mohamed Hedi Bedoui

Keyword(s):

Vector Quantization ◽

Recent Literature ◽

Network Architectures ◽

Self Organizing Map ◽

Description Language ◽

Field Programmable ◽

Color Vector ◽

Hardware Description ◽

Som Network ◽

Self Organizing

In this paper, we present a new generic architectural approach of a Self-Organizing Map (SOM). The proposed architecture, called the Diagonal-SOM (D-SOM), is described in a Hardware Description Language as an intellectual property kernel with easily adjustable parameters. The D-SOM architecture is based on a generic formalism that exploits two levels of nested parallelism, of neurons and of connections. This solution is therefore considered as a system based on the cooperation of a distributed set of independent computations. The organization and structure of these calculations process an oriented data flow in order to find a better treatment distribution between different neuroprocessors. To validate the D-SOM architecture, we evaluate the performance of several SOM network architectures after their integration on a Xilinx Virtex-7 Field Programmable Gate Array support. The proposed solution allows the easy adaptation of learning to a large number of SOM topologies without any considerable design effort. [Formula: see text] SOM hardware is validated through FPGA implementation, where temporal performance is almost twice as fast as that obtained in the recent literature. The suggested D-SOM architecture is also validated through simulation on variable-sized SOM networks applied to color vector quantization.

Hardware Implementation of Stockwell Transform and Smoothed Pseudo Wigner Ville Distribution Transform on FPGA using CORDIC Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6705.0110522 ◽

2022 ◽

Vol 10 (5) ◽

pp. 57-60

Author(s):

B Murali Krishna ◽

B.T. Krishna ◽

K Babulu ◽

...

Keyword(s):

Cordic Algorithm ◽

Matlab Simulation ◽

Verilog Hdl ◽

Description Language ◽

S Transform ◽

Field Programmable ◽

Hardware Resource ◽

Linear Transform ◽

Hardware Description ◽

Stockwell Transform

A comparison of linear and quadratic transform implementations on a field programmable gate array (FPGA) is presented. The Stockwell Transform, a popular linear transform, and the Smoothed Pseudo Wigner Ville Distribution (SPWVD), a quadratic transform, are considered for implementation on FPGA. Both transforms are coded in Verilog hardware description language (Verilog HDL). The complex calculations of the transforms are performed using the CORDIC algorithm. From the FPGA family, Spartan-6 is chosen as the target hardware device. A synthetic chirp signal is taken as input to test both designed transforms. A summary of hardware resource utilization on Spartan-6 for both transforms is presented. Finally, it is observed that both transforms, the S-Transform and SPWVD, are computed with low elapsed time compared with MATLAB simulation.
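
The CORDIC algorithm referred to in the abstract computes trigonometric functions with shift-and-add steps, which is why it suits FPGA implementation. The sketch below is a floating-point model of CORDIC in rotation mode, not the authors' Verilog code; the function name and the iteration count are assumptions, and a hardware version would use fixed-point arithmetic with a precomputed angle table.

```python
import math


def cordic_sin_cos(theta: float, iterations: int = 32):
    """Return (sin(theta), cos(theta)) for |theta| up to about 1.74 rad."""
    # Elementary rotation angles atan(2^-i); stored in a ROM in hardware.
    angles = [math.atan(2.0 ** (-i)) for i in range(iterations)]
    # Pre-scale by the inverse CORDIC gain so no final multiplication is needed.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0      # rotate toward the residual angle
        shift = 2.0 ** (-i)                # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x
```

Each iteration uses only additions, subtractions and multiplications by powers of two, so the datapath needs no general multiplier. Angles outside the convergence range would first have to be folded into it using quadrant symmetries.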

PROJECT DESIGN FOR COMPUTER ARCHITECTURE PRACTICAL SESSIONS BASED ON FIELD-PROGRAMMABLE GATE ARRAY

Mokslas - Lietuvos ateitis ◽

10.3846/mla.2021.15184 ◽

2021 ◽

Vol 13 (0) ◽

pp. 1-5

Author(s):

Kęstutis Bartnykas

Keyword(s):

Computer Architecture ◽

Project Design ◽

Electronic Systems ◽

Programmable Logic Arrays ◽

Description Language ◽

Field Programmable ◽

Hardware Description ◽

Design Ideas ◽

Dedicated Processors ◽

Dedicated Processor

Field-programmable logic arrays are often used in courses on computer architecture. Over a certain number of projects, the student must describe a processor, together with the external components necessary for its operation, in the specified hardware description language (HDL) according to the provided specification. The weakness of this approach is that such projects are based on a processor of one specific architecture, so the lecturer faces the issue of individualizing the projects. This article proposes a solution based on dedicated processors instead of one programmable processor of a specific architecture. It is shown that project individualization is easier to achieve in the proposed way, and that it does not deviate from the theory of computer architecture, because the programmable processor is a generalization of a dedicated processor. The article describes project design ideas based on dedicated processors and gives some examples. The examples presented differ from those applied during the practical sessions of Computer Architecture held at the Department of Electronic Systems at VILNIUS TECH, i.e., certain modifications and additions were made.

A Real Time Algorithm for Versatile Mode Parking System and Its Implementation on FPGA Board

Applied Sciences ◽

10.3390/app12020655 ◽

2022 ◽

Vol 12 (2) ◽

pp. 655

Author(s):

Baligh Naji ◽

Chokri Abdelmoula ◽

Mohamed Masmoudi

Keyword(s):

Real Time ◽

Evaluation Process ◽

Time Algorithm ◽

Hardware Description Language ◽

Optimal Solutions ◽

Description Language ◽

Parking System ◽

Parking Lot ◽

Field Programmable ◽

Hardware Description

This paper presents the design and development of a technique for an Autonomous and Versatile mode Parking System (AVPS) that combines several parking modes. The proposed approach is different from that of many developed parking systems. Previous research has focused on choosing a parking lot starting from only two parking modes (parallel and perpendicular). This research aims at developing a parking system that automatically chooses a parking lot starting from four parking modes. The automatic AVPS was proposed for the car-parking control problem, and could potentially be exploited in future vehicle generations. A specific mode can be easily computed using the proposed strategy. A variety of candidate modes can be generated using one real-time algorithm developed in VHDL (VHSIC Hardware Description Language), providing optimal solutions with performance measures. Based on simulation and experimental results, the AVPS is able to find and recognize in advance which parking mode to select. The work describes a full implementation on a mobile robot, such as a car, based on a specific FPGA (Field-Programmable Gate Array) card. To prove the effectiveness of the proposed innovation, an evaluation process comparing the proposed technique with existing techniques was conducted and outlined.
