High-Performance Symmetric Block Ciphers on CUDA

Author(s):  
Naoki Nishikawa ◽  
Keisuke Iwai ◽  
Takakazu Kurokawa
2021 ◽  
Vol 3 (2) ◽  
pp. 58-65
Author(s):  
Ya. R. Sovyn ◽  
◽  
V. V. Khoma ◽  

The article is devoted to the issues of increasing the security and efficiency of software implementation for the symmetric block ciphers. For the implementation of cryptoalgorithms on low-end CPUs (8/16/32-bit microcontrollers), it is important to provide increased resistance to power consumption analysis attacks. With regard to the implementation of ciphers on high-end CPUs (x86, ARM Cortex-A), it is important to eliminate the vulnerability primarily to timing and cache attacks. The authors used a bitslice approach to securely implement block ciphers, which has potential advantages such as high speed and low computing resources. However, the known bitsliced methods have a significant limitation, since they work with deterministic S-Boxes or arbitrary S-Boxes of smaller sizes. The paper proposes a new heuristic method for bitsliced representation of cryptographic 8×8 S-Boxes containing randomly generated values. These values defy description using algebraic expressions. The method is based on the decomposition of the truth table, which describes the S-Box, into two parts. One part of the table forms logical masks, and the other is split into bit vectors. To find a logical description of these vectors an exhaustive search is used. After finding the description of all vectors, these two parts of the table are combined into one using logical operations. The use of this method oriented on software implementation in the logical basis {AND, OR, XOR, NOT} ensures the minimization of arbitrary 8×8 S-Boxes. The proposed method can be implemented using standard logical instructions on any 8/16/32/64-bit processors. It is also possible to use logical SIMD instructions from the SSE, AVX, AVX-512 extensions for x86-64 processors, which provides high performance due to the use of long registers. The corresponding software has been developed that implements the method of searching for bitsliced representations of a given S-Box, and also automatically generates C++ code for it based on SSE, AVX and AVX-512 instructions. The effectiveness of the method on the S-Box of known block ciphers, in particular the Ukrainian encryption standard "Kalyna", has been investigated. It was found that the developed algorithm requires almost half as many gates for the bitsliced description of an arbitrary S-Box than the best of known algorithm (370 gates versus 680, respectively). For ciphers that use two or four S-Box tables, joint minimization can yield up to 330 or 300 gates per table, respectively. Keywords: bitslicing; S-Box; logical minimization; SIMD; x86-64 CPU; software implementation; block ciphers.


2012 ◽  
Vol 2 (2) ◽  
pp. 251-268 ◽  
Author(s):  
Naoki Nishikawa ◽  
Keisuke Iwai ◽  
Takakazu Kurokawa

2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Dan Li ◽  
Jiazhe Chen ◽  
An Wang ◽  
Xiaoyun Wang

Low Entropy Masking Schemes (LEMS) are countermeasure techniques to mitigate the high performance overhead of masked hardware and software implementations of symmetric block ciphers by reducing the entropy of the mask sets. The security of LEMS depends on the choice of the mask sets. Previous research mainly focused on searching balanced mask sets for hardware implementations. In this paper, we find that those balanced mask sets may have vulnerabilities in terms of absolute difference when applied in software implemented LEMS. The experiments verify that such vulnerabilities certainly make the software LEMS implementations insecure. To fix the vulnerabilities, we present a selection criterion to choose the mask sets. When some feasible mask sets are already picked out by certain searching algorithms, our selection criterion could be a reference factor to help decide on a more secure one for software LEMS.


Author(s):  
Muhammad Junaid ◽  
Mukhtar Hussain ◽  
Ashraf Masood ◽  
Firdous Kausar ◽  
Ayesha Noreen ◽  
...  

Author(s):  
Kamel Mohammed Faraoun

This paper proposes a semantically secure construction of pseudo-random permutations using second-order reversible cellular automata. We show that the proposed construction is equivalent to the Luby-Rackoff model if it is built using non-uniform transition rules, and we prove that the construction is strongly secure if an adequate number of iterations is performed. Moreover, a corresponding symmetric block cipher is constructed and analysed experimentally in comparison with popular ciphers. Obtained results approve robustness and efficacy of the construction, while achieved performances overcome those of some existing block ciphers.


Author(s):  
Xiaolu Hou ◽  
Jakub Breier ◽  
Fuyuan Zhang ◽  
Yang Liu

Differential Fault Analysis (DFA) is considered as the most popular fault analysis method. While there are techniques that provide a fault analysis automation on the cipher level to some degree, it can be shown that when it comes to software implementations, there are new vulnerabilities, which cannot be found by observing the cipher design specification.This work bridges the gap by providing a fully automated way to carry out DFA on assembly implementations of symmetric block ciphers. We use a customized data flow graph to represent the program and develop a novel fault analysis methodology to capture the program behavior under faults. We establish an effective description of DFA as constraints that are passed to an SMT solver. We create a tool that takes assembly code as input, analyzes the dependencies among instructions, automatically attacks vulnerable instructions using SMT solver and outputs the attack details that recover the last round key (and possibly the earlier keys). We support our design with evaluations on lightweight ciphers SIMON, SPECK, and PRIDE, and a current NIST standard, AES. By automated assembly analysis, we were able to find new efficient DFA attacks on SPECK and PRIDE, exploiting implementation specific vulnerabilities, and previously published DFA on SIMON and AES. Moreover, we present a novel DFA on multiplication operation that has never been shown for symmetric block ciphers before. Our experimental evaluation also shows reasonable execution times that are scalable to current cipher designs and can easily outclass the manual analysis. Moreover, we present a method to check the countermeasure-protected implementations in a way that helps implementers to decide how many rounds should be protected. We note that this is the first work that automatically carries out DFA on cipher implementations without any plaintext or ciphertext information and therefore, can be generally applied to any input data to the cipher.


Sign in / Sign up

Export Citation Format

Share Document