Efficient Parallel Implementations of LWE-Based Post-Quantum Cryptosystems on Graphics Processing Units

Mathematics ◽  
2020 ◽  
Vol 8 (10) ◽  
pp. 1781
Author(s):  
SangWoo An ◽  
Seog Chung Seo

With the development of the Internet of Things (IoT) and cloud computing technology, various cryptographic systems have been proposed to protect growing volumes of personal information. Recently, Post-Quantum Cryptography (PQC) algorithms have been proposed to counter quantum algorithms that threaten public key cryptography. To use PQC efficiently in a server environment dealing with large amounts of data, optimization studies are required. In this paper, we present optimization methods on the Graphics Processing Unit (GPU) platform for FrodoKEM and NewHope, two round-2 algorithms in the NIST PQC standardization process. For each algorithm, we identify the major operations with a large computational load and show how they can be parallelized by exploiting the characteristics of the GPU. For FrodoKEM, we introduce parallel optimization techniques for matrix generation and for matrix arithmetic such as addition and multiplication. For NewHope, we present a parallel processing technique for polynomial-based operations. In the encryption process of FrodoKEM, we confirmed performance improvements of up to 5.2, 5.75, and 6.47 times over the CPU implementation for FrodoKEM-640, FrodoKEM-976, and FrodoKEM-1344, respectively. In the encryption process of NewHope, we observed improvements of up to 3.33 and 4.04 times over the CPU implementation for NewHope-512 and NewHope-1024, respectively. The results of this study can be used in servers for IoT devices or cloud computing services, and can also be utilized in image processing technologies such as facial recognition.
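The core of the FrodoKEM parallelization described above is that every entry of a matrix product is an independent unit of work. The CUDA sketch below illustrates that mapping for a product-plus-error computation B = A*S + E, with one thread per output entry; the dimensions, names, and the assumption q = 2^16 (so that uint16_t wraparound performs the modular reduction) are illustrative choices, not taken from the paper.

#include <cuda_runtime.h>
#include <stdint.h>

#define N    640   // assumed main dimension (FrodoKEM-640)
#define NBAR 8     // assumed width of the secret/error matrices

// One thread computes one entry of the N x NBAR result B = A*S + E.
// Arithmetic is over Z_q with q = 2^16, so the natural uint16_t
// wraparound on assignment implements the modular reduction.
__global__ void frodo_mul_add(const uint16_t *A,  // N x N, row-major
                              const uint16_t *S,  // N x NBAR
                              const uint16_t *E,  // N x NBAR
                              uint16_t *B)        // N x NBAR
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= N || col >= NBAR) return;

    uint16_t acc = E[row * NBAR + col];
    for (int k = 0; k < N; k++)
        acc += A[row * N + k] * S[k * NBAR + col];  // wraps mod 2^16
    B[row * NBAR + col] = acc;
}

Because the output entries are independent, they can all be computed concurrently, and a server can additionally batch many such operations across ciphertexts; matrix generation can be parallelized the same way, with one thread producing one row or entry of A.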

2013 ◽  
Vol 475-476 ◽  
pp. 306-311 ◽  
Author(s):  
Miao Miao Song ◽  
Zhe Li ◽  
Bin Zhou ◽  
Chao Ling Li

Geological data are diverse in type, huge in volume, and complex in format. The analysis of geological data is mainly divided into three parts: mine forecasting, mine evaluation, and mine positioning. Traditional geological data analysis models are limited by finite storage space and computational efficiency, and cannot meet the need for fast operations on large amounts of geological data. Big data technology provides an ideal solution for the management, information extraction, and comprehensive analysis of vast amounts of geological data. To supply the mass storage capacity and high-speed computing power that big data technology requires, we built an intelligent system for geological data analysis based on a cloud computing model with double parallel processing: MapReduce and GPU. A Hadoop cluster solves the problem of storing large amounts of data, and an efficient parallel processing method based on the GPU (Graphics Processing Unit) is designed and applied within the MapReduce framework, completing the MapReduce-plus-GPU double parallel cloud computing model and improving the operation speed of the system. Theoretical modeling and experimental verification indicate that the system meets the requirements of geological data analysis in terms of operation precision, data volume, and operation speed.
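As a hedged illustration of the "double parallel" idea (the names and the scoring model are ours, not the paper's): a Hadoop map task can hand its batch of geological grid cells to the GPU, where one thread scores one cell, and the reduce phase then aggregates the highest-scoring candidates.

// One thread computes a weighted anomaly score for one grid cell; the
// surrounding MapReduce job supplies the batches and aggregates results.
__global__ void score_cells(const float *features, // n_cells x n_feat, row-major
                            const float *weights,  // n_feat
                            float *scores,         // n_cells
                            int n_cells, int n_feat)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_cells) return;

    float s = 0.0f;
    for (int f = 0; f < n_feat; f++)
        s += weights[f] * features[i * n_feat + f];  // linear score (illustrative)
    scores[i] = s;
}

The division of labor is the point: MapReduce parallelizes across cluster nodes and handles storage, while the GPU parallelizes within each node.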


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.
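To make the abstraction concrete, here is a minimal sketch of the general idea, not targetDP's actual API: a thin macro layer lets a single grid-loop body compile either as a CUDA kernel or as an ordinary CPU loop, selected at build time.

#ifdef __CUDACC__
#define TARGET_KERNEL __global__
#define GRID_LOOP(i, n) \
    int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < (n))
#else
#define TARGET_KERNEL
#define GRID_LOOP(i, n) for (int i = 0; i < (n); i++)
#endif

// One source, two targets: under nvcc this is a CUDA kernel with one
// thread per lattice site; under a host compiler it is a plain loop.
TARGET_KERNEL void scale_field(double *field, double a, int n)
{
    GRID_LOOP(i, n) {
        field[i] *= a;
    }
}

A real layer like targetDP must also abstract kernel launches, data movement, and vector-level parallelism, but the single-source principle is the same.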


2021 ◽  
Vol 38 (2) ◽  
Author(s):  
Nicholas Torres Okita ◽  
Tiago A. Coimbra ◽  
José Ribeiro ◽  
Martin Tygel

ABSTRACT. The usage of graphics processing units is already known as an alternative to traditional multi-core CPU processing, offering speedups on the order of dozens of times for parallel tasks. Another new computing paradigm is the use of cloud computing as a replacement for traditional in-house clusters, offering seemingly unlimited computing power, no maintenance costs, and cutting-edge technology, dynamically on user demand. Previously, these two tools were used to accelerate the estimation of Common Reflection Surface (CRS) traveltime parameters, in both the zero-offset and finite-offset domains, delivering very satisfactory results, with large time savings from GPU devices alongside cost savings on the cloud. This work extends those results by using GPUs on the cloud to accelerate the Offset Continuation Trajectory (OCT) traveltime parameter estimation. The results show that the time and cost savings from GPU usage are even larger than those seen in the CRS results, being up to fifty times faster and sixty times cheaper. This analysis reaffirms that it is possible to save both time and money when using GPU devices on the cloud, and concludes that the larger the data sets and the more computationally intensive the traveltime operators, the larger the improvements. Keywords: cloud computing, GPU, seismic processing.
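Traveltime parameter estimation of this kind is naturally GPU-friendly because each candidate parameter can be evaluated independently. The sketch below is a hedged stand-in, using a simple hyperbolic moveout and one thread per trial velocity rather than the paper's OCT operator; the names and the coherence proxy are illustrative.

// Brute-force parameter scan: thread p stacks the gather along the moveout
// predicted by vels[p] and records the stack energy as a coherence proxy.
__global__ void velocity_scan(const float *gather,   // n_traces x n_samp
                              const float *offsets,  // n_traces
                              int n_traces, int n_samp, float dt, float t0,
                              const float *vels, int n_vels,
                              float *energy)         // n_vels
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n_vels) return;

    float v = vels[p], stack = 0.0f;
    for (int tr = 0; tr < n_traces; tr++) {
        float x = offsets[tr];
        float t = sqrtf(t0 * t0 + (x * x) / (v * v)); // hyperbolic moveout
        int s = (int)(t / dt);
        if (s < n_samp) stack += gather[tr * n_samp + s];
    }
    energy[p] = stack * stack; // keep the parameter with maximal energy
}

Because the scans for different output samples are also independent, millions of (sample, parameter) pairs can run concurrently, which is the shape of workload where speedups like the reported fifty-fold become plausible.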


2012 ◽  
Vol 4 (3) ◽  
pp. 63-84
Author(s):  
Jonathan Cazalas ◽  
Ratan K. Guha

The efficient processing of spatio-temporal data streams is an area of intense research. However, existing methods rely on an unsuitable processor (Govindaraju, 2004), namely the CPU, to evaluate concurrent, continuous spatio-temporal queries over these data streams. This paper presents a performance model of the execution of spatio-temporal queries over the authors’ GEDS framework (Cazalas & Guha, 2010). GEDS is a scalable, Graphics Processing Unit (GPU)-based framework that employs computation-sharing and parallel-processing paradigms to deliver scalability in the evaluation of continuous spatio-temporal queries over spatio-temporal data streams. Experimental evaluation shows the scalability and efficacy of GEDS in spatio-temporal data streaming environments and demonstrates that, despite the costs associated with memory transfers, the parallel processing power provided by GEDS clearly outweighs those costs. To move beyond the analysis of specific algorithms over the GEDS framework, the authors developed an abstract performance model detailing the relationship between the CPU and the GPU. From this model, they extrapolate a list of attributes common to successful GPU-based applications, providing insight into which algorithms and applications are best suited to the GPU, along with an estimated theoretical speedup for such applications.
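The "estimated theoretical speedup" invites a simple worked form. A hedged sketch of such a model (our notation, not necessarily the authors'): if a query batch takes T_CPU on the processor, and on the GPU it costs T_H2D and T_D2H for host-to-device and device-to-host transfers around a kernel time T_kernel, then

    S = T_CPU / (T_H2D + T_kernel + T_D2H).

This makes the trade-off concrete: transfers bound the achievable speedup (S <= T_CPU / (T_H2D + T_D2H) even if the kernel were free), so the applications best suited to the GPU are those whose computation per transferred byte is high enough that T_kernel dominates the transfers while remaining many times smaller than T_CPU.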


Processes ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1199
Author(s):  
Ravie Chandren Muniyandi ◽  
Ali Maroosi

Long-timescale simulations of biological processes such as photosynthesis, or attempts to solve NP-hard problems such as traveling salesman, knapsack, Hamiltonian path, and satisfiability using membrane systems, can take hours or days without appropriate parallelization. Graphics processing units (GPUs) provide a massively parallel mechanism for general-purpose computation. Previous studies mapped one membrane to one thread block on the GPU. This is disadvantageous because when the number of objects in each membrane is small, the number of active threads is also small, decreasing performance. Moreover, when each membrane is assigned to one thread block, communication between membranes must be executed as communication between thread blocks, which is time-consuming. Previous approaches have also not addressed GPU occupancy. This study presents a classification algorithm that groups dependent objects and membranes according to the communication rates in a defined weighted network and assigns the groups to sub-matrices. Dependent objects and membranes are thereby allocated to the same threads and thread blocks, decreasing communication between threads and between thread blocks and allowing the GPU to maintain the highest occupancy possible. The experimental results indicate that, for 48 objects per membrane, the algorithm achieves a 93-fold speedup, compared to a 1.6-fold speedup with previous algorithms.
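A hedged sketch of the mapping idea (the sizes, names, and toy rule are ours): objects that communicate heavily are grouped on the host so that each group lands in a single thread block, and rule application then exchanges objects through fast shared memory instead of crossing thread blocks through global memory.

#define GROUP_SIZE 48  // assumed objects per group; launch with blockDim.x == GROUP_SIZE

// Block g evolves group g entirely block-locally.
__global__ void apply_rules(const int *grouped, // n_groups x GROUP_SIZE
                            int *multiplicity)  // same layout
{
    __shared__ int local[GROUP_SIZE];
    int g = blockIdx.x, t = threadIdx.x;

    local[t] = grouped[g * GROUP_SIZE + t];
    __syncthreads();  // intra-group communication stays inside the block

    int neighbor = local[(t + 1) % GROUP_SIZE];              // block-local exchange
    multiplicity[g * GROUP_SIZE + t] = local[t] + neighbor;  // toy evolution "rule"
}

The classification step pays off precisely here: if communicating objects were scattered across blocks, the exchange would require global memory traffic or separate kernel launches.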


SIMULATION ◽  
2016 ◽  
Vol 93 (1) ◽  
pp. 69-84 ◽  
Author(s):  
Shailesh Tamrakar ◽  
Paul Richmond ◽  
Roshan M D’Souza

Agent-based models (ABMs) are increasingly being used to study population dynamics in complex systems, such as the human immune system. Previously, Folcik et al. (The basic immune simulator: an agent-based model to study the interactions between innate and adaptive immunity. Theor Biol Med Model 2007; 4: 39) developed a Basic Immune Simulator (BIS) and implemented it using the Recursive Porous Agent Simulation Toolkit (RePast) ABM simulation framework. However, frameworks such as RePast are designed to execute serially on central processing units and therefore cannot efficiently handle large model sizes. In this paper, we report on our implementation of the BIS using FLAME GPU, a parallel computing ABM simulator designed to execute on graphics processing units. To benchmark our implementation, we simulate the response of the immune system to a viral infection of generic tissue cells. We compared our results with those obtained from the original RePast implementation for statistical accuracy. We observe that our implementation has a 13× performance advantage over the original RePast implementation.
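The following plain-CUDA sketch (not FLAME GPU's actual API) shows the execution model that makes such simulations fast: one thread per agent, all agents stepped in lockstep each tick. The cell states and the lysis rule are illustrative placeholders for the BIS's far richer behavior.

struct Cell { int state; int timer; };  // illustrative agent memory
enum { HEALTHY = 0, INFECTED = 1, DEAD = 2 };

// One thread advances one tissue-cell agent by one simulation tick.
__global__ void step_cells(Cell *cells, int n_cells, int lysis_time)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_cells) return;

    Cell c = cells[i];
    if (c.state == INFECTED && ++c.timer >= lysis_time)
        c.state = DEAD;  // infected cell lyses after a fixed delay
    cells[i] = c;
}

Because every agent update within a tick is independent, model size scales with GPU thread count rather than with a serial event loop, which is consistent with the reported 13× advantage over a serial framework.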


2010 ◽  
Vol 133 (2) ◽  
Author(s):  
Tobias Brandvik ◽  
Graham Pullan

A new three-dimensional Navier–Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes but has been implemented to run on graphics processing units (GPUs) instead of the traditional central processing unit. The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. The scaling results for a 16 node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 min on a cluster with four GPUs.
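Structured-grid solvers of this kind spend most of their runtime in stencil updates, which map naturally onto one GPU thread per grid node. Below is a minimal, hedged 7-point stencil sketch; the real solver's flux evaluation and smoothing are far more involved.

#define IDX(i, j, k, nj, nk) (((i) * (nj) + (j)) * (nk) + (k))

// Launch with grid = ((nk + TPB - 1) / TPB, nj, ni), block = (TPB).
__global__ void stencil7(const float *u, float *u_new,
                         int ni, int nj, int nk, float c)
{
    int i = blockIdx.z;
    int j = blockIdx.y;
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < 1 || i >= ni - 1 || j < 1 || j >= nj - 1 ||
        k < 1 || k >= nk - 1) return;

    u_new[IDX(i, j, k, nj, nk)] = u[IDX(i, j, k, nj, nk)]
        + c * (u[IDX(i + 1, j, k, nj, nk)] + u[IDX(i - 1, j, k, nj, nk)]
             + u[IDX(i, j + 1, k, nj, nk)] + u[IDX(i, j - 1, k, nj, nk)]
             + u[IDX(i, j, k + 1, nj, nk)] + u[IDX(i, j, k - 1, nj, nk)]
             - 6.0f * u[IDX(i, j, k, nj, nk)]);
}

Such kernels are memory-bandwidth bound, which is why moving them from CPU to GPU can yield the order-of-magnitude reduction in run-time reported above.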

