A solution for automatic parallelization of sequential assembly code

2013 ◽  
Vol 10 (1) ◽  
pp. 91-101 ◽  
Author(s):  
Djordje Kovacevic ◽  
Mladen Stanojevic ◽  
Vladimir Marinkovic ◽  
Miroslav Popovic

Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Building on existing algorithms, this paper describes a new software tool for the parallelization of sequential assembly code. The goal is a parallelizer that reads sequential assembly code and outputs parallelized code for a multicore MIPS processor. The idea is as follows: a parser translates the input assembly file into program objects suitable for further processing, after which the code is converted to static single assignment (SSA) form. Based on the resulting data-flow graph, the parallelization algorithm distributes instructions across cores. Once the sequential code has been parallelized, registers are assigned by a linear-scan allocation algorithm, and the final output is distributed assembly code for each core. We evaluate the speed-up on a matrix multiplication example processed by the parallelizer. The result is an almost linear speed-up that increases with the number of cores: 1.99 on two cores and 13.88 on 16 cores.
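The pipeline described above (SSA form, data-flow graph, instruction separation across cores) can be illustrated with a toy greedy list-scheduler. This is a hypothetical sketch, not the paper's MIPS parallelizer: the instruction format, unit latencies, and tie-breaking rule are all assumptions made for the example.

```python
# Hypothetical sketch of data-flow-driven partitioning of straight-line,
# SSA-form instructions onto cores (not the paper's actual algorithm).

def list_schedule(instrs, num_cores):
    """instrs: list of (dest, srcs) tuples in SSA form.
    Returns a per-core list of instruction indices."""
    # Build data-flow edges: an instruction depends on the producer
    # of each of its source values.
    producer = {}
    deps = [set() for _ in instrs]
    for i, (dest, srcs) in enumerate(instrs):
        for s in srcs:
            if s in producer:
                deps[i].add(producer[s])
        producer[dest] = i

    cores = [[] for _ in range(num_cores)]
    finish = [0] * len(instrs)       # cycle in which instruction i completes
    core_free = [0] * num_cores      # next free cycle per core
    for i in range(len(instrs)):     # indices are already a topological order
        ready = max((finish[d] for d in deps[i]), default=0)
        # Pick the core that can start this instruction earliest.
        c = min(range(num_cores), key=lambda k: max(core_free[k], ready))
        start = max(core_free[c], ready)
        finish[i] = start + 1        # unit latency assumed
        core_free[c] = finish[i]
        cores[c].append(i)
    return cores

# Two independent dependency chains end up on different cores:
prog = [("t0", []), ("t1", ["t0"]), ("t2", []), ("t3", ["t2"])]
print(list_schedule(prog, 2))  # → [[0, 1], [2, 3]]
```

Independent chains run in parallel, which is why examples like matrix multiplication, with many independent dot products, scale almost linearly.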

Aviation ◽  
2014 ◽  
Vol 18 (2) ◽  
pp. 80-85 ◽  
Author(s):  
Volodymyr Kharchenko ◽  
Maryna Mukhina

The peculiarities of correlation-extreme visual navigation are considered. Feature-point descriptors with 64 elements are extracted from surface images using the speeded-up robust features (SURF) method. Possible correlation criterion functions are analysed to find the best match between the template descriptors and those of the current image. A normalized correlation function based on the matrix-multiplication properties of the descriptors is proposed; it minimizes the number of false matches compared with the Euclidean distance in descriptor space. The proposed matching strategy also substantially decreases the computation time.
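The matching step can be sketched as follows, assuming row-wise 64-element descriptors: normalizing each descriptor once lets a single matrix multiplication score every template/current pair, which is the property the abstract exploits. The exact criterion function in the paper may differ.

```python
import numpy as np

# Sketch (not the paper's exact formulation): normalized correlation
# between every template descriptor and every current-image descriptor,
# computed as one matrix product. Rows are 64-element SURF-style vectors.

def normalized_correlation(T, C):
    """T: (m, 64) template descriptors, C: (n, 64) current descriptors.
    Returns an (m, n) matrix of correlation scores in [-1, 1]."""
    Tn = T / np.linalg.norm(T, axis=1, keepdims=True)
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    return Tn @ Cn.T   # one matrix multiplication scores all pairs

rng = np.random.default_rng(0)
T = rng.normal(size=(5, 64))
# Current image: first descriptor is a scaled copy of template 2.
C = np.vstack([T[2] * 3.0, rng.normal(size=(3, 64))])
scores = normalized_correlation(T, C)
print(scores.argmax(axis=1)[2])  # → 0 (template 2 matches current 0)
```

Because the score is scale-invariant, the scaled copy still correlates perfectly, whereas its Euclidean distance to the template would be large.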


2018 ◽  
Vol 18 (13&14) ◽  
pp. 1095-1114
Author(s):  
Zongyuan Zhang ◽  
Zhijin Guan ◽  
Hong Zhang ◽  
Haiying Ma ◽  
Weiping Ding

In order to realize the linear nearest neighbor (LNN) architecture of quantum circuits and reduce the quantum cost of linear reversible quantum circuits, a method for synthesizing and optimizing linear reversible quantum circuits based on matrix multiplication over the structure of the circuit is proposed. The method obtains the matrix representation of a linear quantum circuit by multiplying the matrices of the different parts of the whole circuit. LNN realization by inserting SWAP gates is proposed, and the equivalence of two ways of inserting the SWAP gates is proved. Elimination rules for SWAP gates between two overlapping adjacent quantum gates in different cases are given, which reduce the quantum cost of circuits after the LNN architecture is realized. To reduce the time consumption for large-scale quantum circuits, an algorithm based on parallel processing is proposed. Experiments show that the quantum cost is improved by 34.31% on average, and the GPU-based algorithm achieves a speed-up of 4 times over the CPU-based algorithm. The average time-optimization ratio for the large-scale benchmark circuits in RevLib processed by the parallel algorithm is 95.57% compared with the serial algorithm.
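As background, the basic SWAP-insertion idea can be sketched for a single CNOT acting on non-adjacent lines. The gate tuples and the move-then-restore strategy here are illustrative assumptions, not the paper's synthesis algorithm; the paper's elimination rules would then cancel redundant SWAPs between neighbouring gates.

```python
# Hedged sketch: make one CNOT nearest-neighbour by inserting a chain
# of SWAP gates before it and undoing the chain afterwards.

def make_lnn(control, target):
    """Return a gate list in which the CNOT acts on adjacent lines."""
    gates = []
    c = control
    step = 1 if target > c else -1
    # Move the control next to the target one SWAP at a time.
    while abs(target - c) > 1:
        gates.append(("SWAP", c, c + step))
        c += step
    gates.append(("CNOT", c, target))
    # Restore the original line ordering with the inverse SWAP chain.
    while c != control:
        gates.append(("SWAP", c, c - step))
        c -= step
    return gates

print(make_lnn(0, 3))
# → [('SWAP', 0, 1), ('SWAP', 1, 2), ('CNOT', 2, 3),
#    ('SWAP', 2, 1), ('SWAP', 1, 0)]
```

Each SWAP costs three CNOTs, so removing SWAPs via the elimination rules directly lowers the quantum cost, which is what the reported 34.31% average improvement measures.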


Author(s):  
James Howe ◽  
Marco Martinoli ◽  
Elisabeth Oswald ◽  
Francesco Regazzoni

Abstract FrodoKEM is a lattice-based key encapsulation mechanism, currently a semi-finalist in NIST’s post-quantum standardisation effort. A condition for these candidates is to use NIST standards for sources of randomness (i.e. seed expanding), and as such most candidates utilise SHAKE, an XOF defined in the SHA-3 standard. However, for many of the candidates, this module is a significant implementation bottleneck. Trivium is a lightweight, ISO-standard stream cipher which performs well in hardware and has been used in previous hardware designs for lattice-based cryptography. This research proposes optimised designs for FrodoKEM, concentrating on high throughput by parallelising the matrix multiplication operations within the cryptographic scheme. This process is eased by the use of Trivium, due to its higher throughput and lower area consumption. The proposed parallelisations also complement the addition of first-order masking to the decapsulation module. Overall, we significantly increase the throughput of FrodoKEM: for encapsulation we see a 16× speed-up, achieving 825 operations per second, and for decapsulation a 14× speed-up, achieving 763 operations per second, compared with the previous state of the art, whilst maintaining a similar FPGA area footprint of less than 2000 slices.


2016 ◽  
Vol 19 (3) ◽  
pp. 1037-1051 ◽  
Author(s):  
Sandra Catalán ◽  
Francisco D. Igual ◽  
Rafael Mayo ◽  
Rafael Rodríguez-Sánchez ◽  
Enrique S. Quintana-Ortí

2018 ◽  
Vol 12 (3) ◽  
pp. 143-157 ◽  
Author(s):  
Håvard Raddum ◽  
Pavol Zajac

Abstract We show how to build a binary matrix from the MRHS representation of a symmetric-key cipher. The matrix contains the cipher represented as an equation system and can be used to assess a cipher’s resistance against algebraic attacks. We give an algorithm for solving the system and compute its complexity. The complexity is normally close to exhaustive search on the variables representing the user-selected key. Finally, we show that for some variants of LowMC, the joined MRHS matrix representation can be used to speed up regular encryption in addition to exhaustive key search.


Telematika ◽  
2020 ◽  
Vol 17 (1) ◽  
pp. 26
Author(s):  
Afif Irfan Abdurrahman ◽  
Bambang Yuwono ◽  
Yuli Fauziah

A flood is a dangerous disaster in which overflowing water submerges land. Almost every year Bantul Regency is affected by floods caused by high rainfall. These floods are difficult for the Bantul Regency Disaster Management Agency (BPBD) to handle, so a mapping of flood-impact levels is needed to minimize losses and inform the public. This study builds a system to map the level of flood impact in Bantul Regency using a decision-support method, Multi-Attribute Utility Theory (MAUT). The MAUT stages determine the impact level through normalization and matrix multiplication, and the method helps identify flood-affected areas by managing the Indonesian Disaster Information Data (DIBI). The data comprise criteria for fatalities, missing victims, damage to houses, damage to public facilities, and damage to roads; each criterion has a value used to determine the flood-impact level. Determining the impact level requires a weighting calculation, whose result is a score of 1 = low, 2 = moderate, or 3 = high. The normalization and matrix-multiplication process that identifies the affected areas is the application of the MAUT method. The study produced an impact-level mapping displayed on Google Maps, showing affected-area points and flood-impact levels in Bantul Regency; mapping from the 2017 DIBI data identified Imogiri as the most affected sub-district.
Testing shows the results of this study have an accuracy of 95% compared with the mapping previously carried out by BPBD Bantul Regency. The difference arises because the criteria data used here are not identical to those used by BPBD.
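The MAUT steps described above can be illustrated with a small sketch: min-max normalization of each criterion, a weighted sum computed as a matrix multiplication, and a 1/2/3 (low/moderate/high) score. The weights and thresholds below are made up for the example; they are not BPBD's actual values.

```python
import numpy as np

# Illustrative MAUT sketch (hypothetical weights and thresholds).

def maut_scores(X, weights, thresholds=(0.2, 0.6)):
    """X: (areas, criteria) raw DIBI-style counts; weights sum to 1."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1)       # avoid division by zero
    U = (X - lo) / span                        # utility values in [0, 1]
    total = U @ np.asarray(weights)            # matrix-multiplication step
    return 1 + np.digitize(total, thresholds)  # 1=low, 2=moderate, 3=high

# Rows: areas; columns: deaths, missing, houses, facilities, roads.
X = np.array([[0, 0,  2, 0, 1],
              [1, 0, 10, 2, 4],
              [3, 2, 40, 6, 9]])
print(maut_scores(X, [0.3, 0.2, 0.2, 0.15, 0.15]))  # → [1 2 3]
```

The third area dominates every criterion, so it gets the "high" score; a real deployment would calibrate the weights and thresholds against historical BPBD mappings.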


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1984
Author(s):  
Wei Zhang ◽  
Zihao Jiang ◽  
Zhiguang Chen ◽  
Nong Xiao ◽  
Yang Ou

Double-precision general matrix multiplication (DGEMM) is an essential kernel for measuring the potential performance of an HPC platform. ARMv8-based systems-on-chip (SoCs) have become candidates for next-generation HPC systems thanks to their highly competitive performance and energy efficiency, so it is worthwhile to design a high-performance DGEMM for ARMv8-based SoCs. However, as these SoCs integrate more and more cores, modern CPUs use non-uniform memory access (NUMA), which restricts the performance and scalability of DGEMM when many threads access remote NUMA domains. This poses a challenge for developing high-performance DGEMM on multi-NUMA architectures. We present a NUMA-aware method that reduces the number of cross-die and cross-chip memory accesses. The critical enabler for NUMA-aware DGEMM is leveraging two levels of parallelism, between and within nodes, in a purely threaded implementation, which allows task independence and data localization for the NUMA nodes. We have implemented NUMA-aware DGEMM in OpenBLAS and evaluated it on a dual-socket server with 48-core processors based on the Kunpeng 920 architecture. The results show that NUMA-aware DGEMM effectively reduces cross-die and cross-chip memory accesses, significantly enhancing the scalability of DGEMM and increasing its performance by 17.1% on average, with the most remarkable improvement being 21.9%.
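The two-level partitioning idea can be sketched as follows. The output matrix is first split across NUMA nodes (so each node's threads touch only node-local rows of A and C), then across the threads of each node. Plain sequential Python stands in for the pinned, threaded OpenBLAS kernels; the node and thread counts are illustrative.

```python
import numpy as np

# Illustrative sketch of two-level (node, then thread) partitioning of
# C = A @ B. Real NUMA-aware DGEMM binds threads and allocates each
# panel on its node; here the loops merely show the data decomposition.

def numa_aware_gemm(A, B, nodes=2, threads_per_node=2):
    m = A.shape[0]
    C = np.zeros((m, B.shape[1]))
    node_rows = np.array_split(np.arange(m), nodes)        # level 1: NUMA nodes
    for rows in node_rows:                                 # nodes are independent
        for tr in np.array_split(rows, threads_per_node):  # level 2: threads
            C[tr] = A[tr] @ B                              # node-local row panel
    return C

rng = np.random.default_rng(1)
A, B = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
assert np.allclose(numa_aware_gemm(A, B), A @ B)
```

Because each node's tasks touch disjoint rows of A and C, no write ever crosses a die or socket boundary, which is the effect the paper measures as reduced cross-die and cross-chip accesses.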


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Jin Wang

The MM-2 semitensor product is a new and very useful mathematical tool which breaks the limitation of traditional matrix multiplication on the dimensions of the matrices and has wide application prospects. This article investigates the solutions of the matrix equation A ∘ₗ X = B with respect to the MM-2 semitensor product. The case where the solutions are vectors is discussed first: compatibility conditions on the matrices and the necessary and sufficient condition for solvability are studied in turn, and concrete methods for solving the equation are provided. The case where the solutions are matrices is then studied in a similar way. Finally, several examples are given to illustrate the efficiency of the results.
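For context, the best-known dimension-free generalization of matrix multiplication is the classical (left) semi-tensor product ⋉, sketched below; the MM-2 product studied in the article is a related but distinct construction, so this is background only.

```python
import numpy as np
from math import lcm

# Background sketch: the classical left semi-tensor product A ⋉ B,
# defined as (A ⊗ I_{t/n})(B ⊗ I_{t/p}) with t = lcm(n, p), which lifts
# the column/row dimension restriction of ordinary multiplication.

def stp(A, B):
    """Left semi-tensor product of A (m x n) and B (p x q)."""
    n, p = A.shape[1], B.shape[0]
    t = lcm(n, p)
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

A = np.arange(1, 5).reshape(2, 2)   # 2x2
B = np.arange(1, 5).reshape(4, 1)   # 4x1: ordinary product A @ B undefined
print(stp(A, B))                    # well-defined, shape (4, 1)
```

When the inner dimensions already agree (n = p), the identity factors are 1x1 and the semi-tensor product reduces to ordinary matrix multiplication, which is the sense in which such products "break the limitation" rather than replace the usual rule.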


Author(s):  
K. Waldherr ◽  
T. Huckle ◽  
T. Auckenthaler ◽  
U. Sander ◽  
T. Schulte-Herbrüggen

Electronics ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 143 ◽  
Author(s):  
Ruidong Wu ◽  
Bing Liu ◽  
Ping Fu ◽  
Junbao Li ◽  
Shou Feng

Matrix multiplication is a critical, time-consuming processing step in many machine learning applications. Because of the diversity of practical applications, matrix dimensions are generally not fixed. However, most matrix calculation methods based on field-programmable gate arrays (FPGAs) currently use fixed matrix dimensions, which limits the flexibility of machine learning algorithms on an FPGA; the bottleneck lies in the limited FPGA resources. This paper therefore proposes an accelerator architecture for matrix computation with changeable dimensions. A multi-matrix synchronous calculation concept allows matrix data to be processed continuously, which improves the parallel computing characteristics of the FPGA and optimizes computational efficiency. This paper tests matrix multiplication in a support vector machine (SVM) algorithm to verify the performance of the proposed architecture on the ZYNQ platform. The experimental results show that, compared with the software processing method, the proposed architecture increases performance by 21.18 times at 9947 dimensions. The dimension is changeable up to a maximum of 2,097,151 without changing the hardware design. The method is also applicable to matrix multiplication in other machine learning algorithms.

