SIMD processors
Recently Published Documents



2021, Vol 12 (6), pp. 295-301
Author(s): A. A. Titova, V. A. Roganov, G. A. Lukyanchenko, S. G. Elizarov, ...

Cryptonight is one of the possible base algorithms for cryptocurrencies. It belongs to the group of memory-bound algorithms, which are designed to prevent mining on specialized processors and ASICs by using 2 MB of memory for each hash, so it is not easy to adapt for parallel computing. The aim of this work is to prove theoretically and experimentally that the algorithm can still be optimized for a specialized multicore processor so that mining becomes more energy-efficient than on a CPU. The article describes the optimization process, which used the following methods: data clustering, storage of repeatedly used data in local memory, SIMD for parallel computing, and data prefetch. The methods are first explained and their expected effectiveness analyzed; they are then implemented. As a result, two optimization schemes were created. The first is based on MALT slave cores, which compute hashes independently; although memory-boundness creates multiple problems, clustering the data increased efficiency. The second scheme is more complicated: it uses SIMD processors for most cryptographic computations and also involves data prefetch, which becomes possible when more than one hash is calculated on one core at the same time. The results presented in the paper indicate that it is indeed possible to optimize Cryptonight for the specialized multicore processor MALT. In practice, energy efficiency increased fivefold in comparison with a CPU.
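The remark that data prefetch becomes possible once several hashes share a core can be illustrated with a toy sketch. It assumes a made-up mixing function and scratchpad (`mix`, `make_pad`, and both hash functions are hypothetical names, not the real Cryptonight round or the authors' code). The point is only the scheduling: in the interleaved loop, the next address of each hash is known a full lap before it is read, which is where a hardware prefetch could be issued.

```python
MASK = (1 << 64) - 1

def mix(state, word):
    # Toy mixing step; NOT the real Cryptonight round function.
    return ((state ^ word) * 0x9E3779B97F4A7C15 + 1) & MASK

def make_pad(seed, size):
    # Fill a small scratchpad deterministically from the seed.
    pad, x = [], seed
    for _ in range(size):
        x = mix(x, 0)
        pad.append(x)
    return pad

def hash_sequential(seed, size, rounds):
    # One hash at a time: each read address depends on the previous result,
    # so the memory latency cannot be overlapped with useful work.
    pad = make_pad(seed, size)
    state = seed
    for _ in range(rounds):
        addr = state % size
        state = mix(state, pad[addr])
        pad[addr] = state
    return state

def hash_interleaved(seeds, size, rounds):
    # Several independent hashes advance round-robin on one core.
    pads = [make_pad(s, size) for s in seeds]
    states = list(seeds)
    addrs = [s % size for s in states]
    for _ in range(rounds):
        for i in range(len(seeds)):
            a = addrs[i]
            states[i] = mix(states[i], pads[i][a])
            pads[i][a] = states[i]
            # The next address for hash i is known here; real hardware could
            # prefetch it while the other hashes take their turns.
            addrs[i] = states[i] % size
    return states
```

Because the hashes are independent, the interleaved schedule returns the same values as running them one after another; only the order of memory accesses changes.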


2019, Vol 214, pp. 02002
Author(s): Giuseppe Cerati, Peter Elmer, Brian Gravelle, Matti Kortelainen, Vyacheslav Krutelyov, ...

The High-Luminosity Large Hadron Collider at CERN will be characterized by greater pileup of events and higher occupancy, making the track reconstruction even more computationally demanding. Existing algorithms at the LHC are based on Kalman filter techniques with proven excellent physics performance under a variety of conditions. Starting in 2014, we have been developing Kalman-filter-based methods for track finding and fitting adapted for many-core SIMD processors that are becoming dominant in high-performance systems. This paper summarizes the latest extensions to our software that allow it to run on the realistic CMS-2017 tracker geometry using CMSSW-generated events, including pileup. The reconstructed tracks can be validated against either the CMSSW simulation that generated the detector hits, or the CMSSW reconstruction of the tracks. In general, the code’s computational performance has continued to improve while the above capabilities were being added. We demonstrate that the present Kalman filter implementation is able to reconstruct events with comparable physics performance to CMSSW, while providing generally better computational performance. Further plans for advancing the software are discussed.
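One common way to adapt Kalman filtering to SIMD hardware is to process many track candidates in lockstep, storing each quantity as an array with one element per track. The sketch below assumes a scalar state per track and a shared measurement variance; the function name `kalman_update_batch` and its parameters are hypothetical and are not taken from the authors' software.

```python
def kalman_update_batch(x, p, z, r):
    """Apply one scalar Kalman measurement update to every track in lockstep.

    x: state estimates, one per track
    p: state variances, one per track
    z: measurements, one per track
    r: measurement variance shared by all tracks (an assumption of this sketch)
    """
    n = len(x)
    # Each comprehension runs the same arithmetic over all tracks, which is
    # the pattern a SIMD unit executes as one vector instruction per step.
    gain = [p[i] / (p[i] + r) for i in range(n)]
    x_new = [x[i] + gain[i] * (z[i] - x[i]) for i in range(n)]
    p_new = [(1.0 - gain[i]) * p[i] for i in range(n)]
    return x_new, p_new
```

Keeping every track on the same sequence of operations, with no per-track branching, is what lets the loop bodies map onto vector lanes.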


Author(s): Yann Barsamian, Arthur Charguéraud, Sever A. Hirstoaga, Michel Mehrenberger

Author(s): Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa, Kengo Nakajima

Author(s): Joonas Multanen, Timo Viitanen, Henry Linjamaki, Heikki Kultala, Pekka Jaaskelainen, ...

2013, Vol 347-350, pp. 1727-1731
Author(s): Kai Zhang, Yao Hua Wang, Shu Ming Chen, Zhen Tao Li, Liang Wen

Wireless communication and multimedia applications involve a large number of matrix operations on matrices of different sizes. These operations require accessing matrices in column order. This paper implements a Multi-Grained Matrix Register File (MMRF) that supports multi-grained parallel row-wise and column-wise access. We implement 4×4 MIMO decoding with the help of the MMRF to illustrate efficient matrix operations on SIMD processors. Experimental results show that, compared with the TMS320C64x+, our SIMD processor achieves a performance improvement of about 5.65x to 7.71x by employing the MMRF. Using customized design technology, we reduce the area and critical-path delay of the MMRF by 17.9% and 39.1%, respectively.
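The access pattern described above can be modeled in software with a toy register file. The class `MatrixRegisterFile`, its methods, and the optional length argument standing in for "multi-grained" access are all assumptions of this sketch; the paper's MMRF is a hardware structure whose actual interface is not given in the abstract.

```python
class MatrixRegisterFile:
    """Toy model: an n x n matrix held row-major in a flat register array."""

    def __init__(self, n, values):
        if len(values) != n * n:
            raise ValueError("expected n*n values")
        self.n = n
        self.regs = list(values)

    def read_row(self, i, width=None):
        # Row-wise access: the elements are contiguous in the flat array.
        width = self.n if width is None else width
        start = i * self.n
        return self.regs[start:start + width]

    def read_col(self, j, height=None):
        # Column-wise access: the elements are n registers apart, the strided
        # pattern a plain vector register file cannot serve in one read.
        height = self.n if height is None else height
        return [self.regs[r * self.n + j] for r in range(height)]
```

For a 4×4 matrix filled with 0 to 15, `read_row(1)` returns the second row and `read_col(1)` the second column; the shorter `width` and `height` values mimic access to a smaller sub-matrix.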


2013, Vol 74 (2), pp. 137-150
Author(s): Yi Wang, Linfeng Pan, Zili Shao, Yong Guan, Minyi Guo
