Optimizing Tensor Contractions for Embedded Devices with Racetrack and DRAM Memories

2020 ◽  
Vol 19 (6) ◽  
pp. 1-26 ◽  
Author(s):  
Asif Ali Khan ◽  
Norman A. Rink ◽  
Fazal Hameed ◽  
Jeronimo Castrillon
2005 ◽  
Vol 40 (7) ◽  
pp. 230-238 ◽  
Author(s):  
Paul Griffin ◽  
Witawas Srisa-an ◽  
J. Morris Chang

2021 ◽  
Vol 54 (2) ◽  
pp. 1-42
Author(s):  
Abdullah Qasem ◽  
Paria Shirani ◽  
Mourad Debbabi ◽  
Lingyu Wang ◽  
Bernard Lebel ◽  
...  

In the era of the Internet of Things (IoT), software-enabled, interconnected devices are of paramount importance. Embedded systems are frequently used in both security- and privacy-sensitive applications. However, the underlying software (a.k.a. firmware) often suffers from a wide range of security vulnerabilities, mainly because of outdated systems or the reuse of existing vulnerable libraries, as evidenced by the rise in the number of attacks against embedded systems. Detecting vulnerabilities across the large pool of embedded devices and their firmware therefore plays a vital role in protecting those systems. To this end, several approaches exist to identify and trigger potential vulnerabilities within deployed embedded systems firmware. In this survey, we provide a comprehensive review of the state-of-the-art proposals, which detect vulnerabilities in embedded systems and firmware images by employing various analysis techniques, including static analysis, dynamic analysis, symbolic execution, and hybrid approaches. Furthermore, we perform both quantitative and qualitative comparisons among the surveyed approaches. Moreover, we devise taxonomies based on the applications of those approaches, the features used in the literature, and the type of analysis. Finally, we identify the unresolved challenges and discuss possible future directions in this field of research.


Author(s):  
Edgar Solomonik ◽  
James Demmel

Abstract: In matrix-vector multiplication, matrix symmetry does not permit a straightforward reduction in computational cost. More generally, in contractions of symmetric tensors, the symmetries are not preserved in the usual algebraic form of contraction algorithms. We introduce an algorithm that reduces the bilinear complexity (number of computed elementwise products) for most types of symmetric tensor contractions. In particular, it lowers the bilinear complexity of symmetrized contractions of symmetric tensors of order $s+v$ and $v+t$ by a factor of $\frac{(s+t+v)!}{s!\,t!\,v!}$ to leading order. The algorithm computes a symmetric tensor of bilinear products, then subtracts unwanted parts of its partial sums. Special cases of this algorithm provide improvements to the bilinear complexity of the multiplication of a symmetric matrix and a vector, the symmetrized vector outer product, and the symmetrized product of symmetric matrices. While the algorithm requires more additions for each elementwise product, the total number of operations is in some cases lower than that of classical algorithms, for tensors of any size. We provide a round-off error analysis of the algorithm and demonstrate that the error is not too large in practice. Finally, we provide an optimized implementation for one variant of the symmetry-preserving algorithm, which achieves speedups of up to $4.58\times$ for a particular tensor contraction, relative to a classical approach that casts the problem as a matrix-matrix multiplication.
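
The symmetric matrix-vector special case mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal reading of the idea "compute symmetric bilinear products, then subtract unwanted parts of the partial sums", and the function names `symv_classical` and `symv_symmetry_preserving` are hypothetical. For a symmetric matrix A, one product z = A[i][j] * (x[i] + x[j]) is shared between y[i] and y[j], and the surplus term A[i][j] * x[i] is removed through a per-row correction that costs one further multiplication per row. That gives n(n+1)/2 multiplications instead of n^2, in line with the leading-order factor of 2 obtained from the formula above for s = v = 1, t = 0.

```python
def symv_classical(A, x):
    """Plain y = A x; uses n*n scalar multiplications."""
    n = len(x)
    mults = 0
    y = [0.0] * n
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]
            mults += 1
    return y, mults


def symv_symmetry_preserving(A, x):
    """y = A x for symmetric A using n*(n+1)/2 scalar multiplications.

    Illustrative sketch only: trades multiplications for extra additions.
    """
    n = len(x)
    mults = 0
    y = [0.0] * n
    off_diag_sums = [0.0] * n  # sum of A[i][j] over j != i (additions only)
    for i in range(n):
        for j in range(i + 1, n):
            # One product contributes to both y[i] and y[j].
            z = A[i][j] * (x[i] + x[j])
            mults += 1
            y[i] += z
            y[j] += z
            off_diag_sums[i] += A[i][j]
            off_diag_sums[j] += A[i][j]
    for i in range(n):
        # Add the diagonal term and subtract the unwanted x[i] * sum_j A[i][j].
        y[i] += (A[i][i] - off_diag_sums[i]) * x[i]
        mults += 1
    return y, mults


if __name__ == "__main__":
    A = [[2, 1, 3], [1, 4, 5], [3, 5, 6]]
    x = [1, 2, 3]
    print(symv_classical(A, x))
    print(symv_symmetry_preserving(A, x))
```

Both functions return the same vector for a symmetric input; only the multiplication count differs. The sketch performs more additions than the classical loop, which matches the trade-off the abstract describes.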


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 275
Author(s):  
Ruben Panero Martinez ◽  
Ionut Schiopu ◽  
Bruno Cornelis ◽  
Adrian Munteanu

The paper proposes a novel instance segmentation method for traffic videos devised for deployment on real-time embedded devices. A novel neural network architecture is proposed using a multi-resolution feature extraction backbone and improved network designs for the object detection and instance segmentation branches. A novel post-processing method is introduced to ensure a reduced rate of false detection by evaluating the quality of the output masks. An improved network training procedure is proposed based on a novel label assignment algorithm. An ablation study on the speed-vs.-performance trade-off further modifies the two branches and replaces the conventional ResNet-based performance-oriented backbone with a lightweight speed-oriented design. The proposed architectural variations achieve real-time performance when deployed on embedded devices. The experimental results demonstrate that the proposed instance segmentation method for traffic videos outperforms the You Only Look At CoefficienTs (YOLACT) algorithm, the state-of-the-art real-time instance segmentation method. The proposed architecture achieves qualitative results with 31.57 average precision on the COCO dataset, while its speed-oriented variations achieve speeds of up to 66.25 frames per second on the Jetson AGX Xavier module.

