Optimizing Tensor Contractions for Embedded Devices with Racetrack and DRAM Memories

Asif Ali Khan; Norman A. Rink; Fazal Hameed; Jeronimo Castrillon

doi:10.1145/3396235

Optimizing tensor contractions for embedded devices with racetrack memory scratch-pads

Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems - LCTES 2019 ◽

10.1145/3316482.3326351 ◽

2019 ◽

Cited By ~ 4

Author(s):

Asif Ali Khan ◽

Norman A. Rink ◽

Fazal Hameed ◽

Jeronimo Castrillon

Keyword(s):

Embedded Devices ◽

Tensor Contractions


Towards a standards-compliant pure-software trusted execution environment for resource-constrained embedded devices

Proceedings of the 4th Workshop on System Software for Trusted Execution - SysTEX '19 ◽

10.1145/3342559.3365338 ◽

2019 ◽

Cited By ~ 1

Author(s):

Hassaan Janjua ◽

Mahmoud Ammar ◽

Bruno Crispo ◽

Danny Hughes

Keyword(s):

Resource Constrained ◽

Embedded Devices ◽

Execution Environment ◽

Trusted Execution Environment


An energy efficient garbage collector for java embedded devices

ACM SIGPLAN Notices ◽

10.1145/1070891.1065943 ◽

2005 ◽

Vol 40 (7) ◽

pp. 230-238 ◽

Cited By ~ 2

Author(s):

Paul Griffin ◽

Witawas Srisa-an ◽

J. Morris Chang

Keyword(s):

Energy Efficient ◽

Embedded Devices ◽

Garbage Collector


A Covid-19 viral transmission prevention system for embedded devices utilising deep learning

2021 32nd Irish Signals and Systems Conference (ISSC) ◽

10.1109/issc52156.2021.9467847 ◽

2021 ◽

Author(s):

Mihai Penica ◽

Reenu Mohandas ◽

Mangolika Bhattacharya ◽

Karl Vancamp ◽

Martin Hayes ◽

...

Keyword(s):

Deep Learning ◽

Viral Transmission ◽

Embedded Devices ◽

Prevention System ◽

Transmission Prevention


Processor Pipelining Method for Efficient Deep Neural Network Inference on Embedded Devices

2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC) ◽

10.1109/hipc50609.2020.00022 ◽

2020 ◽

Author(s):

Akshay Parashar ◽

Arun Abraham ◽

Deepak Chaudhary ◽

Vikram Nelvoy Rajendiran

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Network Inference ◽

Embedded Devices


Automatic Vulnerability Detection in Embedded Devices and Firmware

ACM Computing Surveys ◽

10.1145/3432893 ◽

2021 ◽

Vol 54 (2) ◽

pp. 1-42

Author(s):

Abdullah Qasem ◽

Paria Shirani ◽

Mourad Debbabi ◽

Lingyu Wang ◽

Bernard Lebel ◽

...

Keyword(s):

Embedded Systems ◽

Symbolic Execution ◽

Vital Role ◽

Security And Privacy ◽

Embedded Devices ◽

Security Vulnerabilities ◽

Future Directions ◽

Hybrid Approaches ◽

Wide Range ◽

The Internet Of Things

In the era of the Internet of Things (IoT), software-enabled interconnected devices are of paramount importance. Embedded systems are frequently used in both security- and privacy-sensitive applications. However, the underlying software (a.k.a. firmware) often suffers from a wide range of security vulnerabilities, mainly because of outdated systems or the reuse of vulnerable libraries, as evidenced by the rise in the number of attacks against embedded systems. Therefore, to protect those embedded systems, detecting the presence of vulnerabilities in the large pool of embedded devices and their firmware plays a vital role. To this end, several approaches exist to identify and trigger potential vulnerabilities within the firmware of deployed embedded systems. In this survey, we provide a comprehensive review of the state-of-the-art proposals that detect vulnerabilities in embedded systems and firmware images by employing various analysis techniques, including static analysis, dynamic analysis, symbolic execution, and hybrid approaches. Furthermore, we perform both quantitative and qualitative comparisons among the surveyed approaches. Moreover, we devise taxonomies based on the applications of those approaches, the features used in the literature, and the type of analysis. Finally, we identify the unresolved challenges and discuss possible future directions in this field of research.


Fast Bilinear Algorithms for Symmetric Tensor Contractions

Computational Methods in Applied Mathematics ◽

10.1515/cmam-2019-0075 ◽

2020 ◽

Vol 0 (0) ◽

Cited By ~ 1

Author(s):

Edgar Solomonik ◽

James Demmel

Keyword(s):

Symmetric Matrix ◽

Matrix Multiplication ◽

Computational Cost ◽

Symmetric Tensor ◽

Partial Sums ◽

Symmetric Tensors ◽

Outer Product ◽

Bilinear Complexity ◽

Special Cases ◽

Tensor Contractions

In matrix-vector multiplication, matrix symmetry does not permit a straightforward reduction in computational cost. More generally, in contractions of symmetric tensors, the symmetries are not preserved in the usual algebraic form of contraction algorithms. We introduce an algorithm that reduces the bilinear complexity (number of computed elementwise products) for most types of symmetric tensor contractions. In particular, it lowers the bilinear complexity of symmetrized contractions of symmetric tensors of order $s+v$ and $v+t$ by a factor of $\frac{(s+t+v)!}{s!\,t!\,v!}$ to leading order. The algorithm computes a symmetric tensor of bilinear products, then subtracts unwanted parts of its partial sums. Special cases of this algorithm provide improvements to the bilinear complexity of the multiplication of a symmetric matrix and a vector, the symmetrized vector outer product, and the symmetrized product of symmetric matrices. While the algorithm requires more additions for each elementwise product, the total number of operations is in some cases less than classical algorithms, for tensors of any size. We provide a round-off error analysis of the algorithm and demonstrate that the error is not too large in practice. Finally, we provide an optimized implementation for one variant of the symmetry-preserving algorithm, which achieves speedups of up to 4.58× for a particular tensor contraction, relative to a classical approach that casts the problem as a matrix-matrix multiplication.
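The simplest special case named in this abstract, a symmetric matrix times a vector, can be sketched as follows. This is only an illustrative reading of the abstract's description ("computes a symmetric tensor of bilinear products, then subtracts unwanted parts of its partial sums"), not the authors' optimized implementation. The function names and the product counter are hypothetical and introduced here for illustration. The sketch forms one product Z_ij = A_ij * (b_i + b_j) per unordered index pair, accumulates it into both c_i and c_j, and then subtracts (sum_j A_ij) * b_i from each entry.

```python
def symv_classical(A, b):
    """Reference product c = A b using n*n elementwise products."""
    n = len(b)
    return [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]


def symv_symmetry_preserving(A, b):
    """Compute c = A b for symmetric A with fewer elementwise products.

    Assumes A[i][j] == A[j][i]. Returns (c, products), where `products`
    counts the multiplications performed (an illustrative counter).
    """
    n = len(b)
    c = [0] * n
    row_sums = [0] * n
    products = 0
    for i in range(n):
        for j in range(i, n):
            # One product per unordered pair (i, j), shared by c[i] and c[j].
            z = A[i][j] * (b[i] + b[j])
            products += 1
            c[i] += z
            row_sums[i] += A[i][j]
            if j != i:
                c[j] += z
                row_sums[j] += A[i][j]
    for i in range(n):
        # Remove the unwanted part of the partial sum: (sum_j A[i][j]) * b[i].
        c[i] -= row_sums[i] * b[i]
        products += 1
    return c, products


A = [[2, 1, 0, 3],
     [1, 4, 5, 1],
     [0, 5, 6, 2],
     [3, 1, 2, 7]]
b = [1, 2, 3, 4]
c, products = symv_symmetry_preserving(A, b)
print(c, products)
```

In this sketch the product count is n(n+1)/2 + n rather than n^2, at the price of extra additions. For large n this approaches the factor of 2 that the abstract's formula gives for s = v = 1, t = 0.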


Real-Time Instance Segmentation of Traffic Videos for Embedded Devices

Sensors ◽

10.3390/s21010275 ◽

2021 ◽

Vol 21 (1) ◽

pp. 275

Author(s):

Ruben Panero Martinez ◽

Ionut Schiopu ◽

Bruno Cornelis ◽

Adrian Munteanu

Keyword(s):

Real Time ◽

Network Architecture ◽

Training Procedure ◽

Segmentation Method ◽

Embedded Devices ◽

Network Training ◽

Assignment Algorithm ◽

Ablation Study ◽

Reduced Rate ◽

Instance Segmentation

The paper proposes a novel instance segmentation method for traffic videos devised for real-time deployment on embedded devices. A novel neural network architecture is proposed using a multi-resolution feature extraction backbone and improved network designs for the object detection and instance segmentation branches. A novel post-processing method is introduced to ensure a reduced rate of false detection by evaluating the quality of the output masks. An improved network training procedure is proposed based on a novel label assignment algorithm. An ablation study on the speed-vs.-performance trade-off further modifies the two branches and replaces the conventional ResNet-based performance-oriented backbone with a lightweight speed-oriented design. The proposed architectural variations achieve real-time performance when deployed on embedded devices. The experimental results demonstrate that the proposed instance segmentation method for traffic videos outperforms YOLACT (You Only Look At CoefficienTs), the state-of-the-art real-time instance segmentation method. The proposed architecture achieves qualitative results with 31.57 average precision on the COCO dataset, while its speed-oriented variations achieve speeds of up to 66.25 frames per second on the Jetson AGX Xavier module.
