Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm

Robert Stewart; Andrew Nowlan; Pascal Bacchus; Quentin Ducasse; Ekaterina Komendantskaya

doi:10.3390/electronics10040396

Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm

Electronics ◽

10.3390/electronics10040396 ◽

2021 ◽

Vol 10 (4) ◽

pp. 396

Author(s):

Robert Stewart ◽

Andrew Nowlan ◽

Pascal Bacchus ◽

Quentin Ducasse ◽

Ekaterina Komendantskaya

Keyword(s):

Neural Networks ◽

Evolutionary Algorithm ◽

Sweet Spot ◽

Throughput Performance ◽

Trade Off ◽

Training Time ◽

Knowledge Distillation ◽

Hardware Costs ◽

The Cost ◽

Evolving Models

This paper compares the latency, accuracy, training time and hardware costs of neural networks compressed with our new multi-objective evolutionary algorithm called NEMOKD, and with quantisation. We evaluate NEMOKD on Intel’s Movidius Myriad X VPU processor, and quantisation on Xilinx’s programmable Z7020 FPGA hardware. Evolving models with NEMOKD increases inference accuracy by up to 82% at the cost of 38% increased latency, with throughput performance of 100–590 image frames-per-second (FPS). Quantisation identifies a sweet spot of 3-bit precision in the trade-off between latency, hardware requirements, training time and accuracy. Parallelising FPGA implementations of 2- and 3-bit quantised neural networks increases throughput from 6 k FPS to 373 k FPS, a 62× speedup.
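
To make the idea of low-bit precision concrete, the following is a minimal sketch of uniform quantisation, in which each value is snapped to the nearest of 2^bits evenly spaced levels. It is an illustration only: the function name `quantise`, the range [-1, 1] and the sample weights are assumptions, and the abstract does not state which quantisation scheme the paper uses.

```python
def quantise(values, bits, lo=-1.0, hi=1.0):
    """Snap each value to the nearest of 2**bits evenly spaced levels in [lo, hi].

    Illustrative uniform quantiser; the range and rounding rule are assumptions,
    not the scheme used in the paper.
    """
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    quantised = []
    for v in values:
        clipped = min(max(v, lo), hi)          # clip out-of-range values
        index = round((clipped - lo) / step)   # nearest level index, 0..levels-1
        quantised.append(lo + index * step)
    return quantised


# Hypothetical weights, quantised to 3 bits (8 representable levels).
weights = [-0.93, -0.41, 0.02, 0.37, 0.88]
weights_3bit = quantise(weights, bits=3)
```

With 3 bits, every weight is stored as one of 8 levels, so the rounding error per weight is at most half the spacing between levels.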

Design and Evaluation of Bulk Data Transfer Extensions for the NFComms Framework

South African Computer Journal ◽

10.18489/sacj.v31i2.692 ◽

2019 ◽

Vol 31 (2) ◽

Author(s):

Sean Pennefather ◽

Karen Bradshaw ◽

Barry Irwin

Keyword(s):

Network Flow ◽

Message Passing ◽

Data Transfer ◽

Low Latency ◽

Throughput Performance ◽

Trade Off ◽

Design And Implementation ◽

Bulk Data ◽

Bulk Data Transfer ◽

The Cost

We present the design and implementation of an indirect messaging extension for the existing NFComms framework that provides communication between a network flow processor and host CPU. This extension addresses the bulk throughput limitations of the framework and is intended to work in conjunction with existing communication mediums. Testing of the framework extensions shows an increase in throughput performance of up to 268x that of the current direct message passing framework at the cost of increased single message latency of up to 2x. This trade-off is considered acceptable as the proposed extensions are intended for bulk data transfer only while the existing message passing functionality of the framework is preserved and can be used in situations where low latency is required for small messages.

Protecting Neural Networks with Hierarchical Random Switching: Towards Better Robustness-Accuracy Trade-off for Stochastic Defenses

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/833 ◽

2019 ◽

Cited By ~ 3

Author(s):

Xiao Wang ◽

Siyue Wang ◽

Pin-Yu Chen ◽

Yanzhi Wang ◽

Brian Kulis ◽

...

Keyword(s):

Neural Networks ◽

Random Perturbation ◽

Efficiency Score ◽

Test Accuracy ◽

Stochastic Network ◽

Trade Off ◽

Neuron Activation ◽

Random Switching ◽

Fixed Model ◽

The Cost

Despite the remarkable success of deep neural networks in various domains, recent studies have uncovered their vulnerability to adversarial perturbations, raising concerns about model generalizability and new threats such as prediction-evasive misclassification or stealthy reprogramming. Among different defense proposals, stochastic network defenses such as random neuron activation pruning or random perturbation to layer inputs are shown to be promising for attack mitigation. However, one critical drawback of current defenses is that the robustness enhancement comes at the cost of noticeable performance degradation on legitimate data, e.g., a large drop in test accuracy. This paper is motivated by the pursuit of a better trade-off between adversarial robustness and test accuracy for stochastic network defenses. We propose Defense Efficiency Score (DES), a comprehensive metric that measures the gain in unsuccessful attack attempts at the cost of drop in test accuracy of any defense. To achieve a better DES, we propose hierarchical random switching (HRS), which protects neural networks through a novel randomization scheme. An HRS-protected model contains several blocks of randomly switching channels to prevent adversaries from exploiting fixed model structures and parameters for their malicious purposes. Extensive experiments show that HRS is superior in defending against state-of-the-art white-box and adaptive adversarial misclassification attacks. We also demonstrate the effectiveness of HRS in defending against adversarial reprogramming, which is the first defense against adversarial programs. Moreover, in most settings the average DES of HRS is at least 5X higher than current stochastic network defenses, validating its significantly improved robustness-accuracy trade-off.
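
The abstract describes DES only in words (gain in unsuccessful attack attempts at the cost of drop in test accuracy). One plausible reading is a simple ratio of the two quantities, sketched below. The exact formula is an assumption, as are the function name, the handling of a zero accuracy drop, and the example numbers, none of which come from the paper.

```python
def defense_efficiency_score(attack_success_before, attack_success_after,
                             accuracy_before, accuracy_after):
    """Ratio of the drop in attack success rate to the drop in test accuracy.

    Assumed reading of DES; all rates are fractions in [0, 1].
    """
    gain = attack_success_before - attack_success_after  # more attacks fail
    drop = accuracy_before - accuracy_after              # accuracy given up
    if drop <= 0:
        # Assumed convention: a defense that costs no accuracy is unbounded
        # if it helps at all, and scores zero otherwise.
        return float("inf") if gain > 0 else 0.0
    return gain / drop


# Hypothetical numbers: attack success falls from 90% to 40%,
# while test accuracy falls from 95% to 90%.
des = defense_efficiency_score(0.90, 0.40, 0.95, 0.90)
```

Under this reading, a higher score means more robustness gained per unit of accuracy lost.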

Developing Schedule With Linear Programming (Case Study: STTF II Project Komplek Sukamukti Banjaran)

International Journal of Innovation in Enterprise System ◽

10.25124/ijies.v4i02.77 ◽

2020 ◽

Vol 4 (02) ◽

pp. 34-45

Author(s):

Naufal Dzikri Afifi ◽

Ika Arum Puspita ◽

Mohammad Deni Akbar

Keyword(s):

Linear Programming ◽

Objective Function ◽

Project Scheduling ◽

Time Limit ◽

Minimum Time ◽

Service Cost ◽

Trade Off ◽

Overtime Work ◽

The Cost

The Shift to The Front II Komplek Sukamukti Banjaran Project is one of the projects implemented by a company engaged in telecommunications. Like every project, it has a time limit specified in the contract. Project scheduling plays an important role in predicting both the cost and the duration of a project, and every project should be completed on or before the deadline specified in the contract. Delay in a project can be anticipated by shortening the completion time using the crashing method with the application of linear programming. Linear programming simplifies the crashing calculation, which would otherwise require repeated iteration. The objective function in this scheduling is to minimize the cost. This study aims to find a trade-off between the costs and the minimum time expected to complete this project. Acceleration in this study was evaluated by adding 4, 3, 2, and 1 hours of overtime work. The normal time for this project is 35 days with a service fee of Rp. 52,335,690. From the results of the crashing analysis, the alternative chosen is to add 1 hour of overtime, shortening the duration to 34 days with a total service cost of Rp. 52,375,492. This acceleration will affect the entire project because Shift to The Front II covers 33 different locations, and if all these locations can be accelerated then the completion time of the entire project will improve.
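
The trade-off reported above can be checked with simple arithmetic. The sketch below computes the extra cost per day saved (often called the crash cost slope) from the two figures quoted in the abstract. The function name is illustrative, and this is not the linear program used in the study.

```python
def crash_cost_slope(normal_days, normal_cost, crashed_days, crashed_cost):
    """Extra cost incurred per day of schedule reduction."""
    days_saved = normal_days - crashed_days
    if days_saved <= 0:
        raise ValueError("crashed schedule must be shorter than the normal one")
    return (crashed_cost - normal_cost) / days_saved


# Figures from the abstract (costs in rupiah).
normal_days, normal_cost = 35, 52_335_690
crashed_days, crashed_cost = 34, 52_375_492

slope = crash_cost_slope(normal_days, normal_cost, crashed_days, crashed_cost)
```

With these figures, saving one day costs an additional Rp. 39,802.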

TRANSITION TO CONVOLUTIONAL NEURAL NETWORKS IN RADAR PROBLEMS

Informatization and communication ◽

10.34219/2078-8320-2019-10-3-96-99 ◽

2019 ◽

pp. 96-99

Author(s):

K. Maystrenko ◽

A. Budilov ◽

D. Afanasev

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Target Detection ◽

Convolutional Neural Networks ◽

Network Topology ◽

Subject Areas ◽

The Subject ◽

Printed Materials ◽

Hardware Costs

Goal. Identify trends and prospects for the development of radar in terms of the use of convolutional neural networks for target detection. Materials and methods. Analysis of relevant printed materials related to the subject areas of radar and convolutional neural networks. Results. The transition to convolutional neural networks in the field of radar is considered. A review of papers on the use of convolutional neural networks in pattern recognition problems, in particular, in the radar problem, is carried out. Hardware costs for the implementation of convolutional neural networks are analyzed. Conclusion. The conclusion is made about the need to create a methodology for selecting a network topology depending on the parameters of the radar task.

Black Hole Algorithm for Sustainable Design of Counterfort Retaining Walls

Sustainability ◽

10.3390/su12072767 ◽

2020 ◽

Vol 12 (7) ◽

pp. 2767 ◽

Cited By ~ 13

Author(s):

Víctor Yepes ◽

José V. Martí ◽

José García

Keyword(s):

Black Hole ◽

Sustainable Design ◽

Retaining Walls ◽

The Other ◽

Trade Off ◽

Construction Company ◽

Geometric Variables ◽

The Stability ◽

The Cost ◽

Black Hole Algorithm

The optimization of the cost and CO2 emissions in earth-retaining walls is of relevance, since these structures are often used in civil engineering. The optimization of costs is essential for the competitiveness of the construction company, and the optimization of emissions is relevant in the environmental impact of construction. To address the optimization, black hole metaheuristics were used, along with a discretization mechanism based on min–max normalization. The stability of the algorithm was evaluated with respect to the solutions obtained; the steel and concrete values obtained in both optimizations were analyzed. Additionally, the geometric variables of the structure were compared. Finally, the results obtained were compared with another algorithm that solved the problem. The results show that there is a trade-off between the use of steel and concrete. Solutions that minimize CO2 emissions favor the use of concrete more than those that optimize the cost. On the other hand, when comparing the geometric variables, it is seen that most remain similar in both optimizations except for the distance between buttresses. When comparing with another algorithm, the results show a good performance in optimization using the black hole algorithm.
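
As an illustration of how min–max normalization can turn the continuous output of a metaheuristic into discrete design values, the sketch below rescales a vector to [0, 1] and maps each entry to one of a list of allowed values. This is one simple way to do it, not necessarily the mechanism used in the paper. The function name, the binning rule and the sample values are all assumptions.

```python
def discretise(raw, allowed):
    """Map continuous values onto a list of allowed discrete values.

    Each value is min-max normalised to [0, 1] across the vector, then
    assigned to one of len(allowed) equal-width bins (assumed rule).
    """
    lo, hi = min(raw), max(raw)
    span = hi - lo
    chosen = []
    for x in raw:
        unit = 0.0 if span == 0 else (x - lo) / span        # min-max normalisation
        index = min(int(unit * len(allowed)), len(allowed) - 1)
        chosen.append(allowed[index])
    return chosen


# Hypothetical continuous positions and allowed thicknesses in metres.
positions = [0.0, 5.0, 10.0]
thicknesses = [0.25, 0.30, 0.35, 0.40]
design = discretise(positions, thicknesses)
```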

Evaluation of Mixed Deep Neural Networks for Reverberant Speech Enhancement

Biomimetics ◽

10.3390/biomimetics5010001 ◽

2019 ◽

Vol 5 (1) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Michelle Gutiérrez-Muñoz ◽

Astryd González-Salazar ◽

Marvin Coto-Jiménez

Keyword(s):

Neural Networks ◽

Short Term Memory ◽

Computational Cost ◽

Real Life ◽

Fixed Number ◽

Training Procedure ◽

Statistical Validation ◽

Significant Drop ◽

Training Time ◽

Important Solution

Speech signals are degraded in real-life environments, as a product of background noise or other factors. The processing of such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that make adverse quality difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone in multiple directions. To enhance signals in such adverse conditions, several deep learning-based methods have been proposed and proven to be effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have presented surprising results in tasks related to time-dependent processing of signals, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extended experimentation in several cases. In this work, we present a proposal to evaluate the hybrid models of neural networks to learn different reverberation conditions without any previous information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to those from pure LSTM networks, given a fixed number of layers. The evaluation was made based on quality measurements of the signal’s spectrum, the training time of the networks, and statistical validation of results. In total, 120 artificial neural networks of eight different types were trained and compared. The results help to affirm the fact that hybrid networks represent an important solution for speech signal enhancement, given that reduction in training time is on the order of 30%, in processes that can normally take several days or weeks, depending on the amount of data. The results also present advantages in efficiency, but without a significant drop in quality.

Sensuator: A Hybrid Sensor–Actuator Approach to Soft Robotic Proprioception Using Recurrent Neural Networks

Actuators ◽

10.3390/act10020030 ◽

2021 ◽

Vol 10 (2) ◽

pp. 30

Author(s):

Pornthep Preechayasomboon ◽

Eric Rombokas

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Linear Models ◽

Open Loop ◽

Proof Of Concept ◽

State Estimator ◽

Loop Control ◽

Practical Applications ◽

Soft Actuator ◽

The Cost

Soft robotic actuators are now being used in practical applications; however, they are often limited to open-loop control that relies on the inherent compliance of the actuator. Achieving human-like manipulation and grasping with soft robotic actuators requires at least some form of sensing, which often comes at the cost of complex fabrication and purposefully built sensor structures. In this paper, we utilize the actuating fluid itself as a sensing medium to achieve high-fidelity proprioception in a soft actuator. As our sensors are somewhat unstructured, their readings are difficult to interpret using linear models. We therefore present a proof of concept of a method for deriving the pose of the soft actuator using recurrent neural networks. We present the experimental setup and our learned state estimator to show that our method is viable for achieving proprioception and is also robust to common sensor failures.

Hybrid last mile delivery fleets with crowdsourcing: A systems view of managing the cost‐service trade‐off

Journal of Business Logistics ◽

10.1111/jbl.12288 ◽

2021 ◽

Author(s):

Vincent E. Castillo ◽

John E. Bell ◽

Diane A. Mollenkopf ◽

Theodore P. Stank

Keyword(s):

Service Trade ◽

Trade Off ◽

Last Mile ◽

The Cost ◽

Last Mile Delivery ◽

Systems View

Explaining Neural Networks Using Attentive Knowledge Distillation

Sensors ◽

10.3390/s21041280 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1280

Author(s):

Hyeonseok Lee ◽

Sungchan Kim

Keyword(s):

Neural Networks ◽

Model Prediction ◽

Saliency Map ◽

Model Parameters ◽

Learning Capability ◽

Fine Grained ◽

Network Layers ◽

Saliency Maps ◽

Novel Approach ◽

Knowledge Distillation

Explaining the prediction of deep neural networks makes the networks more understandable and trusted, leading to their use in various mission-critical tasks. Recent progress in the learning capability of networks has primarily been due to the enormous number of model parameters, so that it is usually hard to interpret their operations, as opposed to classical white-box models. For this purpose, generating saliency maps is a popular approach to identify the important input features used for the model prediction. Existing explanation methods typically only use the output of the last convolution layer of the model to generate a saliency map, lacking the information included in intermediate layers. Thus, the corresponding explanations are coarse and result in limited accuracy. Although the accuracy can be improved by iteratively developing a saliency map, this is too time-consuming and is thus impractical. To address these problems, we proposed a novel approach to explain the model prediction by developing an attentive surrogate network using knowledge distillation. The surrogate network aims to generate a fine-grained saliency map corresponding to the model prediction using meaningful regional information presented over all network layers. Experiments demonstrated that the saliency maps are the result of spatially attentive features learned from the distillation. Thus, they are useful for fine-grained classification tasks. Moreover, the proposed method runs at the rate of 24.3 frames per second, which is orders of magnitude faster than the existing methods.
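
For readers unfamiliar with knowledge distillation, the sketch below shows the commonly used temperature-softened formulation, in which a student network is trained to match a teacher's softened output distribution via a KL divergence. This is the generic textbook form and an assumption here: the abstract does not specify the loss used for the attentive surrogate network, and the function names and logits are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two distributions diverge.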

Aggregation of cohorts for histopathological diagnosis with deep morphological analysis

Scientific Reports ◽

10.1038/s41598-021-82642-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jeonghyuk Park ◽

Yul Ri Chung ◽

Seo Taek Kong ◽

Yeong Won Kim ◽

Hyunho Park ◽

...

Keyword(s):

Cancer Detection ◽

Morphological Analysis ◽

Deep Neural Networks ◽

Large Datasets ◽

Histopathological Diagnosis ◽

Single Model ◽

Trade Off ◽

Detection Model ◽

Optimal Behavior ◽

The Cost

There have been substantial efforts in using deep learning (DL) to diagnose cancer from digital images of pathology slides. Existing algorithms typically operate by training deep neural networks either specialized in specific cohorts or an aggregate of all cohorts when there are only a few images available for the target cohort. A trade-off between decreasing the number of models and their cancer detection performance was evident in our experiments with The Cancer Genomic Atlas dataset, with the former approach achieving higher performance at the cost of having to acquire large datasets from the cohort of interest. Constructing annotated datasets for individual cohorts is extremely time-consuming, with the acquisition cost of such datasets growing linearly with the number of cohorts. Another issue associated with developing cohort-specific models is the difficulty of maintenance: all cohort-specific models may need to be adjusted when a new DL algorithm is to be used, where training even a single model may require a non-negligible amount of computation, or when more data is added to some cohorts. In resolving the sub-optimal behavior of a universal cancer detection model trained on an aggregate of cohorts, we investigated how cohorts can be grouped to augment a dataset without increasing the number of models linearly with the number of cohorts. This study introduces several metrics which measure the morphological similarities between cohort pairs and demonstrates how the metrics can be used to control the trade-off between performance and the number of models.
