Explaining Neural Networks Using Attentive Knowledge Distillation

Sensors, 2021, Vol. 21 (4), pp. 1280
Author(s): Hyeonseok Lee, Sungchan Kim

Explaining the predictions of deep neural networks makes the networks more understandable and trustworthy, enabling their use in mission-critical tasks. Recent progress in the learning capability of networks has come primarily from an enormous number of model parameters, so it is usually hard to interpret their operations, in contrast to classical white-box models. A popular approach to interpretation is to generate saliency maps that identify the input features important to the model's prediction. Existing explanation methods typically use only the output of the model's last convolution layer to generate a saliency map, discarding the information contained in intermediate layers. The resulting explanations are therefore coarse and of limited accuracy. Although accuracy can be improved by iteratively refining a saliency map, this is too time-consuming to be practical. To address these problems, we propose a novel approach that explains the model's prediction through an attentive surrogate network trained by knowledge distillation. The surrogate network generates a fine-grained saliency map corresponding to the model prediction using meaningful regional information present across all network layers. Experiments demonstrate that the saliency maps are the result of spatially attentive features learned from the distillation, making them useful for fine-grained classification tasks. Moreover, the proposed method runs at 24.3 frames per second, orders of magnitude faster than existing methods.
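The core idea of combining information from all layers into one fine-grained saliency map can be illustrated with a generic NumPy sketch (this is not the authors' surrogate network; the functions and the nearest-neighbour fusion are illustrative assumptions):

```python
import numpy as np

def layer_saliency(activations):
    """Channel-wise mean of absolute activations -> 2-D map scaled to [0, 1]."""
    m = np.abs(activations).mean(axis=0)          # (C, H, W) -> (H, W)
    m -= m.min()
    return m / (m.max() + 1e-8)

def fuse_layers(layer_maps, out_size):
    """Nearest-neighbour upsample each layer map to out_size and average."""
    fused = np.zeros((out_size, out_size))
    for m in layer_maps:
        scale = out_size // m.shape[0]
        up = np.repeat(np.repeat(m, scale, axis=0), scale, axis=1)
        fused += up
    return fused / len(layer_maps)

# Synthetic activations from three layers at increasing spatial resolution.
rng = np.random.default_rng(0)
maps = [layer_saliency(rng.standard_normal((8, s, s))) for s in (7, 14, 28)]
saliency = fuse_layers(maps, 28)
print(saliency.shape)  # (28, 28)
```

Using only the coarsest (7x7) map, as last-layer methods do, would lose the spatial detail that the finer layers contribute here.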

Entropy, 2020, Vol. 22 (12), pp. 1365
Author(s): Bogdan Muşat, Răzvan Andonie

Convolutional neural networks utilize a hierarchy of neural network layers. The statistical aspects of information concentration in successive layers can give insight into the feature abstraction process. We analyze the saliency maps of these layers from the perspective of semiotics, the study of signs and sign-using behavior. In computational semiotics, this aggregation operation (known as superization) is accompanied by a decrease in spatial entropy: signs are aggregated into supersigns. Using spatial entropy, we compute the information content of the saliency maps and study the superization processes that take place between successive layers of the network. In our experiments, we visualize the superization process and show how the obtained knowledge can be used to explain the neural decision model. In addition, we attempt to optimize the architecture of the neural model using a semiotic greedy technique. To the best of our knowledge, this is the first application of computational semiotics to the analysis and interpretation of deep neural networks.
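The entropy decrease under aggregation can be checked numerically: treating a saliency map as a probability distribution, merging cells into "supersigns" can only lower its Shannon entropy. The following sketch uses a simple 2x2 sum-pooling as a crude stand-in for sign aggregation (the pooling choice is an assumption, not the paper's superization operator):

```python
import numpy as np

def spatial_entropy(saliency):
    """Shannon entropy (bits) of a saliency map normalised to a distribution."""
    p = saliency.ravel() / saliency.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def pool2x2(saliency):
    """Aggregate 2x2 blocks: a crude stand-in for sign aggregation."""
    h, w = saliency.shape
    return saliency.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

rng = np.random.default_rng(1)
fine = rng.random((16, 16))
coarse = pool2x2(fine)
print(spatial_entropy(fine) > spatial_entropy(coarse))  # True
```

The inequality holds for any such deterministic aggregation, which is why a falling spatial entropy across layers is a meaningful signature of superization.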


Sensors, 2020, Vol. 20 (21), pp. 6011
Author(s): Jan Steinbrener, Konstantin Posch, Jürgen Pilz

We present a novel approach for training deep neural networks in a Bayesian way. Compared to other Bayesian deep learning formulations, our approach quantifies the uncertainty in the model parameters while adding only very few additional parameters to be optimized. The proposed approach uses variational inference to approximate the intractable a posteriori distribution on the basis of a normal prior. Because the a posteriori uncertainty of the network parameters is represented per network layer, as a function of the estimated parameter expectation values, only very few additional parameters need to be optimized compared to a non-Bayesian network. We compare our approach to classical deep learning, Bernoulli dropout, and Bayes by Backprop on the MNIST dataset. Compared to classical deep learning, the test error is reduced by 15%. We also show that the resulting uncertainty information can be used to calculate credible intervals for the network prediction and to optimize the network architecture for the dataset at hand. To illustrate that our approach also scales to large networks and input vector sizes, we apply it to the GoogLeNet architecture on a custom dataset, achieving an average accuracy of 0.92. Using 95% credible intervals, all but one incorrect classification can be detected.
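How credible intervals flag wrong classifications can be sketched as follows: draw class probabilities from the posterior, and flag a prediction whenever the 95% interval of the top class overlaps that of the runner-up. The synthetic draws and the overlap rule below are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior samples."""
    lo = np.quantile(samples, (1 - level) / 2)
    hi = np.quantile(samples, 1 - (1 - level) / 2)
    return lo, hi

def flag_uncertain(prob_draws):
    """prob_draws: (n_draws, n_classes) probabilities from posterior samples.
    Flag when the top class's interval overlaps the runner-up's."""
    mean = prob_draws.mean(axis=0)
    top, second = np.argsort(mean)[-1], np.argsort(mean)[-2]
    lo_top, _ = credible_interval(prob_draws[:, top])
    _, hi_second = credible_interval(prob_draws[:, second])
    return lo_top <= hi_second

# Synthetic stand-ins for posterior predictive draws on two inputs.
rng = np.random.default_rng(2)
confident = np.clip(rng.normal([0.90, 0.05, 0.05], 0.01, (100, 3)), 0, 1)
ambiguous = np.clip(rng.normal([0.45, 0.40, 0.15], 0.10, (100, 3)), 0, 1)
print(flag_uncertain(confident), flag_uncertain(ambiguous))
```

A classifier that abstains on flagged inputs is how "all but one incorrect classification can be detected" would be operationalized.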


2020
Author(s): Xiao Lin, Zhi-Jie Wang, Lizhuang Ma, Renjie Li, Mei-E Fang

Abstract: Saliency detection has been a hot topic in computer vision. In this paper, we propose a novel approach based on multiscale segmentation and fuzzy broad learning. The core idea of our method is to segment the image at different scales and feed the extracted features to a fuzzy broad learning system (FBLS) for training. More specifically, it first segments the image into superpixel blocks at different scales using the simple linear iterative clustering (SLIC) algorithm. Then it uses the local binary pattern (LBP) algorithm to extract texture features and computes the average color information for each superpixel of these segmented images. These extracted features are fed to the FBLS to obtain multiscale saliency maps. After that, it fuses these saliency maps into an initial saliency map and uses the label propagation algorithm to optimize it further, yielding the final saliency map. We have conducted experiments on several benchmark datasets. The results show that our solution outperforms several existing algorithms. In particular, our method is significantly faster than most deep learning-based saliency detection algorithms in terms of training and inference time.
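The per-superpixel color-averaging step can be written compactly with `np.bincount` once a label map is available (the toy 2x2-block label map below stands in for a real SLIC segmentation; the function name is mine):

```python
import numpy as np

def mean_color_per_superpixel(image, labels):
    """Average colour of each superpixel region.
    image: (H, W, 3) floats; labels: (H, W) ints, e.g. from SLIC."""
    flat = labels.ravel()
    counts = np.bincount(flat)
    return np.stack(
        [np.bincount(flat, weights=image[..., c].ravel()) / counts
         for c in range(3)], axis=1)              # (n_superpixels, 3)

rng = np.random.default_rng(3)
img = rng.random((4, 6, 3))
# Toy segmentation: four 2x3 blocks labelled 0..3.
labels = np.kron(np.arange(4).reshape(2, 2), np.ones((2, 3), dtype=int))
means = mean_color_per_superpixel(img, labels)
print(means.shape)  # (4, 3)
```

Stacking these color means with per-superpixel LBP histograms would give the feature vectors fed to the FBLS at each scale.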


2017, Vol. 6 (4), pp. 15
Author(s): Janardhan Chidadala, Ramanaiah K.V., Babulu K, et al.

2020, Vol. 41 (S1), pp. s12-s12
Author(s): D. M. Hasibul Hasan, Philip Polgreen, Alberto Segre, Jacob Simmering, Sriram Pemmaraju

Background: Simulations based on models of healthcare worker (HCW) mobility and contact patterns with patients provide a key tool for understanding the spread of healthcare-acquired infections (HAIs). However, simulations suffer from a lack of accurate model parameters. This research uses Microsoft Kinect cameras placed in a patient room in the medical intensive care unit (MICU) at the University of Iowa Hospitals and Clinics (UIHC) to obtain reliable distributions of HCW visit length and of the time HCWs spend near a patient. These data can inform modeling efforts for understanding HAI spread. Methods: Three Kinect cameras (left, right, and door) were placed in a patient room to track the human body (i.e., left/right hands and head) at 30 frames per second. The results reported here are based on 7 randomly selected days from a total of 308 observation days. Each tracked body may have multiple raw segments over the 2 camera regions, which we "stitch" together by matching features (e.g., direction, velocity) to obtain complete trajectories. Due to camera noise, in a substantial fraction of frames bodies display unnatural characteristics, including frequent and rapid changes in direction and velocity. We use unsupervised learning techniques to identify such "ghost" frames, and we remove from our analysis bodies that have 20% or more ghost frames. Results: The heat map of hand positions (Fig. 1) shows that high-frequency locations cluster around the bed, and more to the patient's right, in accordance with the general medical practice of examining patients from their right. HCW visit frequency per hour (mean, 6.952; SD, 2.855) has 2 peaks, 1 during the morning shift and 1 during the afternoon shift, with a distinct decrease after midnight. Figure 2 shows the visit-length distribution in minutes (mean, 1.570; SD, 2.679), which is dominated by "check-in" visits of <30 seconds. HCWs do not spend much time within touching distance of patients during short visits, and the fraction of time spent near the patient's bed appears to increase with visit length up to a point. Conclusions: Using fine-grained data, this research extracts distributions of these critical parameters of HCW-patient interactions: (1) HCW visit length, (2) HCW visit frequency as a function of time of day, and (3) time spent by HCWs within touching distance of the patient as a function of visit length. To the best of our knowledge, we provide the first reliable estimates of these parameters. Funding: None. Disclosures: None.
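The ghost-frame filter can be sketched as a simple speed-based screen: flag frames whose implied speed is physically implausible, then drop bodies with 20% or more flagged frames. The speed threshold and synthetic trajectories below are illustrative assumptions; the study itself uses unsupervised learning to identify ghost frames:

```python
import numpy as np

def ghost_fraction(positions, fps=30, speed_limit=5.0):
    """Fraction of frames whose implied speed (m/s) exceeds a plausible limit.
    positions: (n_frames, 2) tracked coordinates in metres."""
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps
    return (speeds > speed_limit).mean()

def keep_body(positions, max_ghost=0.20):
    """Retain a tracked body only if fewer than 20% of its frames are ghosts."""
    return ghost_fraction(positions) < max_ghost

rng = np.random.default_rng(4)
smooth = np.cumsum(rng.normal(0, 0.01, (300, 2)), axis=0)  # plausible walk
noisy = smooth.copy()
noisy[::3] += rng.normal(0, 1.0, (100, 2))                 # camera-noise spikes
print(keep_body(smooth), keep_body(noisy))
```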


2021, Vol. 13 (8), pp. 4572
Author(s): Jiří David, Pavel Brom, František Starý, Josef Bradáč, Vojtěch Dynybyl

This article deals with the use of neural networks to estimate deceleration model parameters for an adaptive cruise control unit. The article describes the basic functionality of adaptive cruise control and develops a mathematical model of braking, one of its basic functions. Furthermore, an analysis of the influences acting in the braking process is performed; the most significant of these are used in the design of a neural-network-based deceleration predictor for the adaptive cruise control unit. Combining artificial neural networks with modern sensors in this way can be another step toward full vehicle autonomy. The advantage of this approach is the original use of neural networks, which refines the determination of the vehicle's deceleration in front of a static or dynamic obstacle while accounting for a number of influences that affect the braking process, thereby increasing driving safety.
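A baseline constant-deceleration braking model, of the kind a trained network would refine with additional influences, follows from basic kinematics: stopping within gap d after a reaction delay t_r at speed v requires a = v^2 / (2(d - v t_r)). This is a generic textbook model, not the article's specific formulation:

```python
def required_deceleration(v, gap, t_react=0.5):
    """Constant deceleration (m/s^2) needed to stop within `gap` metres
    of a static obstacle, after a reaction delay of t_react seconds."""
    braking_distance = gap - v * t_react
    if braking_distance <= 0:
        return float("inf")        # cannot stop in time at any deceleration
    return v**2 / (2 * braking_distance)

# 20 m/s (72 km/h) with a 60 m gap and 0.5 s reaction time.
print(required_deceleration(v=20.0, gap=60.0))  # 4.0
```

A neural network can correct this idealized value for road surface, load, brake temperature, and similar influences that the closed-form model ignores.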


2021, Vol. 11 (15), pp. 6704
Author(s): Jingyong Cai, Masashi Takemoto, Yuming Qiu, Hironori Nakajo

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and in short supply in many scenarios. Previous work has shown the advantages of computing activation functions such as the sigmoid with shift-and-add operations, although such approaches fail to remove multiplications from training altogether. In this paper, we propose an innovative approach that converts all multiplications in the forward and backward passes of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between weights and error signals are thereby transferred to multiplications of their sine values, which can be replaced with simpler operations via the product-to-sum formula. In addition, a rectified sine activation function is used to convert layer inputs into sine values as well. In this way, the original multiplication-intensive operations can be computed through simple shift-and-add operations. This trigonometric approximation method provides an efficient training and inference alternative for devices lacking hardware multipliers. Experimental results demonstrate that the method achieves performance close to that of classical training algorithms. The approach sheds new light on future hardware customization research for machine learning.
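The underlying identity is the product-to-sum formula sin(w)sin(e) = (cos(w-e) - cos(w+e))/2, which for weights and errors clustered near zero (where sin(x) ≈ x) approximates the product w*e without an explicit multiply. The check below verifies the approximation numerically; the magnitudes are illustrative, and the halving and cosine evaluations would themselves be reduced to shift-and-add hardware in the actual method:

```python
import numpy as np

def approx_mul(w, e):
    """Approximate w*e for values near zero via the product-to-sum identity:
    w*e ~ sin(w)*sin(e) = 0.5 * (cos(w - e) - cos(w + e))."""
    return 0.5 * (np.cos(w - e) - np.cos(w + e))

rng = np.random.default_rng(5)
w = rng.normal(0, 0.05, 1000)   # weights clustered around zero
e = rng.normal(0, 0.05, 1000)   # backpropagated errors, also near zero
err = np.abs(approx_mul(w, e) - w * e).max()
print(float(err))               # worst-case absolute approximation error
```

Because the error term scales like w*e*(w^2 + e^2)/6, the approximation is tightest exactly where DNN weights and gradients concentrate.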

