Explaining Neural Networks Using Attentive Knowledge Distillation

Sensors, 2021, Vol. 21 (4), pp. 1280
Author(s): Hyeonseok Lee, Sungchan Kim

Explaining the predictions of deep neural networks makes the networks more understandable and trustworthy, enabling their use in mission-critical tasks. Recent progress in the learning capability of networks has come primarily from an enormous number of model parameters, so it is usually hard to interpret their operations, in contrast to classical white-box models. A popular approach to interpretation is to generate saliency maps that identify the input features important to the model's prediction. Existing explanation methods typically use only the output of the model's last convolution layer to generate a saliency map, discarding the information contained in intermediate layers. The resulting explanations are therefore coarse and of limited accuracy. Although accuracy can be improved by iteratively refining a saliency map, this is too time-consuming to be practical. To address these problems, we propose a novel approach that explains the model's prediction through an attentive surrogate network trained by knowledge distillation. The surrogate network generates a fine-grained saliency map corresponding to the model prediction using meaningful regional information present across all network layers. Experiments demonstrate that the saliency maps are the result of spatially attentive features learned from the distillation, making them useful for fine-grained classification tasks. Moreover, the proposed method runs at 24.3 frames per second, orders of magnitude faster than existing methods.
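The core idea of combining information from all layers into one fine-grained saliency map can be illustrated with a generic NumPy sketch (this is not the authors' surrogate network; the functions and the nearest-neighbour fusion are illustrative assumptions):

```python
import numpy as np

def layer_saliency(activations):
    """Channel-wise mean of absolute activations -> 2-D map scaled to [0, 1]."""
    m = np.abs(activations).mean(axis=0)          # (C, H, W) -> (H, W)
    m -= m.min()
    return m / (m.max() + 1e-8)

def fuse_layers(layer_maps, out_size):
    """Nearest-neighbour upsample each layer map to out_size and average."""
    fused = np.zeros((out_size, out_size))
    for m in layer_maps:
        scale = out_size // m.shape[0]
        up = np.repeat(np.repeat(m, scale, axis=0), scale, axis=1)
        fused += up
    return fused / len(layer_maps)

# Synthetic activations from three layers at increasing spatial resolution.
rng = np.random.default_rng(0)
maps = [layer_saliency(rng.standard_normal((8, s, s))) for s in (7, 14, 28)]
saliency = fuse_layers(maps, 28)
print(saliency.shape)  # (28, 28)
```

Using only the coarsest (7x7) map, as last-layer methods do, would lose the spatial detail that the finer layers contribute here.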

Entropy, 2020, Vol. 22 (12), pp. 1365
Author(s): Bogdan Muşat, Răzvan Andonie

Convolutional neural networks utilize a hierarchy of neural network layers. The statistical aspects of information concentration in successive layers can give insight into the feature abstraction process. We analyze the saliency maps of these layers from the perspective of semiotics, the study of signs and sign-using behavior. In computational semiotics, this aggregation operation (known as superization) is accompanied by a decrease in spatial entropy: signs are aggregated into supersigns. Using spatial entropy, we compute the information content of the saliency maps and study the superization processes that take place between successive layers of the network. In our experiments, we visualize the superization process and show how the obtained knowledge can be used to explain the neural decision model. In addition, we attempt to optimize the architecture of the neural model using a semiotic greedy technique. To the best of our knowledge, this is the first application of computational semiotics to the analysis and interpretation of deep neural networks.
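The entropy decrease under aggregation can be checked numerically: treating a saliency map as a probability distribution, merging cells into "supersigns" can only lower its Shannon entropy. The following sketch uses a simple 2x2 sum-pooling as a crude stand-in for sign aggregation (the pooling choice is an assumption, not the paper's superization operator):

```python
import numpy as np

def spatial_entropy(saliency):
    """Shannon entropy (bits) of a saliency map normalised to a distribution."""
    p = saliency.ravel() / saliency.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def pool2x2(saliency):
    """Aggregate 2x2 blocks: a crude stand-in for sign aggregation."""
    h, w = saliency.shape
    return saliency.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

rng = np.random.default_rng(1)
fine = rng.random((16, 16))
coarse = pool2x2(fine)
print(spatial_entropy(fine) > spatial_entropy(coarse))  # True
```

The inequality holds for any such deterministic aggregation, which is why a falling spatial entropy across layers is a meaningful signature of superization.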


Sensors, 2020, Vol. 20 (21), pp. 6011
Author(s): Jan Steinbrener, Konstantin Posch, Jürgen Pilz

We present a novel approach for training deep neural networks in a Bayesian way. Compared to other Bayesian deep learning formulations, our approach quantifies the uncertainty in the model parameters while adding only very few additional parameters to be optimized. The proposed approach uses variational inference to approximate the intractable a posteriori distribution on the basis of a normal prior. Because the a posteriori uncertainty of the network parameters is represented per network layer, as a function of the estimated parameter expectation values, only very few additional parameters need to be optimized compared to a non-Bayesian network. We compare our approach to classical deep learning, Bernoulli dropout, and Bayes by Backprop on the MNIST dataset. Compared to classical deep learning, the test error is reduced by 15%. We also show that the resulting uncertainty information can be used to calculate credible intervals for the network prediction and to optimize the network architecture for the dataset at hand. To illustrate that our approach also scales to large networks and input vector sizes, we apply it to the GoogLeNet architecture on a custom dataset, achieving an average accuracy of 0.92. Using 95% credible intervals, all but one incorrect classification can be detected.
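How credible intervals flag wrong classifications can be sketched as follows: draw class probabilities from the posterior, and flag a prediction whenever the 95% interval of the top class overlaps that of the runner-up. The synthetic draws and the overlap rule below are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior samples."""
    lo = np.quantile(samples, (1 - level) / 2)
    hi = np.quantile(samples, 1 - (1 - level) / 2)
    return lo, hi

def flag_uncertain(prob_draws):
    """prob_draws: (n_draws, n_classes) probabilities from posterior samples.
    Flag when the top class's interval overlaps the runner-up's."""
    mean = prob_draws.mean(axis=0)
    top, second = np.argsort(mean)[-1], np.argsort(mean)[-2]
    lo_top, _ = credible_interval(prob_draws[:, top])
    _, hi_second = credible_interval(prob_draws[:, second])
    return lo_top <= hi_second

# Synthetic stand-ins for posterior predictive draws on two inputs.
rng = np.random.default_rng(2)
confident = np.clip(rng.normal([0.90, 0.05, 0.05], 0.01, (100, 3)), 0, 1)
ambiguous = np.clip(rng.normal([0.45, 0.40, 0.15], 0.10, (100, 3)), 0, 1)
print(flag_uncertain(confident), flag_uncertain(ambiguous))
```

A classifier that abstains on flagged inputs is how "all but one incorrect classification can be detected" would be operationalized.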


2020
Author(s): Xiao Lin, Zhi-Jie Wang, Lizhuang Ma, Renjie Li, Mei-E Fang

Abstract: Saliency detection has been a hot topic in computer vision. In this paper, we propose a novel approach based on multiscale segmentation and fuzzy broad learning. The core idea of our method is to segment the image at different scales and feed the extracted features to a fuzzy broad learning system (FBLS) for training. More specifically, it first segments the image into superpixel blocks at different scales using the simple linear iterative clustering (SLIC) algorithm. Then it uses the local binary pattern (LBP) algorithm to extract texture features and computes the average color information for each superpixel of these segmented images. These extracted features are fed to the FBLS to obtain multiscale saliency maps. After that, it fuses these saliency maps into an initial saliency map and uses the label propagation algorithm to optimize it further, yielding the final saliency map. We have conducted experiments on several benchmark datasets. The results show that our solution outperforms several existing algorithms. In particular, our method is significantly faster than most deep learning-based saliency detection algorithms in terms of training and inference time.
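The per-superpixel color-averaging step can be written compactly with `np.bincount` once a label map is available (the toy 2x2-block label map below stands in for a real SLIC segmentation; the function name is mine):

```python
import numpy as np

def mean_color_per_superpixel(image, labels):
    """Average colour of each superpixel region.
    image: (H, W, 3) floats; labels: (H, W) ints, e.g. from SLIC."""
    flat = labels.ravel()
    counts = np.bincount(flat)
    return np.stack(
        [np.bincount(flat, weights=image[..., c].ravel()) / counts
         for c in range(3)], axis=1)              # (n_superpixels, 3)

rng = np.random.default_rng(3)
img = rng.random((4, 6, 3))
# Toy segmentation: four 2x3 blocks labelled 0..3.
labels = np.kron(np.arange(4).reshape(2, 2), np.ones((2, 3), dtype=int))
means = mean_color_per_superpixel(img, labels)
print(means.shape)  # (4, 3)
```

Stacking these color means with per-superpixel LBP histograms would give the feature vectors fed to the FBLS at each scale.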


2017, Vol. 6 (4), pp. 15
Author(s): Janardhan Chidadala, Ramanaiah K.V., Babulu K, et al.

2020, Vol. 41 (S1), pp. s12-s12
Author(s): D. M. Hasibul Hasan, Philip Polgreen, Alberto Segre, Jacob Simmering, Sriram Pemmaraju

Background: Simulations based on models of healthcare worker (HCW) mobility and contact patterns with patients provide a key tool for understanding the spread of healthcare-acquired infections (HAIs). However, simulations suffer from a lack of accurate model parameters. This research uses Microsoft Kinect cameras placed in a patient room in the medical intensive care unit (MICU) at the University of Iowa Hospitals and Clinics (UIHC) to obtain reliable distributions of HCW visit length and of the time HCWs spend near a patient. These data can inform modeling efforts for understanding HAI spread. Methods: Three Kinect cameras (left, right, and door) were placed in a patient room to track the human body (i.e., left/right hands and head) at 30 frames per second. The results reported here are based on 7 randomly selected days from a total of 308 observation days. Each tracked body may have multiple raw segments over the 2 camera regions, which we "stitch" together by matching features (e.g., direction, velocity) to obtain complete trajectories. Due to camera noise, in a substantial fraction of frames bodies display unnatural characteristics, including frequent and rapid changes in direction and velocity. We use unsupervised learning techniques to identify such "ghost" frames, and we remove from our analysis bodies that have 20% or more ghost frames. Results: The heat map of hand positions (Fig. 1) shows that high-frequency locations cluster around the bed, and more to the patient's right, in accordance with the general medical practice of examining patients from their right. HCW visit frequency per hour (mean, 6.952; SD, 2.855) has 2 peaks, 1 during the morning shift and 1 during the afternoon shift, with a distinct decrease after midnight. Figure 2 shows the visit-length distribution in minutes (mean, 1.570; SD, 2.679), which is dominated by "check-in" visits of <30 seconds. HCWs do not spend much time within touching distance of patients during short visits, and the fraction of time spent near the patient's bed appears to increase with visit length up to a point. Conclusions: Using fine-grained data, this research extracts distributions of these critical parameters of HCW-patient interactions: (1) HCW visit length, (2) HCW visit frequency as a function of time of day, and (3) time spent by HCWs within touching distance of the patient as a function of visit length. To the best of our knowledge, we provide the first reliable estimates of these parameters. Funding: None. Disclosures: None.
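The ghost-frame filter can be sketched as a simple speed-based screen: flag frames whose implied speed is physically implausible, then drop bodies with 20% or more flagged frames. The speed threshold and synthetic trajectories below are illustrative assumptions; the study itself uses unsupervised learning to identify ghost frames:

```python
import numpy as np

def ghost_fraction(positions, fps=30, speed_limit=5.0):
    """Fraction of frames whose implied speed (m/s) exceeds a plausible limit.
    positions: (n_frames, 2) tracked coordinates in metres."""
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps
    return (speeds > speed_limit).mean()

def keep_body(positions, max_ghost=0.20):
    """Retain a tracked body only if fewer than 20% of its frames are ghosts."""
    return ghost_fraction(positions) < max_ghost

rng = np.random.default_rng(4)
smooth = np.cumsum(rng.normal(0, 0.01, (300, 2)), axis=0)  # plausible walk
noisy = smooth.copy()
noisy[::3] += rng.normal(0, 1.0, (100, 2))                 # camera-noise spikes
print(keep_body(smooth), keep_body(noisy))
```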


2021, Vol. 13 (8), pp. 4572
Author(s): Jiří David, Pavel Brom, František Starý, Josef Bradáč, Vojtěch Dynybyl

This article deals with the use of neural networks to estimate deceleration model parameters for an adaptive cruise control unit. The article describes the basic functionality of adaptive cruise control and develops a mathematical model of braking, one of its basic functions. Furthermore, an analysis of the influences acting in the braking process is performed; the most significant of these are used in the design of a neural-network-based deceleration predictor for the adaptive cruise control unit. Combining artificial neural networks with modern sensors in this way can be another step toward full vehicle autonomy. The advantage of this approach is the original use of neural networks, which refines the determination of the vehicle's deceleration in front of a static or dynamic obstacle while accounting for a number of influences that affect the braking process, thereby increasing driving safety.
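A baseline constant-deceleration braking model, of the kind a trained network would refine with additional influences, follows from basic kinematics: stopping within gap d after a reaction delay t_r at speed v requires a = v^2 / (2(d - v t_r)). This is a generic textbook model, not the article's specific formulation:

```python
def required_deceleration(v, gap, t_react=0.5):
    """Constant deceleration (m/s^2) needed to stop within `gap` metres
    of a static obstacle, after a reaction delay of t_react seconds."""
    braking_distance = gap - v * t_react
    if braking_distance <= 0:
        return float("inf")        # cannot stop in time at any deceleration
    return v**2 / (2 * braking_distance)

# 20 m/s (72 km/h) with a 60 m gap and 0.5 s reaction time.
print(required_deceleration(v=20.0, gap=60.0))  # 4.0
```

A neural network can correct this idealized value for road surface, load, brake temperature, and similar influences that the closed-form model ignores.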


2021, Vol. 11 (15), pp. 6704
Author(s): Jingyong Cai, Masashi Takemoto, Yuming Qiu, Hironori Nakajo

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and in short supply in many scenarios. Previous work has shown the advantages of computing activation functions such as the sigmoid with shift-and-add operations, although such approaches fail to remove multiplications from training altogether. In this paper, we propose an innovative approach that converts all multiplications in the forward and backward passes of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between weights and error signals are thereby transferred to multiplications of their sine values, which can be replaced with simpler operations via the product-to-sum formula. In addition, a rectified sine activation function is used to convert layer inputs into sine values as well. In this way, the original multiplication-intensive operations can be computed through simple shift-and-add operations. This trigonometric approximation method provides an efficient training and inference alternative for devices lacking hardware multipliers. Experimental results demonstrate that the method achieves performance close to that of classical training algorithms. The approach sheds new light on future hardware customization research for machine learning.
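The underlying identity is the product-to-sum formula sin(w)sin(e) = (cos(w-e) - cos(w+e))/2, which for weights and errors clustered near zero (where sin(x) ≈ x) approximates the product w*e without an explicit multiply. The check below verifies the approximation numerically; the magnitudes are illustrative, and the halving and cosine evaluations would themselves be reduced to shift-and-add hardware in the actual method:

```python
import numpy as np

def approx_mul(w, e):
    """Approximate w*e for values near zero via the product-to-sum identity:
    w*e ~ sin(w)*sin(e) = 0.5 * (cos(w - e) - cos(w + e))."""
    return 0.5 * (np.cos(w - e) - np.cos(w + e))

rng = np.random.default_rng(5)
w = rng.normal(0, 0.05, 1000)   # weights clustered around zero
e = rng.normal(0, 0.05, 1000)   # backpropagated errors, also near zero
err = np.abs(approx_mul(w, e) - w * e).max()
print(float(err))               # worst-case absolute approximation error
```

Because the error term scales like w*e*(w^2 + e^2)/6, the approximation is tightest exactly where DNN weights and gradients concentrate.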

