scholarly journals Examining the Causal Structures of Deep Neural Networks Using Information Theory

Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1429
Author(s):  
Scythia Marrow ◽  
Eric J. Michaud ◽  
Erik Hoel

Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets. Yet DNNs can also be examined at the level of causation, exploring “what does what” within the layers of the network itself. Historically, analyzing the causal structure of DNNs has received less attention than understanding their responses to input. Yet definitionally, generalizability must be a function of a DNN’s causal structure as it reflects how the DNN responds to unseen or even not-yet-defined future inputs. Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training. Specifically, we introduce the effective information (EI) of a feedforward DNN, which is the mutual information between layer input and output following a maximum-entropy perturbation. The EI can be used to assess the degree of causal influence nodes and edges have over their downstream targets in each layer. We show that the EI can be further decomposed in order to examine the sensitivity of a layer (measured by how well edges transmit perturbations) and the degeneracy of a layer (measured by how edge overlap interferes with transmission), along with estimates of the amount of integrated information of a layer. Together, these properties define where each layer lies in the “causal plane”, which can be used to visualize how layer connectivity becomes more sensitive or degenerate over time, and how integration changes during training, revealing how the layer-by-layer causal structure differentiates. These results may help in understanding the generalization capabilities of DNNs and provide foundational tools for making DNNs both more generalizable and more explainable.

Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1360
Author(s):  
Xin Du ◽  
Katayoun Farrahi ◽  
Mahesan Niranjan

In solving challenging pattern recognition problems, deep neural networks have shown excellent performance by forming powerful mappings between inputs and targets, learning representations (features) and making subsequent predictions. A recent tool to help understand how representations are formed is based on observing the dynamics of learning on an information plane using mutual information, linking the input to the representation (I(X;T)) and the representation to the target (I(T;Y)). In this paper, we use an information theoretical approach to understand how Cascade Learning (CL), a method to train deep neural networks layer-by-layer, learns representations, as CL has shown comparable results while saving computation and memory costs. We observe that performance is not linked to information–compression, which differs from observation on End-to-End (E2E) learning. Additionally, CL can inherit information about targets, and gradually specialise extracted features layer-by-layer. We evaluate this effect by proposing an information transition ratio, I(T;Y)/I(X;T), and show that it can serve as a useful heuristic in setting the depth of a neural network that achieves satisfactory accuracy of classification.


Author(s):  
Jingyuan Wang ◽  
Kai Feng ◽  
Junjie Wu

The deep network model, with the majority built on neural networks, has been proved to be a powerful framework to represent complex data for high performance machine learning. In recent years, more and more studies turn to nonneural network approaches to build diverse deep structures, and the Deep Stacking Network (DSN) model is one of such approaches that uses stacked easy-to-learn blocks to build a parameter-training-parallelizable deep network. In this paper, we propose a novel SVM-based Deep Stacking Network (SVM-DSN), which uses the DSN architecture to organize linear SVM classifiers for deep learning. A BP-like layer tuning scheme is also proposed to ensure holistic and local optimizations of stacked SVMs simultaneously. Some good math properties of SVM, such as the convex optimization, is introduced into the DSN framework by our model. From a global view, SVM-DSN can iteratively extract data representations layer by layer as a deep neural network but with parallelizability, and from a local view, each stacked SVM can converge to its optimal solution and obtain the support vectors, which compared with neural networks could lead to interesting improvements in anti-saturation and interpretability. Experimental results on both image and text data sets demonstrate the excellent performances of SVM-DSN compared with some competitive benchmark models.


2018 ◽  
Vol 15 (9) ◽  
pp. 1451-1455 ◽  
Author(s):  
Grant J. Scott ◽  
Kyle C. Hagan ◽  
Richard A. Marcum ◽  
James Alex Hurt ◽  
Derek T. Anderson ◽  
...  

Energies ◽  
2021 ◽  
Vol 14 (19) ◽  
pp. 6156
Author(s):  
Stefan Hensel ◽  
Marin B. Marinov ◽  
Michael Koch ◽  
Dimitar Arnaudov

This paper presents a systematic approach for accurate short-time cloud coverage prediction based on a machine learning (ML) approach. Based on a newly built omnidirectional ground-based sky camera system, local training and evaluation data sets were created. These were used to train several state-of-the-art deep neural networks for object detection and segmentation. For this purpose, the camera-generated a full hemispherical image every 30 min over two months in daylight conditions with a fish-eye lens. From this data set, a subset of images was selected for training and evaluation according to various criteria. Deep neural networks, based on the two-stage R-CNN architecture, were trained and compared with a U-net segmentation approach implemented by CloudSegNet. All chosen deep networks were then evaluated and compared according to the local situation.


2021 ◽  
Vol 12 ◽  
Author(s):  
Osval A. Montesinos-López ◽  
Abelardo Montesinos-López ◽  
Brandon A. Mosqueda-González ◽  
Alison R. Bentley ◽  
Morten Lillemo ◽  
...  

Genomic selection (GS) has the potential to revolutionize predictive plant breeding. A reference population is phenotyped and genotyped to train a statistical model that is used to perform genome-enabled predictions of new individuals that were only genotyped. In this vein, deep neural networks, are a type of machine learning model and have been widely adopted for use in GS studies, as they are not parametric methods, making them more adept at capturing nonlinear patterns. However, the training process for deep neural networks is very challenging due to the numerous hyper-parameters that need to be tuned, especially when imperfect tuning can result in biased predictions. In this paper we propose a simple method for calibrating (adjusting) the prediction of continuous response variables resulting from deep learning applications. We evaluated the proposed deep learning calibration method (DL_M2) using four crop breeding data sets and its performance was compared with the standard deep learning method (DL_M1), as well as the standard genomic Best Linear Unbiased Predictor (GBLUP). While the GBLUP was the most accurate model overall, the proposed deep learning calibration method (DL_M2) helped increase the genome-enabled prediction performance in all data sets when compared with the traditional DL method (DL_M1). Taken together, we provide evidence for extending the use of the proposed calibration method to evaluate its potential and consistency for predicting performance in the context of GS applied to plant breeding.


Author(s):  
Aydin Ayanzadeh ◽  
Sahand Vahidnia

In this paper, we leverage state of the art models on Imagenet data-sets. We use the pre-trained model and learned weighs to extract the feature from the Dog breeds identification data-set. Afterwards, we applied fine-tuning and dataaugmentation to increase the performance of our test accuracy in classification of dog breeds datasets. The performance of the proposed approaches are compared with the state of the art models of Image-Net datasets such as ResNet-50, DenseNet-121, DenseNet-169 and GoogleNet. we achieved 89.66% , 85.37% 84.01% and 82.08% test accuracy respectively which shows thesuperior performance of proposed method to the previous works on Stanford dog breeds datasets.


2019 ◽  
Vol 9 (2) ◽  
pp. 123-147 ◽  
Author(s):  
Ryotaro Kamimura

Abstract The present paper1 aims to propose a new type of information-theoretic method to maximize mutual information between inputs and outputs. The importance of mutual information in neural networks is well known, but the actual implementation of mutual information maximization has been quite difficult to undertake. In addition, mutual information has not extensively been used in neural networks, meaning that its applicability is very limited. To overcome the shortcoming of mutual information maximization, we present it here in a very simplified manner by supposing that mutual information is already maximized before learning, or at least at the beginning of learning. The method was applied to three data sets (crab data set, wholesale data set, and human resources data set) and examined in terms of generalization performance and connection weights. The results showed that by disentangling connection weights, maximizing mutual information made it possible to explicitly interpret the relations between inputs and outputs.


2020 ◽  
Vol 12 (20) ◽  
pp. 3358
Author(s):  
Vasileios Syrris ◽  
Ondrej Pesek ◽  
Pierre Soille

Automatic supervised classification with complex modelling such as deep neural networks requires the availability of representative training data sets. While there exists a plethora of data sets that can be used for this purpose, they are usually very heterogeneous and not interoperable. In this context, the present work has a twofold objective: (i) to describe procedures of open-source training data management, integration, and data retrieval, and (ii) to demonstrate the practical use of varying source training data for remote sensing image classification. For the former, we propose SatImNet, a collection of open training data, structured and harmonized according to specific rules. For the latter, two modelling approaches based on convolutional neural networks have been designed and configured to deal with satellite image classification and segmentation.


2020 ◽  
Vol 24 (01) ◽  
pp. 003-011 ◽  
Author(s):  
Narges Razavian ◽  
Florian Knoll ◽  
Krzysztof J. Geras

AbstractArtificial intelligence (AI) has made stunning progress in the last decade, made possible largely due to the advances in training deep neural networks with large data sets. Many of these solutions, initially developed for natural images, speech, or text, are now becoming successful in medical imaging. In this article we briefly summarize in an accessible way the current state of the field of AI. Furthermore, we highlight the most promising approaches and describe the current challenges that will need to be solved to enable broad deployment of AI in clinical practice.


Sign in / Sign up

Export Citation Format

Share Document