Examining the Causal Structures of Deep Neural Networks Using Information Theory

Scythia Marrow; Eric J. Michaud; Erik Hoel

doi:10.3390/e22121429

Examining the Causal Structures of Deep Neural Networks Using Information Theory

Entropy ◽

10.3390/e22121429 ◽

2020 ◽

Vol 22 (12) ◽

pp. 1429

Author(s):

Scythia Marrow ◽

Eric J. Michaud ◽

Erik Hoel

Keyword(s):

Neural Networks ◽

Information Theory ◽

Mutual Information ◽

Deep Neural Networks ◽

Causal Structure ◽

Data Sets ◽

Layer By Layer ◽

Input And Output ◽

Causal Structures ◽

Entropy Perturbation

Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets. Yet DNNs can also be examined at the level of causation, exploring “what does what” within the layers of the network itself. Historically, analyzing the causal structure of DNNs has received less attention than understanding their responses to input. Yet definitionally, generalizability must be a function of a DNN’s causal structure as it reflects how the DNN responds to unseen or even not-yet-defined future inputs. Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training. Specifically, we introduce the effective information (EI) of a feedforward DNN, which is the mutual information between layer input and output following a maximum-entropy perturbation. The EI can be used to assess the degree of causal influence nodes and edges have over their downstream targets in each layer. We show that the EI can be further decomposed in order to examine the sensitivity of a layer (measured by how well edges transmit perturbations) and the degeneracy of a layer (measured by how edge overlap interferes with transmission), along with estimates of the amount of integrated information of a layer. Together, these properties define where each layer lies in the “causal plane”, which can be used to visualize how layer connectivity becomes more sensitive or degenerate over time, and how integration changes during training, revealing how the layer-by-layer causal structure differentiates. These results may help in understanding the generalization capabilities of DNNs and provide foundational tools for making DNNs both more generalizable and more explainable.
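As a rough illustration of the quantity described above, the following sketch estimates an EI-style score for a single edge: uniform (maximum-entropy) noise is injected at the input, passed through one weighted sigmoid connection, and the mutual information between input and output is estimated from a histogram. The sigmoid activation, the bin count, the sample size, and the function names `mutual_information_bits` and `edge_ei` are assumptions made for this example; they are not taken from the abstract and need not match the authors' implementation.

```python
import math
import random


def mutual_information_bits(xs, ys, bins=16):
    """Histogram estimate of I(X;Y) in bits for samples lying in [0, 1]."""
    n = len(xs)
    joint = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        i = min(int(x * bins), bins - 1)
        j = min(int(y * bins), bins - 1)
        joint[i][j] += 1
    # Marginal distributions of the binned input and output.
    px = [sum(row) / n for row in joint]
    py = [sum(joint[i][j] for i in range(bins)) / n for j in range(bins)]
    mi = 0.0
    for i in range(bins):
        for j in range(bins):
            if joint[i][j]:
                pxy = joint[i][j] / n
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


def edge_ei(weight, samples=20000, bins=16, seed=0):
    """EI-style score of one edge (hypothetical helper).

    The input is perturbed with uniform noise on [0, 1] and the output is
    sigmoid(weight * x); both choices are assumptions of this sketch.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(samples)]
    ys = [1.0 / (1.0 + math.exp(-weight * x)) for x in xs]
    return mutual_information_bits(xs, ys, bins)


if __name__ == "__main__":
    for w in (0.0, 1.0, 8.0):
        print(f"weight={w:4.1f}  estimated EI = {edge_ei(w):.3f} bits")
```

With a zero weight the output is constant, so the estimate is zero; a larger weight spreads the output over more bins and transmits more of the perturbation. The estimate is bounded above by log2(bins) and depends on the binning, so it should be read as a coarse illustration only.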

Information Bottleneck Theory Based Exploration of Cascade Learning

Entropy ◽

10.3390/e23101360 ◽

2021 ◽

Vol 23 (10) ◽

pp. 1360

Author(s):

Xin Du ◽

Katayoun Farrahi ◽

Mahesan Niranjan

Keyword(s):

Neural Network ◽

Neural Networks ◽

Pattern Recognition ◽

Mutual Information ◽

Theoretical Approach ◽

Deep Neural Networks ◽

Layer By Layer ◽

Information Compression ◽

Excellent Performance ◽

Information Bottleneck

In solving challenging pattern recognition problems, deep neural networks have shown excellent performance by forming powerful mappings between inputs and targets, learning representations (features) and making subsequent predictions. A recent tool to help understand how representations are formed is based on observing the dynamics of learning on an information plane using mutual information, linking the input to the representation (I(X;T)) and the representation to the target (I(T;Y)). In this paper, we use an information-theoretic approach to understand how Cascade Learning (CL), a method to train deep neural networks layer-by-layer, learns representations, as CL has shown comparable results while saving computation and memory costs. We observe that performance is not linked to information–compression, which differs from observations on End-to-End (E2E) learning. Additionally, CL can inherit information about targets, and gradually specialise extracted features layer-by-layer. We evaluate this effect by proposing an information transition ratio, I(T;Y)/I(X;T), and show that it can serve as a useful heuristic in setting the depth of a neural network that achieves satisfactory classification accuracy.
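The information transition ratio I(T;Y)/I(X;T) mentioned above can be illustrated on toy discrete data. The sketch below is a minimal example under strong assumptions: X, T and Y are already discrete symbols, and mutual information is computed from empirical counts. The function names `discrete_mi_bits` and `information_transition_ratio` are hypothetical, and the abstract does not say how the authors estimate mutual information for real network activations.

```python
import math
from collections import Counter


def discrete_mi_bits(a, b):
    """Plug-in estimate of I(A;B) in bits from paired discrete samples."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(u,v) * log2( p(u,v) / (p(u) p(v)) ), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (pa[u] * pb[v]))
        for (u, v), c in pab.items()
    )


def information_transition_ratio(x, t, y):
    """Ratio I(T;Y) / I(X;T); returns NaN when I(X;T) is zero."""
    i_xt = discrete_mi_bits(x, t)
    i_ty = discrete_mi_bits(t, y)
    return i_ty / i_xt if i_xt > 0 else float("nan")


if __name__ == "__main__":
    x = [0, 1, 2, 3]          # toy inputs
    y = [0, 0, 1, 1]          # toy targets
    t_copy = [0, 1, 2, 3]     # representation that copies the input
    t_label = [0, 0, 1, 1]    # representation that keeps only the label
    print(information_transition_ratio(x, t_copy, y))
    print(information_transition_ratio(x, t_label, y))
```

In this toy case, a representation that copies the input carries two bits about X but only one bit about Y, while a representation that keeps only the label carries one bit about each, so the second has the higher ratio.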

SVM-Based Deep Stacking Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015273 ◽

2019 ◽

Vol 33 ◽

pp. 5273-5280 ◽

Cited By ~ 2

Author(s):

Jingyuan Wang ◽

Kai Feng ◽

Junjie Wu

Keyword(s):

Neural Networks ◽

High Performance ◽

Optimal Solution ◽

Data Sets ◽

Complex Data ◽

Layer By Layer ◽

Deep Network ◽

Local View ◽

Linear Svm ◽

Network Approaches

The deep network model, with the majority built on neural networks, has been proved to be a powerful framework to represent complex data for high-performance machine learning. In recent years, more and more studies have turned to non-neural network approaches to build diverse deep structures, and the Deep Stacking Network (DSN) model is one such approach that uses stacked easy-to-learn blocks to build a parameter-training-parallelizable deep network. In this paper, we propose a novel SVM-based Deep Stacking Network (SVM-DSN), which uses the DSN architecture to organize linear SVM classifiers for deep learning. A BP-like layer tuning scheme is also proposed to ensure holistic and local optimizations of stacked SVMs simultaneously. Some good mathematical properties of SVM, such as convex optimization, are introduced into the DSN framework by our model. From a global view, SVM-DSN can iteratively extract data representations layer by layer as a deep neural network but with parallelizability, and from a local view, each stacked SVM can converge to its optimal solution and obtain the support vectors, which compared with neural networks could lead to interesting improvements in anti-saturation and interpretability. Experimental results on both image and text data sets demonstrate the excellent performance of SVM-DSN compared with some competitive benchmark models.

Enhanced Fusion of Deep Neural Networks for Classification of Benchmark High-Resolution Image Data Sets

IEEE Geoscience and Remote Sensing Letters ◽

10.1109/lgrs.2018.2839092 ◽

2018 ◽

Vol 15 (9) ◽

pp. 1451-1455 ◽

Cited By ~ 23

Author(s):

Grant J. Scott ◽

Kyle C. Hagan ◽

Richard A. Marcum ◽

James Alex Hurt ◽

Derek T. Anderson ◽

...

Keyword(s):

Neural Networks ◽

High Resolution ◽

Deep Neural Networks ◽

Image Data ◽

Data Sets ◽

Resolution Image ◽

High Resolution Image

Why Dose Layer-by-Layer Pre-training Improve Deep Neural Networks Learning?

Handbook of Deep Learning Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-3-030-11479-4_13 ◽

2019 ◽

pp. 293-318

Author(s):

Seyyede Zohreh Seyyedsalehi ◽

Seyyed Ali Seyyedsalehi

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Layer By Layer

Evaluation of Deep Learning-Based Neural Network Methods for Cloud Detection and Segmentation

Energies ◽

10.3390/en14196156 ◽

2021 ◽

Vol 14 (19) ◽

pp. 6156

Author(s):

Stefan Hensel ◽

Marin B. Marinov ◽

Michael Koch ◽

Dimitar Arnaudov

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Data Sets ◽

Cloud Detection ◽

Camera System ◽

Data Set ◽

Network Methods ◽

Segmentation Approach ◽

Coverage Prediction ◽

Short Time

This paper presents a systematic approach for accurate short-time cloud coverage prediction based on a machine learning (ML) approach. Based on a newly built omnidirectional ground-based sky camera system, local training and evaluation data sets were created. These were used to train several state-of-the-art deep neural networks for object detection and segmentation. For this purpose, the camera generated a full hemispherical image every 30 min over two months in daylight conditions with a fish-eye lens. From this data set, a subset of images was selected for training and evaluation according to various criteria. Deep neural networks, based on the two-stage R-CNN architecture, were trained and compared with a U-net segmentation approach implemented by CloudSegNet. All chosen deep networks were then evaluated and compared according to the local situation.

A New Deep Learning Calibration Method Enhances Genome-Based Prediction of Continuous Crop Traits

Frontiers in Genetics ◽

10.3389/fgene.2021.798840 ◽

2021 ◽

Vol 12 ◽

Author(s):

Osval A. Montesinos-López ◽

Abelardo Montesinos-López ◽

Brandon A. Mosqueda-González ◽

Alison R. Bentley ◽

Morten Lillemo ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Plant Breeding ◽

Deep Neural Networks ◽

Reference Population ◽

Calibration Method ◽

Data Sets ◽

Simple Method ◽

Continuous Response ◽

Predicting Performance

Genomic selection (GS) has the potential to revolutionize predictive plant breeding. A reference population is phenotyped and genotyped to train a statistical model that is used to perform genome-enabled predictions of new individuals that were only genotyped. In this vein, deep neural networks, a type of machine learning model, have been widely adopted for use in GS studies, as they are not parametric methods, making them more adept at capturing nonlinear patterns. However, the training process for deep neural networks is very challenging due to the numerous hyper-parameters that need to be tuned, especially when imperfect tuning can result in biased predictions. In this paper we propose a simple method for calibrating (adjusting) the prediction of continuous response variables resulting from deep learning applications. We evaluated the proposed deep learning calibration method (DL_M2) using four crop breeding data sets, and its performance was compared with the standard deep learning method (DL_M1), as well as the standard genomic Best Linear Unbiased Predictor (GBLUP). While the GBLUP was the most accurate model overall, the proposed deep learning calibration method (DL_M2) helped increase the genome-enabled prediction performance in all data sets when compared with the traditional DL method (DL_M1). Taken together, we provide evidence for extending the use of the proposed calibration method to evaluate its potential and consistency for predicting performance in the context of GS applied to plant breeding.

Modified Deep Neural Networks for Dog Breeds Identification

10.20944/preprints201812.0232.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Aydin Ayanzadeh ◽

Sahand Vahidnia

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

The State ◽

Fine Tuning ◽

Test Accuracy ◽

Data Sets ◽

Data Set

In this paper, we leverage state-of-the-art models on ImageNet data sets. We use the pre-trained models and learned weights to extract features from the dog breeds identification data set. Afterwards, we apply fine-tuning and data augmentation to increase the test accuracy in classification of the dog breeds data set. The performance of the proposed approaches is compared with state-of-the-art ImageNet models such as ResNet-50, DenseNet-121, DenseNet-169 and GoogleNet. We achieved 89.66%, 85.37%, 84.01% and 82.08% test accuracy, respectively, which shows the superior performance of the proposed method over previous work on the Stanford dog breeds data set.

Supposed Maximum Mutual Information for Improving Generalization and Interpretation of Multi-Layered Neural Networks

Journal of Artificial Intelligence and Soft Computing Research ◽

10.2478/jaiscr-2018-0029 ◽

2019 ◽

Vol 9 (2) ◽

pp. 123-147 ◽

Cited By ~ 5

Author(s):

Ryotaro Kamimura

Keyword(s):

Neural Networks ◽

Mutual Information ◽

Data Sets ◽

Data Set ◽

Information Theoretic ◽

Information Maximization ◽

Maximum Mutual Information ◽

Information Theoretic Method ◽

Mutual Information Maximization ◽

Inputs And Outputs

The present paper aims to propose a new type of information-theoretic method to maximize mutual information between inputs and outputs. The importance of mutual information in neural networks is well known, but the actual implementation of mutual information maximization has been quite difficult to undertake. In addition, mutual information has not extensively been used in neural networks, meaning that its applicability is very limited. To overcome the shortcoming of mutual information maximization, we present it here in a very simplified manner by supposing that mutual information is already maximized before learning, or at least at the beginning of learning. The method was applied to three data sets (crab data set, wholesale data set, and human resources data set) and examined in terms of generalization performance and connection weights. The results showed that by disentangling connection weights, maximizing mutual information made it possible to explicitly interpret the relations between inputs and outputs.

SatImNet: Structured and Harmonised Training Data for Enhanced Satellite Imagery Classification

Remote Sensing ◽

10.3390/rs12203358 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3358

Author(s):

Vasileios Syrris ◽

Ondrej Pesek ◽

Pierre Soille

Keyword(s):

Neural Networks ◽

Image Classification ◽

Supervised Classification ◽

Deep Neural Networks ◽

Satellite Image ◽

Data Retrieval ◽

Remote Sensing Image ◽

Training Data ◽

Data Sets ◽

Remote Sensing Image Classification

Automatic supervised classification with complex modelling such as deep neural networks requires the availability of representative training data sets. While there exists a plethora of data sets that can be used for this purpose, they are usually very heterogeneous and not interoperable. In this context, the present work has a twofold objective: (i) to describe procedures of open-source training data management, integration, and data retrieval, and (ii) to demonstrate the practical use of varying source training data for remote sensing image classification. For the former, we propose SatImNet, a collection of open training data, structured and harmonized according to specific rules. For the latter, two modelling approaches based on convolutional neural networks have been designed and configured to deal with satellite image classification and segmentation.

Artificial Intelligence Explained for Nonexperts

Seminars in Musculoskeletal Radiology ◽

10.1055/s-0039-3401041 ◽

2020 ◽

Vol 24 (01) ◽

pp. 003-011 ◽

Cited By ~ 1

Author(s):

Narges Razavian ◽

Florian Knoll ◽

Krzysztof J. Geras

Keyword(s):

Artificial Intelligence ◽

Neural Networks ◽

Clinical Practice ◽

Medical Imaging ◽

Deep Neural Networks ◽

Large Data ◽

Large Data Sets ◽

Natural Images ◽

Data Sets ◽

Current State

Artificial intelligence (AI) has made stunning progress in the last decade, made possible largely due to the advances in training deep neural networks with large data sets. Many of these solutions, initially developed for natural images, speech, or text, are now becoming successful in medical imaging. In this article we briefly summarize in an accessible way the current state of the field of AI. Furthermore, we highlight the most promising approaches and describe the current challenges that will need to be solved to enable broad deployment of AI in clinical practice.
