Probabilistic Autoencoder Using Fisher Information

Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1640
Author(s):  
Johannes Zacherl ◽  
Philipp Frank ◽  
Torsten A. Enßlin

Neural networks play a growing role in many scientific disciplines, including physics. Variational autoencoders (VAEs) are neural networks that represent the essential information of a high-dimensional data set in a low-dimensional latent space that has a probabilistic interpretation. In particular, the so-called encoder network, the first part of the VAE, which maps its input onto a position in latent space, additionally provides uncertainty information in terms of a variance around this position. In this work, an extension of the autoencoder architecture is introduced: the FisherNet. In this architecture, the latent space uncertainty is not generated by an additional information channel in the encoder but is derived from the decoder by means of the Fisher information metric. This architecture has advantages from a theoretical point of view, as it provides a direct uncertainty quantification derived from the model and also accounts for uncertainty cross-correlations. We show experimentally that the FisherNet produces more accurate data reconstructions than a comparable VAE and that its learning performance apparently scales better with the number of latent space dimensions.
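The core idea, deriving latent-space uncertainty from the decoder via the Fisher information metric, can be sketched numerically. Assuming a Gaussian decoder likelihood with noise level sigma, the metric is G(z) = J(z)^T J(z) / sigma^2, where J is the decoder Jacobian. The toy linear decoder and the parameter values below are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

def decoder(z, A):
    # Toy linear decoder f(z) = A @ z (stand-in for a trained network)
    return A @ z

def fisher_metric(f, z, sigma=0.1, eps=1e-6):
    """Latent Fisher metric G(z) = J^T J / sigma^2 for a Gaussian decoder,
    with the Jacobian J estimated by central finite differences."""
    d = z.size
    f0 = f(z)
    J = np.zeros((f0.size, d))
    for i in range(d):
        dz = np.zeros(d)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f(z - dz)) / (2 * eps)
    return J.T @ J / sigma**2

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
z = rng.standard_normal(2)
G = fisher_metric(lambda z: decoder(z, A), z)
# For a linear decoder the metric is exactly A^T A / sigma^2
assert np.allclose(G, A.T @ A / 0.1**2, atol=1e-4)
```

For a nonlinear decoder the same routine gives a position-dependent metric, which is what supplies the per-sample uncertainty and its cross-correlations.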

2021 ◽  
Vol 26 (jai2021.26(1)) ◽  
pp. 32-41
Author(s):  
Bodyanskiy Y ◽  
Antonenko T
Modern deep neural networks face a number of issues related to the learning process and computational cost. This article considers an architecture grounded on an alternative approach to the basic unit of the neural network. This approach optimizes the calculations and gives rise to an alternative way to address the vanishing and exploding gradient problems. The main subject of the article is the deep stacked neo-fuzzy system, which uses a generalized neo-fuzzy neuron to optimize the learning process. Since this approach is non-standard from a theoretical point of view, the paper presents the necessary mathematical calculations and describes the intricacies of using this architecture from a practical point of view. From a theoretical point, the network learning process is fully disclosed, and all calculations needed to train the network with the backpropagation algorithm are derived. A feature of the network is the rapid calculation of the derivative of the neurons' activation functions, achieved through the use of fuzzy membership functions. The paper shows that the derivative of such a function is a constant, which supports the claim of a higher optimization rate compared with neural networks that use neurons with more common activation functions (ReLU, sigmoid). The paper highlights the main points that can be improved in further theoretical developments on this topic; in general, these concern the calculation of the activation function. The proposed methods address these points and allow approximation using the network, and the authors already have theoretical justifications for improving the speed and approximation properties of the network. The results of comparing the proposed network with standard neural network architectures are shown.
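A minimal sketch of a neo-fuzzy neuron of the kind the article builds on, assuming triangular membership functions on a fixed grid (the function names, grid, and weights are illustrative): the output is linear in the weights, so the gradient needed by backpropagation is just the membership values, with no costly derivative evaluation.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Memberships of scalar x in triangular fuzzy sets centred at `centers`.
    Neighbouring triangles overlap, so memberships sum to 1 inside the grid."""
    mu = np.zeros(len(centers))
    for j, c in enumerate(centers):
        left = centers[j - 1] if j > 0 else -np.inf
        right = centers[j + 1] if j < len(centers) - 1 else np.inf
        if left < x <= c:
            mu[j] = 1.0 if left == -np.inf else (x - left) / (c - left)
        elif c < x < right and right != np.inf:
            mu[j] = (right - x) / (right - c)
    return mu

def neo_fuzzy_neuron(x, centers, weights):
    """y = sum_i sum_j mu_j(x_i) * w_ij: linear in the weights, so
    dy/dw_ij = mu_j(x_i), a piecewise-constant quantity in x_i."""
    return sum(triangular_memberships(xi, centers) @ w_i
               for xi, w_i in zip(x, weights))

centers = np.array([0.0, 0.5, 1.0])
w = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
y = neo_fuzzy_neuron([0.3, 0.8], centers, w)  # -> 0.72
```

Because at most two memberships are non-zero per input, both the forward pass and the weight update touch only a handful of parameters per sample.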


Author(s):  
Christopher Morris ◽  
Martin Ritzert ◽  
Matthias Fey ◽  
William L. Hamilton ◽  
Jan Eric Lenssen ◽  
...  

In recent years, graph neural networks (GNNs) have emerged as a powerful neural architecture to learn vector representations of nodes and graphs in a supervised, end-to-end fashion. Up to now, GNNs have only been evaluated empirically, showing promising results. The following work investigates GNNs from a theoretical point of view and relates them to the 1-dimensional Weisfeiler-Leman graph isomorphism heuristic (1-WL). We show that GNNs have the same expressiveness as the 1-WL in terms of distinguishing non-isomorphic (sub-)graphs; hence, both algorithms also have the same shortcomings. Based on this, we propose a generalization of GNNs, so-called k-dimensional GNNs (k-GNNs), which can take higher-order graph structures at multiple scales into account. These higher-order structures play an essential role in the characterization of social networks and molecule graphs. Our experimental evaluation confirms our theoretical findings and shows that higher-order information is useful in graph classification and regression tasks.
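The equivalence with 1-WL can be made concrete with colour refinement: the colour histograms it produces are the most a message-passing GNN of matching depth can distinguish. A minimal sketch (the adjacency-dict representation is an illustrative choice), including a classic pair of graphs that 1-WL, and hence a plain GNN, cannot tell apart:

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman colour refinement.
    adj: dict node -> list of neighbours. Returns the final colour histogram."""
    colors = {v: 0 for v in adj}  # uniform initial colouring
    for _ in range(rounds):
        # New colour = own colour plus sorted multiset of neighbour colours
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in adj}
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return Counter(colors.values())

# A 6-cycle vs. two disjoint triangles: both are 2-regular, so 1-WL
# produces identical colour histograms even though they are non-isomorphic.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
assert wl_colors(cycle6) == wl_colors(two_triangles)
```

The k-GNNs of the paper escape this limitation by colouring k-tuples of nodes rather than single nodes.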


Author(s):  
Lutz Bellmann ◽  
Hans-Dieter Gerner ◽  
Ute Leber

Summary: Even though the 2008/09 economic crisis had only minor employment effects on the German labor market, it might have affected firms' further training and apprenticeship training behavior. From a theoretical point of view, the impact of the business cycle on firms' training behavior is ambiguous. There are reasons for an increase of training during a downturn (e.g., declining opportunity costs of training, fewer exit options for trained workers) as well as arguments for a decrease of training (e.g., uncertain future benefits of training). The existing empirical evidence on the relationship between training and economic downturns is relatively scarce. In particular, we are not aware of any empirical study investigating the effects of the most recent crisis on firms' training activities in Germany. Our paper aims to fill this gap by using data from the IAB Establishment Panel, a representative German panel data set with annual information about almost 16,000 establishments. In particular, we analyzed the provision and the intensity of further training and apprenticeship training in firms which were affected by the crisis and in those which were not. Our empirical investigation revealed that establishments, irrespective of whether or not they were hit by the economic crisis, decreased their further training and apprenticeship training efforts in 2009 compared to 2008. However, establishments directly affected by the Great Recession tended to reduce their training activities more often than those which were not affected. Furthermore, we found much stronger variations in the development of firms' further training activities than in the development of their apprenticeship training.


2021 ◽  
Vol 12 (1) ◽  
pp. 76
Author(s):  
Juan A. Marin-Garcia ◽  
Angel Ruiz ◽  
Maheut Julien ◽  
Jose P. Garcia-Sabater

<p class="Abstract">This paper presents the generation of a plausible data set related to the needs of COVID-19 patients with severe or critical symptoms. Possible illness’ stages were proposed within the context of medical knowledge as of January 2021. The parameters chosen in this data set were customized to fit the population data of the Valencia region (Spain) with approximately 2.5 million inhabitants. They were based on the evolution of the pandemic between September 2020 and March 2021, a period that included two complete waves of the pandemic.</p><p class="Abstract">Contrary to expectation and despite the European and national transparency laws (BOE-A2013-12887, 2013; European Parliament and Council of the European Union, 2019), the actual COVID-19 pandemic-related data, at least in Spain, took considerable time to be updated and made available (usually a week or more). Moreover, some relevant data necessary to develop and validate hospital bed management models were not publicly accessible. This was either because these data were not collected, because public agencies failed to make them public (despite having them indexed in their databases), the data were processed within indicators and not shown as raw data, or they simply published the data in a format that was difficult to process (e.g., PDF image documents versus CSV tables). Despite the potential of hospital information systems, there were still data that were not adequately captured within these systems.</p><p class="Abstract">Moreover, the data collected in a hospital depends on the strategies and practices specific to that hospital or health system. This limits the generalization of "real" data, and it encourages working with "realistic" or plausible data that are clean of interactions with local variables or decisions (Gunal, 2012; Marin-Garcia et al., 2020). 
Besides, one can parameterize the model and define the data structure that would be necessary to run the model without delaying till the real data become available. Conversely, plausible data sets can be generated from publicly available information and, later, when real data become available, the accuracy of the model can be evaluated (Garcia-Sabater and Maheut, 2021).</p><p class="Abstract">This work opens lines of future research, both theoretical and practical. From a theoretical point of view, it would be interesting to develop machine learning tools that, by analyzing specific data samples in real hospitals, can identify the parameters necessary for the automatic prototyping of generators adapted to each hospital. Regarding the lines of research applied, it is evident that the formalism proposed for the generation of sound patients is not limited to patients affected by SARS-CoV-2 infection. The generation of heterogeneous patients can represent the needs of a specific population and serve as a basis for studying complex health service delivery systems.</p><p class="Abstract"> </p><p class="Abstract"> </p>
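The staged-illness idea behind such a generator can be sketched as a simple Markov chain over care states. The stage names and transition probabilities below are illustrative placeholders, not the paper's calibrated Valencia-region parameters:

```python
import random

# Illustrative daily transition probabilities between care states;
# real values would be fitted to published epidemic data.
TRANSITIONS = {
    "ward": [("ward", 0.70), ("icu", 0.15), ("discharge", 0.13), ("death", 0.02)],
    "icu":  [("icu", 0.80), ("ward", 0.12), ("death", 0.08)],
}

def generate_patient(rng, start="ward", max_days=60):
    """Generate one plausible patient trajectory as a list of daily states,
    ending at an absorbing state (discharge/death) or at max_days."""
    path = [start]
    while path[-1] in TRANSITIONS and len(path) < max_days:
        states, probs = zip(*TRANSITIONS[path[-1]])
        path.append(rng.choices(states, probs)[0])
    return path

rng = random.Random(42)
cohort = [generate_patient(rng) for _ in range(1000)]
```

Daily bed occupancy per state can then be tallied from such a cohort, giving exactly the kind of input a bed-management model needs before real data arrive.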


Author(s):  
O. Uzhga-Rebrov

The uncertainty of probabilistic evaluations results from a lack of sufficient information and/or knowledge about the underlying random events. Representing this uncertainty in the form of a second-order probability distribution or interval evaluations raises no objections from a theoretical point of view. Moreover, what makes second-order probabilities valuable is that they allow one to model the real uncertainty of subjective probabilistic evaluations resulting from the lack of information and/or knowledge. Processing uncertain information about probabilistic evaluations can help make a validated decision about collecting additional information aimed at removing completely, or at least reducing, the existing uncertainty.
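One concrete way to realise a second-order probability is a Beta distribution over an unknown probability p, summarised by a mean and an interval; the parameter values below are illustrative. More observations sharpen the second-order distribution, mirroring the point about gathering additional information to reduce uncertainty:

```python
import random
import statistics

def second_order_summary(alpha, beta, n=100_000, seed=0):
    """Represent uncertainty about a probability p by a Beta(alpha, beta)
    second-order distribution; return its mean and a 90% interval
    estimated by Monte Carlo sampling."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n))
    mean = statistics.fmean(draws)
    lo, hi = draws[int(0.05 * n)], draws[int(0.95 * n)]
    return mean, (lo, hi)

# Few observations -> wide interval; more evidence -> tighter interval
_, wide = second_order_summary(2, 2)
_, narrow = second_order_summary(20, 20)
assert narrow[1] - narrow[0] < wide[1] - wide[0]
```

The interval width itself is a usable decision signal: if it stays above a tolerance, collect more data before committing to a choice.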


2021 ◽  
pp. 1-40
Author(s):  
Bradley C. Wallet ◽  
Thang N. Ha

Seismic attributes are a well-established method for highlighting subtle features in seismic data to improve interpretability and suitability for quantitative analysis. They are an enabling technology in areas such as thin-bed analysis, geobody extraction, and seismic geomorphology. Seismic attributes are mathematical functions of the data designed to exploit geologic and/or geophysical principles and provide meaningful information about underlying processes. They often suffer from an "abundance of riches": the high dimensionality of seismic attributes can make even simple tasks difficult. Spectral decomposition, for instance, typically produces tens and sometimes hundreds of attributes, yet when it comes to visualization we are limited to viewing three or at most four attributes simultaneously. We have developed a deep-learning-based approach to latent space analysis. This method is superior to other methods in that it focuses on capturing essential information rather than only on probability density functions or clusters, and it provides a quantitative way to assess the fit of the latent space to the original data. We apply our method to a data set from the Canterbury Basin, New Zealand. This data set contains a turbidite system and has been the subject of several other papers. We examine the goodness of fit of our model by comparing the input data to what can be reproduced, and we provide an interpretation based upon our method.


2021 ◽  
Vol 17 (6) ◽  
pp. e1009052
Author(s):  
Adarsh Chitradurga Achutha ◽  
Herbert Peremans ◽  
Uwe Firzlaff ◽  
Dieter Vanderelst

In most animals, natural stimuli are characterized by a high degree of redundancy, limiting the ensemble of ecologically valid stimuli to a significantly reduced subspace of the representation space. Neural encodings can exploit this redundancy and increase sensing efficiency by generating low-dimensional representations that retain all information essential to support behavior. In this study, we investigate whether such an efficient encoding can be found to support a broad range of echolocation tasks in bats. Starting from an ensemble of echo signals collected with a biomimetic sonar system in natural indoor and outdoor environments, we use independent component analysis to derive a low-dimensional encoding of the output of a cochlear model. We show that this compressive encoding retains all essential information. To this end, we simulate a range of psycho-acoustic experiments with bats. In these simulations, we train a set of neural networks to use the encoded echoes as input while performing the experiments. The results show that the neural networks' performance is at least as good as that of the bats. We conclude that efficient encoding of echo information is feasible and, given its many advantages, very likely to be employed by bats. Previous studies have demonstrated that low-dimensional encodings allow for task resolution at a relatively high level. In contrast to previous work in this area, we show that high performance can also be achieved when low-dimensional filters are derived from a data set of realistic echo signals, not tailored to specific experimental conditions.
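The compressive-encoding step can be sketched with a linear dimensionality reduction. PCA via SVD is used below as a simple stand-in for the independent component analysis in the paper, and the dimensions are illustrative; when the signal ensemble truly lives in a low-dimensional subspace, a matching number of components reconstructs it with no loss:

```python
import numpy as np

def fit_encoding(X, k):
    """Fit a k-dimensional linear encoding (PCA via SVD here, as a
    simple stand-in for ICA): returns the data mean and k filters."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def encode(X, mean, W):
    return (X - mean) @ W.T

def decode(Z, mean, W):
    return Z @ W + mean

# Redundant signals: 50-dim observations generated from 3 latent sources
rng = np.random.default_rng(1)
sources = rng.standard_normal((500, 3))
X = sources @ rng.standard_normal((3, 50))
mean, W = fit_encoding(X, k=3)
X_hat = decode(encode(X, mean, W), mean, W)
assert np.allclose(X, X_hat, atol=1e-6)  # 3 components capture everything
```

Real echo ensembles are only approximately low-dimensional, so the practical question, which the paper answers behaviourally, is whether the residual discards anything a task needs.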


Author(s):  
Jamilu Adamu

Activation functions are crucial parts of deep learning artificial neural networks. From the biological point of view, a neuron is just a node with many inputs and one output, and a neural network consists of many interconnected neurons. It is a "simple" device that receives data at the input and provides a response. The function of neurons is to process and transmit information; the neuron is the basic unit in the nervous system. Carly Vandergriendt (2018) stated that the human brain at birth consists of an estimated 100 billion neurons. The ability of a machine to mimic human intelligence is called machine learning. Deep learning artificial neural networks were designed to work like a human brain with the aid of an arbitrary choice of non-linear activation functions. Currently, there is no rule of thumb for the choice of activation functions beyond "try out different things and see what combinations lead to the best performance"; however, the choice of activation functions should not be trial and error. Jamilu (2019) proposed that activation functions should emanate from an AI-ML-purified data set and that their choice should satisfy Jameel's ANNAF Stochastic and/or Deterministic Criterion. The objective of this paper is to propose instances where deep learning artificial neural networks are superintelligent. Using Jameel's ANNAF Stochastic and/or Deterministic Criterion, the paper proposes four classes where deep learning artificial neural networks are superintelligent, namely: Stochastic Superintelligent, Deterministic Superintelligent, and Stochastic-Deterministic 1st and 2nd Levels Superintelligence. Also, a Normal Probabilistic-Deterministic case is proposed.


2021 ◽  
pp. 1-34
Author(s):  
Ryo Karakida ◽  
Shotaro Akaho ◽  
Shun-ichi Amari

Abstract The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth, and sample size when the network has random weights and is sufficiently wide. This study covers two widely used FIMs: for regression with linear output and for classification with softmax output. Both FIMs asymptotically show pathological eigenvalue spectra in the sense that a small number of eigenvalues become large outliers depending on the width or sample size, while the others are much smaller. This implies that the local shape of the parameter space or loss landscape is very sharp in a few specific directions while almost flat in the other directions. In particular, the softmax output disperses the outliers and makes a tail of the eigenvalue density spread from the bulk. We also show that pathological spectra appear in other variants of FIMs: one is the neural tangent kernel; another is a metric for the input signal and feature space that arises from feedforward signal propagation. Thus, we provide a unified perspective on the FIM and its variants that will lead to a more quantitative understanding of learning in large-scale DNNs.
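The pathological spectrum is easy to observe numerically. As a simplified sketch (width and sample counts are arbitrary choices, and the FIM is restricted to the output-layer weights rather than all parameters), for regression with linear output the empirical FIM reduces to F = H^T H / n with H the hidden activations of a random ReLU network; a few eigenvalues stand out far above the bulk:

```python
import numpy as np

rng = np.random.default_rng(0)
width, n_samples, d_in = 1000, 2000, 100

# Hidden activations of a random one-hidden-layer ReLU network
X = rng.standard_normal((n_samples, d_in))
W1 = rng.standard_normal((d_in, width)) / np.sqrt(d_in)
H = np.maximum(X @ W1, 0.0)

# Empirical FIM restricted to the linear output layer: F = H^T H / n
F = H.T @ H / n_samples
eig = np.linalg.eigvalsh(F)  # ascending order

# The largest eigenvalue dwarfs the bulk: the ReLU activations have a
# non-zero mean, which injects a near-rank-one component of order `width`.
assert eig[-1] > 50 * np.median(eig)
```

This matches the qualitative picture in the abstract: the loss landscape is extremely sharp along a handful of directions and nearly flat along the rest.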


1995 ◽  
Vol 10 ◽  
pp. 437-438
Author(s):  
F. Spite

The exact title of the JD 11 was: “stellar and interstellar lithium and primordial nucleosynthesis”. The large amount of work recently done on lithium provided an incentive for a discussion among the members of several commissions of the IAU. Lithium is a peculiar element. Since it is not produced in supernovae (at least such a production is not proven and would be quite different from the production of the other elements), its presence in old material is a legacy of the primordial nucleosynthesis. But lithium is a fragile element, and from a theoretical point of view, there are arguments tending to conclude that, in old stars, this legacy has been depleted. This difficult problem is a real challenge and has been the motivation for many different works. The analysis of the lithium behavior in well-known stars of all kinds of ages, metallicities, structures, peculiarities, etc. is therefore extremely useful in order to understand the physical processes at work in lithium depletion, and the reader will find here many up-to-date data. The analysis of lithium in interstellar material provides essential information. Many works about lithium are in progress throughout the world on the different points of interest, so that the General Assembly of the IAU was an excellent occasion to review the recent progress made in different areas. A summary of (nearly) each communication made during the Joint Discussion 11 may be found hereafter. Longer summaries of the talks, and summaries of the posters, will be published in a forthcoming volume of the Memorie della Società Astronomica Italiana.

