Variational Information Bottleneck for Unsupervised Clustering: Deep Gaussian Mixture Embedding

Yiğit Uğur; George Arvanitakis; Abdellatif Zaidi

doi:10.3390/e22020213

Entropy ◽

10.3390/e22020213 ◽

2020 ◽

Vol 22 (2) ◽

pp. 213 ◽

Cited By ~ 1

Author(s):

Yiğit Uğur ◽

George Arvanitakis ◽

Abdellatif Zaidi

Keyword(s):

Neural Networks ◽

Lower Bound ◽

Gradient Descent ◽

Gaussian Mixture ◽

Variational Inference ◽

Stochastic Gradient Descent ◽

Information Bottleneck ◽

Latent Space ◽

Type Algorithm ◽

The Cost

In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck and the Gaussian mixture model. Specifically, in our approach, we use the variational information bottleneck method and model the latent space as a mixture of Gaussians. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO) and provide a variational inference type algorithm that allows computing it. In the algorithm, the coders’ mappings are parametrized using neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets are provided to support the efficiency of our method.
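As a rough illustration of what modeling the latent space as a mixture of Gaussians makes possible, the sketch below computes posterior cluster probabilities for a single latent vector under a diagonal-covariance mixture. It is not the authors' algorithm: the function name `gmm_responsibilities`, the diagonal-covariance assumption, and the toy parameters are all hypothetical, and the neural encoder and the bound optimization described in the abstract are left out.

```python
import math


def gmm_responsibilities(u, weights, means, variances):
    """Posterior p(c | u) for a diagonal-covariance Gaussian mixture.

    u         : latent vector (list of floats)
    weights   : mixture weights, one per component, summing to 1
    means     : per-component mean vectors
    variances : per-component diagonal variances
    """
    log_joint = []
    for pi_c, mu_c, var_c in zip(weights, means, variances):
        # log N(u; mu_c, diag(var_c)) summed over latent dimensions
        log_density = sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(u, mu_c, var_c)
        )
        log_joint.append(math.log(pi_c) + log_density)
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(log_joint)
    unnormalized = [math.exp(value - peak) for value in log_joint]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]
```

Under these assumptions, a point would be assigned to the component with the largest returned probability, for example `gmm_responsibilities([0.1, -0.2], [0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])`.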

Optical Recognition of Handwritten Logic Formulas Using Neural Networks

Electronics ◽

10.3390/electronics10222761 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2761

Author(s):

Vaios Ampelakiotis ◽

Isidoros Perikos ◽

Ioannis Hatzilygeroudis ◽

George Tsihrintzis

Keyword(s):

Neural Networks ◽

Character Recognition ◽

Gradient Descent ◽

Feedforward Neural Networks ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Training Algorithms ◽

Gradient Descent Algorithm ◽

Two Stages ◽

And Training

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize handwritten first-order logic formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after testing three training algorithms (backpropagation, resilient propagation and stochastic gradient descent), we created and trained an NN with the stochastic gradient descent algorithm optimized by the Adam update rule, which proved to be the best, using a training set of 16,750 handwritten image samples of 28 × 28 each and a test set of 7947 samples. The final accuracy achieved is 90.13%. The general methodology followed consists of two stages: image processing, and NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows for creating new, user-oriented training sets and parameter settings, and thus new NN models.
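The Adam update rule mentioned above can be sketched on a toy problem. The code below is a minimal illustration rather than the authors' training setup: the helper names `adam_step` and `fit_mean`, the scalar least-squares objective, and the hyperparameter values (the commonly used Adam defaults, with an assumed learning rate of 0.01) are not taken from the paper.

```python
import math
import random


def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def fit_mean(data, steps=3000, seed=0):
    """Minimize the average of (theta - x)^2 using one random sample per step."""
    rng = random.Random(seed)
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        x = rng.choice(data)                      # stochastic "mini-batch" of size 1
        grad = 2.0 * (theta - x)                  # gradient of (theta - x)^2
        theta, m, v = adam_step(theta, grad, m, v, t)
    return theta
```

Calling `fit_mean([2.0, 3.0, 4.0])` should return a value close to the sample mean, with some residual fluctuation because the learning rate is held constant.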

A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Stochastic Systems ◽

10.1287/stsy.2021.0083 ◽

2021 ◽

Author(s):

Tianyi Liu ◽

Zhehui Chen ◽

Enlu Zhou ◽

Tuo Zhao

Keyword(s):

Neural Networks ◽

Nonconvex Optimization ◽

Gradient Descent ◽

Deep Neural Networks ◽

Optimization Problems ◽

Saddle Points ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Nonconvex Optimization Problems ◽

Empirical Success

The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks and variational Bayesian inference). Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that the momentum helps escape from saddle points but hurts the convergence within the neighborhood of optima (in the absence of step size annealing or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
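For readers unfamiliar with the update being analyzed, the following sketch runs a heavy-ball form of momentum SGD on a toy objective with a strict saddle at the origin and two isolated minima. Everything here is an illustrative assumption: the objective f(x, y) = 0.5x² − 0.5y² + 0.25y⁴, the Gaussian gradient noise, the step size, and the function names are not from the paper, and the code does not reproduce its diffusion analysis.

```python
import random


def grad(x, y):
    """Gradient of f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4.

    The origin is a strict saddle; the minima sit at (0, 1) and (0, -1).
    """
    return x, y ** 3 - y


def run_msgd(steps=2000, lr=0.01, momentum=0.9, noise_std=0.1, seed=0):
    """Heavy-ball momentum SGD with additive Gaussian gradient noise."""
    rng = random.Random(seed)
    x, y = 0.5, 1e-3          # start very close to the saddle along y
    vx, vy = 0.0, 0.0         # velocity (momentum buffer)
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx += rng.gauss(0.0, noise_std)   # stochastic gradient = true gradient + noise
        gy += rng.gauss(0.0, noise_std)
        vx = momentum * vx - lr * gx
        vy = momentum * vy - lr * gy
        x += vx
        y += vy
    return x, y
```

With a constant step size and momentum, the iterate leaves the saddle and then keeps fluctuating around one of the two minima rather than settling exactly on it, which is the regime where the abstract suggests annealing matters.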

Layer-Wise Compressive Training for Convolutional Neural Networks

Future Internet ◽

10.3390/fi11010007 ◽

2018 ◽

Vol 11 (1) ◽

pp. 7 ◽

Cited By ~ 3

Author(s):

Matteo Grimaldi ◽

Valerio Tenace ◽

Andrea Calimera

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Gradient Descent ◽

Computational Models ◽

Stochastic Gradient Descent ◽

Training Algorithm ◽

Heuristic Rules ◽

Human Capabilities ◽

Model Size ◽

Large Model

Convolutional Neural Networks (CNNs) are brain-inspired computational models designed to recognize patterns. Recent advances demonstrate that CNNs are able to achieve, and often exceed, human capabilities in many application domains. With several million parameters, even the simplest CNN has a large model size. This characteristic is a serious concern for deployment on resource-constrained embedded systems, where compression stages are needed to meet the stringent hardware constraints. In this paper, we introduce a novel accuracy-driven compressive training algorithm. It consists of a two-stage flow: first, layers are sorted by means of heuristic rules according to their significance; second, a modified stochastic gradient descent optimization is applied on less significant layers such that their representation is collapsed into a constrained subspace. Experimental results demonstrate that our approach achieves remarkable compression rates with low accuracy loss (<1%).

Regularized Instance Embedding for Deep Multi-Instance Learning

Applied Sciences ◽

10.3390/app10010064 ◽

2019 ◽

Vol 10 (1) ◽

pp. 64

Author(s):

Yi Lin ◽

Honggang Zhang

Keyword(s):

Neural Network ◽

Big Data ◽

Supervised Learning ◽

Regularization Method ◽

Gradient Descent ◽

State Of The Art ◽

Stochastic Gradient Descent ◽

Learning Framework ◽

Weakly Supervised ◽

The Cost

In the era of Big Data, multi-instance learning, as a weakly supervised learning framework, has various applications since it is helpful to reduce the cost of the data-labeling process. Due to this weakly supervised setting, learning effective instance representation/embedding is challenging. To address this issue, we propose an instance-embedding regularizer that can boost the performance of both instance- and bag-embedding learning in a unified fashion. Specifically, the crux of the instance-embedding regularizer is to maximize correlation between instance-embedding and underlying instance-label similarities. The embedding-learning framework was implemented using a neural network and optimized in an end-to-end manner using stochastic gradient descent. In experiments, various applications were studied, and the results show that the proposed instance-embedding-regularization method is highly effective, having state-of-the-art performance.

Application and Need-Based Architecture Design of Deep Neural Networks

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800142052014x ◽

2020 ◽

Vol 34 (13) ◽

pp. 2052014 ◽

Cited By ~ 1

Author(s):

Soniya ◽

Sandeep Paul ◽

Lotika Singh

Keyword(s):

Genetic Algorithm ◽

Network Structure ◽

Gradient Descent ◽

Stochastic Gradient Descent ◽

Number Of Layers ◽

Effective Manner ◽

Compact Genetic Algorithm ◽

Benchmark Datasets ◽

The Cost ◽

Optimal Set

This paper applies a hybrid evolutionary approach to a convolutional neural network (CNN) and determines the number of layers and filters based on the application and user need. It integrates a compact genetic algorithm with stochastic gradient descent (SGD) for simultaneously evolving the structure and parameters of the CNN. It defines an effectual string representation for combining the structure and parameters of the CNN. The compact genetic algorithm helps in the evolution of the network structure by optimizing the number of convolutional layers and the number of filters in each convolutional layer. At the same time, an optimal set of weight parameters of the network is obtained using the SGD law. This approach amalgamates exploration in network space by the compact genetic algorithm and exploitation in weight space with SGD in an effective manner. The proposed approach also incorporates user-defined parameters in the cost function in an elegant manner which controls the network structure and hence the performance of the network based on the user's need. The effectiveness of the proposed approach has been demonstrated on four benchmark datasets, namely MNIST, COIL-100, CIFAR-10 and CIFAR-100. The obtained results clearly demonstrate the potential of the proposed approach by evolving architectures based on the nature of the application and the need of the user.

Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rspa.2020.0334 ◽

2020 ◽

Vol 476 (2239) ◽

pp. 20200334 ◽

Cited By ~ 2

Author(s):

Ameya D. Jagtap ◽

Kenji Kawaguchi ◽

George Em Karniadakis

Keyword(s):

Neural Networks ◽

Adaptive Learning ◽

Gradient Descent ◽

Activation Function ◽

Stochastic Gradient Descent ◽

Activation Functions ◽

Gradient Descent Algorithm ◽

Locally Adaptive ◽

The Matrix ◽

Base Method

We propose two approaches to locally adaptive activation functions, namely layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of the activation function is achieved by introducing a scalable parameter in each layer (layer-wise) and for every neuron (neuron-wise) separately, and then optimizing it using a variant of the stochastic gradient descent algorithm. In order to further increase the training speed, an activation slope-based slope recovery term is added in the loss function, which further accelerates convergence, thereby reducing the training cost. On the theoretical side, we prove that in the proposed method, the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method is not achievable by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate the convergence by implicitly multiplying conditioning matrices to the gradient of the base method without any explicit computation of the conditioning matrix and the matrix–vector product. The different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with the slope recovery are shown to accelerate the training process.
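The idea of a trainable slope inside the activation can be illustrated with a single unit. The sketch below assumes an activation of the form tanh(n · a · z) with a fixed scale factor n and a trainable parameter a that is updated alongside the weight w. The specific form, the initial values (chosen here so that n · a = 1 at the start), the toy target, and the use of plain full-batch gradient descent instead of a stochastic variant are all assumptions for illustration; the slope recovery term from the abstract is omitted.

```python
import math


def train_adaptive_tanh(steps=200, lr=0.1, n=2.0):
    """Fit y = tanh(2x) with one unit tanh(n * a * w * x), training both w and a."""
    xs = [-1.0, -0.5, -0.25, 0.25, 0.5, 1.0]
    ys = [math.tanh(2.0 * x) for x in xs]
    w, a = 0.5, 0.5                      # n * a = 1 at initialization
    losses = []
    for _ in range(steps):
        grad_w = 0.0
        grad_a = 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            z = w * x                    # pre-activation
            out = math.tanh(n * a * z)   # adaptive activation
            err = out - y
            loss += err * err / len(xs)
            # Chain rule through tanh: d(out)/d(arg) = 1 - out^2
            common = 2.0 * err * (1.0 - out * out) * n / len(xs)
            grad_w += common * a * x     # d(arg)/dw = n * a * x
            grad_a += common * z         # d(arg)/da = n * z
        losses.append(loss)
        w -= lr * grad_w
        a -= lr * grad_a
    return w, a, losses
```

In this toy setting the slope parameter a grows during training because the target is steeper than the initial activation, so part of the fitting is done by the activation itself instead of by the weight alone.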

Decoding Photons: Physics in the Latent Space of a BIB-AE Generative Network

EPJ Web of Conferences ◽

10.1051/epjconf/202125103003 ◽

2021 ◽

Vol 251 ◽

pp. 03003

Author(s):

Erik Buhmann ◽

Sascha Diefenbacher ◽

Engin Eren ◽

Frank Gaede ◽

Gregor Kasieczka ◽

...

Keyword(s):

Neural Networks ◽

Data Collection ◽

High Accuracy ◽

Fast Simulation ◽

Information Bottleneck ◽

Latent Space ◽

Future Collider

Given the increasing data collection capabilities and limited computing resources of future collider experiments, interest in using generative neural networks for the fast simulation of collider events is growing. In our previous study, the Bounded Information Bottleneck Autoencoder (BIB-AE) architecture for generating photon showers in a high-granularity calorimeter showed high-accuracy modeling of various global differential shower distributions. In this work, we investigate how the BIB-AE encodes this physics information in its latent space. Our understanding of this encoding allows us to propose methods to optimize the generation performance further, for example, by altering latent space sampling or by suggesting specific changes to hyperparameters. In particular, we improve the modeling of the shower shape along the particle incident axis.

Deep Convolutional Spiking Neural Networks for Image Classification

10.18122/td.1782.boisestate ◽

2021 ◽

Author(s):

Ruthvik Vaila

Keyword(s):

Neural Network ◽

Neural Networks ◽

Artificial Neural Networks ◽

Gradient Descent ◽

Stochastic Gradient ◽

Spiking Neural Networks ◽

Stochastic Gradient Descent ◽

Data Set ◽

Learning Capabilities ◽

Artificial Neural

Spiking neural networks are biologically plausible counterparts of artificial neural networks. Artificial neural networks are usually trained with stochastic gradient descent (SGD), and spiking neural networks are trained with bioinspired spike timing dependent plasticity (STDP). Spiking networks could potentially help in reducing power usage owing to their binary activations. In this work, we use unsupervised STDP in the feature extraction layers of a neural network with instantaneous neurons to extract meaningful features. The extracted binary feature vectors are then classified using classification layers containing neurons with binary activations. Gradient descent (backpropagation) is used only on the output layer to perform training for classification. Surrogate gradients are proposed to perform backpropagation with binary gradients. The accuracies obtained for MNIST and the balanced EMNIST data set compare favorably with other approaches. The effect of the stochastic gradient descent (SGD) approximations on the learning capabilities of our network is also explored. We also studied catastrophic forgetting and its effect on spiking neural networks (SNNs). For the experiments regarding catastrophic forgetting, in the classification sections of the network we use a modified synaptic intelligence, which we refer to as the cost per synapse metric, as a regularizer to immunize the network against catastrophic forgetting in a Single-Incremental-Task (SIT) scenario. In the catastrophic forgetting experiments, we use the MNIST and EMNIST handwritten digits datasets, which were divided into five and ten incremental subtasks respectively. We also examine the behavior of the spiking neural network and empirically study the effect of various hyperparameters on its learning capabilities using the software tool SPYKEFLOW that we developed. We employ the MNIST, EMNIST and NMNIST data sets to produce our results.

THE USE OF CONTROL THEORY METHODS IN TRAINING NEURAL NETWORKS ON THE EXAMPLE OF TEETH RECOGNITION ON PANORAMIC X-RAY IMAGES

Automation technological and business processes ◽

10.15673/atbp.v13i2.2055 ◽

2021 ◽

Vol 13 (2) ◽

pp. 36-40

Author(s):

A. Smorodin

Keyword(s):

Neural Networks ◽

Control Theory ◽

Gradient Descent ◽

Deep Neural Networks ◽

Discrete Dynamical System ◽

Stochastic Gradient Descent ◽

Network Training ◽

Panoramic Images ◽

Important Field ◽

New Algorithms

The article investigated a modification of stochastic gradient descent (SGD), based on the previously developed stabilization theory of discrete dynamical system cycles. The relation between stabilization of cycles in discrete dynamical systems and finding extremum points allowed us to apply new control methods to accelerate gradient descent when approaching local minima. Gradient descent is often used in training deep neural networks on a par with other iterative methods. Two gradient-based methods, SGD and Adam, were used in comparative experiments. All experiments were conducted while solving the practical problem of teeth recognition on 2-D panoramic images. Network training showed that the new method outperforms SGD and, for the parameters chosen, approaches the capabilities of Adam, which is a “state of the art” method. Thus, the practical utility of using control theory in the training of deep neural networks, and the possibility of expanding its applicability in the process of creating new algorithms in this important field, are shown.

Asymptotics of Reinforcement Learning with Neural Networks

Stochastic Systems ◽

10.1287/stsy.2021.0072 ◽

2021 ◽

Author(s):

Justin Sirignano ◽

Konstantinos Spiliopoulos

Keyword(s):

Differential Equation ◽

Neural Networks ◽

Stationary Solution ◽

Gradient Descent ◽

Learning Algorithm ◽

Single Layer ◽

Stochastic Gradient Descent ◽

Distributed Data ◽

Limiting Behavior ◽

Q Learning

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
