Average Contrastive Divergence for Training Restricted Boltzmann Machines

Entropy ◽  
2016 ◽  
Vol 18 (1) ◽  
pp. 35 ◽  
Author(s):  
Xuesi Ma ◽  
Xiaojie Wang
Author(s):  
BERGHOUT Tarek

Abstract: The main contribution of this paper is a new iterative training algorithm for restricted Boltzmann machines. The proposed learning path is inspired by the online sequential extreme learning machine, an extreme learning machine variant that handles time-accumulated sequences of data of fixed or varying size. Recursive least squares rules are integrated for weight adaptation to avoid learning rate tuning and local minimum issues. The proposed approach is compared with contrastive divergence, one of the well-known training algorithms for Boltzmann machines, in terms of time, accuracy, and algorithmic complexity under the same conditions. The results strongly favor the proposed rules for data reconstruction.
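
The abstract above does not give the update equations, so the following is only a minimal sketch of the generic recursive least squares update used in online sequential extreme learning machines, not the paper's own rules. The function names `rls_init` and `rls_update`, the ridge term `reg`, and the matrices `H` (hidden activations) and `T` (targets) are assumptions made for illustration. It shows how chunks of varying size can be absorbed one at a time without a learning rate.

```python
import numpy as np

def rls_init(H0, T0, reg=1e-3):
    """Initialise from a first chunk (hypothetical helper).

    H0: (n0, n_hidden) hidden activations, T0: (n0, n_out) targets.
    reg is an assumed small ridge term that keeps the inverse well defined.
    """
    P = np.linalg.inv(H0.T @ H0 + reg * np.eye(H0.shape[1]))
    W_out = P @ H0.T @ T0
    return P, W_out

def rls_update(P, W_out, H, T):
    """Absorb one new chunk (H, T) of any size, with no learning rate."""
    # Woodbury identity: update the inverse covariance without re-inverting it.
    K = P @ H.T @ np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P_new = P - K @ H @ P
    # Correct the weights by the prediction error on the new chunk.
    W_new = W_out + P_new @ H.T @ (T - H @ W_out)
    return P_new, W_new
```

Under these assumptions, feeding the chunks sequentially gives the same weights as solving the ridge-regularised least squares problem on all the data at once, which is why no step size needs tuning.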


2011 ◽  
Vol 23 (3) ◽  
pp. 664-673 ◽  
Author(s):  
Asja Fischer ◽  
Christian Igel

Optimization based on k-step contrastive divergence (CD) has become a common way to train restricted Boltzmann machines (RBMs). The k-step CD is a biased estimator of the log-likelihood gradient relying on Gibbs sampling. We derive a new upper bound for this bias. Its magnitude depends on k, the number of variables in the RBM, and the maximum change in energy that can be produced by changing a single variable. The last reflects the dependence on the absolute values of the RBM parameters. The magnitude of the bias is also affected by the distance in variation between the modeled distribution and the starting distribution of the Gibbs chain.
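
As a rough illustration of the k-step CD estimator described above, the sketch below computes one CD-k gradient estimate for a binary RBM with NumPy. The function name `cd_k_gradient`, the parameter layout (`W` of shape visible by hidden, visible bias `b`, hidden bias `c`), and the use of hidden probabilities in the statistics are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_gradient(v0, W, b, c, k=1, rng=None):
    """One k-step CD estimate of the log-likelihood gradient (sketch).

    v0: (batch, n_visible) binary data; W: (n_visible, n_hidden);
    b: visible bias; c: hidden bias. All names are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(v0 @ W + c)          # positive phase: p(h=1 | data)
    vk = v0                            # the Gibbs chain starts at the data
    for _ in range(k):
        ph = sigmoid(vk @ W + c)
        h = (rng.random(ph.shape) < ph).astype(v0.dtype)    # sample h | v
        pv = sigmoid(h @ W.T + b)
        vk = (rng.random(pv.shape) < pv).astype(v0.dtype)   # sample v | h
    phk = sigmoid(vk @ W + c)          # negative phase: p(h=1 | v_k)
    n = v0.shape[0]
    dW = (v0.T @ ph0 - vk.T @ phk) / n
    db = (v0 - vk).mean(axis=0)
    dc = (ph0 - phk).mean(axis=0)
    return dW, db, dc
```

Because the chain is stopped after k steps instead of being run to equilibrium, the negative phase is drawn from a distribution that still depends on the data, which is the source of the bias the abstract bounds.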


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Xuesi Ma ◽  
Xiaojie Wang

Contrastive Divergence has become a common way to train Restricted Boltzmann Machines; however, its convergence has not yet been made clear. This paper studies the convergence of the Contrastive Divergence algorithm. We relate the Contrastive Divergence algorithm to the gradient method with errors and derive its convergence conditions using the convergence theorem of the gradient method with errors. We give specific convergence conditions of the Contrastive Divergence learning algorithm for Restricted Boltzmann Machines in which both visible units and hidden units can only take a finite number of values. Two new convergence conditions are obtained by specifying the learning rate. Finally, we give specific conditions that the number of Gibbs sampling steps must satisfy in order to guarantee convergence of the Contrastive Divergence algorithm.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Guanglei Xu ◽  
William S. Oates

Abstract: Restricted Boltzmann Machines (RBMs) have been proposed for developing neural networks for a variety of unsupervised machine learning applications such as image recognition, drug discovery, and materials design. The Boltzmann probability distribution is used as a model to identify network parameters by optimizing the likelihood of predicting an output given hidden states trained on available data. Training such networks often requires sampling over a large probability space that must be approximated during gradient-based optimization. Quantum annealing has been proposed as a means to search this space more efficiently and has been experimentally investigated on D-Wave hardware. D-Wave implementation requires selection of an effective inverse temperature, or hyperparameter (β), within the Boltzmann distribution, which can strongly influence optimization. Here, we show how this parameter can be estimated as a hyperparameter applied to D-Wave hardware during neural network training by maximizing the likelihood or minimizing the Shannon entropy. We find that both methods improve the training of RBMs, based on experimental validation on D-Wave hardware for an image recognition problem. Neural network image reconstruction errors are evaluated using Bayesian uncertainty analysis, which shows more than an order of magnitude lower image reconstruction error with the maximum likelihood method than with manual optimization of the hyperparameter. The maximum likelihood method is also shown to outperform minimizing the Shannon entropy for image reconstruction.

