A Neural Network MCMC Sampler That Maximizes Proposal Entropy

Zengyi Li; Yubei Chen; Friedrich T. Sommer

doi:10.3390/e23030269

Entropy; 2021; Vol 23 (3); pp. 269

Keyword(s): Neural Network; Neural Networks; Monte Carlo; Markov Chain; Network Architecture; Probability Distributions; Natural Images; MCMC Methods; Proposal Distribution; Target Distribution

Markov Chain Monte Carlo (MCMC) methods sample from unnormalized probability distributions and offer guarantees of exact sampling. However, in the continuous case, unfavorable geometry of the target distribution can greatly limit the efficiency of MCMC methods. Augmenting samplers with neural networks can potentially improve their efficiency. Previous neural network-based samplers were trained with objectives that either did not explicitly encourage exploration or contained a term that encouraged exploration only for well-structured distributions. Here we propose to maximize proposal entropy to adapt the proposal to distributions of any shape. To optimize proposal entropy directly, we devised a neural network MCMC sampler with a flexible and tractable proposal distribution. Specifically, our network architecture uses the gradient of the target distribution to generate proposals. Our model achieved significantly higher efficiency than previous neural network MCMC techniques in a variety of sampling tasks, sometimes by more than an order of magnitude. Further, the sampler was demonstrated through the training of a convergent energy-based model of natural images. The adaptive sampler achieved unbiased sampling with significantly higher proposal entropy than a Langevin dynamics sampler. The trained sampler also achieved better sample quality.
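The Langevin dynamics baseline mentioned above builds its proposal from the gradient of the log target and relies on a Metropolis-Hastings correction for unbiased sampling. The sketch below shows one common form of such a sampler, the Metropolis-adjusted Langevin algorithm (MALA), assuming NumPy and a user-supplied `log_p`/`grad_log_p` pair (hypothetical names); it is the hand-designed baseline, not the authors' neural network sampler. Its Gaussian proposal has the fixed covariance 2εI, so its entropy, computed by `gaussian_proposal_entropy`, does not adapt to the local geometry of the target.

```python
import numpy as np

def mala_step(x, log_p, grad_log_p, eps, rng):
    """One Metropolis-adjusted Langevin step for an unnormalized log density log_p."""
    def log_q(x_to, x_from):
        # Log of the Gaussian proposal N(x_from + eps * grad, 2 * eps * I), up to a constant.
        mean = x_from + eps * grad_log_p(x_from)
        return -np.sum((x_to - mean) ** 2) / (4.0 * eps)

    # Gradient-informed proposal: drift toward higher density plus isotropic Gaussian noise.
    x_new = x + eps * grad_log_p(x) + np.sqrt(2.0 * eps) * rng.standard_normal(x.shape)
    # Metropolis-Hastings correction; the unknown normalizing constant of the target cancels.
    log_alpha = log_p(x_new) + log_q(x, x_new) - log_p(x) - log_q(x_new, x)
    if np.log(rng.uniform()) < log_alpha:
        return x_new, True
    return x, False

def gaussian_proposal_entropy(dim, eps):
    """Entropy (nats) of N(mean, 2*eps*I): (dim/2) * log(2*pi*e*2*eps), identical at every x."""
    return 0.5 * dim * np.log(2.0 * np.pi * np.e * 2.0 * eps)

# Usage: an ill-conditioned 2-D Gaussian target (standard deviations 1 and 0.05).
scales = np.array([1.0, 0.05])
log_p = lambda x: -0.5 * np.sum((x / scales) ** 2)
grad_log_p = lambda x: -x / scales ** 2

rng = np.random.default_rng(0)
x, accepted, samples = np.zeros(2), 0, []
for _ in range(5000):
    x, acc = mala_step(x, log_p, grad_log_p, eps=1e-3, rng=rng)
    accepted += acc
    samples.append(x)
samples = np.array(samples)
```

On this target, a step size small enough for the narrow direction moves very slowly along the wide one, which is the kind of unfavorable geometry the abstract refers to.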

The Hastings algorithm at fifty

Biometrika; doi:10.1093/biomet/asz066; 2019; Vol 107 (1); pp. 1-23; Cited by ~4

Author(s): D B Dunson; J E Johndrow

Keyword(s): Markov Chain; Likelihood Function; Marginal Likelihood; Broad Class; Probability Distributions; 50th Anniversary; Monte Carlo Sampling; Proposal Distribution; Target Distribution; Sampling Algorithms

In a 1970 Biometrika paper, W. K. Hastings developed a broad class of Markov chain algorithms for sampling from probability distributions that are difficult to sample from directly. The algorithm draws a candidate value from a proposal distribution and accepts the candidate with a probability that can be computed using only the unnormalized density of the target distribution, allowing one to sample from distributions known only up to a constant of proportionality. The stationary distribution of the corresponding Markov chain is the target distribution one is attempting to sample from. The Hastings algorithm generalizes the Metropolis algorithm to allow a much broader class of proposal distributions, not just symmetric ones. An important class of applications for the Hastings algorithm is sampling from Bayesian posterior distributions, whose densities are given by a prior density multiplied by a likelihood function and divided by a normalizing constant equal to the marginal likelihood. The marginal likelihood is typically intractable, presenting a fundamental barrier to implementation in Bayesian statistics. This barrier can be overcome by Markov chain Monte Carlo sampling algorithms. Remarkably, even after 50 years, the majority of algorithms used in practice today involve the Hastings algorithm. This article provides a brief celebration of the continuing impact of this ingenious algorithm on the 50th anniversary of its publication.
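The accept/reject rule summarized above uses the acceptance probability $\alpha(x, x') = \min\{1, \frac{p(x')\,q(x \mid x')}{p(x)\,q(x' \mid x)}\}$, in which any normalizing constant of $p$ cancels. The following minimal NumPy sketch uses a deliberately asymmetric proposal, a multiplicative log-normal random walk chosen here for illustration, so that the Hastings correction term is visible; with a symmetric proposal the correction would vanish and the rule would reduce to the Metropolis algorithm. The function name and target are assumptions, not taken from the article.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_steps, step, rng):
    """Hastings algorithm with a multiplicative (asymmetric) log-normal random-walk proposal.

    log_target only needs to be known up to an additive constant, i.e. the target
    density only up to a constant of proportionality.
    """
    x = x0
    chain = np.empty(n_steps)
    for i in range(n_steps):
        # Candidate x' = x * exp(step * z) keeps the chain on the positive reals.
        x_prop = x * np.exp(step * rng.standard_normal())
        # For this proposal q(x | x') / q(x' | x) = x' / x, so the Hastings correction
        # is log(x') - log(x); a symmetric (Metropolis) proposal would make it zero.
        log_hastings = np.log(x_prop) - np.log(x)
        log_alpha = log_target(x_prop) - log_target(x) + log_hastings
        if np.log(rng.uniform()) < log_alpha:
            x = x_prop
        chain[i] = x
    return chain

# Usage: unnormalized Gamma(shape=3, rate=1) density x^2 * exp(-x) on x > 0; its mean is 3.
log_target = lambda x: 2.0 * np.log(x) - x
chain = metropolis_hastings(log_target, x0=1.0, n_steps=20000, step=0.5,
                            rng=np.random.default_rng(1))
print(chain[2000:].mean())  # should be close to 3 after discarding burn-in
```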

Issues in Bayesian Analysis of Neural Network Models

Neural Computation; doi:10.1162/089976698300017737; 1998; Vol 10 (3); pp. 749-770; Cited by ~54

Author(s): Peter Müller; David Rios Insua

Keyword(s): Neural Network; Neural Networks; Monte Carlo; Markov Chain; Markov Chain Monte Carlo; Bayesian Analysis; Network Models; Feedforward Neural Networks; Data Driven; Neural Network Models

Stemming from work by Buntine and Weigend (1991) and MacKay (1992), there has been growing interest in Bayesian analysis of neural network models. Although conceptually simple, this problem is computationally involved. We suggest a very efficient Markov chain Monte Carlo scheme for inference and prediction with fixed-architecture feedforward neural networks. The scheme is then extended to the variable-architecture case, providing a data-driven procedure to identify sensible architectures.

Variational Learning of Bayesian Neural Networks via Bayesian Dark Knowledge

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence; doi:10.24963/ijcai.2020/282; 2020

Author(s): Gehui Shen; Xi Chen; Zhihong Deng

Keyword(s): Neural Network; Neural Networks; Monte Carlo; Predictive Accuracy; Epistemic Uncertainty; Uncertainty Modeling; Variational Inference; MCMC Methods; Distillation Method; Bayesian Neural Networks

Bayesian neural networks (BNNs) have received increasing attention because they can model epistemic uncertainty, which is hard for conventional neural networks. Markov chain Monte Carlo (MCMC) methods and variational inference (VI) are the two mainstream approaches to Bayesian deep learning. The former is effective, but its storage cost is prohibitive because it must save many samples of the neural network parameters. The latter is more time- and space-efficient; however, the approximate variational posterior limits its performance. In this paper, we aim to combine the advantages of both methods by distilling MCMC samples into an approximate variational posterior. Building on an existing distillation technique, we first propose the variational Bayesian dark knowledge method. Moreover, we propose Bayesian dark prior knowledge, a novel distillation method that treats the MCMC posterior as the prior of a variational BNN. Both proposed methods not only reduce the space overhead of the teacher model, making them scalable, but also maintain a distilled posterior distribution capable of modeling epistemic uncertainty. Experimental results show that our methods outperform the existing distillation method in terms of predictive accuracy and uncertainty modeling.
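The general distillation idea can be illustrated with a simplified, non-variational sketch: a teacher predictive distribution is formed by averaging class probabilities over stored MCMC parameter samples, and a single student model is trained to match it by minimizing the KL divergence. This is a generic dark-knowledge-style objective written with NumPy for illustration; the array shapes and function names are assumptions, and it is not the variational Bayesian dark knowledge or Bayesian dark prior knowledge method proposed in the paper.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def teacher_predictive(sample_logits):
    """Monte Carlo posterior predictive: average class probabilities over S MCMC samples.

    sample_logits has shape (S, N, C): S posterior samples, N inputs, C classes.
    """
    return np.exp(log_softmax(sample_logits)).mean(axis=0)

def distillation_loss(teacher_probs, student_logits):
    """Mean KL(teacher || student) over inputs; the student stores a single parameter set."""
    teacher_log_probs = np.log(np.clip(teacher_probs, 1e-12, None))
    student_log_probs = log_softmax(student_logits)
    return np.mean(np.sum(teacher_probs * (teacher_log_probs - student_log_probs), axis=-1))

# Usage with synthetic logits: 20 posterior samples, 4 inputs, 3 classes.
rng = np.random.default_rng(0)
sample_logits = rng.normal(size=(20, 4, 3))
p_teacher = teacher_predictive(sample_logits)
loss = distillation_loss(p_teacher, rng.normal(size=(4, 3)))  # an untrained student
```

Once a student has been fitted this way, the stored parameter samples are no longer needed at prediction time, which is the storage saving that motivates distillation.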

Limit theorems for sequential MCMC methods

Advances in Applied Probability; doi:10.1017/apr.2020.9; 2020; Vol 52 (2); pp. 377-403; Cited by ~2

Author(s): Axel Finke; Arnaud Doucet; Adam M. Johansen

Keyword(s): Monte Carlo; Markov Chain; Markov Chain Monte Carlo; Sequential Monte Carlo; Probability Distributions; Asymptotic Variance; MCMC Methods; Time Step; Strong Law; Large Numbers

Both sequential Monte Carlo (SMC) methods (a.k.a. ‘particle filters’) and sequential Markov chain Monte Carlo (sequential MCMC) methods constitute classes of algorithms that can be used to approximate expectations with respect to (a sequence of) probability distributions and their normalising constants. While SMC methods sample particles conditionally independently at each time step, sequential MCMC methods sample particles according to a Markov chain Monte Carlo (MCMC) kernel. Introduced over twenty years ago in [6], sequential MCMC methods have recently attracted renewed interest, as they empirically outperform SMC methods in some applications. We establish an $\mathbb{L}_r$-inequality (which implies a strong law of large numbers) and a central limit theorem for sequential MCMC methods and provide conditions under which errors can be controlled uniformly in time. In the context of state-space models, we also provide conditions under which sequential MCMC methods can indeed outperform standard SMC methods in terms of the asymptotic variance of the corresponding Monte Carlo estimators.
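The contrast between the two families can be made concrete on a hypothetical one-dimensional linear-Gaussian state-space model. In the sketch below, `smc_step` is a bootstrap particle filter step whose resampled particles are conditionally independent given the previous ones, while `seq_mcmc_step` draws the new particles as successive states of a single Metropolis-Hastings chain targeting the same distribution $\pi_t(x) \propto g(y_t \mid x)\,\frac{1}{N}\sum_i f(x \mid x_{t-1}^i)$. The model constants, the burn-in length, and the choice of an independence proposal are assumptions made for illustration; this is not the kernel analyzed in the paper.

```python
import numpy as np

# Hypothetical 1-D model: x_t = A*x_{t-1} + Q*v_t,  y_t = x_t + R*w_t,  v_t, w_t ~ N(0, 1).
A, Q, R = 0.9, 1.0, 0.5

def log_lik(y, x):
    """Log observation density g(y | x), up to an additive constant."""
    return -0.5 * ((y - x) / R) ** 2

def smc_step(prev, y, rng):
    """Bootstrap particle filter step: propagate, weight, then resample.
    Given prev, the N new particles are drawn conditionally independently."""
    n = prev.size
    prop = A * prev + Q * rng.standard_normal(n)
    logw = log_lik(y, prop)
    w = np.exp(logw - logw.max())
    return prop[rng.choice(n, size=n, p=w / w.sum())]

def seq_mcmc_step(prev, y, rng, burn_in=100):
    """Sequential MCMC step: the N new particles are successive states of one
    Metropolis-Hastings chain targeting pi_t(x) propto g(y | x) * mean_i f(x | prev_i)."""
    n = prev.size

    def propose():
        # Independence proposal: uniform ancestor, then the transition density f.
        return A * prev[rng.integers(n)] + Q * rng.standard_normal()

    x = propose()
    out = np.empty(n)
    for k in range(burn_in + n):
        x_new = propose()
        # The proposal matches the prior part of pi_t, so only the likelihood ratio remains.
        if np.log(rng.uniform()) < log_lik(y, x_new) - log_lik(y, x):
            x = x_new
        if k >= burn_in:
            out[k - burn_in] = x
    return out

# Usage: simulate observations, then run both filters on the same data.
rng = np.random.default_rng(2)
T, N = 50, 500
x_true, ys = 0.0, []
for _ in range(T):
    x_true = A * x_true + Q * rng.standard_normal()
    ys.append(x_true + R * rng.standard_normal())

p_smc = p_mcmc = rng.standard_normal(N)  # shared initial particles; never modified in place
means_smc, means_mcmc = [], []
for y in ys:
    p_smc = smc_step(p_smc, y, rng)
    p_mcmc = seq_mcmc_step(p_mcmc, y, rng)
    means_smc.append(p_smc.mean())
    means_mcmc.append(p_mcmc.mean())
```

Both filters estimate the same filtering means, so their outputs should agree up to Monte Carlo error; which one has lower variance depends on the model and kernel, which is the question the paper's conditions address.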

Neural Network Models for Conditional Distribution Under Bayesian Analysis

Neural Computation; doi:10.1162/neco.2007.3182; 2008; Vol 20 (2); pp. 504-522; Cited by ~4

Author(s): Tatiana Miazhynskaia; Sylvia Frühwirth-Schnatter; Georg Dorffner

Keyword(s): Neural Network; Neural Networks; Monte Carlo; Markov Chain; Conditional Distribution; Strong Support; Network Models; Conditional Density; Neural Network Models; Model Evidence

We use neural networks (NNs) as a tool for nonlinear autoregression to predict the second moment of the conditional density of return series. The NN models are compared with the popular econometric GARCH(1,1) model. We estimate the models in a Bayesian framework using Markov chain Monte Carlo posterior simulations. The interlinked aspects of the proposed Bayesian methodology are the identification of NN hidden units and the treatment of NN complexity based on model evidence. The empirical study applies the designed strategy to market data, where we found strong support for a nonlinear multilayer perceptron model with two hidden units.
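For reference, the GARCH(1,1) benchmark models the conditional variance with the linear recursion $\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$; the NN autoregression described above replaces this fixed functional form with a multilayer perceptron. A minimal NumPy sketch of the recursion follows. Starting at the unconditional variance $\omega/(1-\alpha-\beta)$ is a common convention assumed here, and the Bayesian MCMC estimation of $\omega$, $\alpha$, and $\beta$ is not shown.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta, sigma2_0=None):
    """Conditional variance path of a GARCH(1,1) model:
    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1], with eps the demeaned returns."""
    eps = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(eps)
    # Assumed initialization: the unconditional variance, which requires alpha + beta < 1.
    sigma2[0] = omega / (1.0 - alpha - beta) if sigma2_0 is None else sigma2_0
    for t in range(1, eps.size):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Usage: omega=0.1, alpha=0.1, beta=0.8 gives an unconditional variance of 1.0.
sigma2 = garch11_variance([0.0, 1.0, -0.5], omega=0.1, alpha=0.1, beta=0.8)
print(sigma2)  # approximately [1.0, 0.9, 0.92]
```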

Scoring Modeling Based on Neural Networks for Determining a Bank Borrower's Rating

Economy of Ukraine; doi:10.15407/economyukr.2020.10.054; 2020; Vol 2020 (10); pp. 54-62

Author(s): Oleksii Vasyliev

Keyword(s): Neural Network; Neural Networks; Network Architecture; Statistical Data; Activation Function; Decision Making Process; Neural Network Architecture; Acceptable Accuracy; The Neural Network; Sigmoid Activation Function

The problem of applying neural networks to calculate the ratings that banks use when deciding whether to grant loans to borrowers is considered. The task is to determine the borrower's rating function from a set of statistical data on the performance of loans provided by the bank. When constructing a regression model for the rating function, one must know its general form; given that form, the task reduces to calculating the parameters that appear in the expression for the rating function. In contrast, when neural networks are used, there is no need to specify the general form of the rating function. Instead, a particular neural network architecture is chosen and its parameters are calculated from the statistical data. Importantly, the same neural network architecture can be used to process different sets of statistical data. The disadvantages of neural networks include the need to calculate a large number of parameters, and there is no universal algorithm for determining the optimal neural network architecture. As an example of using neural networks to determine a borrower's rating, a model system is considered in which the borrower's rating is given by a known non-analytical rating function. A neural network with two hidden layers, containing three and two neurons, respectively, and using a sigmoid activation function, is used for modeling. It is shown that the neural network can recover the borrower's rating function with quite acceptable accuracy.
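The model system's network, with two hidden layers of three and two sigmoid neurons, can be sketched as a plain NumPy forward pass. The number of input features (four here), the single sigmoid output, and the random initialization are hypothetical choices for illustration; fitting the parameters to loan data, for example by gradient descent on squared error, is omitted. The parameter count, $3(d+1) + 2(3+1) + (2+1) = 3d + 14$ for $d$ inputs, makes the abstract's point about the number of parameters concrete.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_inputs, rng, hidden=(3, 2)):
    """Weights and biases for n_inputs -> 3 -> 2 -> 1 (hidden sizes as in the abstract)."""
    sizes = [n_inputs, *hidden, 1]
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def rating(params, x):
    """Forward pass; every layer, including the output (an assumption), uses a sigmoid,
    so the predicted ratings lie in (0, 1)."""
    h = np.asarray(x, dtype=float)
    for W, b in params:
        h = sigmoid(h @ W + b)
    return h[..., 0]

def n_parameters(params):
    return sum(W.size + b.size for W, b in params)

params = init_params(n_inputs=4, rng=np.random.default_rng(0))  # 4 borrower features: hypothetical
print(n_parameters(params))  # 3*(4+1) + 2*(3+1) + 1*(2+1) = 26
```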

Application of Assisted History Matching Workflow to Shale Gas Well Using EDFM and Neural Network-Markov Chain Monte Carlo Algorithm

Proceedings of the 7th Unconventional Resources Technology Conference; doi:10.15530/urtec-2019-659; 2019; Cited by ~5

Author(s): Sutthaporn Tripoppoom; Wei Yu; Kamy Sepehrnoori; Jijun Miao

Keyword(s): Neural Network; Monte Carlo; Markov Chain; Markov Chain Monte Carlo; Shale Gas; History Matching; Monte Carlo Algorithm; Gas Well; Assisted History Matching

SketchGNN: Semantic Sketch Segmentation with Graph Neural Networks

ACM Transactions on Graphics; doi:10.1145/3450284; 2021; Vol 40 (3); pp. 1-13

Author(s): Lumin Yang; Jiajie Zhuang; Hongbo Fu; Xiangzhi Wei; Kun Zhou; ...

Keyword(s): Neural Network; Neural Networks; Network Architecture; Large Scale; State of the Art; Semantic Segmentation; Structure Information; Graph Neural Networks; Node Labels; Point Level

We introduce SketchGNN, a convolutional graph neural network for semantic segmentation and labeling of freehand vector sketches. We treat an input stroke-based sketch as a graph, with nodes representing the sampled points along input strokes and edges encoding the stroke structure information. To predict the per-node labels, SketchGNN uses graph convolution and a static-dynamic branching network architecture to extract features at three levels: point level, stroke level, and sketch level. SketchGNN significantly improves on the accuracy of state-of-the-art methods for semantic sketch segmentation (by 11.2% in the pixel-based metric and 18.2% in the component-based metric on the large-scale, challenging SPG dataset) and has orders of magnitude fewer parameters than both image-based and sequence-based methods.
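The graph construction described above can be sketched as follows: each sampled point becomes a node, and consecutive points within a stroke are joined by an edge. The single mean-aggregation graph-convolution layer is a generic stand-in assumed for illustration; it does not reproduce SketchGNN's static-dynamic branching architecture or its stroke-level and sketch-level feature extraction, and the function names are hypothetical.

```python
import numpy as np

def sketch_to_graph(strokes):
    """One node per sampled point; edges join consecutive points of the same stroke
    (a simplifying assumption about the edge set)."""
    points, edges, offset = [], [], 0
    for stroke in strokes:
        stroke = np.asarray(stroke, dtype=float)
        points.append(stroke)
        edges += [(offset + i, offset + i + 1) for i in range(len(stroke) - 1)]
        offset += len(stroke)
    adj = np.zeros((offset, offset))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return np.vstack(points), adj

def graph_conv(features, adj, weight):
    """Generic graph-convolution layer: average each node with its neighbours,
    then apply a linear map and a ReLU."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    a_hat /= a_hat.sum(axis=1, keepdims=True)   # row-normalize (mean aggregation)
    return np.maximum(a_hat @ features @ weight, 0.0)

# Usage: two strokes with 3 and 2 sampled (x, y) points; per-node features are the coordinates.
strokes = [[(0, 0), (1, 0), (2, 0)], [(0, 1), (0, 2)]]
coords, adj = sketch_to_graph(strokes)
h = graph_conv(coords, adj, np.random.default_rng(0).normal(size=(2, 8)))
```

A per-node classifier on top of such features would produce the per-point labels; points from different strokes stay disconnected here, which is why richer (for example, feature-based) connectivity matters in practice.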

Mapping-Linked Quantitative Trait Loci Using Bayesian Analysis and Markov Chain Monte Carlo Algorithms

Genetics; doi:10.1093/genetics/146.2.735; 1997; Vol 146 (2); pp. 735-743; Cited by ~1

Author(s): Pekka Uimari; Ina Hoeschele

Keyword(s): Monte Carlo; Markov Chain; Markov Chain Monte Carlo; Quantitative Trait Loci; Quantitative Trait; Allele Frequencies; MCMC Methods; Substitution Effects; Indicator Variable; Trait Loci

A Bayesian method for mapping linked quantitative trait loci (QTL) using multiple linked genetic markers is presented. Parameter estimation and hypothesis testing were implemented via Markov chain Monte Carlo (MCMC) algorithms. Parameters included allele frequencies and substitution effects for two biallelic QTL, map positions of the QTL and markers, allele frequencies of the markers, and polygenic and residual variances. Missing data were polygenic effects and multi-locus marker-QTL genotypes. Three different MCMC schemes for testing the presence of a single QTL or two linked QTL on the chromosome were compared. The first approach includes a model indicator variable representing two unlinked QTL affecting the trait, one linked and one unlinked QTL, or both QTL linked with the markers. The second approach incorporates an indicator variable for each QTL into the model for phenotype, allowing or not allowing for a substitution effect of a QTL on phenotype, and the third approach is based on model determination by reversible jump MCMC. Methods were evaluated empirically by analyzing simulated granddaughter designs. All methods correctly identified a second, linked QTL and did not reject the one-QTL model when there was only a single QTL and either no additional QTL or an unlinked one.

Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM

Sensors; doi:10.3390/s21082852; 2021; Vol 21 (8); pp. 2852

Author(s): Parvathaneni Naga Srinivasu; Jalluri Gnana SivaSai; Muhammad Fazal Ijaz; Akash Kumar Bhoi; Wonjoon Kim; ...

Keyword(s): Neural Network; Neural Networks; Deep Learning; Convolutional Neural Network; Skin Disease; Network Architecture; Large Scale; Short Term Memory; Convolutional Networks; Occurrence Matrix

Deep learning models are efficient at learning the features that help capture complex patterns precisely. This study proposes a computerized process for classifying skin disease using deep learning based on MobileNet V2 and Long Short-Term Memory (LSTM). The MobileNet V2 model proved efficient, achieving better accuracy while running on lightweight computational devices. The proposed model is efficient at maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used to assess the progress of diseased growth. The performance was compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture extended with a few changes. On the HAM10000 dataset, the proposed method outperformed the other methods with more than 85% accuracy. It recognizes the affected region much faster, with almost 2× fewer computations than the conventional MobileNet model, keeping computational effort minimal. Furthermore, a mobile application is designed for instant and proper action. It helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners diagnose skin conditions efficiently and effectively, thereby reducing further complications and morbidity.
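A grey-level co-occurrence matrix counts how often pairs of grey levels occur at a fixed pixel offset, and texture statistics such as contrast are then computed from it. The abstract does not specify the offsets, number of grey levels, or statistics used, so the NumPy sketch below assumes a single non-negative offset (horizontal by default), a small number of already-quantized grey levels, and the contrast feature as an example; the function names are hypothetical.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8, symmetric=True, normed=True):
    """Grey-level co-occurrence matrix: P[i, j] counts pixel pairs with grey levels (i, j)
    where the second pixel lies at offset (dy, dx) from the first (offsets assumed >= 0)."""
    img = np.asarray(image)
    h, w = img.shape
    a = img[:h - dy, :w - dx]   # reference pixels
    b = img[dy:, dx:]           # neighbours at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)  # unbuffered accumulation handles repeated pairs
    if symmetric:
        P = P + P.T
    if normed:
        P /= P.sum()
    return P

def contrast(P):
    """Contrast texture feature: sum over (i, j) of P[i, j] * (i - j)**2."""
    i, j = np.indices(P.shape)
    return np.sum(P * (i - j) ** 2)

# Usage: a tiny 3-level image; horizontal pairs are (0,0), (0,1), (1,2), (2,2).
image = np.array([[0, 0, 1], [1, 2, 2]])
P = glcm(image, levels=3, symmetric=False, normed=False)
print(contrast(P))  # pairs (0,1) and (1,2) each contribute 1 -> 2.0
```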
