Robustness of Gaussian Mixture Reduction for Split-and-Conquer Learning of Finite Gaussian Mixtures

Author(s): Qiong Zhang, Jiahua Chen


Author(s): Pietro Coretto

Abstract: In this paper we study a finite Gaussian mixture model with an additional uniform component whose role is to capture points in the tails of the data distribution. An adaptive constraint enforces a certain level of separation between the Gaussian mixture components and the uniform component that represents noise and outliers in the tails. This makes the proposed tool particularly useful for robust estimation and outlier identification. A constrained ML estimator is introduced, for which existence and consistency are shown. One of the attractive features of the methodology is that the noise level is estimated from the data. We also develop an EM-type algorithm with proven convergence. Based on numerical evidence, we show how the methods developed in this paper are useful for several fundamental data analysis tasks: outlier identification, robust location-scale estimation, clustering, and density estimation.
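As a concrete illustration of the estimation scheme, here is a minimal one-dimensional EM sketch, not the paper's constrained estimator: the uniform density is fixed to the observed data range and the adaptive separation constraint is omitted.

```python
import numpy as np
from scipy.stats import norm

def em_gauss_plus_uniform(x, k, n_iter=200, seed=0):
    """EM for a 1-d mixture of k Gaussians plus a uniform 'noise' component.

    Illustrative sketch only: the uniform density is fixed to 1/range(x),
    and the separation constraints of the paper are omitted.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    u = 1.0 / (x.max() - x.min())        # fixed uniform density over the data range
    mu = rng.choice(x, k)                # random initial means
    sd = np.full(k, x.std())
    w = np.full(k + 1, 1.0 / (k + 1))    # last weight = noise proportion
    for _ in range(n_iter):
        # E-step: responsibilities for the k Gaussians and the uniform component
        dens = np.column_stack([norm.pdf(x, mu[j], sd[j]) for j in range(k)]
                               + [np.full(n, u)])
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weights for all components; mean/sd for the Gaussians only
        w = r.mean(axis=0)
        for j in range(k):
            nj = r[:, j].sum()
            mu[j] = (r[:, j] * x).sum() / nj
            sd[j] = np.sqrt((r[:, j] * (x - mu[j])**2).sum() / nj) + 1e-8
    return mu, sd, w  # w[-1] plays the role of the estimated noise level

x = np.concatenate([np.random.default_rng(1).normal(0, 1, 450),
                    np.random.default_rng(2).uniform(-10, 10, 50)])
mu, sd, w = em_gauss_plus_uniform(x, k=1)
print(mu, sd, w)  # w[-1] should land roughly near the true noise fraction 0.1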


2003, Vol. 15(2), pp. 469-485
Author(s): J. J. Verbeek, N. Vlassis, B. Kröse

This article concerns the greedy learning of gaussian mixtures. In the greedy approach, mixture components are inserted into the mixture one after the other. We propose a heuristic for searching for the optimal component to insert: in a randomized manner, a set of candidate new components is generated, and for each of these candidates we find the locally optimal new component and insert it into the existing mixture. The resulting algorithm resolves the sensitivity to initialization of state-of-the-art methods, like expectation maximization, and has running time linear in the number of data points and quadratic in the (final) number of mixture components. Due to its greedy nature, the algorithm can be particularly useful when the optimal number of mixture components is unknown. Experimental results comparing the proposed algorithm to other methods on density estimation and texture segmentation are provided.
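A bare-bones one-dimensional sketch of the greedy scheme follows. Candidate means are drawn at random from the data, and for brevity the paper's partial, locally optimal candidate search is replaced by a likelihood comparison plus a full EM refit after each insertion.

```python
import numpy as np
from scipy.stats import norm

def loglik(x, w, mu, sd):
    dens = sum(wj * norm.pdf(x, m, s) for wj, m, s in zip(w, mu, sd))
    return np.log(dens).sum()

def em_steps(x, w, mu, sd, n_iter=20):
    """A few EM refinement steps after each insertion."""
    for _ in range(n_iter):
        r = np.column_stack([wj * norm.pdf(x, m, s)
                             for wj, m, s in zip(w, mu, sd)])
        r /= r.sum(axis=1, keepdims=True)
        nj = r.sum(axis=0)
        w, mu = nj / len(x), (r * x[:, None]).sum(0) / nj
        sd = np.sqrt((r * (x[:, None] - mu)**2).sum(0) / nj) + 1e-8
    return list(w), list(mu), list(sd)

def greedy_gmm(x, k_max, n_cand=10, seed=0):
    rng = np.random.default_rng(seed)
    w, mu, sd = [1.0], [x.mean()], [x.std()]       # start with one component
    for k in range(2, k_max + 1):
        a = 1.0 / k                                 # weight of the inserted component
        best = None
        for c in rng.choice(x, n_cand):             # randomized candidate means
            cand = ([wi * (1 - a) for wi in w] + [a],
                    mu + [c], sd + [x.std() / k])
            ll = loglik(x, *cand)
            if best is None or ll > best[0]:
                best = (ll, cand)
        w, mu, sd = em_steps(x, *best[1])           # refine after insertion
    return w, mu, sd

x = np.concatenate([np.random.default_rng(0).normal(-3, 1, 300),
                    np.random.default_rng(3).normal(4, 0.5, 300)])
print(greedy_gmm(x, 2))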


1996, Vol. 8(1), pp. 129-151
Author(s): Lei Xu, Michael I. Jordan

We build up the mathematical connection between the “Expectation-Maximization” (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results analyzing the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of gaussian mixture models.
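For concreteness, the projection-matrix form can be written explicitly for the mixing proportions. The display below restates the result for the mixing weights as we understand it from the paper; the analogous matrices for the means and covariances are given in the article itself.

```latex
% EM as a projected gradient step, stated for the mixing proportions
% \pi = (\pi_1,\dots,\pi_k) of a gaussian mixture with N observations:
\[
  \pi^{(t+1)} \;=\; \pi^{(t)} + P\!\left(\pi^{(t)}\right)
  \frac{\partial \ell}{\partial \pi}\bigg|_{\pi^{(t)}},
  \qquad
  P(\pi) \;=\; \frac{1}{N}\left(\operatorname{diag}(\pi) - \pi\pi^{\top}\right),
\]
% where \ell is the log likelihood. P(\pi) is positive semidefinite on the
% probability simplex, so the EM step has a nonnegative projection onto the
% gradient direction, which underlies the ascent property analyzed in the paper.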


Author(s): Arturo Pacheco-Vega, Gabriela Avila

We introduce a methodology to extract the regimes of operation from condensing heat exchanger data. The methodology uses a Gaussian mixture clustering algorithm to determine the number of groups from the data, and a maximum likelihood decision rule to classify the data into these clusters. In order to assess the accuracy of the clustering technique, experimental data from the literature, visually classified as dry-surface, dropwise condensation, and film condensation, are used in the analysis. Although there is some discrepancy between the clustering classification and the visual one, an independent evaluation using artificial neural networks (ANNs) shows that the clustering methodology is able both to find the different regimes of operation and to classify the data corresponding to each regime.
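A minimal sketch of the two-step procedure using scikit-learn, with synthetic placeholder features rather than the heat exchanger data: BIC stands in for the paper's choice of the number of groups, and GaussianMixture.predict implements the maximum-posterior assignment, a close stand-in for the maximum likelihood rule described.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical two-column feature matrix (e.g., heat flux and surface
# temperature); the real study uses condensing heat exchanger data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, (100, 2)) for m in ((0, 0), (2, 1), (4, 3))])

# Step 1: choose the number of groups, here by BIC.
fits = [GaussianMixture(k, n_init=5, random_state=0).fit(X) for k in range(1, 7)]
best = min(fits, key=lambda g: g.bic(X))
print("chosen number of regimes:", best.n_components)

# Step 2: assign each observation to the component with the highest
# responsibility (maximum-posterior decision rule).
labels = best.predict(X)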


2019, Vol. 9(3), pp. 587-600
Author(s): Carlos Améndola, Alexander Engström, Christian Haase

Abstract: Gaussian mixture models are widely used in statistics. A fundamental aspect of these distributions is the study of the local maxima of the density, known as modes. In particular, it is not known how many modes a mixture of $k$ Gaussians in $d$ dimensions can have. We give a brief account of this problem's history. Then we give improved lower bounds and the first upper bound on the maximum number of modes, provided it is finite.
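A crude numerical companion to the question: grid-based mode counting in one dimension, where two well-separated unit-variance components give two modes and nearby means merge them into one.

```python
import numpy as np
from scipy.stats import norm

def count_modes_1d(w, mu, sd, lo=-10, hi=10, n=100_000):
    """Count local maxima of a 1-d mixture density on a fine grid."""
    t = np.linspace(lo, hi, n)
    f = sum(wi * norm.pdf(t, m, s) for wi, m, s in zip(w, mu, sd))
    interior = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])
    return int(interior.sum())

print(count_modes_1d([0.5, 0.5], [-3, 3], [1, 1]))      # -> 2 (well separated)
print(count_modes_1d([0.5, 0.5], [-0.5, 0.5], [1, 1]))  # -> 1 (modes merge)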


2006, Vol. 18(2), pp. 430-445
Author(s): Marc M. Van Hulle

We introduce a new unbiased metric for assessing the quality of density estimation based on gaussian mixtures, called the differential log likelihood. As an application, we determine the optimal smoothness and the optimal number of kernels in gaussian mixtures. Furthermore, we suggest a learning strategy for gaussian mixture density estimation and compare its performance with log likelihood maximization for a wide range of real-world data sets.
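The differential log likelihood itself is defined in the article. As a rough, familiar stand-in, held-out log likelihood can likewise score the number of kernels without the in-sample bias of the training score:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])[:, None]
x_tr, x_te = train_test_split(x, test_size=0.3, random_state=0)

# Score each candidate number of kernels on data not used for fitting.
for k in range(1, 6):
    g = GaussianMixture(k, n_init=3, random_state=0).fit(x_tr)
    print(k, g.score(x_te))  # mean held-out log likelihood per point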


Author(s): Yong Lü, Han Lin, Pingping Wu, Yitao Chen

Abstract: In this paper, we propose a novel feature compensation algorithm based on independent noise estimation, which employs a Gaussian mixture model (GMM) with few Gaussian components to rapidly estimate the noise parameters from the noisy speech and to monitor noise variation. The estimated noise model is combined with a GMM with sufficiently many Gaussian mixtures to produce the noisy GMM for clean speech estimation, so that parameters are updated if and only if the noise varies. Experimental results show that the proposed algorithm achieves recognition accuracy similar to that of traditional GMM-based feature compensation but significantly reduces the computational cost, and is therefore better suited to resource-limited mobile devices.
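A sketch of the update-only-when-the-noise-changes idea. The frame features, threshold, and the clean/noisy model combination are placeholders, not the authors' algorithm:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
noise_gmm = GaussianMixture(2, random_state=0)  # few components: fast to fit
noise_gmm.fit(rng.normal(0, 1, (500, 13)))      # initial noise-only frames (13-d, MFCC-like)

baseline = noise_gmm.score(rng.normal(0, 1, (200, 13)))

def noise_changed(frames, tol=2.0):
    """Flag a noise change when the small GMM's likelihood drops sharply.

    The tolerance is an arbitrary placeholder.
    """
    return noise_gmm.score(frames) < baseline - tol

new_frames = rng.normal(0.5, 1.5, (200, 13))    # drifted noise statistics
if noise_changed(new_frames):
    noise_gmm.fit(new_frames)                   # re-estimate noise parameters
    # ...only now recombine with the large clean-speech GMM.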


Author(s): Gregory Beylkin, Lucas Monzón, Xinshuo Yang

We present a new adaptive method for electronic structure calculations based on novel fast algorithms for reduction of multivariate mixtures. In our calculations, spatial orbitals are maintained as Gaussian mixtures whose terms are selected in the process of solving equations. Using a fixed basis leads to the so-called basis error since orbitals may not lie entirely within the linear span of the basis. To avoid such an error, multiresolution bases are used in adaptive algorithms so that basis functions are selected from a fixed collection of functions, large enough to approximate solutions within any user-selected accuracy. Our new method achieves adaptivity without using a multiresolution basis. Instead, as part of an iteration to solve nonlinear equations, our algorithm selects the ‘best’ subset of linearly independent terms of a Gaussian mixture from a collection that is much larger than any possible basis since the locations and shapes of the Gaussian terms are not fixed in advance. Approximating an orbital within a given accuracy, our algorithm yields significantly fewer terms than methods using multiresolution bases. We demonstrate our approach by solving the Hartree–Fock equations for two diatomic molecules, HeH+ and LiH, matching the accuracy previously obtained using multiwavelet bases.
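One ingredient of such a reduction can be sketched as follows: choosing a well-conditioned subset of Gaussian terms via a rank-revealing (pivoted) QR factorization of their Gram matrix, computed here analytically for one-dimensional Gaussians. The paper's algorithm does considerably more, and the tolerance below is illustrative.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
m = 40
# Terms g_i(x) = exp(-a_i (x - p_i)^2) with random widths and centers.
a, p = rng.uniform(0.5, 5.0, m), rng.uniform(-1.0, 1.0, m)

# Analytic Gram matrix:
# <g_i, g_j> = sqrt(pi/(a_i+a_j)) * exp(-a_i a_j (p_i-p_j)^2 / (a_i+a_j))
A = a[:, None] + a[None, :]
G = np.sqrt(np.pi / A) * np.exp(
    -(a[:, None] * a[None, :]) * (p[:, None] - p[None, :])**2 / A)

# Pivoted QR orders the terms by how much new direction each contributes;
# keep those above a (illustrative) relative tolerance.
_, R, piv = qr(G, pivoting=True)
keep = piv[np.abs(np.diag(R)) > 1e-10 * abs(R[0, 0])]
print(f"kept {len(keep)} of {m} terms:", np.sort(keep))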


2019, Vol. 13(3), pp. 871-882
Author(s): Burcin Simsek, Satish Iyengar

We consider the problem of computing tail probabilities, that is, probabilities of regions of low density, for high-dimensional Gaussian mixtures. We consider three approaches: the first is a bound based on the central and noncentral $\chi^2$ distributions; the second uses Pearson curves with the first three moments of the criterion random variable U; the third embeds the distribution of U in an exponential family and uses exponential tilting, which in turn suggests an importance sampling distribution. We illustrate each method with examples and assess their relative merits.
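A one-dimensional toy version of the third approach: for a standard normal, the exponentially tilted law with parameter theta is N(theta, 1) and the likelihood ratio is exp(-theta*x + theta^2/2). The paper treats the mixture criterion U; this sketch only shows the tilting mechanism.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
t, theta, n = 4.0, 4.0, 100_000          # tilt the mean to the threshold
x = rng.normal(theta, 1.0, n)            # sample from the tilted law N(theta, 1)
w = np.exp(-theta * x + 0.5 * theta**2)  # importance weights (likelihood ratio)
est = np.mean(w * (x > t))               # unbiased estimate of P(X > t)
print(est, norm.sf(t))                   # both close to 3.17e-5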

