Mixture models for rating data: the method of moments via Groebner bases

2017 ◽  
Vol 8 (2) ◽  
Author(s):  
Maria Iannario ◽  
Rosaria Simone

 A recent thread of research in ordinal data analysis involves a class of mixture models that represents responses as the combination of the two main aspects driving the decision process: a feeling component and an uncertainty component. This paradigm has proven flexible enough to account for overdispersion as well. In this context, Groebner bases are exploited to estimate model parameters by implementing the method of moments. To strengthen the validity of the moment procedure so derived, alternative parameter estimates are tested by means of a simulation experiment. Results show that the moment estimators are satisfactory per se, and that they significantly reduce bias and perform more efficiently than other estimators when used as starting values for the Expectation-Maximization algorithm.
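
A minimal sketch of the kind of two-component rating mixture described above (a shifted binomial "feeling" component plus a discrete uniform "uncertainty" component), fit by EM. The moment estimators of the paper, obtained by solving the moment equations with Groebner bases, would be supplied as the starting values pi0 and xi0; here these starting values and the simulated data are illustrative placeholders only.

```python
import numpy as np
from scipy.stats import binom

def cub_em(r, m, pi0=0.5, xi0=0.5, n_iter=200, tol=1e-8):
    """EM for P(R=r) = pi * ShiftedBinomial(m-1, 1-xi) + (1-pi)/m, r = 1..m."""
    r = np.asarray(r)
    pi, xi = pi0, xi0
    for _ in range(n_iter):
        # E-step: posterior probability that each response comes from the feeling component
        pb = binom.pmf(r - 1, m - 1, 1 - xi)
        tau = pi * pb / (pi * pb + (1 - pi) / m)
        # M-step: closed-form updates for the mixing weight and the feeling parameter
        pi_new = tau.mean()
        xi_new = 1 - (tau * (r - 1)).sum() / ((m - 1) * tau.sum())
        xi_new = np.clip(xi_new, 1e-6, 1 - 1e-6)
        if abs(pi_new - pi) + abs(xi_new - xi) < tol:
            pi, xi = pi_new, xi_new
            break
        pi, xi = pi_new, xi_new
    return pi, xi

# Simulated ratings on a 1..7 scale with true pi = 0.8 and xi = 0.3
rng = np.random.default_rng(0)
m, n = 7, 2000
from_feeling = rng.random(n) < 0.8
r = np.where(from_feeling, 1 + rng.binomial(m - 1, 1 - 0.3, n), rng.integers(1, m + 1, n))
print(cub_em(r, m))
```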

2008 ◽  
Vol 17 (1) ◽  
pp. 33-51 ◽  
Author(s):  
Jeroen K Vermunt

An extension of latent class (LC) and finite mixture models is described for the analysis of hierarchical data sets. As is typical in multilevel analysis, the dependence between lower-level units within higher-level units is dealt with by assuming that certain model parameters differ randomly across higher-level observations. One of the special cases is an LC model in which group-level differences in the logit of belonging to a particular LC are captured with continuous random effects. Other variants are obtained by including random effects in the model for the response variables rather than for the LCs. The variant that receives most attention in this article is an LC model with discrete random effects: higher-level units are clustered based on the likelihood of their members belonging to the various LCs. This yields a model with mixture distributions at two levels, namely at the group and the subject level. This model is illustrated with three rather different empirical examples. The appendix describes an adapted version of the expectation-maximization algorithm that can be used for maximum likelihood estimation, as well as providing setups for estimating the multilevel LC model with generally available software.
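
As a point of reference for the extension described above, a minimal sketch of the standard single-level latent class model for binary items, fit by EM; the article's multilevel variant additionally lets the class-membership probabilities vary across higher-level units via continuous or discrete random effects. All variable names and the toy data are illustrative assumptions, not the article's examples.

```python
import numpy as np

def lca_em(Y, n_classes=2, n_iter=300, seed=0):
    """Y: (n_subjects, n_items) binary matrix. Returns class weights and
    per-class item-response probabilities."""
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    w = np.full(n_classes, 1 / n_classes)           # class sizes
    p = rng.uniform(0.3, 0.7, size=(n_classes, J))  # P(item = 1 | class)
    for _ in range(n_iter):
        # E-step: posterior class membership for each subject
        logl = Y @ np.log(p).T + (1 - Y) @ np.log(1 - p).T + np.log(w)
        post = np.exp(logl - logl.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class weights and item-response probabilities
        w = post.mean(axis=0)
        p = (post.T @ Y) / post.sum(axis=0)[:, None]
        p = np.clip(p, 1e-6, 1 - 1e-6)
    return w, p

# toy usage: 500 subjects, 5 binary items, two latent classes
rng = np.random.default_rng(1)
z = rng.random(500) < 0.4
probs = np.where(z[:, None], 0.8, 0.2)
Y = (rng.random((500, 5)) < probs).astype(float)
print(lca_em(Y))
```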


2008 ◽  
Vol 10 (2) ◽  
pp. 153-162 ◽  
Author(s):  
B. G. Ruessink

When a numerical model is to be used as a practical tool, its parameters should preferably be stable and consistent, that is, possess a small uncertainty and be time-invariant. Using data and predictions of alongshore mean currents flowing on a beach as a case study, this paper illustrates how parameter stability and consistency can be assessed using Markov chain Monte Carlo. Within a single calibration run, Markov chain Monte Carlo estimates the parameter posterior probability density function, its mode being the best-fit parameter set. Parameter stability is investigated by stepwise adding new data to a calibration run, while consistency is examined by calibrating the model on different datasets of equal length. The results for the present case study indicate that various tidal cycles with strong (say, >0.5 m/s) currents are required to obtain stable parameter estimates, and that the best-fit model parameters and the underlying posterior distribution are strongly time-varying. This inconsistent parameter behavior may reflect unresolved variability of the processes represented by the parameters, or may represent compensational behavior for temporal violations in specific model assumptions.
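
A minimal sketch of the calibration machinery described above: a random-walk Metropolis sampler draws from the posterior of a model parameter vector given observations and a forward model, and the posterior can be re-estimated as data are added or swapped. The forward model, noise level, and data below are stand-ins, not the alongshore-current model of the paper.

```python
import numpy as np

def forward_model(theta, x):
    # placeholder physics: a simple nonlinear response curve
    return theta[0] * np.tanh(theta[1] * x)

def log_posterior(theta, x, y_obs, sigma=0.05):
    if np.any(theta <= 0):                 # flat prior restricted to theta > 0
        return -np.inf
    resid = y_obs - forward_model(theta, x)
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(x, y_obs, theta0, step=0.05, n_samples=20000, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, float)
    lp = log_posterior(theta, x, y_obs)
    chain = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_posterior(prop, x, y_obs)
        if np.log(rng.random()) < lp_prop - lp:    # accept/reject
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

# synthetic "calibration data"
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 50)
y_obs = forward_model([0.6, 1.5], x) + 0.05 * rng.standard_normal(x.size)
chain = metropolis(x, y_obs, theta0=[1.0, 1.0])
print("posterior mean after burn-in:", chain[chain.shape[0] // 2:].mean(axis=0))
```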


1991 ◽  
Vol 18 (2) ◽  
pp. 320-327 ◽  
Author(s):  
Murray A. Fitch ◽  
Edward A. McBean

A model is developed for the prediction of river flows resulting from combined snowmelt and precipitation. The model employs a Kalman filter to reflect uncertainty both in the measured data and in the system model parameters. The forecasting algorithm is used to develop multi-day forecasts for the Sturgeon River, Ontario. The algorithm is shown to develop good 1-day and 2-day ahead forecasts, but the linear prediction model is found inadequate for longer-term forecasts. Good initial parameter estimates are shown to be essential for optimal forecasting performance. Key words: Kalman filter, streamflow forecast, multi-day, streamflow, Sturgeon River, MISP algorithm.
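
A minimal sketch of the linear Kalman filter recursion underlying this kind of forecasting scheme: a state prediction followed by a measurement update that weighs model and data by their respective uncertainties. The matrices in the toy usage are generic placeholders, not the snowmelt-precipitation river-flow model of the paper.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle. x: state estimate, P: its covariance, z: new measurement."""
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# toy usage: scalar local-level model observed with noise
x, P = np.array([0.0]), np.array([[1.0]])
F = H = np.array([[1.0]])
Q, R = np.array([[0.01]]), np.array([[0.25]])
for z in [1.1, 0.9, 1.3]:
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
print(x, P)
```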


2013 ◽  
Vol 141 (6) ◽  
pp. 1737-1760 ◽  
Author(s):  
Thomas Sondergaard ◽  
Pierre F. J. Lermusiaux

Abstract This work introduces and derives an efficient, data-driven assimilation scheme, focused on a time-dependent stochastic subspace that respects nonlinear dynamics and captures non-Gaussian statistics as they occur. The motivation is to obtain a filter that is applicable to realistic geophysical applications, but that also rigorously utilizes the governing dynamical equations with information theory and learning theory for efficient Bayesian data assimilation. Building on the foundations of classical filters, the underlying theory and algorithmic implementation of the new filter are developed and derived. The stochastic Dynamically Orthogonal (DO) field equations and their adaptive stochastic subspace are employed to predict prior probabilities for the full dynamical state, effectively approximating the Fokker–Planck equation. At assimilation times, the DO realizations are fit to semiparametric Gaussian Mixture Models (GMMs) using the Expectation-Maximization algorithm and the Bayesian Information Criterion. Bayes's law is then efficiently carried out analytically within the evolving stochastic subspace. The resulting GMM-DO filter is illustrated in a very simple example. Variations of the GMM-DO filter are also provided along with comparisons with related schemes.
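
A minimal sketch of the GMM-fitting step named above: samples (here random 2-D points standing in for DO subspace realizations) are fit with Gaussian mixtures by EM, and the Bayesian Information Criterion selects the mixture complexity. The data and the candidate range of components are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# stand-in "realizations": two clusters in a 2-D stochastic subspace
samples = np.vstack([rng.normal(-2, 0.5, (500, 2)), rng.normal(2, 0.8, (500, 2))])

# fit GMMs of increasing complexity by EM and keep the BIC-optimal one
fits = [GaussianMixture(n_components=k, random_state=0).fit(samples) for k in range(1, 6)]
bics = [f.bic(samples) for f in fits]
best = fits[int(np.argmin(bics))]
print("selected number of components:", best.n_components)
```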


2011 ◽  
Vol 64 (S1) ◽  
pp. S3-S18 ◽  
Author(s):  
Yuanxi Yang ◽  
Jinlong Li ◽  
Junyi Xu ◽  
Jing Tang

Integrated navigation using multiple Global Navigation Satellite Systems (GNSS) is beneficial to increase the number of observable satellites, alleviate the effects of systematic errors and improve the accuracy of positioning, navigation and timing (PNT). When multiple constellations and multiple frequency measurements are employed, the functional and stochastic models as well as the estimation principle for PNT may differ. Therefore, the commonly used definition of “dilution of precision (DOP)”, based on least squares (LS) estimation and unified functional and stochastic models, is no longer applicable. In this paper, three types of generalised DOPs are defined. The first type of generalised DOP is based on the error influence function (IF) of pseudo-ranges, which reflects the geometry strength of the measurements, the error magnitude and the estimation risk criteria. When least squares estimation is used, the first type of generalised DOP is identical to the one commonly used. In order to define the first type of generalised DOP, an IF of signal-in-space (SIS) errors on the parameter estimates of PNT is derived. The second type of generalised DOP is defined based on the functional model with additional systematic parameters induced by the compatibility and interoperability problems among different GNSS systems. The third type of generalised DOP is defined based on Bayesian estimation, in which a priori information on the model parameters is taken into account; this type is suitable for evaluating the precision of kinematic positioning or navigation. Different types of generalised DOPs are suitable for different PNT scenarios, and an example of the calculation of these DOPs for multi-GNSS systems including GPS, GLONASS, Compass and Galileo is given. New observation equations of Compass and GLONASS that may contain additional parameters for interoperability are specifically investigated. It is shown that if the interoperability of multi-GNSS is not fulfilled, the increased number of satellites will not significantly reduce the generalised DOP value. Furthermore, outlying measurements will not change the original DOP, but will change the first type of generalised DOP, which includes a robust error IF. A priori information on the model parameters will also reduce the DOP.
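
A minimal sketch of the conventional least-squares DOP that the first generalised DOP reduces to: the geometry matrix G has one row per satellite, [unit line-of-sight vector, 1], and the DOPs come from the diagonal of (G^T G)^{-1}. The satellite directions below are arbitrary examples, not a real constellation.

```python
import numpy as np

def dops(unit_los):
    """unit_los: (n_sats, 3) unit line-of-sight vectors from receiver to satellites."""
    G = np.hstack([unit_los, np.ones((unit_los.shape[0], 1))])
    Q = np.linalg.inv(G.T @ G)               # cofactor matrix of position + clock
    gdop = np.sqrt(np.trace(Q))
    pdop = np.sqrt(Q[0, 0] + Q[1, 1] + Q[2, 2])
    tdop = np.sqrt(Q[3, 3])
    return gdop, pdop, tdop

# four satellites with a plausible spread of directions
los = np.array([[0.0, 0.0, 1.0],
                [0.8, 0.0, 0.6],
                [-0.4, 0.7, 0.6],
                [-0.4, -0.7, 0.6]])
los = los / np.linalg.norm(los, axis=1, keepdims=True)
print(dops(los))
```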


Author(s):  
Arnaud Dufays ◽  
Elysee Aristide Houndetoungan ◽  
Alain Coën

Abstract Change-point (CP) processes are a flexible approach to modelling long time series. We propose a method to uncover which model parameters truly vary when a CP is detected. Given a set of breakpoints, we use a penalized likelihood approach to select the best set of parameters that change over time, and we prove that the penalty function leads to a consistent selection of the true model. Estimation is carried out via the deterministic annealing expectation-maximization algorithm. Our method accounts for model selection uncertainty and assigns a probability to every possible time-varying parameter specification. Monte Carlo simulations highlight that the method works well for many time series models, including heteroskedastic processes. For a sample of fourteen hedge fund (HF) strategies, using an asset-based style pricing model, we shed light on the promising ability of our method to detect the time-varying dynamics of risk exposures as well as to forecast HF returns.
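
A minimal sketch of the underlying idea of selecting which parameters change at a known breakpoint: for a Gaussian series, the four specifications (nothing changes, mean changes, variance changes, both change) are compared with a BIC-style penalized likelihood. The paper's specific penalty and its deterministic annealing EM are not reproduced here; this is an assumed toy setup.

```python
import numpy as np
from scipy.stats import norm

def gauss_ll(x, mu, sd):
    return norm.logpdf(x, mu, sd).sum()

def penalized_scores(x, cp):
    a, b = x[:cp], x[cp:]
    n = len(x)
    scores = {}
    # nothing varies across the breakpoint
    scores["none"] = (gauss_ll(x, x.mean(), x.std()), 2)
    # only the mean varies (common sd from residuals about the segment means)
    sd = np.concatenate([a - a.mean(), b - b.mean()]).std()
    scores["mean"] = (gauss_ll(a, a.mean(), sd) + gauss_ll(b, b.mean(), sd), 3)
    # only the variance varies (common mean)
    mu = x.mean()
    sd_a, sd_b = np.sqrt(((a - mu) ** 2).mean()), np.sqrt(((b - mu) ** 2).mean())
    scores["variance"] = (gauss_ll(a, mu, sd_a) + gauss_ll(b, mu, sd_b), 3)
    # both mean and variance vary
    scores["both"] = (gauss_ll(a, a.mean(), a.std()) + gauss_ll(b, b.mean(), b.std()), 4)
    # BIC-style penalized score: smaller is better
    return {name: -2 * ll + k * np.log(n) for name, (ll, k) in scores.items()}

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 300)])  # mean shift only
scores = penalized_scores(x, cp=300)
print(min(scores, key=scores.get))  # expected: "mean"
```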


1981 ◽  
Vol 240 (5) ◽  
pp. R259-R265 ◽  
Author(s):  
J. J. DiStefano

Design of optimal blood sampling protocols for kinetic experiments is discussed and evaluated, with the aid of several examples, including an endocrine system case study. The criterion of optimality is maximum accuracy of kinetic model parameter estimates. A simple example illustrates why a sequential experiment approach is required: optimal designs depend on the true model parameter values, knowledge of which is usually a primary objective of the experiment, as well as on the structure of the model and the measurement error (e.g., assay) variance. The methodology is evaluated from the results of a series of experiments designed to quantify the dynamics of distribution and metabolism of three iodothyronines, T3, T4, and reverse-T3. This analysis indicates that 1) the sequential optimal experiment approach can be effective and efficient in the laboratory, 2) it works in the presence of reasonably controlled biological variation, producing sufficiently robust sampling protocols, and 3) optimal designs can be highly efficient in practice, requiring for maximum accuracy a number of blood samples equal to the number of independently adjustable model parameters, no more and no fewer.
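
A minimal sketch of why the design depends on the parameters one is trying to learn: for an assumed one-compartment decay model y(t) = A*exp(-k*t) with constant assay error, the D-optimal pair of sampling times maximizes the determinant of the Fisher information evaluated at nominal parameter values. The model, nominal values, and candidate times are illustrative assumptions, not the iodothyronine study design.

```python
import numpy as np
from itertools import combinations

def fisher_det(times, A=1.0, k=0.2, sigma=0.05):
    """Determinant of the Fisher information for (A, k) at the given sample times."""
    t = np.asarray(times, float)
    # sensitivities of y = A*exp(-k*t) with respect to A and k
    J = np.column_stack([np.exp(-k * t), -A * t * np.exp(-k * t)])
    M = J.T @ J / sigma**2
    return np.linalg.det(M)

candidates = np.linspace(0.5, 24, 48)        # candidate blood-sampling times (h)
best = max(combinations(candidates, 2), key=fisher_det)
print("D-optimal 2-sample design (h):", best)
```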


Author(s):  
Zachary R. McCaw ◽  
Hanna Julienne ◽  
Hugues Aschard

Abstract Although missing data are prevalent in applications, existing implementations of Gaussian mixture models (GMMs) require complete data. Standard practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. Here we present MGMM, an R package for fitting GMMs in the presence of missing data. Using three case studies on real and simulated data sets, we demonstrate that, when the underlying distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than state-of-the-art imputation followed by a standard GMM. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty even when the generative distribution is not a GMM. This assessment may be used to identify unassignable observations. MGMM is available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.
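
A minimal sketch of the "standard practice" baseline the abstract compares against: impute missing entries first, then fit a Gaussian mixture to the completed data. MGMM itself is an R package that instead handles the missing entries inside the EM iterations; that algorithm is not reproduced here, and the simulated data and missingness rate below are assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# two well-separated clusters in 3 dimensions
X = np.vstack([rng.normal(-2, 1, (200, 3)), rng.normal(2, 1, (200, 3))])
mask = rng.random(X.shape) < 0.15           # 15% of entries missing completely at random
X_missing = np.where(mask, np.nan, X)

# impute, then cluster: the baseline whose drawbacks motivate MGMM
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_missing)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_imputed)
labels = gmm.predict(X_imputed)
print("cluster sizes:", np.bincount(labels))
```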

