Mixture models for rating data: the method of moments via Groebner bases

2017 ◽  
Vol 8 (2) ◽  
Author(s):  
Maria Iannario ◽  
Rosaria Simone

 A recent thread of research in ordinal data analysis involves a class of mixture models that represents responses as the combination of the two main aspects driving the decision process: a feeling component and an uncertainty component. This paradigm has proven flexible enough to account for overdispersion as well. In this context, Groebner bases are exploited to estimate model parameters by implementing the method of moments. To strengthen the validity of the moment procedure so derived, alternative parameter estimates are tested by means of a simulation experiment. Results show that the moment estimators are satisfactory per se, and that they significantly reduce bias and perform more efficiently than other estimators when used as starting values for the Expectation-Maximization algorithm.
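
A minimal sketch of the kind of two-component rating mixture described above (a shifted binomial "feeling" component plus a discrete uniform "uncertainty" component), fit by EM. The moment estimators of the paper, obtained by solving the moment equations with Groebner bases, would be supplied as the starting values pi0 and xi0; here these starting values and the simulated data are illustrative placeholders only.

```python
import numpy as np
from scipy.stats import binom

def cub_em(r, m, pi0=0.5, xi0=0.5, n_iter=200, tol=1e-8):
    """EM for P(R=r) = pi * ShiftedBinomial(m-1, 1-xi) + (1-pi)/m, r = 1..m."""
    r = np.asarray(r)
    pi, xi = pi0, xi0
    for _ in range(n_iter):
        # E-step: posterior probability that each response comes from the feeling component
        pb = binom.pmf(r - 1, m - 1, 1 - xi)
        tau = pi * pb / (pi * pb + (1 - pi) / m)
        # M-step: closed-form updates for the mixing weight and the feeling parameter
        pi_new = tau.mean()
        xi_new = 1 - (tau * (r - 1)).sum() / ((m - 1) * tau.sum())
        xi_new = np.clip(xi_new, 1e-6, 1 - 1e-6)
        if abs(pi_new - pi) + abs(xi_new - xi) < tol:
            pi, xi = pi_new, xi_new
            break
        pi, xi = pi_new, xi_new
    return pi, xi

# Simulated ratings on a 1..7 scale with true pi = 0.8 and xi = 0.3
rng = np.random.default_rng(0)
m, n = 7, 2000
from_feeling = rng.random(n) < 0.8
r = np.where(from_feeling, 1 + rng.binomial(m - 1, 1 - 0.3, n), rng.integers(1, m + 1, n))
print(cub_em(r, m))
```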

2008 ◽  
Vol 17 (1) ◽  
pp. 33-51 ◽  
Author(s):  
Jeroen K Vermunt

An extension of latent class (LC) and finite mixture models is described for the analysis of hierarchical data sets. As is typical in multilevel analysis, the dependence between lower-level units within higher-level units is dealt with by assuming that certain model parameters differ randomly across higher-level observations. One of the special cases is an LC model in which group-level differences in the logit of belonging to a particular LC are captured with continuous random effects. Other variants are obtained by including random effects in the model for the response variables rather than for the LCs. The variant that receives most attention in this article is an LC model with discrete random effects: higher-level units are clustered based on the likelihood of their members belonging to the various LCs. This yields a model with mixture distributions at two levels, namely at the group and the subject level. This model is illustrated with three rather different empirical examples. The appendix describes an adapted version of the expectation-maximization algorithm that can be used for maximum likelihood estimation, as well as providing setups for estimating the multilevel LC model with generally available software.
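
As a point of reference for the extension described above, a minimal sketch of the standard single-level latent class model for binary items, fit by EM; the article's multilevel variant additionally lets the class-membership probabilities vary across higher-level units via continuous or discrete random effects. All variable names and the toy data are illustrative assumptions, not the article's examples.

```python
import numpy as np

def lca_em(Y, n_classes=2, n_iter=300, seed=0):
    """Y: (n_subjects, n_items) binary matrix. Returns class weights and
    per-class item-response probabilities."""
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    w = np.full(n_classes, 1 / n_classes)           # class sizes
    p = rng.uniform(0.3, 0.7, size=(n_classes, J))  # P(item = 1 | class)
    for _ in range(n_iter):
        # E-step: posterior class membership for each subject
        logl = Y @ np.log(p).T + (1 - Y) @ np.log(1 - p).T + np.log(w)
        post = np.exp(logl - logl.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class weights and item-response probabilities
        w = post.mean(axis=0)
        p = (post.T @ Y) / post.sum(axis=0)[:, None]
        p = np.clip(p, 1e-6, 1 - 1e-6)
    return w, p

# toy usage: 500 subjects, 5 binary items, two latent classes
rng = np.random.default_rng(1)
z = rng.random(500) < 0.4
probs = np.where(z[:, None], 0.8, 0.2)
Y = (rng.random((500, 5)) < probs).astype(float)
print(lca_em(Y))
```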


2008 ◽  
Vol 10 (2) ◽  
pp. 153-162 ◽  
Author(s):  
B. G. Ruessink

When a numerical model is to be used as a practical tool, its parameters should preferably be stable and consistent, that is, possess a small uncertainty and be time-invariant. Using data and predictions of alongshore mean currents flowing on a beach as a case study, this paper illustrates how parameter stability and consistency can be assessed using Markov chain Monte Carlo. Within a single calibration run, Markov chain Monte Carlo estimates the parameter posterior probability density function, its mode being the best-fit parameter set. Parameter stability is investigated by stepwise adding new data to a calibration run, while consistency is examined by calibrating the model on different datasets of equal length. The results for the present case study indicate that various tidal cycles with strong (say, >0.5 m/s) currents are required to obtain stable parameter estimates, and that the best-fit model parameters and the underlying posterior distribution are strongly time-varying. This inconsistent parameter behavior may reflect unresolved variability of the processes represented by the parameters, or may represent compensational behavior for temporal violations in specific model assumptions.
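
A minimal sketch of the calibration machinery described above: a random-walk Metropolis sampler draws from the posterior of a model parameter vector given observations and a forward model, and the posterior can be re-estimated as data are added or swapped. The forward model, noise level, and data below are stand-ins, not the alongshore-current model of the paper.

```python
import numpy as np

def forward_model(theta, x):
    # placeholder physics: a simple nonlinear response curve
    return theta[0] * np.tanh(theta[1] * x)

def log_posterior(theta, x, y_obs, sigma=0.05):
    if np.any(theta <= 0):                 # flat prior restricted to theta > 0
        return -np.inf
    resid = y_obs - forward_model(theta, x)
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(x, y_obs, theta0, step=0.05, n_samples=20000, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, float)
    lp = log_posterior(theta, x, y_obs)
    chain = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_posterior(prop, x, y_obs)
        if np.log(rng.random()) < lp_prop - lp:    # accept/reject
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

# synthetic "calibration data"
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 50)
y_obs = forward_model([0.6, 1.5], x) + 0.05 * rng.standard_normal(x.size)
chain = metropolis(x, y_obs, theta0=[1.0, 1.0])
print("posterior mean after burn-in:", chain[chain.shape[0] // 2:].mean(axis=0))
```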


1991 ◽  
Vol 18 (2) ◽  
pp. 320-327 ◽  
Author(s):  
Murray A. Fitch ◽  
Edward A. McBean

A model is developed for the prediction of river flows resulting from combined snowmelt and precipitation. The model employs a Kalman filter to reflect uncertainty both in the measured data and in the system model parameters. The forecasting algorithm is used to develop multi-day forecasts for the Sturgeon River, Ontario. The algorithm is shown to develop good 1-day and 2-day ahead forecasts, but the linear prediction model is found inadequate for longer-term forecasts. Good initial parameter estimates are shown to be essential for optimal forecasting performance. Key words: Kalman filter, streamflow forecast, multi-day, streamflow, Sturgeon River, MISP algorithm.
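
A minimal sketch of the linear Kalman filter recursion underlying this kind of forecasting scheme: a state prediction followed by a measurement update that weighs model and data by their respective uncertainties. The matrices in the toy usage are generic placeholders, not the snowmelt-precipitation river-flow model of the paper.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle. x: state estimate, P: its covariance, z: new measurement."""
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# toy usage: scalar local-level model observed with noise
x, P = np.array([0.0]), np.array([[1.0]])
F = H = np.array([[1.0]])
Q, R = np.array([[0.01]]), np.array([[0.25]])
for z in [1.1, 0.9, 1.3]:
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
print(x, P)
```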


2013 ◽  
Vol 141 (6) ◽  
pp. 1737-1760 ◽  
Author(s):  
Thomas Sondergaard ◽  
Pierre F. J. Lermusiaux

Abstract This work introduces and derives an efficient, data-driven assimilation scheme, focused on a time-dependent stochastic subspace that respects nonlinear dynamics and captures non-Gaussian statistics as they occur. The motivation is to obtain a filter that is applicable to realistic geophysical applications, but that also rigorously utilizes the governing dynamical equations with information theory and learning theory for efficient Bayesian data assimilation. Building on the foundations of classical filters, the underlying theory and algorithmic implementation of the new filter are developed and derived. The stochastic Dynamically Orthogonal (DO) field equations and their adaptive stochastic subspace are employed to predict prior probabilities for the full dynamical state, effectively approximating the Fokker–Planck equation. At assimilation times, the DO realizations are fit to semiparametric Gaussian Mixture Models (GMMs) using the Expectation-Maximization algorithm and the Bayesian Information Criterion. Bayes's law is then efficiently carried out analytically within the evolving stochastic subspace. The resulting GMM-DO filter is illustrated in a very simple example. Variations of the GMM-DO filter are also provided along with comparisons with related schemes.
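
A minimal sketch of the GMM-fitting step named above: samples (here random 2-D points standing in for DO subspace realizations) are fit with Gaussian mixtures by EM, and the Bayesian Information Criterion selects the mixture complexity. The data and the candidate range of components are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# stand-in "realizations": two clusters in a 2-D stochastic subspace
samples = np.vstack([rng.normal(-2, 0.5, (500, 2)), rng.normal(2, 0.8, (500, 2))])

# fit GMMs of increasing complexity by EM and keep the BIC-optimal one
fits = [GaussianMixture(n_components=k, random_state=0).fit(samples) for k in range(1, 6)]
bics = [f.bic(samples) for f in fits]
best = fits[int(np.argmin(bics))]
print("selected number of components:", best.n_components)
```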


2011 ◽  
Vol 64 (S1) ◽  
pp. S3-S18 ◽  
Author(s):  
Yuanxi Yang ◽  
Jinlong Li ◽  
Junyi Xu ◽  
Jing Tang

Integrated navigation using multiple Global Navigation Satellite Systems (GNSS) is beneficial to increase the number of observable satellites, alleviate the effects of systematic errors and improve the accuracy of positioning, navigation and timing (PNT). When multiple constellations and multiple frequency measurements are employed, the functional and stochastic models as well as the estimation principle for PNT may differ. Therefore, the commonly used definition of “dilution of precision (DOP)”, based on least squares (LS) estimation and unified functional and stochastic models, is no longer applicable. In this paper, three types of generalised DOPs are defined. The first type of generalised DOP is based on the error influence function (IF) of pseudo-ranges, which reflects the geometry strength of the measurements, the error magnitude and the estimation risk criteria. When least squares estimation is used, the first type of generalised DOP is identical to the one commonly used. In order to define the first type of generalised DOP, an IF of signal-in-space (SIS) errors on the parameter estimates of PNT is derived. The second type of generalised DOP is defined based on the functional model with additional systematic parameters induced by the compatibility and interoperability problems among different GNSS systems. The third type of generalised DOP is defined based on Bayesian estimation, in which a priori information on the model parameters is taken into account; this type is suitable for evaluating the precision of kinematic positioning or navigation. Different types of generalised DOPs are suitable for different PNT scenarios, and an example of the calculation of these DOPs for multi-GNSS systems including GPS, GLONASS, Compass and Galileo is given. New observation equations of Compass and GLONASS that may contain additional parameters for interoperability are specifically investigated. It is shown that if the interoperability of multi-GNSS is not fulfilled, the increased number of satellites will not significantly reduce the generalised DOP value. Furthermore, outlying measurements will not change the original DOP, but will change the first type of generalised DOP, which includes a robust error IF. A priori information on the model parameters will also reduce the DOP.
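
A minimal sketch of the conventional least-squares DOP that the first generalised DOP reduces to: the geometry matrix G has one row per satellite, [unit line-of-sight vector, 1], and the DOPs come from the diagonal of (G^T G)^{-1}. The satellite directions below are arbitrary examples, not a real constellation.

```python
import numpy as np

def dops(unit_los):
    """unit_los: (n_sats, 3) unit line-of-sight vectors from receiver to satellites."""
    G = np.hstack([unit_los, np.ones((unit_los.shape[0], 1))])
    Q = np.linalg.inv(G.T @ G)               # cofactor matrix of position + clock
    gdop = np.sqrt(np.trace(Q))
    pdop = np.sqrt(Q[0, 0] + Q[1, 1] + Q[2, 2])
    tdop = np.sqrt(Q[3, 3])
    return gdop, pdop, tdop

# four satellites with a plausible spread of directions
los = np.array([[0.0, 0.0, 1.0],
                [0.8, 0.0, 0.6],
                [-0.4, 0.7, 0.6],
                [-0.4, -0.7, 0.6]])
los = los / np.linalg.norm(los, axis=1, keepdims=True)
print(dops(los))
```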


Author(s):  
Arnaud Dufays ◽  
Elysee Aristide Houndetoungan ◽  
Alain Coën

Abstract Change-point (CP) processes are a flexible approach to modelling long time series. We propose a method to uncover which model parameters truly vary when a CP is detected. Given a set of breakpoints, we use a penalized likelihood approach to select the best set of parameters that change over time, and we prove that the penalty function leads to a consistent selection of the true model. Estimation is carried out via the deterministic annealing expectation-maximization algorithm. Our method accounts for model selection uncertainty and assigns a probability to every possible time-varying parameter specification. Monte Carlo simulations highlight that the method works well for many time series models, including heteroskedastic processes. For a sample of fourteen hedge fund (HF) strategies, using an asset-based style pricing model, we shed light on the promising ability of our method to detect the time-varying dynamics of risk exposures as well as to forecast HF returns.
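
A minimal sketch of the underlying idea of selecting which parameters change at a known breakpoint: for a Gaussian series, the four specifications (nothing changes, mean changes, variance changes, both change) are compared with a BIC-style penalized likelihood. The paper's specific penalty and its deterministic annealing EM are not reproduced here; this is an assumed toy setup.

```python
import numpy as np
from scipy.stats import norm

def gauss_ll(x, mu, sd):
    return norm.logpdf(x, mu, sd).sum()

def penalized_scores(x, cp):
    a, b = x[:cp], x[cp:]
    n = len(x)
    scores = {}
    # nothing varies across the breakpoint
    scores["none"] = (gauss_ll(x, x.mean(), x.std()), 2)
    # only the mean varies (common sd from residuals about the segment means)
    sd = np.concatenate([a - a.mean(), b - b.mean()]).std()
    scores["mean"] = (gauss_ll(a, a.mean(), sd) + gauss_ll(b, b.mean(), sd), 3)
    # only the variance varies (common mean)
    mu = x.mean()
    sd_a, sd_b = np.sqrt(((a - mu) ** 2).mean()), np.sqrt(((b - mu) ** 2).mean())
    scores["variance"] = (gauss_ll(a, mu, sd_a) + gauss_ll(b, mu, sd_b), 3)
    # both mean and variance vary
    scores["both"] = (gauss_ll(a, a.mean(), a.std()) + gauss_ll(b, b.mean(), b.std()), 4)
    # BIC-style penalized score: smaller is better
    return {name: -2 * ll + k * np.log(n) for name, (ll, k) in scores.items()}

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 300)])  # mean shift only
scores = penalized_scores(x, cp=300)
print(min(scores, key=scores.get))  # expected: "mean"
```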


1981 ◽  
Vol 240 (5) ◽  
pp. R259-R265 ◽  
Author(s):  
J. J. DiStefano

Design of optimal blood sampling protocols for kinetic experiments is discussed and evaluated, with the aid of several examples, including an endocrine system case study. The criterion of optimality is maximum accuracy of kinetic model parameter estimates. A simple example illustrates why a sequential experiment approach is required: optimal designs depend on the true model parameter values, knowledge of which is usually a primary objective of the experiment, as well as on the structure of the model and the measurement error (e.g., assay) variance. The methodology is evaluated from the results of a series of experiments designed to quantify the dynamics of distribution and metabolism of three iodothyronines, T3, T4, and reverse-T3. This analysis indicates that 1) the sequential optimal experiment approach can be effective and efficient in the laboratory, 2) it works in the presence of reasonably controlled biological variation, producing sufficiently robust sampling protocols, and 3) optimal designs can be highly efficient in practice, requiring for maximum accuracy a number of blood samples equal to the number of independently adjustable model parameters, no more and no fewer.
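
A minimal sketch of why the design depends on the parameters one is trying to learn: for an assumed one-compartment decay model y(t) = A*exp(-k*t) with constant assay error, the D-optimal pair of sampling times maximizes the determinant of the Fisher information evaluated at nominal parameter values. The model, nominal values, and candidate times are illustrative assumptions, not the iodothyronine study design.

```python
import numpy as np
from itertools import combinations

def fisher_det(times, A=1.0, k=0.2, sigma=0.05):
    """Determinant of the Fisher information for (A, k) at the given sample times."""
    t = np.asarray(times, float)
    # sensitivities of y = A*exp(-k*t) with respect to A and k
    J = np.column_stack([np.exp(-k * t), -A * t * np.exp(-k * t)])
    M = J.T @ J / sigma**2
    return np.linalg.det(M)

candidates = np.linspace(0.5, 24, 48)        # candidate blood-sampling times (h)
best = max(combinations(candidates, 2), key=fisher_det)
print("D-optimal 2-sample design (h):", best)
```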


Author(s):  
Zachary R. McCaw ◽  
Hanna Julienne ◽  
Hugues Aschard

Abstract Although missing data are prevalent in applications, existing implementations of Gaussian mixture models (GMMs) require complete data. Standard practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. Here we present MGMM, an R package for fitting GMMs in the presence of missing data. Using three case studies on real and simulated data sets, we demonstrate that, when the underlying distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than state-of-the-art imputation followed by a standard GMM. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty even when the generative distribution is not a GMM. This assessment may be used to identify unassignable observations. MGMM is available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.
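
A minimal sketch of the "standard practice" baseline the abstract compares against: impute missing entries first, then fit a Gaussian mixture to the completed data. MGMM itself is an R package that instead handles the missing entries inside the EM iterations; that algorithm is not reproduced here, and the simulated data and missingness rate below are assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# two well-separated clusters in 3 dimensions
X = np.vstack([rng.normal(-2, 1, (200, 3)), rng.normal(2, 1, (200, 3))])
mask = rng.random(X.shape) < 0.15           # 15% of entries missing completely at random
X_missing = np.where(mask, np.nan, X)

# impute, then cluster: the baseline whose drawbacks motivate MGMM
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_missing)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_imputed)
labels = gmm.predict(X_imputed)
print("cluster sizes:", np.bincount(labels))
```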

