Fitting Mixtures of Erlangs to Censored and Truncated Data Using the EM Algorithm

2015 · Vol. 45 (3) · pp. 729–758
Author(s): Roel Verbelen, Lan Gong, Katrien Antonio, Andrei Badescu, Sheldon Lin

Abstract: We discuss how to fit mixtures of Erlangs to censored and truncated data using the EM algorithm. Mixtures of Erlangs form a very versatile, yet analytically tractable, class of distributions, making them suitable for loss modeling purposes. The effectiveness of the proposed algorithm is demonstrated on simulated data as well as real data sets.
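To make the underlying EM machinery concrete, here is a minimal sketch that fits a mixture of Erlangs with fixed integer shapes and a common scale to complete data. It shows only the baseline E- and M-steps; the paper's contribution is extending these steps to censored and truncated observations, which this sketch does not do. The component shapes and initialization below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gamma

def fit_erlang_mixture(x, shapes, n_iter=200):
    """EM for a mixture of Erlangs with fixed integer shapes and a common
    scale theta (complete data only; no censoring or truncation)."""
    x = np.asarray(x, dtype=float)
    shapes = np.asarray(shapes)
    m = len(shapes)
    alpha = np.full(m, 1.0 / m)           # mixing weights
    theta = x.mean() / shapes.mean()      # crude initial scale
    for _ in range(n_iter):
        # E-step: posterior probability that x_i came from component j
        dens = np.stack([gamma.pdf(x, a=r, scale=theta) for r in shapes], axis=1)
        z = alpha * dens
        z /= z.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for the weights and the common scale
        alpha = z.mean(axis=0)
        theta = x.sum() / (z @ shapes).sum()
    return alpha, theta

# Usage: recover a two-component mixture from simulated draws
rng = np.random.default_rng(0)
data = np.concatenate([rng.gamma(2, 1.0, 600), rng.gamma(7, 1.0, 400)])
print(fit_erlang_mixture(data, shapes=[2, 7]))
```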


2020 · Vol. 15 · pp. 42–51
Author(s): Shou-Jen Chang-Chien, Wajid Ali, Miin-Shen Yang

Clustering is a method for analyzing grouped data. Circular data arise in many applications, such as wind directions and the departure directions of migrating birds or animals. The expectation-maximization (EM) algorithm on mixtures of von Mises distributions is widely used for clustering circular data. In general, however, the EM algorithm is sensitive to initialization, is not robust to outliers, and requires the number of clusters to be specified a priori. In this paper, we consider a learning-based schema for EM and propose a learning-based EM algorithm on mixtures of von Mises distributions for clustering grouped circular data. The proposed clustering method requires no initialization, is robust to outliers, and finds the number of clusters automatically. Simulated and real data sets are used to compare the proposed algorithm with existing methods. Experimental results and comparisons demonstrate the effectiveness and superiority of the proposed learning-based EM algorithm.
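For context, the sketch below is the standard EM baseline for a von Mises mixture that the paper improves on: unlike the proposed learning-based variant, it still needs a random initialization and a fixed number of clusters k. All parameter choices are illustrative.

```python
import numpy as np
from scipy.special import i0, i0e, i1e
from scipy.optimize import brentq

def vonmises_pdf(x, mu, kappa):
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * i0(kappa))

def em_vonmises_mixture(x, k, n_iter=100, seed=0):
    """Standard EM for a k-component von Mises mixture on angles x (radians)."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-np.pi, np.pi, k)       # random initial mean directions
    kappa = np.ones(k)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each angle
        dens = np.stack([w[j] * vonmises_pdf(x, mu[j], kappa[j])
                         for j in range(k)], axis=1)
        z = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weights, mean directions, concentrations
        nj = z.sum(axis=0)
        w = nj / len(x)
        C, S = z.T @ np.cos(x), z.T @ np.sin(x)
        mu = np.arctan2(S, C)
        rbar = np.clip(np.sqrt(C**2 + S**2) / nj, 1e-6, 1 - 1e-3)
        # concentration solves A(kappa) = I1(kappa)/I0(kappa) = rbar;
        # scaled Bessel functions i1e/i0e avoid overflow for large kappa
        kappa = np.array([brentq(lambda t, r=r: i1e(t) / i0e(t) - r, 1e-8, 1e4)
                          for r in rbar])
    return w, mu, kappa

# Usage: two angular clusters
rng = np.random.default_rng(1)
angles = np.concatenate([rng.vonmises(0.0, 5.0, 300), rng.vonmises(2.5, 8.0, 200)])
print(em_vonmises_mixture(angles, k=2))
```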


2005 · Vol. 30 (4) · pp. 369–396
Author(s): Eisuke Segawa

Multi-indicator growth models were formulated as special three-level hierarchical generalized linear models to analyze the growth of a latent trait variable measured by ordinal items. Items are nested within time points, and time points are nested within subjects. These models are special in that they include a factor-analytic structure. The model can analyze not only data with item- and time-level missing observations, but also data whose time points vary freely across subjects. Furthermore, features useful for longitudinal analyses are included: a first-order autoregressive (AR(1)) structure for the trait residuals and estimated time scores. The approach is Bayesian, using Markov chain Monte Carlo, and the model is implemented in WinBUGS. The models are illustrated with two simulated data sets and one real data set with planned missing items within a scale.
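To make the three-level structure concrete, here is a small generative simulation of such a model under assumed sizes, loadings, thresholds, and AR(1) parameters (all values below are hypothetical); the paper solves the much harder inverse problem of fitting this structure by MCMC in WinBUGS.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_time, n_items = 200, 4, 5           # hypothetical sizes
time_scores = np.array([0.0, 1.0, 2.3, 3.1])  # estimated in the paper; fixed here
loadings = rng.uniform(0.7, 1.3, n_items)     # factor loadings on the trait
cuts = np.array([-1.0, 0.0, 1.0])             # thresholds -> 4 ordinal categories

# Level 3 (subject): random growth intercept and slope
b0 = rng.normal(0.0, 1.0, n_subj)
b1 = rng.normal(0.5, 0.3, n_subj)

# Level 2 (time point): trait value with AR(1) residuals
rho, sd_e = 0.6, 0.5
e = np.zeros((n_subj, n_time))
e[:, 0] = rng.normal(0, sd_e, n_subj)
for t in range(1, n_time):
    e[:, t] = rho * e[:, t - 1] + rng.normal(0, sd_e * np.sqrt(1 - rho**2), n_subj)
eta = b0[:, None] + b1[:, None] * time_scores + e

# Level 1 (item): ordinal responses via a cumulative-logit latent variable
lin = loadings[None, None, :] * eta[:, :, None]         # (subj, time, item)
latent = lin + rng.logistic(size=lin.shape)
y = (latent[..., None] > cuts).sum(axis=-1).astype(float)
y[rng.random(y.shape) < 0.1] = np.nan                   # planned item-level missingness
```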


1987 · Vol. 12 (1) · pp. 76–86
Author(s): Steven E. Rigdon, Robert K. Tsutakawa

Estimation of the parameters of the Rasch model, a one-parameter item response model, is considered when both the item parameters and the ability parameters are treated as random quantities. It is assumed that the item parameters are drawn from a N(γ, τ²) distribution, and the abilities are drawn from a N(0, σ²) distribution. A variation of the EM algorithm is used to find approximate maximum likelihood estimates of γ, τ, and σ. A second approach assumes that the difficulty parameters are drawn from a uniform distribution over part of the real line. Real and simulated data sets are discussed for illustration.
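As a simplified sketch of marginal-likelihood EM estimation for the Rasch model (not the paper's exact algorithm): the version below follows a Bock–Aitkin-style approach, fixing the ability distribution at N(0, 1) so σ is not estimated, treating the difficulties as fixed effects, and then estimating γ and τ by the mean and standard deviation of the fitted difficulties.

```python
import numpy as np

def rasch_em(X, n_quad=41, n_iter=50):
    """Marginal-ML Rasch difficulties via EM over a fixed N(0, 1) ability grid.
    Simplification: sigma is fixed at 1 here; the paper estimates it as well."""
    n, k = X.shape
    theta = np.linspace(-4, 4, n_quad)               # ability quadrature nodes
    w = np.exp(-0.5 * theta**2); w /= w.sum()        # N(0, 1) weights
    b = np.zeros(k)                                  # item difficulties
    for _ in range(n_iter):
        # E-step: posterior weight of each ability node for each person
        P = 1 / (1 + np.exp(-(theta[:, None] - b)))  # (quad, item)
        logf = X @ np.log(P).T + (1 - X) @ np.log(1 - P).T
        post = np.exp(logf - logf.max(axis=1, keepdims=True)) * w
        post /= post.sum(axis=1, keepdims=True)
        # M-step: expected counts, then one Newton ascent step per difficulty
        nq = post.sum(axis=0)                        # expected persons per node
        r = post.T @ X                               # expected corrects per node/item
        grad = (nq[:, None] * P - r).sum(axis=0)     # d(loglik)/db
        hess = (nq[:, None] * P * (1 - P)).sum(axis=0)  # -d2(loglik)/db2 > 0
        b += grad / hess
    return b

# Usage: simulate b ~ N(gamma=0.2, tau=0.5), abilities ~ N(0, 1)
rng = np.random.default_rng(5)
b_true = rng.normal(0.2, 0.5, 20)
th = rng.normal(0.0, 1.0, (1000, 1))
X = (rng.random((1000, 20)) < 1 / (1 + np.exp(-(th - b_true)))).astype(float)
b_hat = rasch_em(X)
print(b_hat.mean(), b_hat.std(ddof=1))  # moment estimates of gamma and tau
```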


Author(s): M. Perzyk, R. Biernacki, J. Kozlowski

Determination of the most significant manufacturing process parameters using collected past data can be very helpful in solving important industrial problems, such as the detection of root causes of deteriorating product quality, the selection of the most efficient parameters to control the process, and the prediction of breakdowns of machines, equipment, etc. A methodology for determining the relative significances of process variables, and of the possible interactions between them, based on interrogations of generalized regression models, is proposed and tested. The performance of several types of data mining tools, such as artificial neural networks, support vector machines, regression trees, classification trees, and a naïve Bayesian classifier, is compared. Some simple non-parametric statistical methods, based on analysis of variance (ANOVA) and contingency tables, are also evaluated for comparison purposes. The tests were performed using simulated data sets with assumed hidden relationships, as well as real data collected in the foundry industry. It was found that the significance and interaction factors obtained from regression models, and in particular from neural networks, perform satisfactorily, while the other methods appeared to be less accurate and/or less reliable.
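A hedged sketch of one way to interrogate a fitted regression model for variable significance: permutation importance on a random forest, standing in for the neural networks and other tools compared in the paper. The data, the hidden relationship, and all names below are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Hypothetical process data: X holds process parameters, y a quality measure
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(0, 0.5, 500)  # hidden relationship

# Fit a generalized regression model, then measure how much predictive
# accuracy drops when each input is randomly permuted
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)
for j, (m, s) in enumerate(zip(imp.importances_mean, imp.importances_std)):
    print(f"x{j}: relative significance {m:.3f} +/- {s:.3f}")
```

Note that plain permutation importance ranks individual variables but does not by itself separate interaction effects, which the paper's methodology additionally quantifies.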


Author(s): J. Diebolt, M.-A. El-Aroui, V. Durbec, B. Villain

When extreme quantiles have to be estimated from a given data set, the classical parametric approach can lead to very poor estimates. This has led to the introduction of specific methods for estimating extreme quantiles (MEEQs) in a nonparametric spirit, e.g., Pickands' excess method, methods based on Hill's estimate of the Pareto index, and the exponential tail (ET) and quadratic tail (QT) methods. However, no practical technique has been available for assessing and comparing these MEEQs when they are to be used on a given data set. This paper is a first attempt to provide such techniques. We first compare the estimates given by the main MEEQs on several simulated data sets. Then we suggest goodness-of-fit (GoF) tests to assess the MEEQs by measuring the quality of their underlying approximations. It is shown that GoF techniques provide highly relevant tools for assessing and comparing the ET and excess methods. Other empirical criteria for comparing MEEQs are also proposed and studied through Monte Carlo analyses. Finally, these assessment and comparison techniques are applied to real data sets from an industrial context where extreme quantiles are needed to define maintenance policies.
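Two of the MEEQs mentioned above admit compact implementations. The sketch below shows an exponential-tail (ET) estimator and a Hill/Weissman-type estimator; the threshold u and the number of order statistics k are chosen arbitrarily for illustration, not by the paper's selection rules.

```python
import numpy as np

def et_quantile(x, p, u):
    """Exponential-tail (ET) estimate of the upper p-quantile: excesses over
    the threshold u are modeled as Exponential(sigma)."""
    x = np.asarray(x, dtype=float)
    exc = x[x > u] - u
    sigma = exc.mean()                           # ML estimate of the excess scale
    return u + sigma * np.log(len(exc) / (len(x) * p))

def hill_quantile(x, p, k):
    """Weissman-type estimate built on Hill's estimator of the Pareto index,
    using the k largest observations."""
    x = np.sort(np.asarray(x, dtype=float))
    xk = x[-k - 1]                               # threshold order statistic
    gamma = np.mean(np.log(x[-k:] / xk))         # Hill estimate of 1/alpha
    return xk * (k / (len(x) * p)) ** gamma      # Weissman extrapolation

rng = np.random.default_rng(3)
data = rng.pareto(3.0, 5000) + 1.0               # Pareto tail with index alpha = 3
true_q = (1 / 1e-3) ** (1 / 3.0)                 # true 0.999-quantile = 10
print(et_quantile(data, 1e-3, np.quantile(data, 0.95)),
      hill_quantile(data, 1e-3, k=250),
      true_q)
```

On this heavy-tailed sample the Hill/Weissman extrapolation lands much closer to the true quantile than ET does; detecting that kind of approximation mismatch is precisely what the goodness-of-fit assessments proposed in the paper are for.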


Genetics · 1994 · Vol. 138 (3) · pp. 963–971
Author(s): G. A. Churchill, R. W. Doerge

Abstract: The detection of genes that control quantitative characters is a problem of great interest to the genetic mapping community. Methods for locating these quantitative trait loci (QTL) relative to maps of genetic markers are now widely used. This paper addresses an issue common to all QTL mapping methods: that of determining an appropriate threshold value for declaring significant QTL effects. An empirical method is described, based on the concept of a permutation test, for estimating threshold values that are tailored to the experimental data at hand. The method is demonstrated using two real data sets derived from F2 and recombinant inbred plant populations. An example using simulated data from a backcross design illustrates the effect of marker density on threshold values.
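The permutation idea is simple to sketch: repeatedly shuffle the phenotypes to break any marker-trait association, record the maximum single-marker statistic across the genome each time, and use an upper quantile of those maxima as the genome-wide threshold. The statistic below (squared marker-phenotype correlation) and the simulated data are illustrative stand-ins, not the paper's exact LOD-based setup.

```python
import numpy as np

def permutation_threshold(pheno, markers, n_perm=1000, level=0.95, seed=0):
    """Empirical genome-wide threshold: shuffle phenotypes, record the maximum
    marker test statistic each time, and take the upper quantile."""
    rng = np.random.default_rng(seed)
    def max_stat(y):
        # squared correlation between the phenotype and each marker,
        # as a simple single-marker association statistic
        yc = y - y.mean()
        mc = markers - markers.mean(axis=0)
        r = (mc.T @ yc) / (np.linalg.norm(mc, axis=0) * np.linalg.norm(yc))
        return np.max(r**2)
    null = [max_stat(rng.permutation(pheno)) for _ in range(n_perm)]
    return np.quantile(null, level)

# Hypothetical backcross-style data: 200 individuals, 150 biallelic markers
rng = np.random.default_rng(4)
markers = rng.integers(0, 2, size=(200, 150)).astype(float)
pheno = rng.normal(size=200)                  # null phenotype, no true QTL
print(permutation_threshold(pheno, markers))  # 95% genome-wide threshold
```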

