Fitting Mixtures of Erlangs to Censored and Truncated Data Using the EM Algorithm

2015 · Vol. 45 (3) · pp. 729–758
Author(s): Roel Verbelen, Lan Gong, Katrien Antonio, Andrei Badescu, Sheldon Lin

Abstract: We discuss how to fit mixtures of Erlangs to censored and truncated data using the EM algorithm. Mixtures of Erlangs form a very versatile, yet analytically tractable, class of distributions, making them suitable for loss modeling purposes. The effectiveness of the proposed algorithm is demonstrated on simulated data as well as real data sets.
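To make the underlying EM machinery concrete, here is a minimal sketch that fits a mixture of Erlangs with fixed integer shapes and a common scale to complete data. It shows only the baseline E- and M-steps; the paper's contribution is extending these steps to censored and truncated observations, which this sketch does not do. The component shapes and initialization below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gamma

def fit_erlang_mixture(x, shapes, n_iter=200):
    """EM for a mixture of Erlangs with fixed integer shapes and a common
    scale theta (complete data only; no censoring or truncation)."""
    x = np.asarray(x, dtype=float)
    shapes = np.asarray(shapes)
    m = len(shapes)
    alpha = np.full(m, 1.0 / m)           # mixing weights
    theta = x.mean() / shapes.mean()      # crude initial scale
    for _ in range(n_iter):
        # E-step: posterior probability that x_i came from component j
        dens = np.stack([gamma.pdf(x, a=r, scale=theta) for r in shapes], axis=1)
        z = alpha * dens
        z /= z.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for the weights and the common scale
        alpha = z.mean(axis=0)
        theta = x.sum() / (z @ shapes).sum()
    return alpha, theta

# Usage: recover a two-component mixture from simulated draws
rng = np.random.default_rng(0)
data = np.concatenate([rng.gamma(2, 1.0, 600), rng.gamma(7, 1.0, 400)])
print(fit_erlang_mixture(data, shapes=[2, 7]))
```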


2020 · Vol. 15 · pp. 42–51
Author(s): Shou-Jen Chang-Chien, Wajid Ali, Miin-Shen Yang

Clustering is a method for analyzing grouped data. Circular data arise in many applications, such as wind directions and the departure directions of migrating birds or animals. The expectation-maximization (EM) algorithm on mixtures of von Mises distributions is widely used for clustering circular data. In general, however, the EM algorithm is sensitive to initialization, is not robust to outliers, and requires the number of clusters to be specified a priori. In this paper, we consider a learning-based schema for EM and propose a learning-based EM algorithm on mixtures of von Mises distributions for clustering grouped circular data. The proposed clustering method requires no initialization, is robust to outliers, and finds the number of clusters automatically. Simulated and real data sets are used to compare the proposed algorithm with existing methods. Experimental results and comparisons demonstrate the effectiveness and superiority of the proposed learning-based EM algorithm.
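For context, the sketch below is the standard EM baseline for a von Mises mixture that the paper improves on: unlike the proposed learning-based variant, it still needs a random initialization and a fixed number of clusters k. All parameter choices are illustrative.

```python
import numpy as np
from scipy.special import i0, i0e, i1e
from scipy.optimize import brentq

def vonmises_pdf(x, mu, kappa):
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * i0(kappa))

def em_vonmises_mixture(x, k, n_iter=100, seed=0):
    """Standard EM for a k-component von Mises mixture on angles x (radians)."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-np.pi, np.pi, k)       # random initial mean directions
    kappa = np.ones(k)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each angle
        dens = np.stack([w[j] * vonmises_pdf(x, mu[j], kappa[j])
                         for j in range(k)], axis=1)
        z = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weights, mean directions, concentrations
        nj = z.sum(axis=0)
        w = nj / len(x)
        C, S = z.T @ np.cos(x), z.T @ np.sin(x)
        mu = np.arctan2(S, C)
        rbar = np.clip(np.sqrt(C**2 + S**2) / nj, 1e-6, 1 - 1e-3)
        # concentration solves A(kappa) = I1(kappa)/I0(kappa) = rbar;
        # scaled Bessel functions i1e/i0e avoid overflow for large kappa
        kappa = np.array([brentq(lambda t, r=r: i1e(t) / i0e(t) - r, 1e-8, 1e4)
                          for r in rbar])
    return w, mu, kappa

# Usage: two angular clusters
rng = np.random.default_rng(1)
angles = np.concatenate([rng.vonmises(0.0, 5.0, 300), rng.vonmises(2.5, 8.0, 200)])
print(em_vonmises_mixture(angles, k=2))
```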


2005 · Vol. 30 (4) · pp. 369–396
Author(s): Eisuke Segawa

Multi-indicator growth models were formulated as special three-level hierarchical generalized linear models to analyze the growth of a latent trait variable measured by ordinal items. Items are nested within time points, and time points are nested within subjects. These models are special in that they include a factor-analytic structure. The model can analyze not only data with item- and time-level missing observations, but also data whose time points vary freely across subjects. Furthermore, features useful for longitudinal analyses are included: a first-order autoregressive (AR(1)) structure for the trait residuals and estimated time scores. The approach is Bayesian, using Markov chain Monte Carlo, and the model is implemented in WinBUGS. The models are illustrated with two simulated data sets and one real data set with planned missing items within a scale.
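To make the three-level structure concrete, here is a small generative simulation of such a model under assumed sizes, loadings, thresholds, and AR(1) parameters (all values below are hypothetical); the paper solves the much harder inverse problem of fitting this structure by MCMC in WinBUGS.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_time, n_items = 200, 4, 5           # hypothetical sizes
time_scores = np.array([0.0, 1.0, 2.3, 3.1])  # estimated in the paper; fixed here
loadings = rng.uniform(0.7, 1.3, n_items)     # factor loadings on the trait
cuts = np.array([-1.0, 0.0, 1.0])             # thresholds -> 4 ordinal categories

# Level 3 (subject): random growth intercept and slope
b0 = rng.normal(0.0, 1.0, n_subj)
b1 = rng.normal(0.5, 0.3, n_subj)

# Level 2 (time point): trait value with AR(1) residuals
rho, sd_e = 0.6, 0.5
e = np.zeros((n_subj, n_time))
e[:, 0] = rng.normal(0, sd_e, n_subj)
for t in range(1, n_time):
    e[:, t] = rho * e[:, t - 1] + rng.normal(0, sd_e * np.sqrt(1 - rho**2), n_subj)
eta = b0[:, None] + b1[:, None] * time_scores + e

# Level 1 (item): ordinal responses via a cumulative-logit latent variable
lin = loadings[None, None, :] * eta[:, :, None]         # (subj, time, item)
latent = lin + rng.logistic(size=lin.shape)
y = (latent[..., None] > cuts).sum(axis=-1).astype(float)
y[rng.random(y.shape) < 0.1] = np.nan                   # planned item-level missingness
```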


1987 · Vol. 12 (1) · pp. 76–86
Author(s): Steven E. Rigdon, Robert K. Tsutakawa

Estimation of the parameters of the Rasch model, a one-parameter item response model, is considered when both the item parameters and the ability parameters are treated as random quantities. It is assumed that the item parameters are drawn from a N(γ, τ²) distribution, and the abilities are drawn from a N(0, σ²) distribution. A variation of the EM algorithm is used to find approximate maximum likelihood estimates of γ, τ, and σ. A second approach assumes that the difficulty parameters are drawn from a uniform distribution over part of the real line. Real and simulated data sets are discussed for illustration.
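As a simplified sketch of marginal-likelihood EM estimation for the Rasch model (not the paper's exact algorithm): the version below follows a Bock–Aitkin-style approach, fixing the ability distribution at N(0, 1) so σ is not estimated, treating the difficulties as fixed effects, and then estimating γ and τ by the mean and standard deviation of the fitted difficulties.

```python
import numpy as np

def rasch_em(X, n_quad=41, n_iter=50):
    """Marginal-ML Rasch difficulties via EM over a fixed N(0, 1) ability grid.
    Simplification: sigma is fixed at 1 here; the paper estimates it as well."""
    n, k = X.shape
    theta = np.linspace(-4, 4, n_quad)               # ability quadrature nodes
    w = np.exp(-0.5 * theta**2); w /= w.sum()        # N(0, 1) weights
    b = np.zeros(k)                                  # item difficulties
    for _ in range(n_iter):
        # E-step: posterior weight of each ability node for each person
        P = 1 / (1 + np.exp(-(theta[:, None] - b)))  # (quad, item)
        logf = X @ np.log(P).T + (1 - X) @ np.log(1 - P).T
        post = np.exp(logf - logf.max(axis=1, keepdims=True)) * w
        post /= post.sum(axis=1, keepdims=True)
        # M-step: expected counts, then one Newton ascent step per difficulty
        nq = post.sum(axis=0)                        # expected persons per node
        r = post.T @ X                               # expected corrects per node/item
        grad = (nq[:, None] * P - r).sum(axis=0)     # d(loglik)/db
        hess = (nq[:, None] * P * (1 - P)).sum(axis=0)  # -d2(loglik)/db2 > 0
        b += grad / hess
    return b

# Usage: simulate b ~ N(gamma=0.2, tau=0.5), abilities ~ N(0, 1)
rng = np.random.default_rng(5)
b_true = rng.normal(0.2, 0.5, 20)
th = rng.normal(0.0, 1.0, (1000, 1))
X = (rng.random((1000, 20)) < 1 / (1 + np.exp(-(th - b_true)))).astype(float)
b_hat = rasch_em(X)
print(b_hat.mean(), b_hat.std(ddof=1))  # moment estimates of gamma and tau
```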


Author(s): M. Perzyk, R. Biernacki, J. Kozlowski

Determination of the most significant manufacturing process parameters using collected past data can be very helpful in solving important industrial problems, such as the detection of root causes of deteriorating product quality, the selection of the most efficient parameters to control the process, and the prediction of breakdowns of machines, equipment, etc. A methodology for determining the relative significances of process variables, and of the possible interactions between them, based on interrogations of generalized regression models, is proposed and tested. The performance of several types of data mining tools, such as artificial neural networks, support vector machines, regression trees, classification trees, and a naïve Bayesian classifier, is compared. Some simple non-parametric statistical methods, based on analysis of variance (ANOVA) and contingency tables, are also evaluated for comparison purposes. The tests were performed using simulated data sets with assumed hidden relationships, as well as real data collected in the foundry industry. It was found that the significance and interaction factors obtained from regression models, and in particular from neural networks, perform satisfactorily, while the other methods appeared to be less accurate and/or less reliable.
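A hedged sketch of one way to interrogate a fitted regression model for variable significance: permutation importance on a random forest, standing in for the neural networks and other tools compared in the paper. The data, the hidden relationship, and all names below are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Hypothetical process data: X holds process parameters, y a quality measure
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(0, 0.5, 500)  # hidden relationship

# Fit a generalized regression model, then measure how much predictive
# accuracy drops when each input is randomly permuted
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)
for j, (m, s) in enumerate(zip(imp.importances_mean, imp.importances_std)):
    print(f"x{j}: relative significance {m:.3f} +/- {s:.3f}")
```

Note that plain permutation importance ranks individual variables but does not by itself separate interaction effects, which the paper's methodology additionally quantifies.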


Author(s): J. Diebolt, M.-A. El-Aroui, V. Durbec, B. Villain

When extreme quantiles have to be estimated from a given data set, the classical parametric approach can lead to very poor estimates. This has led to the introduction of specific methods for estimating extreme quantiles (MEEQs) in a nonparametric spirit, e.g., Pickands' excess method, methods based on Hill's estimate of the Pareto index, and the exponential tail (ET) and quadratic tail (QT) methods. However, no practical technique has been available for assessing and comparing these MEEQs when they are to be used on a given data set. This paper is a first attempt to provide such techniques. We first compare the estimates given by the main MEEQs on several simulated data sets. Then we suggest goodness-of-fit (GoF) tests to assess the MEEQs by measuring the quality of their underlying approximations. It is shown that GoF techniques provide highly relevant tools for assessing and comparing the ET and excess methods. Other empirical criteria for comparing MEEQs are also proposed and studied through Monte Carlo analyses. Finally, these assessment and comparison techniques are applied to real data sets from an industrial context where extreme quantiles are needed to define maintenance policies.
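Two of the MEEQs mentioned above admit compact implementations. The sketch below shows an exponential-tail (ET) estimator and a Hill/Weissman-type estimator; the threshold u and the number of order statistics k are chosen arbitrarily for illustration, not by the paper's selection rules.

```python
import numpy as np

def et_quantile(x, p, u):
    """Exponential-tail (ET) estimate of the upper p-quantile: excesses over
    the threshold u are modeled as Exponential(sigma)."""
    x = np.asarray(x, dtype=float)
    exc = x[x > u] - u
    sigma = exc.mean()                           # ML estimate of the excess scale
    return u + sigma * np.log(len(exc) / (len(x) * p))

def hill_quantile(x, p, k):
    """Weissman-type estimate built on Hill's estimator of the Pareto index,
    using the k largest observations."""
    x = np.sort(np.asarray(x, dtype=float))
    xk = x[-k - 1]                               # threshold order statistic
    gamma = np.mean(np.log(x[-k:] / xk))         # Hill estimate of 1/alpha
    return xk * (k / (len(x) * p)) ** gamma      # Weissman extrapolation

rng = np.random.default_rng(3)
data = rng.pareto(3.0, 5000) + 1.0               # Pareto tail with index alpha = 3
true_q = (1 / 1e-3) ** (1 / 3.0)                 # true 0.999-quantile = 10
print(et_quantile(data, 1e-3, np.quantile(data, 0.95)),
      hill_quantile(data, 1e-3, k=250),
      true_q)
```

On this heavy-tailed sample the Hill/Weissman extrapolation lands much closer to the true quantile than ET does; detecting that kind of approximation mismatch is precisely what the goodness-of-fit assessments proposed in the paper are for.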


Genetics · 1994 · Vol. 138 (3) · pp. 963–971
Author(s): G. A. Churchill, R. W. Doerge

Abstract: The detection of genes that control quantitative characters is a problem of great interest to the genetic mapping community. Methods for locating these quantitative trait loci (QTL) relative to maps of genetic markers are now widely used. This paper addresses an issue common to all QTL mapping methods: that of determining an appropriate threshold value for declaring significant QTL effects. An empirical method is described, based on the concept of a permutation test, for estimating threshold values that are tailored to the experimental data at hand. The method is demonstrated using two real data sets derived from F2 and recombinant inbred plant populations. An example using simulated data from a backcross design illustrates the effect of marker density on threshold values.
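The permutation idea is simple to sketch: repeatedly shuffle the phenotypes to break any marker-trait association, record the maximum single-marker statistic across the genome each time, and use an upper quantile of those maxima as the genome-wide threshold. The statistic below (squared marker-phenotype correlation) and the simulated data are illustrative stand-ins, not the paper's exact LOD-based setup.

```python
import numpy as np

def permutation_threshold(pheno, markers, n_perm=1000, level=0.95, seed=0):
    """Empirical genome-wide threshold: shuffle phenotypes, record the maximum
    marker test statistic each time, and take the upper quantile."""
    rng = np.random.default_rng(seed)
    def max_stat(y):
        # squared correlation between the phenotype and each marker,
        # as a simple single-marker association statistic
        yc = y - y.mean()
        mc = markers - markers.mean(axis=0)
        r = (mc.T @ yc) / (np.linalg.norm(mc, axis=0) * np.linalg.norm(yc))
        return np.max(r**2)
    null = [max_stat(rng.permutation(pheno)) for _ in range(n_perm)]
    return np.quantile(null, level)

# Hypothetical backcross-style data: 200 individuals, 150 biallelic markers
rng = np.random.default_rng(4)
markers = rng.integers(0, 2, size=(200, 150)).astype(float)
pheno = rng.normal(size=200)                  # null phenotype, no true QTL
print(permutation_threshold(pheno, markers))  # 95% genome-wide threshold
```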

