Estimating Latent Structure Models with Categorical Variables: One-Step Versus Three-Step Estimators

2004 ◽  
Vol 12 (1) ◽  
pp. 3-27 ◽  
Author(s):  
Annabel Bolck ◽  
Marcel Croon ◽  
Jacques Hagenaars

We study the properties of a three-step approach to estimating the parameters of a latent structure model for categorical data and propose a simple correction for a common source of bias. Such models have a measurement part (essentially the latent class model) and a structural (causal) part (essentially a system of logit equations). In the three-step approach, a stand-alone measurement model is first defined and its parameters are estimated. Individual predicted scores on the latent variables are then computed from the parameter estimates of the measurement model and the individual observed scoring patterns on the indicators. Finally, these predicted scores are used in the causal part and treated as observed variables. We show that such a naive use of predicted latent scores cannot be recommended since it leads to a systematic underestimation of the strength of the association among the variables in the structural part of the models. However, a simple correction procedure can eliminate this systematic bias. This approach is illustrated on simulated and real data. A method that uses multiple imputation to account for the fact that the predicted latent variables are random variables can produce standard errors for the parameters in the structural part of the model.
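The correction described above can be sketched in a few lines: in expectation, the naive cross-table of predicted class membership by an external variable equals the true table premultiplied by the transpose of the classification-error matrix, so premultiplying by the inverse of that transpose undoes the attenuation. A minimal numerical sketch with hypothetical numbers (in practice the error matrix is estimated from the first-step measurement model):

```python
import numpy as np

# Hypothetical classification-error matrix for a 2-class model:
# rows = true latent class, columns = predicted class.
D = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Hypothetical true joint distribution of (latent class, external binary Y).
true_joint = np.array([[0.35, 0.15],
                       [0.10, 0.40]])

# Naive three-step: the (predicted class, Y) table mixes the true classes,
# which attenuates the association toward independence.
naive = D.T @ true_joint

# Correction: premultiply by the inverse of D^T to recover the true table.
corrected = np.linalg.inv(D.T) @ naive

def odds_ratio(table):
    """Association strength in a 2x2 probability table."""
    return (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
```

The odds ratio of the naive table is pulled toward 1, which is exactly the systematic underestimation of association the abstract describes.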

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gregoire Preud’homme ◽  
Kevin Duarte ◽  
Kevin Dalleau ◽  
Claire Lacomblez ◽  
Emmanuel Bresso ◽  
...  

The choice of the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by the Adjusted Rand Index (ARI) on 1,000 generated virtual populations of mixed variables across 7 scenarios with varying population sizes, numbers of clusters, numbers of continuous and categorical variables, proportions of relevant (non-noisy) variables, and degrees of variable relevance (low, mild, high). The clustering methods were then applied to data from the EPHESUS randomized clinical trial (a heart failure trial evaluating the effect of eplerenone), allowing us to illustrate the differences between clustering techniques. The simulations revealed the dominance of the K-prototypes, Kamila, and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and hierarchical clustering had lower ARI than model-based methods in all scenarios. When applied to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index), and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
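The Adjusted Rand Index used as the benchmark criterion compares a recovered partition with the true one while correcting for chance agreement, and is invariant to cluster relabeling. A stdlib-only sketch of the pair-counting formula (equivalent to, e.g., scikit-learn's `adjusted_rand_score`):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI: 1 = identical partitions, ~0 = chance agreement."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    row = Counter(labels_a)
    col = Counter(labels_b)
    index = sum(comb(c, 2) for c in contingency.values())
    sum_row = sum(comb(c, 2) for c in row.values())
    sum_col = sum(comb(c, 2) for c in col.values())
    expected = sum_row * sum_col / comb(n, 2)  # chance-agreement correction
    max_index = (sum_row + sum_col) / 2
    return (index - expected) / (max_index - expected)
```

Because the index is label-invariant, a clustering that swaps cluster labels but keeps the same groups still scores a perfect 1.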


2017 ◽  
Vol 78 (6) ◽  
pp. 925-951 ◽  
Author(s):  
Unkyung No ◽  
Sehee Hong

The purpose of the present study is to compare the performance of mixture modeling approaches (the one-step approach, three-step maximum-likelihood approach, three-step BCH approach, and LTB approach) under diverse sample size conditions. To carry out this research, two simulation studies were conducted with two different models: a latent class model with three predictor variables and a latent class model with one distal outcome variable. Data were generated under different sample sizes (100, 200, 300, 500, 1,000), entropy levels (0.6, 0.7, 0.8, 0.9), and distal outcome variances (homoscedasticity, heteroscedasticity). Parameter estimate bias, standard error bias, mean squared error, and coverage served as evaluation criteria. Results demonstrate that the three-step approaches produced more stable and better estimates than the other approaches, even with a small sample size of 100. This research differs from previous studies in that various models were used to compare the approaches and smaller sample size conditions were examined. Furthermore, the results supporting the superiority of the three-step approaches even under unfavorable conditions underscore the advantage of these approaches.
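The four evaluation criteria named above can be computed directly from replication-level simulation output. A stdlib sketch with hypothetical variable names:

```python
import statistics

def mc_criteria(estimates, std_errors, true_value, z=1.96):
    """Bias, standard-error bias, MSE, and 95% CI coverage across replications."""
    bias = statistics.mean(estimates) - true_value
    # SE bias: average reported SE versus the empirical SD of the estimates.
    se_bias = statistics.mean(std_errors) - statistics.stdev(estimates)
    mse = statistics.mean([(e - true_value) ** 2 for e in estimates])
    coverage = statistics.mean(
        [1.0 if e - z * s <= true_value <= e + z * s else 0.0
         for e, s in zip(estimates, std_errors)]
    )
    return {"bias": bias, "se_bias": se_bias, "mse": mse, "coverage": coverage}
```

Coverage near the nominal 95% together with small bias is what "stable and better estimation" amounts to in such studies.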


2014 ◽  
Vol 22 (4) ◽  
pp. 520-540 ◽  
Author(s):  
Zsuzsa Bakk ◽  
Daniel L. Oberski ◽  
Jeroen K. Vermunt

Latent class analysis is used in the political science literature both in substantive applications and as a tool to estimate measurement error. Many studies in the social and political sciences relate estimated class assignments from a latent class model to external variables. Although common, such a “three-step” procedure effectively ignores classification error in the class assignments; Vermunt (2010, “Latent class modeling with covariates: Two improved three-step approaches,” Political Analysis 18:450–69) showed that this leads to inconsistent parameter estimates and proposed a correction. Although this correction for bias is now implemented in standard software, inconsistency is not the only consequence of classification error. We demonstrate that the correction method introduces an additional source of variance in the estimates, so that standard errors and confidence intervals are overly optimistic when this is not taken into account. We derive the asymptotic variance of the third-step estimates of interest, as well as several candidate corrected sample estimators of the standard errors. These corrected standard error estimators are evaluated in a Monte Carlo study, and we provide practical advice as to which should be used so that valid inferences can be obtained when relating estimated class membership to external variables.
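The extra variance can be illustrated with the linear-map view of the bias correction: the correction is a fixed matrix multiplication, so the sampling covariance of the corrected table follows the usual A Sigma A^T rule, and the inverse classification-error matrix typically amplifies noise. A hypothetical numerical sketch:

```python
import numpy as np

# Hypothetical 2-class, binary-outcome setup.
D = np.array([[0.9, 0.1],    # rows: true class, cols: assigned class
              [0.2, 0.8]])
M = np.linalg.inv(D.T)        # the bias-correcting linear map

true_joint = np.array([[0.35, 0.15],
                       [0.10, 0.40]])
observed = D.T @ true_joint   # what the naive third step sees, in expectation

def multinomial_cov(p, n):
    """Covariance of multinomial cell proportions: (diag(p) - p p^T) / n."""
    return (np.diag(p) - np.outer(p, p)) / n

n = 1000
p = observed.flatten(order="F")        # stack the table column by column
Sigma = multinomial_cov(p, n)

# The correction applies M to each column of the table, i.e. a
# block-diagonal linear map on the stacked cells.
A = np.kron(np.eye(2), M)
Sigma_corrected = A @ Sigma @ A.T      # A Sigma A^T sampling covariance
```

The corrected estimator is unbiased for the true table, but its total sampling variance exceeds that of the naive table, which is why uncorrected standard errors are overly optimistic.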


2011 ◽  
Vol 19 (2) ◽  
pp. 173-187 ◽  
Author(s):  
Drew A. Linzer

Contingency tables are among the most basic and useful techniques available for analyzing categorical data, but they produce highly imprecise estimates in small samples or for population subgroups that arise following repeated stratification. I demonstrate that preprocessing an observed set of categorical variables using a latent class model can greatly improve the quality of table-based inferences. As a density estimator, the latent class model closely approximates the underlying joint distribution of the variables of interest, which enables reliable estimation of conditional probabilities and marginal effects, even among subgroups containing fewer than 40 observations. Though here focused on applications to public opinion, the procedure has a wide range of potential uses. I illustrate the benefits of the latent class model-based approach for greatly improved accuracy in estimating and forecasting vote preferences within small demographic subgroups using survey data from the 2004 and 2008 U.S. presidential election campaigns.
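The latent class model as a density estimator has a simple form: each cell probability is a mixture over classes of products of conditional item probabilities, which smooths sparse tables and yields stable conditional probabilities. A toy sketch with hypothetical parameters:

```python
import numpy as np

# Toy latent class density for three binary items (hypothetical parameters).
pi = np.array([0.6, 0.4])            # class sizes
rho = np.array([[0.9, 0.8, 0.7],     # P(item j = 1 | class 0)
                [0.2, 0.3, 0.1]])    # P(item j = 1 | class 1)

def cell_prob(x):
    """Model-based probability of response pattern x (0/1 per item)."""
    item_probs = np.where(x, rho, 1 - rho)   # per class, per item
    return float(pi @ item_probs.prod(axis=1))

# A smoothed conditional: P(item 2 = 1 | item 0 = 1), marginalizing item 1.
num = sum(cell_prob([1, b, 1]) for b in (0, 1))
den = sum(cell_prob([1, b, v]) for b in (0, 1) for v in (0, 1))
cond = num / den
```

Any conditional or marginal of the table can be read off the smoothed density in the same way, which is what keeps small-subgroup estimates stable.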


2020 ◽  
Author(s):  
Benjamin Deonovic ◽  
Timo Bechger ◽  
Gunter Maris

Learning and assessment are intrinsically linked. However, the research, tools, and statistical models used within the two fields differ greatly. This has created a disconnect: The goals and missions of educational institutions are codified in the language and ideas of learning, but evaluated, monitored, and administered with the tools of assessment. We propose a novel statistical model, the Master model, capable of being the engine behind a modern learning and assessment system. The Master model combines three key concepts from the assessment and learning literature from the past century: A learning model should be multidimensional and hierarchical and should incorporate learning progressions. The Master model is a multidimensional latent variable model, more specifically a latent class model, that not only ranks learners from best to worst but also provides detailed diagnostic feedback to tell learners what they know, and more importantly, what they don't know. By incorporating a hierarchical structure of the latent variables, the Master model reproduces the positive manifold, a phenomenon that continues to be replicated in assessment data where scores between cognitive tests correlate positively. Finally, expert and data-driven annotation can incorporate learning progressions directly into the latent variables. With these three key concepts, the Master model can track the estimate of a learner’s latent skills, track the efficacy of various educational resources such as videos, and recommend which resources the learner should next focus on in order to maximize their learning.
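The hierarchical ingredient produces the positive manifold almost mechanically: skills that are conditionally independent given a higher-order latent class still correlate positively once that class is marginalized out. A toy calculation (parameters hypothetical, not the actual Master model specification):

```python
# P(general class = mastered) and P(skill mastered | general class),
# for two skills that are conditionally independent given the class.
p_g = 0.5
p_skill = {1: 0.8, 0: 0.2}

# Marginal P(skill) and E[skill1 * skill2].
e_s = p_g * p_skill[1] + (1 - p_g) * p_skill[0]
e_s1s2 = p_g * p_skill[1] ** 2 + (1 - p_g) * p_skill[0] ** 2

# Marginal covariance is positive despite conditional independence:
# the positive manifold.
cov = e_s1s2 - e_s ** 2
```

With these numbers the covariance is 0.09, so the two skill indicators correlate positively even though neither directly influences the other.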


2011 ◽  
Vol 37 (1) ◽  
Author(s):  
Seretse Moyo ◽  
Callie Theron

Orientation: The Fifteen Factor Questionnaire Plus (15FQ+) is a prominent personality questionnaire that organisations frequently use in personnel selection in South Africa.

Research purpose: The primary objective of this study was to undertake a factor analytic investigation of the first-order factor structure of the 15FQ+.

Motivation for the study: Evidence of the construct validity of the 15FQ+ as a measure of personality is necessary, though not sufficient, to justify its use in personnel selection.

Research design, approach and method: The researchers evaluated the fit of the measurement model implied by the structure and scoring key of the 15FQ+ in a quantitative study that used an ex post facto correlation design through structural equation modelling. They conducted a secondary data analysis, selecting a sample of 241 Black South African managers from a large 15FQ+ database.

Main findings: The researchers found good measurement model fit. However, the measurement model parameter estimates were worrying: their magnitude suggests that the items generally do not reflect the latent personality dimensions the designers intended them to measure with a great degree of precision, and are instead rather noisy measures of the latent variables they represent.

Practical/managerial implications: Organisations should use the 15FQ+ cautiously with Black South African managers until further local research evidence becomes available.

Contribution/value-add: The study should act as a catalyst for the additional research needed to establish convincingly the psychometric credentials of the 15FQ+ as a valuable assessment tool in South Africa.

