Estimating Latent Structure Models with Categorical Variables: One-Step Versus Three-Step Estimators

2004 ◽  
Vol 12 (1) ◽  
pp. 3-27 ◽  
Author(s):  
Annabel Bolck ◽  
Marcel Croon ◽  
Jacques Hagenaars

We study the properties of a three-step approach to estimating the parameters of a latent structure model for categorical data and propose a simple correction for a common source of bias. Such models have a measurement part (essentially the latent class model) and a structural (causal) part (essentially a system of logit equations). In the three-step approach, a stand-alone measurement model is first defined and its parameters are estimated. Individual predicted scores on the latent variables are then computed from the parameter estimates of the measurement model and the individual observed scoring patterns on the indicators. Finally, these predicted scores are used in the causal part and treated as observed variables. We show that such a naive use of predicted latent scores cannot be recommended since it leads to a systematic underestimation of the strength of the association among the variables in the structural part of the models. However, a simple correction procedure can eliminate this systematic bias. This approach is illustrated on simulated and real data. A method that uses multiple imputation to account for the fact that the predicted latent variables are random variables can produce standard errors for the parameters in the structural part of the model.
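The correction described above can be sketched in a few lines: in expectation, the naive cross-table of predicted class membership by an external variable equals the true table premultiplied by the transpose of the classification-error matrix, so premultiplying by the inverse of that transpose undoes the attenuation. A minimal numerical sketch with hypothetical numbers (in practice the error matrix is estimated from the first-step measurement model):

```python
import numpy as np

# Hypothetical classification-error matrix for a 2-class model:
# rows = true latent class, columns = predicted class.
D = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Hypothetical true joint distribution of (latent class, external binary Y).
true_joint = np.array([[0.35, 0.15],
                       [0.10, 0.40]])

# Naive three-step: the (predicted class, Y) table mixes the true classes,
# which attenuates the association toward independence.
naive = D.T @ true_joint

# Correction: premultiply by the inverse of D^T to recover the true table.
corrected = np.linalg.inv(D.T) @ naive

def odds_ratio(table):
    """Association strength in a 2x2 probability table."""
    return (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
```

The odds ratio of the naive table is pulled toward 1, which is exactly the systematic underestimation of association the abstract describes.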

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gregoire Preud’homme ◽  
Kevin Duarte ◽  
Kevin Dalleau ◽  
Claire Lacomblez ◽  
Emmanuel Bresso ◽  
...  

The choice of the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by the Adjusted Rand Index (ARI) on 1,000 generated virtual populations of mixed variables across 7 scenarios with varying population sizes, numbers of clusters, numbers of continuous and categorical variables, proportions of relevant (non-noisy) variables, and degrees of variable relevance (low, mild, high). The clustering methods were then applied to data from the EPHESUS randomized clinical trial (a heart failure trial evaluating the effect of eplerenone), allowing us to illustrate the differences between clustering techniques. The simulations revealed the dominance of the K-prototypes, Kamila, and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and hierarchical clustering had lower ARI than model-based methods in all scenarios. When applied to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index), and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
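The Adjusted Rand Index used as the benchmark criterion compares a recovered partition with the true one while correcting for chance agreement, and is invariant to cluster relabeling. A stdlib-only sketch of the pair-counting formula (equivalent to, e.g., scikit-learn's `adjusted_rand_score`):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI: 1 = identical partitions, ~0 = chance agreement."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    row = Counter(labels_a)
    col = Counter(labels_b)
    index = sum(comb(c, 2) for c in contingency.values())
    sum_row = sum(comb(c, 2) for c in row.values())
    sum_col = sum(comb(c, 2) for c in col.values())
    expected = sum_row * sum_col / comb(n, 2)  # chance-agreement correction
    max_index = (sum_row + sum_col) / 2
    return (index - expected) / (max_index - expected)
```

Because the index is label-invariant, a clustering that swaps cluster labels but keeps the same groups still scores a perfect 1.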


2017 ◽  
Vol 78 (6) ◽  
pp. 925-951 ◽  
Author(s):  
Unkyung No ◽  
Sehee Hong

The purpose of the present study is to compare the performance of mixture modeling approaches (the one-step approach, three-step maximum-likelihood approach, three-step BCH approach, and LTB approach) under diverse sample size conditions. To carry out this research, two simulation studies were conducted with two different models: a latent class model with three predictor variables and a latent class model with one distal outcome variable. Data were generated under different sample sizes (100, 200, 300, 500, 1,000), entropy levels (0.6, 0.7, 0.8, 0.9), and distal outcome variances (homoscedasticity, heteroscedasticity). Parameter estimate bias, standard error bias, mean squared error, and coverage served as evaluation criteria. Results demonstrate that the three-step approaches produced more stable and better estimates than the other approaches, even with a small sample size of 100. This research differs from previous studies in that various models were used to compare the approaches and smaller sample size conditions were examined. Furthermore, the results supporting the superiority of the three-step approaches even under unfavorable conditions underscore the advantage of these approaches.
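The four evaluation criteria named above can be computed directly from replication-level simulation output. A stdlib sketch with hypothetical variable names:

```python
import statistics

def mc_criteria(estimates, std_errors, true_value, z=1.96):
    """Bias, standard-error bias, MSE, and 95% CI coverage across replications."""
    bias = statistics.mean(estimates) - true_value
    # SE bias: average reported SE versus the empirical SD of the estimates.
    se_bias = statistics.mean(std_errors) - statistics.stdev(estimates)
    mse = statistics.mean([(e - true_value) ** 2 for e in estimates])
    coverage = statistics.mean(
        [1.0 if e - z * s <= true_value <= e + z * s else 0.0
         for e, s in zip(estimates, std_errors)]
    )
    return {"bias": bias, "se_bias": se_bias, "mse": mse, "coverage": coverage}
```

Coverage near the nominal 95% together with small bias is what "stable and better estimation" amounts to in such studies.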


2014 ◽  
Vol 22 (4) ◽  
pp. 520-540 ◽  
Author(s):  
Zsuzsa Bakk ◽  
Daniel L. Oberski ◽  
Jeroen K. Vermunt

Latent class analysis is used in the political science literature both in substantive applications and as a tool to estimate measurement error. Many studies in the social and political sciences relate estimated class assignments from a latent class model to external variables. Although common, such a “three-step” procedure effectively ignores classification error in the class assignments; Vermunt (2010, “Latent class modeling with covariates: Two improved three-step approaches,” Political Analysis 18:450–69) showed that this leads to inconsistent parameter estimates and proposed a correction. Although this correction for bias is now implemented in standard software, inconsistency is not the only consequence of classification error. We demonstrate that the correction method introduces an additional source of variance in the estimates, so that standard errors and confidence intervals are overly optimistic when this is not taken into account. We derive the asymptotic variance of the third-step estimates of interest, as well as several candidate corrected sample estimators of the standard errors. These corrected standard error estimators are evaluated in a Monte Carlo study, and we provide practical advice as to which should be used so that valid inferences can be obtained when relating estimated class membership to external variables.
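The extra variance can be illustrated with the linear-map view of the bias correction: the correction is a fixed matrix multiplication, so the sampling covariance of the corrected table follows the usual A Sigma A^T rule, and the inverse classification-error matrix typically amplifies noise. A hypothetical numerical sketch:

```python
import numpy as np

# Hypothetical 2-class, binary-outcome setup.
D = np.array([[0.9, 0.1],    # rows: true class, cols: assigned class
              [0.2, 0.8]])
M = np.linalg.inv(D.T)        # the bias-correcting linear map

true_joint = np.array([[0.35, 0.15],
                       [0.10, 0.40]])
observed = D.T @ true_joint   # what the naive third step sees, in expectation

def multinomial_cov(p, n):
    """Covariance of multinomial cell proportions: (diag(p) - p p^T) / n."""
    return (np.diag(p) - np.outer(p, p)) / n

n = 1000
p = observed.flatten(order="F")        # stack the table column by column
Sigma = multinomial_cov(p, n)

# The correction applies M to each column of the table, i.e. a
# block-diagonal linear map on the stacked cells.
A = np.kron(np.eye(2), M)
Sigma_corrected = A @ Sigma @ A.T      # A Sigma A^T sampling covariance
```

The corrected estimator is unbiased for the true table, but its total sampling variance exceeds that of the naive table, which is why uncorrected standard errors are overly optimistic.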


2011 ◽  
Vol 19 (2) ◽  
pp. 173-187 ◽  
Author(s):  
Drew A. Linzer

Contingency tables are among the most basic and useful techniques available for analyzing categorical data, but they produce highly imprecise estimates in small samples or for population subgroups that arise following repeated stratification. I demonstrate that preprocessing an observed set of categorical variables using a latent class model can greatly improve the quality of table-based inferences. As a density estimator, the latent class model closely approximates the underlying joint distribution of the variables of interest, which enables reliable estimation of conditional probabilities and marginal effects, even among subgroups containing fewer than 40 observations. Though here focused on applications to public opinion, the procedure has a wide range of potential uses. I illustrate the benefits of the latent class model-based approach for greatly improved accuracy in estimating and forecasting vote preferences within small demographic subgroups using survey data from the 2004 and 2008 U.S. presidential election campaigns.
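The latent class model as a density estimator has a simple form: each cell probability is a mixture over classes of products of conditional item probabilities, which smooths sparse tables and yields stable conditional probabilities. A toy sketch with hypothetical parameters:

```python
import numpy as np

# Toy latent class density for three binary items (hypothetical parameters).
pi = np.array([0.6, 0.4])            # class sizes
rho = np.array([[0.9, 0.8, 0.7],     # P(item j = 1 | class 0)
                [0.2, 0.3, 0.1]])    # P(item j = 1 | class 1)

def cell_prob(x):
    """Model-based probability of response pattern x (0/1 per item)."""
    item_probs = np.where(x, rho, 1 - rho)   # per class, per item
    return float(pi @ item_probs.prod(axis=1))

# A smoothed conditional: P(item 2 = 1 | item 0 = 1), marginalizing item 1.
num = sum(cell_prob([1, b, 1]) for b in (0, 1))
den = sum(cell_prob([1, b, v]) for b in (0, 1) for v in (0, 1))
cond = num / den
```

Any conditional or marginal of the table can be read off the smoothed density in the same way, which is what keeps small-subgroup estimates stable.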


2020 ◽  
Author(s):  
Benjamin Deonovic ◽  
Timo Bechger ◽  
Gunter Maris

Learning and assessment are intrinsically linked. However, the research, tools, and statistical models used within the two fields differ greatly. This has created a disconnect: The goals and missions of educational institutions are codified in the language and ideas of learning, but evaluated, monitored, and administered with the tools of assessment. We propose a novel statistical model, the Master model, capable of being the engine behind a modern learning and assessment system. The Master model combines three key concepts from the assessment and learning literature from the past century: A learning model should be multidimensional and hierarchical and should incorporate learning progressions. The Master model is a multidimensional latent variable model, more specifically a latent class model, that not only ranks learners from best to worst but also provides detailed diagnostic feedback to tell learners what they know, and more importantly, what they don't know. By incorporating a hierarchical structure of the latent variables, the Master model reproduces the positive manifold, a phenomenon that continues to be replicated in assessment data where scores between cognitive tests correlate positively. Finally, expert and data-driven annotation can incorporate learning progressions directly into the latent variables. With these three key concepts, the Master model can track the estimate of a learner’s latent skills, track the efficacy of various educational resources such as videos, and recommend which resources the learner should next focus on in order to maximize their learning.
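The hierarchical ingredient produces the positive manifold almost mechanically: skills that are conditionally independent given a higher-order latent class still correlate positively once that class is marginalized out. A toy calculation (parameters hypothetical, not the actual Master model specification):

```python
# P(general class = mastered) and P(skill mastered | general class),
# for two skills that are conditionally independent given the class.
p_g = 0.5
p_skill = {1: 0.8, 0: 0.2}

# Marginal P(skill) and E[skill1 * skill2].
e_s = p_g * p_skill[1] + (1 - p_g) * p_skill[0]
e_s1s2 = p_g * p_skill[1] ** 2 + (1 - p_g) * p_skill[0] ** 2

# Marginal covariance is positive despite conditional independence:
# the positive manifold.
cov = e_s1s2 - e_s ** 2
```

With these numbers the covariance is 0.09, so the two skill indicators correlate positively even though neither directly influences the other.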


2011 ◽  
Vol 37 (1) ◽  
Author(s):  
Seretse Moyo ◽  
Callie Theron

Orientation: The Fifteen Factor Questionnaire Plus (15FQ+) is a prominent personality questionnaire that organisations frequently use in personnel selection in South Africa.

Research purpose: The primary objective of this study was to undertake a factor analytic investigation of the first-order factor structure of the 15FQ+.

Motivation for the study: Evidence of the construct validity of the 15FQ+ as a measure of personality is necessary, though not sufficient, to justify its use in personnel selection.

Research design, approach and method: The researchers evaluated the fit of the measurement model implied by the structure and scoring key of the 15FQ+ in a quantitative study that used an ex post facto correlation design through structural equation modelling. They conducted a secondary data analysis, selecting a sample of 241 Black South African managers from a large 15FQ+ database.

Main findings: The researchers found good measurement model fit. However, the measurement model parameter estimates were worrying: their magnitude suggests that the items generally do not reflect the latent personality dimensions the designers intended them to measure with a great degree of precision, and are instead rather noisy measures of the latent variables they represent.

Practical/managerial implications: Organisations should use the 15FQ+ cautiously with Black South African managers until further local research evidence becomes available.

Contribution/value-add: The study should act as a catalyst for the additional research needed to establish convincingly the psychometric credentials of the 15FQ+ as a valuable assessment tool in South Africa.

