Reliable Inference in Highly Stratified Contingency Tables: Using Latent Class Models as Density Estimators

2011 ◽  
Vol 19 (2) ◽  
pp. 173-187 ◽  
Author(s):  
Drew A. Linzer

Contingency tables are among the most basic and useful techniques available for analyzing categorical data, but they produce highly imprecise estimates in small samples or for population subgroups that arise following repeated stratification. I demonstrate that preprocessing an observed set of categorical variables using a latent class model can greatly improve the quality of table-based inferences. As a density estimator, the latent class model closely approximates the underlying joint distribution of the variables of interest, which enables reliable estimation of conditional probabilities and marginal effects, even among subgroups containing fewer than 40 observations. Though here focused on applications to public opinion, the procedure has a wide range of potential uses. I illustrate the benefits of the latent class model-based approach for greatly improved accuracy in estimating and forecasting vote preferences within small demographic subgroups using survey data from the 2004 and 2008 U.S. presidential election campaigns.
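The density-estimation idea can be illustrated with a small sketch. It assumes binary items, a fixed number of classes, and plain EM without multiple restarts or model selection, so it is not Linzer's exact procedure. The helper names `fit_lcm`, `marginal` and `conditional` are hypothetical. Once the model is fitted, any subgroup probability is read off the smooth model-implied joint distribution instead of from sparse cell counts.

```python
import random

def fit_lcm(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary response vectors with plain EM.

    data: list of equal-length tuples of 0/1 values, one per respondent.
    Returns (weights, probs): weights[c] = P(class c) and
    probs[c][j] = P(Y_j = 1 | class c).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting values so the classes are not identical.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior class-membership probabilities for each respondent.
        post = []
        for y in data:
            joint = []
            for c in range(n_classes):
                p = weights[c]
                for j, yj in enumerate(y):
                    p *= probs[c][j] if yj else 1.0 - probs[c][j]
                joint.append(p)
            total = sum(joint)
            post.append([p / total for p in joint])
        # M-step: re-estimate class weights and item-response probabilities.
        for c in range(n_classes):
            nc = max(sum(r[c] for r in post), 1e-12)
            weights[c] = nc / len(data)
            for j in range(n_items):
                num = sum(r[c] * y[j] for r, y in zip(post, data))
                # Keep probabilities away from 0 and 1 for numerical stability.
                probs[c][j] = min(max(num / nc, 1e-6), 1.0 - 1e-6)
    return weights, probs


def marginal(assign, weights, probs):
    """Model-implied P(Y_j = v for every (j, v) in assign).

    Items not listed in assign are summed out. Under local independence
    this just means leaving them out of the within-class product.
    """
    total = 0.0
    for w, pc in zip(weights, probs):
        p = w
        for j, v in assign.items():
            p *= pc[j] if v else 1.0 - pc[j]
        total += p
    return total


def conditional(target, given, weights, probs):
    """P(target | given) computed from the smoothed joint distribution."""
    both = dict(given)
    both.update(target)
    return marginal(both, weights, probs) / marginal(given, weights, probs)
```

For example, `conditional({0: 1}, {1: 1, 2: 0}, weights, probs)` estimates the share with Y0 = 1 among respondents with Y1 = 1 and Y2 = 0. This works even when that subgroup is tiny, because every subgroup estimate is built from the same small set of class-level parameters. A real analysis would also compare different numbers of classes and use several random starts.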

2020 ◽  
Vol 29 (11) ◽  
pp. 3381-3395
Author(s):  
Wonmo Koo ◽  
Heeyoung Kim

Latent class models have been widely used in longitudinal studies to uncover unobserved heterogeneity in a population and, at the same time, to characterize the latent classes through class allocation probabilities that depend on predictors. However, previous latent class models for longitudinal data are subject to uncertainty in the choice of the number of latent classes. In this study, we propose a Bayesian nonparametric latent class model for longitudinal data that allows the number of latent classes to be inferred from the data. The proposed model is an infinite mixture model with predictor-dependent class allocation probabilities, in which each individual's longitudinal trajectory is described by a class-specific linear mixed effects model. The model parameters are estimated using Markov chain Monte Carlo methods. The proposed model is validated on a simulated example and on a real-data example that characterizes latent classes of estradiol trajectories over the menopausal transition, using data from the Study of Women’s Health Across the Nation.
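The abstract does not say how the allocation probabilities depend on predictors. One common construction for infinite mixtures with covariate-dependent weights is logistic stick-breaking. The sketch below assumes that construction, a single scalar predictor `x`, and a finite truncation level. It only shows how such weights are formed: it is not necessarily the authors' specification, and it omits the mixed-effects trajectories and the MCMC sampler. The function names and parameters are hypothetical.

```python
import math

def stable_sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stick_breaking_weights(x, alphas, betas):
    """Truncated logistic stick-breaking class weights for predictor value x.

    Break k takes the fraction v_k(x) = sigmoid(alphas[k] + betas[k] * x)
    of the stick that remains. The final class receives whatever is left,
    so the len(alphas) + 1 weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for a, b in zip(alphas, betas):
        v = stable_sigmoid(a + b * x)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights
```

Raising the truncation level (more breaks) approximates the infinite mixture more closely. Classes that the data do not support end up with negligible weight, which is how such models let the effective number of classes be learned.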


Methodology ◽  
2005 ◽  
Vol 1 (3) ◽  
pp. 93-103 ◽  
Author(s):  
Martin Schrepp

This paper tries to establish a connection between knowledge structures and latent class models. We will show that knowledge structures can be interpreted as a special type of constrained latent class model. Latent class models offer a well-founded theoretical framework for investigating how well a given latent class model fits observed data. If we establish a connection between latent class models and knowledge structures, we can also use this framework in knowledge structure theory. We will show that the connection to latent class models offers a way to construct a knowledge structure from observed response patterns by exploratory data analysis. Other possible applications are the empirical comparison of hypothetical knowledge structures and the statistical test of a given knowledge structure.
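A standard way to write a knowledge structure as a constrained latent class model is the basic local independence model (BLIM) from knowledge space theory. In this model each knowledge state acts as a latent class. The probability of solving item q is fixed by whether q belongs to the state, up to a careless-error rate `beta[q]` and a lucky-guess rate `eta[q]`. Whether this matches the exact constraints Schrepp uses is an assumption. The sketch only computes the likelihood of one response pattern, and the function name is hypothetical.

```python
def pattern_likelihood(pattern, states, state_probs, beta, eta):
    """P(response pattern) under the basic local independence model (BLIM).

    pattern:     set of items answered correctly
    states:      list of frozensets, the knowledge states (latent classes)
    state_probs: probability of each knowledge state
    beta[q]:     P(wrong answer | q is in the state)   (careless error)
    eta[q]:      P(right answer | q is not in the state) (lucky guess)
    """
    total = 0.0
    for state, p_state in zip(states, state_probs):
        p = p_state
        for q in beta:
            # Constrained item-response probability for this latent class.
            p_correct = 1.0 - beta[q] if q in state else eta[q]
            p *= p_correct if q in pattern else 1.0 - p_correct
        total += p
    return total
```

Summing the log of this quantity over observed patterns gives the log-likelihood of a hypothesized knowledge structure. Comparing that value across candidate structures is one way the latent class framework supports the tests the abstract mentions.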


1987 ◽  
Vol 24 (3) ◽  
pp. 298-304
Author(s):  
Rajiv Grover

Only recently have latent class models been used effectively to analyze marketing data, though they have been popular for more than a decade in the social sciences. Most research reported in the literature does not include standard errors for the estimated latent class model parameters. The author argues that standard errors are useful when exploring for parsimonious models. He provides an approach to estimating standard errors for all parameters estimated by Goodman's iterative proportional fitting algorithm as implemented in MLLSA.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gregoire Preud’homme ◽  
Kevin Duarte ◽  
Kevin Dalleau ◽  
Claire Lacomblez ◽  
Emmanuel Bresso ◽  
...  

The choice of the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. data with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by the Adjusted Rand Index (ARI) on 1000 generated virtual populations of mixed variables, using 7 scenarios that varied population size, number of clusters, number of continuous and categorical variables, proportion of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). The clustering methods were then applied to data from the EPHESUS randomized clinical trial (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations showed that K-prototypes, Kamila and LCM models outperformed all other methods. Overall, methods that use dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI than model-based methods in all scenarios. When the clustering methods were applied to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
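The benchmark scores each clustering against the true simulated clusters with the Adjusted Rand Index. As a minimal sketch, the standard Hubert and Arabie ARI can be computed from the contingency table of the two labelings. The function name is hypothetical, and it assumes at least two observations. In practice a library routine such as scikit-learn's `adjusted_rand_score` or an R equivalent would normally be used.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie Adjusted Rand Index between two clusterings of n >= 2 items."""
    n = len(labels_a)
    # Pairs of items placed together in both clusterings (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately (row/column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only possible when both clusterings are identical trivial partitions.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The ARI is 1 for identical partitions, whatever the label names, and about 0 for chance agreement. That is why it suits comparing methods whose cluster labels are arbitrary.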


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Lian Lian ◽  
Shuo Zhang ◽  
Zhong Wang ◽  
Kai Liu ◽  
Lihuan Cao

As the parcel delivery service is booming in China, competition among express companies is intensifying. This paper employs a multinomial logit model (MNL) and a latent class model (LCM) to investigate customers' express service choice behavior, using data from a stated preference (SP) survey. The attributes and attribute levels that matter most to express customers are identified. Customers are also divided into two segments (a penny pincher segment and a high-end segment) characterized by their taste heterogeneity. The results indicate that the LCM performs statistically better than the MNL in our sample. Therefore, more attention should be paid to taste heterogeneity, especially in further academic and policy research on freight choice behavior.
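The difference between the two models can be sketched as follows. A multinomial logit gives one set of choice probabilities, while a latent class logit averages segment-specific logit probabilities, weighted by segment shares. The abstract does not give the utility specification, so the sketch assumes linear-in-parameters utilities. The two attributes (price, delivery days), their values, and the segment taste coefficients are all hypothetical.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k), computed stably."""
    m = max(utilities)
    exps = [math.exp(v - m) for v in utilities]
    s = sum(exps)
    return [e / s for e in exps]


def latent_class_logit_probs(attributes, class_betas, class_shares):
    """Mix segment-specific MNL choice probabilities by segment shares.

    attributes[j]:   attribute vector of alternative j
    class_betas[s]:  taste coefficients of segment s (linear utility)
    class_shares[s]: share of customers in segment s (sums to 1)
    """
    mixed = [0.0] * len(attributes)
    for beta, share in zip(class_betas, class_shares):
        utils = [sum(b * x for b, x in zip(beta, xj)) for xj in attributes]
        for j, p in enumerate(mnl_probs(utils)):
            mixed[j] += share * p
    return mixed


# Hypothetical services: (price, delivery days) for a cheap/slow and a pricey/fast option.
services = [(10.0, 3.0), (20.0, 1.0)]
penny_pincher = (-1.0, -0.1)  # very price-sensitive segment
high_end = (-0.05, -2.0)      # very time-sensitive segment
print(latent_class_logit_probs(services, [penny_pincher, high_end], [0.6, 0.4]))
```

With a single segment, the latent class model reduces to the plain MNL. The extra segments are what let it capture the taste heterogeneity the study reports.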


2017 ◽  
Vol 78 (6) ◽  
pp. 925-951 ◽  
Author(s):  
Unkyung No ◽  
Sehee Hong

The purpose of the present study is to compare the performance of mixture modeling approaches (i.e., the one-step approach, the three-step maximum-likelihood approach, the three-step BCH approach, and the LTB approach) under diverse sample size conditions. To carry out this research, two simulation studies were conducted with two different models: a latent class model with three predictor variables and a latent class model with one distal outcome variable. For the simulation, data were generated under conditions of different sample sizes (100, 200, 300, 500, 1,000), entropy (0.6, 0.7, 0.8, 0.9), and variance of the distal outcome (homoscedasticity, heteroscedasticity). The evaluation criteria were parameter estimate bias, standard error bias, mean squared error, and coverage. Results demonstrate that the three-step approaches produced more stable and more accurate estimates than the other approaches, even with a sample size as small as 100. This research differs from previous studies in that it compares the approaches across different models and under smaller sample sizes. Furthermore, the three-step approaches performed best even under unfavorable conditions, which indicates the advantage of these approaches.
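The entropy condition refers to how cleanly cases are classified into latent classes. A widely used normalized summary is the relative entropy E = 1 - Σᵢ Σₖ (-pᵢₖ ln pᵢₖ) / (N ln K), computed from posterior class probabilities pᵢₖ for N cases and K classes. The sketch assumes this is the statistic the simulation manipulates. The function name is hypothetical, and at least two classes are required.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy of a latent class solution.

    posteriors[i][k] is the posterior probability that case i belongs to class k.
    Returns 1.0 for perfectly certain classification and 0.0 when every case
    is spread uniformly over the K classes.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # the term 0 * ln 0 is taken as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))
```

Low values such as 0.6 mean many cases have ambiguous class membership. This is the setting in which stepwise approaches risk biased estimates, which makes their performance there informative.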

