Reliable Inference in Highly Stratified Contingency Tables: Using Latent Class Models as Density Estimators

Political Analysis ◽

10.1093/pan/mpr006 ◽

2011 ◽

Vol 19 (2) ◽

pp. 173-187 ◽

Cited By ~ 5

Author(s):

Drew A. Linzer

Keyword(s):

Latent Class ◽

Latent Class Model ◽

Contingency Tables ◽

Latent Class Models ◽

Small Samples ◽

Categorical Variables ◽

Density Estimator ◽

Class Model ◽

Wide Range ◽

Demographic Subgroups

Contingency tables are among the most basic and useful techniques available for analyzing categorical data, but they produce highly imprecise estimates in small samples or for population subgroups that arise following repeated stratification. I demonstrate that preprocessing an observed set of categorical variables using a latent class model can greatly improve the quality of table-based inferences. As a density estimator, the latent class model closely approximates the underlying joint distribution of the variables of interest, which enables reliable estimation of conditional probabilities and marginal effects, even among subgroups containing fewer than 40 observations. Though here focused on applications to public opinion, the procedure has a wide range of potential uses. I illustrate the benefits of the latent class model-based approach for greatly improved accuracy in estimating and forecasting vote preferences within small demographic subgroups using survey data from the 2004 and 2008 U.S. presidential election campaigns.
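The idea of using a latent class model as a density estimator can be sketched roughly as follows. This is an illustrative sketch only, not the author's code: the function names `fit_latent_class` and `joint_table`, the simulated data, the number of classes, and the iteration count are all assumptions made for the example. It fits the model by a basic EM algorithm, builds the model-implied joint table, and reads a conditional probability off that smoothed table instead of off the sparse raw counts.

```python
import numpy as np

def fit_latent_class(X, n_levels, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to integer-coded categorical data by EM.

    X: (n, J) integer array with X[i, j] in {0, ..., n_levels[j] - 1}.
    Returns class weights pi (K,) and a list of conditional tables,
    theta[j] with shape (K, n_levels[j]).
    """
    rng = np.random.default_rng(seed)
    n, J = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = [rng.dirichlet(np.ones(L), size=n_classes) for L in n_levels]
    eps = 1e-6  # small constant that keeps every probability above zero
    for _ in range(n_iter):
        # E-step: posterior class membership for each observation.
        log_post = np.tile(np.log(pi), (n, 1))
        for j in range(J):
            log_post += np.log(theta[j][:, X[:, j]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class weights and class-conditional tables.
        pi = (post.sum(axis=0) + eps) / (n + n_classes * eps)
        for j in range(J):
            onehot = np.eye(n_levels[j])[X[:, j]]   # (n, L_j)
            counts = post.T @ onehot + eps          # (K, L_j)
            theta[j] = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta

def joint_table(pi, theta):
    """Model-implied joint distribution over every cell of the table."""
    table = 0.0
    for k in range(len(pi)):
        cell = np.array(pi[k])
        for t in theta:
            cell = np.multiply.outer(cell, t[k])
        table = table + cell
    return table

# Simulated example (all numbers are made up): three binary variables
# generated from two latent classes.
rng = np.random.default_rng(1)
true_pi = np.array([0.6, 0.4])
true_theta = [np.array([[0.8, 0.2], [0.3, 0.7]])] * 3
z = rng.choice(2, size=500, p=true_pi)
X = np.column_stack(
    [(rng.random(500) < t[z, 1]).astype(int) for t in true_theta]
)

pi, theta = fit_latent_class(X, n_levels=[2, 2, 2], n_classes=2)
table = joint_table(pi, theta)

# Conditional probability P(third = 1 | first = 1, second = 1),
# computed from the smoothed table rather than from raw cell counts.
p_cond = table[1, 1, 1] / table[1, 1, :].sum()
print(round(float(p_cond), 3))
```

In practice the number of classes would be chosen by a model selection criterion, and EM would be restarted from several initial values to guard against local optima; both steps are omitted here for brevity.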

Bayesian nonparametric latent class model for longitudinal data

Statistical Methods in Medical Research ◽

10.1177/0962280220928384 ◽

2020 ◽

Vol 29 (11) ◽

pp. 3381-3395

Author(s):

Wonmo Koo ◽

Heeyoung Kim

Keyword(s):

Longitudinal Data ◽

Latent Class ◽

Latent Class Model ◽

Latent Class Models ◽

Model Parameters ◽

Latent Classes ◽

Bayesian Nonparametric ◽

Class Model ◽

Proposed Model ◽

Class Models

Latent class models have been widely used in longitudinal studies to uncover unobserved heterogeneity in a population and find the characteristics of the latent classes simultaneously using the class allocation probabilities dependent on predictors. However, previous latent class models for longitudinal data suffer from uncertainty in the choice of the number of latent classes. In this study, we propose a Bayesian nonparametric latent class model for longitudinal data, which allows the number of latent classes to be inferred from the data. The proposed model is an infinite mixture model with predictor-dependent class allocation probabilities; an individual longitudinal trajectory is described by the class-specific linear mixed effects model. The model parameters are estimated using Markov chain Monte Carlo methods. The proposed model is validated using a simulated example and a real-data example for characterizing latent classes of estradiol trajectories over the menopausal transition using data from the Study of Women’s Health Across the Nation.

About the Connection Between Knowledge Structures and Latent Class Models

Methodology ◽

10.1027/1614-2241.1.3.93 ◽

2005 ◽

Vol 1 (3) ◽

pp. 93-103 ◽

Cited By ~ 15

Author(s):

Martin Schrepp

Keyword(s):

Latent Class ◽

Latent Class Model ◽

Knowledge Structure ◽

Latent Class Models ◽

Knowledge Structures ◽

Structure Theory ◽

Response Patterns ◽

Class Model ◽

Exploratory Data ◽

Class Models

This paper tries to establish a connection between knowledge structures and latent class models. We will show that knowledge structures can be interpreted as a special type of constrained latent class model. Latent class models offer a well-founded theoretical framework to investigate the connection of a given latent class model to observed data. If we establish a connection between latent class models and knowledge structures, we can also use this framework in knowledge structure theory. We will show that the connection to latent class models offers us a possibility to construct a knowledge structure by exploratory data analysis from observed response patterns. Other possible applications are the empirical comparison of hypothetical knowledge structures and the statistical test of a given knowledge structure.

Estimation and Use of Standard Errors of Latent Class Model Parameters

Journal of Marketing Research ◽

10.1177/002224378702400306 ◽

1987 ◽

Vol 24 (3) ◽

pp. 298-304

Author(s):

Rajiv Grover

Keyword(s):

Latent Class ◽

Latent Class Model ◽

Latent Class Models ◽

Standard Errors ◽

Model Parameters ◽

Iterative Proportional Fitting ◽

Class Model ◽

Fitting Algorithm ◽

Marketing Data ◽

Parsimonious Models

Only recently have latent class models been used effectively to analyze marketing data, though they have been popular for more than a decade in the social sciences. Most research reported in the literature does not include the standard errors of the estimates of the latent class model parameters. The author argues for the usefulness of standard errors while exploring for parsimonious models. He provides an approach to estimating standard errors of all parameters as estimated by the iterative proportional fitting algorithm of Goodman implemented in MLLSA.

A Study of Korean Consumer's Choice Behavior of Brand Using Latent Class Model - Competitive Structure Analysis of Acid Beverage Market -

Productivity Review ◽

10.15843/kpapr.21.4.200712.149 ◽

2007 ◽

Vol 21 (4) ◽

pp. 149-170 ◽

Cited By ~ 1

Author(s):

양진호 ◽

Hwang Yun Seop ◽

Kim, Chul

Keyword(s):

Structure Analysis ◽

Choice Behavior ◽

Latent Class ◽

Latent Class Model ◽

Class Model ◽

Competitive Structure

Latent class model characterization of neighborhood socioeconomic status

Cancer Causes & Control ◽

10.1007/s10552-015-0711-4 ◽

2016 ◽

Vol 27 (3) ◽

pp. 445-452 ◽

Cited By ~ 8

Author(s):

Aimee Palumbo ◽

Yvonne Michael ◽

Terry Hyslop

Keyword(s):

Socioeconomic Status ◽

Latent Class ◽

Latent Class Model ◽

Class Model ◽

Neighborhood Socioeconomic Status

Bayesian latent class model estimates of diagnostic accuracy for three test methods designed to detect spring viremia of carp virus

Preventive Veterinary Medicine ◽

10.1016/j.prevetmed.2021.105338 ◽

2021 ◽

pp. 105338

Author(s):

Sharon C. Clouthier ◽

Carol McClure ◽

Tamara Schroeder ◽

Eric D. Anderson

Keyword(s):

Diagnostic Accuracy ◽

Latent Class ◽

Latent Class Model ◽

Test Methods ◽

Class Model

Head-to-head comparison of clustering methods for heterogeneous data: a simulation-driven benchmark

Scientific Reports ◽

10.1038/s41598-021-83340-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Gregoire Preud’homme ◽

Kevin Duarte ◽

Kevin Dalleau ◽

Claire Lacomblez ◽

Emmanuel Bresso ◽

...

Keyword(s):

Hierarchical Clustering ◽

Latent Class ◽

Latent Class Model ◽

Real Life ◽

Heterogeneous Data ◽

Mixed Data ◽

Categorical Variables ◽

Clustering Methods ◽

Model Based ◽

Partitioning Around Medoids

The choice of the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performances were assessed by Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables using 7 scenarios with varying population sizes, number of clusters, number of continuous and categorical variables, proportions of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). Clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations revealed the dominance of K-prototypes, Kamila and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI compared to model-based methods in all scenarios. When applying clustering methods to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
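For readers unfamiliar with the Adjusted Rand Index used as the performance measure above, the following is a minimal sketch of how it can be computed from two label vectors. The function names `comb2` and `adjusted_rand_index` are chosen for this illustration and are not taken from the benchmark itself, which was run with R tools. The index compares the number of agreeing pairs with the number expected by chance, so it equals 1 for identical partitions (up to relabeling) and is close to 0 for random ones.

```python
import numpy as np

def comb2(x):
    """Number of unordered pairs that can be drawn from x items."""
    x = np.asarray(x, dtype=float)
    return x * (x - 1) / 2.0

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same items."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    # Contingency table of cluster co-membership counts.
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)
    sum_cells = comb2(table).sum()
    sum_rows = comb2(table.sum(axis=1)).sum()
    sum_cols = comb2(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / comb2(len(a))
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))

# Relabeling the clusters does not change the index.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Splitting one true cluster in two lowers it.
split_cluster = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
print(same_partition, split_cluster)
```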

A Discussion of “Using Angler Characteristics and Attitudinal Data to Identify Environmental Preference Classes: A Latent-Class Model”

Environmental and Resource Economics ◽

10.1007/s10640-005-3793-8 ◽

2006 ◽

Vol 34 (1) ◽

pp. 117-124 ◽

Cited By ~ 29

Author(s):

Bill Provencher ◽

Rebecca Moore

Keyword(s):

Latent Class ◽

Latent Class Model ◽

Class Model ◽

Environmental Preference ◽

Attitudinal Data

Customers’ Mode Choice Behaviors of Express Service Based on Latent Class Analysis and Logit Model

Mathematical Problems in Engineering ◽

10.1155/2015/610673 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Lian Lian ◽

Shuo Zhang ◽

Zhong Wang ◽

Kai Liu ◽

Lihuan Cao

Keyword(s):

Logit Model ◽

Choice Behavior ◽

Latent Class ◽

Latent Class Model ◽

Policy Research ◽

Multinomial Logit Model ◽

Class Model ◽

Taste Heterogeneity ◽

Using Data ◽

Service Choice

As the parcel delivery service is booming in China, the competition among express companies intensifies. This paper employed a multinomial logit model (MNL) and a latent class model (LCM) to investigate customers’ express service choice behavior, using data from a stated preference (SP) survey. The attributes and attribute levels that matter most to express customers are identified. Meanwhile, the customers are divided into two segments (penny pincher segment and high-end segment) characterized by their taste heterogeneity. The results indicate that the LCM performs statistically better than MNL in our sample. Therefore, more attention should be paid to the taste heterogeneity, especially for further academic and policy research in freight choice behavior.
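The difference between the two models can be sketched as follows. This is only an illustration under stated assumptions: the attributes (price and delivery time), the coefficient values, and the segment shares are invented for the example and are not estimates from the paper. An MNL applies one coefficient vector to everyone, whereas a latent class logit averages class-specific MNL probabilities, weighted by the segment shares.

```python
import numpy as np

def mnl_probs(X, beta):
    """Multinomial logit choice probabilities for one choice set.

    X: (n_alternatives, n_attributes) attribute matrix.
    beta: (n_attributes,) taste coefficients.
    """
    v = X @ beta
    v = v - v.max()          # stabilise the exponentials
    e = np.exp(v)
    return e / e.sum()

def lcm_probs(X, betas, shares):
    """Latent class logit: share-weighted mixture of class-specific MNLs."""
    return sum(s * mnl_probs(X, b) for s, b in zip(shares, betas))

# Hypothetical choice set: columns are price and delivery time in days.
X = np.array([
    [10.0, 3.0],   # economy service
    [18.0, 2.0],   # standard service
    [30.0, 1.0],   # premium service
])

# Hypothetical coefficients for the two segments named in the abstract.
beta_penny_pincher = np.array([-0.30, -0.20])   # very price sensitive
beta_high_end = np.array([-0.05, -1.50])        # very time sensitive
shares = [0.6, 0.4]                             # assumed segment sizes

p_penny = mnl_probs(X, beta_penny_pincher)
p_high = mnl_probs(X, beta_high_end)
p_mix = lcm_probs(X, [beta_penny_pincher, beta_high_end], shares)
print(p_penny.round(3), p_high.round(3), p_mix.round(3))
```

With these made-up coefficients the price-sensitive segment favours the cheapest option and the time-sensitive segment favours the fastest one, which is the kind of taste heterogeneity a single pooled MNL cannot represent.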

A Comparison of Mixture Modeling Approaches in Latent Class Models With External Variables Under Small Samples

Educational and Psychological Measurement ◽

10.1177/0013164417726828 ◽

2017 ◽

Vol 78 (6) ◽

pp. 925-951 ◽

Cited By ~ 3

Author(s):

Unkyung No ◽

Sehee Hong

Keyword(s):

Sample Size ◽

Latent Class ◽

Latent Class Model ◽

Mixture Modeling ◽

Small Sample ◽

Outcome Variable ◽

Parameter Estimates ◽

Class Model ◽

Modeling Approaches ◽

Distal Outcome

The purpose of the present study is to compare performances of mixture modeling approaches (i.e., one-step approach, three-step maximum-likelihood approach, three-step BCH approach, and LTB approach) based on diverse sample size conditions. To carry out this research, two simulation studies were conducted with two different models, a latent class model with three predictor variables and a latent class model with one distal outcome variable. For the simulation, data were generated under the conditions of different sample sizes (100, 200, 300, 500, 1,000), entropy (0.6, 0.7, 0.8, 0.9), and the variance of a distal outcome (homoscedasticity, heteroscedasticity). For evaluation criteria, parameter estimates bias, standard error bias, mean squared error, and coverage were used. Results demonstrate that the three-step approaches produced more stable and better estimations than the other approaches even with a small sample size of 100. This research differs from previous studies in the sense that various models were used to compare the approaches and smaller sample size conditions were used. Furthermore, the results supporting the superiority of the three-step approaches even in poorly manipulated conditions indicate the advantage of these approaches.
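The entropy conditions listed above (0.6 to 0.9) are presumably the relative entropy statistic commonly reported for mixture models, which summarises how cleanly observations are assigned to classes; that reading is an assumption, since the abstract does not define the term. Under that assumption, a minimal sketch of the statistic computed from a matrix of posterior class probabilities looks like this (the function name `relative_entropy` is chosen for the example):

```python
import numpy as np

def relative_entropy(post):
    """Relative entropy of a classification, between 0 and 1.

    post: (n, K) matrix of posterior class membership probabilities,
    each row summing to 1. Values near 1 mean well-separated classes.
    """
    post = np.asarray(post, dtype=float)
    n, k = post.shape
    # Replace zeros by 1 inside the log so that 0 * log(0) counts as 0.
    safe = np.where(post > 0, post, 1.0)
    h = -(post * np.log(safe)).sum()
    return float(1.0 - h / (n * np.log(k)))

# Perfectly separated classes give 1; uninformative posteriors give 0.
crisp = relative_entropy([[1.0, 0.0], [0.0, 1.0]])
flat = relative_entropy([[0.5, 0.5], [0.5, 0.5]])
print(crisp, flat)
```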
