A likelihood-free estimator of population structure bridging admixture models and principal components analysis

2017
Author(s): Irineo Cabreros, John D. Storey

Abstract: We introduce a simple and computationally efficient method for fitting the admixture model of genetic population structure, called ALStructure. The strategy of ALStructure is to first estimate the low-dimensional linear subspace of the population admixture components and then search for a model within this subspace that is consistent with the admixture model’s natural probabilistic constraints. Central to this strategy is the observation that all models belonging to this constrained space of solutions are risk-minimizing and have equal likelihood, rendering any additional optimization unnecessary. The low-dimensional linear subspace is estimated through a recently introduced principal components analysis method that is appropriate for genotype data, thereby providing a solution that has both principal components and probabilistic admixture interpretations. Our approach differs fundamentally from other existing methods for estimating admixture, which aim to fit the admixture model directly by searching for parameters that maximize the likelihood function or the posterior probability. We observe that ALStructure typically outperforms existing methods both in accuracy and computational speed under a wide array of simulated and real human genotype datasets. Throughout this work we emphasize that the admixture model is a special case of a much broader class of models for which algorithms similar to ALStructure may be successfully employed.
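To make "the admixture model's natural probabilistic constraints" concrete, the sketch below assumes the standard formulation in which individual-specific allele frequencies are F = PQ, with P (SNPs × ancestral populations) holding allele frequencies in [0, 1] and each column of Q (ancestral populations × individuals) holding admixture proportions that are nonnegative and sum to one; genotypes are then modeled as Binomial(2, f_ij). It shows one common way to enforce the simplex constraint on Q, Euclidean projection onto the probability simplex. The function names and the small matrices are hypothetical, and this is an illustration of the constraint, not ALStructure's actual update.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_k >= 0, sum_k w_k = 1}
    using the standard sort-and-threshold algorithm."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:  # the largest j satisfying this fixes the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def project_columns(Q):
    """Project every column of Q (one column per individual) onto the simplex."""
    d, n = len(Q), len(Q[0])
    cols = [project_to_simplex([Q[k][j] for k in range(d)]) for j in range(n)]
    return [[cols[j][k] for j in range(n)] for k in range(d)]


def allele_frequencies(P, Q):
    """Individual-specific allele frequencies F = P Q."""
    d, n = len(Q), len(Q[0])
    return [[sum(row[k] * Q[k][j] for k in range(d)) for j in range(n)] for row in P]


# Hypothetical example: 3 SNPs, 2 ancestral populations, 2 individuals.
P = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]]
raw_q = [[0.5, 1.2], [0.8, -0.1]]  # unconstrained estimates violating the simplex

Q = project_columns(raw_q)  # columns now nonnegative and summing to one
F = allele_frequencies(P, Q)  # every entry lies in [0, 1]
print(Q)
print(F)
```

Because P is in [0, 1] and each column of Q is a convex weight vector, every entry of F is a convex combination of valid allele frequencies and therefore itself lies in [0, 1].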

Methodology
2013
Vol 9 (1), pp. 23-29
Author(s): Gilles Raîche, Theodore A. Walls, David Magis, Martin Riopel, Jean-Guy Blais

Most of the strategies proposed for determining the number of components that account for the most variation in a principal components analysis of a correlation matrix rely on analysis of the eigenvalues and on numerical solutions. Cattell’s scree test is a graphical strategy with a nonnumerical solution for determining the number of components to retain. Like Kaiser’s rule, it is one of the most frequently used strategies for this purpose. However, because the scree test is graphical, it does not definitively establish the number of components to retain. To circumvent this issue, two numerical solutions are proposed: one in the spirit of Cattell’s work that deals with the scree part of the eigenvalue plot, and one that focuses on the elbow part of the plot. A simulation study compares the efficiency of these solutions with that of other previously proposed methods. Extensions to factor analysis are possible and may be particularly useful with many low-dimensional components.
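Both Kaiser's rule and an elbow-based reading of the scree plot can be stated numerically. The sketch below assumes the eigenvalues of a correlation matrix have already been computed. `kaiser_count` retains eigenvalues greater than 1, and `acceleration_elbow_count` (a hypothetical name) takes the elbow to be the point where the second difference of the sorted eigenvalues is largest and retains the components before it. The second rule only illustrates an elbow-focused criterion; the authors' exact definitions may differ, and the eigenvalues are made up.

```python
def kaiser_count(eigenvalues):
    """Kaiser's rule: retain components whose eigenvalue exceeds 1
    (meaningful for eigenvalues of a correlation matrix)."""
    return sum(1 for lam in eigenvalues if lam > 1.0)


def acceleration_elbow_count(eigenvalues):
    """Elbow heuristic (assumption): find the component where the scree curve
    bends most sharply, i.e. the largest second difference, and retain the
    components that come before it."""
    lam = sorted(eigenvalues, reverse=True)
    if len(lam) < 3:
        raise ValueError("need at least three eigenvalues")
    # Second difference at each interior position i (0-based).
    accel = {i: lam[i + 1] - 2 * lam[i] + lam[i - 1] for i in range(1, len(lam) - 1)}
    elbow = max(accel, key=accel.get)  # 0-based index of the elbow
    return elbow  # number of components strictly before the elbow


# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to 6).
eigs = [2.9, 1.2, 0.7, 0.5, 0.4, 0.3]
print(kaiser_count(eigs), acceleration_elbow_count(eigs))
```

On this hypothetical spectrum the two rules disagree (two components versus one), which is the kind of ambiguity a simulation study comparing retention rules is designed to quantify.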


1980
Vol 19 (04), pp. 205-209
Author(s): L. A. Abbott, J. B. Mitton

Data taken from the blood of 262 patients diagnosed for malabsorption, elective cholecystectomy, acute cholecystitis, infectious hepatitis, liver cirrhosis, or chronic renal disease were analyzed with three numerical taxonomy (NT) methods: cluster analysis, principal components analysis, and discriminant function analysis. Principal components analysis revealed discrete clusters of patients with chronic renal disease, liver cirrhosis, and infectious hepatitis, which could be displayed both by NT clustering and by plotting, but the other disease groups were poorly defined. Discriminant function analysis resolved the same disease groups more sharply.

