Probabilistic Modeling for Frequency Vectors Using a Flexible Shifted-Scaled Dirichlet Distribution Prior

2020 ◽  
Vol 14 (6) ◽  
pp. 1-35
Author(s):  
Nuha Zamzami ◽  
Nizar Bouguila
2021 ◽  
Author(s):  
Maike L Morrison ◽  
Nicolas Alcala ◽  
Noah A Rosenberg

In model-based inference of population structure from individual-level genetic data, individuals are assigned membership coefficients in a series of statistical clusters generated by clustering algorithms. Distinct patterns of variability in membership coefficients can be produced for different groups of individuals, for example, representing different predefined populations, sampling sites, or time periods. Such variability can be difficult to capture in a single numerical value; membership coefficient vectors are multivariate and potentially incommensurable across groups, as the number of clusters over which individuals are distributed can vary among groups of interest. Further, two groups might share few clusters in common, so that membership coefficient vectors are concentrated on different clusters. We introduce a method for measuring the variability of membership coefficients of individuals in a predefined group, making use of an analogy between variability across individuals in membership coefficient vectors and variation across populations in allele frequency vectors. We show that in a model in which membership coefficient vectors in a population follow a Dirichlet distribution, the measure increases linearly with a parameter describing the variance of a specified component of the membership vector. We apply the approach, which makes use of a normalized Fst statistic, to data on inferred population structure in three example scenarios. We also introduce a bootstrap test for equivalence of two or more groups in their level of membership coefficient variability. Our methods are implemented in the R package FSTruct.
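A minimal sketch of the allele-frequency analogy described in this abstract, assuming a plain multiallelic Fst in which individuals play the role of populations and clusters play the role of alleles. The function names `fst_membership` and `sample_dirichlet` are hypothetical and are not the FSTruct API, and the normalization by the maximum attainable Fst that the abstract mentions is omitted. The Dirichlet parametrization (mean vector `lam` scaled by a concentration `alpha`) is also an assumption made for illustration.

```python
import random

def fst_membership(q_matrix):
    """Unnormalized Fst-style variability of membership vectors (rows sum to 1)."""
    n = len(q_matrix)
    k = len(q_matrix[0])
    # Mean membership in each cluster across individuals.
    mean_q = [sum(row[j] for row in q_matrix) / n for j in range(k)]
    # Average sum of squared coefficients within individuals.
    mean_sumsq = sum(sum(x * x for x in row) for row in q_matrix) / n
    # Sum of squares of the mean membership vector.
    sumsq_mean = sum(m * m for m in mean_q)
    if sumsq_mean >= 1.0:
        raise ValueError("undefined when all individuals lie in a single cluster")
    return (mean_sumsq - sumsq_mean) / (1.0 - sumsq_mean)

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet vector by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

# Assumed illustration: mean vector lam, concentration alpha.
rng = random.Random(1)
lam = [0.5, 0.3, 0.2]
alpha = 3.0
q = [sample_dirichlet([alpha * l for l in lam], rng) for _ in range(5000)]
fst = fst_membership(q)
print(fst)  # should be close to 1 / (alpha + 1) = 0.25 under this parametrization
```

Under this assumed parametrization the variance of component k is lam_k (1 - lam_k) / (alpha + 1), so the unnormalized statistic tracks 1 / (alpha + 1), which is linear in that variance for a fixed mean vector.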


Crop Science ◽  
1992 ◽  
Vol 32 (3) ◽  
pp. 704-712 ◽  
Author(s):  
Scott M. Lesch ◽  
Catherine M. Grieve ◽  
Eugene V. Maas ◽  
Leland E. Francois

2021 ◽  
Vol 58 (2) ◽  
pp. 314-334
Author(s):  
Man-Wai Ho ◽  
Lancelot F. James ◽  
John W. Lau

Abstract Pitman (2003), and subsequently Gnedin and Pitman (2006), showed that a large class of random partitions of the integers derived from a stable subordinator of index $\alpha\in(0,1)$ have infinite Gibbs (product) structure as a characterizing feature. The most notable cases are random partitions derived from the two-parameter Poisson–Dirichlet distribution, $\textrm{PD}(\alpha,\theta)$, whose corresponding $\alpha$-diversity/local time have generalized Mittag–Leffler distributions, denoted by $\textrm{ML}(\alpha,\theta)$. Our aim in this work is to provide indications on the utility of the wider class of Gibbs partitions as it relates to a study of Riemann–Liouville fractional integrals and size-biased sampling, and in decompositions of special functions, and its potential use in the understanding of various constructions of more exotic processes. We provide characterizations of general laws associated with nested families of $\textrm{PD}(\alpha,\theta)$ mass partitions that are constructed from fragmentation operations described in Dong et al. (2014). These operations are known to be related in distribution to various constructions of discrete random trees/graphs in [n], and their scaling limits. A centerpiece of our work is results related to Mittag–Leffler functions, which play a key role in fractional calculus and are otherwise Laplace transforms of the $\textrm{ML}(\alpha,\theta)$ variables. Notably, this leads to an interpretation within the context of $\textrm{PD}(\alpha,\theta)$ laws conditioned on Poisson point process counts over intervals of scaled lengths of the $\alpha$-diversity.


2020 ◽  
Vol 57 (4) ◽  
pp. 1029-1044
Author(s):  
Svante Janson

Abstract Consider a Pólya urn with balls of several colours, where balls are drawn sequentially and each drawn ball is immediately replaced together with a fixed number of balls of the same colour. It is well known that the proportions of balls of the different colours converge in distribution to a Dirichlet distribution. We show that the rate of convergence is $\Theta(1/n)$ in the minimal $L_p$ metric for any $p\in[1,\infty]$, extending a result by Goldstein and Reinert; we further show the same rate for the Lévy distance, while the rate for the Kolmogorov distance depends on the parameters, i.e. on the initial composition of the urn. The method used here differs from the one used by Goldstein and Reinert, and uses direct calculations based on the known exact distributions.
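The urn scheme in this abstract can be sketched as a short simulation. This is an illustration only, not the paper's method (which works with exact distributions): it assumes an urn that starts with `initial[i]` balls of colour i and adds `added` extra balls of the drawn colour at each step, in which case the limiting law of the proportions is Dirichlet with parameters `initial[i] / added`. The function names and the chosen urn composition are hypothetical.

```python
import random

def polya_urn(initial, added, draws, rng):
    """Simulate a Pólya urn and return the final colour proportions."""
    counts = list(initial)
    for _ in range(draws):
        # Draw a colour with probability proportional to its current count.
        colour = rng.choices(range(len(counts)), weights=counts)[0]
        # Return the ball together with `added` more of the same colour.
        counts[colour] += added
    total = sum(counts)
    return [c / total for c in counts]

def dirichlet_mean_var(alphas):
    """Component-wise mean and variance of a Dirichlet distribution."""
    a0 = sum(alphas)
    means = [a / a0 for a in alphas]
    variances = [a * (a0 - a) / (a0 ** 2 * (a0 + 1.0)) for a in alphas]
    return means, variances

rng = random.Random(0)
initial, added = [2, 1, 1], 1
runs = [polya_urn(initial, added, 200, rng) for _ in range(1000)]

# Empirical mean and variance of the first colour's proportion.
first = [r[0] for r in runs]
emp_mean = sum(first) / len(first)
emp_var = sum((x - emp_mean) ** 2 for x in first) / len(first)

# Moments of the limiting Dirichlet(2, 1, 1) for comparison.
lim_means, lim_vars = dirichlet_mean_var([a / added for a in initial])
print(emp_mean, lim_means[0])
print(emp_var, lim_vars[0])
```

After 200 draws the empirical moments should already sit close to the limiting Dirichlet values, which is consistent with, though of course no proof of, a fast rate of convergence.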


Genetics ◽  
2000 ◽  
Vol 155 (4) ◽  
pp. 1973-1980
Author(s):  
Jinko Graham ◽  
James Curran ◽  
B S Weir

Abstract Modern forensic DNA profiles are constructed using microsatellites, short tandem repeats of 2–5 bases. In the absence of genetic data on a crime-specific subpopulation, one tool for evaluating profile evidence is the match probability. The match probability is the conditional probability that a random person would have the profile of interest given that the suspect has it and that these people are different members of the same subpopulation. One issue in evaluating the match probability is population differentiation, which can induce coancestry among subpopulation members. Forensic assessments that ignore coancestry typically overstate the strength of evidence against the suspect. Theory has been developed to account for coancestry; assumptions include a steady-state population and a mutation model in which the allelic state after a mutation event is independent of the prior state. Under these assumptions, the joint allelic probabilities within a subpopulation may be approximated by the moments of a Dirichlet distribution. We investigate the adequacy of this approximation for profiled loci that mutate according to a generalized stepwise model. Simulations suggest that the Dirichlet theory can still overstate the evidence against a suspect with a common microsatellite genotype. However, Dirichlet-based estimators were less biased than the product-rule estimator, which ignores coancestry.
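For readers unfamiliar with the Dirichlet-based match probability, the following sketch uses the standard Balding-Nichols sampling rule, which is brought in here as background and is not quoted from the abstract. Under that rule, if x of the n alleles already observed in the subpopulation are of type A, the next allele is A with probability [x θ + (1 − θ) p] / [1 + (n − 1) θ], where p is the population allele frequency and θ is the coancestry coefficient. The function names and the numerical values are hypothetical.

```python
def next_allele_prob(x, n, p, theta):
    """P(next sampled allele is A | x of the n alleles already seen are A)."""
    return (x * theta + (1.0 - theta) * p) / (1.0 + (n - 1) * theta)

def match_probability(theta, p_i, p_j=None):
    """Single-locus match probability given that the suspect has the genotype.

    Pass only p_i for a homozygote A_i A_i, or p_i and p_j for a
    heterozygote A_i A_j.
    """
    if p_j is None:
        # Suspect carries two copies of A_i; a second person needs two more.
        return (next_allele_prob(2, 2, p_i, theta)
                * next_allele_prob(3, 3, p_i, theta))
    # Suspect carries one A_i and one A_j; the factor 2 counts both orderings.
    return (2.0 * next_allele_prob(1, 2, p_i, theta)
            * next_allele_prob(1, 3, p_j, theta))

# With theta = 0 the formula reduces to the product rule (p^2 or 2 p_i p_j).
product_rule = match_probability(0.0, 0.1)
# An assumed coancestry value of 0.03 raises the match probability.
with_coancestry = match_probability(0.03, 0.1)
print(product_rule, with_coancestry)
```

Because the match probability is larger once θ is positive, the product rule, which sets θ to zero, makes a match look rarer than it is and so overstates the evidence, in line with the abstract's remark about ignoring coancestry.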

