The Gibbs and split–merge sampler for population mixture analysis from genetic data with incomplete baselines

Jerome Pella; Michele Masuda

doi:10.1139/f05-224

The Gibbs and split–merge sampler for population mixture analysis from genetic data with incomplete baselines

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/f05-224 ◽

2006 ◽

Vol 63 (3) ◽

pp. 576-596 ◽

Cited By ~ 65

Author(s):

Jerome Pella ◽

Michele Masuda

Keyword(s):

Monte Carlo ◽

Binary Tree ◽

Quantitative Measure ◽

Real Data ◽

Genetic Data ◽

Superior Performance ◽

Data Sets ◽

Linkage Equilibrium ◽

Equilibrium Conditions ◽

Mixture Sample

Although population mixtures often include contributions from novel populations as well as from baseline populations previously sampled, unlabeled mixture individuals can be separated to their sources from genetic data. A Gibbs and split–merge Markov chain Monte Carlo sampler is described for successively partitioning a genetic mixture sample into plausible subsets of individuals from each of the baseline and extra-baseline populations present. The subsets are selected to satisfy the Hardy–Weinberg and linkage equilibrium conditions expected for large, panmictic populations. The number of populations present can be inferred from the distribution for counts of subsets per partition drawn by the sampler. To further summarize the sampler's output, co-assignment probabilities of mixture individuals to the same subsets are computed from the partitions and are used to construct a binary tree of their relatedness. The tree graphically displays the clusters of mixture individuals together with a quantitative measure of the evidence supporting their various separate and common sources. The methodology is applied to several simulated and real data sets to illustrate its use and demonstrate the sampler's superior performance.
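
The two summaries named in this abstract (the distribution of subset counts per partition and the pairwise co-assignment probabilities) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the sampler's output is available as a list of partitions, each a list of arbitrary subset labels with one label per mixture individual. The names `coassignment_probabilities`, `subset_count_distribution`, and `draws` are hypothetical.

```python
from itertools import combinations


def coassignment_probabilities(partitions):
    """Fraction of sampled partitions in which each pair of individuals shares a subset."""
    n_draws = len(partitions)
    n_individuals = len(partitions[0])
    probs = {}
    for i, j in combinations(range(n_individuals), 2):
        together = sum(1 for labels in partitions if labels[i] == labels[j])
        probs[(i, j)] = together / n_draws
    return probs


def subset_count_distribution(partitions):
    """Relative frequency of the number of distinct subsets per sampled partition."""
    counts = {}
    for labels in partitions:
        k = len(set(labels))
        counts[k] = counts.get(k, 0) + 1
    return {k: c / len(partitions) for k, c in sorted(counts.items())}


# Hypothetical sampler output: 4 draws over 5 mixture individuals.
# Labels are arbitrary within each draw; only "same label or not" matters.
draws = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [2, 2, 0, 1, 1],
]

probs = coassignment_probabilities(draws)
print(probs[(0, 1)], probs[(2, 3)], probs[(0, 2)])
print(subset_count_distribution(draws))
```

The pairwise probabilities could then be converted to distances (for example, one minus the probability) and passed to a hierarchical clustering routine to draw a tree. The specific tree construction used in the paper is not reproduced here.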

A New Extension of Thinning-Based Integer-Valued Autoregressive Models for Count Data

Entropy ◽

10.3390/e23010062 ◽

2020 ◽

Vol 23 (1) ◽

pp. 62

Author(s):

Zhengwei Liu ◽

Fukang Zhu

Keyword(s):

Likelihood Estimation ◽

Real Data ◽

Autoregressive Models ◽

Superior Performance ◽

Data Sets ◽

Binomial Thinning ◽

Free Case ◽

Two Parameters ◽

Conditional Maximum ◽

Thinning Operator

The thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is the binomial thinning. Inspired by the theory about extended Pascal triangles, a new thinning operator named extended binomial is introduced, which is a general case of the binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersed features of counting time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case and the conditional maximum likelihood estimation is also discussed. We have also obtained the asymptotic property of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate a superior performance of the proposed model.
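
For orientation, the standard binomial thinning operator that this abstract generalizes can be sketched as follows. This is a minimal illustration under stated assumptions: it implements only the classical one-parameter binomial thinning, not the extended binomial operator proposed in the paper, and it assumes Poisson innovations for the first-order model. The function names and parameter values are hypothetical.

```python
import math
import random


def binomial_thinning(alpha, x, rng):
    """alpha ∘ x: keep each of x units independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson_draw(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + e_t with e_t ~ Poisson(lam) (assumed innovations)."""
    rng = random.Random(seed)
    # Start from a draw centred on the stationary mean lam / (1 - alpha).
    x = poisson_draw(lam / (1.0 - alpha), rng)
    series = []
    for _ in range(n):
        x = binomial_thinning(alpha, x, rng) + poisson_draw(lam, rng)
        series.append(x)
    return series


series = simulate_inar1(alpha=0.5, lam=2.0, n=200, seed=42)
print(sum(series) / len(series))  # sample mean of the simulated counts
```

Because thinning can only remove units, `alpha ∘ x` always lies between 0 and `x`, which keeps the simulated series integer-valued and non-negative.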

Markov Chain Monte Carlo Methods for State-Space Models with Point Process Observations

Neural Computation ◽

10.1162/neco_a_00281 ◽

2012 ◽

Vol 24 (6) ◽

pp. 1462-1486 ◽

Cited By ~ 10

Author(s):

Ke Yuan ◽

Mark Girolami ◽

Mahesan Niranjan

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

State Space ◽

Point Process ◽

Large Data ◽

State Space Models ◽

Superior Performance ◽

Mcmc Methods ◽

Data Sets

This letter considers how a number of modern Markov chain Monte Carlo (MCMC) methods can be applied for parameter estimation and inference in state-space models with point process observations. We quantified the efficiencies of these MCMC methods on synthetic data, and our results suggest that the Riemannian manifold Hamiltonian Monte Carlo method offers the best performance. We further compared this method with a previously tested variational Bayes method on two experimental data sets. Results indicate similar performance on the large data sets and superior performance on small ones. The work offers an extensive suite of MCMC algorithms evaluated on an important class of models for physiological signal analysis.

No-U-Turn sampling for phylogenetic trees

10.1101/2021.03.16.435623 ◽

2021 ◽

Author(s):

Johannes Wahle

Keyword(s):

Monte Carlo ◽

Phylogenetic Trees ◽

Sequence Data ◽

Real Data ◽

Correlated Data ◽

Data Sets ◽

Acceptance Probability ◽

New States ◽

Gradient Based ◽

Efficient Exploration

The inference of phylogenetic trees from sequence data has become a staple in evolutionary research. Bayesian inference of such trees is predominantly based on the Metropolis-Hastings algorithm. For high dimensional and correlated data this algorithm is known to be inefficient. There are gradient based algorithms to speed up such inference. Building on recent research which uses gradient based approaches for the inference of phylogenetic trees in a Bayesian framework, I present an algorithm which is capable of performing No-U-Turn sampling for phylogenetic trees. As an extension to Hamiltonian Monte Carlo methods, No-U-Turn sampling comes with the same benefits, such as proposing distant new states with a high acceptance probability, but eliminates the need to manually tune hyper parameters. Evaluated on real data sets, the new sampler shows that it converges faster to the target distribution. The results also indicate that a higher number of topologies are traversed during sampling by the new algorithm in comparison to traditional Markov Chain Monte Carlo approaches. This new algorithm leads to a more efficient exploration of the posterior distribution of phylogenetic tree topologies.

A multiresolution framework to characterize single-cell state landscapes

Nature Communications ◽

10.1038/s41467-020-18416-6 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 1

Author(s):

Shahin Mohammadi ◽

Jose Davila-Velderrain ◽

Manolis Kellis

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Real Data ◽

Cell Types ◽

Cellular Heterogeneity ◽

Superior Performance ◽

Data Sets ◽

Structural Representation ◽

Archetypal Analysis ◽

Cell State

Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet’s superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating against the transcriptomic cell types and subtypes of the human prefrontal cortex.

Bayesian Inference For The Segmented Weibull Distribution

Revista Colombiana de Estadística ◽

10.15446/rce.v42n2.76815 ◽

2019 ◽

Vol 42 (2) ◽

pp. 225-243

Author(s):

Emilio A. Coelho-Barros ◽

Jorge A. Achcar ◽

Edson Z. Martinez ◽

Nasser Davarzani ◽

Heike I. Grabsch

Keyword(s):

Monte Carlo ◽

Survival Data ◽

Real Data ◽

Good Alternative ◽

Change Points ◽

Data Sets ◽

Survival Times ◽

Clinical Value ◽

Proposed Model ◽

Censored Observations

In this paper, we introduce a Bayesian approach for segmented Weibull distributions, which could be a good alternative for analyzing medical survival data in the presence of censored observations and covariates. With the Bayesian estimates of the change-points, we could get an excellent fit of the proposed model to any data sets. With the proposed methodology, it is also possible to identify survival time intervals where a covariate could have significantly different effects compared with other lifetime intervals, an important point from a clinical perspective. The Bayesian estimates are obtained using standard Markov chain Monte Carlo methods. Some examples with real data sets illustrate the proposed methodology and its potential clinical value.

Jackknife model averaging prediction methods for complex phenotypes with gene expression levels by integrating external pathway information

10.1101/447706 ◽

2018 ◽

Author(s):

Xinghao Yu ◽

Lishun Xiao ◽

Ping Zeng ◽

Shuiping Huang

Keyword(s):

Predictive Accuracy ◽

Disease Risk ◽

Model Fitting ◽

Model Averaging ◽

Real Data ◽

Genetic Data ◽

Model Specification ◽

High Dimensional ◽

Data Sets ◽

Pathway Information

Motivation: In the past few years many novel prediction approaches have been proposed and widely employed in high dimensional genetic data for disease risk evaluation. However, those approaches typically ignore in model fitting the important group structures or functional classifications that naturally exist in genetic data. Methods: In the present study, we applied a novel model averaging approach, called Jackknife Model Averaging Prediction (JMAP), for high dimensional genetic risk prediction while incorporating KEGG pathway information into the model specification. JMAP selects the optimal weights across candidate models by minimizing a cross-validation criterion in a jackknife way. Compared with previous approaches, one of the primary features of JMAP is to allow model weights to vary from 0 to 1 but without the limitation that the summation of weights is equal to one. We evaluated the performance of JMAP using extensive simulation studies and compared it with existing methods. We finally applied JMAP to five real cancer datasets that are publicly available from TCGA. Results: The simulations showed that, compared with other existing approaches, JMAP performed best or was among the best methods across a range of scenarios. For example, among 14 out of 16 simulation settings with PVE=0.3, JMAP has an average of 0.075 higher prediction accuracy compared with gsslasso. We further found that in the simulation the model weights for the true candidate models have much smaller chances to be zero compared with those for the null candidate models and are substantially greater in magnitude. In the real data application, JMAP also behaves comparably or better compared with the other methods for both continuous and binary phenotypes. For example, for the COAD, CRC and PAAD data sets, the average gains of predictive accuracy of JMAP are 0.019, 0.064 and 0.052 compared with gsslasso. Conclusion: The proposed method JMAP is a novel method that can provide more accurate phenotypic prediction while incorporating external useful group information.

Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms

Sensors ◽

10.3390/s19235335 ◽

2019 ◽

Vol 19 (23) ◽

pp. 5335 ◽

Cited By ~ 1

Author(s):

Wei Fang ◽

Dongxu Wei ◽

Ran Zhang

Keyword(s):

Error Bounds ◽

Rapid Development ◽

Principal Component ◽

Real Data ◽

Superior Performance ◽

Sensor Technology ◽

Data Sets ◽

Tensor Factorization ◽

Principal Component Pursuit ◽

Tensor Data

The rapid development of sensor technology gives rise to huge amounts of tensor (i.e., multi-dimensional array) data. For various reasons such as sensor failures and communication loss, the tensor data may be corrupted not only by small noise but also by gross corruptions. This paper studies the Stable Tensor Principal Component Pursuit (STPCP), which aims to recover a tensor from its corrupted observations. Specifically, we propose an STPCP model based on the recently proposed tubal nuclear norm (TNN), which has shown superior performance in comparison with other tensor nuclear norms. Theoretically, we rigorously prove that under tensor incoherence conditions, the underlying tensor and the sparse corruption tensor can be stably recovered. Algorithmically, we first develop an ADMM algorithm and then accelerate it by designing a new algorithm based on orthogonal tensor factorization. The superiority and efficiency of the proposed algorithms are demonstrated through experiments on both synthetic and real data sets.

The Generalized Transmuted Poisson-G Family of Distributions: Theory, Characterizations and Applications

Pakistan Journal of Statistics and Operation Research ◽

10.18187/pjsor.v14i4.2527 ◽

2018 ◽

pp. 759-779 ◽

Cited By ~ 3

Author(s):

Haitham Yousof ◽

Ahmed Z Afify ◽

Morad Alizadeh ◽

G. G. Hamedani ◽

S. Jahanshahi ◽

...

Keyword(s):

Monte Carlo ◽

Monte Carlo Simulations ◽

Order Statistics ◽

Real Data ◽

Model Parameters ◽

Data Sets ◽

Continuous Distributions ◽

New Class ◽

Mathematical Properties ◽

Family Of Distributions

In this work, we introduce a new class of continuous distributions called the generalized Poisson family, which extends the quadratic rank transmutation map. We provide some special models for the new family. Some of its mathematical properties, including Rényi and q-entropies, order statistics, and characterizations, are derived. The model parameters are estimated by the maximum likelihood method. Monte Carlo simulations are used to assess the performance of the maximum likelihood estimators. The flexibility of the proposed family is illustrated by means of two applications to real data sets.

Burr XII Fréchet Distribution: Properties, Bayesian and Classical Estimation

Pakistan Journal of Statistics and Operation Research ◽

10.18187/pjsor.v15i3.2687 ◽

2019 ◽

pp. 773-796

Author(s):

Mohamed Ibrahim Mohamed

Keyword(s):

Monte Carlo ◽

Real Data ◽

Least Square Method ◽

Least Square ◽

The Other ◽

Breaking Stress ◽

Estimation Methods ◽

Data Sets ◽

Proposed Model ◽

Sufficient Set

In this work, we introduce a new extension of the Fréchet distribution. A sufficient set of its mathematical and statistical properties is derived. The parameters are estimated using several different estimation methods, whose performance is studied by Monte Carlo simulations. The potential of the proposed model is analyzed through two data sets. The weighted least squares method is the best method for modelling the breaking stress data and the least squares method is the best for modelling the strengths data; however, all other methods performed well for both data sets. Moreover, the new model gives the best fits to these data among all fitted extensions of the Fréchet model. It could therefore be chosen as the best model for modelling breaking stress and strengths real data.

The Alpha-Beta Skew Logistic Distribution: Properties and Applications

Statistics Optimization & Information Computing ◽

10.19139/soic-2310-5070-706 ◽

2020 ◽

Vol 8 (1) ◽

pp. 304-317 ◽

Cited By ~ 1

Author(s):

Hamid Esmaeili ◽

Fazlollah Lak ◽

Morad Alizadeh ◽

Mohammad esmail Dehghan monfared

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Maximum Likelihood ◽

Real Data ◽

Logistic Distribution ◽

Data Sets ◽

New Family ◽

Alpha Beta ◽

Family Of Distributions ◽

Skewness And Kurtosis

A new family of skew distributions is introduced by extending the alpha skew logistic distribution proposed by Hazarika-Chakraborty [9]. This family of distributions is called the alpha-beta skew logistic (ABSLG) distribution. The density function, moments, and skewness and kurtosis coefficients are derived. The parameters of the new family are estimated by the maximum likelihood and moments methods. The performance of the obtained estimators is examined via a Monte Carlo simulation. The flexibility, usefulness and suitability of ABSLG are illustrated by analyzing two real data sets.
