The Gibbs and split–merge sampler for population mixture analysis from genetic data with incomplete baselines

2006 ◽  
Vol 63 (3) ◽  
pp. 576-596 ◽  
Author(s):  
Jerome Pella ◽  
Michele Masuda

Although population mixtures often include contributions from novel populations as well as from baseline populations previously sampled, unlabeled mixture individuals can be separated to their sources from genetic data. A Gibbs and split–merge Markov chain Monte Carlo sampler is described for successively partitioning a genetic mixture sample into plausible subsets of individuals from each of the baseline and extra-baseline populations present. The subsets are selected to satisfy the Hardy–Weinberg and linkage equilibrium conditions expected for large, panmictic populations. The number of populations present can be inferred from the distribution for counts of subsets per partition drawn by the sampler. To further summarize the sampler's output, co-assignment probabilities of mixture individuals to the same subsets are computed from the partitions and are used to construct a binary tree of their relatedness. The tree graphically displays the clusters of mixture individuals together with a quantitative measure of the evidence supporting their various separate and common sources. The methodology is applied to several simulated and real data sets to illustrate its use and demonstrate the sampler's superior performance.
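
The two summaries described in the abstract above, the distribution of subset counts per partition and the pairwise co-assignment probabilities, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the sampler's output is already available as a list of label vectors (one per sampled partition), and the function names `coassignment_probabilities` and `subset_count_distribution` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coassignment_probabilities(partitions):
    """Estimate pairwise co-assignment probabilities.

    partitions[s][i] is the subset label of individual i in sampled
    partition s (assumed input format). Labels are only compared within
    a partition, so relabeling between draws does not matter.
    """
    n_samples = len(partitions)
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    probs = [[c / n_samples for c in row] for row in counts]
    for i in range(n):
        probs[i][i] = 1.0  # an individual is always with itself
    return probs


def subset_count_distribution(partitions):
    """Relative frequency of the number of subsets per sampled partition."""
    counts = Counter(len(set(labels)) for labels in partitions)
    total = len(partitions)
    return {k: v / total for k, v in sorted(counts.items())}
```

One way to obtain a binary tree from these probabilities would be to feed a dissimilarity such as one minus the co-assignment probability into an agglomerative clustering routine. That choice is an assumption here, since the abstract does not say how the tree is built.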

Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 62
Author(s):  
Zhengwei Liu ◽  
Fukang Zhu

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is the binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial is introduced, which is a generalization of the binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersed features of counting time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case and the conditional maximum likelihood estimation is also discussed. We have also obtained the asymptotic property of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
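
For orientation, the classical binomial thinning that the abstract above generalizes can be sketched in a few lines. The sketch covers only the standard operator, where alpha ∘ X is the sum of X independent Bernoulli(alpha) draws, and a first-order model X_t = alpha ∘ X_{t-1} + e_t. It does not implement the paper's extended binomial operator, which the abstract does not define. The Poisson innovations and the names `binomial_thinning`, `poisson_draw` and `simulate_inar1` are assumptions made for illustration.

```python
import math
import random


def binomial_thinning(rng, alpha, x):
    """Classical binomial thinning: each of the x units survives with prob. alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson_draw(rng, lam):
    """Poisson sample via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_inar1(alpha, lam, n, x0=0, seed=0):
    """Simulate n steps of X_t = alpha o X_{t-1} + e_t.

    Assumes e_t ~ Poisson(lam); this innovation law is an illustrative choice.
    """
    rng = random.Random(seed)
    series = []
    x = x0
    for _ in range(n):
        x = binomial_thinning(rng, alpha, x) + poisson_draw(rng, lam)
        series.append(x)
    return series
```

With Poisson innovations this classical model is equidispersed, which is one reason more flexible operators are of interest for overdispersed or underdispersed counts.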


2012 ◽  
Vol 24 (6) ◽  
pp. 1462-1486 ◽  
Author(s):  
Ke Yuan ◽  
Mark Girolami ◽  
Mahesan Niranjan

This letter considers how a number of modern Markov chain Monte Carlo (MCMC) methods can be applied for parameter estimation and inference in state-space models with point process observations. We quantified the efficiencies of these MCMC methods on synthetic data, and our results suggest that the Riemannian manifold Hamiltonian Monte Carlo method offers the best performance. We further compared this method with a previously tested variational Bayes method on two experimental data sets. Results indicate similar performance on the large data sets and superior performance on small ones. The work offers an extensive suite of MCMC algorithms evaluated on an important class of models for physiological signal analysis.


2021 ◽  
Author(s):  
Johannes Wahle

The inference of phylogenetic trees from sequence data has become a staple in evolutionary research. Bayesian inference of such trees is predominantly based on the Metropolis-Hastings algorithm. For high-dimensional and correlated data, this algorithm is known to be inefficient. There are gradient-based algorithms to speed up such inference. Building on recent research that uses gradient-based approaches for the inference of phylogenetic trees in a Bayesian framework, I present an algorithm capable of performing No-U-Turn sampling for phylogenetic trees. As an extension of Hamiltonian Monte Carlo methods, No-U-Turn sampling comes with the same benefits, such as proposing distant new states with a high acceptance probability, but eliminates the need to manually tune hyperparameters. Evaluated on real data sets, the new sampler converges faster to the target distribution. The results also indicate that the new algorithm traverses a higher number of topologies during sampling than traditional Markov chain Monte Carlo approaches. This new algorithm leads to a more efficient exploration of the posterior distribution of phylogenetic tree topologies.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Shahin Mohammadi ◽  
Jose Davila-Velderrain ◽  
Manolis Kellis

Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet’s superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating among the transcriptomic cell types and subtypes of the human prefrontal cortex.


2019 ◽  
Vol 42 (2) ◽  
pp. 225-243
Author(s):  
Emilio A. Coelho-Barros ◽  
Jorge A. Achcar ◽  
Edson Z. Martinez ◽  
Nasser Davarzani ◽  
Heike I. Grabsch

In this paper, we introduce a Bayesian approach for segmented Weibull distributions, which could be a good alternative for analyzing medical survival data in the presence of censored observations and covariates. Using the Bayesian estimates of the change-points, we could obtain an excellent fit of the proposed model to the data sets considered. With the proposed methodology, it is also possible to identify survival time intervals where a covariate could have significantly different effects compared with other lifetime intervals, an important point from a clinical view. The Bayesian estimates are obtained using standard Markov chain Monte Carlo methods. Some examples with real data sets illustrate the proposed methodology and its potential clinical value.


2018 ◽  
Author(s):  
Xinghao Yu ◽  
Lishun Xiao ◽  
Ping Zeng ◽  
Shuiping Huang

Motivation: In the past few years, many novel prediction approaches have been proposed and widely employed in high dimensional genetic data for disease risk evaluation. However, those approaches typically ignore in model fitting the important group structures or functional classifications that naturally exist in genetic data. Methods: In the present study, we applied a novel model averaging approach, called Jackknife Model Averaging Prediction (JMAP), for high dimensional genetic risk prediction while incorporating KEGG pathway information into the model specification. JMAP selects the optimal weights across candidate models by minimizing a cross-validation criterion in a jackknife way. Compared with previous approaches, one of the primary features of JMAP is to allow model weights to vary from 0 to 1 but without the limitation that the summation of weights is equal to one. We evaluated the performance of JMAP using extensive simulation studies and compared it with existing methods. We finally applied JMAP to five real cancer datasets that are publicly available from TCGA. Results: The simulations showed that, compared with other existing approaches, JMAP performed best or was among the best methods across a range of scenarios. For example, in 14 out of 16 simulation settings with PVE=0.3, JMAP had an average gain of 0.075 in prediction accuracy compared with gsslasso. We further found that in the simulation the model weights for the true candidate models have much smaller chances to be zero compared with those for the null candidate models and are substantially greater in magnitude. In the real data application, JMAP also behaves comparably or better compared with the other methods for both continuous and binary phenotypes. For example, for the COAD, CRC and PAAD data sets, the average gains of predictive accuracy of JMAP are 0.019, 0.064 and 0.052 compared with gsslasso. Conclusion: The proposed method JMAP is a novel method that can provide more accurate phenotypic prediction while incorporating external useful group information.


Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5335 ◽  
Author(s):  
Wei Fang ◽  
Dongxu Wei ◽  
Ran Zhang

The rapid development of sensor technology gives rise to the emergence of huge amounts of tensor (i.e., multi-dimensional array) data. For various reasons such as sensor failures and communication loss, the tensor data may be corrupted by not only small noises but also gross corruptions. This paper studies the Stable Tensor Principal Component Pursuit (STPCP), which aims to recover a tensor from its corrupted observations. Specifically, we propose an STPCP model based on the recently proposed tubal nuclear norm (TNN), which has shown superior performance in comparison with other tensor nuclear norms. Theoretically, we rigorously prove that under tensor incoherence conditions, the underlying tensor and the sparse corruption tensor can be stably recovered. Algorithmically, we first develop an ADMM algorithm and then accelerate it by designing a new algorithm based on orthogonal tensor factorization. The superiority and efficiency of the proposed algorithms are demonstrated through experiments on both synthetic and real data sets.


Author(s):  
Haitham Yousof ◽  
Ahmed Z Afify ◽  
Morad Alizadeh ◽  
G. G. Hamedani ◽  
S. Jahanshahi ◽  
...  

In this work, we introduce a new class of continuous distributions called the generalized Poisson family, which extends the quadratic rank transmutation map. We provide some special models for the new family. Some of its mathematical properties, including Rényi and q-entropies, order statistics and characterizations, are derived. The model parameters are estimated by the maximum likelihood method. Monte Carlo simulations are used to assess the performance of the maximum likelihood estimators. The flexibility of the proposed family is illustrated by means of two applications to real data sets.


Author(s):  
Mohamed Ibrahim Mohamed

In this work, we introduce a new extension of the Fréchet distribution. A sufficient set of its mathematical and statistical properties is derived. The parameters are estimated using several different estimation methods. The performance of the proposed estimation methods is studied by Monte Carlo simulations. The potential of the proposed model is analyzed through two data sets. The weighted least squares method is the best method for modelling the breaking stress data and the least squares method is the best method for modelling the strengths data; however, all other methods performed well for both data sets. In addition, the new model gives the best fits among all the fitted extensions of the Fréchet model for these data, so it could be chosen as the best model for modelling the breaking stress and strengths real data.


2020 ◽  
Vol 8 (1) ◽  
pp. 304-317 ◽  
Author(s):  
Hamid Esmaeili ◽  
Fazlollah Lak ◽  
Morad Alizadeh ◽  
Mohammad esmail Dehghan monfared

A new family of skew distributions is introduced by extending the alpha skew logistic distribution proposed by Hazarika-Chakraborty [9]. This family of distributions is called the alpha-beta skew logistic (ABSLG) distribution. The density function, moments, and skewness and kurtosis coefficients are derived. The parameters of the new family are estimated by the maximum likelihood and moments methods. The performance of the obtained estimators is examined via a Monte Carlo simulation. The flexibility, usefulness and suitability of ABSLG are illustrated by analyzing two real data sets.

