Likelihood-free nested sampling for biochemical reaction networks

2019 ◽  
Author(s):  
Jan Mikelson ◽  
Mustafa Khammash

The development of mechanistic models of biological systems is a central part of Systems Biology. One major challenge in developing these models is the accurate inference of the model parameters. In recent years, nested sampling methods have gained increasing attention in the Systems Biology community. Among the attractive features of these methods are that they are easily parallelizable and that they yield an estimate of the variance of the final Bayesian evidence estimate from a single run. Still, their applicability is limited, as they require the likelihood to be available and thus cannot be applied to stochastic systems with intractable likelihoods. In this paper, we present a likelihood-free nested sampling formulation that gives an unbiased estimator of the Bayesian evidence as well as samples from the posterior. Unlike most common nested sampling schemes, we propose to use the information about the samples from the final prior volume to aid the approximation of the Bayesian evidence, and we show how this allows us to formulate a lower bound on the variance of the obtained estimator. We then use this lower bound to formulate a novel termination criterion for nested sampling approaches. We illustrate how our approach is applied to several realistically sized models with simulated data as well as recently published biological data. The presented method provides a viable alternative to other likelihood-free inference schemes such as Sequential Monte Carlo or Approximate Bayesian Computation methods. We also provide an intuitive and performant C++ implementation of our method.
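
To make the evidence recursion concrete, here is a minimal sketch of classic nested sampling, not the likelihood-free variant this paper introduces: at each step the lowest-likelihood live point is discarded, the prior volume shrinks by a factor N/(N+1), and the evidence accumulates discarded likelihoods times volume shells. Constrained-prior sampling is done by naive rejection, which is only practical for toy problems; all names here are illustrative.

```python
import math
import random

def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def nested_sampling(loglike, sample_prior, n_live=100, n_iter=600):
    """Basic nested sampling estimate of the log Bayesian evidence."""
    live = [sample_prior() for _ in range(n_live)]
    log_l = [loglike(p) for p in live]
    log_z = -math.inf
    log_x = 0.0  # log prior volume, starts at log(1)
    shrink = math.log(n_live / (n_live + 1.0))  # expected per-step shrinkage
    for _ in range(n_iter):
        i_min = min(range(n_live), key=lambda i: log_l[i])
        # shell weight of the discarded point: X_old - X_new
        log_w = log_x + math.log(1.0 - math.exp(shrink))
        log_z = logaddexp(log_z, log_l[i_min] + log_w)
        # replace the worst point by a prior draw above the likelihood threshold
        threshold = log_l[i_min]
        while True:
            p = sample_prior()
            lp = loglike(p)
            if lp > threshold:
                break
        live[i_min], log_l[i_min] = p, lp
        log_x += shrink
    return log_z
```

On the toy problem with prior Uniform(0, 1) and likelihood L(θ) = 2θ, the true evidence is Z = 1 (log Z = 0), which the sketch recovers to within sampling noise.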

2019 ◽  
Author(s):  
Fortunato Bianconi ◽  
Lorenzo Tomassoni ◽  
Chiara Antonini ◽  
Paolo Valigi

Abstract Computational modeling is a common tool for quantitatively describing biological processes. However, most model parameters are usually unknown because they cannot be directly measured. Therefore, a key issue in Systems Biology is model calibration, i.e. estimating parameters from experimental data. Existing methodologies for parameter estimation fall into two classes: frequentist and Bayesian methods. The former optimize a cost function, while the latter estimate the parameter posterior distribution through various sampling techniques. Here, we present an innovative Bayesian method, called Conditional Robust Calibration (CRC), for nonlinear model calibration and robustness analysis using omics data. CRC is an iterative algorithm based on the sampling of a proposal distribution and on the definition of multiple objective functions, one for each observable. CRC estimates the probability density function (pdf) of the parameters conditioned on the experimental measurements and performs a robustness analysis, quantifying how much each parameter influences the behavior of the observables. We apply CRC to three Ordinary Differential Equation (ODE) models to test its performance against other state-of-the-art approaches, namely Profile Likelihood (PL), Approximate Bayesian Computation Sequential Monte Carlo (ABC-SMC) and Delayed Rejection Adaptive Metropolis (DRAM). Compared with these methods, CRC finds a robust solution at a reduced computational cost. CRC is implemented as a set of Matlab functions (version R2018), whose source code is freely available at https://github.com/fortunatobianconi/CRC.


2018 ◽  
Author(s):  
Josephine Ann Urquhart ◽  
Akira O'Connor

Receiver operating characteristics (ROCs) are plots that provide a visual summary of a classifier's decision response accuracy at varying discrimination thresholds. Typical practice, particularly within psychological studies, involves plotting an ROC from a limited number of discrete thresholds before fitting signal detection parameters to the plot. We propose that additional insight into decision-making could be gained by increasing ROC resolution, using trial-by-trial measurements derived from a continuous variable in place of discrete discrimination thresholds. Such continuous ROCs are not yet routinely used in behavioural research, which we attribute to issues of practicality (i.e. the difficulty of applying standard ROC model-fitting methodologies to continuous data). Consequently, the purpose of the current article is to provide a documented method of fitting signal detection parameters to continuous ROCs. This method reliably produces model fits equivalent to the unequal-variance least squares method of model-fitting (Yonelinas et al., 1998), irrespective of the number of data points used in ROC construction. We present the suggested method in three main stages: I) building continuous ROCs, II) model-fitting to continuous ROCs and III) extracting model parameters from continuous ROCs. Throughout the article, procedures are demonstrated in Microsoft Excel, using an example continuous variable: reaction time, taken from a single-item recognition memory task. Supplementary MATLAB code used for automating our procedures is also presented in Appendix B, with a validation of the procedure using simulated data shown in Appendix C.
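
Stage I (building a continuous ROC) can be sketched as follows: every unique value of the trial-level variable serves as a threshold, giving one (false-alarm rate, hit rate) point per threshold rather than a handful of discrete points. This is a hedged illustration with hypothetical names, not the article's Excel/MATLAB procedure, and it omits the signal detection model-fitting stages.

```python
def continuous_roc(old_scores, new_scores):
    """One ROC point per unique threshold of a trial-level continuous
    variable (e.g. reaction time, sign-flipped so larger = more 'old').

    old_scores: scores on target ('old') trials.
    new_scores: scores on lure ('new') trials.
    Returns a list of (false_alarm_rate, hit_rate) pairs, sweeping the
    threshold from strictest to most lenient.
    """
    points = []
    thresholds = sorted(set(old_scores) | set(new_scores), reverse=True)
    for c in thresholds:
        hit_rate = sum(s >= c for s in old_scores) / len(old_scores)
        fa_rate = sum(s >= c for s in new_scores) / len(new_scores)
        points.append((fa_rate, hit_rate))
    return points
```

With trial counts in the hundreds, this yields a near-continuous curve instead of the 4-6 points a discrete confidence scale provides.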


2017 ◽  
Vol 14 (134) ◽  
pp. 20170340 ◽  
Author(s):  
Aidan C. Daly ◽  
Jonathan Cooper ◽  
David J. Gavaghan ◽  
Chris Holmes

Bayesian methods are advantageous for biological modelling studies due to their ability to quantify and characterize posterior variability in model parameters. When Bayesian methods cannot be applied, due either to non-determinism in the model or limitations on system observability, approximate Bayesian computation (ABC) methods can be used to similar effect, despite producing inflated estimates of the true posterior variance. Owing to generally differing application domains, there are few studies comparing Bayesian and ABC methods, and thus there is little understanding of the properties and magnitude of this uncertainty inflation. To address this problem, we present two popular strategies for ABC sampling that we have adapted to perform exact Bayesian inference, and compare them on several model problems. We find that one sampler was impractical for exact inference due to its sensitivity to a key normalizing constant, and additionally highlight sensitivities of both samplers to various algorithmic parameters and model conditions. We conclude with a study of the O'Hara–Rudy cardiac action potential model to quantify the uncertainty amplification resulting from employing ABC using a set of clinically relevant biomarkers. We hope that this work serves to guide the implementation and comparative assessment of Bayesian and ABC sampling techniques in biological models.
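
The inflation the authors quantify is easiest to see in the simplest ABC scheme. The sketch below is plain ABC rejection sampling with illustrative names, not one of the two adapted samplers from the paper: prior draws are kept when their simulated data fall within a tolerance eps of the observations, and as eps grows, the accepted set's spread inflates relative to the exact posterior.

```python
import random

def abc_rejection(simulate, observed, sample_prior, eps, n_samples=200):
    """ABC rejection sampling for a scalar summary statistic.

    simulate: maps a parameter draw to a simulated summary statistic.
    observed: the observed summary statistic.
    eps: acceptance tolerance; eps -> 0 recovers exact Bayesian
    inference (at ever-lower acceptance rates), larger eps inflates
    the approximate posterior's variance.
    """
    accepted = []
    while len(accepted) < n_samples:
        theta = sample_prior()
        if abs(simulate(theta) - observed) <= eps:
            accepted.append(theta)
    return accepted
```

With an identity simulator, a Uniform(0, 1) prior and observed value 0.5, the accepted draws are simply uniform on [0.5 - eps, 0.5 + eps], making the tolerance-driven variance inflation explicit.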


2021 ◽  
Vol 29 ◽  
pp. 287-295
Author(s):  
Zhiming Zhou ◽  
Haihui Huang ◽  
Yong Liang

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical interpretation and yields classification probabilities for the class labels. However, it cannot by itself perform biomarker selection. OBJECTIVE: The aim of this paper is to give the model an efficient gene selection capability. METHODS: We propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. On a large data set, the proposed method achieved 89.66% (training) and 90.02% (testing) AUC, which is, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
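
The mechanism by which a penalty turns logistic regression into a gene selector can be sketched with a plain L1 (lasso) penalty fitted by proximal gradient descent; this is a standard stand-in, not the paper's logsum network-based penalty. Coefficients driven exactly to zero correspond to de-selected genes.

```python
import numpy as np

def l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=500):
    """L1-penalised logistic regression via proximal gradient (ISTA).

    X: (n_samples, n_genes) expression matrix; y: 0/1 labels.
    Each step takes a gradient step on the logistic loss, then
    soft-thresholds the coefficients, which zeroes weak ones and so
    performs gene selection.
    """
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (1.0 / (1.0 + np.exp(-(X @ w))) - y) / n
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w
```

On synthetic data where only the first feature carries the label, the fitted vector concentrates its weight there while noise features are shrunk toward zero.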


2018 ◽  
Vol 612 ◽  
pp. A70 ◽  
Author(s):  
J. Olivares ◽  
E. Moraux ◽  
L. M. Sarro ◽  
H. Bouy ◽  
A. Berihuete ◽  
...  

Context. Membership analyses of the DANCe and Tycho + DANCe data sets provide the largest and least contaminated sample of Pleiades candidate members to date. Aims. We aim to reassess the different proposals for the number surface density of the Pleiades in the light of the new and most complete list of candidate members, and to infer the parameters of the most adequate model. Methods. We compute the Bayesian evidence and Bayes factors for variations of the classical radial models. These include elliptical symmetry and luminosity segregation. As a by-product of the model comparison, we obtain posterior distributions for each set of model parameters. Results. We find that the model comparison results depend on the spatial extent of the region used for the analysis. For a circle of 11.5 parsecs around the cluster centre (the most homogeneous and complete region), we find no compelling reason to abandon King's model, although the Generalised King model introduced here has slightly better fitting properties. Furthermore, we find strong evidence against radially symmetric models when compared to the elliptic extensions. Finally, we find that including mass segregation, in the form of luminosity segregation in the J band, is strongly supported in all our models. Conclusions. We have placed the question of the projected spatial distribution of the Pleiades cluster in a solid probabilistic framework, and inferred its properties using the most exhaustive and least contaminated list of Pleiades candidate members available to date. Our results suggest, however, that this sample may still lack about 20% of the expected number of cluster members. Therefore, this study should be revised when the completeness and homogeneity of the data can be extended beyond the 11.5 parsec limit. Such a study will allow a more precise determination of the Pleiades spatial distribution, its tidal radius, ellipticity, number of objects and total mass.


2016 ◽  
Author(s):  
Kassian Kobert ◽  
Alexandros Stamatakis ◽  
Tomáš Flouri

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection and divergence time estimation. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted to improve run-time and, using appropriate data structures, to reduce memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory saving attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
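
The idea of skipping identical computations can be illustrated with site-pattern compression, a simpler, alignment-level cousin of the per-node repeat detection described above (names illustrative): identical alignment columns necessarily yield identical per-site likelihoods, so each unique pattern is evaluated once and weighted by its count.

```python
from collections import Counter

def compress_sites(alignment_columns):
    """Collapse identical alignment columns into unique patterns.

    alignment_columns: iterable of per-site character tuples, one
    entry per taxon. Returns (patterns, weights) where weights[i] is
    how many sites share patterns[i].
    """
    counts = Counter(map(tuple, alignment_columns))
    patterns = list(counts)
    weights = [counts[p] for p in patterns]
    return patterns, weights

def total_loglike(alignment_columns, pattern_loglike):
    """Total log-likelihood = sum over unique patterns of
    count * per-pattern log-likelihood, so each redundant column
    costs nothing beyond a counter increment."""
    patterns, weights = compress_sites(alignment_columns)
    return sum(w * pattern_loglike(p) for p, w in zip(patterns, weights))
```

The paper's method goes further by detecting repeats inside subtrees, but the accounting principle (evaluate once, weight by multiplicity) is the same.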


2019 ◽  
Vol 23 (6) ◽  
pp. 1331-1347 ◽  
Author(s):  
Miguel Alfonzo ◽  
Dean S. Oliver

Abstract It is common in ensemble-based methods of history matching to evaluate the adequacy of the initial ensemble of models through visual comparison between actual observations and data predictions prior to data assimilation. If the model is appropriate, then the observed data should look plausible when compared to the distribution of realizations of simulated data. The principle of data coverage alone is, however, not an effective method for model criticism, as coverage can often be obtained by increasing the variability in a single model parameter. In this paper, we propose a methodology for determining the suitability of a model before data assimilation, aimed particularly at real cases with large numbers of model parameters, large amounts of data, and correlated observation errors. This model diagnostic is based on an approximation of the Mahalanobis distance between the observations and the ensemble of predictions in high-dimensional spaces. We applied our methodology to two different examples: a Gaussian example, which shows that our shrinkage estimate of the covariance matrix is a better discriminator of outliers than the pseudo-inverse and a diagonal approximation of this matrix; and an example using data from the Norne field. In this second test, we used actual production, repeat formation tester, and inverted seismic data to evaluate the suitability of the initial reservoir simulation model and seismic model. Despite the good data coverage, our model diagnostic suggested that model improvement was necessary. After modifying the model, it was validated against the observations and is now ready for history matching to production and seismic data. This shows that the proposed methodology for the evaluation of the adequacy of the model is suitable for large realistic problems.
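
The core diagnostic can be sketched as a Mahalanobis distance computed with a shrunk ensemble covariance. This is a generic sketch with illustrative names and a simple identity-target shrinkage, not the paper's specific estimator: shrinkage keeps the covariance invertible when the data dimension exceeds the ensemble size, where the raw sample covariance would be singular.

```python
import numpy as np

def shrinkage_mahalanobis(obs, ensemble, alpha=0.1):
    """Squared Mahalanobis distance from observations to an ensemble
    of simulated data.

    obs: (n_data,) observation vector.
    ensemble: (n_members, n_data) matrix of simulated data vectors.
    alpha: shrinkage weight toward a scaled-identity target, which
    regularises the sample covariance.
    """
    mean = ensemble.mean(axis=0)
    cov = np.cov(ensemble, rowvar=False)
    target = np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])
    cov_s = (1.0 - alpha) * cov + alpha * target
    d = obs - mean
    return float(d @ np.linalg.solve(cov_s, d))
```

A large distance flags observations that the ensemble is unlikely to have produced, even when naive min/max coverage of each data point looks fine.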


2015 ◽  
Vol 11 (A29A) ◽  
pp. 205-207
Author(s):  
Philip C. Gregory

Abstract A new apodized Keplerian model is proposed for the analysis of precision radial velocity (RV) data to model both planetary and stellar activity (SA) induced RV signals. A symmetrical Gaussian apodization function with unknown width and center can distinguish planetary signals from SA signals on the basis of the width of the apodization function. The general model for m apodized Keplerian signals also includes a linear regression term between RV and the stellar activity diagnostic ln(R'HK), as well as an extra Gaussian noise term with unknown standard deviation. The model parameters are explored using a Bayesian fusion MCMC code. A differential version of the Generalized Lomb-Scargle periodogram provides an additional way of distinguishing SA signals and helps guide the choice of new periods. Sample results are reported for a recent international RV blind challenge, which included multiple state-of-the-art simulated data sets supported by a variety of stellar activity diagnostics.
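
The shape of one apodized signal can be sketched as a sinusoid multiplied by a Gaussian window; this simplifies the Keplerian to a circular orbit and uses illustrative parameter names, so it is a caricature of the model, not Gregory's implementation. A window much wider than the data span behaves like a coherent planetary signal, while a narrow window mimics a transient activity signal.

```python
import math

def apodized_rv(t, K, period, phase, t0, width):
    """Apodized circular-orbit RV signal at time t.

    K: semi-amplitude; period, phase: sinusoid parameters;
    t0, width: centre and width of the Gaussian apodization window.
    """
    apod = math.exp(-0.5 * ((t - t0) / width) ** 2)
    return apod * K * math.sin(2.0 * math.pi * t / period + phase)
```

At the window centre the signal reaches its full Keplerian amplitude; far outside the window it is suppressed to near zero, which is what lets the fitted width separate persistent planets from transient activity.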


2019 ◽  
Vol 36 (2) ◽  
pp. 586-593
Author(s):  
Boseung Choi ◽  
Yu-Yu Cheng ◽  
Selahattin Cinar ◽  
William Ott ◽  
Matthew R Bennett ◽  
...  

Abstract Motivation Advances in experimental and imaging techniques have allowed for unprecedented insights into the dynamical processes within individual cells. However, many facets of intracellular dynamics remain hidden, or can be measured only indirectly. This makes it challenging to reconstruct the regulatory networks that govern the biochemical processes underlying various cell functions. Current estimation techniques for inferring reaction rates frequently rely on marginalization over unobserved processes and states. Even in simple systems this approach can be computationally challenging, and can lead to large uncertainties and a lack of robustness in parameter estimates. Alternative approaches are therefore needed to efficiently uncover the interactions in complex biochemical networks. Results We propose a Bayesian inference framework based on replacing uninteresting or unobserved reactions with time delays. Although the resulting models are non-Markovian, recent results on stochastic systems with random delays allow us to rigorously obtain expressions for the likelihoods of model parameters. In turn, this allows us to extend MCMC methods to efficiently estimate reaction rates, and delay distribution parameters, from single-cell assays. We illustrate the advantages, and potential pitfalls, of the approach using a birth–death model with both synthetic and experimental data, and show that we can robustly infer model parameters using a relatively small number of measurements. We demonstrate how to do so even when only the relative molecule count within the cell is measured, as in the case of fluorescence microscopy. Availability and implementation Accompanying code in R is available at https://github.com/cbskust/DDE_BD. Supplementary information Supplementary data are available at Bioinformatics online.
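
For context, the birth-death process used as the test model can be simulated with the standard Gillespie algorithm; the sketch below (illustrative names, no delays) generates the kind of single-cell trajectory such an inference scheme would be fitted to, and is not the authors' R code.

```python
import random

def birth_death_ssa(birth, death, x0=0, t_end=10.0, seed=None):
    """Gillespie (SSA) simulation of a birth-death process.

    birth: constant production rate; death: per-molecule decay rate.
    Returns (times, counts) for one stochastic trajectory; the
    stationary mean of the molecule count is birth / death.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while t < t_end:
        total = birth + death * x
        t += rng.expovariate(total)  # waiting time to the next event
        if t >= t_end:
            break
        if rng.random() < birth / total:
            x += 1  # birth event
        else:
            x -= 1  # death event
        times.append(t)
        counts.append(x)
    return times, counts
```

A Bayesian scheme like the one above would treat `birth` and `death` (and, in the delayed variant, delay distribution parameters) as unknowns to be recovered from trajectories like these.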

