Likelihood-free nested sampling for biochemical reaction networks

2019 ◽  
Author(s):  
Jan Mikelson ◽  
Mustafa Khammash

The development of mechanistic models of biological systems is a central part of Systems Biology. One major challenge in developing these models is the accurate inference of the model parameters. In recent years, nested sampling methods have gained increasing attention in the Systems Biology community. Among the attractive features of these methods are that they are easily parallelizable and that they yield an estimate of the variance of the final Bayesian evidence estimate from a single run. Still, their applicability is limited, as they require the likelihood to be available and thus cannot be applied to stochastic systems with intractable likelihoods. In this paper, we present a likelihood-free nested sampling formulation that gives an unbiased estimator of the Bayesian evidence as well as samples from the posterior. Unlike most common nested sampling schemes, we propose to use the information about the samples from the final prior volume to aid the approximation of the Bayesian evidence, and we show how this allows us to formulate a lower bound on the variance of the obtained estimator. We then use this lower bound to formulate a novel termination criterion for nested sampling approaches. We illustrate how our approach is applied to several realistically sized models with simulated data as well as recently published biological data. The presented method provides a viable alternative to other likelihood-free inference schemes such as Sequential Monte Carlo or Approximate Bayesian Computation methods. We also provide an intuitive and performant C++ implementation of our method.
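
To make the evidence recursion concrete, here is a minimal sketch of classic nested sampling, not the likelihood-free variant this paper introduces: at each step the lowest-likelihood live point is discarded, the prior volume shrinks by a factor N/(N+1), and the evidence accumulates discarded likelihoods times volume shells. Constrained-prior sampling is done by naive rejection, which is only practical for toy problems; all names here are illustrative.

```python
import math
import random

def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def nested_sampling(loglike, sample_prior, n_live=100, n_iter=600):
    """Basic nested sampling estimate of the log Bayesian evidence."""
    live = [sample_prior() for _ in range(n_live)]
    log_l = [loglike(p) for p in live]
    log_z = -math.inf
    log_x = 0.0  # log prior volume, starts at log(1)
    shrink = math.log(n_live / (n_live + 1.0))  # expected per-step shrinkage
    for _ in range(n_iter):
        i_min = min(range(n_live), key=lambda i: log_l[i])
        # shell weight of the discarded point: X_old - X_new
        log_w = log_x + math.log(1.0 - math.exp(shrink))
        log_z = logaddexp(log_z, log_l[i_min] + log_w)
        # replace the worst point by a prior draw above the likelihood threshold
        threshold = log_l[i_min]
        while True:
            p = sample_prior()
            lp = loglike(p)
            if lp > threshold:
                break
        live[i_min], log_l[i_min] = p, lp
        log_x += shrink
    return log_z
```

On the toy problem with prior Uniform(0, 1) and likelihood L(θ) = 2θ, the true evidence is Z = 1 (log Z = 0), which the sketch recovers to within sampling noise.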

2019 ◽  
Author(s):  
Fortunato Bianconi ◽  
Lorenzo Tomassoni ◽  
Chiara Antonini ◽  
Paolo Valigi

Abstract Computational modeling is a common tool for quantitatively describing biological processes. However, most model parameters are usually unknown because they cannot be directly measured. Therefore, a key issue in Systems Biology is model calibration, i.e. estimating parameters from experimental data. Existing methodologies for parameter estimation fall into two classes: frequentist and Bayesian methods. The former optimize a cost function, while the latter estimate the parameter posterior distribution through various sampling techniques. Here, we present an innovative Bayesian method, called Conditional Robust Calibration (CRC), for nonlinear model calibration and robustness analysis using omics data. CRC is an iterative algorithm based on the sampling of a proposal distribution and on the definition of multiple objective functions, one for each observable. CRC estimates the probability density function (pdf) of the parameters conditioned on the experimental measurements and performs a robustness analysis, quantifying how much each parameter influences the behavior of the observables. We apply CRC to three Ordinary Differential Equation (ODE) models to test its performance against other state-of-the-art approaches, namely Profile Likelihood (PL), Approximate Bayesian Computation Sequential Monte Carlo (ABC-SMC) and Delayed Rejection Adaptive Metropolis (DRAM). Compared with these methods, CRC finds a robust solution at a reduced computational cost. CRC is implemented as a set of Matlab functions (version R2018), whose source code is freely available at https://github.com/fortunatobianconi/CRC.


2018 ◽  
Author(s):  
Josephine Ann Urquhart ◽  
Akira O'Connor

Receiver operating characteristics (ROCs) are plots that provide a visual summary of a classifier's decision response accuracy at varying discrimination thresholds. Typical practice, particularly within psychological studies, involves plotting an ROC from a limited number of discrete thresholds before fitting signal detection parameters to the plot. We propose that additional insight into decision-making could be gained by increasing ROC resolution, using trial-by-trial measurements derived from a continuous variable in place of discrete discrimination thresholds. Such continuous ROCs are not yet routinely used in behavioural research, which we attribute to issues of practicality (i.e. the difficulty of applying standard ROC model-fitting methodologies to continuous data). Consequently, the purpose of the current article is to provide a documented method of fitting signal detection parameters to continuous ROCs. This method reliably produces model fits equivalent to the unequal-variance least squares method of model-fitting (Yonelinas et al., 1998), irrespective of the number of data points used in ROC construction. We present the suggested method in three main stages: I) building continuous ROCs, II) model-fitting to continuous ROCs and III) extracting model parameters from continuous ROCs. Throughout the article, procedures are demonstrated in Microsoft Excel, using an example continuous variable: reaction time, taken from a single-item recognition memory task. Supplementary MATLAB code used for automating our procedures is also presented in Appendix B, with a validation of the procedure using simulated data shown in Appendix C.
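
Stage I (building a continuous ROC) can be sketched as follows: every unique value of the trial-level variable serves as a threshold, giving one (false-alarm rate, hit rate) point per threshold rather than a handful of discrete points. This is a hedged illustration with hypothetical names, not the article's Excel/MATLAB procedure, and it omits the signal detection model-fitting stages.

```python
def continuous_roc(old_scores, new_scores):
    """One ROC point per unique threshold of a trial-level continuous
    variable (e.g. reaction time, sign-flipped so larger = more 'old').

    old_scores: scores on target ('old') trials.
    new_scores: scores on lure ('new') trials.
    Returns a list of (false_alarm_rate, hit_rate) pairs, sweeping the
    threshold from strictest to most lenient.
    """
    points = []
    thresholds = sorted(set(old_scores) | set(new_scores), reverse=True)
    for c in thresholds:
        hit_rate = sum(s >= c for s in old_scores) / len(old_scores)
        fa_rate = sum(s >= c for s in new_scores) / len(new_scores)
        points.append((fa_rate, hit_rate))
    return points
```

With trial counts in the hundreds, this yields a near-continuous curve instead of the 4-6 points a discrete confidence scale provides.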


2017 ◽  
Vol 14 (134) ◽  
pp. 20170340 ◽  
Author(s):  
Aidan C. Daly ◽  
Jonathan Cooper ◽  
David J. Gavaghan ◽  
Chris Holmes

Bayesian methods are advantageous for biological modelling studies due to their ability to quantify and characterize posterior variability in model parameters. When Bayesian methods cannot be applied, due either to non-determinism in the model or limitations on system observability, approximate Bayesian computation (ABC) methods can be used to similar effect, despite producing inflated estimates of the true posterior variance. Owing to generally differing application domains, there are few studies comparing Bayesian and ABC methods, and thus there is little understanding of the properties and magnitude of this uncertainty inflation. To address this problem, we present two popular strategies for ABC sampling that we have adapted to perform exact Bayesian inference, and compare them on several model problems. We find that one sampler was impractical for exact inference due to its sensitivity to a key normalizing constant, and additionally highlight sensitivities of both samplers to various algorithmic parameters and model conditions. We conclude with a study of the O'Hara–Rudy cardiac action potential model to quantify the uncertainty amplification resulting from employing ABC using a set of clinically relevant biomarkers. We hope that this work serves to guide the implementation and comparative assessment of Bayesian and ABC sampling techniques in biological models.
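
The inflation the authors quantify is easiest to see in the simplest ABC scheme. The sketch below is plain ABC rejection sampling with illustrative names, not one of the two adapted samplers from the paper: prior draws are kept when their simulated data fall within a tolerance eps of the observations, and as eps grows, the accepted set's spread inflates relative to the exact posterior.

```python
import random

def abc_rejection(simulate, observed, sample_prior, eps, n_samples=200):
    """ABC rejection sampling for a scalar summary statistic.

    simulate: maps a parameter draw to a simulated summary statistic.
    observed: the observed summary statistic.
    eps: acceptance tolerance; eps -> 0 recovers exact Bayesian
    inference (at ever-lower acceptance rates), larger eps inflates
    the approximate posterior's variance.
    """
    accepted = []
    while len(accepted) < n_samples:
        theta = sample_prior()
        if abs(simulate(theta) - observed) <= eps:
            accepted.append(theta)
    return accepted
```

With an identity simulator, a Uniform(0, 1) prior and observed value 0.5, the accepted draws are simply uniform on [0.5 - eps, 0.5 + eps], making the tolerance-driven variance inflation explicit.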


2021 ◽  
Vol 29 ◽  
pp. 287-295
Author(s):  
Zhiming Zhou ◽  
Haihui Huang ◽  
Yong Liang

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical interpretation and yields classification probabilities for the class labels. However, it cannot by itself perform biomarker selection. OBJECTIVE: The aim of this paper is to give the model an efficient gene selection capability. METHODS: We propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. On a large data set, the proposed method achieved 89.66% (training) and 90.02% (testing) AUC, which is, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
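
The mechanism by which a penalty turns logistic regression into a gene selector can be sketched with a plain L1 (lasso) penalty fitted by proximal gradient descent; this is a standard stand-in, not the paper's logsum network-based penalty. Coefficients driven exactly to zero correspond to de-selected genes.

```python
import numpy as np

def l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=500):
    """L1-penalised logistic regression via proximal gradient (ISTA).

    X: (n_samples, n_genes) expression matrix; y: 0/1 labels.
    Each step takes a gradient step on the logistic loss, then
    soft-thresholds the coefficients, which zeroes weak ones and so
    performs gene selection.
    """
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (1.0 / (1.0 + np.exp(-(X @ w))) - y) / n
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w
```

On synthetic data where only the first feature carries the label, the fitted vector concentrates its weight there while noise features are shrunk toward zero.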


2018 ◽  
Vol 612 ◽  
pp. A70 ◽  
Author(s):  
J. Olivares ◽  
E. Moraux ◽  
L. M. Sarro ◽  
H. Bouy ◽  
A. Berihuete ◽  
...  

Context. Membership analyses of the DANCe and Tycho + DANCe data sets provide the largest and least contaminated sample of Pleiades candidate members to date. Aims. We aim to reassess the different proposals for the number surface density of the Pleiades in the light of the new and most complete list of candidate members, and to infer the parameters of the most adequate model. Methods. We compute the Bayesian evidence and Bayes factors for variations of the classical radial models. These include elliptical symmetry and luminosity segregation. As a by-product of the model comparison, we obtain posterior distributions for each set of model parameters. Results. We find that the model comparison results depend on the spatial extent of the region used for the analysis. For a circle of 11.5 parsecs around the cluster centre (the most homogeneous and complete region), we find no compelling reason to abandon King's model, although the Generalised King model introduced here has slightly better fitting properties. Furthermore, we find strong evidence against radially symmetric models when compared to the elliptic extensions. Finally, we find that including mass segregation, in the form of luminosity segregation in the J band, is strongly supported in all our models. Conclusions. We have placed the question of the projected spatial distribution of the Pleiades cluster in a solid probabilistic framework, and inferred its properties using the most exhaustive and least contaminated list of Pleiades candidate members available to date. Our results suggest, however, that this sample may still lack about 20% of the expected number of cluster members. Therefore, this study should be revised when the completeness and homogeneity of the data can be extended beyond the 11.5 parsec limit. Such a study will allow a more precise determination of the Pleiades spatial distribution, its tidal radius, ellipticity, number of objects and total mass.


2016 ◽  
Author(s):  
Kassian Kobert ◽  
Alexandros Stamatakis ◽  
Tomáš Flouri

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology such as phylogenetic inference, species delimitation, model selection and divergence time estimation. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted to improve run-time and, using appropriate data structures, to reduce memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory saving attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
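
The idea of skipping identical computations can be illustrated with site-pattern compression, a simpler, alignment-level cousin of the per-node repeat detection described above (names illustrative): identical alignment columns necessarily yield identical per-site likelihoods, so each unique pattern is evaluated once and weighted by its count.

```python
from collections import Counter

def compress_sites(alignment_columns):
    """Collapse identical alignment columns into unique patterns.

    alignment_columns: iterable of per-site character tuples, one
    entry per taxon. Returns (patterns, weights) where weights[i] is
    how many sites share patterns[i].
    """
    counts = Counter(map(tuple, alignment_columns))
    patterns = list(counts)
    weights = [counts[p] for p in patterns]
    return patterns, weights

def total_loglike(alignment_columns, pattern_loglike):
    """Total log-likelihood = sum over unique patterns of
    count * per-pattern log-likelihood, so each redundant column
    costs nothing beyond a counter increment."""
    patterns, weights = compress_sites(alignment_columns)
    return sum(w * pattern_loglike(p) for p, w in zip(patterns, weights))
```

The paper's method goes further by detecting repeats inside subtrees, but the accounting principle (evaluate once, weight by multiplicity) is the same.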


2019 ◽  
Vol 23 (6) ◽  
pp. 1331-1347 ◽  
Author(s):  
Miguel Alfonzo ◽  
Dean S. Oliver

Abstract It is common in ensemble-based methods of history matching to evaluate the adequacy of the initial ensemble of models through visual comparison between actual observations and data predictions prior to data assimilation. If the model is appropriate, then the observed data should look plausible when compared to the distribution of realizations of simulated data. The principle of data coverage alone is, however, not an effective method for model criticism, as coverage can often be obtained by increasing the variability in a single model parameter. In this paper, we propose a methodology for determining the suitability of a model before data assimilation, aimed particularly at real cases with large numbers of model parameters, large amounts of data, and correlated observation errors. This model diagnostic is based on an approximation of the Mahalanobis distance between the observations and the ensemble of predictions in high-dimensional spaces. We applied our methodology to two different examples: a Gaussian example, which shows that our shrinkage estimate of the covariance matrix is a better discriminator of outliers than the pseudo-inverse and a diagonal approximation of this matrix; and an example using data from the Norne field. In this second test, we used actual production, repeat formation tester, and inverted seismic data to evaluate the suitability of the initial reservoir simulation model and seismic model. Despite the good data coverage, our model diagnostic suggested that model improvement was necessary. After modifying the model, it was validated against the observations and is now ready for history matching to production and seismic data. This shows that the proposed methodology for the evaluation of the adequacy of the model is suitable for large realistic problems.
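
The core diagnostic can be sketched as a Mahalanobis distance computed with a shrunk ensemble covariance. This is a generic sketch with illustrative names and a simple identity-target shrinkage, not the paper's specific estimator: shrinkage keeps the covariance invertible when the data dimension exceeds the ensemble size, where the raw sample covariance would be singular.

```python
import numpy as np

def shrinkage_mahalanobis(obs, ensemble, alpha=0.1):
    """Squared Mahalanobis distance from observations to an ensemble
    of simulated data.

    obs: (n_data,) observation vector.
    ensemble: (n_members, n_data) matrix of simulated data vectors.
    alpha: shrinkage weight toward a scaled-identity target, which
    regularises the sample covariance.
    """
    mean = ensemble.mean(axis=0)
    cov = np.cov(ensemble, rowvar=False)
    target = np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])
    cov_s = (1.0 - alpha) * cov + alpha * target
    d = obs - mean
    return float(d @ np.linalg.solve(cov_s, d))
```

A large distance flags observations that the ensemble is unlikely to have produced, even when naive min/max coverage of each data point looks fine.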


2015 ◽  
Vol 11 (A29A) ◽  
pp. 205-207
Author(s):  
Philip C. Gregory

Abstract A new apodized Keplerian model is proposed for the analysis of precision radial velocity (RV) data to model both planetary and stellar activity (SA) induced RV signals. A symmetrical Gaussian apodization function with unknown width and center can distinguish planetary signals from SA signals on the basis of the width of the apodization function. The general model for m apodized Keplerian signals also includes a linear regression term between RV and the stellar activity diagnostic ln(R'HK), as well as an extra Gaussian noise term with unknown standard deviation. The model parameters are explored using a Bayesian fusion MCMC code. A differential version of the Generalized Lomb-Scargle periodogram provides an additional way of distinguishing SA signals and helps guide the choice of new periods. Sample results are reported for a recent international RV blind challenge, which included multiple state-of-the-art simulated data sets supported by a variety of stellar activity diagnostics.
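
The shape of one apodized signal can be sketched as a sinusoid multiplied by a Gaussian window; this simplifies the Keplerian to a circular orbit and uses illustrative parameter names, so it is a caricature of the model, not Gregory's implementation. A window much wider than the data span behaves like a coherent planetary signal, while a narrow window mimics a transient activity signal.

```python
import math

def apodized_rv(t, K, period, phase, t0, width):
    """Apodized circular-orbit RV signal at time t.

    K: semi-amplitude; period, phase: sinusoid parameters;
    t0, width: centre and width of the Gaussian apodization window.
    """
    apod = math.exp(-0.5 * ((t - t0) / width) ** 2)
    return apod * K * math.sin(2.0 * math.pi * t / period + phase)
```

At the window centre the signal reaches its full Keplerian amplitude; far outside the window it is suppressed to near zero, which is what lets the fitted width separate persistent planets from transient activity.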


2019 ◽  
Vol 36 (2) ◽  
pp. 586-593
Author(s):  
Boseung Choi ◽  
Yu-Yu Cheng ◽  
Selahattin Cinar ◽  
William Ott ◽  
Matthew R Bennett ◽  
...  

Abstract Motivation Advances in experimental and imaging techniques have allowed for unprecedented insights into the dynamical processes within individual cells. However, many facets of intracellular dynamics remain hidden, or can be measured only indirectly. This makes it challenging to reconstruct the regulatory networks that govern the biochemical processes underlying various cell functions. Current estimation techniques for inferring reaction rates frequently rely on marginalization over unobserved processes and states. Even in simple systems this approach can be computationally challenging, and can lead to large uncertainties and a lack of robustness in parameter estimates. Alternative approaches are therefore needed to efficiently uncover the interactions in complex biochemical networks. Results We propose a Bayesian inference framework based on replacing uninteresting or unobserved reactions with time delays. Although the resulting models are non-Markovian, recent results on stochastic systems with random delays allow us to rigorously obtain expressions for the likelihoods of model parameters. In turn, this allows us to extend MCMC methods to efficiently estimate reaction rates, and delay distribution parameters, from single-cell assays. We illustrate the advantages, and potential pitfalls, of the approach using a birth–death model with both synthetic and experimental data, and show that we can robustly infer model parameters using a relatively small number of measurements. We demonstrate how to do so even when only the relative molecule count within the cell is measured, as in the case of fluorescence microscopy. Availability and implementation Accompanying code in R is available at https://github.com/cbskust/DDE_BD. Supplementary information Supplementary data are available at Bioinformatics online.
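
For context, the birth-death process used as the test model can be simulated with the standard Gillespie algorithm; the sketch below (illustrative names, no delays) generates the kind of single-cell trajectory such an inference scheme would be fitted to, and is not the authors' R code.

```python
import random

def birth_death_ssa(birth, death, x0=0, t_end=10.0, seed=None):
    """Gillespie (SSA) simulation of a birth-death process.

    birth: constant production rate; death: per-molecule decay rate.
    Returns (times, counts) for one stochastic trajectory; the
    stationary mean of the molecule count is birth / death.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while t < t_end:
        total = birth + death * x
        t += rng.expovariate(total)  # waiting time to the next event
        if t >= t_end:
            break
        if rng.random() < birth / total:
            x += 1  # birth event
        else:
            x -= 1  # death event
        times.append(t)
        counts.append(x)
    return times, counts
```

A Bayesian scheme like the one above would treat `birth` and `death` (and, in the delayed variant, delay distribution parameters) as unknowns to be recovered from trajectories like these.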

