Robust Design for Coalescent Model Inference

2019 ◽  
Vol 68 (5) ◽  
pp. 730-743 ◽  
Author(s):  
Kris V Parag ◽  
Oliver G Pybus

Abstract The coalescent process describes how changes in the size or structure of a population influence the genealogical patterns of sequences sampled from that population. The estimation of (effective) population size changes from genealogies that are reconstructed from these sampled sequences is an important problem in many biological fields. Often, population size is characterized by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends on both the statistical coalescent inference method employed, and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature on coalescent inference methodology, there is comparatively little work on experimental design. The research that does exist is largely simulation-based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling under the structured coalescent model, and time discretization for sequentially Markovian coalescent models. In all cases, we prove that 1) working in the logarithm of the parameters to be inferred (e.g., population size) and 2) distributing informative coalescent events uniformly among these log-parameters, is uniquely robust. “Robust” means that the total and maximum uncertainty of our parameter estimates are minimized, and made insensitive to their unknown (true) values. This robust design theorem provides rigorous justification for several existing coalescent experimental design decisions and leads to usable guidelines for future empirical or simulation-based investigations. Given its persistence among models, this theorem may form the basis of an experimental design paradigm for coalescent inference.
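As a toy illustration of the theorem (not the authors' code; segment sizes and event counts below are invented), note that with m_j coalescent events in segment j the Fisher information about N_j is m_j/N_j^2, which depends on the unknown truth, whereas the information about log N_j is simply m_j. A uniform event allocation then equalises and minimises the worst-case uncertainty:

```python
# Minimal sketch of the robust-design result for a skyline model:
# compare Cramer-Rao bounds under natural and log parameterisations
# for a fixed budget of 60 coalescent events split across 3 segments.
import numpy as np

N = np.array([100.0, 1000.0, 50.0])        # hypothetical true segment sizes

for m in (np.array([20, 20, 20]),          # uniform allocation
          np.array([40, 15, 5])):          # skewed allocation
    print(f"events {m}:")
    print("  var bound for N_j:    ", N**2 / m)   # depends on unknown N_j
    print("  var bound for log N_j:", 1.0 / m)    # value-independent
# The uniform split minimises both the total and the maximum bound on the
# log-parameters, matching the robust-design theorem stated above.
```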


Processes ◽  
2018 ◽  
Vol 6 (4) ◽  
pp. 27 ◽  
Author(s):  
René Schenkendorf ◽  
Xiangzhong Xie ◽  
Moritz Rehbein ◽  
Stephan Scholl ◽  
Ulrike Krewer

In the field of chemical engineering, mathematical models have proven to be an indispensable tool for process analysis, process design, and condition monitoring. To gain the most benefit from model-based approaches, the implemented mathematical models have to be based on sound principles, and they need to be calibrated to the process under study with suitable model parameter estimates. Often, however, the model parameters identified from experimental data carry severe uncertainties, leading to incorrect or biased inferences. This applies in particular in the field of pharmaceutical manufacturing, where measurement data are usually limited in quantity and quality when analyzing novel active pharmaceutical ingredients. Optimally designed experiments, in turn, aim to increase the quality of the gathered data in the most efficient way. Any improvement in data quality results in more precise parameter estimates and more reliable model candidates. The methods applied for parameter sensitivity analyses and the chosen design criteria are crucial for the effectiveness of the optimal experimental design. In this work, different design measures based on global parameter sensitivities are critically compared with state-of-the-art concepts that follow simplifying linearization principles. The efficient implementation of the proposed sensitivity measures is explicitly addressed to ensure applicability to complex chemical engineering problems of practical relevance. As a case study, the homogeneous synthesis of 3,4-dihydro-1H-1-benzazepine-2,5-dione, a scaffold for the preparation of various protein kinase inhibitors, is analyzed, followed by a more complex model of biochemical reactions. In both studies, the model-based optimal experimental design benefits from global parameter sensitivities combined with proper design measures.
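As a schematic of the comparison being drawn (a toy first-order model of our own, not the paper's benzazepine or biochemical case studies), the sketch below scores one candidate sampling schedule with a linearized sensitivity measure and with a global variant that averages the information over a parameter prior:

```python
# Hedged sketch: local vs global sensitivity-based design measures for the
# toy model y(t) = exp(-k*t) with a single unknown rate constant k.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.1, 5.0, 8)              # candidate sampling times

def sensitivity(k, t):
    # dy/dk for y = exp(-k*t)
    return -t * np.exp(-k * t)

# Linearized (local) measure: information at a single nominal value of k.
info_local = np.sum(sensitivity(0.5, t) ** 2)

# Global measure: expected information over a prior on k, so the design
# score does not hinge on one possibly poor nominal estimate.
k_prior = rng.uniform(0.1, 1.0, size=1000)
info_global = np.mean([np.sum(sensitivity(k, t) ** 2) for k in k_prior])

print(f"local measure: {info_local:.3f}, global measure: {info_global:.3f}")
```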


1985 ◽  
Vol 248 (3) ◽  
pp. R378-R386 ◽  
Author(s):  
M. H. Nathanson ◽  
G. M. Saidel

Optimal experimental design is used to predict the experimental conditions that will allow the "best" estimates of model parameters. A variety of criteria must be considered before an optimal design is chosen. Maximizing the determinant of the information matrix (D optimality), which tends to produce the most precise simultaneous estimates of all parameters, is commonly considered the primary criterion. To complement this criterion, we present another whose effect is to reduce the interaction among the parameter estimates so that changes in any one parameter can be more distinct. This new criterion consists of maximizing the determinant of an appropriately scaled information matrix (M optimality). These criteria are applied jointly in a multiple-objective function. To illustrate the use of these concepts, we develop an optimal experimental design of blood sampling schedules using a detailed ferrokinetic model.
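The two criteria are easy to juxtapose in code (the information matrix below is illustrative, not derived from the ferrokinetic model):

```python
# Hedged sketch: D optimality vs M optimality for a hypothetical 2-parameter
# Fisher information matrix F from some candidate sampling schedule.
import numpy as np

F = np.array([[4.0, 1.8],
              [1.8, 1.0]])              # illustrative information matrix

d_crit = np.linalg.det(F)               # D optimality: overall precision

# M optimality: determinant of the information matrix scaled to unit
# diagonal; it approaches 1 as the parameter estimates decorrelate.
s = np.diag(1.0 / np.sqrt(np.diag(F)))
m_crit = np.linalg.det(s @ F @ s)

print(f"D criterion: {d_crit:.3f}, M criterion: {m_crit:.3f}")
```

A joint objective, as in the abstract, would then combine the two, for instance as a weighted product or sum of d_crit and m_crit.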


Author(s):  
Madoka Muroishi ◽  
Akira Yakita

Abstract Using a small, open, two-region economy model populated by two-period-lived overlapping generations, we analyze long-term agglomeration economy and congestion diseconomy effects of young worker concentration on migration and the overall fertility rate. When the migration-stability condition is satisfied, the distribution of young workers between regions is obtainable in each period for a predetermined population size. Results show that migration stability does not guarantee dynamic stability of the economy: the stability of the stationary population size depends on the model parameters and the initial population size. On a stable trajectory converging to the stationary equilibrium, the overall fertility rate might change non-monotonically with the population size of the economy because of interregional migration. In each period, interregional migration mitigates the regional population changes caused by fertility differences along the stable path.


2008 ◽  
Vol 10 (2) ◽  
pp. 153-162 ◽  
Author(s):  
B. G. Ruessink

When a numerical model is to be used as a practical tool, its parameters should preferably be stable and consistent, that is, possess a small uncertainty and be time-invariant. Using data and predictions of alongshore mean currents flowing on a beach as a case study, this paper illustrates how parameter stability and consistency can be assessed using Markov chain Monte Carlo. Within a single calibration run, Markov chain Monte Carlo estimates the parameter posterior probability density function, its mode being the best-fit parameter set. Parameter stability is investigated by stepwise adding new data to a calibration run, while consistency is examined by calibrating the model on different datasets of equal length. The results for the present case study indicate that various tidal cycles with strong (say, >0.5 m/s) currents are required to obtain stable parameter estimates, and that the best-fit model parameters and the underlying posterior distribution are strongly time-varying. This inconsistent parameter behavior may reflect unresolved variability of the processes represented by the parameters, or may represent compensational behavior for temporal violations in specific model assumptions.
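A schematic version of the stability check (synthetic data and a generic Metropolis sampler; the study itself calibrates an alongshore-current model against field data) might look like this:

```python
# Hedged sketch: posterior of one model parameter re-estimated as data are
# stepwise added, mimicking the stability analysis described above.
import numpy as np

rng = np.random.default_rng(2)
data = 0.8 + 0.2 * rng.standard_normal(500)     # synthetic observations

def log_post(c, y):
    # Flat prior; Gaussian likelihood with known noise sd of 0.2.
    return -0.5 * np.sum((y - c) ** 2) / 0.2**2

def metropolis(y, n_iter=3000, step=0.05):
    c, lp, samples = 0.0, log_post(0.0, y), []
    for _ in range(n_iter):
        prop = c + step * rng.standard_normal()
        lp_prop = log_post(prop, y)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            c, lp = prop, lp_prop
        samples.append(c)
    return np.array(samples[500:])                 # discard burn-in

# Stability: the posterior should tighten and stop drifting as data grow.
for n in (50, 200, 500):
    s = metropolis(data[:n])
    print(f"n={n:3d}: mode ~ {s.mean():.3f}, sd {s.std():.3f}")
```

Consistency would be probed the same way, but with disjoint, equal-length slices of `data` instead of nested ones.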


1991 ◽  
Vol 18 (2) ◽  
pp. 320-327 ◽  
Author(s):  
Murray A. Fitch ◽  
Edward A. McBean

A model is developed for the prediction of river flows resulting from combined snowmelt and precipitation. The model employs a Kalman filter to reflect uncertainty both in the measured data and in the system model parameters. The forecasting algorithm is used to develop multi-day forecasts for the Sturgeon River, Ontario. The algorithm is shown to produce good 1-day and 2-day-ahead forecasts, but the linear prediction model is found inadequate for longer-term forecasts. Good initial parameter estimates are shown to be essential for optimal forecasting performance. Key words: Kalman filter, streamflow forecast, multi-day, streamflow, Sturgeon River, MISP algorithm.
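For orientation, a minimal scalar predict/update cycle is sketched below (generic textbook form with invented numbers; the paper's state vector, noise statistics and MISP details differ):

```python
# Hedged sketch: one-dimensional Kalman filter for a daily flow estimate.
def kalman_step(x, P, z, a=0.95, q=1.0, r=4.0):
    # Predict: linear flow model x_k = a * x_{k-1}, process noise variance q.
    x_pred, P_pred = a * x, a * P * a + q
    # Update: blend the prediction with the new gauge reading z (variance r).
    K = P_pred / (P_pred + r)                 # Kalman gain
    return x_pred + K * (z - x_pred), (1 - K) * P_pred

x, P = 100.0, 10.0                            # initial flow estimate, variance
for z in (102.0, 98.0, 105.0):                # illustrative daily observations
    x, P = kalman_step(x, P, z)
    print(f"filtered flow {x:.1f}, variance {P:.2f}")
```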


2011 ◽  
Vol 64 (S1) ◽  
pp. S3-S18 ◽  
Author(s):  
Yuanxi Yang ◽  
Jinlong Li ◽  
Junyi Xu ◽  
Jing Tang

Integrated navigation using multiple Global Navigation Satellite Systems (GNSS) is beneficial to increase the number of observable satellites, alleviate the effects of systematic errors and improve the accuracy of positioning, navigation and timing (PNT). When multiple constellations and multiple frequency measurements are employed, the functional and stochastic models as well as the estimation principle for PNT may be different. Therefore, the commonly used definition of "dilution of precision (DOP)", based on least squares (LS) estimation and unified functional and stochastic models, will no longer be applicable. In this paper, three types of generalised DOPs are defined. The first type of generalised DOP is based on the error influence function (IF) of pseudo-ranges, which reflects the geometry strength of the measurements, the error magnitude and the estimation risk criteria. When least squares estimation is used, the first type of generalised DOP is identical to the one commonly used. In order to define the first type of generalised DOP, an IF of signal-in-space (SIS) errors on the parameter estimates of PNT is derived. The second type of generalised DOP is defined based on the functional model with additional systematic parameters induced by the compatibility and interoperability problems among different GNSS systems. The third type of generalised DOP is defined based on Bayesian estimation, in which the a priori information of the model parameters is taken into account; this is suitable for evaluating the precision of kinematic positioning or navigation. Different types of generalised DOPs are suitable for different PNT scenarios, and an example of the calculation of these DOPs for multi-GNSS systems including GPS, GLONASS, Compass and Galileo is given. New observation equations of Compass and GLONASS that may contain additional parameters for interoperability are specifically investigated. The example shows that if the interoperability of multi-GNSS is not fulfilled, the increased number of satellites will not significantly reduce the generalised DOP value. Furthermore, outlying measurements will not change the original DOP, but will change the first type of generalised DOP, which includes a robust error IF. A priori information of the model parameters will also reduce the DOP.
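As a concrete anchor for these definitions (the satellite geometry and prior weights below are illustrative only), the sketch computes the classical least-squares GDOP and a Bayesian variant in the spirit of the third generalised DOP:

```python
# Hedged sketch: classical GDOP from the design matrix A (rows = unit
# line-of-sight vector plus a receiver-clock term), and a Bayesian DOP in
# which a priori parameter information enters the normal matrix.
import numpy as np

A = np.array([[ 0.6,  0.6,  0.53, 1.0],
              [-0.6,  0.6,  0.53, 1.0],
              [ 0.6, -0.6,  0.53, 1.0],
              [-0.6, -0.6,  0.53, 1.0],
              [ 0.0,  0.0,  1.0,  1.0]])   # 5 illustrative satellites

gdop_ls = np.sqrt(np.trace(np.linalg.inv(A.T @ A)))

P_prior = np.diag([0.25, 0.25, 0.25, 0.0])  # hypothetical prior weights
gdop_bayes = np.sqrt(np.trace(np.linalg.inv(A.T @ A + P_prior)))

print(f"LS GDOP: {gdop_ls:.2f}, Bayesian GDOP: {gdop_bayes:.2f}")
# The prior information lowers the DOP, consistent with the abstract's
# closing observation.
```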


2020 ◽  
Vol 53 (3) ◽  
pp. 800-810
Author(s):  
Frank Heinrich ◽  
Paul A. Kienzle ◽  
David P. Hoogerheide ◽  
Mathias Lösche

A framework is applied to quantify information gain from neutron or X-ray reflectometry experiments [Treece, Kienzle, Hoogerheide, Majkrzak, Lösche & Heinrich (2019). J. Appl. Cryst. 52, 47–59], in an in-depth investigation into the design of scattering contrast in biological and soft-matter surface architectures. To focus the experimental design on regions of interest, the marginalization of the information gain with respect to a subset of model parameters describing the structure is implemented. Surface architectures of increasing complexity from a simple model system to a protein–lipid membrane complex are simulated. The information gain from virtual surface scattering experiments is quantified as a function of the scattering length density of molecular components of the architecture and the surrounding aqueous bulk solvent. It is concluded that the information gain is mostly determined by the local scattering contrast of a feature of interest with its immediate molecular environment, and experimental design should primarily focus on this region. The overall signal-to-noise ratio of the measured reflectivity modulates the information gain globally and is a second factor to be taken into consideration.
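The marginalization step can be sketched with Gaussian stand-ins (hypothetical prior and posterior covariances; the framework itself works with sampled posteriors): the information gain is the entropy drop from prior to posterior, and the marginal gain uses only the covariance sub-block for the parameters of interest:

```python
# Hedged sketch: total vs marginal information gain for a 3-parameter model,
# treating parameters 0 and 1 as the region of interest.
import numpy as np

def gauss_entropy(cov):
    # Differential entropy of a multivariate Gaussian (in nats).
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.log(np.linalg.det(cov)))

prior_cov = np.diag([1.0, 1.0, 1.0])
post_cov = np.array([[0.20, 0.05, 0.00],
                     [0.05, 0.30, 0.00],
                     [0.00, 0.00, 0.90]])   # hypothetical posterior

roi = np.ix_([0, 1], [0, 1])                # marginalise out parameter 2
gain_total = gauss_entropy(prior_cov) - gauss_entropy(post_cov)
gain_marginal = gauss_entropy(prior_cov[roi]) - gauss_entropy(post_cov[roi])

print(f"total gain: {gain_total:.2f} nats, marginal: {gain_marginal:.2f} nats")
```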


1965 ◽  
Vol 20 (1) ◽  
pp. 121-122
Author(s):  
Edward A. Bilodeau

A tiny experiment was reported by Dyal (1964) with results apparently contradicting the bulk of an extensive literature he failed to cite. The literature contains far better experimental designs, resources, and discussion of the issues.


1981 ◽  
Vol 240 (5) ◽  
pp. R259-R265 ◽  
Author(s):  
J. J. DiStefano

Design of optimal blood sampling protocols for kinetic experiments is discussed and evaluated, with the aid of several examples, including an endocrine system case study. The criterion of optimality is maximum accuracy of kinetic model parameter estimates. A simple example illustrates why a sequential experiment approach is required; optimal designs depend on the true model parameter values, knowledge of which is usually a primary objective of the experiment, as well as the structure of the model and the measurement error (e.g., assay) variance. The methodology is evaluated from the results of a series of experiments designed to quantify the dynamics of distribution and metabolism of three iodothyronines, T3, T4, and reverse-T3. This analysis indicates that 1) the sequential optimal experiment approach can be effective and efficient in the laboratory, 2) it works in the presence of reasonably controlled biological variation, producing sufficiently robust sampling protocols, and 3) optimal designs can be highly efficient designs in practice, requiring for maximum accuracy a number of blood samples equal to the number of independently adjustable model parameters, no more or less.
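The closing point, that maximum accuracy requires only as many samples as parameters, can be illustrated with a toy one-compartment model (our example; the iodothyronine systems in the study are multi-compartment):

```python
# Hedged sketch: D-optimal choice of blood-sampling times for y = A*exp(-k*t)
# with two parameters (A, k), so two well-placed samples suffice.
import numpy as np
from itertools import combinations

A, k = 10.0, 0.3                          # current parameter estimates
candidates = np.linspace(0.5, 12.0, 24)   # candidate sampling times (h)

def jacobian(ts):
    # Partial derivatives of y w.r.t. (A, k) at the times ts.
    e = np.exp(-k * ts)
    return np.column_stack([e, -A * ts * e])

def d_criterion(ts):
    J = jacobian(np.array(ts))
    return np.linalg.det(J.T @ J)         # D-optimality score

best = max(combinations(candidates, 2), key=d_criterion)
print("D-optimal 2-sample schedule (h):", best)
# In a sequential design, one would re-estimate (A, k) from these samples
# and re-optimise the schedule, as the abstract describes.
```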

