Robust Design for Coalescent Model Inference

2019 ◽  
Vol 68 (5) ◽  
pp. 730-743 ◽  
Author(s):  
Kris V Parag ◽  
Oliver G Pybus

Abstract The coalescent process describes how changes in the size or structure of a population influence the genealogical patterns of sequences sampled from that population. The estimation of (effective) population size changes from genealogies that are reconstructed from these sampled sequences is an important problem in many biological fields. Often, population size is characterized by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends on both the statistical coalescent inference method employed, and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature on coalescent inference methodology, there is comparatively little work on experimental design. The research that does exist is largely simulation-based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling under the structured coalescent model, and time discretization for sequentially Markovian coalescent models. In all cases, we prove that 1) working in the logarithm of the parameters to be inferred (e.g., population size) and 2) distributing informative coalescent events uniformly among these log-parameters, is uniquely robust. “Robust” means that the total and maximum uncertainty of our parameter estimates are minimized, and made insensitive to their unknown (true) values. This robust design theorem provides rigorous justification for several existing coalescent experimental design decisions and leads to usable guidelines for future empirical or simulation-based investigations. Given its persistence among models, this theorem may form the basis of an experimental design paradigm for coalescent inference.
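As a toy illustration of the theorem (not the authors' code; segment sizes and event counts below are invented), note that with m_j coalescent events in segment j the Fisher information about N_j is m_j/N_j^2, which depends on the unknown truth, whereas the information about log N_j is simply m_j. A uniform event allocation then equalises and minimises the worst-case uncertainty:

```python
# Minimal sketch of the robust-design result for a skyline model:
# compare Cramer-Rao bounds under natural and log parameterisations
# for a fixed budget of 60 coalescent events split across 3 segments.
import numpy as np

N = np.array([100.0, 1000.0, 50.0])        # hypothetical true segment sizes

for m in (np.array([20, 20, 20]),          # uniform allocation
          np.array([40, 15, 5])):          # skewed allocation
    print(f"events {m}:")
    print("  var bound for N_j:    ", N**2 / m)   # depends on unknown N_j
    print("  var bound for log N_j:", 1.0 / m)    # value-independent
# The uniform split minimises both the total and the maximum bound on the
# log-parameters, matching the robust-design theorem stated above.
```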


Processes ◽  
2018 ◽  
Vol 6 (4) ◽  
pp. 27 ◽  
Author(s):  
René Schenkendorf ◽  
Xiangzhong Xie ◽  
Moritz Rehbein ◽  
Stephan Scholl ◽  
Ulrike Krewer

In the field of chemical engineering, mathematical models have proven to be an indispensable tool for process analysis, process design, and condition monitoring. To gain the most benefit from model-based approaches, the implemented mathematical models have to be based on sound principles, and they need to be calibrated to the process under study with suitable model parameter estimates. Often, however, the model parameters identified from experimental data carry severe uncertainties, leading to incorrect or biased inferences. This applies in particular in the field of pharmaceutical manufacturing, where measurement data are usually limited in quantity and quality when analyzing novel active pharmaceutical ingredients. Optimally designed experiments, in turn, aim to increase the quality of the gathered data in the most efficient way. Any improvement in data quality results in more precise parameter estimates and more reliable model candidates. The methods applied for parameter sensitivity analyses and the chosen design criteria are crucial for the effectiveness of the optimal experimental design. In this work, different design measures based on global parameter sensitivities are critically compared with state-of-the-art concepts that follow simplifying linearization principles. The efficient implementation of the proposed sensitivity measures is explicitly addressed to ensure applicability to complex chemical engineering problems of practical relevance. As a case study, the homogeneous synthesis of 3,4-dihydro-1H-1-benzazepine-2,5-dione, a scaffold for the preparation of various protein kinase inhibitors, is analyzed, followed by a more complex model of biochemical reactions. In both studies, the model-based optimal experimental design benefits from global parameter sensitivities combined with proper design measures.
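As a schematic of the comparison being drawn (a toy first-order model of our own, not the paper's benzazepine or biochemical case studies), the sketch below scores one candidate sampling schedule with a linearized sensitivity measure and with a global variant that averages the information over a parameter prior:

```python
# Hedged sketch: local vs global sensitivity-based design measures for the
# toy model y(t) = exp(-k*t) with a single unknown rate constant k.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.1, 5.0, 8)              # candidate sampling times

def sensitivity(k, t):
    # dy/dk for y = exp(-k*t)
    return -t * np.exp(-k * t)

# Linearized (local) measure: information at a single nominal value of k.
info_local = np.sum(sensitivity(0.5, t) ** 2)

# Global measure: expected information over a prior on k, so the design
# score does not hinge on one possibly poor nominal estimate.
k_prior = rng.uniform(0.1, 1.0, size=1000)
info_global = np.mean([np.sum(sensitivity(k, t) ** 2) for k in k_prior])

print(f"local measure: {info_local:.3f}, global measure: {info_global:.3f}")
```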


1985 ◽  
Vol 248 (3) ◽  
pp. R378-R386 ◽  
Author(s):  
M. H. Nathanson ◽  
G. M. Saidel

Optimal experimental design is used to predict the experimental conditions that will allow the "best" estimates of model parameters. A variety of criteria must be considered before an optimal design is chosen. Maximizing the determinant of the information matrix (D optimality), which tends to produce the most precise simultaneous estimates of all parameters, is commonly considered the primary criterion. To complement this criterion, we present another whose effect is to reduce the interaction among the parameter estimates so that changes in any one parameter can be more distinct. This new criterion consists of maximizing the determinant of an appropriately scaled information matrix (M optimality). These criteria are applied jointly in a multiple-objective function. To illustrate the use of these concepts, we develop an optimal experimental design of blood sampling schedules using a detailed ferrokinetic model.
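The two criteria are easy to juxtapose in code (the information matrix below is illustrative, not derived from the ferrokinetic model):

```python
# Hedged sketch: D optimality vs M optimality for a hypothetical 2-parameter
# Fisher information matrix F from some candidate sampling schedule.
import numpy as np

F = np.array([[4.0, 1.8],
              [1.8, 1.0]])              # illustrative information matrix

d_crit = np.linalg.det(F)               # D optimality: overall precision

# M optimality: determinant of the information matrix scaled to unit
# diagonal; it approaches 1 as the parameter estimates decorrelate.
s = np.diag(1.0 / np.sqrt(np.diag(F)))
m_crit = np.linalg.det(s @ F @ s)

print(f"D criterion: {d_crit:.3f}, M criterion: {m_crit:.3f}")
```

A joint objective, as in the abstract, would then combine the two, for instance as a weighted product or sum of d_crit and m_crit.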


Author(s):  
Madoka Muroishi ◽  
Akira Yakita

Abstract Using a small, open, two-region economy model populated by two-period-lived overlapping generations, we analyze long-term agglomeration economy and congestion diseconomy effects of young worker concentration on migration and the overall fertility rate. When the migration-stability condition is satisfied, the distribution of young workers between regions is obtainable in each period for a predetermined population size. Results show that migration stability does not guarantee dynamic stability of the economy: the stability of the stationary population size depends on the model parameters and the initial population size. On a stable trajectory converging to the stationary equilibrium, the overall fertility rate might change non-monotonically with the population size of the economy because of interregional migration. In each period, interregional migration mitigates the regional population changes caused by fertility differences along the stable path.


2008 ◽  
Vol 10 (2) ◽  
pp. 153-162 ◽  
Author(s):  
B. G. Ruessink

When a numerical model is to be used as a practical tool, its parameters should preferably be stable and consistent, that is, possess a small uncertainty and be time-invariant. Using data and predictions of alongshore mean currents flowing on a beach as a case study, this paper illustrates how parameter stability and consistency can be assessed using Markov chain Monte Carlo. Within a single calibration run, Markov chain Monte Carlo estimates the parameter posterior probability density function, its mode being the best-fit parameter set. Parameter stability is investigated by stepwise adding new data to a calibration run, while consistency is examined by calibrating the model on different datasets of equal length. The results for the present case study indicate that various tidal cycles with strong (say, >0.5 m/s) currents are required to obtain stable parameter estimates, and that the best-fit model parameters and the underlying posterior distribution are strongly time-varying. This inconsistent parameter behavior may reflect unresolved variability of the processes represented by the parameters, or may represent compensational behavior for temporal violations in specific model assumptions.
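A schematic version of the stability check (synthetic data and a generic Metropolis sampler; the study itself calibrates an alongshore-current model against field data) might look like this:

```python
# Hedged sketch: posterior of one model parameter re-estimated as data are
# stepwise added, mimicking the stability analysis described above.
import numpy as np

rng = np.random.default_rng(2)
data = 0.8 + 0.2 * rng.standard_normal(500)     # synthetic observations

def log_post(c, y):
    # Flat prior; Gaussian likelihood with known noise sd of 0.2.
    return -0.5 * np.sum((y - c) ** 2) / 0.2**2

def metropolis(y, n_iter=3000, step=0.05):
    c, lp, samples = 0.0, log_post(0.0, y), []
    for _ in range(n_iter):
        prop = c + step * rng.standard_normal()
        lp_prop = log_post(prop, y)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            c, lp = prop, lp_prop
        samples.append(c)
    return np.array(samples[500:])                 # discard burn-in

# Stability: the posterior should tighten and stop drifting as data grow.
for n in (50, 200, 500):
    s = metropolis(data[:n])
    print(f"n={n:3d}: mode ~ {s.mean():.3f}, sd {s.std():.3f}")
```

Consistency would be probed the same way, but with disjoint, equal-length slices of `data` instead of nested ones.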


1991 ◽  
Vol 18 (2) ◽  
pp. 320-327 ◽  
Author(s):  
Murray A. Fitch ◽  
Edward A. McBean

A model is developed for the prediction of river flows resulting from combined snowmelt and precipitation. The model employs a Kalman filter to reflect uncertainty both in the measured data and in the system model parameters. The forecasting algorithm is used to develop multi-day forecasts for the Sturgeon River, Ontario. The algorithm is shown to produce good 1-day and 2-day-ahead forecasts, but the linear prediction model is found inadequate for longer-term forecasts. Good initial parameter estimates are shown to be essential for optimal forecasting performance. Key words: Kalman filter, streamflow forecast, multi-day, streamflow, Sturgeon River, MISP algorithm.
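For orientation, a minimal scalar predict/update cycle is sketched below (generic textbook form with invented numbers; the paper's state vector, noise statistics and MISP details differ):

```python
# Hedged sketch: one-dimensional Kalman filter for a daily flow estimate.
def kalman_step(x, P, z, a=0.95, q=1.0, r=4.0):
    # Predict: linear flow model x_k = a * x_{k-1}, process noise variance q.
    x_pred, P_pred = a * x, a * P * a + q
    # Update: blend the prediction with the new gauge reading z (variance r).
    K = P_pred / (P_pred + r)                 # Kalman gain
    return x_pred + K * (z - x_pred), (1 - K) * P_pred

x, P = 100.0, 10.0                            # initial flow estimate, variance
for z in (102.0, 98.0, 105.0):                # illustrative daily observations
    x, P = kalman_step(x, P, z)
    print(f"filtered flow {x:.1f}, variance {P:.2f}")
```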


2011 ◽  
Vol 64 (S1) ◽  
pp. S3-S18 ◽  
Author(s):  
Yuanxi Yang ◽  
Jinlong Li ◽  
Junyi Xu ◽  
Jing Tang

Integrated navigation using multiple Global Navigation Satellite Systems (GNSS) is beneficial to increase the number of observable satellites, alleviate the effects of systematic errors and improve the accuracy of positioning, navigation and timing (PNT). When multiple constellations and multiple frequency measurements are employed, the functional and stochastic models as well as the estimation principle for PNT may be different. Therefore, the commonly used definition of "dilution of precision (DOP)", based on least squares (LS) estimation and unified functional and stochastic models, will no longer be applicable. In this paper, three types of generalised DOPs are defined. The first type of generalised DOP is based on the error influence function (IF) of pseudo-ranges, which reflects the geometry strength of the measurements, the error magnitude and the estimation risk criteria. When least squares estimation is used, the first type of generalised DOP is identical to the one commonly used. In order to define the first type of generalised DOP, an IF of signal-in-space (SIS) errors on the parameter estimates of PNT is derived. The second type of generalised DOP is defined based on the functional model with additional systematic parameters induced by the compatibility and interoperability problems among different GNSS systems. The third type of generalised DOP is defined based on Bayesian estimation, in which the a priori information of the model parameters is taken into account; this is suitable for evaluating the precision of kinematic positioning or navigation. Different types of generalised DOPs are suitable for different PNT scenarios, and an example of the calculation of these DOPs for multi-GNSS systems including GPS, GLONASS, Compass and Galileo is given. New observation equations of Compass and GLONASS that may contain additional parameters for interoperability are specifically investigated. The example shows that if the interoperability of multi-GNSS is not fulfilled, the increased number of satellites will not significantly reduce the generalised DOP value. Furthermore, outlying measurements will not change the original DOP, but will change the first type of generalised DOP, which includes a robust error IF. A priori information of the model parameters will also reduce the DOP.
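As a concrete anchor for these definitions (the satellite geometry and prior weights below are illustrative only), the sketch computes the classical least-squares GDOP and a Bayesian variant in the spirit of the third generalised DOP:

```python
# Hedged sketch: classical GDOP from the design matrix A (rows = unit
# line-of-sight vector plus a receiver-clock term), and a Bayesian DOP in
# which a priori parameter information enters the normal matrix.
import numpy as np

A = np.array([[ 0.6,  0.6,  0.53, 1.0],
              [-0.6,  0.6,  0.53, 1.0],
              [ 0.6, -0.6,  0.53, 1.0],
              [-0.6, -0.6,  0.53, 1.0],
              [ 0.0,  0.0,  1.0,  1.0]])   # 5 illustrative satellites

gdop_ls = np.sqrt(np.trace(np.linalg.inv(A.T @ A)))

P_prior = np.diag([0.25, 0.25, 0.25, 0.0])  # hypothetical prior weights
gdop_bayes = np.sqrt(np.trace(np.linalg.inv(A.T @ A + P_prior)))

print(f"LS GDOP: {gdop_ls:.2f}, Bayesian GDOP: {gdop_bayes:.2f}")
# The prior information lowers the DOP, consistent with the abstract's
# closing observation.
```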


2020 ◽  
Vol 53 (3) ◽  
pp. 800-810
Author(s):  
Frank Heinrich ◽  
Paul A. Kienzle ◽  
David P. Hoogerheide ◽  
Mathias Lösche

A framework is applied to quantify information gain from neutron or X-ray reflectometry experiments [Treece, Kienzle, Hoogerheide, Majkrzak, Lösche & Heinrich (2019). J. Appl. Cryst. 52, 47–59], in an in-depth investigation into the design of scattering contrast in biological and soft-matter surface architectures. To focus the experimental design on regions of interest, the marginalization of the information gain with respect to a subset of model parameters describing the structure is implemented. Surface architectures of increasing complexity from a simple model system to a protein–lipid membrane complex are simulated. The information gain from virtual surface scattering experiments is quantified as a function of the scattering length density of molecular components of the architecture and the surrounding aqueous bulk solvent. It is concluded that the information gain is mostly determined by the local scattering contrast of a feature of interest with its immediate molecular environment, and experimental design should primarily focus on this region. The overall signal-to-noise ratio of the measured reflectivity modulates the information gain globally and is a second factor to be taken into consideration.
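The marginalization step can be sketched with Gaussian stand-ins (hypothetical prior and posterior covariances; the framework itself works with sampled posteriors): the information gain is the entropy drop from prior to posterior, and the marginal gain uses only the covariance sub-block for the parameters of interest:

```python
# Hedged sketch: total vs marginal information gain for a 3-parameter model,
# treating parameters 0 and 1 as the region of interest.
import numpy as np

def gauss_entropy(cov):
    # Differential entropy of a multivariate Gaussian (in nats).
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.log(np.linalg.det(cov)))

prior_cov = np.diag([1.0, 1.0, 1.0])
post_cov = np.array([[0.20, 0.05, 0.00],
                     [0.05, 0.30, 0.00],
                     [0.00, 0.00, 0.90]])   # hypothetical posterior

roi = np.ix_([0, 1], [0, 1])                # marginalise out parameter 2
gain_total = gauss_entropy(prior_cov) - gauss_entropy(post_cov)
gain_marginal = gauss_entropy(prior_cov[roi]) - gauss_entropy(post_cov[roi])

print(f"total gain: {gain_total:.2f} nats, marginal: {gain_marginal:.2f} nats")
```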


1965 ◽  
Vol 20 (1) ◽  
pp. 121-122
Author(s):  
Edward A. Bilodeau

A tiny experiment was reported by Dyal (1964) with results apparently contradicting the bulk of an extensive literature he failed to cite. The literature contains far better experimental designs, resources, and discussion of the issues.


1981 ◽  
Vol 240 (5) ◽  
pp. R259-R265 ◽  
Author(s):  
J. J. DiStefano

Design of optimal blood sampling protocols for kinetic experiments is discussed and evaluated, with the aid of several examples, including an endocrine system case study. The criterion of optimality is maximum accuracy of kinetic model parameter estimates. A simple example illustrates why a sequential experiment approach is required; optimal designs depend on the true model parameter values, knowledge of which is usually a primary objective of the experiment, as well as the structure of the model and the measurement error (e.g., assay) variance. The methodology is evaluated from the results of a series of experiments designed to quantify the dynamics of distribution and metabolism of three iodothyronines, T3, T4, and reverse-T3. This analysis indicates that 1) the sequential optimal experiment approach can be effective and efficient in the laboratory, 2) it works in the presence of reasonably controlled biological variation, producing sufficiently robust sampling protocols, and 3) optimal designs can be highly efficient designs in practice, requiring for maximum accuracy a number of blood samples equal to the number of independently adjustable model parameters, no more or less.
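The closing point, that maximum accuracy requires only as many samples as parameters, can be illustrated with a toy one-compartment model (our example; the iodothyronine systems in the study are multi-compartment):

```python
# Hedged sketch: D-optimal choice of blood-sampling times for y = A*exp(-k*t)
# with two parameters (A, k), so two well-placed samples suffice.
import numpy as np
from itertools import combinations

A, k = 10.0, 0.3                          # current parameter estimates
candidates = np.linspace(0.5, 12.0, 24)   # candidate sampling times (h)

def jacobian(ts):
    # Partial derivatives of y w.r.t. (A, k) at the times ts.
    e = np.exp(-k * ts)
    return np.column_stack([e, -A * ts * e])

def d_criterion(ts):
    J = jacobian(np.array(ts))
    return np.linalg.det(J.T @ J)         # D-optimality score

best = max(combinations(candidates, 2), key=d_criterion)
print("D-optimal 2-sample schedule (h):", best)
# In a sequential design, one would re-estimate (A, k) from these samples
# and re-optimise the schedule, as the abstract describes.
```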

